Transforming English

Jed Rembold

September 19, 2025

Announcements

  • Problem Set 3 due Monday night!
    • You should have everything you need after today
  • Project 1: Wordle guide out by the end of tomorrow
    • No worries if you are still working on PS3 this weekend; that is the expectation
    • But try to make sure PS3 gets in on time so that you can get a good start on Wordle
  • I have several long meetings this afternoon, unfortunately, so I will likely be hard to catch in my office
  • Polling: polling.jedrembold.prof

Sequence Operations

Character Picking

  • A string is an ordered sequence of characters
    • Character positions in the string are identified by an index, which starts at 0

  • You can select individual characters from the string using the syntax

    |||string|||[|||k|||]

    where |||string||| is the desired string (or a variable assigned to a string) and |||k||| is the integer index of the character you want

    >>> print("spaghetti sauce"[5])
    e
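A minimal sketch of indexing on the same string; the variable names are only for illustration. A space counts as a character, and an index at or past len(s) raises an IndexError.

```python
s = "spaghetti sauce"       # 15 characters, indices 0 through 14

first = s[0]                # 's'
sixth = s[5]                # 'e' (index 5 is the sixth character)
gap = s[9]                  # ' ' (spaces are characters too)

# Indexing past the end raises an IndexError
try:
    s[len(s)]
except IndexError:
    print("index", len(s), "is out of range")
```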

Back it Up

  • Sometimes it is more useful to count from the end of the string, not the beginning
  • Python gives you a convenient way to do this, using negative indexes


  • A common use case is to grab the last character of the string, using

    |||string|||[-1]

    which is shorthand for

    |||string|||[len(|||string|||)-1]
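A quick sketch comparing negative indexes with their long-hand equivalents on the same example string:

```python
s = "spaghetti sauce"

last = s[-1]                    # 'e'
last_longhand = s[len(s) - 1]   # 'e', the same character
second_to_last = s[-2]          # 'c'
first = s[-len(s)]              # 's': -len(s) is the most negative valid index
```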

Slicing

  • Often, you may want more than a single character

  • Python allows you to specify a starting and an ending index through an operation known as slicing

  • The syntax looks like:

    |||string|||[|||start||| : |||limit|||]

    where |||start||| is the index of the first character included, and the slice continues up to, but not including, |||limit|||

  • |||start||| and |||limit||| are actually optional (but the : is not)

    • If |||start||| is omitted, the slice begins at the start of the string
    • If |||limit||| is omitted, the slice continues to the end of the string
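A sketch of slices with and without the optional endpoints. One handy consequence of the "up to but not including" rule is that splitting a string at any index k and gluing the halves back together returns the original.

```python
s = "spaghetti sauce"

word = s[0:9]     # 'spaghetti' (indices 0 through 8)
head = s[:9]      # start omitted: same as s[0:9]
tail = s[10:]     # limit omitted: 'sauce'
whole = s[:]      # both omitted: the entire string

# For any index k, the two slices fit together exactly
k = 4
print(s[:k], "+", s[k:])    # spag + hetti sauce
print(s[:k] + s[k:] == s)   # True
```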

and Dicing

  • You can add a third component to the slice syntax, called the stride

    |||string|||[|||start||| : |||limit||| : |||stride|||]
  • The stride sets the step size between successive included indices

  • A negative stride steps backwards through the string

    >>> s = "spaghetti sauce"
    >>> s[4:8]
    'hett'
    >>> s[10:]
    'sauce'
    >>> s[:10:2]
    'sahti'
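Negative strides are mentioned above without an example, so here is a small sketch. With a negative stride the start should come after the limit, and omitting both walks the whole string backwards. The is_palindrome helper is a hypothetical example, not part of the course code.

```python
s = "spaghetti sauce"

backwards = s[::-1]     # 'ecuas ittehgaps'
chunk = s[8:3:-1]       # indices 8, 7, 6, 5, 4 -> 'itteh'
every_third = s[::3]    # indices 0, 3, 6, 9, 12 -> 'sgt u'

def is_palindrome(word):
    """Hypothetical helper: True if word reads the same in both directions."""
    return word == word[::-1]

print(is_palindrome("racecar"))   # True
print(is_palindrome("sauce"))     # False
```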

Understanding Check!

Suppose you have the string x = "consternation" and you’d like to just extract and print the word "nation". Which expression below will not give you the string "nation"?

  1. x[7:len(x)]
  2. x[7:]
  3. x[-6:len(x)]
  4. x[-6:-1]

Comparing Strings

  • Python lets you use normal comparison operators to compare strings

    string1 == string2

    is true if string1 and string2 contain the same characters in the same order

  • Comparisons using greater than or less than work much like alphabetical ordering

    • Start at the beginning and compare the first characters. If they are the same, compare the next pair of characters, and so on
  • Characters are compared according to their Unicode values

    • This is called lexicographic ordering
    • "cat" > "CAT"
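A short sketch of these rules in action. Python's built-in ord() function (not covered above) reports the Unicode value that the comparisons actually use.

```python
print(ord("c"), ord("C"))      # 99 67

same = "cat" == "cat"          # True: same characters, same order
lower_first = "cat" > "CAT"    # True: 'c' (99) comes after 'C' (67)
case_trap = "Zebra" < "apple"  # True: uppercase ASCII letters all precede lowercase ones
prefix = "app" < "apple"       # True: a prefix sorts before the longer string
differ = "cart" < "cat"        # True: first difference is 'r' (114) vs 't' (116)

print(same, lower_first, case_trap, prefix, differ)
```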

Comparing Substrings

  • Sometimes you want to check if a string shows up as a piece of another string

    • You do have the tools to do this manually:

      def check_if_substr(part, whole):
          L = len(part)
          # + 1 so that a match ending at the last character is also checked
          for i in range(len(whole) - L + 1):
              if whole[i:i+L] == part:
                  return True
          return False
  • Thankfully, Python offers the in keyword, which accomplishes the same thing much more simply:

    if |||part||| in |||sequence|||:
        |||your code here|||
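A few sketches of in with the same example string. Like ==, the check is case-sensitive, and not in gives the opposite answer.

```python
phrase = "spaghetti sauce"

found = "sauce" in phrase         # True
at_end = "ce" in phrase           # True: a match at the very end counts
wrong_case = "Sauce" in phrase    # False: the check is case-sensitive
missing = "pasta" not in phrase   # True

if "tti" in phrase:
    print("Found 'tti' inside", phrase)
```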

String Looping

  • Because strings are a sequence, they will work directly in a for loop!
  • In general, then, you have two options for looping through a string:
    • By index:

      s = "hello"
      for i in range(len(s)):
          print(s[i], 'is letter', i)
    • By letter:

      s = "hello"
      for letter in s:
          print(letter)
  • Looping by letter is very convenient, but you lose positional information
  • If you loop by index, you can always select the desired letter with s[i]
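A sketch of when each loop style fits, using two hypothetical helpers: counting vowels only needs the letters, while spotting doubled letters needs positions, so it loops by index.

```python
def count_vowels(word):
    """Hypothetical helper: no positions needed, so a letter loop is enough."""
    count = 0
    for letter in word:
        if letter in "aeiou":
            count += 1
    return count

def find_double_letters(word):
    """Hypothetical helper: return the letters that appear twice in a row."""
    doubles = ""
    # Index loop: we need position i to look at its neighbor at i + 1
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:
            doubles += word[i]
    return doubles

print(count_vowels("bookkeeper"))         # 5
print(find_double_letters("bookkeeper"))  # oke
```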

Changing Strings

Can’t change a string’s colors

  • Strings are what we call immutable: they cannot be modified in place once created.

  • You can “look” at different parts of the string, but you cannot “change” those parts without making a whole new string

    s = "Cats!"
    s[0] = "R"   # THIS WILL ERROR!!
  • You can of course create a new string object with the desired traits:

    s = "R" + s[1:]
  • This applies to all functions and methods that act on strings as well: they return a new string rather than modifying the original
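A minimal sketch of the difference: assigning to an index raises a TypeError, while building a new string (or calling a string method such as upper()) leaves the original untouched.

```python
s = "Cats!"

try:
    s[0] = "R"            # strings do not support item assignment
    changed = True
except TypeError:
    changed = False
print("Modified in place?", changed)   # Modified in place? False

t = "R" + s[1:]           # build a new string instead
print(s, t)               # Cats! Rats!

loud = s.upper()          # methods also hand back a new string
print(s, loud)            # Cats! CATS!
```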

Receiver Syntax

  • So far, all operations between or on objects have used symbols to indicate the operation

    • The + sign, for instance
  • Going forward, we will begin to see examples of operations on objects that use receiver syntax

  • In receiver syntax, we write the object to act on, followed by a dot and then the name of a predefined function (called a method)

    |||object|||.|||method_name|||()
    • You are calling a function on the object, so you need the () at the end, even when there are no arguments
    • Some methods also accept arguments that influence exactly how the operation proceeds
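A sketch of receiver syntax using methods from the tables that follow: one with no arguments, one with two, and a chained pair in which each method acts on the string the previous call returned.

```python
word = "spaghetti"

loud = word.upper()                 # no arguments: 'SPAGHETTI'
swapped = word.replace("t", "d")    # two arguments: 'spagheddi'
tidy = "  Sauce  ".strip().lower()  # chained: strip() first, then lower() -> 'sauce'

print(loud, swapped, tidy)
```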

Transforming Methods

Method                                      Description
|||string|||.lower()                        Returns a copy of |||string||| with all letters converted to lowercase
|||string|||.upper()                        Returns a copy of |||string||| with all letters converted to uppercase
|||string|||.capitalize()                   Returns a copy of |||string||| with the first character capitalized and the rest lowercase
|||string|||.strip()                        Returns a copy of |||string||| with leading and trailing whitespace removed
|||string|||.replace(|||old|||, |||new|||)  Returns a copy of |||string||| with every occurrence of |||old||| replaced by |||new|||
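A quick sketch running each transforming method on one messy example string; note that messy itself never changes.

```python
messy = "  hELLO, wORLD  "

cleaned = messy.strip()             # 'hELLO, wORLD'
lower = cleaned.lower()             # 'hello, world'
upper = cleaned.upper()             # 'HELLO, WORLD'
capital = cleaned.capitalize()      # 'Hello, world'
zeros = cleaned.replace("O", "0")   # 'hELL0, w0RLD'

print(repr(messy))                  # the original, spaces and all, is unchanged
```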

Classifying Character Methods

Method                     Description
|||char|||.isalpha()       Returns True if |||char||| is a letter
|||char|||.isdigit()       Returns True if |||char||| is a digit
|||char|||.isalnum()       Returns True if |||char||| is a letter or a digit
|||char|||.islower()       Returns True if |||char||| is a lowercase letter
|||char|||.isupper()       Returns True if |||char||| is an uppercase letter
|||char|||.isspace()       Returns True if |||char||| is a whitespace character (such as a space, tab, or newline)
|||char|||.isidentifier()  Returns True if |||char||| is a legal Python identifier
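A sketch that labels each character of a short string with the classifying methods; describe is a hypothetical helper, and the order of the checks is one reasonable choice among several.

```python
def describe(char):
    """Hypothetical helper: label a single character using the methods above."""
    if char.isspace():
        return "whitespace"
    elif char.isdigit():
        return "digit"
    elif char.isupper():
        return "uppercase letter"
    elif char.islower():
        return "lowercase letter"
    else:
        return "other"

for c in "Hi 5!":
    print(repr(c), "is", describe(c))

print("_".isidentifier(), "5".isidentifier())   # True False
print("_".isalnum())                            # False: underscore is neither
```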

String I/O

F not G

  • Constructing text or sentences by interleaving strings and other objects comes up often when communicating code results to a user

  • In Python 3.6 and later, the nicest and easiest way to do this is with f-strings:

    A = 10
    print(f"The value of A is: {A}!")
  • You can define an f-string anywhere you would normally define a string, but be aware that the substitution uses the values the variables have at the moment the f-string is created

    A = 10
    s = f"The value of A is {A}"
    A = 12
    print(s)
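A sketch of the point above: s keeps the text it had when it was created, so to see the new value you rebuild the f-string. Any expression can go inside the braces, including a function or method call.

```python
A = 10
s = f"The value of A is {A}"
A = 12
print(s)                        # The value of A is 10
s = f"The value of A is {A}"    # rebuild to pick up the current value
print(s)                        # The value of A is 12

word = "sauce"
summary = f"{word.upper()} has {len(word)} letters"
print(summary)                  # SAUCE has 5 letters
```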

Getting some input

  • We’ve seen how to display information to a user; to retrieve data from a user, we can use Python’s built-in input() function

  • The form will generally look like:

    |||variable||| = input(|||prompt_text|||)
    • |||variable||| is the variable name you want to assign the user’s typed input to
    • |||prompt_text||| is the string that will be displayed on the screen to communicate to the user what they should be doing
  • The input() function always returns a string

    • If you want to get an integer from the user, you will need to convert it yourself after retrieving it

      num = int(input('Pick a number between 1 and 10: '))
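The conversion matters because + on two strings joins them rather than adding. int() also raises a ValueError if the text is not a whole number, so a program that must not crash can catch it. The parse_int helper below is a hypothetical sketch, not part of the course code, and the literal strings stand in for what input() would return.

```python
def parse_int(text):
    """
    Hypothetical helper: convert text (e.g. from input()) to an int,
    returning None instead of crashing if it is not a whole number.
    """
    try:
        return int(text)        # int() tolerates surrounding spaces, e.g. " 7 "
    except ValueError:
        return None

raw = "5"                       # input() would hand back a string like this
print(raw + raw)                # 55: string concatenation, not addition
print(int(raw) + int(raw))      # 10

print(parse_int(" 7 "))         # 7
print(parse_int("seven"))       # None
```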

Working with English

The english.py Library

  • To facilitate working with English words, we can take advantage of the pre-written english module
    • This will be highly useful in the Wordle project!
  • The english module exports two resources:
    • ENGLISH_WORDS: a constant sequence which contains all the valid English words in alphabetical order
    • is_english_word(): a predicate function which takes a string as input and returns True or False depending on whether that string is a valid English word
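When the real english.py isn't handy, a tiny stand-in with the same two names can be useful for trying out code. This is a hypothetical sketch: the word list is made up, and the actual module's contents and implementation may differ.

```python
# Hypothetical stand-in for the english module (NOT the real english.py)
ENGLISH_WORDS = ("ashtray", "orange", "pig", "rhythm", "trash")   # alphabetical

def is_english_word(s):
    """Return True if s appears in the stand-in word list."""
    return s in ENGLISH_WORDS

print(is_english_word("pig"))     # True
print(is_english_word("igpay"))   # False
```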

Biggest No-vowel Word

  • Suppose we wanted to determine the longest word in the English language without vowels:

    from english import ENGLISH_WORDS
    
    def find_first_vowel(word):
        for i in range(len(word)):
            if word[i].lower() in "aeiou":
                return i
        return -1
    
    def find_longest_no_vowels():
        best_length = 0
        for word in ENGLISH_WORDS:
            vowel_loc = find_first_vowel(word)
            if vowel_loc == -1 and len(word) > best_length:
                best_length = len(word)
                print(word)
    
    
    if __name__ == '__main__':
        find_longest_no_vowels()
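The version above prints each new record-holder as it goes. A variant that returns the single longest vowel-free word is easier to test; this sketch takes the word list as a parameter (an assumption, so it can run on a small made-up sample instead of ENGLISH_WORDS) and, like the original, does not count y as a vowel.

```python
def has_no_vowels(word):
    """Return True if word contains none of a, e, i, o, u (y is not counted)."""
    for letter in word.lower():
        if letter in "aeiou":
            return False
    return True

def longest_no_vowel_word(words):
    """Return the longest vowel-free word (first one found on ties), or None."""
    best = None
    for word in words:
        if has_no_vowels(word) and (best is None or len(word) > len(best)):
            best = word
    return best

sample = ["crypt", "rhythms", "sauce", "gym", "nymphs"]   # made-up sample list
print(longest_no_vowel_word(sample))   # rhythms
```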

Igpay Atinlay

  • Suppose we wanted to write a script that converted English to Pig Latin
  • Rules of Pig Latin:
    • If the word begins with a consonant, move all the letters before the first vowel to the end, then append “ay”
      fleet → eetflay
    • If the word starts with a vowel, just append “way” to the end
      orange → orangeway
    • If the word has no vowels, leave it unchanged
  • Our decomposition:
    • Find first vowel
    • Convert a single word

Indingfay Owelsvay

def find_first_vowel_index(word):
    """
    Find the first vowel in a word and return its index,
    or return None if no vowels found.
    """
    for i in range(len(word)):
        index = "aeiou".find(word[i].lower())
        if index != -1:
            return i
    return None

Onvertcay Oneway Ordway

def word_2_pig_latin(word):
    """
    Convert a single word with no special characters from
    English to Pig Latin.
    """
    vowel = find_first_vowel_index(word)
    if vowel is None:
        return word
    elif vowel == 0:
        return word + "way"
    else:
        return word[vowel:] + word[:vowel] + "ay"
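A sketch that puts the two functions to work on a whole sentence. It relies on the string method split() (not covered above), which breaks a string at whitespace, and assumes lowercase words with no punctuation; sentence_2_pig_latin is a hypothetical name.

```python
def find_first_vowel_index(word):
    for i in range(len(word)):
        if "aeiou".find(word[i].lower()) != -1:
            return i
    return None

def word_2_pig_latin(word):
    vowel = find_first_vowel_index(word)
    if vowel is None:
        return word
    elif vowel == 0:
        return word + "way"
    else:
        return word[vowel:] + word[:vowel] + "ay"

def sentence_2_pig_latin(sentence):
    """Hypothetical helper: translate each space-separated word, keeping single spaces."""
    result = ""
    for word in sentence.split():
        if result != "":
            result += " "
        result += word_2_pig_latin(word)
    return result

print(sentence_2_pig_latin("pig latin is fun"))   # igpay atinlay isway unfay
```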

When Pig Latin = English?

  • What about when the Pig Latin version of a word is a different, but still valid, English word?
  • Let’s not count words with no vowels, since they would trivially qualify
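from english import ENGLISH_WORDS, is_english_word

# Assumes word_2_pig_latin from the previous slide is defined in this file
def platin_equals_english():
    count = 0
    for word in ENGLISH_WORDS:
        platin = word_2_pig_latin(word)
        # word != platin skips words with no vowels, which come back unchanged
        if is_english_word(platin) and word != platin:
            print(word, ":", platin)
            count += 1
    return count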
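To try the logic without the full dictionary, here is a sketch that takes the words to check and the known words as parameters (an assumption, not the original design). It runs on a tiny made-up sample and folds the vowel search into a compact word_2_pig_latin that follows the same rules.

```python
def word_2_pig_latin(word):
    # Same rules as above, with the vowel search folded in
    for i in range(len(word)):
        if word[i].lower() in "aeiou":
            if i == 0:
                return word + "way"
            return word[i:] + word[:i] + "ay"
    return word  # no vowels: unchanged

def count_platin_matches(words, known_words):
    """Count words whose Pig Latin form is a different word in known_words."""
    count = 0
    for word in words:
        platin = word_2_pig_latin(word)
        if platin in known_words and word != platin:
            print(word, ":", platin)
            count += 1
    return count

sample = ["ashtray", "orange", "pig", "rhythm", "trash"]   # made-up sample list
print(count_platin_matches(sample, sample))   # prints "trash : ashtray", then 1
```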