Stringy Pythons

Jed Rembold

Sept 17, 2025

Announcements

  • Homework
    • Problem Set 3 posted!
      • After today you should be good to get started on most of the problems (and you should!)
    • I probably won’t give feedback on PS2 until the weekend
  • Upcoming Project 1: Wordle!
    • Aiming to release the guide for this on Friday
    • If ahead on PS3, you could get an earlier start
  • Don’t forget to attend your section today or tomorrow!
  • TechBytes tomorrow! This room at noon!
  • Polling: polling.jedrembold.prof

Review Question

How could you represent the number of items shown to the right in a binary representation?

  1. 1011
  2. 10110
  3. 10010
  4. 11010

Even More Bases

Other Bases

  • Binary is not a particularly compact representation to write out, so computer scientists will often use more compact representations as well
    • Octal (base 8) uses the digits 0 to 7
    • Hexadecimal (base 16) uses the digits 0 to 9 and then the letters A through F

  • Why octal or hexadecimal over our trusty old decimal system?
    • Both bases are powers of 2, which makes it easy to convert to and from binary
      • 1 octal digit = 3 binary digits, 1 hex digit = 4 binary digits
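As a quick sketch of these conversions in Python: the built-in functions bin, oct, and hex show the same integer written in each base, and int with a base argument converts back. The value 22 is just an arbitrary example, not something from the slides.

```python
n = 22  # arbitrary example value

binary = bin(n)   # '0b10110'
octal = oct(n)    # '0o26'
hexa = hex(n)     # '0x16'
print(binary, octal, hexa)

# Grouping the binary digits in threes (from the right) gives the octal digits:
#   010 110   ->  2 6
# Grouping them in fours gives the hexadecimal digits:
#   0001 0110 ->  1 6

# int(text, base) reads a string of digits in the given base
print(int("10110", 2), int("26", 8), int("16", 16))  # 22 22 22
```

The 0b, 0o, and 0x prefixes are how Python marks which base a literal is written in.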

Base(ic) Practice

  • The Java compiler has a fun quirk where every binary file it produces begins with


  • What is this in octal? hexadecimal?

Representation Matters

Representation

  • Sequences of bits have no intrinsic meaning!
    • Just the representations we assign to them by convention or by building certain operations into hardware
    • A 32-bit sequence represents an integer only because we have designed hardware to manipulate those sequences arithmetically: applying operations like addition, subtraction, etc
  • By choosing an appropriate representation, you can use bits to represent any value you could imagine!
    • Characters represented by numeric character codes
    • Floating-point representations to support real numbers
    • Two-dimensional arrays of bits representing images
  • To be useful though, everyone needs to agree on a representation!

Representation Pitfalls

  • How we choose to represent values has consequences!
  • Python represents floating point (fractional) numbers using two integers
    • One to represent the significant digits
    • One to represent the exponent (where the decimal place is)
  • \(1\frac{1}{4}\) Example
    • In decimal: \(\quad\displaystyle 1\frac{1}{4} = \frac{1}{1} + \frac{2}{10} + \frac{5}{100} = 1.25 = (125, -2)\)
    • In binary: \(\quad\displaystyle 1\frac{1}{4} = \frac{1}{1} + \frac{0}{2} + \frac{1}{4} = 1.01 = (101, -10)\)

Floating Binary

  • Say we wanted to convert the value \(\tfrac{7}{8}\) to a binary floating point representation: \[\frac{7}{8} = \frac{0}{1} + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} = 0.111 = (111, -11)\]
  • Now how would we convert \(\frac{1}{10}\) to binary??
    • We run into a problem! An infinitely repeating sequence! \[\frac{1}{10} = \frac{0}{1} + \frac{0}{2} + \frac{0}{4} + \frac{0}{8} + \frac{1}{16} + \frac{1}{32} + \frac{0}{64} + \frac{0}{128} + \frac{1}{256} + \cdots = 0.0001100110011\ldots\]
    • Have to stop the sequence somewhere and approximate it: \[\frac{3}{32} = 0.09375\quad\text{or}\quad\frac{25}{256} = 0.09765625\]
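One way to see where those binary digits come from is to repeatedly double the fraction and peel off the whole-number part. The sketch below assumes a hypothetical helper named binary_fraction_digits and uses Fraction from the standard library so that the arithmetic itself is exact.

```python
from fractions import Fraction


def binary_fraction_digits(value, max_digits):
    """Return the first max_digits binary digits of a fraction between 0 and 1."""
    digits = []
    for _ in range(max_digits):
        value = value * 2          # shift the binary point one place right
        if value >= 1:
            digits.append("1")     # a 1 appeared in front of the binary point
            value = value - 1
        else:
            digits.append("0")
    return "0." + "".join(digits)


print(binary_fraction_digits(Fraction(7, 8), 3))    # 0.111
print(binary_fraction_digits(Fraction(1, 10), 13))  # 0.0001100110011
```

For 7/8 the leftover value hits zero after three digits, while for 1/10 the leftover keeps cycling, which is why the digits repeat forever.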

Consequences

  • The best we can do within the range of normal integers \[\frac{3602879701896397}{2^{55}} = 0.10000000000000000555111512312578270\]
  • When doing operations on these numbers, the extra digits will sometimes get rounded off, making the result look exact, but a tiny bit of rounding error can be hiding in any floating point value.
  • So be careful using == for floating point comparisons! Rounding might result in unexpected falsehoods
    • 0.1 + 0.1 + 0.1 != 0.3
    • Far better to check if two numbers are within a small margin of one another, or greater or less than the other
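A small sketch of the safer comparison, assuming you are happy with the default tolerance of math.isclose from the standard library. The variable names are only for illustration.

```python
import math

total = 0.1 + 0.1 + 0.1

print(total == 0.3)              # False, because of rounding error
print(math.isclose(total, 0.3))  # True, the values are within a small margin

# as_integer_ratio shows the exact fraction Python stores for 0.1
numerator, denominator = (0.1).as_integer_ratio()
print(numerator)             # 3602879701896397
print(denominator == 2**55)  # True
```

If you need a looser or tighter margin, math.isclose accepts rel_tol and abs_tol arguments.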

Representing Characters

How to Encode Text?

  • We use numeric encodings to represent character data inside the machine, where each character is assigned an integer value.
  • Character codes are not very useful unless standardized though!
    • Competing encodings in the early years made it difficult to share data across machines
  • First widely adopted character encoding was ASCII (American Standard Code for Information Interchange)
  • Originally limited to 128 possible characters, ASCII proved inadequate for international use even after it was extended to 256, and has therefore been superseded by Unicode.

ASCII

[Figure: table of the ASCII characters laid out by hexadecimal code, with rows 0x through 7x and columns 0 through F]
The ASCII subset of Unicode

Abstract Strings

  • Characters (and their Unicode representation) are most often used in programming when combined to make collections of consecutive characters called strings.
  • Internally, strings are stored as a sequence of characters in a sequential chunk of memory.
  • You don’t have to (and generally don’t want to) think of the internal representation.
    • Better to think of the string as a single abstract unit
  • Python emphasizes this abstract view by defining a built-in string data type that already defines a selection of higher-level operations on string objects

A String primer

  • A string in Python is a data type that represents textual data, in the form of a sequence of individual characters
    • Domain: all possible sequences of characters
    • Operations: Many! But we’ll keep it quite simple initially
  • Denoted by placing the desired sequence of characters between two quotation marks
    • 'I am a string'
    • In Python, either single or double quotes can be used, but the ends must match
      • "I am also a string!"
      • "I'm sad you've gone"

Stringy Operations

Meeting chr and ord

  • Python includes two built-in functions to simplify conversion between an integer and the corresponding Unicode character
  • chr takes a base-10 integer and returns the corresponding Unicode character as a string
    • chr(65) gives "A" (capital A)
    • chr(960) gives "π" (Greek letter pi)
  • ord goes the other direction, taking a single character string and returning the corresponding base-10 integer of that character in Unicode
    • ord("B") gives 66
    • ord(" ") gives 32
    • ord("π") gives 960

Lengths

  • The number of characters in a string is commonly called its length

  • The length of a string can be found using the built-in function len()

    >>> len("Totally awesome")
    15
  • In practice, this function works for any sequence, of which range is also an example. For instance, len(range(5)) gives 5.
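A few quick checks of len, as a sketch. Note that spaces count as characters and that the empty string has length zero.

```python
phrase_length = len("Totally awesome")
empty_length = len("")
space_length = len(" ")
range_length = len(range(5))

print(phrase_length, empty_length, space_length, range_length)  # 15 0 1 5
```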

Concatenation

  • Concatenation is the act of taking two separate objects and bringing them together to create a single new object

  • For strings, concatenation takes the contents of one string and adds them to the end of another string

  • Python overloads the + operator to concatenate sequences like strings

    • This is why keeping track of variable types is important! + will add two numbers, but will concatenate two strings!
    >>> 'fish' + 'sticks'
    'fishsticks'
  • Unlike in addition, order matters here!

    >>> 'sticks' + 'fish'
    'sticksfish'
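To see why types matter here, compare what + does with numbers, with strings, and with a mix. This sketch assumes the built-in str function for converting a number to a string; the variable names are just for illustration.

```python
number_sum = 3 + 4        # numeric addition
text_sum = "3" + "4"      # string concatenation
converted = str(3) + "4"  # convert the number first, then concatenate

print(number_sum)  # 7
print(text_sum)    # 34
print(converted)   # 34

# Mixing a string and a number with + is an error
try:
    "fish" + 3
except TypeError as err:
    print("Cannot mix types:", err)
```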

Repeat again?

  • We’ve seen how we can use addition (+) in Python to concatenate strings
  • In math, adding something many times is the same as multiplying

\[5+5+5+5+5+5 = 6 \times 5\]

  • The same logic holds true for Python strings!
    • You multiply by an integer: the number of times you want the concatenation repeated

      print("Betelgeuse, " * 3)
    • That this works is just a cute instance of shared logic for this use case, and it does not extend further. You cannot multiply two strings together: Python will not understand what you are trying to do
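A short sketch of what repetition does and does not allow. The strings here are arbitrary examples.

```python
laugh = "ha" * 3
flipped = 3 * "ha"   # the integer can go on either side
nothing = "ha" * 0   # repeating zero times gives the empty string

print(laugh)    # hahaha
print(flipped)  # hahaha
print(nothing == "")  # True

# Multiplying two strings together is an error
try:
    "ha" * "ho"
except TypeError as err:
    print("Not allowed:", err)
```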

Character Picking

  • A string is an ordered sequence of characters
    • Character positions in the string are identified by an index, which starts at 0

  • You can select individual characters from the string using the syntax

    string[k]

    where string is the variable assigned to the desired string and k is the index integer of the character you want

    >>> print("spaghetti sauce"[5])
    e

Back it Up

  • Sometimes it is more useful to count from the end of the string, not the beginning
  • Python gives you a convenient way to do this, using negative indexes


  • A common use case is to grab the last character of the string, using

    s[-1]

    which is shorthand for

    s[len(s)-1]
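A brief sketch of negative indexing on an example string, showing that counting from the end lines up with the usual indexes:

```python
s = "spaghetti sauce"

last = s[-1]         # same character as s[len(s) - 1]
second_last = s[-2]  # same character as s[len(s) - 2]
first = s[-len(s)]   # counting all the way back reaches index 0

print(last, second_last, first)  # e c s
print(s[-1] == s[len(s) - 1])    # True
```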

Slicing

  • Often, you may want more than a single character

  • Python allows you to specify a starting and an ending index through an operation known as slicing

  • The syntax looks like:

    string[start : limit]

    where start is the first index to be included and everything up to but not including the limit is included

  • start and limit are actually optional (but the : is not)

    • If start is omitted, the slice will begin at the start of the string
    • If limit is omitted, the slice will proceed to the end of the string

and Dicing

  • Can add a third component to the slice syntax, called a stride

    string[start : limit : stride]
  • Specifies how large the steps are between each included index

  • Can also make the stride negative to proceed backwards through a string

    >>> s = "spaghetti sauce"
    >>> s[4:8]
    'hett'
    >>> s[10:]
    'sauce'
    >>> s[:10:2]
    'sahti'
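A few more slices on the same example string, as a sketch that combines omitted bounds, negative indexes, and a negative stride. The variable names are only for illustration.

```python
s = "spaghetti sauce"

first_word = s[:9]    # start omitted: begin at index 0
last_word = s[-5:]    # limit omitted: run to the end of the string
backwards = s[::-1]   # stride of -1 walks the whole string in reverse

print(first_word)  # spaghetti
print(last_word)   # sauce
print(backwards)   # ecuas ittehgaps
```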

Comparing Strings

  • Python lets you use normal comparison operators to compare strings. For example,

    string1 == string2

    is true if string1 and string2 contain the same characters in the same order

  • Comparisons involving greater than or less than work similarly to alphabetical ordering

    • Start at the beginning and compare a character. If they are the same, then compare the next character, etc
  • All comparisons are done according to the Unicode values of the characters.

    • Called lexicographic ordering
    • "cat" > "CAT" is true, since lowercase letters have larger Unicode values than uppercase letters
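A sketch of how these comparisons play out, including a couple of cases that can be surprising if you expect dictionary order. The example words are arbitrary.

```python
print(ord("c"), ord("C"))  # 99 67

lower_vs_upper = "cat" > "CAT"       # compares "c" (99) with "C" (67)
alphabetical = "apple" < "banana"    # compares "a" with "b"
capital_first = "Zebra" < "apple"    # "Z" is 90, "a" is 97
prefix_smaller = "cat" < "catalog"   # equal so far, and the shorter string runs out first

print(lower_vs_upper, alphabetical, capital_first, prefix_smaller)  # True True True True
```

The third case is the one to watch for: every uppercase letter sorts before every lowercase letter.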

String Looping

  • Because strings are a sequence, they will work directly in a for loop!
  • In general, you have two options of how you want to loop through a string:
    # Option 1: loop by index
    s = "hello"
    for i in range(len(s)):
        print(s[i], 'is letter', i)

    # Option 2: loop by character
    s = "hello"
    for letter in s:
        print(letter)
  • Looping by letter can be very convenient, but you lose positional information
  • You can always select the desired letter if looping by index
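To illustrate the tradeoff, here is a sketch with two hypothetical helper functions: one that only needs the letters themselves, and one that needs to know where a letter sits.

```python
def count_letter(text, target):
    """Count how many times target appears, looping by letter."""
    count = 0
    for letter in text:
        if letter == target:
            count += 1
    return count


def first_index(text, target):
    """Return the index of the first target, looping by index (-1 if absent)."""
    for i in range(len(text)):
        if text[i] == target:
            return i
    return -1


print(count_letter("hello", "l"))  # 2
print(first_index("hello", "l"))   # 2
print(first_index("hello", "z"))   # -1
```

Counting never needs the position, so looping by letter is cleaner there. Finding a position requires the index, so looping over range(len(text)) fits better.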

Can’t change a string’s colors

  • Strings are what we call immutable: they can not be modified in place by clients.

  • You can “look” at different parts of the string, but you can not “change” those parts without making a whole new string

    s = "Cats!"
    s[0] = "R"   # THIS WILL ERROR!!
  • You can of course create a new string object with the desired traits:

    s = "R" + s[1:]
  • This applies to all methods that act on strings as well: they return a new string, they do not modify the original
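A runnable sketch of immutability in action. It assumes the string method upper as one example of a method that returns a new string.

```python
s = "Cats!"

# Trying to change a character in place raises a TypeError
try:
    s[0] = "R"
except TypeError as err:
    print("Error:", err)

# Methods hand back a new string and leave the original alone
shout = s.upper()
print(shout)  # CATS!
print(s)      # Cats!

# To "change" s, build a new string and reassign the variable
s = "R" + s[1:]
print(s)      # Rats!
```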
