Getting Literate: Reading and Writing

Jed Rembold

October 18, 2021

Announcements

  • Project 2 due tonight!
  • Graphical Contest deadline is a week from Friday
  • Midterm is on Friday!
    • Study materials and practice tests have been posted
    • Section meetings this week will be ready to talk about specific problems, but you need to have started them to know what problems you have questions about.
    • Test is open-note and printouts of code you’ve written. Just no computers.
    • I’ll include the Python summary pdf with the test.
    • Will be written for just an hour, but I’ll let you use the lab hour afterwards if you need time to finish up.
  • Polling: rembold-class.ddns.net

Review!

What is the second element (index 1) in the below list?

[i * 4 for i in "Oct 18, 2020" if not i.isalpha()]


  1. 16
  2. "8888"
  3. "cccc"
  4. "1111"

Where Strings and Lists Meet

  • There are a handful of methods that provide interactions between strings and lists
Method Description
str.split() Splits the string str into a list of its components using whitespace as a separator
str.split(sep) Splits the string str into a list using the specified separator sep
str.splitlines() Splits the string str into a list of strings at the newline character
str.join(list) Joins the (string) elements of the list into a string, using str as the separator

Reading

  • Programs often need to work with lists that are too large to reasonably exist typed all out in the code
    • Easier to read in the values of a list from some external data file
  • A file is the generic name for any named collection of data maintained on some permanent storage media attached to a computer
  • Files can contain information of many different types and encodings
    • Most common is the text file
    • Contains character data like you’d find in a string

Strings vs Text Files

  • While strings and text files both store characters, there are some important differences:
    • The longevity of the data stored
      • The value of a string variable lasts only as long as the string exists, is not overridden, or is not thrown out when a function completes
      • Information in a text file exists until the file is deleted
    • How data is read in
      • You have access to all the characters in a string variable pretty much immediately
      • Data from text files is generally read in sequentially, starting from the beginning and proceeding until the end of the file is reached

Reading Text Files

  • The general approach for reading a text file is to first open the file and associate that file with a variable, commonly called its file handle
  • We will also use the with keyword to ensure that Python cleans up after itself (closes the file) when we are done with it (Many of us could use a with irl)
with open(filename) as file_handle:
    # Code to read the file using the file_handle
  • Python gives you several different ways to actually read in the data
    • read reads the entire file in as a string
    • readline or readlines reads a single line or lines from the file
    • read alongside splitlines gets you a list of line strings
    • Can use the file handle as an iterator to loop over

Entire file ⟶ String

  • The read method reads the entire file into a string, with includes newline characters to mark the end of lines
  • Simple, but can be cumbersome to work with in some cases, and, for large files, can take a large amount of memory

As an example, the file:

One fish
two fish
red fish
blue fish

would get read as

"One fish\ntwo fish\nred fish\nblue fish"

Line by Line

  • Of the ways to read the file in a string at a time, using the file handler as an iterator and looping is probably best and certainly most flexible
  • Leads to code that looks like:
with open(filename) as f:
    for line in f:
        # Do something with the line
  • Note that most strategies preserve the newline character, which you very likely do not want, so be ready to strip them out before doing more processing

Powers Combined

  • So long as your files are not gigantic, using read and then the splitlines method can be a good option
  • This does remove the newline characters, since it splits the string at them
with open(filename) as f:
    lines = f.read().splitlines()
# Then you can do whatever you want with the list of lines

Example: Name Mangling

  • Let’s look at an example with some more meat to it
  • I have a text file with all your first names. I’d like to:
    • Read in the names
    • Select two at random
    • Combine the first half of one name with the second half of the other
    • Print out both potential hybrid names
  • We’ll practice breaking a problem into steps along the way here

Aren’t you Exceptional

  • When opening a file for reading, it is possible the file does not exist!
    • Python handles this (and many other potential errors that can arise) using a mechanism called exception handling
    • Common in other languages as well
  • An exception is a object that belongs to an overarching hierarchy of exception classes
    • Different classes/types for different purposes
    • File operations, for example, use the exception class IOError
  • If open encounters an error, it reports the error by raising an exception with IOError as its type.
    • Raising an exception generally immediately terminates your program, but sometimes that is undesirable

Ignore Yoda, there is a try

  • Python uses the try statement to indicate an interest in trying to handle a possible exception
  • In simplest form, the syntax is:
try:
    # Code that may cause an exception
except type_of_exception:
    # Code to handle that type of exception
  • type_of_exception here is the class name of the exception being handled
    • IOError for the file reading errors we are discussing
  • Any exceptions arising from within the try block or within functions called within the try block would be “caught” and the lower block of code run instead of terminating the program

Example: Requesting Existing File

  • As an example, the below function will repeatedly ask the user to supply a file name that actually exists.
  • It will not just immediately break should they give it an invalid filename!
def get_existing_file(prompt="Input a filename: "):
    while True:
        filename = input(prompt)
        try:
            with open(filename):
                return filename
        except IOError:
            print("That filename is invalid!")
  • If the open call succeeds, we immediately just return the filename, but if it fails due to a IOError, we display a message and then keep asking
// reveal.js plugins