V
LLC 2019 Midterm Review Notes
V
Using these notes
*
click on the triangles to the left to expand sections
*
use the text size feature in your browser to make this text larger and more readable
*
these notes will be updated throughout the review sessions
V
About the exam
*
The exam will be held on Wednesday 3 April at the usual lecture time (1:50-2:50) in the usual lecture room (Ford 204).
V
The exam will consist of several kinds of questions; typically:
*
10-15 True-or-False questions at 1-2 points each
*
8-10 short answer questions at 4-6 points each
*
2-4 longer answer questions at 8-15 points each
*
remember to review your lecture notes, materials on the course home page, homeworks, and the labs!
*
you won’t have to write longer programs, but 2-3 lines of (easier) Haskell is possible
V
NEW! Links to labs & homeworks
V
Course introduction
*
this is a non-traditional CS theory course, of my own design, evolving
*
traditional courses “climb a ladder” of formal languages & machines
*
this course will cover some traditional topics (but less thoroughly)
*
emphasis here is on interpreted formal languages, programming language theory
V
influences from functional programming, type theory, denotational semantics, category theory
*
(also structuralism, intuitionism, homotopy type theory, linear logic & ludics, Montague semantics, categorial grammar,…)
*
this semester deploying major new intro section on FAST type system
*
early emphasis on “soft” topics (history, philosophy, cultural aspects, etc.), later more technical
V
Language and formalism
*
history of language
V
purpose(s) of language
*
language presumably developed primarily for interpersonal communication, but it also plays a role in consciousness (internal language), memory (e.g., externalizing memory), and conceptual clarification
*
science, language, law, games, conventions, customs, protocols (patterns? rules?)
*
formal language: math, logic, CS, linguistics
V
nature / “stuff” of language (syntax)
*
the form or structure of language has dual nature: both a linear surface structure (strings), but also a recursive structure built from hierarchical phrases (terms or trees)
V
kinds of meanings (semantics)
*
meanings can be abstract (based on behavior, like interfaces), denotational (in terms of pre-understood “objects”), or operational (roughly, an “implementation” in terms of data structures)
V
language architecture
*
concrete syntax, abstract syntax, semantics (of various kinds), types & validity
*
object and meta languages
V
semiotics: the study of signs, with 3 kinds (or aspects):
*
icon: the sign resembles the meaning (in shape, form, etc.)
*
index: there is a causal connection between sign and meaning (smoke/fire, thermometer, …)
*
symbol: the relation between sign and meaning is arbitrary but conventional (stop signs, ‘:’ for cons)
V
The FAST type system
*
a simple, familiar system to introduce formal languages & types
*
builds on basic arithmetic, algebra, and some Haskell experience
V
the basic language of types is arithmetic: constants (n), plus (+), times (×), powers (↑ or →)
*
but we say: finite “sets” / enumerated types, sums, products, exponentials
*
there are (usually) many values associated with a given type
V
different interpretations: numbers, but also types, grids, logic, …
*
different variations (for n, +, ×): boolean/2, numeric/n, symbolic
*
when viewed as abstract types, values are things like tagged choices (constructed), pairs, functions
*
when viewed as grids, the values are merely locations within the (shaped) grid of the type
*
when viewed as numbers, or rather numeric expressions, types equivalent to n have all values k<n
*
in all cases / interpretations, the number of values of a (numeric expression) type is just the value of the expression
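*
To make the counting concrete, a small Haskell sketch (not from the notes; it uses Either and pairs as sum and product): the type Either Bool (Bool, Bool) plays the role of the expression 2 + 2×2, and it has exactly 6 values.

```haskell
-- all values of Either Bool (Bool, Bool), i.e., the type 2 + 2*2
values :: [Either Bool (Bool, Bool)]
values = [Left b | b <- [False, True]]
      ++ [Right (x, y) | x <- [False, True], y <- [False, True]]

count :: Int
count = length values  -- 6, the value of the expression 2 + 2*2
```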
V
types can be “flattened” down to equivalent (but not equal!) types with the same number of values
*
this is like evaluating the numeric expressions
V
equivalence of types is captured by
V
Sums (+), symbols and coding
*
sums represent the concept of choice
V
constants might start out binary (void type 0 and unit type 1), but we can “upgrade” to numeric or symbolic later
*
the void type 0 has no values; the unit type 1 has only one, say “•”
*
as numbers (i.e., flattened by evaluation), sums are just addition
*
as grids, we just have a two-element grid (or n-element, or symbol-tagged) with nested types = nested grids
*
to write (give, construct, express) a choice, we use a tag (followed by a value of the chosen type)
*
we can upgrade from binary (A+B), to numeric (A+B+…+X+Y), or symbolic (a:A+b:B+ …) “tags”
*
simple sums over the (void and) unit type 1 reduce to coded finite choices—we can add numeric “constant” types n, with values being just k < n (i.e., {0, …, n-1})
*
we can also use symbolic alphabets as types and values: types are just strings, values just “characters”
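*
A hedged Haskell sketch of symbolic tags (the type and names are invented for illustration): a sum with three named constructors, one of which carries a Bool, corresponds to the expression 1 + 1 + 2.

```haskell
-- a symbolic sum: tags are constructor names; as a number it is 1 + 1 + 2 = 4
data Shade = Black | White | Gray Bool
  deriving (Show, Eq)

allShades :: [Shade]
allShades = [Black, White, Gray False, Gray True]  -- the 4 values
```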
V
visually, we can view successive binary choices as a binary tree with values at the leaves
*
the code is just the path to the value
*
if the tree is complete, there are 2^h values, where h is the height of the tree = length of the path to the leaves
*
if the tree is not complete, some binary codes of length h will be invalid as codes
V
see these diagrams:
V
Products (×), tuples and records
*
products represent the concept of (independent) combination
*
as numbers (i.e., flattened by evaluation), products are just multiplication
*
to write (give, construct, express) a combination, we write both parts (possibly with punctuation)
V
as grids, we have (e.g.) a 2-dimensional grid as the product of two 1-dimensional grids
*
similar but more complex for higher dimensions—see homework examples
*
products of sums are very useful—example: cards as products of ranks and suits
V
products can be ordered lexicographically: we order the “slots” in the tuple, then order the values first by the major slot, then by the minor slot
*
when generalized to strings, this yields alphabetical order
*
the derived Haskell Ord instance for tuples does this, taking the leftmost slot as major, then the next slot to the right, and so on
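*
A sketch of the major/minor comparison written out explicitly (Haskell's derived Ord for pairs behaves the same way; `<>` on Ordering keeps the first non-EQ result):

```haskell
-- lexicographic comparison on pairs: left (major) slot first, then right (minor)
lexPair :: (Ord a, Ord b) => (a, b) -> (a, b) -> Ordering
lexPair (x1, y1) (x2, y2) = compare x1 x2 <> compare y1 y2
```

For example, lexPair (1, 9) (2, 0) is LT, agreeing with the built-in compare (1, 9) (2, 0).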
V
Exponentials (finite functions: ↑ or →)
*
exponentials represent the concept of connection
V
as numbers (i.e., flattened by evaluation), exponentials are just powers
*
but note that A→B becomes B^A (base and exponent swap order)
*
we can also usefully think of function values as grids: f: A→B is an A-shaped grid of B-values
*
as a grid … well, it is hard to visualize (esp. at higher-orders), but it is something like a multi-dimensional product (because exponential = iterated product)
*
even though they are hard to visualize in higher dimensions, we can see various flattenings of function-space grids via function tables (like truth tables) or listings (see immediately below)
*
to write (give, construct, express) a function f: A→B, we can write out all the result values (of type B) in the proper order of argument values (of type A)
V
a function f: A→Bool (i.e., the type 2, roughly) is like a subset of the set of values of type A
*
the power-set constructor ℘(S) is often written as 2^S
*
think: pizza topping specification = list of booleans in topping-order
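*
The pizza idea as a hedged Haskell sketch (the Topping type is invented for illustration): a subset of toppings is a characteristic function Topping -> Bool, and “flattening” it lists the Bools in topping order.

```haskell
-- an enumerated type of toppings (illustrative)
data Topping = Cheese | Pepperoni | Mushroom
  deriving (Show, Eq, Enum, Bounded)

-- flatten f : Topping -> Bool to its listing, in topping order
listing :: (Topping -> Bool) -> [Bool]
listing f = map f [minBound .. maxBound]

myPizza :: Topping -> Bool
myPizza Cheese    = True
myPizza Pepperoni = False
myPizza Mushroom  = True
```

Here listing myPizza is [True, False, True]: one Bool per topping, one point in a 2^3 = 8-element function space.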
V
Numbers and numerals
*
numbers are the abstract meanings (semantics); numerals are the names or symbols (syntax)
*
natural numbers are whole numbers, starting from 0 (usually) and going up by (+1), as high as you like (i.e., without limit)
*
according to Peano, natural numbers are either zero, or the successor of some natural number
V
we can represent these as simple terms (or constructed values) built from Zero or Succ (say in Haskell)
*
data Nat = Zero | Succ Nat
V
the fold function for the Nat type replaces Succ and Zero with a function call and a value
*
foldn s z Zero = z
foldn s z (Succ n) = s (foldn s z n)
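*
To see the fold in action, a self-contained sketch (repeating the definitions, then choosing s and z to convert or to add):

```haskell
data Nat = Zero | Succ Nat

foldn :: (b -> b) -> b -> Nat -> b
foldn s z Zero     = z
foldn s z (Succ n) = s (foldn s z n)

-- replace Succ with (+1) and Zero with 0: Nat to Integer
toInt :: Nat -> Integer
toInt = foldn (+ 1) 0

-- replace n's Zero with m (and keep Succ): addition
add :: Nat -> Nat -> Nat
add m = foldn Succ m
```

So toInt (add (Succ Zero) (Succ (Succ Zero))) is 3.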
V
we can write out numerals (symbolic codes for numbers) as tallies (using “base 1”)
*
tallies almost physically mimic the “piles of stones” used for early counting
*
we can also write out numerals for “structured numbers” using a mixed radix form, based on products-of-sums (see time format example) and lexicographic order
V
finally, we can write unstructured numbers as numerals using a fixed base (e.g., 2, 10, or 16)
*
but we normally put the most-significant (= major) digits on the left, for cultural reasons
V
we can use Horner’s technique (written as a fold) or its reverse (written as an unfold) to convert numerals to numbers (and vice versa)
*
we have to use left folds and unfolds (foldl and unfoldl) due to the cultural ordering of digits
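*
A sketch in Haskell (standard Haskell provides unfoldr rather than an “unfoldl”, so the unfold direction here peels off least-significant digits and then reverses):

```haskell
import Data.List (unfoldr)

-- Horner's technique as a left fold: numeral (most-significant digit first) to number
fromDigits :: Integer -> [Integer] -> Integer
fromDigits base = foldl (\acc d -> acc * base + d) 0

-- the reverse direction: number to numeral
toDigits :: Integer -> Integer -> [Integer]
toDigits base = reverse . unfoldr step
  where step 0 = Nothing
        step n = Just (n `mod` base, n `div` base)
```

For example, fromDigits 10 [1, 9, 8, 4] is 1984, and toDigits 2 11 is [1, 0, 1, 1]. (toDigits gives [] for 0, a common edge case.)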
V
non-standard numeral systems are also possible
*
… or Fritz’s recursive prime-decomposition form (code and examples)
V
Algebraic terms
*
once we have names for specific values (constants or literals) and functions (unary or binary, etc.) on a domain, we can make terms, tree-like structures that express patterns of application
*
we can represent the tree-like structures directly with Haskell data types
*
there are natural ways to write out or “pretty-print” these trees as strings
V
… but perhaps with different orders (prefix, infix, postfix) and punctuation (parens needed for infix)
*
“PEMDAS” allows us to eliminate some parentheses in favor of “order of operations” conventions
*
terms are most easily “evaluated” using folds—this tends to focus us on the changes between applications, due to re-parameterization of the fold for different purposes
*
we think of terms abstractly/informally as somewhere in-between the tree-like structures and the strings
V
Polynomial functions
*
if we allow just sum and product (possibly also their opposites), but add a single distinguished variable (x), terms have meanings as polynomial functions
*
we can add and multiply polynomials using generalizations of the grade-school algorithms for numerals
*
coefficients of a polynomial are like the digits of a numeral (in this case, the variable x represents the base)
*
we can use Horner’s technique to evaluate polynomials
*
we can also perform (e.g.) differentiation (derivatives) directly on polynomials, rather than on terms-with-variables
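*
A hedged Haskell sketch of these ideas (coefficients stored least-significant first, like digits read right-to-left; the function names are illustrative):

```haskell
-- evaluate by Horner's technique: c0 + x*(c1 + x*(c2 + ...))
evalPoly :: Num a => a -> [a] -> a
evalPoly x = foldr (\c acc -> c + x * acc) 0

-- add polynomials coefficient-wise, like column addition of numerals
addPoly :: Num a => [a] -> [a] -> [a]
addPoly as     []     = as
addPoly []     bs     = bs
addPoly (a:as) (b:bs) = (a + b) : addPoly as bs

-- multiply: scale by each coefficient and shift, like grade-school multiplication
mulPoly :: Num a => [a] -> [a] -> [a]
mulPoly []     _  = []
mulPoly (a:as) bs = addPoly (map (a *) bs) (0 : mulPoly as bs)
```

For example, mulPoly [1, 1] [1, 1] is [1, 2, 1] (that is, (x+1)^2 = 1 + 2x + x^2), and evalPoly 3 [-2, 0, 1] is 7 (x^2 - 2 at x = 3).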
V
Regular languages
*
a regular language is a set of strings with a relatively simple “global” structure
V
regular languages are an abstract notion (like an interface) that can be “implemented” in several ways
*
we will see them here via regular expressions and (deterministic) finite automata
*
later on, we will see a hierarchy where regular languages are the lowest = simplest kind
V
think of the definition mechanisms below as defining a “cut” between strings in and out of the language
*
how complex is the cut? here, for regular languages, it is quite simple (linear repetitions)
V
Regular expressions
V
regular expressions (REs) are important in both theory and practice
*
(we study mostly the theory here, but you should learn the practice, too!)
V
REs have six forms: null (Ø), epsilon (ε), symbol (a), sum (|), product (·), Kleene star (*)
V
denotational meanings of REs are sets of strings, also called languages
*
null means empty set
*
epsilon means singleton set of empty string
*
a character literal means the singleton set with the one-element string (of that character)
*
sum means union (of sets of strings)
*
product means every combination (concatenation) of strings, one from each set
*
star means indefinite (concatenated) repetition … not of one specific string, but of any strings (plural) from the set
*
essentially, regular expressions are patterns, which individual strings either match or don’t
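*
One way to make the pattern view executable is Brzozowski's derivative method (not covered in these notes; a hedged sketch): nullable asks whether an RE matches the empty string, and deriv c r is the RE matching whatever remains after consuming the symbol c.

```haskell
-- the six defining RE forms
data RE = Null | Eps | Sym Char | Sum RE RE | Prod RE RE | Star RE

-- does r match the empty string?
nullable :: RE -> Bool
nullable Null       = False
nullable Eps        = True
nullable (Sym _)    = False
nullable (Sum r s)  = nullable r || nullable s
nullable (Prod r s) = nullable r && nullable s
nullable (Star _)   = True

-- the RE matching the rest of a string after consuming c
deriv :: Char -> RE -> RE
deriv _ Null      = Null
deriv _ Eps       = Null
deriv c (Sym a)   = if a == c then Eps else Null
deriv c (Sum r s) = Sum (deriv c r) (deriv c s)
deriv c (Prod r s)
  | nullable r    = Sum (Prod (deriv c r) s) (deriv c s)
  | otherwise     = Prod (deriv c r) s
deriv c (Star r)  = Prod (deriv c r) (Star r)

-- a string matches if, after consuming every symbol, the result is nullable
matches :: RE -> String -> Bool
matches r = nullable . foldl (flip deriv) r
```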
V
in addition to the standard (defining) RE forms, we can define new ones by “abbreviation”
*
plus (R+) means: R · R* (i.e., “one or more”)
*
option (R?) means: R | ε (i.e., “one or none”)
*
many more such definitions make practical usage … well, practical!
V
(Deterministic) finite automata
*
a more operational approach defines simple “machines” (automata) to process strings
*
a machine “runs through” the symbols in a string, transitioning from state to state, and either accepting or rejecting the whole string
V
DFAs have an alphabet, states, a transition function, an initial state and some final states
*
we start at the initial state, move between states based on current state and next symbol, and accept if we end in a final state
*
we can write out DFAs as diagrams (with circles, arrows, etc.) or as extended versions of their transition function tables
*
DFAs recognize exactly the same class of regular languages as REs do
V
in fact, we can construct a DFA for any RE (using “machine-pasting” techniques) or an RE for any DFA (using “state-ripping” techniques)
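*
Running a DFA is just a left fold of the transition function over the string; a hedged Haskell sketch (the record fields and the example machine are invented for illustration):

```haskell
-- a DFA: transition function, start state, accepting states (alphabet implicit)
data DFA q s = DFA
  { delta  :: q -> s -> q
  , start  :: q
  , finals :: [q]
  }

-- fold the transition function over the input; accept if we end in a final state
accepts :: Eq q => DFA q s -> [s] -> Bool
accepts m w = foldl (delta m) (start m) w `elem` finals m

-- example: strings over {'a','b'} with an even number of 'a's
evenAs :: DFA Bool Char
evenAs = DFA step True [True]
  where step q 'a' = not q
        step q _   = q
```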
V
Non-deterministic finite automata (NFAs and GNFAs)
*
deterministic automata always go through a specific, definite sequence of states
*
non-deterministic automata can backtrack, or have “multiple futures”
*
… and they “win” (accept the string) if any accepting run is possible
*
plain NFAs add two features to DFAs: transition to a set of states, and allow “epsilon” transitions (consumes no input)
*
generalized NFAs (or GNFAs) allow transition between states on any string matching a specific regular expression
*
despite the extra features, NFAs are equivalent to DFAs: we can convert back and forth between them
*
we use NFAs and GNFAs to describe ways to convert between REs and DFAs
*
no details of NFAs or GNFAs are on the midterm! (just general points as in this outline)