CS 451/LLC Lab 1: Simple Semantics via Folds

Since we haven’t seen much in lecture yet, this lab will be a simple set of exercises both to get you back in the habit of Haskell and also to get you to start thinking about language and meaning, syntax and semantics. Having seen some list and tree folds in your Functional Programming class, you should be able to write a few short programs using folds to complete the lab … but remember that short does not necessarily mean quick in Haskell! In other words, you should expect to puzzle a bit over these until that “Aha!” moment when you see a nice, concise solution. Along the way, of course, you should try things out in Hugs/ghci, play around, and especially pay attention to the types (they will become more important in this class).

Types for Terms, Expressions, and Operators

The “language” that we will be concerned with here will be just the language of arithmetic, with just numeric constants and three operations: addition, multiplication, and exponentiation. But, since we will want to generalize this soon, to allow other types of constants and other operators, we will first define a general data type for terms (“tree-like pieces of language”) and then instantiate this general definition to more specific types. In particular, we will call these more specific numeric terms (numeric) expressions.

data Term c o = Con c | App o (Term c o) (Term c o)

type Expr = Term Integer Op

data Op = Add | Mul | Exp  deriving (Eq, Ord)

(Note: you should copy this and the other definitions into your program file for the lab.)

The type Op here is just a little enumerated type of plain, atomic “symbols”. That is to say, it is an algebraic data type with several constructors, but each of which is essentially a constant—they take no arguments, but are “complete” as-is. Enumerated types are like a set of constants or, in more linguistic terms, like an alphabet of symbols. In Haskell they are a special, simple case of the more general algebraic data types. But they are also available in many other languages, including Java and C, albeit in slightly different forms. They are usually provided as a type-safe alternative to the kinds of little “tag” values that programmers sometimes use when they need to indicate or store one of a small number of alternatives (e.g., a status number or an error code). The use of raw integers for tags like this is dangerous: if there are other such tag-sets or other integers around, these tags or integers could get confused by a simple mistake. In a type-safe language, a compiler can check to see that the tags are used appropriately and not confused with each other or other integers.

(Note here that we instantiate the Term data type to Expressions using Haskell Integers. We should probably use Strings instead, but the syntax and semantics of simple numbers are not so far apart … besides, we will get more honest about it for one of the exercises below 😀 .)

You should think of these Terms, Expressions, and Operators as an internal form of abstract syntax, as opposed to the external form of concrete syntax, which would normally be represented by a String, or the forms of meanings, which in this case would likely be just Integers or Ints. (Technically, we should distinguish the abstract mathematical concepts of strings, trees, symbols, and natural numbers from their Haskell-specific implementations, but it is OK to leave these lines a little blurry for now.)

Folds for Terms, Expressions, and Operators

Now that we have some algebraic datatypes to represent or numeric expressions and symbols, we can start to think about their meanings, or rather about their semantics, i.e. the way they get their meanings.

The style of semantics we will implement in this lab is called denotational or mathematical semantics. The idea is that the meanings are given as some set of (mathematical) values, in this case the natural numbers (whole numbers, but with no negatives, just 0 and up). A semantics in this style will then supply a natural number meaning for each possible internal form by way of a meaning function. In Haskell, therefore, an Integer or Int will be provided, as a function result, for each Expression argument. Traditionally, denotational semantics are defined in a way that is called compositional: the meaning of the whole is defined in terms of the meanings of its parts. In this context, with the recursive-but-ultimately-symbolic tree forms we call Terms, it means that we should define the meaning function by cases and recursion over the Term type … i.e., using the “template style” from the Functional Programming course.

But, as we learned there, template-style functions are most concisely expressed as folds. And in fact, one of our themes will be that

folds over algebraic types correspond to (denotational) semantics,

at least with regard to Haskell implementations of simple algebraic languages.

What types do we need folds for, and what do those folds look like?

Recall from Functional Programming that a fold is a parameterized, usually recursive function, defined over some (usually) recursive data type. It gets one parameter for each constructor of the data type, and it “replaces” the constructor in the result by the corresponding parameter. In the usual case, the constructor has arguments, and so the parameter to replace it should be a function, applied to the constructor’s arguments … unless one of those arguments represents a recursive sub-structure, i.e., an occurrence in the data type’s definition of the same type being defined. In this case, the fold function we are defining should be called recursively in just the same way—we end up converting the whole structure to some specified result type, corresponding here to the semantic domain, by combining recursive sub-results using the supplied parameter functions.

And, by the way, if a constructor has no structurally-recursive arguments, then the parameter function is just applied to the constructor’s arguments, transforming them into a value of the result type. Finally, if it has no arguments at all, then the corresponding value is not a function, but just some fixed value of the result type, used to replace the fixed value (or constant) of the structural type which is that particular constructor.

(For a somewhat more graphical perspective, you might want to review the recipe for folds from the Functional Programming course.)

So, given all that, what should the folds for our types look like? For the Terms themselves, we get the following definition:

fold f g (Con c)     = f c
fold f g (App o l r) = g o (rec l) (rec r)
                       where rec = fold f g

(Can you guess the type of this fold function, based on its definition? Try to do so, but if you can’t, remember that Haskell (well, Hugs or ghci) can do it for you: just type “:t fold” once the function is defined!)

For example, here are two little functions (perhaps familiar from Functional Programming) which compute the size and height of a Term, thought of here more overtly as a tree (note the use of the const function and lambda notation):

size = fold (const 1) (\_ n m -> 1 + (n + m))
hite = fold (const 1) (\_ n m -> 1 + (n `max` m))

But we can also define a fold for our “alphabetic” enumerated type of arithmetic operators: since they take no arguments, these constructors will be “eliminated” (deconstructed?) by simply replacing each with the appropriate parameter value:

foldOp a m e Add = a
foldOp a m e Mul = m
foldOp a m e Exp = e

For example, a function to replace each of the abstract operators by their usual ASCII string equivalents might look like this:

showOp :: Op -> String

showOp = foldOp "+" "*" "^"

Sample data and lab exercises

Speaking of examples, here is a quick definition of a list of some sample values you can use during lab to test your definitions and check your results:

samples = [  Con 17,                                        --  17
             App Add (Con 2) (App Mul (Con 3) (Con 5)),     --  2 + 3 * 5
             App Add (square (Con 7)) (square (Con 9)),     --  7^2 + 9^2
             App Mul (samples!!1) (samples!!2)              --  (2+3*5) * (7^2 + 9^2)
           ]
            
          where square x = App Exp x (Con 2)

Note how we can use “helper functions” like square to make the definitions a little easier to write. And note that we can define larger samples in terms of smaller ones, even using a bit of recursion, with the help of Haskell’s indexing function (!!).

OK, now it’s time for some actual lab exercises!

Add 3 sample Expressions to the list of samples above. Write them out first on paper or in a text editor the way you would write them as math formulas, or Haskell expressions. Then try to translate them into the internal form we use here, as values of the Haskell type Expr. And feel free to use helper functions or other tricks to make it easier.
Write a function eval :: Expr -> Integer that will evaluate a numeric expression and return its integer value (or use Haskell’s Int type instead). You should do this using the fold function for Terms … as well as the foldOp function for Operators. You will need to supply two function arguments to fold, one each to “eliminate” the Con and App constructors. In the context of Expressions, the first function will act on the integer arguments of Con and the scond will act on the Op and sub-term arguments of App. (Hint: in other words, you will need the meanings of Operators for that second part.)

For example, we might have:
> map eval samples [17,17,130,2210] -- of course, you will have 3 more samples!
Write a function pprt :: Expr -> String (for “pretty-print”) which will show what an Expression would normally look like printed out in infix format, with the operator between its two arguments. You will want to use the Haskell show function to convert Integers into Strings. And to make things easier, just put in all the parentheses —don’t worry about deleting them based on the order of operations (PEMDAS).

For example, we might have:
> pprt (samples !! 3) "((2 + (3 * 5)) * ((7 ^ 2) + (9 ^ 2)))"
Write another pretty-printing function, but this time format the operators and their arguments in prefix format; in other words, the operators come before their arguments, not between them. (In this case, parentheses are not even necessary!) Ideally, this variation on pprt should be a simple modification of the original.

For example, we would have:
> pprt' (samples !! 3) "* + 2 * 3 5 + ^ 7 2 ^ 9 2"
Make a type definition similar to that for Expr, but this time use Haskell Strings in place of Integers. (You can call the new type Expr', read “exper prime”.) Now write a revised evaluation function eval' :: Expr' -> Integer: it should work for these new string-based expressions. You will want to use the read function to convert strings of digits to numbers. If you have trouble with it, try declaring and using this instead:
readInt = read :: String -> Integer
For example, you should get:
> eval' (App Mul (Con "23") (Con "2")) 46
Finally, write a function count :: Expr -> (Integer,Integer) which separately counts the number of constants and operators in a general Term; the first number is the constant count, the second the operator count. You should do this using the fold function—the solution should be similar to size and hite above, but of course the result type is now a pair of numbers. (Hint: and therefore two of the arguments to the App-eliminating argument to the fold will also be pairs!)

For example, you should have:
> count $ samples !! 3 (7,6)
(For class discussion: can you see a relationship that always holds between the two count results? Why does it hold? could you prove it?)