V CS 353: Architecture and Compilers—Final Exam Review 2014
V About the exam
* the final exam will be held on Monday, December 8, 2014, from 8-11 am
* resources: lecture notes, class hand-outs, previous exam review, lab assignments, on-line materials (thanks, Wikipedia!)
* exam will be three hours, closed book/notes and cumulative (but stressing second half of course)
* typical format: some T/F, some short answer, a few longer "conceptual" questions
V Topics from the first half of the course (architecture); see previous review sheet!
* Numeral systems: binary, hexadecimal, etc.
* Data Representation
* Boolean Logic
* Gates and Circuits
* Computer Organization
* Instruction Set Architecture and Assembly Programming
V Overview of language translation
* translation takes place in stages, with distinct data structures associated with each stage (see chart from lecture)
* several appropriate theories associated with the various stages can help manage complexity and suggest the "right" way to do things (but we concentrate on practical issues and a small, simple language)
* separation into phases makes the process much easier to understand, but they can be combined in practice (making only one or two passes over the code, for example)
* the meaning of a program can either be implemented dynamically, as we process the code (interpretation or evaluation) or statically, by way of translation to another form (compilation)
V Regular languages and tokenization
* lexical analysis or scanning: in this phase, a character string is broken into "chunks" called tokens
* each token represents a separate, atomic component of syntax or semantics (the syntactic ones, such as parentheses and other delimiters, will ultimately be removed during parsing)
* tokens matter during parsing mainly for their classification (e.g., literal or variable), but during code generation also for their content (e.g., 147 or x)
* some tokens, such as variables or literals, might be entered into a symbol table during this phase of processing or the next
* theoretically, we define a class of regular languages, capturing sequential repetition, as a means for describing token structure
* regular languages can be defined in terms of finite automata (better for machine implementation) or in terms of regular expressions (better for human reading and writing)
* scanner generators essentially translate regular expressions into an automaton (but with "hooks" to insert code to track symbols, etc.)
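The scanning ideas above can be sketched with Python's `re` module, which essentially plays the scanner-generator role: a list of token classes written as regular expressions is compiled into one matcher. The token names and patterns here are illustrative, not the course's actual token set.

```python
import re

# Illustrative token classes for a small expression language
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Break a character string into (classification, content) tokens."""
    tokens = []
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":            # whitespace is discarded, not tokenized
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("x = 147 + y"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '147'), ('OP', '+'), ('IDENT', 'y')]
```

Note that each token carries both a classification (used during parsing) and its content (used later, e.g., for symbol-table entry or code generation).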
V Context-free grammars and parsing techniques
* parsing: this phase involves the recognition of hierarchical phrase structure in the language (phrases and sub-phrases, e.g., statements, expressions, etc.)
* we describe the hierarchical structure of possible forms using a context-free grammar, which uses variables (or non-terminals) to express structure and terminals (actual pieces of language text) for final content
* each context-free grammar describes a language, or set of strings, based on possible expansions starting from its start symbol and ending in a string of terminal symbols
* in typical usage, we work with two equivalent grammars, in the sense that they correspond to the same set of strings, but one of which is more abstract

V parser generator programs take a grammar (usually modified) as input and provide a parsing program as output
* pros: easier than a hand-written parser, more likely to correspond exactly to the grammar
* cons: high learning curve, grammar often has to be "massaged" to fit the technique used (reducing confidence in correctness)
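For contrast with generated parsers, the hand-written alternative can be sketched as a recursive-descent parser, with one function per non-terminal. The grammar below (a standard expression grammar with `expr`, `term`, and `factor` levels encoding precedence) and the tuple-based tree representation are illustrative assumptions; tokens are taken to be pre-scanned strings.

```python
# Recursive-descent parser for the grammar:
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
# Each function returns (tree, next-index).

def parse_expr(tokens, i=0):
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        right, i = parse_term(tokens, i + 1)
        node = (op, node, right)        # loop makes '+'/'-' left-associative
    return node, i

def parse_term(tokens, i):
    node, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        op = tokens[i]
        right, i = parse_factor(tokens, i + 1)
        node = (op, node, right)
    return node, i

def parse_factor(tokens, i):
    if tokens[i] == "(":
        node, i = parse_expr(tokens, i + 1)
        return node, i + 1              # skip the closing ')'
    return int(tokens[i]), i + 1        # a number literal

ast, _ = parse_expr(["2", "+", "3", "*", "4"])
print(ast)    # ('+', 2, ('*', 3, 4))
```

Notice how the grammar's layering makes `*` bind tighter than `+` without any explicit precedence table: `term` is parsed completely before `expr` sees the next `+` or `-`.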
V Term representation and interpretation (evaluation)
* terms (syntactic phrases) are naturally represented as trees with each node identifying a specific form of expression, and its children representing its immediate sub-phrases
* in OO languages, it is natural to use a class/sub-class hierarchy to help organize phrases by types
* interpretation proceeds as a pass over the tree, in some order determined by the language semantics, with appropriate actions being taken during the traversal (i.e., dynamically)
* be careful to distinguish the syntactic issues of precedence and associativity from the semantic issue of order of evaluation
* a parse tree represents a derivation from the grammar exactly: it usually contains a lot of redundant information reflecting the grammar's structure
* an abstract syntax tree represents only the phrases and sub-phrase structure of interest, without the "residue" of grammar artifacts
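A minimal sketch of the OO approach above: an abstract `Expr` class with one subclass per form of expression, and interpretation as an ordered traversal via an `eval` method. Class names (`Num`, `Add`, `Mul`) are invented for illustration.

```python
# AST node hierarchy: one class per form of expression, children
# representing immediate sub-phrases.

class Expr:
    def eval(self):
        raise NotImplementedError

class Num(Expr):
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        # evaluating left before right is a semantic choice, separate
        # from the parser's precedence/associativity rules
        return self.left.eval() + self.right.eval()

class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() * self.right.eval()

# AST for 2 + 3 * 4 -- precedence was already resolved during parsing,
# so the tree itself carries no parentheses or grammar "residue"
tree = Add(Num(2), Mul(Num(3), Num(4)))
print(tree.eval())   # 14
```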
V Abstract machines and intermediate representations
* in order to make code generation easier, we often choose an intermediate representation corresponding to some abstract machine which has features set in between our language and the actual target machine
* typical abstract machine features include stacks for evaluating expressions or method calls, environments for variable look-up, or three-address code for representing individual arithmetic and logic operations
* an abstract machine can either be implemented with an interpreter itself (as in Java's JVM) or can be used as the basis for further processing
* abstract machines and intermediate representations allow for a more flexible back-end to the compiler, since it is easier to re-target their implementation for different actual machines
* the main distinguishing feature of an abstract machine used for these purposes is that its code is likely to have a more linear (non-hierarchical) form
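The points above can be made concrete with a toy stack-based abstract machine. The instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`) is invented for illustration; note the linear, non-hierarchical form of the code and the environment used for variable look-up.

```python
# Interpreter for a tiny stack machine: code is a flat list of
# instructions, not a tree.

def run(code, env):
    stack = []
    for op, *args in code:
        if op == "PUSH":                   # push a literal value
            stack.append(args[0])
        elif op == "LOAD":                 # variable look-up via environment
            stack.append(env[args[0]])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# linear code for x + 3 * 4
code = [("LOAD", "x"), ("PUSH", 3), ("PUSH", 4), ("MUL",), ("ADD",)]
print(run(code, {"x": 2}))   # 14
```

Implementing this interpreter is the "implemented with an interpreter" route (as in the JVM); alternatively the same linear code could be translated further toward a real target machine.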
V Code generation and optimization
* code generation follows the same plan as interpretation (an ordered tree traversal), but now generating pieces of code rather than actually performing semantic actions dynamically
V when is it compilation (versus interpretation)?
* tree-like phrase structures are replaced by linear object code (with jumps)
* names are replaced by addresses
* typical problems involve keeping track of run-time resources (e.g., registers and RAM locations), mapping names to their numeric equivalents and determining proper sequencing of events
* we usually depend on support from a run-time system for such services as dynamic memory allocation, communication with I/O devices, etc.
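To illustrate the first point of this section: code generation is the same postorder traversal as interpretation, but it emits instructions instead of computing values. The AST representation (nested tuples) and target instruction set (a stack machine with `PUSH`, `LOAD`, `ADD`, `MUL`) are illustrative assumptions; names are left symbolic here, with the name-to-address mapping deferred to a later stage or the run-time system.

```python
# Compile an expression tree to linear stack-machine code.
# Nodes are (op, left, right) tuples; leaves are ints or variable names.

def compile_expr(node, code=None):
    if code is None:
        code = []
    if isinstance(node, int):              # numeric literal
        code.append(("PUSH", node))
    elif isinstance(node, str):            # variable name, resolved later
        code.append(("LOAD", node))
    else:
        op, left, right = node
        compile_expr(left, code)           # sequence the sub-phrases first,
        compile_expr(right, code)          # then the operation itself
        code.append({"+": ("ADD",), "*": ("MUL",)}[op])
    return code

print(compile_expr(("+", "x", ("*", 3, 4))))
# [('LOAD', 'x'), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]
```

Compare this with the tree-walking interpreter: the traversal order that determined when a value was *computed* now determines where its instruction is *sequenced* in the linear output.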