V CS 353: Architecture and Compilers—Final Exam Review 2014
V About the exam
* the final exam will be held on Monday, December 8, 2014, from 8-11 am
* resources: lecture notes, class hand-outs, previous exam review, lab assignments, on-line materials (thanks, Wikipedia!)
* exam will be three hours, closed book/notes and cumulative (but stressing second half of course)
* typical format: some T/F, some short answer, a few longer "conceptual" questions
V Topics from the first half of the course (architecture); see previous review sheet!
* Numeral systems: binary, hexadecimal, etc.
* Data Representation
* Boolean Logic
* Gates and Circuits
* Computer Organization
* Instruction Set Architecture and Assembly Programming
V Overview of language translation
* translation takes place in stages, with distinct data structures associated with each stage (see chart from lecture)
* several appropriate theories associated with the various stages can help manage complexity and suggest the "right" way to do things (but we concentrate on practical issues and a small, simple language)
* separation into phases makes the process much easier to understand, but they can be combined in practice (making only one or two passes over the code, for example)
* the meaning of a program can either be implemented dynamically, as we process the code (interpretation or evaluation) or statically, by way of translation to another form (compilation)
V Regular languages and tokenization
* lexical analysis or scanning: in this phase, a character string is broken into "chunks" called tokens
* each token represents a separate, atomic component of syntax or semantics (the syntactic ones, such as parentheses and other delimiters, will ultimately be removed during parsing)
* tokens matter during parsing mainly for their classification (e.g., literal or variable), but during code generation also for their content (e.g., 147 or x)
* some tokens, such as variables or literals, might be entered into a symbol table during this phase of processing or the next
* theoretically, we define a class of regular languages, capturing sequential repetition, as a means for describing token structure
* regular languages can be defined in terms of finite automata (better for machine implementation) or in terms of regular expressions (better for human reading and writing)
* scanner generators essentially translate regular expressions into an automaton (but with "hooks" to insert code to track symbols, etc.)
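The scanning ideas above can be sketched with Python's `re` module, which essentially plays the scanner-generator role: a list of token classes written as regular expressions is compiled into one matcher. The token names and patterns here are illustrative, not the course's actual token set.

```python
import re

# Illustrative token classes for a small expression language
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Break a character string into (classification, content) tokens."""
    tokens = []
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup
        if kind != "SKIP":            # whitespace is discarded, not tokenized
            tokens.append((kind, match.group()))
    return tokens

print(tokenize("x = 147 + y"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '147'), ('OP', '+'), ('IDENT', 'y')]
```

Note that each token carries both a classification (used during parsing) and its content (used later, e.g., for symbol-table entry or code generation).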
V Context-free grammars and parsing techniques
* parsing: this phase involves the recognition of hierarchical phrase structure in the language (phrases and sub-phrases, e.g., statements, expressions, etc.)
* we describe the hierarchical structure of possible forms using a context-free grammar, which uses variables (or non-terminals) to express structure and terminals (actual pieces of language text) for final content
* each context-free grammar describes a language, or set of strings, based on possible expansions starting from its start symbol and ending in a string of terminal symbols
* in typical usage, we work with two equivalent grammars, in the sense that they correspond to the same set of strings, but one of which is more abstract

V parser generator programs take a grammar (usually modified) as input and provide a parsing program as output
* pros: easier than a hand-written parser, more likely to correspond exactly to the grammar
* cons: high learning curve, grammar often has to be "massaged" to fit the technique used (reducing confidence in correctness)
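For contrast with generated parsers, the hand-written alternative can be sketched as a recursive-descent parser, with one function per non-terminal. The grammar below (a standard expression grammar with `expr`, `term`, and `factor` levels encoding precedence) and the tuple-based tree representation are illustrative assumptions; tokens are taken to be pre-scanned strings.

```python
# Recursive-descent parser for the grammar:
#   expr   -> term (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'
# Each function returns (tree, next-index).

def parse_expr(tokens, i=0):
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op = tokens[i]
        right, i = parse_term(tokens, i + 1)
        node = (op, node, right)        # loop makes '+'/'-' left-associative
    return node, i

def parse_term(tokens, i):
    node, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in ("*", "/"):
        op = tokens[i]
        right, i = parse_factor(tokens, i + 1)
        node = (op, node, right)
    return node, i

def parse_factor(tokens, i):
    if tokens[i] == "(":
        node, i = parse_expr(tokens, i + 1)
        return node, i + 1              # skip the closing ')'
    return int(tokens[i]), i + 1        # a number literal

ast, _ = parse_expr(["2", "+", "3", "*", "4"])
print(ast)    # ('+', 2, ('*', 3, 4))
```

Notice how the grammar's layering makes `*` bind tighter than `+` without any explicit precedence table: `term` is parsed completely before `expr` sees the next `+` or `-`.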
V Term representation and interpretation (evaluation)
* terms (syntactic phrases) are naturally represented as trees with each node identifying a specific form of expression, and its children representing its immediate sub-phrases
* in OO languages, it is natural to use a class/sub-class hierarchy to help organize phrases by types
* interpretation proceeds as a pass over the tree, in some order determined by the language semantics, with appropriate actions being taken during the traversal (i.e., dynamically)
* be careful to distinguish the syntactic issues of precedence and associativity from the semantic issue of order of evaluation
* a parse tree represents a derivation from the grammar exactly: it usually contains a lot of redundant information reflecting the grammar's structure
* an abstract syntax tree represents only the phrases and sub-phrase structure of interest, without the "residue" of grammar artifacts
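A minimal sketch of the OO approach above: an abstract `Expr` class with one subclass per form of expression, and interpretation as an ordered traversal via an `eval` method. Class names (`Num`, `Add`, `Mul`) are invented for illustration.

```python
# AST node hierarchy: one class per form of expression, children
# representing immediate sub-phrases.

class Expr:
    def eval(self):
        raise NotImplementedError

class Num(Expr):
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        # evaluating left before right is a semantic choice, separate
        # from the parser's precedence/associativity rules
        return self.left.eval() + self.right.eval()

class Mul(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() * self.right.eval()

# AST for 2 + 3 * 4 -- precedence was already resolved during parsing,
# so the tree itself carries no parentheses or grammar "residue"
tree = Add(Num(2), Mul(Num(3), Num(4)))
print(tree.eval())   # 14
```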
V Abstract machines and intermediate representations
* in order to make code generation easier, we often choose an intermediate representation corresponding to some abstract machine which has features set in between our language and the actual target machine
* typical abstract machine features include stacks for evaluating expressions or method calls, environments for variable look-up, or three-address code for representing individual arithmetic and logic operations
* an abstract machine can either be implemented with an interpreter itself (as in Java's JVM) or can be used as the basis for further processing
* abstract machines and intermediate representations allow for a more flexible back-end to the compiler, since it is easier to re-target their implementation for different actual machines
* the main distinguishing feature of an abstract machine used for these purposes is that its code is likely to have a more linear (non-hierarchical) form
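The points above can be made concrete with a toy stack-based abstract machine. The instruction set (`PUSH`, `LOAD`, `ADD`, `MUL`) is invented for illustration; note the linear, non-hierarchical form of the code and the environment used for variable look-up.

```python
# Interpreter for a tiny stack machine: code is a flat list of
# instructions, not a tree.

def run(code, env):
    stack = []
    for op, *args in code:
        if op == "PUSH":                   # push a literal value
            stack.append(args[0])
        elif op == "LOAD":                 # variable look-up via environment
            stack.append(env[args[0]])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# linear code for x + 3 * 4
code = [("LOAD", "x"), ("PUSH", 3), ("PUSH", 4), ("MUL",), ("ADD",)]
print(run(code, {"x": 2}))   # 14
```

Implementing this interpreter is the "implemented with an interpreter" route (as in the JVM); alternatively the same linear code could be translated further toward a real target machine.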
V Code generation and optimization
* code generation follows the same plan as interpretation (an ordered tree traversal), but now generating pieces of code rather than actually performing semantic actions dynamically
V when is it compilation (versus interpretation)?
* tree-like phrase structures are replaced by linear object code (with jumps)
* names are replaced by addresses
* typical problems involve keeping track of run-time resources (e.g., registers and RAM locations), mapping names to their numeric equivalents and determining proper sequencing of events
* we usually depend on support from a run-time system for such services as dynamic memory allocation, communication with I/O devices, etc.
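To illustrate the first point of this section: code generation is the same postorder traversal as interpretation, but it emits instructions instead of computing values. The AST representation (nested tuples) and target instruction set (a stack machine with `PUSH`, `LOAD`, `ADD`, `MUL`) are illustrative assumptions; names are left symbolic here, with the name-to-address mapping deferred to a later stage or the run-time system.

```python
# Compile an expression tree to linear stack-machine code.
# Nodes are (op, left, right) tuples; leaves are ints or variable names.

def compile_expr(node, code=None):
    if code is None:
        code = []
    if isinstance(node, int):              # numeric literal
        code.append(("PUSH", node))
    elif isinstance(node, str):            # variable name, resolved later
        code.append(("LOAD", node))
    else:
        op, left, right = node
        compile_expr(left, code)           # sequence the sub-phrases first,
        compile_expr(right, code)          # then the operation itself
        code.append({"+": ("ADD",), "*": ("MUL",)}[op])
    return code

print(compile_expr(("+", "x", ("*", 3, 4))))
# [('LOAD', 'x'), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]
```

Compare this with the tree-walking interpreter: the traversal order that determined when a value was *computed* now determines where its instruction is *sequenced* in the linear output.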