CS 353: Architecture and Compilers—Final Exam Review 2014

About the exam

the final exam will be held on Monday, December 8, 2014, from 8-11 am

resources: lecture notes, class hand-outs, previous exam review, lab assignments, on-line materials (thanks, Wikipedia!)

exam will be three hours, closed book/notes and cumulative (but stressing second half of course)

typical format: some T/F, some short answer, a few longer "conceptual" questions

Topics from the first half of the course (architecture); see previous review sheet!

Numeral systems: binary, hexadecimal, etc.

Data Representation

Boolean Logic

Gates and Circuits

Computer Organization

Instruction Set Architecture and Assembly Programming

Overview of language translation

translation takes place in stages, with distinct data structures associated with each stage (see chart from lecture)

the theories associated with the various stages can help manage complexity and suggest the "right" way to do things (but we concentrate on practical issues and a small, simple language)

separation into phases makes the process much easier to understand, but they can be combined in practice (making only one or two passes over the code, for example)

the meaning of a program can be implemented either dynamically, as we process the code (interpretation or evaluation), or statically, by way of translation to another form (compilation)

Regular languages and tokenization

lexical analysis or scanning: in this phase, a character string is broken into "chunks" called tokens

each token represents a separate, atomic component of syntax or semantics (the syntactic ones, such as parentheses and other delimiters, will ultimately be removed during parsing)

tokens matter during parsing mainly for their classification (e.g., literal or variable), but during code generation also for their content (e.g., 147 or x)

some tokens, such as variables or literals, might be entered into a symbol table during this phase of processing or the next

theoretically, we define a class of regular languages, capturing sequential repetition, as a means for describing token structure

regular languages can be defined in terms of finite automata (better for machine implementation) or in terms of regular expressions (better for human reading and writing)

scanner generators essentially translate regular expressions into an automaton (but with "hooks" to insert code to track symbols, etc.)

Context-free grammars and parsing techniques

parsing: this phase involves the recognition of hierarchical phrase structure in the language (phrases and sub-phrases, e.g., statements, expressions, etc.)

we describe the hierarchical structure of possible forms using a context-free grammar, which uses variables (or non-terminals) to express structure and terminals (actual pieces of language text) for final content

each context-free grammar describes a language, or set of strings, based on possible expansions starting from its start symbol and ending in a string of terminal symbols

in typical usage, we use two grammars which are equivalent, in the sense that they correspond to the same set of strings, but one of which is more abstract

parser generator programs take a grammar (usually modified) as input and provide a parsing program as output

pros: easier than a hand-written parser, more likely to correspond exactly to the grammar

cons: steep learning curve, and the grammar often has to be "massaged" to fit the technique used (reducing confidence in correctness)

Term representation and interpretation (evaluation)

terms (syntactic phrases) are naturally represented as trees, with each node identifying a specific form of expression and its children representing its immediate sub-phrases

in OO languages, it is natural to use a class/sub-class hierarchy to help organize phrases by types

interpretation proceeds as a pass over the tree, in some order determined by the language semantics, with appropriate actions being taken during the traversal (i.e., dynamically)

be careful to distinguish the syntactic issues of precedence and associativity from the semantic issue of order of evaluation

a parse tree represents a derivation from the grammar exactly: it usually contains a lot of redundant information reflecting the grammar's structure

an abstract syntax tree represents only the phrases and sub-phrase structure of interest, without the "residue" of grammar artifacts

Abstract machines and intermediate representations

in order to make code generation easier, we often choose an intermediate representation corresponding to some abstract machine whose features sit in between our language and the actual target machine

typical abstract machine features include stacks for evaluating expressions or method calls, environments for variable look-up, and three-address code for representing unitary arithmetic and logic operations

an abstract machine can itself be implemented with an interpreter (as with Java's JVM) or can be used as the basis for further processing

abstract machines and intermediate representations allow for a more flexible back-end to the compiler, since it is easier to re-target their implementation for different actual machines

the main distinguishing feature of an abstract machine used for these purposes is that its code is likely to have a more linear (non-hierarchical) form

Code generation and optimization

code generation follows the same plan as interpretation (an ordered tree traversal), but now generating pieces of code rather than actually performing semantic actions dynamically

when is it compilation (versus interpretation)?:

eliminate tree-like phrase structure in favor of linear object code (with jumps);

eliminate names in favor of addresses;

typical problems involve keeping track of run-time resources (e.g., registers and RAM locations), mapping names to their numeric equivalents and determining proper sequencing of events

we usually depend on support from a run-time system for such services as dynamic memory allocation, communication with I/O devices, etc.