CS 353: Architecture and Compilers—Final Exam Review
About the exam
the exam will be held on Monday, December 7th, from 2-5pm, via a combination of email (to send the exam) and a Zoom link (for proctoring/questions)
study resources: lecture notes, on-line materials (check the homepage), quizzes, lab assignments
the exam will be three hours long, closed book, closed notes, and closed computer (except, of course, for Zoom)
the exam will be comprehensive (covering the whole semester), but with some emphasis on material since the midterm
typical format: some T/F, some short answer, a few longer “conceptual” questions
Look at the course homepage for the handout archive, diagrams, etc.!
Topics from the midterm review
[see the midterm review at this link for more details]
Numeral systems: binary, decimal, hexadecimal, etc.
Data Representation
Boolean Logic
Gates and Circuits
Computer Organization
Instruction Set Architecture and Assembly Programming
Machine code and Assembly Language Programming Techniques
Overview of language translation
translation takes place in stages, with distinct data structures associated with each stage (see chart from lecture)
from a string of characters to a list of tokens by scanning
from a list of tokens to an abstract syntax tree by parsing
the tree enhanced with symbol table information by static analysis
[note that interpreters stop here and then just evaluate the (enhanced) tree]
from an enhanced tree to intermediate form or assembly language by code generation
… with possible optimization steps in between
and from assembly language to machine code by an assembler
several appropriate theories associated with the various stages can help manage complexity and suggest the “right” way to do things (but we concentrate on practical issues and a small, simple language)
separation into phases makes the process much easier to understand, but they can be combined in practice (making only one or two passes over the code, for example)
the meaning of a program can either be implemented dynamically, as we process the code (interpretation or evaluation) or statically, by way of translation to another form (compilation)
Grammars and parsing techniques
before (or during) parsing comes lexical analysis or scanning: a phase in which the input string is broken into "chunks" called tokens
tokens matter during parsing mainly for their classification (e.g., literal or variable), but during code generation also for their content (e.g., 147 or x)
some tokens, such as variables or literals, might be entered into a symbol table during this phase of processing or the next
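The scanning phase described above can be sketched with a few lines of Python; the token class names and regular expressions here are illustrative, not the course language's actual lexical rules:

```python
import re

# A minimal scanner sketch: break an input string into (kind, text) tokens.
# Token classes (NUMBER, IDENT, OP) are invented for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/()]"),
    ("SKIP",   r"\s+"),          # whitespace: matched but discarded
]
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def scan(text):
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x + 147"))   # [('IDENT', 'x'), ('OP', '+'), ('NUMBER', '147')]
```

Note how each token carries both a classification (used by the parser) and its text content (used later, e.g. by code generation), as described above.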
parsing: this phase involves the recognition of hierarchical phrase structure in the language (phrases and sub-phrases, e.g., statements, expressions, etc.)
we describe the hierarchical structure of possible forms using a context-free grammar, which uses variables (or non-terminals) to express structure and terminals (actual pieces of language text) for final content
each context-free grammar describes a language, or set of strings, based on possible expansions starting from its start symbol and ending in a string of terminal symbols
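As a concrete illustration (the classic expression grammar, not necessarily the course language), a grammar can be encoded as a table of productions and a string derived by repeatedly expanding the leftmost non-terminal; the encoding below is a sketch:

```python
# Non-terminals E, T, F; terminals "num", "+", "*", "(", ")".
# Each non-terminal maps to its list of possible right-hand sides.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["num"]],
}

def expand_leftmost(form, choices):
    # apply a sequence of production choices to the leftmost non-terminal
    for choice in choices:
        i = next(j for j, s in enumerate(form) if s in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
    return form

# derive "num + num * num" starting from the start symbol E
print(expand_leftmost(["E"], [0, 1, 1, 1, 0, 1, 1, 1]))
# ['num', '+', 'num', '*', 'num']
```

The layering of E, T, and F is what gives * higher precedence than + in the resulting parse trees.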
parser generator programs take a grammar (usually modified) as input and provide a parsing program as output
pros: easier than writing a parser by hand, and more likely to correspond exactly to the grammar
cons: high learning curve; the grammar often has to be "massaged" to fit the technique used (reducing confidence in correctness)
Shunting-yard algorithm
the shunting yard algorithm is a simple parsing technique that works well for languages with just atomic tokens and infix operators
(it can be extended to include other features in an ad hoc fashion)
the algorithm uses two stacks, one for operands and one for operators, plus a boolean to track whether an operand or an operator is expected next
operators may be shifted from the input onto the operator stack, or an operator may be reduced along with its arguments from the operand stack: the resulting term is pushed back onto the operand stack
a fundamental aspect of the algorithm is the use of operator precedences to decide whether to shift or reduce
the shunting yard can transform infix to postfix using just one stack; it can also build trees, or even compute values directly if they are statically determinable
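The infix-to-postfix variant can be sketched as follows; the precedence table and left-associativity assumed here are illustrative choices:

```python
# Shunting-yard sketch: infix -> postfix for single-character operand
# tokens and the four binary operators below (all left-associative).
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    output, ops = [], []          # output list and operator stack
    for tok in tokens:
        if tok in PREC:
            # reduce while the stack top has equal or higher precedence
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[tok]:
                output.append(ops.pop())
            ops.append(tok)       # then shift the new operator
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()             # discard the "("
        else:                     # operand: goes straight to output
            output.append(tok)
    while ops:                    # reduce whatever remains
        output.append(ops.pop())
    return output

print(to_postfix(list("a+b*c")))   # ['a', 'b', 'c', '*', '+']
```

The precedence comparison in the inner loop is exactly the shift-versus-reduce decision described above.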
Term representation and interpretation (evaluation)
terms (syntactic phrases) are naturally represented as trees with each node identifying a specific form of expression, and its children representing its immediate sub-phrases
in OO languages, it is natural to use a class/sub-class hierarchy to help organize phrases by types
interpretation proceeds as a pass over the tree, in some order determined by the language semantics, with appropriate actions being taken during the traversal (i.e., dynamically)
be careful to distinguish the syntactic issues of precedence and associativity from the semantic issue of order of evaluation
a parse tree is one which represents a parse exactly from a grammar: it usually contains a lot of redundant information based on grammar structure
an abstract syntax tree represents only the phrases and sub-phrase structure of interest, without the "residue" of grammar artifacts
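A small sketch of the OO style described above (the class names are illustrative): each node class represents one form of expression, and evaluation is a recursive traversal of the tree.

```python
# AST node classes for a tiny expression language.
class Num:
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self):
        # post-order traversal: evaluate sub-phrases, then combine
        l, r = self.left.eval(), self.right.eval()
        return {"+": l + r, "*": l * r}[self.op]

# the abstract syntax tree for 2 + 3 * 4, with precedence
# already resolved by the parser (no grammar "residue" remains)
tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))
print(tree.eval())   # 14
```

Notice that the tree itself carries no parentheses or precedence information: the parser already resolved those syntactic issues when it chose the tree's shape.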
Abstract machines and intermediate representations
in order to make code generation easier, we often choose an intermediate representation corresponding to some abstract machine which has features set in between our language and the actual target machine
typical abstract machine features include stacks for evaluating expressions or method calls, environments for variable look-up, or 3-address codes for representing individual arithmetic and logic operations
an abstract machine can either be implemented with an interpreter itself (as in Java's JVM) or can be used as the basis for further processing
abstract machines and intermediate representations allow for a more flexible back-end to the compiler, since it is easier to re-target their implementation for different actual machines
the main distinguishing feature of an abstract machine used for these purposes is that its code is likely to have a more linear (non-hierarchical) form
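A sketch of such a machine (the opcode names are invented for illustration): its code is a flat list of instructions operating on an explicit stack, rather than a tree.

```python
# A tiny stack-based abstract machine interpreter.
def run(code):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]              # result is left on top of the stack

# linear (non-hierarchical) code for 2 + 3 * 4
print(run([("PUSH", 2), ("PUSH", 3), ("PUSH", 4),
           ("MUL", None), ("ADD", None)]))   # 14
```

This is the same idea, in miniature, as interpreting JVM-style bytecode: the hierarchical expression has been flattened into a linear instruction sequence.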
Code generation and optimization
code generation follows the same plan as interpretation (an ordered tree traversal), but now generating pieces of code rather than actually performing semantic actions dynamically
when is it compilation (versus interpretation)?:
eliminate tree-like phrase structure in favor of linear object code (with jumps);
eliminate names in favor of numeric addresses
typical problems involve keeping track of run-time resources (e.g., registers and RAM locations), mapping names to their numeric equivalents, and determining proper sequencing of events
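The parallel between interpretation and code generation can be sketched as follows: the same post-order traversal, but emitting linear stack-machine instructions (opcode names invented for illustration) instead of computing values.

```python
# Code-generation sketch: trees are tuples like ("num", 7) or
# ("+", left, right); output is a flat list of instructions.
def gen(node, code):
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    else:
        gen(node[1], code)        # code for the left operand first
        gen(node[2], code)        # then the right operand
        code.append(("ADD" if kind == "+" else "MUL", None))
    return code

tree = ("+", ("num", 2), ("*", ("num", 3), ("num", 4)))
print(gen(tree, []))
# [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL', None), ('ADD', None)]
```

The traversal order fixes the sequencing of events in the emitted code: each operator's instruction appears only after the code for both of its operands.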
we usually depend on support from a run-time system for such services as dynamic memory allocation, communication with I/O devices, etc.
we may be able to separate the parsing and code generation phases so that, for n languages and m machines, we need only n + m components rather than n·m complete compilers