CS 353: Architecture and Compilers—Final Exam Review
About the exam
the exam will be held on Monday, December 7th, from 2-5pm, via a combination of email (to send the exam) and a Zoom link (for proctoring/questions)
study resources: lecture notes, on-line materials (check the homepage), quizzes, lab assignments
the exam will be three hours long, closed book, closed notes, and closed computer (except, of course, for Zoom)
the exam will be comprehensive (covering the whole semester), but with some emphasis on material since the midterm
typical format: some T/F, some short answer, a few longer “conceptual” questions
Look at the course homepage for the handout archive, diagrams, etc.!
Topics from the midterm review
[see the midterm review at this link for more details]
Numeral systems: binary, decimal, hexadecimal, etc.
Data Representation
Boolean Logic
Gates and Circuits
Computer Organization
Instruction Set Architecture and Assembly Programming
Machine code and Assembly Language Programming Techniques
Overview of language translation
translation takes place in stages, with distinct data structures associated with each stage (see chart from lecture)
from a string of characters to a list of tokens by scanning
from a list of tokens to an abstract syntax tree by parsing
the tree enhanced with symbol table information by static analysis
[note that interpreters stop here and then just evaluate the (enhanced) tree]
from an enhanced tree to intermediate form or assembly language by code generation
… with possible optimization steps in between
and from assembly language to machine code by an assembler
several appropriate theories associated with the various stages can help manage complexity and suggest the “right” way to do things (but we concentrate on practical issues and a small, simple language)
separation into phases makes the process much easier to understand, but they can be combined in practice (making only one or two passes over the code, for example)
the meaning of a program can either be implemented dynamically, as we process the code (interpretation or evaluation) or statically, by way of translation to another form (compilation)
Grammars and parsing techniques
before (or during) parsing comes lexical analysis or scanning: a phase in which the input string is broken into "chunks" called tokens
tokens matter during parsing mainly for their classification (e.g., literal or variable), but during code generation also for their content (e.g., 147 or x)
some tokens, such as variables or literals, might be entered into a symbol table during this phase of processing or the next
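The scanning phase described above can be sketched with a few lines of Python; the token class names and regular expressions here are illustrative, not the course language's actual lexical rules:

```python
import re

# A minimal scanner sketch: break an input string into (kind, text) tokens.
# Token classes (NUMBER, IDENT, OP) are invented for illustration.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/()]"),
    ("SKIP",   r"\s+"),          # whitespace: matched but discarded
]
MASTER = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def scan(text):
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x + 147"))   # [('IDENT', 'x'), ('OP', '+'), ('NUMBER', '147')]
```

Note how each token carries both a classification (used by the parser) and its text content (used later, e.g. by code generation), as described above.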
parsing: this phase involves the recognition of hierarchical phrase structure in the language (phrases and sub-phrases, e.g., statements, expressions, etc.)
we describe the hierarchical structure of possible forms using a context-free grammar, which uses variables (or non-terminals) to express structure and terminals (actual pieces of language text) for final content
each context-free grammar describes a language, or set of strings, based on possible expansions starting from its start symbol and ending in a string of terminal symbols
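As a concrete illustration (the classic expression grammar, not necessarily the course language), a grammar can be encoded as a table of productions and a string derived by repeatedly expanding the leftmost non-terminal; the encoding below is a sketch:

```python
# Non-terminals E, T, F; terminals "num", "+", "*", "(", ")".
# Each non-terminal maps to its list of possible right-hand sides.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["num"]],
}

def expand_leftmost(form, choices):
    # apply a sequence of production choices to the leftmost non-terminal
    for choice in choices:
        i = next(j for j, s in enumerate(form) if s in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
    return form

# derive "num + num * num" starting from the start symbol E
print(expand_leftmost(["E"], [0, 1, 1, 1, 0, 1, 1, 1]))
# ['num', '+', 'num', '*', 'num']
```

The layering of E, T, and F is what gives * higher precedence than + in the resulting parse trees.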
parser generator programs take a grammar (usually modified) as input and provide a parsing program as output
pros: easier than writing a parser by hand, and more likely to correspond exactly to the grammar
cons: high learning curve; the grammar often has to be "massaged" to fit the technique used (reducing confidence in correctness)
Shunting-yard algorithm
the shunting yard algorithm is a simple parsing technique that works well for languages with just atomic tokens and infix operators
(it can be extended to include other features in an ad hoc fashion)
the algorithm uses two stacks, one for operands and one for operators, plus a boolean to track whether an operand or an operator is expected next
operators may be shifted from the input onto the operator stack, or an operator may be reduced along with its arguments from the operand stack: the resulting term is pushed back onto the operand stack
a fundamental aspect of the algorithm is the use of operator precedences to decide whether to shift or reduce
the shunting yard can transform infix to postfix using just one stack; it can also build trees, or even compute values directly if they are statically determinable
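The infix-to-postfix variant can be sketched as follows; the precedence table and left-associativity assumed here are illustrative choices:

```python
# Shunting-yard sketch: infix -> postfix for single-character operand
# tokens and the four binary operators below (all left-associative).
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    output, ops = [], []          # output list and operator stack
    for tok in tokens:
        if tok in PREC:
            # reduce while the stack top has equal or higher precedence
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[tok]:
                output.append(ops.pop())
            ops.append(tok)       # then shift the new operator
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()             # discard the "("
        else:                     # operand: goes straight to output
            output.append(tok)
    while ops:                    # reduce whatever remains
        output.append(ops.pop())
    return output

print(to_postfix(list("a+b*c")))   # ['a', 'b', 'c', '*', '+']
```

The precedence comparison in the inner loop is exactly the shift-versus-reduce decision described above.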
Term representation and interpretation (evaluation)
terms (syntactic phrases) are naturally represented as trees with each node identifying a specific form of expression, and its children representing its immediate sub-phrases
in OO languages, it is natural to use a class/sub-class hierarchy to help organize phrases by types
interpretation proceeds as a pass over the tree, in some order determined by the language semantics, with appropriate actions being taken during the traversal (i.e., dynamically)
be careful to distinguish the syntactic issues of precedence and associativity from the semantic issue of order of evaluation
a parse tree is one which represents a parse exactly from a grammar: it usually contains a lot of redundant information based on grammar structure
an abstract syntax tree represents only the phrases and sub-phrase structure of interest, without the "residue" of grammar artifacts
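A small sketch of the OO style described above (the class names are illustrative): each node class represents one form of expression, and evaluation is a recursive traversal of the tree.

```python
# AST node classes for a tiny expression language.
class Num:
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class BinOp:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def eval(self):
        # post-order traversal: evaluate sub-phrases, then combine
        l, r = self.left.eval(), self.right.eval()
        return {"+": l + r, "*": l * r}[self.op]

# the abstract syntax tree for 2 + 3 * 4, with precedence
# already resolved by the parser (no grammar "residue" remains)
tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(4)))
print(tree.eval())   # 14
```

Notice that the tree itself carries no parentheses or precedence information: the parser already resolved those syntactic issues when it chose the tree's shape.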
Abstract machines and intermediate representations
in order to make code generation easier, we often choose an intermediate representation corresponding to some abstract machine which has features set in between our language and the actual target machine
typical abstract machine features include stacks for evaluating expressions or method calls, environments for variable look-up, or 3-address codes for representing individual arithmetic and logic operations
an abstract machine can either be implemented with an interpreter itself (as in Java's JVM) or can be used as the basis for further processing
abstract machines and intermediate representations allow for a more flexible back-end to the compiler, since it is easier to re-target their implementation for different actual machines
the main distinguishing feature of an abstract machine used for these purposes is that its code is likely to have a more linear (non-hierarchical) form
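A sketch of such a machine (the opcode names are invented for illustration): its code is a flat list of instructions operating on an explicit stack, rather than a tree.

```python
# A tiny stack-based abstract machine interpreter.
def run(code):
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]              # result is left on top of the stack

# linear (non-hierarchical) code for 2 + 3 * 4
print(run([("PUSH", 2), ("PUSH", 3), ("PUSH", 4),
           ("MUL", None), ("ADD", None)]))   # 14
```

This is the same idea, in miniature, as interpreting JVM-style bytecode: the hierarchical expression has been flattened into a linear instruction sequence.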
Code generation and optimization
code generation follows the same plan as interpretation (an ordered tree traversal), but now generating pieces of code rather than actually performing semantic actions dynamically
when is it compilation (versus interpretation)?:
eliminate tree-like phrase structure in favor of linear object code (with jumps);
eliminate names in favor of numeric addresses
typical problems involve keeping track of run-time resources (e.g., registers and RAM locations), mapping names to their numeric equivalents, and determining proper sequencing of events
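The parallel between interpretation and code generation can be sketched as follows: the same post-order traversal, but emitting linear stack-machine instructions (opcode names invented for illustration) instead of computing values.

```python
# Code-generation sketch: trees are tuples like ("num", 7) or
# ("+", left, right); output is a flat list of instructions.
def gen(node, code):
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    else:
        gen(node[1], code)        # code for the left operand first
        gen(node[2], code)        # then the right operand
        code.append(("ADD" if kind == "+" else "MUL", None))
    return code

tree = ("+", ("num", 2), ("*", ("num", 3), ("num", 4)))
print(gen(tree, []))
# [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL', None), ('ADD', None)]
```

The traversal order fixes the sequencing of events in the emitted code: each operator's instruction appears only after the code for both of its operands.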
we usually depend on support from a run-time system for such services as dynamic memory allocation, communication with I/O devices, etc.
we may be able to separate the parsing and code generation phases so that, for n languages and m machines, we need only n + m components rather than n·m complete compilers