Abstract
The goal of this project is to give you some hands-on experience with implementing a small compiler. You will write a compiler for a simple language. You will not be generating assembly code. Instead, you will generate an intermediate representation (a data structure that represents the program). The execution of the program will be done after compilation by interpreting the generated intermediate representation.
1 Introduction
You will write a small compiler that will read an input program and represent in a linked list as a sequence of instruction. A node of the linked list represents one instruction. A instruction node specifies: (1) the type of the instructions, (2) the operand(s) of the instruction (if any) and, for jump instructions, the next instruction to be executed (the default is that the next instruction in the list is executed). After the list of instructions is generated by your compiler, your compiler will execute the generated list of instructions by interpreting it. This means that the program will traverse the data structure and at every node it visits, it will “execute” the node by changing the content of memory locations corresponding to operands and deciding what is the next instruction to execute (program counter). The output of your compiler is the output that the input program should produce. These steps are illustrated in the following figure
The remainder of this document is organized into the following sections:
1. Grammar Defines the programming language syntax including grammar.
2. Execution Semantics Describe statement semantics for assignment, input, if, while, switch, for and output statements.
3. How to generate the linked list of instructions Explains step by step how to generate the intermediate representation (data structure).
4. Requirements Lists other requirements.
2 Grammar
The grammar for this project is the following:
program ! var section body inputs
var section ! id list SEMICOLON
id list ! ID COMMA id list j ID
body ! LBRACE stmt list RBRACE
stmt list ! stmt stmt list j stmt
stmt ! assign stmt j while stmt j if stmt j switch stmt j for stmt
stmt ! output stmt j input stmt
assign stmt ! ID EQUAL primary SEMICOLON
assign stmt ! ID EQUAL expr SEMICOLON
expr ! primary op primary
primary ! ID j NUM
op ! PLUS j MINUS j MULT j DIV
output stmt ! output ID SEMICOLON
input stmt ! input ID SEMICOLON
while stmt ! WHILE condition body
if stmt ! IF condition body
condition ! primary relop primary
relop ! GREATER j LESS j NOTEQUAL
switch stmt ! SWITCH ID LBRACE case list RBRACE
switch stmt ! SWITCH ID LBRACE case list default case RBRACE
for stmt ! FOR LPAREN assign stmt condition SEMICOLON assign stmt RPAREN body
case list ! case case list j case
case ! CASE NUM COLON body
default case ! DEFAULT COLON body
inputs ! num list
num list ! NUM
num list ! NUM num list
Some highlights of the grammar:
1. Division is integer division and the result of the division of two integers is an integer.
2. Note that if stmt does not have else.
3. Note that for has a very general syntax similar to that of the for loop in the C language
4. Note that the input and output keywords are lowercase, but other keywords are all upper- case.
5. condition has no parentheses.
6. There is no type specified for variables. All variables are int by default.
3 Variables and Locations
The var section contains a list of all variable names that can be used by the program. For each variable name, we associate a unique locations that will hold the value of the variable. This association between a variable name and its location is assumed to be implemented with a function location that takes a variable name (string) as input and returns an integer value. The locations where variables will be stored is called mem which is an array of integers. Each variable in the program should have a unique entry (index) in the mem array. This association between variable names and locations can be implemented with a location table.
The var section contains a list of all variable names that can be used by the program. For each variable name, we associate a unique locations that will hold the value of the variable. This association between a variable name and its location is assumed to be implemented with a function location that takes a variable name (string) as input and returns an integer value. The locations where variables will be stored is called mem which is an array of integers. Each variable in the program should have a unique entry (index) in the mem array. This association between variable names and locations can be implemented with a location table.
4 Inputs
The list of input values is called inputs and appears as the last section of an input program. This list must be read by your compiler and stored in an inputs array, which is simply a vector of integers.
5 Execution Semantics
All statements in a statement list are executed sequentially according to the order in which they appear. Exception is made for body of if stmt, while stmt and switch stmt as explained below. In what follows, I will assume that all values of variables as well as constants are stored in locations. This assumption is used by the execution procedure that we provide. This is not a restrictive assumption. For variables, you will have locations associated with them. For constants, you can reserve a location in which you store the constant (this is like having an unnamed immutable variable).
5.0.1 Input statements
Input statements get their input from the sequence of inputs. We refer to i’th value that appears in inputs as i’th input. The execution of the i’th input statement in the program of the form input a is equivalent to:
mem[location("a")] = inputs[input_index] input_index = input_index + 1
where location("a") is an integer index value that is calculated at compile time as we have seen above. Note that the execution of an input statement advances an input index which keeps track (at runtime) of the next value to read (like in project 1).
5.1 Output statement
The statement
output a;
prints the value of variable a at the time of the execution of the output statement.
5.2 Assignment Statement
To execute an assignment statement, the expression on the righthand side of the equal sign is evaluated and the result is stored in the location associated with the lefthand side of the expression.
5.3 Expression
To evaluate an expression, the values in the locations associated with the two operands are obtained and the expression operator is applied to these values resulting in a value for the expression.
5.2 Assignment Statement
To execute an assignment statement, the expression on the righthand side of the equal sign is evaluated and the result is stored in the location associated with the lefthand side of the expression.
Students succeed in their courses by connecting and communicating with an expert until they receive help on their questions
Consult our trusted tutors.