Create the syntax analyser first, and then the code generator. The syntax of DL0 is
designed so that it is straightforward to write a top-down predictive parser for it. For each non-terminal
symbol of the language, write a method or procedure that “recognises” an instance of the symbol. Ensure that
these methods or procedures check for syntax errors, but don’t worry at all about error recovery. So when a
syntax error is detected, you should output an appropriate error message and then it is perfectly acceptable to
stop the compilation without producing any code. A top-down predictive parser (recursive descent
parser) is recommended but not mandatory; note, however, that other parsing techniques may involve a
great deal more work.
Normally, a recogniser will only produce output when an error is detected, so you should include appropriate
code to convince you (and me!) that the syntax analyser works. For example, producing messages of
the form “expression recognised” should be adequate.
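As an illustration only, the following Python sketch shows one possible shape for such a recogniser, covering just Expression, Term and Factor. The class name Recogniser, the helper recognise and the crude regular-expression tokeniser are assumptions made for this example and are not part of the project specification. The sketch does not distinguish keywords from identifiers, and it stops at the first syntax error by raising SyntaxError.

```python
import re

WORD = re.compile(r"[0-9]+|[a-z][a-z0-9]*")  # a Constant or an Identifier


class Recogniser:
    def __init__(self, text):
        # Crude tokeniser: constants, identifiers, or any single non-space character.
        self.tokens = re.findall(r"[0-9]+|[a-z][a-z0-9]*|\S", text)
        self.pos = 0
        self.trace = []  # messages showing which symbols were recognised

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def error(self, expected):
        # No error recovery: report the problem and abandon the parse.
        raise SyntaxError(f"expected {expected} but found {self.peek()!r}")

    def expression(self):
        # Expression = [AddingOp] Term {AddingOp Term}.
        if self.peek() in ("+", "-"):
            self.pos += 1
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()
        self.trace.append("expression recognised")

    def term(self):
        # Term = Factor {MultOp Factor}.
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()
        self.trace.append("term recognised")

    def factor(self):
        # Factor = Constant | Identifier | "(" Expression ")".
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            self.expression()
            if self.peek() != ")":
                self.error("')'")
            self.pos += 1
        elif tok is not None and WORD.fullmatch(tok):
            self.pos += 1
        else:
            self.error("a constant, an identifier or '('")
        self.trace.append("factor recognised")


def recognise(text):
    """Return the trace for a complete Expression, or raise SyntaxError."""
    parser = Recogniser(text)
    parser.expression()
    if parser.peek() is not None:
        parser.error("end of input")
    return parser.trace
```

Calling recognise("-(a+2)*3") returns the list of trace messages, while recognise("a +") raises SyntaxError. The remaining non-terminals (Block, CDecl, VDecl, Statement and so on) would follow the same one-method-per-symbol pattern.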
Having convinced yourself that this part of the project works, you can then add code to generate a syntax
tree. Take a small sample program and decide on the structure of the corresponding tree. Add some code to
traverse the tree and output the tree data – this should help you ensure that this part of the program is correct.
Remember that there is no need to include constant declarations (introduced by CDecl) in the tree because
these can (and should) be handled completely during lexical and syntax analysis.
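A possible tree representation for expressions, together with a traversal that outputs the tree, might look like the Python sketch below. The node classes Num, Var, Neg and BinOp, the TreeBuilder class and the dump function are hypothetical names chosen for this example. The constants dictionary stands in for whatever record your compiler keeps of names introduced by CDecl: such a name is replaced by its value as soon as it is parsed, so it never appears in the tree.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Neg:
    operand: object


@dataclass
class BinOp:
    op: str
    left: object
    right: object


class TreeBuilder:
    def __init__(self, text, constants):
        self.tokens = re.findall(r"[0-9]+|[a-z][a-z0-9]*|\S", text)
        self.pos = 0
        self.constants = constants  # name -> value, filled in while parsing CDecl

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # Expression = [AddingOp] Term {AddingOp Term}.
        sign = self.advance() if self.peek() in ("+", "-") else "+"
        node = self.term()
        if sign == "-":
            node = Neg(node)  # the leading sign applies to the first term only
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self.term())  # left-associative
        return node

    def term(self):
        # Term = Factor {MultOp Factor}.
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        # Factor = Constant | Identifier | "(" Expression ")".
        tok = self.advance()
        if tok == "(":
            node = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and re.fullmatch(r"[0-9]+", tok):
            return Num(int(tok))
        if tok is not None and re.fullmatch(r"[a-z][a-z0-9]*", tok):
            if tok in self.constants:  # a defined constant becomes a plain number
                return Num(self.constants[tok])
            return Var(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def dump(node, depth=0):
    """Return the tree as a list of indented lines, one node per line."""
    pad = "  " * depth
    if isinstance(node, Num):
        return [f"{pad}Num {node.value}"]
    if isinstance(node, Var):
        return [f"{pad}Var {node.name}"]
    if isinstance(node, Neg):
        return [f"{pad}Neg"] + dump(node.operand, depth + 1)
    return [f"{pad}BinOp {node.op}"] + dump(node.left, depth + 1) + dump(node.right, depth + 1)
```

For example, print("\n".join(dump(TreeBuilder("x + limit * 2", {"limit": 10}).expression()))) shows a "+" node whose right subtree multiplies the numbers 10 and 2, with no trace of the name limit.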
Then, code generation can be tackled. The aim is to generate MIPS code. Algorithms to generate register-based
code from a tree are covered in the lectures and in the notes. Do not worry about producing high-quality
code. This is not an exercise about optimisation, so, for example, restricting the code to use just
a small number of registers is perfectly reasonable. This is a very important point since the complexity
of code generation can go up rapidly as code optimisation is introduced. Remember that because the DL0
source language is simple and because there is no need for optimisation, generating a linear intermediate
representation is unnecessary. Code generation can be done directly from the tree. Similarly, there is no need
for a distinct phase of semantic analysis in your compiler.
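To give a feel for how little is needed, here is a deliberately naive Python sketch that walks a tree and emits MIPS text using only $t0 and $t1, with the stack holding intermediate results. It is not the algorithm from the lectures, just one simple scheme. Everything about it is an assumption made for illustration: the tuple-based tree layout, the function names, the v_ prefix on variable labels (used to avoid clashes with instruction mnemonics), and the use of the SPIM/MARS conventions for the lw/sw-with-label pseudo-instructions and for syscalls 1 (print integer) and 10 (exit).

```python
def gen_expr(node, out):
    """Append instructions to out that leave the value of node in $t0."""
    kind = node[0]
    if kind == "num":                      # ("num", 5)
        out.append(f"li $t0, {node[1]}")
    elif kind == "var":                    # ("var", "x")
        out.append(f"lw $t0, v_{node[1]}")
    elif kind == "neg":                    # ("neg", operand)
        gen_expr(node[1], out)
        out.append("sub $t0, $zero, $t0")
    else:                                  # (op, left, right) with op in + - * /
        gen_expr(node[1], out)             # left operand ends up in $t0
        out.append("addiu $sp, $sp, -4")
        out.append("sw $t0, 0($sp)")       # save the left operand on the stack
        gen_expr(node[2], out)             # right operand ends up in $t0
        out.append("lw $t1, 0($sp)")       # restore the left operand into $t1
        out.append("addiu $sp, $sp, 4")
        if kind == "/":
            out += ["div $t1, $t0", "mflo $t0"]   # quotient left / right
        else:
            mnemonic = {"+": "add", "-": "sub", "*": "mul"}[kind]
            out.append(f"{mnemonic} $t0, $t1, $t0")


def gen_statement(stmt, out):
    if stmt[0] == "assign":                # ("assign", name, expr)
        gen_expr(stmt[2], out)
        out.append(f"sw $t0, v_{stmt[1]}")
    elif stmt[0] == "print":               # ("print", expr)
        gen_expr(stmt[1], out)
        out += ["move $a0, $t0", "li $v0, 1", "syscall"]
    # any other statement, such as ("empty",), generates no code


def gen_program(variables, statements):
    """Return the lines of a complete assembly program."""
    out = [".data"]
    out += [f"v_{name}: .word 0" for name in variables]
    out += [".text", ".globl main", "main:"]
    for stmt in statements:
        gen_statement(stmt, out)
    out += ["li $v0, 10", "syscall"]       # exit
    return out
```

For instance, gen_program(["x"], [("assign", "x", ("num", 1)), ("print", ("var", "x"))]) returns the lines of a small program that stores 1 in x and prints it. An initialiser in a variable declaration could be handled as an ordinary assignment placed at the start of the statement list.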
In tackling this part of the problem, you will have to consider the implementation of a simple symbol
table. Don’t worry about efficiency here – linear search would be perfectly reasonable. A hash table would
be fine too. You can forget about the complexities of scope rules because in DL0 all variables have global
scope, and are declared before they are used. But it would be nice to check that a variable name isn’t being
declared more than once.
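A symbol table along these lines can be very small indeed. The Python sketch below uses linear search over a list and rejects duplicate declarations; the class name SymbolTable, its method names and the "constant"/"variable" kinds are assumptions made for this example.

```python
class SymbolTable:
    def __init__(self):
        self.entries = []  # list of (name, kind, value) in declaration order

    def lookup(self, name):
        """Linear search; return the matching entry or None."""
        for entry in self.entries:
            if entry[0] == name:
                return entry
        return None

    def declare(self, name, kind, value=None):
        """Record a new name, refusing one that has already been declared."""
        if self.lookup(name) is not None:
            raise SyntaxError(f"'{name}' is declared more than once")
        self.entries.append((name, kind, value))

    def require(self, name):
        """Return the entry for a name that is being used; it must exist."""
        entry = self.lookup(name)
        if entry is None:
            raise SyntaxError(f"'{name}' is used but not declared")
        return entry
```

A typical use would be table.declare("limit", "constant", 10) while parsing CDecl, table.declare("x", "variable") while parsing VDecl, and table.require(name) whenever an identifier appears in a statement or expression.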
A Compiler for DL0
The aim of this project is to write a compiler for the language DL0 producing code for the MIPS architecture.
Program = Block.
Block = "{" [CDecl] [VDecl] Statementlist "}".
CDecl = "define" Identifier "=" Constant {"," Identifier "=" Constant} ";".
VDecl = "integer" Iddef {"," Iddef} ";".
Iddef = Identifier ["=" Expression].
Statementlist = Statement {";" Statement}.
Statement = Assignment | PrintStatement | Empty.
Assignment = Identifier AssignmentOperator Expression.
AssignmentOperator = "=".
PrintStatement = "print" Expression.
Expression = [AddingOp] Term {AddingOp Term}.
AddingOp = "+" | "-".
Term = Factor {MultOp Factor}.
MultOp = "*" | "/".
Factor = Constant | Identifier | "(" Expression ")".
Constant = Digit {Digit}.
Identifier = Letter {Letter | Digit}.
Letter = "a"|"b"|"c"|"d"|"e"|"f"|"g"|"h"|"i"|"j"|"k"|"l"|"m"
|"n"|"o"|"p"|"q"|"r"|"s"|"t"|"u"|"v"|"w"|"x"|"y"|"z".
Digit = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9".
Empty has the obvious meaning.
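The terminal symbols of this grammar can be picked out by a very simple lexical analyser. The Python sketch below is one possible approach, not a required design. The function name tokenize, the token kinds and the decision to treat white space purely as a separator are assumptions, since the grammar itself says nothing about white space. Note also that this sketch splits input such as 12ab into a constant followed by an identifier rather than reporting an error.

```python
import re

KEYWORDS = {"define", "integer", "print"}

TOKEN_RE = re.compile(
    r"(?P<constant>[0-9]+)"          # Constant = Digit {Digit}.
    r"|(?P<word>[a-z][a-z0-9]*)"     # Identifier, or one of the keywords
    r"|(?P<symbol>[{}(),;=+\-*/])"   # punctuation and operators
)


def tokenize(source):
    """Return a list of (kind, text) pairs, or raise SyntaxError."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1                 # white space only separates tokens
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"illegal character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        tokens.append((kind, text))
        pos = match.end()
    return tokens
```

A small program that conforms to the grammar, such as { define n = 10; integer x = n * 2, y; y = x + 1; print y }, makes a convenient first test input for tokenize.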