For this assignment, you will write a complete interpreter for a programming language called Sloth (Simple Language Of Tiny Heft). The language has no functions, and the only data type is the double-precision floating point number. However, it does have variables, mathematical expressions and common control structures.
This assignment is broken into three portions: the lexical analyzer, the syntax analyzer and the interpreter. The lexical analyzer takes textual source code and reduces it to a stream of tokens. The syntax analyzer takes the stream of tokens and parses it into a tree structure. The interpreter then takes the parse tree and interprets it to execute the program.
You can use the same set of input files to test your parser:
Program | Description |
hello.sl | Prints hello world |
simple.sl | Prints 1 + 1. |
area.sl | Inputs the radius of a circle and computes its area. |
fact.sl | Inputs a number and computes the factorial of that number. |
fibs.sl | Inputs a number N and displays the first N Fibonacci numbers. |
minmax.sl | Inputs a number N, then inputs N numbers, then displays the ones with the least and greatest values. |
guess.sl | A guess the number game |
table.sl | Prints a 2D multiplication table |
acid.sl | A program that tests out most aspects of the language, and simply outputs 42. |
For this portion of this assignment, you will write code that plugs into ANTLR to take
actions as different parts of the tree are visited. We will now add the -visitor
flag to the ANTLR tool when we give it our grammar. This causes it to setup the visitor
classes we can hook into. You can use the updated build and
clean scripts for this project.
The way ANTLR works, each time we visit a node of the parse tree, we can return back one value for it. Nodes that represent statements will not really use this value, but those that represent expressions, such as an addition node will return back the value produced. We have to return back the same Java type for all nodes, those this may be null.
To fill this purpose, so you should create a class called
SlothValue
which represents any 1 value in a Sloth program. This will be able
to store an int, bool, real, or string which are the four types Sloth supports. In order
to make a class which can handle this, the SlothValue
class should have two
instance varialbes:
type
which tells us which of the four types this SlothValue
is storing. You could use just an int for this, but I'd really recommend making an enum called
SlothType
to make it easier to keep straight. value
which is an Object. Being an Object (the root of all classes in Java)
will allow us to assign Integer, Double, Boolean, or String values into it.You should make four constructors for this class, one for each of the
four types, setting both type
and value
for each. This way we
can make a SlothValue
for any type we need one for.
This class should also have four get methods, each of which tries to return the value
as the requested type. Each of these functions will have a switch based on the actual
type of the SlothValue
and do the thing that makes sense based on this table:
From string | From int | From real | From bool | |
---|---|---|---|---|
To string | - | Convert | Convert | Convert |
To int | Error | - | Truncate | Error |
To real | Error | Convert | - | Error |
To bool | Error | Error | Error | - |
For the error cases, you can throw an exception which halts the program.
We should be able to use our grammar and lexer files from part 2 of this project, with one small change. Now we need to be able to distinguish between different options of grammar rules with multiple choices for production. The way ANTLR handles this is by including tags in the grammar file.
These begin with the # character, followed by a unique identifier. ANTLR uses this identifier in the generated Java code, allowing us to handle the different cases. You can look at the parser of the calculator visitor example to see how this looks.
You should add these tags to the grammar rules you have which include multiple options. If there is only one option, this is not needed as ANTLR uses the name of the non-terminal itself as the name.
With tags, and the SlothValue
class in place, we can now work on the interpreter itself.
We will do this by creating a file called SlothVisitor.java
. This will override
the empty visitor class that ANTLR provides. The declaration at the top of this class can look like this:
public class SlothVisitor extends SlothParserBaseVisitor<SlothValue>
Here we extend the base visitor class, as well as fill in SlothValue for the types of each node in our parse tree. The main work to do here is override each of the methods in the base class (which you can look at after building the grammar) to do the work of interpreting the program via syntax tree.
This is a recursive process in which you will call visit
on the children
nodes to evaluate them, or get their value. We can use the names of the grammar symbols as
methods on the context parameters passed into each method.
For the nodes which represent statements, you will carry out what the statement says to do, returning null for the value. For instance, the print statement node might look like this:
@Override
public SlothValue visitPrint(SlothParser.PrintContext ctx) {
SlothValue expr = visit(ctx.expression());
System.out.print(expr);
return null;
}
We are given the "PrintContext" parameter from which we can get the parse tree node representing the expression to the left of the print. We evaluate that, storing the value which is produced, which we then pass into System.out.println.
For expression nodes, we will compute a value which is returned back from the method. For example, here is a method which visits a times or divide node. Notice that we have two expressions on the right hand side of the grammar rule, so must pass an index into the expression method.
@Override
public SlothValue visitTimesdiv(SlothParser.TimesdivContext ctx) {
SlothValue left = visit(ctx.expression(0));
SlothValue right = visit(ctx.expression(1));
// if both are ints, do int math, otherwise try and do reals
if (left.getType() == SlothType.INT && right.getType() == SlothType.INT) {
if (ctx.op.getType() == SlothParser.TIMES) {
return new SlothValue(left.toInt() * right.toInt());
} else {
return new SlothValue(left.toInt() / right.toInt());
}
} else {
if (ctx.op.getType() == SlothParser.TIMES) {
return new SlothValue(left.toReal() * right.toReal());
} else {
return new SlothValue(left.toReal() / right.toReal());
}
}
}
Also notice that we do integer math when both operands are integers. Otherwise,
we try to get both as real values, returning a real result. If one of them is a bool
or string, the toReal
method will throw an exception, which will chain
up from this method too.
One small detail int eh evaluation of expressions that nearly all languages follow is the "short-circuit" evaluation of boolean operations. If we have code like the following:
if a && b then
print("Hi!");
if c || d then
print("Bye");
Short circuit evaluation means we do not evaluate the other side of an && or ||
operator if we know the result. Here, if a
is false, then we do not even need
to check b — we know the result is false. Likewise, if c
is true, there
is no need to check d.
In simple cases, there is not much importance, but it allows programs to do interesting things such as the following patterns:
if (ptr != NULL && ptr->count > 0)
if (somethingSucceeds() || panicAndQuit())
The first example could trigger a crash without short circuit evaluation, since ptr being null means that even looking at one of its members would be illegal.
The second example will only call the panicAndQuit
function when the first one
returns false.
Sloth doesn't have functions yet, but you should implement short-circuit evaluation.
Another aspect of this assignment is keeping track of the values of any variables. To do this, you will maintain a symbol table which keeps track of variable / value pairs. You can use a Java HashMap for this, which can be an instance variable in the SlothVisitor class. The keys are strings (the variable name) and the values are SlothValue objects.
When an assignment statement assigns a value into a new variable, you must insert that variable into the symbol table. When an assignment is made to an existing variable, you must search for the variable in the symbol table and update its value. When an expression references a variable, you must search for its current value to use.
If a variable is referenced before its assigned to, your program should signal an error.
We no longer need the grun
program to test our work — now we can run
the programs directly!
The following demonstrates the output of the fact.sl program:
Please enter a number: 5 5! = 120
And here is the output for a run of the guess.sl program:
Is your number 50? no Answers are 'yes', 'low', or 'high'. Is your number 50? high Is your number 25? low Is your number 37? high Is your number 31? low Is your number 34? low Is your number 35? low Is your number 36? yes Got it!
Submit the files for your interpreter to Canvas for this assignment.
Copyright © 2024 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.