Sloth Interpreter Assignment

Due Dates

Portion	Date
Lexical Analysis	February 5
Syntax Analysis	March 10
Complete Interpreter	April 9

Introduction

For this assignment, you will write a complete interpreter for a programming language called Sloth (Simple Language Of Tiny Heft). The language has no functions, and the only data type is the double-precision floating point number. However, it does have variables, mathematical expressions and common control structures.

This assignment is broken into three portions: the lexical analyzer, the syntax analyzer and the interpreter. The lexical analyzer takes textual source code and reduces it to a stream of tokens. The syntax analyzer takes the stream of tokens and parses it into a tree structure. The interpreter then takes the parse tree and interprets it to execute the program.

Test Programs

Program	Description
simple.sl	Prints 1 + 1.
area.sl	Inputs the radius of a circle and computes its area.
fact.sl	Inputs a number and computes the factorial of that number.
fibs.sl	Inputs a number N and displays the first N Fibonacci numbers.
minmax.sl	Inputs a number N, then inputs N numbers, then displays the ones with the least and greatest values.
acid.sl	A program that tests out most aspects of the langauge, and simply outputs 42.

Lexical Analysis

For this portion of the assignment, you will write a lexer for Sloth. The goal of the lexer is to break code up into individual tokens as discussed in class. The lexical details of the language are described here, and you will write a lexer using flex.

Identifiers in this language will start with a letter or underscore, followed by any number of letters, digits or underscores.

The only number types in Sloth are double-precision floating point. However, numbers can appear in programs as integers in decimal format. These are simply a sequence of one or more digits.

Floating point numbers consist of zero or more digits, a ".", then one or more digits. There is no scientific notation in Sloth.

Comments in this language start with the % symbol and continue until a line break is encountered.

Your lexer will read programs in this Sloth code and return the tokens it encounters. The following table lists all of the tokens in the language with a token code for each.

Token	Code
Identifier	100
Value	101
+	102
-	103
/	104
*	105
<	106
>	107
<=	108
>=	109
==	110
!=	111
&&	112
\|\|	113
!	114
;	115
:=	116
(	117
)	118
begin	119
end	120
if	121
then	122
else	123
while	124
do	125
print	126
input	127

If you see a character that is not matched by any of these tokens, you should print an error message and exit the program.

Your program should take one command line parameter, so that it can be invoked as:

$ lexer program.sl

You should check that the user passed an argument and that the file exists.

When calling the yylex function, it assumes that input is from stdin. To use another file as input, you must assign it to stdin before calling yylex:


stdin = fopen(argv[1], "r");

Tips:

Use return statements in your flex actions rather than placing printf statements inside of them. Then call yylex inside of a loop until it returns 0.
Create named constants (using #define) for the types of tokens instead of using numbers directly.

The lexer output of "fact.sl" is given below:

Syntax Analysis

For this portion of the assignment, you will write a parser for Sloth that will use the lexer you wrote.

The syntactic details of the language are described here, and you will write a parser using bison.

Programs in Sloth consist of zero or more statements.

Each statement can be:

An assignment
This consists of an identifier, the := operator, an expression, then a semi-colon. Example:
```
number1 := a + 3;
```
An if statement
This consists of the "if" keyword, an expression, the "then" keyword, and then a statement. Example:
```
if a < b then
  a := b;
```
An if/else statement
This consists of the "if" keyword, an expression, the "then" keyword, a statement, the "else" keyword, then another statement. Example:
```
if a < b then
  a := b;
else
  b := a;
```
A while statement
This consists of the "while" keyword, an expression, the "do" keyword, then a statement. Example:
```
while i <= 10 do
  i := i * 2;
```
A print statement
This consists of the "print" keyword, followed by an expression, then a semi-colon. Example:
```
print 1 + 1;
```
A statement sequence
This consists of the "begin" keyword, 0 or more statements, and then the "end" keyword. Example:
```
begin
  x := 5;
  print x;
end
```

Notes:

Semi-colons are only a part of assignment and print statements.
Statements used as part of if and while are single statements. To have more than one, the statement sequence is used.
The if vs. if/else statements will cause a shift/reduce conflict during parsing. This is OK as the default action of shift is correct.

Expressions are the other major part of Sloth syntax. An expression consists of literal values, identifiers and input expressions combined with parenthesis and the following operators (in order of precedence):

Parenthesis
!
* /
+ -
< <= > >= == !=
&&
||

All operators are binary except for ! which is unary. The operators each have the same meaning as they do in C-based languages.

Remember to make your expression grammar unambiguous and enforce the precedence rules as discussed in class.

Your program will parse Sloth programs using Bison. If an input program does not fit the grammar you specify, Bison will produce an error.

Note that having one shift/reduce error is expected! As discussed in class, this is caused by the dangling else issue.

Bison will trigger the actions as it is parsing each grammar rule. Inside of these actions, your parser should construct a parse tree that represents the program.

Trees will be constructed with one child for each sub-portion of a node. For example, the following code:

a := a + 1;

if a > 10 then
  a := 0;

Would be represented with the following tree:

Each set of multiple statements (such as between begin/end) is grouped with a statement node. Each of these has a single statement as the left child and a set of multiple as the right child (or NULL for the last statement in the set).
A statement such as if, else if, or while has children representing the expressions and /or statements that make up the statement.
Binary operators have two children - one for each operand.
Unary operators, such as ! and print, have one child.
Identifiers, literal numbers, and inputs are leaf nodes.

You can use the Node structure, make_node and attach_node functions to build the tree:


#define ID_SIZE 100
#define MAX_CHILDREN 3

/* a tree node definition */
struct Node {
  /* the type of the node */
  int type;

  /* the value of the node if it can have one */
  double value;

  /* the id of the node (used for identifiers only) */
  char id[ID_SIZE];

  /* at most three children nodes */
  int num_children;
  struct Node* children[MAX_CHILDREN];
};


/* creates a new node and returns it */
struct Node* make_node(int type, double value, char* id) {
  int i;

  /* allocate space */
  struct Node* node = malloc(sizeof(struct Node));

  /* set properties */
  node->type = type;
  node->value = value;
  strcpy(node->id, id);
  node->num_children = 0;
  for(i = 0; i < MAX_CHILDREN; i++) {
    node->children[i] = NULL;
  }

  /* return new node */
  return node;
}

/* attach an existing node onto a parent */
void attach_node(struct Node* parent, struct Node* child) {
  /* connect it */
  parent->children[parent->num_children] = child;
  parent->num_children++;
  assert(parent->num_children <= MAX_CHILDREN);
}

Each parser action should construct the tree that represents that portion of the program. The tree should be assigned to the $$ variable. Don't forget to attach any subtree nodes.

To illustrate this, the rule below matches an if statement in Sloth (where IF and THEN are tokens from the lexer). The action creates an IF node, (the value and id are only used for literals and identifiers respectively). It then attaches the expression node as the first child, and the statement node as the right child.


IF expr THEN stmt {
  $$ = make_node(IF, 0, "");
  attach_node($$, $2);
  attach_node($$, $4);
}

The first rule of your grammar should assign the complete parse tree to a variable that you can use in your main program.

To test your program, call yyparse in main to parse the program and produce the parse tree. Then call this function to print the parse tree to the screen:


void print_tree(struct Node* node, int tabs) {
  int i;

  /* base case */
  if(!node) return;

  /* print leading tabs */
  for(i = 0; i < tabs; i++) {
    printf("    ");
  }

  switch(node->type) {
    case IDENTIFIER: printf("IDENTIFIER: %s\n", node->id); break;
    case VALUE: printf("VALUE: %lf\n", node->value); break;
    case PLUS: printf("PLUS:\n"); break;
    case MINUS: printf("MINUS:\n"); break;
    case DIVIDE: printf("DIVIDE:\n"); break;
    case TIMES: printf("TIMES:\n"); break;
    case LESS: printf("LESS THAN:\n"); break;
    case GREATER: printf("GREATER:\n"); break;
    case LESSEQ: printf("LESS EQUAL:\n"); break;
    case GREATEREQ: printf("GREATER EQUAL:\n"); break;
    case EQUALS: printf("EQUALS:\n"); break;
    case NEQUALS: printf("NOT EQUALS:\n"); break;
    case AND: printf("AND:\n"); break;
    case OR: printf("OR:\n"); break;
    case NOT: printf("NOT:\n"); break;
    case ASSIGN: printf("ASSIGN:\n"); break;
    case IF: printf("IF:\n"); break;
    case WHILE: printf("WHILE:\n"); break;
    case PRINT: printf("PRINT:\n"); break;
    case INPUT: printf("INPUT:\n"); break;
    case STATEMENT: printf("STATEMENT:\n"); break;
    default:
      printf("Error, %d not a valid node type.\n", node->type);
      exit(1);
  }

  /* print all children nodes underneath */
  for(i = 0; i < node->num_children; i++) {
    print_tree(node->children[i], tabs + 1);
  }
}

Below is the correct output for the parser on the fact.sl program:

STATEMENT:
    ASSIGN:
        IDENTIFIER: x
        INPUT:
    STATEMENT:
        ASSIGN:
            IDENTIFIER: fact
            VALUE: 1.000000
        STATEMENT:
            WHILE:
                GREATER:
                    IDENTIFIER: x
                    VALUE: 1.000000
                STATEMENT:
                    ASSIGN:
                        IDENTIFIER: fact
                        TIMES:
                            IDENTIFIER: fact
                            IDENTIFIER: x
                    STATEMENT:
                        ASSIGN:
                            IDENTIFIER: x
                            MINUS:
                                IDENTIFIER: x
                                VALUE: 1.000000
            STATEMENT:
                PRINT:
                    IDENTIFIER: fact

This output can be viewed in tree form here.

Complete Interpreter

For the final portion of this assignment, you will take the parse trees you generated and actually execute them.

I would suggest writing two functions to accomplish this task:

eval_expression
Takes a parse tree node, and returns the value of the expression it represents. It will handle the node types that produce a value such as identifiers, values, inputs, and arithmetic / logical operators. For expression nodes that have children, it will need to recursively evaluate those expressions.
eval_statement
Takes a parse tree node, and evaluates the statement it represents. Most of the statement types contain statements and/or expressions as children nodes. These will have to be handled recursively.

To handle input expressions, you will read input from stdin. You can include a generic input prompt if you like. However, for the lexical analyzer to work, stdin was redirected from a file. Now it will have to be re-directed back:


/* save stdin */
FILE* orig_stdin = stdin;
stdin = fopen(argv[1], "r");

/* parse */
yyparse( );

/* restore stdin */
fclose(stdin);
stdin = orig_stdin;

/* now interpret the tree! */

Another aspect of this assignment is keeping track of the values of any variables. To do this, you will maintain a symbol table which keeps track of variable / value pairs. This is often implemented with a hash table, but a simple array will work.

When an assignment statement assigns a value into a new variable, you must insert that variable into the symbol table. When an assignment is made to an existing variable, you must search for the variable in the symbol table and update its value. When an expression references a variable, you must search for its current value to use.

The following demonstrates the output of the fact.sl program:

Input Value: 5
120.000000

Submitting

For each of the three portions of this project, submit all of the necessary files to ifinlay@umw.edu.