Home CPSC 401

Sloth Interpreter Assignment

 

Due Dates

PortionDate
Lexical AnalysisFebruary 5
Syntax AnalysisMarch 10
Complete InterpreterApril 9

 

Introduction

For this assignment, you will write a complete interpreter for a programming language called Sloth (Simple Language Of Tiny Heft). The language has no functions, and the only data type is the double-precision floating point number. However, it does have variables, mathematical expressions and common control structures.

This assignment is broken into three portions: the lexical analyzer, the syntax analyzer and the interpreter. The lexical analyzer takes textual source code and reduces it to a stream of tokens. The syntax analyzer takes the stream of tokens and parses it into a tree structure. The interpreter then takes the parse tree and interprets it to execute the program.


 

Test Programs

ProgramDescription
simple.slPrints 1 + 1.
area.slInputs the radius of a circle and computes its area.
fact.slInputs a number and computes the factorial of that number.
fibs.slInputs a number N and displays the first N Fibonacci numbers.
minmax.slInputs a number N, then inputs N numbers, then displays the ones with the least and greatest values.
acid.slA program that tests out most aspects of the langauge, and simply outputs 42.

 

Lexical Analysis

For this portion of the assignment, you will write a lexer for Sloth. The goal of the lexer is to break code up into individual tokens as discussed in class. The lexical details of the language are described here, and you will write a lexer using flex.

Identifiers in this language will start with a letter or underscore, followed by any number of letters, digits or underscores.

The only number types in Sloth are double-precision floating point. However, numbers can appear in programs as integers in decimal format. These are simply a sequence of one or more digits.

Floating point numbers consist of zero or more digits, a ".", then one or more digits. There is no scientific notation in Sloth.

Comments in this language start with the % symbol and continue until a line break is encountered.

Your lexer will read programs in this Sloth code and return the tokens it encounters. The following table lists all of the tokens in the language with a token code for each.

Token Code
Identifier 100
Value 101
+ 102
- 103
/ 104
* 105
< 106
> 107
<= 108
>= 109
== 110
!= 111
&& 112
|| 113
! 114
; 115
:= 116
( 117
) 118
begin 119
end 120
if 121
then 122
else 123
while 124
do 125
print 126
input 127

If you see a character that is not matched by any of these tokens, you should print an error message and exit the program.

Your program should take one command line parameter, so that it can be invoked as:

$ lexer program.sl

You should check that the user passed an argument and that the file exists.

When calling the yylex function, it assumes that input is from stdin. To use another file as input, you must assign it to stdin before calling yylex:


stdin = fopen(argv[1], "r");

Tips:

The lexer output of "fact.sl" is given below:

100
116
127
115
100
116
101
115
124
100
107
101
125
119
100
116
100
105
100
115
100
116
100
103
101
115
120
126
100
115
0

 

Syntax Analysis

For this portion of the assignment, you will write a parser for Sloth that will use the lexer you wrote.

The syntactic details of the language are described here, and you will write a parser using bison.

Programs in Sloth consist of zero or more statements.

Each statement can be:

Notes:

Expressions are the other major part of Sloth syntax. An expression consists of literal values, identifiers and input expressions combined with parenthesis and the following operators (in order of precedence):

All operators are binary except for ! which is unary. The operators each have the same meaning as they do in C-based languages.

Remember to make your expression grammar unambiguous and enforce the precedence rules as discussed in class.

Your program will parse Sloth programs using Bison. If an input program does not fit the grammar you specify, Bison will produce an error.

Note that having one shift/reduce error is expected! As discussed in class, this is caused by the dangling else issue.

Bison will trigger the actions as it is parsing each grammar rule. Inside of these actions, your parser should construct a parse tree that represents the program.

Trees will be constructed with one child for each sub-portion of a node. For example, the following code:

a := a + 1;

if a > 10 then
  a := 0;

Would be represented with the following tree:

You can use the Node structure, make_node and attach_node functions to build the tree:


#define ID_SIZE 100
#define MAX_CHILDREN 3

/* a tree node definition */
struct Node {
  /* the type of the node */
  int type;

  /* the value of the node if it can have one */
  double value;

  /* the id of the node (used for identifiers only) */
  char id[ID_SIZE];

  /* at most three children nodes */
  int num_children;
  struct Node* children[MAX_CHILDREN];
};


/* creates a new node and returns it */
struct Node* make_node(int type, double value, char* id) {
  int i;

  /* allocate space */
  struct Node* node = malloc(sizeof(struct Node));

  /* set properties */
  node->type = type;
  node->value = value;
  strcpy(node->id, id);
  node->num_children = 0;
  for(i = 0; i < MAX_CHILDREN; i++) {
    node->children[i] = NULL;
  }

  /* return new node */
  return node;
}

/* attach an existing node onto a parent */
void attach_node(struct Node* parent, struct Node* child) {
  /* connect it */
  parent->children[parent->num_children] = child;
  parent->num_children++;
  assert(parent->num_children <= MAX_CHILDREN);
}

Each parser action should construct the tree that represents that portion of the program. The tree should be assigned to the $$ variable. Don't forget to attach any subtree nodes.

To illustrate this, the rule below matches an if statement in Sloth (where IF and THEN are tokens from the lexer). The action creates an IF node, (the value and id are only used for literals and identifiers respectively). It then attaches the expression node as the first child, and the statement node as the right child.


IF expr THEN stmt {
  $$ = make_node(IF, 0, "");
  attach_node($$, $2);
  attach_node($$, $4);
}

The first rule of your grammar should assign the complete parse tree to a variable that you can use in your main program.

To test your program, call yyparse in main to parse the program and produce the parse tree. Then call this function to print the parse tree to the screen:


void print_tree(struct Node* node, int tabs) {
  int i;

  /* base case */
  if(!node) return;

  /* print leading tabs */
  for(i = 0; i < tabs; i++) {
    printf("    ");
  }

  switch(node->type) {
    case IDENTIFIER: printf("IDENTIFIER: %s\n", node->id); break;
    case VALUE: printf("VALUE: %lf\n", node->value); break;
    case PLUS: printf("PLUS:\n"); break;
    case MINUS: printf("MINUS:\n"); break;
    case DIVIDE: printf("DIVIDE:\n"); break;
    case TIMES: printf("TIMES:\n"); break;
    case LESS: printf("LESS THAN:\n"); break;
    case GREATER: printf("GREATER:\n"); break;
    case LESSEQ: printf("LESS EQUAL:\n"); break;
    case GREATEREQ: printf("GREATER EQUAL:\n"); break;
    case EQUALS: printf("EQUALS:\n"); break;
    case NEQUALS: printf("NOT EQUALS:\n"); break;
    case AND: printf("AND:\n"); break;
    case OR: printf("OR:\n"); break;
    case NOT: printf("NOT:\n"); break;
    case ASSIGN: printf("ASSIGN:\n"); break;
    case IF: printf("IF:\n"); break;
    case WHILE: printf("WHILE:\n"); break;
    case PRINT: printf("PRINT:\n"); break;
    case INPUT: printf("INPUT:\n"); break;
    case STATEMENT: printf("STATEMENT:\n"); break;
    default:
      printf("Error, %d not a valid node type.\n", node->type);
      exit(1);
  }

  /* print all children nodes underneath */
  for(i = 0; i < node->num_children; i++) {
    print_tree(node->children[i], tabs + 1);
  }
}

Below is the correct output for the parser on the fact.sl program:

STATEMENT:
    ASSIGN:
        IDENTIFIER: x
        INPUT:
    STATEMENT:
        ASSIGN:
            IDENTIFIER: fact
            VALUE: 1.000000
        STATEMENT:
            WHILE:
                GREATER:
                    IDENTIFIER: x
                    VALUE: 1.000000
                STATEMENT:
                    ASSIGN:
                        IDENTIFIER: fact
                        TIMES:
                            IDENTIFIER: fact
                            IDENTIFIER: x
                    STATEMENT:
                        ASSIGN:
                            IDENTIFIER: x
                            MINUS:
                                IDENTIFIER: x
                                VALUE: 1.000000
            STATEMENT:
                PRINT:
                    IDENTIFIER: fact

This output can be viewed in tree form here.


 

Complete Interpreter

For the final portion of this assignment, you will take the parse trees you generated and actually execute them.

I would suggest writing two functions to accomplish this task:

To handle input expressions, you will read input from stdin. You can include a generic input prompt if you like. However, for the lexical analyzer to work, stdin was redirected from a file. Now it will have to be re-directed back:


/* save stdin */
FILE* orig_stdin = stdin;
stdin = fopen(argv[1], "r");

/* parse */
yyparse( );

/* restore stdin */
fclose(stdin);
stdin = orig_stdin;

/* now interpret the tree! */

Another aspect of this assignment is keeping track of the values of any variables. To do this, you will maintain a symbol table which keeps track of variable / value pairs. This is often implemented with a hash table, but a simple array will work.

When an assignment statement assigns a value into a new variable, you must insert that variable into the symbol table. When an assignment is made to an existing variable, you must search for the variable in the symbol table and update its value. When an expression references a variable, you must search for its current value to use.

The following demonstrates the output of the fact.sl program:

Input Value: 5
120.000000

 

Submitting

For each of the three portions of this project, submit all of the necessary files to ifinlay@umw.edu.

Copyright © 2025 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.