Sloth Lexer

Due: February 5

Introduction

This is the first phase of a multi-part assignment in which you will write an interpreter for a simple but complete programming language called Sloth (Simple Language of Tiny Heft). In this part, you will build a lexical analyser (lexer), which breaks the program into tokens. We will use the ANTLR parser generator for this.

Sloth

Sloth has the following features:

- Dynamic typing with four data types: integers, real numbers, booleans, and strings
- Variables, assignment statements, and the usual mathematical, boolean, and comparison operators
- While loops and if/else statements
- Output functions for printing and input functions for reading each of the four data types

Test Programs

Program	Description
hello.sl	Prints hello world.
simple.sl	Prints 1 + 1.
area.sl	Inputs the radius of a circle and computes its area.
fact.sl	Inputs a number and computes its factorial.
fibs.sl	Inputs a number N and displays the first N Fibonacci numbers.
minmax.sl	Inputs a number N, then N numbers, and displays the smallest and largest of them.
guess.sl	A guess-the-number game.
table.sl	Prints a 2D multiplication table.
acid.sl	Tests most aspects of the language and simply outputs 42.

Lexical Analysis

For this portion of the assignment, you will write a lexer for Sloth using ANTLR. As discussed in class, the lexer's job is to break source code into individual tokens. The lexical details of the language are described below.

Sloth has the following keywords, which the lexer must recognize:
- begin
- end
- if
- then
- else
- while
- do
- print
- println
- readInt
- readReal
- readBool
- readString
- true
- false
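
An identifier and a keyword can both match the same input. ANTLR resolves this by picking the rule that matches the longest input. If two rules match the same length, the one defined first in the grammar wins. Keyword rules therefore need to come before the identifier rule, while a longer word such as whilex is still an identifier.

The Python sketch below only imitates that policy with regular expressions so you can experiment with it. It is not ANTLR code, and the rule list is an illustrative assumption.

```python
import re

# Illustrative rule list in "grammar order": keyword rules first, then ID.
KEYWORDS = ["begin", "end", "if", "then", "else", "while", "do",
            "print", "println", "readInt", "readReal", "readBool",
            "readString", "true", "false"]
RULES = [(kw, re.compile(re.escape(kw))) for kw in KEYWORDS]
RULES.append(("ID", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")))


def classify(text):
    """Return the rule chosen for the token at the start of `text`
    (None if no rule matches), imitating ANTLR's conflict policy."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(text)
        # Only a strictly longer match replaces the current best, so on a
        # tie the rule listed earlier keeps priority.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name
```

For example, classify("println") returns "println": the keyword ties with ID at seven characters and is listed first. By contrast, classify("printing") returns "ID" because the identifier match is longer.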

Sloth has the following symbols, which the lexer must recognize:
- +
- -
- /
- *
- <
- >
- <=
- >=
- ==
- !=
- &&
- ||
- !
- ;
- :=
- (
- )
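
Some symbols are prefixes of others (< and <=, > and >=, ! and !=), so the lexer must always take the longer one. ANTLR's longest-match rule handles this automatically. The Python sketch below shows the same idea with a hypothetical helper, split_symbols, that tries two-character symbols before one-character ones. Note that =, &, and | on their own are not Sloth symbols.

```python
SYMBOLS = ["+", "-", "/", "*", "<", ">", "<=", ">=", "==", "!=",
           "&&", "||", "!", ";", ":=", "(", ")"]
# Longest symbols first, so "<=" is never split into "<" and "=".
BY_LENGTH = sorted(SYMBOLS, key=len, reverse=True)


def split_symbols(text):
    """Split a string made only of Sloth symbols (and spaces) into tokens."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for sym in BY_LENGTH:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No symbol starts here, e.g. a lone "=".
            raise ValueError(f"no symbol matches at position {i}: {text[i]!r}")
    return tokens
```

For instance, split_symbols("<=>=!=:=") returns ['<=', '>=', '!=', ':='], and split_symbols("=") raises ValueError.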

Identifiers start with a letter or underscore, followed by any number of letters, digits, or underscores.
Integer values consist of one or more digits, optionally preceded by a minus sign.
Real values consist of zero or more digits, a decimal point, and then one or more digits, optionally preceded by a minus sign. Scientific notation is not supported in Sloth.
String values begin with a double quote, contain any number of characters other than a double quote, and end with a closing double quote.
Comments start with the % symbol and continue to the end of the line. The lexer should discard comments.
The lexer should also discard white space.
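
Each of the rules above maps almost directly onto a regular expression. The Python patterns below are a sketch for checking your reading of the rules, not ANTLR syntax. The names ID, INTVAL, and STRINGVAL match the token types in the sample output under Testing. REALVAL, COMMENT, and WS are hypothetical names.

```python
import re

# Regex versions of the rules described above (a sketch, not ANTLR syntax).
PATTERNS = {
    "ID": r"[A-Za-z_][A-Za-z0-9_]*",
    "INTVAL": r"-?[0-9]+",
    "REALVAL": r"-?[0-9]*\.[0-9]+",  # no exponent part: no scientific notation
    "STRINGVAL": r'"[^"]*"',         # any characters except a double quote
    "COMMENT": r"%[^\n]*",           # % up to the end of the line; discarded
    "WS": r"[ \t\r\n]+",             # white space; discarded
}


def kinds(lexeme):
    """Return the names of every pattern that matches the whole lexeme."""
    return [name for name, pat in PATTERNS.items()
            if re.fullmatch(pat, lexeme)]
```

For example, kinds("3.14") and kinds(".5") both return ['REALVAL']. Both kinds("5.") and kinds("1e5") return [], because a real needs at least one digit after the decimal point and Sloth has no exponent notation.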

Building the Lexer

Name the lexer file "SlothLexer.g4" (ANTLR requires the grammar name to match the file name) and include the following line at the top of it:

grammar SlothLexer;

Include lexer rules for all of the tokens described above. Remember that lexer rule names must begin with an uppercase letter; write them in all capitals (for example, ID).

To test the lexer, ANTLR also needs at least one parser rule. To provide one, include the following "dummy rule" after the lexer rules:

program:;

You can use the following build and clean scripts to build the lexer and to clean up (remove) the generated files. These scripts assume you have completed the ANTLR setup instructions.

Testing

We can test the lexer using the ANTLR test rig, which we aliased as "grun". Passing the -tokens flag makes it print the tokens your lexer recognizes in the input program. For example, to test your lexer on fact.sl, run:

grun SlothLexer program -tokens < fact.sl

This should produce output like the following:

[@0,47:51='print',<'print'>,4:0]
[@1,52:52='(',<'('>,4:5]
[@2,53:77='"Please enter a number: "',<STRINGVAL>,4:6]
[@3,78:78=')',<')'>,4:31]
[@4,79:79=';',<';'>,4:32]
[@5,81:81='x',<ID>,5:0]
[@6,83:84=':=',<':='>,5:2]
[@7,86:92='readInt',<'readInt'>,5:5]
[@8,93:93='(',<'('>,5:12]
[@9,94:94=')',<')'>,5:13]
[@10,95:95=';',<';'>,5:14]
[@11,98:104='counter',<ID>,7:0]
[@12,106:107=':=',<':='>,7:8]
[@13,109:109='x',<ID>,7:11]
[@14,110:110=';',<';'>,7:12]
[@15,112:115='fact',<ID>,8:0]
[@16,117:118=':=',<':='>,8:5]
[@17,120:120='1',<INTVAL>,8:8]
[@18,121:121=';',<';'>,8:9]
[@19,134:138='while',<'while'>,11:0]
[@20,140:146='counter',<ID>,11:6]
[@21,148:148='>',<'>'>,11:14]
[@22,150:150='1',<INTVAL>,11:16]
[@23,152:153='do',<'do'>,11:18]
[@24,155:159='begin',<'begin'>,11:21]
[@25,165:168='fact',<ID>,12:4]
[@26,170:171=':=',<':='>,12:9]
[@27,173:176='fact',<ID>,12:12]
[@28,178:178='*',<'*'>,12:17]
[@29,180:186='counter',<ID>,12:19]
[@30,187:187=';',<';'>,12:26]
[@31,193:199='counter',<ID>,13:4]
[@32,201:202=':=',<':='>,13:12]
[@33,204:210='counter',<ID>,13:15]
[@34,212:212='-',<'-'>,13:23]
[@35,214:214='1',<INTVAL>,13:25]
[@36,215:215=';',<';'>,13:26]
[@37,217:219='end',<'end'>,14:0]
[@38,231:237='println',<'println'>,17:0]
[@39,238:238='(',<'('>,17:7]
[@40,239:239='x',<ID>,17:8]
[@41,241:241='+',<'+'>,17:10]
[@42,243:248='"! = "',<STRINGVAL>,17:12]
[@43,250:250='+',<'+'>,17:19]
[@44,252:255='fact',<ID>,17:21]
[@45,256:256=')',<')'>,17:25]
[@46,257:257=';',<';'>,17:26]
[@47,260:259='<EOF>',<EOF>,19:0]
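
Each line is ANTLR's text form of one token: [@index,start:stop='text',<type>,line:column]. The fields are:

- index counts tokens from 0.
- start and stop are inclusive character offsets into the input.
- type is the token's display name. It is the quoted literal (such as 'print') for rules that match one fixed string, and the rule name (such as ID) otherwise.
- line numbers start at 1 and columns start at 0.

The first token is on line 4, so the first three lines of fact.sl produced no tokens at all.

You may want to compare your lexer's output against a sample like this in a script. The hypothetical Python helper below splits one line into its fields. It assumes the default format shown here, plus the optional ,channel=N that ANTLR adds for tokens on a non-default channel.

```python
import re

# ANTLR's default token text: [@index,start:stop='text',<type>,line:col]
# The text and type groups are non-greedy, so lexemes such as "! = "
# and display names such as '>' are captured intact.
TOKEN_LINE = re.compile(
    r"\[@(?P<index>\d+),(?P<start>\d+):(?P<stop>\d+)="
    r"'(?P<text>.*?)',<(?P<type>.*?)>(?:,channel=\d+)?,"
    r"(?P<line>\d+):(?P<col>\d+)\]$"
)


def parse_token_line(line):
    """Split one line of `grun ... -tokens` output into a dict of fields."""
    m = TOKEN_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a token line: {line!r}")
    fields = m.groupdict()
    for key in ("index", "start", "stop", "line", "col"):
        fields[key] = int(fields[key])
    return fields
```

For example, the EOF line above parses with type "EOF" and a stop offset one less than its start, since EOF matches no characters.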

Submitting

Submit your SlothLexer.g4 file to Canvas for this assignment.