GitHub - meisam/Sub-Lisp-Interpreter: An Interpreter for a subset of the Lisp Language

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 172 Commits
.settings		.settings
src/edu/osu/cse/meisam/interpreter		src/edu/osu/cse/meisam/interpreter
test/edu/osu/cse/meisam/interpreter		test/edu/osu/cse/meisam/interpreter
testfiles		testfiles
.classpath		.classpath
.gitignore		.gitignore
.project		.project
Makefile		Makefile
README.txt		README.txt
Runfile		Runfile
gplv3.0.txt		gplv3.0.txt
manifest		manifest

Repository files navigation

Meisam Fathi Salmi
[email protected]

==================================================
Structure of files

./							The root directory
./src						Contains the source code of the program
./test  					Contains the source code of test cases
./bin						When the program is compiled, this directory will contain the compiled java bytecode of the program
./dist  					When the program is built, this directory will contain the executable jar file
./dist/intrpreter.jar		This file is the executable file for the program
./testfiles					Contains the input and expected output for the program testcases
./gplv3.0.txt				GPLv3.0 license under which this program is released 
./Makefile					Makefile for building the program
./manifest					A manifest file that is used for deploying the program into a jar file
./Runfile					The file containing one single line which specifies how this program should be executed


==================================================
Notes

The interpreter functionality is implemented using only java 1.2 libraries. Source code of the interpreter is available in the ./src directory. The make command only builds the interpreter. If you want to execute the testcases, too, the best way is to load the project into an IDE. The source code of this project including the settings for the eclipse IDE is available on github (at git://github.com/meisam/Sub-Lisp-Interpreter.git). Test cases use JUnit 3.8.2. 

==================================================
General Design

The overall design is quite straightforward and can be found in most classical compiler text books. Lexer reads the input characters one by one and generates tokens. Parsers reads tokens generated by the Lexer and builds a parse tree. Interpreter gets the parse trees generated by the interpreter and uses evaluates them.

As the interpreter evaluates the  parse trees in generates S-Expresins. InputProvider (in package edu.osu.cse.meisam.interpreter) and it subclasses provide the input for the interpreter. the InputStreamProvider subclass reads the input from a steam (i.e. any file or stdin). StringInputProvider reads the input from a String. StringInputProvider is mainly used for testing the program and is not used in the real application. Lexer uses one of the subclasses of InputProvider to read the source code. 

==================================================
Testing Methodology

StringInputProvider class is used to feed any arbitrary input to the program to test the program in a non-systematic way. When the input is provided via the StringInputProvider, the output and the behavior of the program should be checked in an ad-hoc manner.

To test the program systematically, a set of tetcases are provided in the ./testfiles directory. Files in this directory are categorized into different categories based on their names extensions. These categories are as followed

1. Any file with a ".input" suffix is used as an input to the program.
2. Any file with a ".lexer" suffix shows the output that should be generated by the lexer.
3. Any file with a ".parser" suffix shows the output that should be generated by the parser. This file includes an infix traversal of all nodes in the tree.
4. Any file with a ".output" suffix shows the output that should be generated by the interpreter (i.e. the program).

The content of all the ".lexer" and ".parser" files always appears in pairs of lines. In Each pair of line, the first line is the type of the output that is expected and the second line is the value. For example, if the lexer expects to see an Open parentheses in the input, then the ".lexer" file will contain two lines, the first line contains OpenParentheses (the type of the Token) and the second line (the actual value of the token)

In addition to this categorization based on the suffixes, all the files in this directory are categorized based on their name.

1. Any file whose name starts with "pass-lex-parse-interpret-" and has a ".input" is a perfectly valid input for the program. For all these file there is one ".output" file which contains the expected output of the interpreter, one ".parser" file which contains the expected output from the parser, and one ".lexer" file which contains the expected output for the lexer.
2. Any file whose name starts with "pass-lex-parse-fail-interpret-" and has a ".input" is a semantically invalid input for the program. These inputs lex and parse perfectly but cannot be interpreted. For all these file there is one ".parser" file which contains the expected output from the parser, and one ".lexer" file which contains the expect output for the lexer.
3. Any file whose name starts with "pass-lex-fail-parse-interpret-" and has a ".input" is an invalid input for the program that contains valid tokens but cannot be parsed (and consequently interpreted). For all these file there is one ".lexer" file which contains the expected output for the lexer.
4. Any file whose name starts with "fail-lex-" and has a ".input" is an invalid input that contains at least one invalid token. These files cause the lexer to fail (and consequently, the parser and interpreter).

When testing the program systematically, ".input" file are read one by one and based on their names, the output of the interpreter, parse and lexer is compared with the corresponding expected output files.

==================================================
Known Issues

When the interpreter is invoked without piping the input from a file, it does not evaluate S-Expressions until the first token from the next S-Expression is available. This is mainly because of this fact that the lexer needs the lookahead character to lex tokens and remove white spaces. Changing the lexer to remove this bug makes its design clumsy and introduces codes for handling special caces (such as an input containing only whitespace).

   
==================================================
Ideas and tools used

To implement this program, ideas from normal compiler textbooks (such as the dragon book) are used.
Some of the testcases are based on the testcases avaiable on the website of the project available at http://www.cse.ohio-state.edu/~rountev/755/projects.html
Tools used include the UNIX coreutils, JUnit and eclipce IDE.
gplv3.0.txt file is a verbatim from the GPL version 3.0 License.