The AnyCC Compiler is a project aimed at creating a compiler for a custom programming language. The compiler consists of several components, including a lexer (Lex), a parser, and a symbol table. This README gives an overview of the project and explains how to build and use the compiler.
The project is organized into the following main components:
- The Lex Component scans the source code and generates tokens.
- The Parser Component parses the tokens and builds the syntax tree.
- The Symbol Table Component manages information about tokens encountered during lexical analysis.
- Tests Using Google Test
  - Unit tests for public methods that are neither simple getters/setters nor complex.
  - Integration tests for public methods when needed.
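To make the relationship between the Lex and Symbol Table components more concrete, here is a minimal sketch of a hand-written scanner that produces tokens and records identifiers. It is not AnyCC's implementation (AnyCC builds its lexer from a rules file); the names `Token`, `SymbolTable`, and `scan`, as well as the three token kinds, are assumptions made for illustration only.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical token record; AnyCC's real token type may differ.
struct Token {
    std::string kind;    // "id", "num", or "punct" (illustrative kinds only)
    std::string lexeme;
    int line;            // 1-based line number
    int column;          // 1-based column of the token's first character
};

// Hypothetical symbol table: maps an identifier to its first occurrence.
using SymbolTable = std::map<std::string, Token>;

// Scan `source` into tokens and record each identifier in `symbols`.
std::vector<Token> scan(const std::string& source, SymbolTable& symbols) {
    std::vector<Token> tokens;
    int line = 1;
    int column = 1;
    std::size_t i = 0;
    while (i < source.size()) {
        const unsigned char c = static_cast<unsigned char>(source[i]);
        if (c == '\n') { ++line; column = 1; ++i; continue; }
        if (std::isspace(c)) { ++column; ++i; continue; }

        const std::size_t start = i;
        std::string kind;
        if (std::isalpha(c) || c == '_') {
            while (i < source.size() &&
                   (std::isalnum(static_cast<unsigned char>(source[i])) ||
                    source[i] == '_')) {
                ++i;
            }
            kind = "id";
        } else if (std::isdigit(c)) {
            while (i < source.size() &&
                   std::isdigit(static_cast<unsigned char>(source[i]))) {
                ++i;
            }
            kind = "num";
        } else {
            ++i;                 // any other character is a one-character token
            kind = "punct";
        }

        Token token{kind, source.substr(start, i - start), line, column};
        column += static_cast<int>(i - start);
        if (kind == "id") {
            symbols.emplace(token.lexeme, token);  // keeps the first occurrence
        }
        tokens.push_back(token);
    }
    return tokens;
}
```

Tracking the column alongside the index, as this sketch does, is one place where off-by-one shifts can creep in, for example when tabs or multi-character tokens are counted differently.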
Notes
- The project is not fully tested, but I tried to test as much as I could.
- The analyzer sometimes shifts the reported column number, but this is a minor issue and does not show up in important places.
Ensure you have the necessary dependencies installed on your system:
- C++ compiler (supporting C++17)
- CMake
- Clone the repository
  git clone <repository_url>
  cd anycc
- Create a build directory and navigate into it
  mkdir build
  cd build
- Generate build files with CMake
  - On Linux
    cmake -G "Unix Makefiles" ..
  - On Windows
    cmake -G "MinGW Makefiles" ..
    or
    cmake -G "Visual Studio 17 2022" ..
- Build the project
  cmake --build .
The AnyCC Compiler takes three command-line arguments: <rules_file_name>, <cfg_file_name>, and <program_file_name>. These arguments specify the files containing the lexical rules, the context-free grammar rules, and the program to be compiled, respectively.
anycc <rules_file_name> <cfg_file_name> <program_file_name>
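As an illustration of the three-argument interface, the following sketch shows one way such arguments could be validated before any file is opened. It is an assumption about how a front end might handle them, not AnyCC's actual `main`; `CompilerArgs` and `parseArgs` are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical container for the three input paths; not AnyCC's actual type.
struct CompilerArgs {
    std::string rulesFile;    // lexical rules
    std::string cfgFile;      // context-free grammar rules
    std::string programFile;  // program to be compiled
};

// Returns the parsed arguments, or std::nullopt if the argument count is wrong.
// `args` holds everything after the program name.
std::optional<CompilerArgs> parseArgs(const std::vector<std::string>& args) {
    if (args.size() != 3) {
        return std::nullopt;
    }
    return CompilerArgs{args[0], args[1], args[2]};
}
```

A `main` function could build the vector with `std::vector<std::string> args(argv + 1, argv + argc);` and print the usage line above when `parseArgs` returns `std::nullopt`.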
Upon successful execution, the compiler generates an output folder containing the following artifacts:
- Symbol Table in .md format.
- NFA Graph in .dot, .csv, and .md formats.
- DFA Graph in .dot, .csv, and .md formats.
- Minimized DFA Graph in .dot, .csv, and .md formats.
- A tokens.txt file containing the tokens generated by the lexer, written only if getAllTokensAndCreateOutputFile() is called.
- First and Follow sets in .md format.
- LL(1) Parsing Table in .md format.
- Leftmost derivation in .md format.
- Predictive Parsing Table in .md format.
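For readers unfamiliar with the First sets listed among the artifacts, the sketch below shows the textbook fixed-point computation of FIRST for a small grammar. It is not taken from AnyCC's source; the `Grammar` representation, the `computeFirst` function, and the `"epsilon"` marker are all assumptions chosen for illustration.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical grammar representation: each nonterminal maps to its list of
// productions, and each production is a list of symbols. An empty production
// stands for epsilon. Any symbol that is not a key of the map is a terminal.
using Production = std::vector<std::string>;
using Grammar = std::map<std::string, std::vector<Production>>;
using FirstSets = std::map<std::string, std::set<std::string>>;

const std::string kEpsilon = "epsilon";  // illustrative marker for the empty string

FirstSets computeFirst(const Grammar& grammar) {
    FirstSets first;
    for (const auto& rule : grammar) {
        first[rule.first];  // start every nonterminal with an empty set
    }

    bool changed = true;
    while (changed) {  // repeat until no set grows (fixed point)
        changed = false;
        for (const auto& [lhs, productions] : grammar) {
            for (const Production& production : productions) {
                bool allNullable = true;
                for (const std::string& symbol : production) {
                    if (grammar.count(symbol) == 0) {  // terminal symbol
                        changed |= first[lhs].insert(symbol).second;
                        allNullable = false;
                        break;
                    }
                    // Copy the set because `symbol` may be the same as `lhs`.
                    const std::set<std::string> symbolFirst = first[symbol];
                    bool nullable = false;
                    for (const std::string& terminal : symbolFirst) {
                        if (terminal == kEpsilon) {
                            nullable = true;
                            continue;
                        }
                        changed |= first[lhs].insert(terminal).second;
                    }
                    if (!nullable) {
                        allNullable = false;
                        break;
                    }
                }
                if (allNullable) {  // every symbol can derive epsilon
                    changed |= first[lhs].insert(kEpsilon).second;
                }
            }
        }
    }
    return first;
}
```

The loop reruns over all productions until nothing changes, which is simple but not the most efficient approach; a worklist keyed on dependencies would avoid the redundant passes.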
Here is an example of how to use the AnyCC Compiler:
anycc ../input/rules.txt ../input/CFG.txt ../input/program.txt
This example assumes that rules.txt contains the lexical rules, CFG.txt contains the context-free grammar rules, and program.txt contains the program source code, all located in the ../input directory.