self-hosted compiler?!!?!?? #116

Open
7 of 14 tasks
Akuli opened this issue Jan 21, 2023 · 10 comments

@Akuli
Owner

Akuli commented Jan 21, 2023

i.e. a Jou compiler that compiles Jou code, and of course can compile itself.

progress:

  • Hello world tokenizes
  • All test files tokenize
  • Hello world parses
  • All test files parse
  • Hello world typechecks
  • All test files typecheck
  • Hello world compiles to an object file
  • All test files compile to object files
  • Hello world runs
  • All test files run correctly
  • Self-hosted compiler emits the correct control flow analysis warnings/errors (e.g. unreachable code, trying to use an undefined variable, missing return statement)
  • Self-hosted compiler can compile itself (?????!!!!!)
  • Self-hosted compiler is the compiler that ships in Windows zip files
  • On Linux, the jou executable that make produces is the self-hosted compiler

For now, the main way to develop the self-hosted compiler is running the ./compare_compilers.sh script (previously named tokenizers.sh). It attempts to compile files with both compilers and compares the results. There are lists of various "known not working" files in self_hosted/, and ./compare_compilers.sh --fix updates them automatically.
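To illustrate the idea behind the comparison, here is a minimal Python sketch. It is not how ./compare_compilers.sh is actually written (that is a shell script, and its internals are not shown in this thread). The function name `compare_compilers`, the two callables standing in for the compilers, and the behavior when a listed file starts working are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical stand-in type: a "compiler" maps a file path to its textual output.
Compiler = Callable[[str], str]


def compare_compilers(
    files: Iterable[str],
    reference: Compiler,
    candidate: Compiler,
    known_failing: Set[str],
    fix: bool = False,
) -> Tuple[Set[str], List[str]]:
    """Return (updated known-failing set, list of error messages)."""
    updated = set(known_failing)
    errors = []
    for path in files:
        same = reference(path) == candidate(path)
        if same and path in known_failing:
            # Assumption: a listed file that now works means the list is stale.
            if fix:
                updated.discard(path)
            else:
                errors.append(f"{path}: works now but is listed as not working")
        elif not same and path not in known_failing:
            # An unlisted file whose outputs differ is a failure.
            if fix:
                updated.add(path)
            else:
                errors.append(f"{path}: outputs differ")
    return updated, errors
```

With `fix=True` the sketch never reports errors and only rewrites the list, which mirrors the description of `--fix` updating the "known not working" files automatically.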

@Akuli changed the title from "self-hosted compiler?!!?!" to "self-hosted compiler?!!?!??" on Jan 23, 2023
@Akuli
Owner Author

Akuli commented Feb 24, 2023

I have now written a tokenizer in Jou that is capable of tokenizing the hello world program. Here's how to run it (you need a clone of the latest jou repository, because self_hosted isn't included in the Windows zip files that jou --update uses):

jou -o asd.exe -O1 self_hosted/tokenizer.jou
./asd.exe examples/hello.jou 

Output:

===== Tokens for file "examples/hello.jou" =====

Line 1:
  keyword "from"
  string "stdlib/io.jou"
  keyword "import"
  name "puts"
  newline token (next line has 0 spaces of indentation)

Line 3:
  keyword "def"
  name "main"
  operator '('
  operator ')'
  operator '->'
  keyword "int"
  operator ':'
  newline token (next line has 4 spaces of indentation)

Line 4:
  indent (+4 spaces)

Line 5:
  name "puts"
  operator '('
  string "Hello World"
  operator ')'
  newline token (next line has 4 spaces of indentation)

Line 6:
  keyword "return"
  integer 0
  newline token (next line has 0 spaces of indentation)

Line 7:
  dedent (-4 spaces)
  end of file

You can also get exactly the same output with the original tokenizer written in C:

jou --tokenize-only examples/hello.jou

However, it does not tokenize all test files correctly. The next step would be to make the new tokenizer tokenize all files in exactly the same way as the old tokenizer.
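For readers unfamiliar with indentation-based tokenizing, the following Python sketch shows one way the newline, indent and dedent tokens in the output above could be derived. It is only an illustration and not the algorithm used in src/tokenize.c or self_hosted/tokenizer.jou. The function name `indentation_tokens`, the fixed 4-space level, and the skipping of blank lines are assumptions of the sketch, and each line's content is kept as a single "code" token instead of being split into keywords, names and operators.

```python
INDENT_WIDTH = 4  # assumption: every indentation level is exactly 4 spaces


def leading_spaces(line: str) -> int:
    """Count the spaces at the start of a line."""
    return len(line) - len(line.lstrip(" "))


def indentation_tokens(source: str) -> list:
    """Return simplified tokens (as strings) for the given source text."""
    # Blank lines are skipped; this is a simplification of the sketch.
    lines = [line for line in source.splitlines() if line.strip()]
    tokens = []
    for index, line in enumerate(lines):
        current = leading_spaces(line)
        if index + 1 < len(lines):
            upcoming = leading_spaces(lines[index + 1])
        else:
            upcoming = 0  # end of file closes all open blocks
        tokens.append(f"code {line.strip()!r}")
        tokens.append(f"newline (next line has {upcoming} spaces of indentation)")
        # Emit one indent or dedent token per level of difference.
        levels = (upcoming - current) // INDENT_WIDTH
        tokens.extend([f"indent (+{INDENT_WIDTH} spaces)"] * max(levels, 0))
        tokens.extend([f"dedent (-{INDENT_WIDTH} spaces)"] * max(-levels, 0))
    tokens.append("end of file")
    return tokens
```

Running it on a hello world program reconstructed from the token listing gives the same ordering as above: a newline token, then the indent or dedent, and finally the end of file.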

@littlewhitecloud Are you interested in working on the tokenizer? It should be pretty easy to finish it from here, because you can look at the C code in src/tokenize.c or the newly created doc/syntax-spec.md. As I mentioned, the goal is to make it behave exactly like the existing tokenizer written in C.

I made a script tokenizers.sh that attempts to tokenize all Jou files with both tokenizers and checks whether they produce the same output or something different. To run it, you need to:

  1. If you are on Windows, install Git if you haven't installed it yet and open Git Bash.
  2. Clone the repository: git clone https://github.com/Akuli/jou
  3. Go to the cloned repository: cd jou
  4. Ensure that the jou directory from git clone contains the Jou executable (jou.exe on Windows, jou on Linux). On Windows you can copy jou.exe from the latest zip file. On Linux you can run make.
  5. Run ./tokenizers.sh

@littlewhitecloud
Contributor

How do I work on the tokenizer?

@Akuli
Owner Author

Akuli commented Mar 3, 2023

My workflow is typically:

  • Make sure that ./compare_compilers.sh runs with no errors
  • Delete one line from self_hosted/tokenizes_wrong.txt.
  • Run ./compare_compilers.sh again. It will fail, and the error message shows you the differences between how self_hosted/tokenizer.jou (green) and src/tokenize.c (red) tokenize the file.
  • Edit self_hosted/tokenizer.jou and run ./compare_compilers.sh again until it succeeds.
  • Make a PR
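
As a rough picture of what such a difference report contains, here is a small Python sketch built on the standard library's `difflib`. It is not taken from compare_compilers.sh. The function name `describe_difference` and the labels are made up for the example, and the output uses `-` and `+` prefixes where the real script uses red and green.

```python
import difflib


def describe_difference(reference_output: str, candidate_output: str) -> str:
    """Return a unified diff, or an empty string when the outputs match."""
    diff = difflib.unified_diff(
        reference_output.splitlines(),
        candidate_output.splitlines(),
        fromfile="reference (C)",     # lines only in this output start with "-"
        tofile="candidate (Jou)",     # lines only in this output start with "+"
        lineterm="",
    )
    return "\n".join(diff)
```

An empty result means both tokenizers agree on the file, which is the state to reach before making a PR.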

@Akuli
Owner Author

Akuli commented Mar 3, 2023

Also, tokenizers.sh was renamed to compare_compilers.sh at some point. Sorry about the confusion :)

@littlewhitecloud
Contributor

Ok

@littlewhitecloud
Contributor

So the self-hosted compiler is just the Jou compiler written in Jou?

@littlewhitecloud
Contributor

Maybe we could write a C-to-Jou translator, use it to translate the compiler written in C, and then compile the resulting Jou code. Then we would get a compiler written in Jou that behaves the same as the C compiler.

@Akuli
Owner Author

Akuli commented Mar 8, 2023

Yes, the self-hosted compiler is just the compiler made with Jou.

I have thought about auto-translating the C code, but translating it manually has uncovered a lot of unclear error messages and compiler bugs, so I think manual translation is a good test of how nice the compiler is to use.

@Moosems

Moosems commented Dec 9, 2023

Any new progress on this?

@Akuli
Owner Author

Akuli commented Dec 9, 2023

Not much. I haven't put much time into this recently, as I am more focused on Advent of Code.
