X-Forth is a small specification for a Forth like language meant to be used in compiler projects. You can likely implement the basic X-forth in a couple hours or less.
2 3 + .s
# <2> 2.0 3.0 ok
A more complex program:
a: 10 var
b: 5 var
a @ b @ + . # 15.0
dup * . # 225.0
drop .s # <0> ok
5 6 + 3 4 # <3> 11.0 3.0 4.0 ok
There are ongoing (WIP) tutorials that accompany the Python implementation here.
X-Forth is so named because the X refers to a few aspects and goals of the project:
- X-Forth - where X == implementation language, ex. Go-Forth, Dart-Forth, JS-Forth etc.
- X for eXample - X-Forth is meant to be used as an example project to play with. The various extensions of X-Forth provide smaller sub projects to help you explore your language of choice as well as broader compiler/language features.
- X for eXtended - X-Forth is intentionally designed as a series of extensions to a very primitive language
Similar to the Scheme language, X-forth specifies language add-ons which can be implemented to extend the language, designated as X-<number> (description)
. The basic version of X-Forth is denoted as X-B (Basic)
and contains a bare bones interpreter with very basic mathmatical and logical operators along with a few built in words for display and stack manipulation. The first few extensions are designed to provide easily manageable incremental projects.
I believe concatenative stack based langauges are the most simple languages to write compilers for, even simpler than the more popular lisp, though they may be more unintuitive to program in than a traditional language. You can learn a lot and have fun focusing on features rather than parsing a complex langauge. If you're interested in something that is more similar to a traditional C like language, I'll be working on a sister project to this based on Basic called X-Basic
.
It is very fast to write a Forth and it can even easily bootstrap itself from assembly as seen in Jones Forth
The most basic version of X-Forth. It contains only a few operators and functions, collectively known as words
. The tokenizer is very primitive and effectively makes the language white space significant since all tokens are separated on white space.
- Number - all numbers are treated as double precision floats. X-B does not have a boolean type and false is treated as 0.0 while true is 1.0
- +, -, *, / - The basic math operators
- <, >, ==, != - The basic logic operators
- . (dot) - print and consume the value on the top of the stack
- .s - print the whole stack to the console without consuming it in the format: <count of values> val1 val2 ... ok
- drop - remove the top value from the stack
- dup - duplicate the top value on the stack
- show - print the value on the top of the stack without consuming it
- Undefined token - present this error when we find an invalid token in the source code
- format
ERROR: Undefined token <token>
- format
- Stack underflow - present this error when there aren't enough values on the stack for a word
- math and logic operators require 2 arguments
- . (dot) and dup require 1 argument
- format
ERROR: <word name>: Stack underflow
- what happened!? - X-B contains virtually no error handling at the compiler level with the exception of throwing an error when encountering an Undefined token. There are no stack traces, line numbers in errors or other helpful bits of error context.
- hardcoded source - X-B's source code is hardcoded in the program, see X-1 (External Source Code) to get variable source code
Acquire the source code from an external file. Receive an external file path on the command line, validate its existence and read its contents into a source string variable.
X-Forth files use the .xf
extension
- Source file not found - The passed file path could not be found
- format:
ERROR: <source path>: Source File Not Found
- format:
- numbers - numbers are machine word/register sized floating point.
- booleans - in X-Forth booleans are numeric values where 0 is true and anything else is false. This is odd compared to C languages, but it aligns with the design ofr errors wherein an error can be represented as an enum with the first value 0 representing no error, see X-7 (errors) for more. Instead of thinking in terms of true or false, we think in terms of "did an error occur" where
0
means no error occured and any other number denotes that some error occured and which one
Symbols are are words whose value equates to a hash value, they are similar to symbols found in Ruby. Syntactically a symbol is a word that ends with :
, for example: dup
is the word dup, and when this is encountered it be called immediately but, dup:
is a symbol which instead of calling the dup
word pushes a hash value to the stack. Symbols are important for things like variables, lookup tables and passing words around as values. Symbols can be converted to and from strings if you have implemented X-5 (String Support).
Symbol is the type of symbols and their value should be a tuple (or equivalent) of the word as a string and its hash value.
- X-5 (String support)
- to-string ( symbol -- string ) - converts a symbol into a string without the
:
suffix - to-symbol ( string -- symbol ) - converts a string into a symbol
- to-string ( symbol -- string ) - converts a symbol into a string without the
- Symbol
Support for boolean types. This is mostly cosmetic as they should be treated as numbers under the hood. The words True
and False
should be aliases for 0
and 1
respectively, effectively making them constants. Though they are mostly aliases, it can be helpful to give them a dedicated boolean type even though they are stored numerically so you can differentiate them from numbers for printing.
As mentioned in the X-B section, the use of 0
for true and 1
for false differs from traditional C-like languages because it is designed with error handling in mind. Instead of thinking in terms of true or false we think in terms of asserting whether or not an error occured with 0
meaning no error occured and any other number not only denoting that an error occured, but which one. By convention, user defined errors should be >= 1 <= MAX NUMBER
, while negative values are used for built in errors.
- Bool - can be either True (0) or False (1)
- True
True: 0 con
- constant ==0
- False
False: 1 con
- constant ==1
- to-bool (number -- bool) - converts
0
toTrue
and anything else toFalse
- to-bool (bool -- bool) - no op
- to-number ( bool -- number ) - converts
True
to0
andFalse
to1
- to-number ( number -- number ) - no op
- X-5 (String support)
- to-string ( bool -- string ) - converts a bool into either
"True"
or"False"
- to-string ( bool -- string ) - converts a bool into either
Implements both variables and constants. X-Forth uses var
and con
for variables and constants respectively. As arguments var and con take a value and a symbol.
num: 10 var # store 10 into the variable num
n2: 13 con # define n2 as the constant 13
num: var # num variable created with its value set to 'Undefined'
num 10 ! # write 10 to it
To get the value of a variable use the @
(read) word:
num @ # 10
Like a traditional Forth, the usage of the variable name pushes its address to the stack. con
is the same as var
but internally it makes use of some sort of is_set
flag denoting whether or not the constant has been set. It must be set before use and once set it cannot change.
Unlike variables, constants do not push their address to the stack, and instead just directly push their value, thus you do not use @
with constants.
- Undefined - represents an undefined value, errors will occur if you try to use an undefined value. This is a built in value and must be given its own type which is also
Undefined
- var ( symbol any -- ) - If the top value is not a symbol, but the second value is, then the top value is assigned to the variable and the address is NOT pushed to the stack
- ( symbol -- address) - If the top value is a symbol, then variable's value will be set to
Undefined
.
- ( symbol -- address) - If the top value is a symbol, then variable's value will be set to
- con ( symbol any -- ) - con is exactly like the first overload of var except that its value cannot be changed and the words value rather than its address is pushed to the stack when the word is used
- ! ( address any -- ) - writes the value on the top of the stack to the address
- @ ( address -- any) - reads and pushes the value at the address
value1: 15 var
value2: 30 con
- Address - represents an address, it is similar to a pointer and is an index into the program's memory.
- Undefined - the type of undefined variables
X-5 brings string support. Specifics about the string implementation like: immutability vs mutability, pascal style (length prefixed) vs c style (null terminated) are left to the implementation to decide.
- length ( string -- number ) - gives count of characters in the string.
- append ( string string -- string ) - string concatenation
- to-string ( number -- string ) - converts a number into a string
- should be implemented for each primtive datatype you included in your Forth. So if you've implemented X-3 (Bool), you should also provide a bool conversion that returns
"True"
or"False"
- should be implemented for each primtive datatype you included in your Forth. So if you've implemented X-3 (Bool), you should also provide a bool conversion that returns
- X-2 (Symbols)
- symbol-from-string ( string -- symbol ) - convert a string into its symbol representation. Any string can be converted into a symbol even if the string does not end with ':'. So both
"Pig"
and"Pig:"
convert to the symbolPig:
- symbol-from-string ( string -- symbol ) - convert a string into its symbol representation. Any string can be converted into a symbol even if the string does not end with ':'. So both
X-Forth strings are defined using double quotes "
which allows for the optional implementation of single char types.
X-6 brings include support which allows including external Forth files. It works similarly to C's include
and inserts the source code of another file at the location. include
operates on a string path:
"./files/some_file.xf" include
include
should by some means cache the file it includes so a file is only ever included once. Though it looks like a normal x-forth word it is more like a compiler command and should be executed (and removed from the token stream) before interpretation.
- include (string -- ?) - extends the token stream with tokens from the passed file
- note that we have ? as the return value because the state of the stack depends on the file included. If that file pushed a value as the last statement then that would apply to the current program
Blocks are a unique construct in X-Forth and are similar to LISP lists. Blocks are denoted with square brackets []
and can contain any tokens: [ 1 2 ]
, [ 1 2 + ]
, [ "hello" some-word ]
. Blocks are just lists of tokens which can be passed around and operated upon by words. Blocks serve as the basis for higher level constructs like custom words and control flow.
Blocks have multiple uses and provide different functionality based on the words used to operate on them:
- lists - blocks can be used as lists of data, similar to lists in Python or JavaScript, here is a block (list) of numbers :
[ 1 2 3 ]
- tables - blocks can be used as lookup tables as well:
[ 0 10 1 20 ] 0 get # 10
- If you've implemented X-2 (Symbols) you can get JavaScript like syntax:
[ a: 10 b: 20 ] a: get # 10
- If you've implemented X-2 (Symbols) you can get JavaScript like syntax:
- custom words - blocks in combination with X-4 (Variables) for the basis for custom words (see X-6)aka functions:
add-five: [ 5 + ] con
- anonymous functions - you can pass blocks around and execute them using the
call
andapply
words:
10 [ 5 + ] call # 15, call operates directly on the stack taking only a single block as an argument
[ 5 + ] [ 10 ] apply # 15, apply takes a block of arguments which will be pushed to the stack first and a block of instructions
- named functions - If you define a block variable with
con
then the use of the constant name calls that block immediately:
add-five: [ + 5 ] con
10 add-five . # 15.0
Block variables act slightly differently depending on whether they are defined with var
or con
. If defined with var
a variable holding a block just pushes the block to the stack like any other value, however if the block was defined with con
, then in addition to pushing the block to the stack it is immediately called which is what allows user defined words to work. If you use a block which is a var
, then you will need to also use call
or apply
to execute the block.
- Block
- call ( block-body -- ? ) - executes the tokens inside the passed block, applying its effect to the stack:
2 [ 3 + ] call # 5.0
- apply ( block-body block-args -- ?) - the same as call, except instead of getting its args directly from the stack, its top value is a block containing arguments that are pushed to the stack before the body is called:
[ 3 + ] [ 2 ] apply # 5.0
TODO
TODO
X-10 brings the ability to add documentation to words with stack effect comments: add-five: [ 5 + ] ( n -- n "adds 5 to n" ) con
. This stack comment is a special type of comment that it associated with the word and can be looked up with the words help
, sig
TODO
Add a Read Eval Print Loop. TODO
X-Forth uses #
for single line comments like Python. You can safely ignore any tokens beginning with #
.
X-Forth uses a simple event loop in which the X-Forth interpreter runs two (or more) execution contexts (threads, coroutines or otherwise) with two (or more) interpreters, one for the main program denoted the Main Context
, and another (or multiple) for the non bocking execution of asynchronous tasks, called the Background Context
. The Background Context
should have an event queue which the Main Context
can push jobs to and which the Background Context
will constantly poll
Memory is introduced in X-4 (Variables) to implement user defined variables, but it is also used heavily for the X-17 Graphics extension and potential others. The size of the memory (and graphics) are up to the user. The memory should be a contigous array or linked list of contiguous blocks with portions dedicated to variables (X-4), keyboard input (X-16), screen pixels (X-17), and a data section (X-18)
Allows for Keyboard input, both block and non-blocking
X-16 brings a simple pixel buffer based graphics API inspired by BASIC. The program memory array is used heavily for the graphics api and a portion of it is dedicated to the screen memory. The size of the screen is up to the implementation, recommended sizes are: 256x240
TODO
With binds values on the stack to variable names temporarly for the lifetime of the top block passed to it: 10 5 [ a: b: ][ + ] with # 15
. After execution of its block, the variables disappear and do not pollute the global scope.
This extension adds more stack words that are normally seen in Forths.
- over ( a b -- a b a ) - copies the second stack value to the top of the stack
- swap ( a b -- b a ) - switch positions of the top two stack values
- rot ( a b c -- c b a ) - swap the first and third values on the stack, rotating them around the second value. Note that this word behaves differently than the rot word seen in traditional Forths
- grab ( a+ n -- a#n ) - given some values on the stack and a number at the top, copy value at index n (starting at 0) to the top.
- with ( block block -- ? )
Stores X-Forth into an efficient Bytecode format. Values should be Nan Boxed or Pointer Tagged. TODO
The X-Forth Machine is a virtual Forth machine, meant to be bootstrapped from native code such as Assembly, C for desktop or JavaScript or WASM in the browser. The XFVM (X-Forth Virtual Machine) posesses a fixed amount of memory used for variables, input/output, graphics and data storage. TODO
There are several implementations of X-Forth created along with the project in varying stages of completion and extension implementation. All implementations listed implement at least X-B:
- phillvancejr / x-forth-dart
- X-B (Basic)
- phillvancejr / x-forth-go TODO
- X-B (Basic)
- phillvancejr / x-forth-py
- X-B (Basic)
- X-1 (External/Variable Source Code)
- X-2 (Symbols)
- X-3 (Bools)
- X-4 (Variables)
- X-5 (Strings)
- X-6 (Includes)
If you create an X-Forth please let me know, I'd love to link to it! Any language is great!
This project was created to allow me to play with and compare different languages and to explore different aspects of programming languages and compilers like: interpreters, compiling to native code, assembly, virtual machines and more. Even if you're not interested in Forth, you might be interested in the ways similar things are implemented in the different languages used here.