-
Notifications
You must be signed in to change notification settings - Fork 122
How Numbas' Judged Mathematical Expression system works
This document will describe the various algorithms used in the process of interpreting, manipulating and evaluating mathematical expressions written in JME syntax. There will also be some discussion of the design decisions made about data structures and interfaces.
Numbas.jme.compile
takes a string representing a mathematical expression and returns a syntax tree.
Compilation consists of:
-
tokenize
the string (convert string of characters to a list of tokens representing syntactical elements) -
shunt
the tokens into an evaluation tree corresponding to the expression -
typecheck
all operations in the tree, if required
Tokenizing an expression string consists of scanning across it, recognising substrings as tokens, and storing them in a list in the same order as they were found.
The algorithm: (parameter: expr
– a string representing the expression)
While expr
is not the empty string:
- trim whitespace off the beginning of
expr
- test the string against regular expressions for each type of token. They match only from the beginning of the string, and are greedy (they match as long a string as possible)
- create a token object of the appropriate type, and add in any implicit tokens (e.g. a multiplication operation between adjacent variable names)
- chop the matched string off the start of
expr
Every token object has a type
property. Additionally, data types (string, number, etc.) have a value
property, while operations, functions and variables have a name
property.
The objects used to represent tokens are the same as the data objects used when evaluating expressions, in order to avoid a conversion step.
The recognised token types are these:
-
number
: a number, real or complex. -
string
: a string of characters. -
boolean
: true or false. -
list
: an array of data objects of any type. -
range
: a closed range of numbers, either continuous or discrete. -
name
: a variable name. Function names are first tokenised as this type of object, before the shunt phase. -
function
: a function. This type can not be produced by the tokenizer, but is created during the shunt phase. -
op
: a binary or unary operation. -
special
: a special character. (I think this is just a relic from i-assess, in order to produce things like greek symbols. MathJax and unicode are used now instead) -
conc
: a virtual operation representing concatenation of two special characters. Again, no longer used. -
(
)
,
[
]
: punctuation characters.
The constructors for these data types are stored in Numbas.jme.types
Here is a (potentially incomplete) list of the extra fiddling that tokenize
does with expressions, to be more accepting of student input:
- extraneous whitespace is removed, so you can put as many spaces between operators and operands as you like
- right bracket followed by a number, eg
)3
, has a multiplication operator inserted in the middle:)*3
- number or right bracket followed by a left bracket has a multiplication operator inserted in the middle
-
+
preceded by(
,
or another operator, or at the start of an expression, is interpreted as unary plus. - the name of a builtin function prepended with a single-letter variable name, eg
xsin
, is interpreted as a multiplication, likex*sin
- built-in constants
e
,pi
andi
are replaced with their respective number values.
Syntactical errors are caught at this stage — invalid characters, etc. cause the tokenising process to fail. Grammatical errors, such as unmatched brackets, are not caught yet.
Once the expression has been converted into a list of tokens, it is “shunted” into an abstract syntax tree.
The algorithm began life as the shunting yard algorithm, which produces an expression in RPN, but I eventually decided it was better just to go straight to a tree structure.
The output of the shunt
algorithm is an “expression” object with two properties: tok
and args
.
tok
is the corresponding token object created in the tokenising step, so either a primitive data type (eg number), or an operation or function.
args
is a list of arguments, if the token is an operation or function. The arguments are also expression objects, of the same form.
The final step is to check whether the expression can be evaluated. This consists of looking at every operation and function call in the expression, checking that there is a corresponding definition taking the right number and type of arguments.
As a side effect of this process, the relevant javascript code is attached to the nodes on the tree, so they don’t have to be looked up for each evaluation.
If there are free variables in the expression, they are marked as unbound, which makes substituting in values a bit quicker when the expression finally does get evaluated.
The typecheck
step is optional – if the compiled expression is only ever going to be displayed and not evaluated, it is unnecessary.
The function Numbas.jme.substituteTree
takes a compiled JME syntax tree and a dictionary of variable values to substitute in, and returns a copy of the tree with the given values substituted in appropriately.
The values to be substituted should be either syntax trees themselves, or single data objects created from Numbas.jme.types
.
A deep copy of the tree is made, so that the original can be re-used, and then the copy is traversed depth-first. Whenever an object of type name
is encountered, the corresponding value from the dictionary of values is inserted at that point.
If the dictionary does not contain a value for a particular variable, the operation fails, unless the parameter allowUnbound
is true
, in which case the name
token is left in place.
Numbas.jme.evaluate
takes either a string expression or a syntax tree, and dictionaries of variables and functions to substitute into the expression.
If the given expression is a string, it is compiled to a syntax tree. The variables
and functions
parameters can be omitted if not needed.
First of all, the given variables are subsituted into the tree, as described above. Then, Numbas.jme.bind
is run on the tree to attach any new functions or operations substituted in.
Finally, the tree is evaluated to a single value. The token at the top of the tree is inspected, and what happens depends on what type it is.
- Tokens of type
number
,boolean
andrange
are returned without modification. -
string
tokens are passed throughNumbas.jme.contentsubvars
before being returned. Sincecontentsubvars
applies the Textile parser, you might want to remove this step if you just want to deal with mathematical expressions. -
list
tokens have each item evaluated before being returned. - If the top of the tree is an
op
or afunction
token, the relevant bit of code is evaluated, and should return a token object fromNumbas.jme.types
. Note that the arguments to the operation are not evaluated in advance; this allows for “lazy evaluation” of things like conditional statements or recursive algorithms.
The function which evaluates an operation should take three arguments: args
, variables
and functions
. The args
are the arguments of the operation, and the other two parameters are needed in case some string substitution needs to happen.
Functions are free to evaluate all, some, or none of their arguments, using Numbas.jme.evaluate
.
The JME system has been designed to be fairly easily customised. In particular, new functions can be added very easily. It should also be quite straightforward to add new data types.
Exam authors can define new functions in the .exam file using JME syntax. These are passed through to the compile
and evaluate
steps by the Numbas runtime.
When writing an extension, however, it’s easier just to write a function in pure javascript and add it to the list of builtin functions. To do that, write your function in javascript, and then create a funcObj
object to go with it:
function addtwo(n) { return n+2; }
new Numbas.jme.funcObj('addtwo', [TNum], TNum, addtwo);
The arguments for the funcObj
constructor are:
-
name
– the name used to call the function in JME expressions -
intype
– an array ofNumbas.jme.types
constructors, saying which types the arguments to the function must be. You can use the string'?'
instead of a type constructor if an argument can be of any type, or if you want to do your own typechecking. -
outcons
– the constructor to use on the value your function returns. Again, this should be one of the constructors fromNumbas.jme.types
. -
fn
– your javascript function
The constructors for data types are stored in Numbas.jme.types
. They’re just normal javascript object constructors (see the MDN page on the new operation), with an extra property called type
. Normally, a data element’s value is stored in a property called value
, though this is not required: for example, TFunc
elements have a property called name
.
Once you’ve created a new data type, you can add functions to construct and use elements of that type. If you’re adventurous, you could add some code to the tokeniser to parse literals of your new type.