Skip to content

PMunch/binaryparse

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

40 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

binaryparse

This module implements a macro to create binary parsers. The parsers generated reads from a Stream and returns a tuple with each named field. The general format the macro takes is:

[type]<size>: <name>[options]

Where optional fields are in [] brackets and required fields are in <> brackets. Each field has separate meanings, as described in the table below:

Name Description
type This is the type of value found in this field, if no type is specified then it will be parsed as an integer. Supported types are u to get unsigned integers, f for floating point, s for strings, and * for custom parser.
size The size, in bits, of the field to read. For uint and int values from 1 to 64 inclusive are supported. For floats only 32 and 64 are supported. Strings use this field to specify the amount of characters to read into the string. If they don't specify a size they will be read to the first NULL byte (this only applies to strings). When the custom parser type is specified the size field is used to name the custom parser procedure.
name The name of the value, this will be used as the name in the resulting tuple. If the value doesn't need to be stored one can use _ as the name and it will not get a field in the result.
options These will change the regular behaviour of reading into a field. Since they are so different in what they do they are described below instead of in this table.

Many binary formats include special "magic" sequences to identify the file or regions within it. The option = <value> can be used to check if a field has a certain value. If the value doesn't match a MagicError is raised. Value must match the value of the field it checks. When the field is a string type the exact length of the magic string is read, to include a terminating NULL byte use \0 in the string literal.

To read more fields of a certain kind into a sequence you can use the option [[count]] (that is square brackets with an optional count inside). If no count is specified and the brackets left empty it must be the last field or the next field needs to be a magic number and will be used to terminate the sequence. If it is the last field it will read until the end of the stream. As count you can use the name of any previous field, literals, previously defined variables, or a combination. Note that all sequences are assumed to terminate on a byte border, even if given a statically evaluatable size.

Another thing commonly found in binary formats are repeating blocks or formats within the format. These can be read by using a custom parser. Custom parsers technically supports any procedure that takes a Stream as the first argument, however care must be taken to leave the Stream in the correct position. You can also define the inner format with a parser from this module and then pass that parser to the outer parser. This means that you can easily nest parsers. If you need values from the outer parser you can add parameters to the inner parser by giving it colon expressions before the body (e.g the call createParser(list, size: uint16) would create a parser proc (stream: Stream, size: uint16): <return type>). To call a parser use the * type as described above and give it the name of the parser and any optional arguments. The stream object will get added automatically as the first parameter.

When creating a parser you get a tuple with two members, get and put which is stored by a let as the identifier given when calling createParser. These are both procedures, the first only takes a stream (and any optional arguments as described above) and returns a tuple containing all the fields. The second takes a stream and a tuple containing all the fields, this is the same tuple returned by the get procedure and writes the format to the stream.

Example: In lieu of proper examples the binaryparse.nim file contains a when isMainModule() block showcasing how it can be used. The table below describes that block in a bit more detail:

Format Description
u8: _ = 128 Reads an unsigned 8-bit integer and checks if it equals 128 without storing the value as a field in returned tuple
u16: size Reads an unsigned 16-bit integer and names it size in the returned tuple
4: data[size*2] Reads a sequence of 4-bit integers into a data field in the returned tuple. Size is the value read above, and denotes the count of integers to read.
s: str[] Reads null terminated strings into a str field in the returned tuple. Since it's given empty brackets the next field needs to be a magic field and the sequence will be read until the magic is found.
s: _ = "9xC\0" Reads a non-null terminated string and checks if it equals the magic sequence.
*list(size): inner Uses a pre-defined procedure list which is called with the current Stream and the size read earlier. Stores the return value in a field inner in the returned tuple.
u8: _ = 67 Reads an unsigned 8-bit integer and checks if it equals 67 without storing the value.

This file is automatically generated from the documentation found in binaryparse.nim. Use nim doc2 binaryparse.nim to get the full documentation.