diff --git a/_books/ion-1-1/src/SUMMARY.md b/_books/ion-1-1/src/SUMMARY.md
index 0ecfd7d..7c932e4 100644
--- a/_books/ion-1-1/src/SUMMARY.md
+++ b/_books/ion-1-1/src/SUMMARY.md
@@ -3,6 +3,8 @@
 - [Introduction](./introduction.md)
 - [What's new](./whats_new.md)
 - [Macros by example](macros_by_example.md)
+- [Modules](modules.md)
+  - [System Module](modules/system_module.md)
 - [Binary encoding](binary/encoding.md)
   - [Encoding primitives](binary/primitives.md)
     - [`FlexUInt`](binary/primitives/flex_uint.md)
diff --git a/_books/ion-1-1/src/binary/e_expressions.md b/_books/ion-1-1/src/binary/e_expressions.md
index 4c5c03f..fe33614 100644
--- a/_books/ion-1-1/src/binary/e_expressions.md
+++ b/_books/ion-1-1/src/binary/e_expressions.md
@@ -98,10 +98,29 @@ Address : 142918
-> This section was obsolete and needs to be rewritten.
+> This section needs more details.
+The opcode is `0xEE`. The macro address is given as a trailing [FlexUInt](primitives/flex_uint.md) with no bias.
+
+
+## System Macro Invocations
+
+E-expressions that invoke a [system macro](../modules/system_module.md#system-macro-addresses) can be encoded using the `0xEF` opcode followed by a _non-negative_ 1-byte `FixedInt`.
+(Negative values are used for [system symbols](values/symbol.md#system-symbols).)
+
+##### Encoding of the system macro `values`
+```
+┌──── Opcode 0xEF indicates a system symbol or macro invocation
+│  ┌─── FixedInt 0 indicates macro 0 from the system macro table
+│  │
+EF 00
+```
+
+In addition, system macros MAY be invoked using any of the `0x00`-`0x5F` or `0xEE` opcodes, provided that the macro being invoked has been given an address in the user macro address space.
+
+
 ## Tagged E-expression Argument Encoding
 When a macro parameter has a tagged type, the encoding of that parameter's corresponding argument in an E-expression
diff --git a/_books/ion-1-1/src/binary/primitives/flex_sym.md b/_books/ion-1-1/src/binary/primitives/flex_sym.md
index 16f8a50..ca220aa 100644
--- a/_books/ion-1-1/src/binary/primitives/flex_sym.md
+++ b/_books/ion-1-1/src/binary/primitives/flex_sym.md
@@ -9,14 +9,7 @@ A `FlexSym` begins with a [`FlexInt`](#flexint); once this integer has been read
 No more bytes follow.
 * **less than zero**, its absolute value represents a number of UTF-8 bytes that follow the `FlexInt`.
 These bytes represent the symbol’s text.
-* **exactly zero**, another byte follows that is an [opcode](opcodes.md). The `FlexSym` parser is not responsible for
-evaluating this opcode, only returning it—the caller will decide whether the opcode is legal in the current context.
-Example usages of the opcode include:
-  * Representing SID `$0` as `0xA0`.
-  * Representing the empty string (`""`) as `0x90`.
-  * When used to encode a struct field name, the opcode can invoke a macro that will evaluate to a struct whose key/value
-pairs are spliced into the parent [struct](../values/struct.md).
-  * In a <>, terminating the sequence of `(field name, value)` pairs with `0xF0`.
+* **exactly zero**, another byte follows that is a [`FlexSymOpCode`](#flexsymopcode).
 #### `FlexSym` encoding of symbol ID `$10`
 ```
@@ -40,13 +33,40 @@ pairs are spliced into the parent [struct](../values/struct.md).
 negative 5
 ```
+### `FlexSymOpCode`
+
+`FlexSymOpCode`s are a combination of [system symbols](../../modules/system_module.md#system-symbols) and a subset of the general [opcodes](../opcodes.md).
+The `FlexSym` parser is not responsible for evaluating a `FlexSymOpCode`, only for returning it; the caller decides whether the opcode is legal in the current context.
+
+Example usages of the `FlexSymOpCode` include:
+* Representing SID `$0`
+* Representing system symbols
+  * Note that the empty symbol (i.e. the symbol `''`) is now a system symbol and can be referenced this way.
+* When used to encode a struct field name, the opcode can invoke a macro that will evaluate
+  to a struct whose key/value pairs are spliced into the parent [struct](../values/struct.md).
+* In a [delimited struct](../values/struct.md#delimited-encoding), terminating the sequence of `(field name, value)` pairs with `0xF0`.
+
+
+| OpCode Byte | Meaning | Additional Notes |
+|:---------------:|:------------|:------------|
+| `0x00` - `0x5F` | E-Expression | May be used when the `FlexSym` occurs in the field name position of any struct |
+| `0x60` | Symbol with unknown text (also known as `$0`) | |
+| `0x61` - `0xDF` | System SID (with `0x60` bias) | While the range of `0x61` - `0xDF` is reserved for system symbols, not all of these bytes correspond to a system symbol. See [system symbols](../../modules/system_module.md#system-symbols) for the list of system symbols. |
+| `0xEE` | _TODO: Add meaning_ | |
+| `0xEF` | E-Expression invoking a system macro | May be used when the `FlexSym` occurs in the field name position of any struct |
+| `0xF0` | Delimited container end marker | May only be used when the `FlexSym` occurs in the field name position of a delimited struct |
+| `0xF5` | Length-prefixed macro invocation | May be used when the `FlexSym` occurs in the field name position of any struct |
+
+
+
 #### `FlexSym` encoding of `''` (empty text) using an opcode
 ```
               ┌─── The leading FlexInt ends in a `1`,
               │    no more FlexInt bytes follow.
               │
-0 0 0 0 0 0 0 1   10010000
+0 0 0 0 0 0 0 1   01110111
 └─────┬─────┘     └───┬──┘
-  2's comp.     opcode 0x90:
-    zero        empty symbol
+  2's comp.     FixedInt 0x77,
+    zero        System SID 23
+              (the empty symbol)
 ```
diff --git a/_books/ion-1-1/src/binary/values/symbol.md b/_books/ion-1-1/src/binary/values/symbol.md
index ee490ed..2b2f602 100644
--- a/_books/ion-1-1/src/binary/values/symbol.md
+++ b/_books/ion-1-1/src/binary/values/symbol.md
@@ -61,3 +61,22 @@ address that is decoded is biased by the number of addresses that can be encoded
 | `0xE1` | 0 to 255 | 0 |
 | `0xE2` | 256 to 65,791 | 256 |
 | `0xE3` | 65,792 to infinity | 65,792 |
+
+
+### System Symbols
+
+
+
+System symbols (that is, symbols defined in the system module) can be encoded using the `0xEF` opcode followed by a _negative_ 1-byte `FixedInt`.
+(Non-negative values are used for [system macro invocations](../e_expressions.md#system-macro-invocations).)
+
+Unlike in Ion 1.0, symbols are not required to use the lowest available SID for a given text, and system symbols
+_MAY_ be encoded using other SIDs.
+
+##### Encoding of the system symbol `$ion`
+```plain
+┌──── Opcode 0xEF indicates a system symbol or macro invocation
+│  ┌─── FixedInt -1 indicates system symbol 1
+│  │
+EF FF
+```
diff --git a/_books/ion-1-1/src/modules.md b/_books/ion-1-1/src/modules.md
new file mode 100644
index 0000000..274d9ef
--- /dev/null
+++ b/_books/ion-1-1/src/modules.md
@@ -0,0 +1,5 @@
+# Ion 1.1 Modules
+
+Modules are a generalization of the symbol tables found in Ion 1.0.
+
+
diff --git a/_books/ion-1-1/src/modules/system_module.md b/_books/ion-1-1/src/modules/system_module.md
new file mode 100644
index 0000000..1e4a00d
--- /dev/null
+++ b/_books/ion-1-1/src/modules/system_module.md
@@ -0,0 +1,434 @@
+## The System Module
+
+The symbols and macros of the system module `$ion` are available everywhere within an Ion document,
+with the version of that module being determined by the spec-version of each segment.
+The specific system symbols are largely uninteresting to users; while the binary encoding heavily
+leverages the system symbol table, the text encoding that users typically interact with does not.
+The system macros are more visible, especially to authors of macros.
+
+This chapter catalogs the system-provided symbols and macros.
+The examples below use unqualified names, which works assuming no other macros with the same name are in scope. The unambiguous form `$ion::macro-name` is always available to use in the [template definition language](../macros_by_example.md).
+
+
+> [!WARNING]
+> This list is not complete. We expect it to grow and evolve as we gain experience writing macros.
+
+### System Symbols
+
+The Ion 1.1 System Symbol table _replaces_ rather than extends the Ion 1.0 System Symbol table. The system symbols are as follows:
+
+
+
+
+| ID | Text |
+|---:|:-----|
+| 1 | `$ion` |
+| 2 | `$ion_1_0` |
+| 3 | `$ion_symbol_table` |
+| 4 | `name` |
+| 5 | `version` |
+| 6 | `imports` |
+| 7 | `symbols` |
+| 8 | `max_id` |
+| 9 | `$ion_shared_symbol_table` |
+| 10 | `$ion_encoding` |
+| 11 | `$ion_literal` |
+| 12 | `$ion_shared_module` |
+| 13 | `macro` |
+| 14 | `macro_table` |
+| 15 | `symbol_table` |
+| 16 | `module` |
+| 17 | `retain` |
+| 18 | `export` |
+| 19 | `catalog_key` |
+| 20 | `use` |
+| 21 | `load` |
+| 22 | `import` |
+| 23 | _<empty string>_ (i.e. `''`) |
+| 24 | `literal` |
+| 25 | `if_void` |
+| 26 | `if_single` |
+| 27 | `if_multi` |
+| 28 | `for` |
+| 29 | `fail` |
+| 30 | `values` |
+| 31 | `annotate` |
+| 32 | `make_string` |
+| 33 | `make_symbol` |
+| 34 | `make_blob` |
+| 35 | `make_decimal` |
+| 36 | `make_timestamp` |
+| 37 | `make_list` |
+| 38 | `make_sexp` |
+| 39 | `make_struct` |
+| 40 | `parse_ion` |
+| 41 | `repeat` |
+| 42 | `delta` |
+| 43 | `flatten` |
+| 44 | `sum` |
+| 45 | `local_symtab` (or maybe just `symbol_table`?) |
+| 46 | `lst_append` (or maybe just `add_symbols`?) |
+| 47 | `local_mactab` (or maybe just `macro_table`?) |
+| 48 | `lmt_append` (or maybe just `add_macro`?) |
+| 49 | `comment` |
+| 50 | `var_symbol` |
+| 51 | `var_string` |
+| 52 | `var_int` |
+| 53 | `var_uint` |
+| 54 | `uint8` |
+| 55 | `uint16` |
+| 56 | `uint32` |
+| 57 | `uint64` |
+| 58 | `int8` |
+| 59 | `int16` |
+| 60 | `int32` |
+| 61 | `int64` |
+| 62 | `float16` |
+| 63 | `float32` |
+| 64 | `float64` |
+
+ _Logical Parameter Type Names_ (possible in Ion 1.2?)
+
+| ID | Text |
+|---:|:-----|
+| 65 | `number` |
+| 66 | `exact` |
+| 67 | `text` |
+| 68 | `lob` |
+| 69 | `sequence` |
+| 70 | `'null'` |
+| 71 | `bool` |
+| 72 | `timestamp` |
+| 73 | `int` |
+| 74 | `decimal` |
+| 75 | `float` |
+| 76 | `string` |
+| 77 | `symbol` |
+| 78 | `blob` |
+| 79 | `clob` |
+| 80 | `list` |
+| 81 | `sexp` |
+| 82 | `struct` |
+
+
+In Ion 1.1 Text, system symbols can never be referenced by symbol ID; `$1` always refers to the first symbol in the user symbol table.
+This allows the Ion 1.1 system symbol table to be relatively large without taking away SID space from the user symbol table.
+
+### System Macros
+
+
+
+#### System Macro Addresses
+
+
+| ID | Text |
+|----:|:-----|
+| 0 | `values` |
+| 1 | `annotate` |
+| 2 | `make_string` |
+| 3 | `make_symbol` |
+| 4 | `make_blob` |
+| 5 | `make_decimal` |
+| 6 | `make_timestamp` |
+| 7 | `make_list` |
+| 8 | `make_sexp` |
+| 9 | `make_struct` |
+| 10 | `parse_ion` |
+| 11 | `repeat` |
+| 12 | `delta` |
+| 13 | `flatten` |
+| 14 | `sum` |
+| 15 | `import` |
+| 16 | `local_symtab` (or maybe just `symbol_table`?) |
+| 17 | `lst_append` (or maybe just `add_symbols`?) |
+| 18 | `local_mactab` (or maybe just `macro_table`?) |
+| 19 | `lmt_append` (or maybe just `add_macros`?) |
+| 20 | `comment` |
+
+#### `values`
+
+```ion
+(values (v*)) -> any*
+```
+
+Produces a stream from any number of arguments, concatenating the streams produced by the nested expressions.
+Used to aggregate multiple values or sub-streams to pass to a single argument, or to return multiple results.
+
+#### `make_string`
+
+```ion
+(make_string (text::content*)) -> string
+```
+
+Produces a non-null, unannotated string containing the concatenated content produced by the arguments.
+Nulls (of any type) are forbidden. Any annotations on the arguments are discarded.
+
+#### `make_symbol`
+
+```ion
+(make_symbol (text::content*)) -> symbol
+```
+
+Like `make_string` but produces a symbol.
+
+#### `make_blob`
+
+```ion
+(make_blob (lob::content*)) -> blob
+```
+
+Like `make_string` but accepts lobs and produces a blob.
+
+#### `make_list`
+
+```ion
+(make_list (vals*)) -> list
+```
+
+Produces a non-null, unannotated list by concatenating the _content_ of any number of non-null list or sexp inputs.
+
+#### `make_sexp`
+
+```ion
+(make_sexp (vals*)) -> sexp
+```
+
+Like `make_list` but produces a sexp.
+This is the only way to produce an S-expression from a template: unlike lists, S-expressions in
+templates are not quasi-literals.
+
+```ion
+(:make_sexp) ⇒ ()
+(:make_sexp null) ⇒ (null)
+```
+
+
+#### `make_struct`
+
+```ion
+(make_struct (structs*)) -> struct
+```
+
+Produces a non-null, unannotated struct by combining the fields of any number of non-null structs.
+
+```ion
+(:make_struct { k1: 1, k2: 2} {k3:3} {k4: 4}) ⇒ {k1:1, k2:2, k3:3, k4:4}
+```
+
+#### `make_decimal`
+
+
+```ion
+(make_decimal (flex_int::coefficient flex_int::exponent)) -> decimal
+```
+
+This is no more compact than the regular binary encoding for decimals.
+However, it can be used in conjunction with other macros, for example, to represent fixed-point numbers.
+
+```ion
+(macro usd (cents) (annotate (literal USD) (make_decimal cents -2)))
+
+
+(:usd 199) ⇒ USD::1.99
+```
+
+
+#### `make_timestamp`
+
+```ion
+(make_timestamp (int::year
+                 uint8::month uint8::day
+                 uint8::hour uint8::minute decimal::second
+                 int::offset_minutes))
+  -> timestamp
+```
+Produces a non-null, unannotated timestamp at various levels of precision.
+When `offset_minutes` is absent, the result has unknown local offset; an offset of `0` denotes UTC.
+None of the arguments to this macro may be null.
+
+> [!NOTE]
+> TODO [ion-docs#256](https://github.com/amazon-ion/ion-docs/issues/256) Reconsider offset semantics, perhaps default should be UTC.
+
+Example:
+
+```ion
+(macro ts_today
+  (uint8::hour uint8::minute uint32::seconds_millis)
+  (make_timestamp 2022 4 28 hour minute (make_decimal seconds_millis -3) 0))
+```
+
+
+#### `annotate`
+
+```ion
+(annotate (text::ann* value)) -> any
+```
+
+Produces `value` prefixed with the annotations given by `ann`.
+Each `ann` must be a non-null, unannotated string or symbol.
+
+```ion
+(:annotate (: "a2") a1::true) => a2::a1::true
+```
+
+#### `repeat`
+
+The `repeat` system macro can be used for efficient run-length encoding.
+
+```ion
+(repeat (int::n! any::value+)) -> any
+```
+Produces a stream that repeats the specified `value` expression(s) `n` times.
+
+```ion
+(:repeat 5 0) => 0 0 0 0 0
+(:repeat 2 true false) => true false true false
+```
+
+#### `delta`
+
+> [!NOTE]
+> 🚧 Name still TBD 🚧
+
+The `delta` system macro can be used for directed delta encoding.
+
+```ion
+(delta (int::initial! int::deltas+)) -> int
+```
+
+```ion
+(:delta 10 1 2 3 -4) => 11 13 16 12
+```
+
+#### `flatten`
+
+The `flatten` system macro flattens one or more sequence values into a stream of their contents.
+
+```ion
+(flatten (sequence+)) -> any
+```
+Produces a stream with the contents of all the `sequence` values.
+Any `null.sexp` or `null.list` is treated as an empty sequence.
+Any annotations on the `sequence` values are discarded.
+
+```ion
+(:flatten [a, b, c] (d e f)) => a b c d e f
+(:flatten [[], null.list] null.sexp foo::()) => [] null.list
+```
+
+
+The `flatten` macro can also be used to splice the content of one list or s-expression into another list or s-expression.
+```ion
+[1, 2, (:flatten [a, b]), 3, 4] => [1, 2, a, b, 3, 4]
+```
+
+#### `sum`
+
+```ion
+(sum (int::i*)) -> int
+```
+Produces the sum of all the integer arguments.
+
+```ion
+(:sum 1 2 3) => 6
+(:sum (:)) => 0
+```
+
+#### `parse_ion`
+
+Ion documents may be embedded in other Ion documents using the `parse_ion` macro.
+
+```ion
+(parse_ion (data!)) -> any
+```
+
+The `parse_ion` macro accepts a single, self-contained Ion document as a blob or string, and produces a stream of application values.
+
+```ion
+(:parse_ion
+  '''
+  $ion_1_1
+  $ion_encoding::(
+    (module local (symbol_table "foo" "bar"))
+    (symbol_table local)
+  )
+  $1 $2
+  '''
+)
+=> foo bar
+```
+
+> [!NOTE]
+> TODO: Consider adding an example using embedded binary
+
+> [!NOTE]
+> TODO: Consider defining parse_ion variants that can
+> - leak encoding context to the outer Ion
+> - consume the encoding context from the outer Ion
+
+
+#### Local Symtab Declaration
+
+This macro is optimized for representing a list of symbols in minimal space.
+
+```ion
+(macro import (string::name uint::version? uint::max_id?) -> struct
+{ name:name, version:version, max_id:max_id })
+
+(macro local_symtab (import::imports* string::symbols*)
+  $ion_symbol_table::{
+    imports:(if_void imports (values) [imports]),
+    symbols:(if_void symbols (values) [symbols]),
+  })
+```
+
+```ion
+(:local_symtab ("my.symtab" 4) (: "newsym" "another"))
+=>
+$ion_symbol_table::{ imports:[{name:"my.symtab", version:4}],
+symbols:["newsym", "another"] }
+```
+
+
+#### Local Symtab Appending
+
+```ion
+(macro lst_append (string::symbols*)
+  (if_void symbols
+    (void) // Produce nothing if no symbols provided.
+    $ion_symbol_table::{
+      imports: (literal $ion_symbol_table),
+      symbols: [symbols]
+    }
+  )
+)
+```
+
+```ion
+(:lst_append "newsym" "another") =>
+
+$ion_symbol_table::{
+  imports:$ion_symbol_table,
+  symbols:["newsym", "another"]
+}
+```
+
+#### Local Macro Table Appending
+
+```ion
+(macro lmt_append (sexp::template_macros*)
+  (if_void template_macros
+    (values) // Produce nothing if no macros provided.
+    $ion_encoding::(
+      (retain *)
+      (module syms2 (symbol_table ["s3", "s4"]))
+      (symbol_table syms syms2)
+    )
+  )
+)
+```
+
+
+#### Compact Module Definitions
+
+**TODO**
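
A note on the `0xEF` encodings described in the `e_expressions.md` and `symbol.md` changes: the Python sketch below shows one way a reader might tell a system symbol from a system macro invocation. It is illustrative only and is not taken from any Ion implementation. The function name `decode_system_reference` and both lookup tables are hypothetical, the tables copy only a few rows from `system_module.md`, and the sketch assumes the operand is a single signed byte in which zero and above address a system macro (as in `EF 00`) and negative values name a system symbol (as in `EF FF`).

```python
import struct

# Partial, illustrative copies of the tables in system_module.md.
SYSTEM_SYMBOLS = {1: "$ion", 2: "$ion_1_0", 3: "$ion_symbol_table", 23: ""}
SYSTEM_MACROS = {0: "values", 1: "annotate", 2: "make_string"}


def decode_system_reference(data: bytes):
    """Classify a two-byte 0xEF sequence (hypothetical helper, not an Ion API)."""
    if len(data) < 2 or data[0] != 0xEF:
        raise ValueError("expected opcode 0xEF followed by one operand byte")
    # Interpret the operand as a signed 1-byte integer (two's complement).
    (value,) = struct.unpack("b", data[1:2])
    if value < 0:
        # Negative operand: the magnitude is the system symbol ID.
        symbol_id = -value
        return ("symbol", symbol_id, SYSTEM_SYMBOLS.get(symbol_id))
    # Zero or positive operand: the value is the system macro address.
    return ("macro", value, SYSTEM_MACROS.get(value))
```

Under these assumptions, `decode_system_reference(bytes([0xEF, 0x00]))` returns `("macro", 0, "values")` and `decode_system_reference(bytes([0xEF, 0xFF]))` returns `("symbol", 1, "$ion")`, matching the two encoding diagrams in the diff. IDs or addresses missing from the partial tables come back with `None` as the text.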