Since we intend to favor streaming parsing, we need to consider a format suited for streaming.
Strings + Lazy parsing
One of the problems we are going to encounter is the combination of strings and lazy parsing:
consider two independent lazy functions foo and bar, where bar is somewhere further down the stream from foo;
assume that foo defines a literal string s that does not show up in our AOT dictionary;
how should bar refer to s in such a way that we do not first need to parse foo?
One way to do this is the following:
divide the stream in packets;
each packet starts with a table of strings, which may now be used by that packet and by every packet further down the line.
If we do so, the packet containing foo will define literal string s. The packet containing bar will either be the same packet or a packet further down the line, and will be able to access s.
As a bonus, this will let us compress these string tables using a well-known algorithm, such as Brotli.
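The packet scheme above can be sketched as follows. This is only an illustration under assumptions, not the proposed format: the names `Packet`, `StreamDecoder`, `read_packet` and `parse_lazy` are hypothetical, and a lazy function body is modelled as a plain list of indices into the cumulative string table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Packet:
    # Strings introduced by this packet; visible to it and to all later packets.
    strings: list[str]
    # Hypothetical payload: lazy function name -> indices into the cumulative table.
    functions: dict[str, list[int]] = field(default_factory=dict)


class StreamDecoder:
    def __init__(self) -> None:
        self.table: list[str] = []                # cumulative string table
        self.skipped: dict[str, list[int]] = {}   # lazy bodies, stored but not parsed

    def read_packet(self, packet: Packet) -> None:
        # The string table is always read eagerly, even if every body is skipped.
        self.table.extend(packet.strings)
        self.skipped.update(packet.functions)

    def parse_lazy(self, name: str) -> list[str]:
        # Resolve string references without touching any other lazy function.
        return [self.table[i] for i in self.skipped[name]]


decoder = StreamDecoder()
decoder.read_packet(Packet(strings=["s"], functions={"foo": [0]}))
decoder.read_packet(Packet(strings=["t"], functions={"bar": [0, 1]}))
resolved = decoder.parse_lazy("bar")  # foo is never parsed
```

Here `bar` refers to `s` through the table of an earlier packet, so `foo` can stay unparsed.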
Model State + Lazy Parsing
We will need to adapt our models to restart from a well-specified state whenever parsing a lazy function.
(TBD)
Offsets + Entropy + Streaming
We need the ability to tell the decoder where to fetch a lazy function. In non-entropy-coding versions, we could reference the actual offset at which a lazy function was encoded. With entropy coding, such offsets are no longer meaningful, since encoded symbols need not start on byte boundaries.
A partial solution would be the following:
each packet may contain a number of (aligned) lazy declarations;
each packet's header declares the lazy declarations included in this packet as keys (the actual value of a key is an arbitrary string), each with its starting offset within the packet;
when encoding a [lazy] field, we specify the key at which to find the content of the field;
note that a lazy declaration could span several packets.
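A minimal sketch of the header lookup described above, under stated assumptions: `LazyPacket` and `index_lazy_declarations` are hypothetical names, offsets are byte offsets into an already aligned packet body, and only the start of each declaration is recorded, so a declaration spanning several packets is not handled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LazyPacket:
    # Key (an arbitrary string) -> starting offset within this packet's body.
    header: dict[str, int]
    body: bytes


def index_lazy_declarations(packets: list[LazyPacket]) -> dict[str, tuple[int, int]]:
    """Map each key to (packet number, offset in packet) by reading headers only."""
    index: dict[str, tuple[int, int]] = {}
    for packet_number, packet in enumerate(packets):
        for key, offset in packet.header.items():
            if key in index:
                raise ValueError(f"duplicate lazy declaration key: {key!r}")
            if not 0 <= offset <= len(packet.body):
                raise ValueError(f"offset out of range for key: {key!r}")
            index[key] = (packet_number, offset)
    return index


packets = [
    LazyPacket(header={"foo": 0, "bar": 4}, body=b"FOO!BAR!"),
    LazyPacket(header={"baz": 2}, body=b"..BAZ"),
]
index = index_lazy_declarations(packets)
packet_number, offset = index["baz"]
baz_bytes = packets[packet_number].body[offset:]
```

A `[lazy]` field would then store only the key, and the decoder would use the index to find where the content starts.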