jido/Quantity


Quantity

Number format to represent quantities for display and storage

  • Efficient conversion to and from text
  • What you see is what you get: it's a decimal format which means no internal rounding, no hidden digits
  • Arbitrary precision
  • Supports native equality and comparison of like numbers

Small quantity (32 bit)

Bit layout

sxmmmmmmmmmmkkkkkkkkkkuuuuuuuuuu

s = sign

x = extension bit (0)

m = millions

k = thousands

u = units

  • Numbers from 0 to 999,999,999
  • Infinity has all bits set except for sign
  • Negative quantities are stored using two's complement
  • If only the sign bit is set it indicates an invalid quantity (NaN)
  • Decimal digits are stored in groups of three digits from 000 to 999
  • The extension bit makes it possible to represent quantities of a billion or more, as described below
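
The layout above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `encode_small`, `decode_small`, `POSITIVE_INFINITY` and `NAN` are hypothetical, and only non-negative quantities are handled.

```python
POSITIVE_INFINITY = 0x7FFFFFFF  # all bits set except for the sign
NAN = 0x80000000                # only the sign bit set


def encode_small(value: int) -> int:
    """Pack 0..999,999,999 into the sxmmmmmmmmmmkkkkkkkkkkuuuuuuuuuu layout."""
    if not 0 <= value <= 999_999_999:
        raise ValueError("small quantities hold 0 to 999,999,999")
    millions, rest = divmod(value, 1_000_000)
    thousands, units = divmod(rest, 1_000)
    # Sign and extension bits stay 0; each group of three digits takes 10 bits.
    return (millions << 20) | (thousands << 10) | units


def decode_small(word: int) -> int:
    """Inverse of encode_small for a word with sign and extension bits clear."""
    if not 0 <= word < 1 << 30:
        raise ValueError("sign or extension bit is set")
    millions, thousands, units = ((word >> s) & 0x3FF for s in (20, 10, 0))
    if max(millions, thousands, units) > 999:
        raise ValueError("each group must be between 000 and 999")
    return millions * 1_000_000 + thousands * 1_000 + units
```

For instance, `format(encode_small(299_792_458), "032b")` yields the same 32 bits as the speed of light example.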

Examples

One

00000000000000000000000000000001
                      \___1____/

One thousand

00000000000000000000010000000000
            \___1____/\__000___/

Speed of light (299,792,458 m/s)

00010010101111000110000111001010
  \__299___/\__792___/\__458___/

Positive infinity

01111111111111111111111111111111

Large quantity (variable length)

When the extension bit is set a larger quantity is stored using as many bits as required.

The extended format starts with a 48-bit header:

sxxxnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

s = sign

x = extension bits (100)

n = number of chunks (44 bits)

A chunk consists of eight groups of three digits for a total of 24 digits. Each chunk occupies 80 bits = 10 bytes in memory.

As previously, each group of three digits is stored as a 10-bit integer up to 999 (in binary: 1111100111).

A special value of 1023 means that the group is not in use (in binary: 1111111111). That is useful when the number of groups is not known in advance. This special value marks the last group of digits of the quantity.

A special value of 1022 or 1021 means that one digit or two digits, respectively, are not in use in the preceding group (in binary: 1111111110 and 1111111101). The unused digits should all be 0.
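
As a sketch of how a chunk could be assembled from its eight groups, the snippet below packs and unpacks the 80 bits in big endian order. The function and constant names are hypothetical. Rejecting the values 1000 to 1020 is an assumption, since the text does not assign them a meaning.

```python
UNUSED_GROUP = 1023       # group not in use
ONE_DIGIT_UNUSED = 1022   # one digit of the preceding group not in use
TWO_DIGITS_UNUSED = 1021  # two digits of the preceding group not in use
MARKERS = (UNUSED_GROUP, ONE_DIGIT_UNUSED, TWO_DIGITS_UNUSED)


def pack_chunk(groups: list[int]) -> bytes:
    """Pack eight 10-bit groups into one 80-bit (10-byte) chunk."""
    if len(groups) != 8:
        raise ValueError("a chunk holds exactly eight groups")
    value = 0
    for group in groups:
        # Assumption: 1000..1020 are treated as invalid.
        if not (0 <= group <= 999 or group in MARKERS):
            raise ValueError(f"invalid group value: {group}")
        value = (value << 10) | group
    return value.to_bytes(10, "big")


def unpack_chunk(chunk: bytes) -> list[int]:
    """Split a 10-byte chunk back into its eight 10-bit groups."""
    if len(chunk) != 10:
        raise ValueError("a chunk is exactly 10 bytes")
    value = int.from_bytes(chunk, "big")
    return [(value >> shift) & 0x3FF for shift in range(70, -1, -10)]
```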

The chunks immediately follow the header. The number of bytes occupied by the quantity is rounded up to the nearest multiple of four, which means that two bytes of padding are added when n is even.

The padding is normally filled with ones. The first 10 bits of the padding can be used for special values 1022 or 1021 when the last one or two digits of the previous chunk should be discarded.
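
The padding rule follows from the sizes involved: a 6-byte header plus 10 bytes per chunk. A small sketch of the arithmetic, with hypothetical function names, is shown here. It only covers one or more chunks, because the zero-chunk forms are listed as 48 bit in the normalisation tables.

```python
HEADER_BYTES = 6   # 48-bit header
CHUNK_BYTES = 10   # 80-bit chunk


def padded_length(chunks: int) -> int:
    """Total bytes for a quantity with `chunks` chunks, rounded up to a multiple of four."""
    if chunks < 1:
        raise ValueError("this sketch only covers quantities with at least one chunk")
    raw = HEADER_BYTES + CHUNK_BYTES * chunks
    return (raw + 3) // 4 * 4


def padding_bytes(chunks: int) -> int:
    """Number of padding bytes appended after the last chunk."""
    return padded_length(chunks) - (HEADER_BYTES + CHUNK_BYTES * chunks)
```

With one chunk the total is 16 bytes and no padding is needed; with two chunks the 26 raw bytes are padded to 28.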

Extensions

The default extension value is 100 (binary).

The number of chunks is encoded using 44 bits which corresponds to a quantity with up to 4×10¹⁴ digits.
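
The digit capacity can be checked with a short calculation: the largest chunk count times 24 digits per chunk. The helper name `max_digits` is hypothetical.

```python
DIGITS_PER_CHUNK = 24  # eight groups of three digits


def max_digits(count_bits: int) -> int:
    """Largest number of digits addressable with a chunk count of `count_bits` bits."""
    return (2 ** count_bits - 1) * DIGITS_PER_CHUNK


print(f"{max_digits(44):.1e}")  # 44-bit chunk count of the default extension
```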

Other extension values put some of those bits to a different use:

100 : default
101 : with exponent
110 : 64 bit floating point
111 : variable length floating point

Exponent extension

The format recognises a special case where the quantity ends with a run of zeros.

Instead of using more chunks to write these zeros, the first 16 bits of n are used to store an exponent which indicates the number of trailing zeros.

Header bit layout

sxxxeeeeeeeeeeeeeeee nnnnnnnnnnnnnnnnnnnnnnnnnnnn

s = sign

x = extension

e = exponent (16 bits)

n = number of chunks (28 bits)

Example

Avogadro constant (6.02214076×10²³)

01010000000000001000 0000000000000000000000000001
    \__exponent____/ \_____number of chunks_____/
10010110100011010110000100110000000000000000000000000000000000000000000000000000
\__602___/\__214___/\__076___/
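
The header fields can be separated with shifts and masks. The sketch below only splits and rebuilds the 48-bit header and does not interpret the chunks. The names are hypothetical, and treating the exponent as an unsigned count of trailing zeros is an assumption drawn from the description above.

```python
from typing import NamedTuple


class ExponentHeader(NamedTuple):
    sign: int
    extension: int
    exponent: int  # assumed unsigned: number of trailing zeros
    chunks: int


def parse_exponent_header(header: int) -> ExponentHeader:
    """Split a 48-bit header laid out as s, xxx, 16-bit exponent, 28-bit chunk count."""
    if not 0 <= header < 1 << 48:
        raise ValueError("header must fit in 48 bits")
    return ExponentHeader(
        sign=header >> 47,
        extension=(header >> 44) & 0b111,
        exponent=(header >> 28) & 0xFFFF,
        chunks=header & 0xFFFFFFF,
    )


def build_exponent_header(sign: int, exponent: int, chunks: int) -> int:
    """Assemble a header with extension bits 101."""
    if sign not in (0, 1) or not 0 <= exponent < 1 << 16 or not 0 <= chunks < 1 << 28:
        raise ValueError("field out of range")
    return (sign << 47) | (0b101 << 44) | (exponent << 28) | chunks
```

Applied to the header bits of the example, `parse_exponent_header` returns extension 101, an exponent field of 8 and a chunk count of 1.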

Floating point extension

The above exponent extension could also be used to write floating point quantities by deciding on a maximum number of figures after the decimal point and adjusting the exponent by the same amount (exponent bias).

However, that is not an efficient method if the quantities have a wide range of magnitude.

Instead it is preferable to fix the position of the decimal point after the first digit and use chunks for the fractional part only.

In the floating point extension, the exponent is immediately followed by the first digit of the quantity.

Bit layout

64 bit:
sxxxeeeeeeeeeeeeeeeedddd mmmmmmmmmmµµµµµµµµµµnnnnnnnnnnpppppppppp

Variable length header:
sxxxeeeeeeeeeeeeeeeedddd nnnnnnnnnnnnnnnnnnnnnnnn

s = sign

x = extension (110 or 111)

e = exponent (16 bits)

d = first digit (1 to 9)

m = millis

µ = micros

n = nanos (64 bit) or number of chunks (variable length)

p = picos (64 bit)

Example

Electron mass at rest (9.1093837015×10⁻³¹ kg)

011001111111111000011001 0001101101010111111110101111010111110100
    \___exponent___/\9_/ \__109___/\__383___/\__701___/\__500___/
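
A possible decoder for the 64 bit layout is sketched below with hypothetical names. The text does not state how the exponent is biased. The example's exponent field reads 32737, which matches −31 under a bias of 32768, so that value is used here purely as an assumption.

```python
from typing import NamedTuple

ASSUMED_BIAS = 32768  # assumption inferred from the example, not stated in the text


class Float64Quantity(NamedTuple):
    sign: int
    exponent_field: int  # raw 16-bit field, before any bias is removed
    digits: str          # first digit followed by the 12 fractional digits


def decode_float64(word: int) -> Float64Quantity:
    """Split a 64-bit word laid out as s, xxx, 16-bit exponent, first digit, four groups."""
    if not 0 <= word < 1 << 64:
        raise ValueError("word must fit in 64 bits")
    if (word >> 60) & 0b111 != 0b110:
        raise ValueError("extension bits are not 110")
    first_digit = (word >> 40) & 0xF
    groups = [(word >> shift) & 0x3FF for shift in (30, 20, 10, 0)]
    if first_digit > 9 or any(group > 999 for group in groups):
        raise ValueError("digit or group out of range")
    digits = str(first_digit) + "".join(f"{group:03d}" for group in groups)
    return Float64Quantity(word >> 63, (word >> 44) & 0xFFFF, digits)
```

For the electron mass example this gives the digits 9109383701500, read as 9.109383701500, and an exponent field of 32737.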

Variations

An implementation could allow only fixed length quantities, namely the 32 bit small quantity and the 64 bit floating point quantity. In the case of a 32 bit quantity, a second value could be added to write a fraction (x/y).

If 28 bits are always enough to store the number of chunks, the default extended format without exponent can be dropped altogether. Then quantities can have up to 6×10⁹ non-zero digits.

If there is no need for an explicit fractional part, the variable length floating point extension can be dropped. The 64 bit floating point extension can be made to behave like the exponent extension by using exponent bias +12.

Practical considerations

Endianness

Quantities should always be in big endian format for storage or exchange. This is because the textual representation of a quantity puts the most significant digit first.

However, it is acceptable to use little endian format for local usage. The byte length of a quantity is always a multiple of four, so it is practical to handle the data as a sequence of 32 bit values (four bytes) using the platform natural endianness. The extension bits and number of chunks may need to be duplicated or relocated when handling a quantity in little endian format.
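
As an illustration of handling a stored quantity as 32 bit values, the sketch below uses Python's `struct` module to read big endian storage bytes into integers and to write them back. The function names are hypothetical, and the relocation of extension bits mentioned above is not covered.

```python
import struct


def to_words(stored: bytes) -> list[int]:
    """Read big endian storage bytes as a list of unsigned 32-bit values."""
    if len(stored) % 4:
        raise ValueError("the byte length of a quantity is a multiple of four")
    return list(struct.unpack(f">{len(stored) // 4}I", stored))


def to_storage(words: list[int]) -> bytes:
    """Write 32-bit values back in big endian order for storage or exchange."""
    return struct.pack(f">{len(words)}I", *words)
```

Once unpacked, the values are ordinary integers that the platform holds in its own byte order.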

Compression

The format does not define a compression scheme.

If the data contains patterns (repeated digits or sequence of digits), it is likely it will benefit from compression. This can be applied on top of the format as seen fit.

Normalisation

To make comparison more efficient, quantities should be normalised, which generally means they should be written using as few bytes as possible.

The options below should be evaluated in order, and the first one that fits should be taken.

Integer quantity

| Option | Extension bits | Number of chunks | Length | Application |
|---|---|---|---|---|
| Small quantity | 0 | - | 32 bit | value up to 999,999,999 and special values |
| With exponent | 101 | 0 | 48 bit | powers of 10 |
| 64 bit floating point | 110 | - | 64 bit | up to 13 significant figures |
| With exponent | 101 | any | 128 bit+ | values ending with zeros |
| Default extension | 100 | any | 128 bit+ | other values |
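
The selection for integer quantities can be sketched as a chain of tests in table order. This is an illustration with a hypothetical function name. It takes a non-negative integer as decimal text and ignores the range limit of the 16-bit exponent, which is a simplifying assumption.

```python
def integer_option(digits: str) -> str:
    """Name the first fitting option for a non-negative integer given as decimal text."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of decimal digits")
    digits = digits.lstrip("0") or "0"
    if len(digits) <= 9:
        return "small quantity"            # value up to 999,999,999
    significant = digits.rstrip("0")       # figures left once trailing zeros are removed
    if significant == "1":
        return "with exponent, zero chunks"  # power of 10
    if len(significant) <= 13:
        return "64 bit floating point"
    if digits.endswith("0"):
        return "with exponent"
    return "default extension"
```

Under these rules the 24-digit integer value of the Avogadro constant has nine significant figures, so the sketch selects the 64 bit floating point form for it.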

Fractional quantity

| Option | Extension bits | Number of chunks | Length | Application |
|---|---|---|---|---|
| Small quantity | 0 | - | 32 bit | round value up to 999,999,999 and special values |
| Floating point | 111 | 0 | 48 bit | single significant figure |
| 64 bit floating point | 110 | - | 64 bit | up to 13 significant figures |
| Floating point | 111 | any | 128 bit+ | other values |

Infinitesimal quantity

Although the first digit of a floating point value would normally be between one and nine, nothing prevents it from being zero instead. A floating point value whose first digit is zero and which has zero chunks denotes a quantity that is equivalent to zero, such as 1/x when x ⟶ ∞. It can be positive or negative.

Example

011100000000000000000000 000000000000000000000000
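
A check for this case could look like the sketch below, which reads the variable length floating point header (s, xxx, 16-bit exponent, first digit, 24-bit chunk count). The function name is hypothetical. The exponent is left unconstrained because the text places no condition on it.

```python
def is_infinitesimal(header: int) -> bool:
    """True for a variable length floating point header with first digit 0 and no chunks."""
    if not 0 <= header < 1 << 48:
        raise ValueError("header must fit in 48 bits")
    extension = (header >> 44) & 0b111
    first_digit = (header >> 24) & 0xF
    chunks = header & 0xFFFFFF
    # The sign bit is not tested: the quantity can be positive or negative.
    return extension == 0b111 and first_digit == 0 and chunks == 0
```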

Another use for zero as first digit is to recover from a parsing error. For example, when converting from text to a quantity, floating point values may be detected by looking at a finite number of characters. When a decimal point appears after the detection point the quantity may already be written as a sequence of chunks without a first digit. It can be easily converted to a (non normalised) floating point value with first digit = 0.

Operations

Quantities do not define any operation other than equality, comparison and conversion to and from other formats such as text.

One reason is that combining quantities of very different magnitude can result in extreme memory consumption. For example, 10¹⁰⁰⁰⁰⁰ + 1 does not have a space-efficient representation.
