© 2020 Blockchain Commons
Authors: Wolf McNally, Christopher Allen
Date: May 5, 2020
The content below is now deprecated and of historical interest only.
The [Bech32] checksummed base32 format was introduced to transport segregated witness signatures for Bitcoin. It has several desirable qualities:
- The payload and checksum data use a limited character set compatible with both URI syntax and QR code alphanumeric encoding mode.
- The character set uses the letters A-Z in a case-modal fashion (i.e., one can use all upper case or all lower case but the algorithm rejects mixed-case.)
- The character set is chosen to minimize ambiguity, and the ordering is chosen to minimize the number of pairs of similar characters.
- Built-in error detection and optional error location for payloads <80 characters.
- Because the character set used are all "word characters", most text editors will select the entire sequence of characters when it is double-clicked.
However, it also has a major drawback when considered for general data encoding:
- The characters allowed for the human readable part ("HRP") conflict with both URI syntax and QR code alphanumeric encoding mode.
- The HRP is not useful to identify a wide variety of general structured data types.
This document proposes a new, more general data encoding format BC32 that is based on Bech32 but makes the following modifications:
- The HRP and numeral
1
divider are no longer included. - TODO: Checksum parameters are defined to handle error correction and identification for larger payloads.
The following table compares the relative efficiency of various binary-to-text encoding methods. base32 encoding (and BC32 excluding the checksum) is 12.5% less efficient than base64, but also 12.5% more efficient than hexadecimal. The benefit gained by using only 32 characters is seamless compatibility with both QR code alphanumeric encoding and URI syntax.
Format | Number of characters | Number of bits per character | Efficiency compared to raw binary |
---|---|---|---|
raw binary | 256 | 8.0 | 100% |
base64 | 64 | 6.0 | 75% |
base45 (QR Code alphanumeric) | 45 | 5.49185 | 68.65% |
base32, bech32, BC32 (not including checksum) | 32 | 5.0 | 62.5% |
hexadecimal | 16 | 4.0 | 50% |
✳️ Note: Although it would appear that base32 is always less efficient than base64, this is not the case when encoding a payload for transport in QR codes. Because BC32 uses a character set optimized for the more efficient QR code alphanumeric mode, a payload of 1000 random bytes results in a QR code 13% less dense when the payload is encoded with BC32, compared to the same payload encoded as Base64.
Current implementations:
- Implemented by
bc32_seed_encode()
andbc32_seed_decode()
in the Blockchain Commons bc-bech32 library. - Implemented as a Wolfram Language (Mathematica) module accompanying this document.
- Implemented as bc-bech32 on Typescript/javascript
Input | BC32 Encoded |
---|---|
"Hello world" (UTF-8) == 48656c6c6f20776f726c64 | fpjkcmr0ypmk7unvvsh4ra4j |
d934063e82001eec0585ee41ab5d8e4b703a4be1f73aec21e143912c56 | my6qv05zqq0wcpv9aeq6khvwfdcr5jlp7uawcg0pgwgjc4shjm6xu |