Python binary trie implementation, helping with XOR distances.
Author: Guillaume Michel
IPFS and libp2p are built upon the Kademlia DHT, which uses the XOR distance as a distance metric between keys. As explained in this blogpost, the XOR distance is tricky to understand and represent. This distance metric is odd for it is non linear.
For instance,
One popular representation for XOR distance is the Binary Trie. A binary brie is a simple Trie with keyspace
Each node in the Trie can be associated with some metadata for convenience.
pip install binary-trie
from binary_trie import Trie, bytes_to_bitstring, bitstring_to_bytes, int_to_bitstring, bitstring_to_int
trie = Trie()
The add(key)
method takes a bitstring as input. The bitstring can either be provided directly, or be computer from an int
or bytes
using the int_to_bitstring()
and bytes_to_bitstring()
functions. Note that the l
parameter represent the bit length of the binary representation. It is important that all keys share the same bit length. The bit length can be omitted in bytes_to_bistring()
when working with bit lengths multiple of 8.
trie.add("0010")
trie.add(4*"0")
trie.add(int_to_bitstring(3, l=4))
trie.add(bytes_to_bitstring(b'\x0e', l=4))
The add function returns True
on success, and False
if the key was already in the trie (not inserted).
The find(key)
method returns True
if the provided key is in the Trie, False
otherwise.
trie.contains("0010") # True
trie.contains("0100") # False
The depth(key)
method returns the depth of the provided key, if it is in the Trie and -1
otherwise. The depth of a trie node is defined as the number of its direct ancestors, up to the root of the trie.
trie.depth("0") # 1
trie.depth("0010") # 3
trie.depth("0111") # 4
trie.depth("1101") # 2
trie.depth("11") # -1
The n_closest_keys(key, n)
method returns the n
closest keys to the provided key in the trie, according to the XOR distance. The keys are sorted according to the distance to the target key. Note that only leaves of the trie will be returned, not intermediary nodes.
trie.n_closest_keys("0001", 1) # ["0000"]
trie.n_closest_keys("0010", 3) # ["0010", "0011", "0000"]
The prefix_match_keys(prefix)
will return the list of keys of all leaves of the Trie matching the provided prefix
.
trie.prefix_match_keys("00") # ["0000", "0010", "0011"]
trie.prefix_match_keys("1111") # []
It is possible to attach an object
as metadata to a trie leaf node.
class MyObj(object):
def __init__(self, key, name):
self.key = key
self.name = name
trie = Trie(MyObj)
obj = MyObj(int_to_bitstring(10, 4), "Node 10")
trie.add(obj.key, obj)
trie.find(obj.key).name # "Node 10"
trie.n_closest(obj.key, 1) # [obj]
trie.prefix_match(obj.key[:2]) # [obj]
Note that the find(key)
method is similar to the contains(key)
method, but returns the associated metadata
if any, instead of returning a bool
.
The n_closest(key, n)
method is similar to the n_closest_keys(key, n)
method, but returns the list of metadata
associated with the closest keys, instead of the list of keys.
It is possible to assign some boolean variables to metadata
objects, and make use of them using predicate in n_closest()
, n_closest_keys()
, prefix_match()
and prefix_match_keys()
methods to filter the results.
class MyObj(object):
def __init__(self, key, name, some_bool):
self.key = key
self.name = name
self.some_bool = some_bool
trie = Trie(MyObj)
nodeIDs = [0, 1, 2] # ["0000", "0001", "0010"]
for i in nodeIDs:
obj = MyObj(int_to_bitstring(i, 4), "Node "+str(i), i % 2 == 0)
trie.add(obj.key, obj)
trie.n_closest_keys("0001", 2, predicate=lambda n: n.some_bool) # ["0000", "0010"]
trie.prefix_match_keys("000", predicate=lambda n: n.some_bool) # ["0000"]
# Note that the key "0001" matched both requests, but wasn't taken into
# consideration as it doesn't satisfy the predicate
There are 4 helpers functions to facilitate the use of this implementation with keys being not only bitstring
, but also bytes
or int
. These helper functions help translate bytes
and int
to bitstring
and reciprocally.
def bytes_to_bitstring(data: bytes, l: int=8*len(data)) -> bytes:
...
bytes_to_bitstring(b'\xff\x00') # "1111111100000000"
bytes_to_bitstring(b'\xf3',l=4) # "0011"
def bitstring_to_bytes(bs: str) -> bytes:
...
bitstring_to_bytes("1111111100000000") # b'\xff\x00'
bitstring_to_bytes("0011") # b'\x03'
def int_to_bitstring(i, l: int) -> bytes:
...
int_to_bitstring(6, 4) # "0110"
int_to_bitstring(6, 16) # "0000000000000110"
def bitstring_to_int(bs: str) -> int:
...
bitstring_to_int("0110") # 6
bitstring_to_int("0000000000000110") # 6
The following functions help uniquely encode a specific binary prefix, that are bistrings of arbitrary size. The bitstrings can be encoded in int
or in unsigned varint
. The mapping goes as follow:
bitstring code varint
"" -> 0 -> 00000000 (0x00)
"0" -> 1 -> 00000001 (0x01)
"1" -> 2 -> 00000010 (0x02)
"00" -> 3 -> 00000011 (0x03)
...
"0000000" -> 127 -> 01111111 (0x7f)
"0000001" -> 128 -> 10000000 00000001 (0x8001)
...
def encode_bitstring(bitstring: str) -> int:
...
encode_bitstring("1") # 2
encode_bitstring("010") # 9
encode_bitstring("0000001") # 128
def decode_bitstring(code: int) -> str:
...
decode_bitstring(2) # "1"
decode_bitstring(9) # "010"
decode_bitstring(128) # "0000001"
def bitstring_to_varint(bitstring: str) -> bytes:
...
bitstring_to_varint("1") # b'\x02'
bitstring_to_varint("010") # b'\x09'
bitstring_to_varint("0000001") # b'\x80\x01'
def varint_to_bitstring(varint_bytes: bytes) -> str:
...
varint_to_bitstring(b'\x02') # "1"
varint_to_bitstring(b'\x09') # "010"
varint_to_bitstring(b'\x80\x01') # "0000001"