This data structure can be used to store the history of visited paths or URLs with a file or web browser, in a way that no “forward” element is ever forgotten.
The history tree is “global” in the sense that multiple owners (e.g. tabs) can have overlapping histories. On top of that, an owner can spawn another one, starting from one of its nodes (typically when you open a URL in a new tab).
This global history tree structure reifies all this.
In many popular web browsers and file managers, the history is linear. This is unfortunate because it loses information such as the branches that are created when going back, then forward.
Take this linear history:
A -> B
We are on B
. If we go back to A
, then to C
, the linear history becomes
A -> C
We lost the information that we visited B
.
A tree history solves this issue by encoding the history as a tree-like data structure instead of a list. With a tree history, the above example would yield
A -> C
\-> B
In web browsers and file managers, content is browsed by what this library calls an owner (e.g. a tab). It’s common practice to create a new owner visiting content coming from an existing owner (e.g. opening a link in a new tab). This induces a relationship between owners.
In particular, this type of owner relationships means that the history of related owners can be related.
This library allows us to have such trees:
(X-A
(X-B1
(X-C1 X-C2))
(X-B2
(X-D1 Y-D2)))
The X-prefixed nodes belong to owner X, while the Y-ones belong to Y. X current node may be X-B2, while Y current node may be Y-D2.
X is said to be the creator
of Y, and Y-D2 is the origin
node of Y.
Y owns only 1 node, while X owns 6 nodes.
None of them overlaps, but wait.
With current owner being X, if we go forward to Y-D2, then we own it and it becomes the new current node of X. Now Y-D2 is owned both by X and Y.
Similarly, Y can “go back” to X-A, which becomes its current node, while X-B2 becomes the forward child of X-A for Y. If Y now visits X-B1, it becomes the forward child of X-A for Y (while X-B2 is still the forward child of X-A for X).
A node may have multiple children. For each owner, the “forward child” is the
default choice of which child to visit when calling the forward
function.
Observe that each node may have different forward children for each of their
owners.
- Data
- Whenever we refer to
data
, we mean the arbitrary content the user stores in the tree (such as URLs or paths). The data is automatically deduplicated. - Branch
- since “history tree” refers to the whole data structure, we avoid
using the term “tree” otherwise to avoid any confusion. Instead we talk about
“branches”.
Whenever we refer to a branch, we mean to whole set of inter-connected nodes
up to the root (a node without parent).
Note that the history tree may have multiple, non connected branches.
For entry
, node
, binding
, owner
, see the documentation of the respective
classes.
While the history tree is not immutable per se, it tries to retain as much information as possible. But keeping nodes forever would lead to an ever-growing tree, which is usually not desirable. So the policy is to delete all the nodes of a branch only when they become owner-less. So this happens only on owner deletion.
Everything else remains:
- Bindings (ownership) cannot be removed as long as the owner is not deleted.
- Nodes cannot be deleted other than by the aforementioned mechanism.
Entries can only be deleted with delete-data
if no node refers to the entry.
This can be inconvenient if there are many nodes used by many owners which refer
to the entries we would like to delete.
A few options:
delete-owner
removes an owner. If all the owners are removed from a branch, the branch is garbage-collected. If the entries that were pointed to by the branch nodes are not referenced in any other branch, the entries effectively become node-less and thus available for deletion.reset-owner
disowns all the nodes of a given owner and creates a new root node pointing to the current entry of the owner. This makes it possible to free nodes and entries without deleting an owner.
This library is not thread-safe. The user is expected to use a mechanism such as mutexes to guarantee the integrity of the global history tree.
Rationale: It’s only too common that the user wants to persist the global history tree to disk. In this case, some form of thread-safety should already be used for the persisted file. This safety can be trivially used to guarantee the integrity of the global history tree in memory as well.
The entries
must be unique in a sense that’s defined by the user.
For instance, if the user wants to store
(defclass web-page ()
((url :accessor url)
(title)))
entries, the title
might be irrelevant for uniqueness.
Thus, to store web-page
’s by unique URL, you can create a history
with the
url
accessor as a key:
(htree:make :key 'url)
When adding an entry with the same URL but with a different title, the existing entry’s title is automatically updated to the new one, but the entry object stored in the tree remains the same.
- Remove
NASDF
as a dependency.
- Switch from
hu.dwim.defclass-star
tonclasses
.
Initially it was decided to encode the set of unique entries as a hash-table for performance reasons. The reasoning was that hash-tables have constant-time access to their elements as opposed to the more practical Lisp lists, for which access is in linear time.
It turns out that element access in a list is extremely fast with SBCL, and a quick benchmark shows that it’s only when exceeding about 10.000.000 entries that hash tables start becoming more interesting. So maybe hash tables were not the best choice for a set that’s unlikely to have more than 100.000–1.000.000 entries.
Previously we explained how the uniqueness is customizable. In standard Common
Lisp, hash tables accept only eq
, eql
, equal
or equalp
as test function.
So to allow full customizability as in the previous example, we resort to the
cl-custom-hash-table library.
Custom hash tables have restricted the design somewhat. For instance, the
entries
hash table values are the entries themselves, so that we have a way to
access the stored keys in constant time. (Indeed, when you call (gethash
my-web-page entries)
, there is no guarantee that the matched key is identical
to my-web-page
.)
The global history tree strives to be as immutable as possible, as we explain in the sections on integrity and deletion. This helps both the developers and the users understand what’s going on, which is essential for such a complex data structure.
It could have been better to have a fully immutable data structure (in the functional programming sense), e.g. using the FSet library. It’s unclear whether the performance penalty would be too important. We would need some benchmark here.
One benefit of full immutability is that we can know precisely when the global
history tree was modified (e.g. when my-history
is reassigned to the
new history value). This allows us, for instance, to serialize only on
modification and thus avoid useless serializations, which may be expensive when
the history grows big.