Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Willow store #2693

Open
22 of 24 tasks
rklaehn opened this issue Sep 4, 2024 · 3 comments
Open
22 of 24 tasks

Willow store #2693

rklaehn opened this issue Sep 4, 2024 · 3 comments
Assignees
Labels
c-iroh-docs feat New feature or request

Comments

@rklaehn
Copy link
Contributor

rklaehn commented Sep 4, 2024

After a lot of discussion with Aljoscha last week I have decided to implement the willow store roughly as proposed in https://github.com/AljoschaMeyer/kv_3d_storage

TLDR: this has the upside that it is possible to quickly query 3d ranges, as well as (this is the point where it is superior to the radix tree) querying 3d ranges sorted by any of the 3 primary dimensions path, time and subspace.

The downside compared to the radix tree is that insertion requires roughly O(log n) path comparisons, and path comparisons itself are not constant time and can be expensive. Also, the paths are stored as a whole, unlike in a radix tree where they are prefix compressed. Prefix deletion will need a naive implementation where you just iterate and delete over everything below a prefix.

My current approach is to implement the tree query ops for a generic X, Y, Z (currently using u64 for testing), then implement insert and delete. Operations are tested using proptest property based testing.

I am currently working on this in a separate repository https://github.com/n0-computer/willow-store/tree/3d

Operations

  • summary(3drange)
    • operation
    • proptest
  • iterate_unordered(3drange)
    • operation
    • proptest
  • iterate_ordered(3drange, order) // order is xyz, yzx, zxy
    • operation
    • proptest
  • get
    • operation
    • proptest
  • batch build from vec
    • operation
    • proptest
  • insert (amortized log n with worst case n)
  • delete (amortized log n with worst case n)
  • (stretch goal) tree merge
  • (stretch goal) efficient retain like op

Plumbing

  • rank computation based on key
  • cheap (preferably zero copy) way to go from NodeData to db content
  • redb store impl

Docs

  • Data structure documented
  • Serialization format documented
@rklaehn rklaehn added feat New feature or request c-iroh-docs labels Sep 4, 2024
@rklaehn rklaehn self-assigned this Sep 4, 2024
@matheus23
Copy link
Contributor

Prefix deletion will need a naive implementation where you just iterate and delete over everything below a prefix.

Doesn't this have to happen on every insert? I.e. prefix deletion is exactly the same as inserting a tombstone at a prefix?

@rklaehn
Copy link
Contributor Author

rklaehn commented Sep 4, 2024

Doesn't this have to happen on every insert? I.e. prefix deletion is exactly the same as inserting a tombstone at a prefix?

Not sure what you mean with the tombstone, but yes, every insert will mean a query for possibly affected child nodes, iterating over them, and deleting those that have to go.

Twothree reasons for why this might not be quite as bad as it sounds:

  • in many (most?) cases the node you insert will be a leaf, so the query will return zero affected elements
  • for non-leaf queries initially will implement this as just iterate then delete, but longer term this can be done with a fn similar to retain that just filters the tree efficiently instead of deleting one by one
  • when writing to a DB, it often almost does not matter that much how much you change, just how many times you have to call sync. Unless it is a really huge change. So maybe this won't be that bad.

@AljoschaMeyer
Copy link

Another thought to keep in mind about naive prefix deletion vs more efficient radix-tree-based implementations: notifying subscribers about deletions might turn this into an O(n) (for n deleted entries) anyways. O-notation might be less important here than actual running times, but still, could be helpful to keep this in mind from the start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c-iroh-docs feat New feature or request
Projects
Status: 🏗 In progress
Development

No branches or pull requests

3 participants