-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CIDv2 idea: include the heights of trees in the CID #62
Comments
Have a read through #49 and you'll get a sense for the kinds of things people want out of a possible CIDv2. Particularly the last comment where @vmx has a link to a hackmd writeup of a proposal he championed. Something a lot of people want is the ability to have something arbitrarily parameterised, and in that framework a "depth" would certainly be an option. Aside from the challenges that come from getting to a CIDv2, I think your biggest problem is that lack of verification that a "depth" parameter gives. Kind of like the |
Hey Rod,
Thanks! I'll read through that.
So just to double check: with "depth" you mean counting the distance of a block with respect to the root? Yes, I agree that that's a property that you can only validate by traversing the entire DAG, as you can't rule out that a block isn't referenced by other parts of a DAG, which may affect the depth. In this specific case I'm interested in tracking the "height", which is the length of the longest downward path to a leaf from a given block. It's the inverse. As far as I know, that information can be verified by only considering the contents of a single block: If such a check is performed for every block in the DAG independently (similar to validating the hash), then you can be certain that the height of the overall DAG is correct as well.
In my case I'm explicitly interested in having it part of the CID, so that I can come up with a schema independent way of efficiently replicating DAGs. |
The other week I was brainstorming whether it would be possible to use IPLD as a potential data storage/transfer format for a future version of Bazel’s Remote Execution protocol, most notably its Content Addressable Storage (CAS). See bazelbuild/remote-apis#250 for details. Where this use case differs from the IPFS is that Bazel remote execution follows a more traditional client-server model. There is no immediate intent to use peer-to-peer sharing of objects.
One thing I was thinking about, was what an efficient algorithm for replicating a DAG from a client to a server (i.e., uploading source code to build), or from a server to a client (i.e., downloading build outputs) would look like. Considering that IPLD/IPFS relies on chunking more heavily compared to what Bazel does right now, it’d be pretty important for build clients/servers to use heavy parallelism to transfer such data across.
That said, you do want to place bounds to the amount of parallelism to prevent exhaustion in case of large data sets. It’s fine for a DAG to have large fanout, and it’s fine for a DAG to be deep. But if a DAG has large fanout and is very deep, then it might be necessary to limit the amount of parallelism traversing the DAG to avoid keeping too many partially replicated blocks in memory.
One piece of information that would be very useful to have to be able to implement a properly bounded parallel replication algorithm is tree height. If this information was attached to every link, then a replication algorithm could at any point make smart choices on how aggressively fan out when traversing.
Unfortunately, this information is not part of CIDv0/CIDv1, meaning that if a storage system using IPLD wants to use such information, it would need to track it out of band. Alternative, one could use IPLD with a custom link system, but that has all sorts of unfortunate implications, such as being unable to export build results into IPFS for archiving purposes.
My question is therefore whether a future version of CID, if ever created, could include tree height (in blocks) as well.
The text was updated successfully, but these errors were encountered: