Replies: 7 comments 6 replies
-
Do we have any concrete use cases for cyclic graphs? I admite I don't have a comprehensive understanding of all the ways in which we would be using RDF here (thinking in the context of a practical use case or application), but i struggle to think of use cases that require cyclic graphs for us. Blockchain is indeed a different context than W3C is normally used to operating in, so I do believe that we probably are operating under a different set of constraints than the standards bodies here.... that much doesn't surprise me. But i'm also feeling wary of us brewing our own everything... We are already down the path of doing our own protobuf code generation (which I think sounds like the right decision), but I would love for us to be able to actually leverage open source tooling if we're adopting open source standards (such as RDF generally)... The nice thing about using subsets of RDF that we support though, is that the inplementation of tooling around it can be more customized and better suited towards our actual needs (and ideally be smaller and more maintainable as well). @aaronc can you give some examples of OWL ontologies or other RDF graphs that we would want to store on Regen Ledger that have cycles? What are the end-user tradeoffs of not supporting cyclic graphs? If there aren't any real use cases for cyclic graphs in Regen's use case, then it's really a question of evaluating the engineering costs of writing / maintaining such tooling. |
Beta Was this translation helpful? Give feedback.
-
As @clevinson pointed out, I believe we first need to evaluate whether or not we wanna support cyclic graphs. I'd also be curious to hear about such examples. I'm not feeling super confident in working on our own implementation restricted to acyclic graphs given our resources and other projects at Regen, although I don't have a clear estimation on the work needed in this case. Could you speak a bit more to that? That being said, SHACL doesn't seem to have implementation in Go so we might need to implement our own anyway. If we go for more computationally complex canonicalization and schema validation, then doing some benchmarking would be useful. |
Beta Was this translation helpful? Give feedback.
-
Actually I think it's possible to process a subset of SHACL with reasonable time complexity if everything is indexed in maps the right way... |
Beta Was this translation helpful? Give feedback.
-
So one thing I didn't mention here is that I would like to add another (optional) step after canonicalization which is storing the canonicalized quads in a merkle tree (see https://www.notion.so/regennetwork/RDF-Graphs-and-Datasets-0313390483fd41d1aa0c02cb6caacec9#c4ce32ed0a8d451d9e7d1677f21c6b28). This will allow revealing parts of graphs provably without revealing the full graph (as a privacy use case). I did some benchmarks comparing the cost of URDNA2015 canonicalization vs building an SMT merkle tree (which we are evaluating as a replacement for IAVL). (Benchmark code here: https://github.com/regen-network/regen-ledger/blob/aaronc/rdf-c14n-benchmarks/x/data/rdf_test.go) This is a rough test and I think SMT could be optimized more, but basically the cost of building the SMT tree is about 2x the cost of running URDNA2015 in my benchmarks... What I'm taking from this is that maybe URDNA2015 is fine as is considering its cheaper than the other thing I wanted to do when verifying hashes... Also, the first place for optimizing the canonicalization might be a faster hash function rather than an algorithm which restricts graphs for a speed up... So maybe we more or less try to stick with the standards route (URDNA2015 + a subset of SHACL) and see how that looks in terms of code complexity for reviewing?? I am thinking it would be a custom implementation of both (as I'm not comfortable just including json-gold in the state machine as is since it's not designed with blockchains in mind). |
Beta Was this translation helpful? Give feedback.
-
Did a quick spike on what a SHACL (or partial SHACL) implementation might look like and I think this model is probably feasible and can be made highly concurrent... so I'm less concerned about performance. (Code here if you're curious: https://github.com/regen-network/regen-ledger/compare/aarond/rdf-schema-spike). |
Beta Was this translation helpful? Give feedback.
-
@aaronc and I just spoke about this today in a bit more detail. Seems like through his findings there aren't that many performance concerns with using the standard algorithms (URDNA2015 + subset of SHACL). It sounds like it will be a lot easier for us to get started as well by just leveraging these standards, and seeing that the original concern was mostly about computational complexity (which now seems less an issue), i'm fine with us just proceeding with the standards described here and not having to worry about deviating with our own specific requirements. @blushi How does that sound? |
Beta Was this translation helpful? Give feedback.
-
So I realize for verifying URDNA2015, it might be reasonable to not actually very that quads are fully normalized on-chain, just that they are sorted and use proper This would mean that someone could store a dataset on-chain that just randomly assigned Does that seem reasonable as a starting point? Then I don't even need to have an audited URDNA2015 implementation, just the SHACL subset code. |
Beta Was this translation helpful? Give feedback.
-
The most widely accepted method for canonicalizing RDF is URDNA2015 and SHACL is the only official W3C recommendation for RDF schema validation.
I have a few concerns about trying to use these models on-chain mainly related to computational complexity.
Take a look at the golang code for canonicalization: https://github.com/piprate/json-gold/blob/master/ld/api_normalize.go. It's fairly complex though maybe that's not a deal breaker. This relatively simple RDF takes about 36 microseconds to canonicalize. Either way we would need to audit the implementation and possibly customize it for gas &/or efficiency.
An alternative would be to place a restriction on graphs that they are acyclic. For instance a graph like the one above is basically the acyclic JSON structure:
It should be possible to verify that an acyclic graph has proper blank node canonicalization in linear time. So I think we'd be talking about an order of magnitude improvement in performance and significantly simpler code. The downside of this approach is that 1) not all RDF graphs are acyclic, even common graphs like OWL ontologies generally have cycles and 2) there are no existing client implementations - URDNA2015 isn't super widespread but it has some level of adoption.
Regarding schemas, I can get into more details later but I basically the same arguments apply. A validation language that applies to generalized RDF graphs (with cycles) like SHACL will be more computationally complex than a schema for an acyclic graph (such as JSON schema) which can probably be evaluated in linear time. Again, there is the same issue in that the RDF community is trying to adopt SHACL and deviating from that would require different client implementations. I believe, however, that implementing all of SHACL on chain would be quite a stretch and that likely we would implement a subset at best, or maybe something similar that covers our use cases.
Thoughts?
Beta Was this translation helpful? Give feedback.
All reactions