-
-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VAET index #351
Comments
For all use-cases AVET covers what VAET does. I’m not sure what’s its purpose in Datomic is. If you have a use-case, I might consider adding it. Otherwise, it’ll just be a performance penalty |
VAET is only necessary when you don't know which reverse reference-type attributes might be applicable for a given entity. For a small DataScript-sized system this is probably never an issue. At a larger scale, with potentially millions of attributes, you need VAET in order to do explorative navigation of the graph efficiently (when you can't be sure which reverse attributes might be relevant beforehand). Without VAET you have to scan through all the possible combinations in AVET. |
This only arises when deleting entities, right? |
I'm saying that if you have many different I don't know exactly how VAET would be wired up internally, but I would expect to see something else more advanced than the EAVT filter here: datascript/src/datascript/db.cljc Line 504 in 6e80073
Again, I doubt most users would be storing enough attributes or overall data in DataScript to warrant changing anything. Most of the time people know exactly which attributes they need in their queries. For the curious, this is where near enough the same kind of scanning occurs for deleting entities: datascript/src/datascript/db.cljc Line 1278 in 6e80073
|
In the Datalog-derived system I built, VAET was used to determine which records had a particular attribute-value pair, where the cardinality of the attribute was low; I found it to be essential in deletion.
But this was for a multi-terabyte database with billions of tuples.
… On Jun 28, 2020, at 7:14 PM, Jeremy Taylor ***@***.***> wrote:
I'm saying that if you have many different :db.type/ref attributes then you can't query [?e _ :some-id] without naively scanning through all combinations in AVET.
I don't know exactly how VAET would be wired up internally, but I would expect to see something else more advanced than the EAVT filter here:
https://github.com/tonsky/datascript/blob/6e80073355ef35bf0b0d94afd3dd5d0f97ded2c1/src/datascript/db.cljc#L504
Again, I doubt most users would be storing enough attributes or overall data in DataScript to warrant changing anything. Most of the time people know exactly which attributes they need in their queries.
For the curious, this is where near enough the same kind of scanning occurs for deleting entities: https://github.com/tonsky/datascript/blob/6e80073355ef35bf0b0d94afd3dd5d0f97ded2c1/src/datascript/db.cljc#L1278
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
That would require VAET, of course, but I can’t imagine an use-case where anybody would need that type of query. The only use-case I come up with is deletion, and it’s a tradeoff — faster deletions or faster insertions (one less index to fill). |
Yeah, it's certainly a tradeoff. I suspect that without VAET the deletion time cost might be too unpredictable at a large-enough scale to maintain consistent throughput, whereas the space (+ time) costs of inserting into VAET are amortised which makes it a much safer default for a highly-distributed production system like Datomic. I think the non-deletion use-cases can be described as "exploratory" queries, where your data set is large (many attributes) and not well understood (can't predict which attributes might be relevant). I imagine Roam's knowledge graph "backlinks" feature might be a good use-case for benefiting from a native VAET (backlink!) index, if such a graph ever grew large enough (i.e. multi-user). Incidentally, I just came across these discussions that mention Datomic's ability to do wildcard reverse lookups with the Entity API via
Providing a similar index to VAET is something that the Crux team has been contemplating recently, motivated by our work on a generic entity navigation UI. Crux avoids the deletion issue by choosing not to enforce referential integrity, i.e. there are no explicit reference-type attributes, only values and IDs that might happen to correspond. I'll leave it at that for my contributions to this thread but hopefully it's useful context for some. I remember thinking about this very question several years ago, so it's good to have finally written it down :) |
Here is a use case that produces potentially large numbers of attributes and in which a query of the form of [?es _ e123] is relevant: One way to import large vectors (for example an ordered sequence of events) is to use “numbered” attributes. For example [event-source-id :345 event-id] where event-source is the entity representing the source of the ordered sequence of events and event is the event at index 345 in the vector. In this case, [?event-source _ specific-event-id] is a legitimate query clause to find the source metadata relating to a specific event. This use case uses potentially a large number of attributes. And this data design is the one of the few ones having similar performance characteristic to a vector in Clojure or array in other languages. This use case was important enough to be integrated in the RDF schema standard: |
Hi @tonsky , I am curious about your feedback on this point. I can easily extend the argumentation if you are interested... |
Thank you @bluesun, your argument is clear. I do not like the approach, it feels like misuse of the attribute field, it produces an unbounded amount of keywords, requires string concatenation/parsing if you want two ordered attributes. And, it works poorly with existing indexes structure :) |
Why the VAET index wasn't implemented?
Will be it useful to implement it?
The text was updated successfully, but these errors were encountered: