-
-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avet index out of sync with datoms #386
Comments
Interesting! I suspect NaNs might be the reason, since they are pretty weird:
The fact that you only see difference in Can you try modified Datascript build? I suggest you insert NaN checks around here datascript/src/datascript/db.cljc Lines 335 to 343 in 82487ee
|
Hi again @tonsky! I didn't pursue the NaN issue yet, because my results so far didn't lead towards it being the root of the problem. Our NaNs are within maps that we store as values. I do think it's something around value comparison though. At the moment I'm able to isolate the transaction that takes a db's avet index to a broken state (I'll describe what a broken state is later), but attempting to do the transaction in isolation on a toy db does not seem to break it. The only thing odd about this transaction is that it has a number value instead of a string (like other existing entities) for a Regarding what constitutes a broken db, I define it as a db that's not equal to a clone of itself, where equality is defined by schemas, indexes, and datoms being the same: (ns relemma.shared.logic.datascript.equality
(:require
[clojure.data :as data]
[clojure.walk :as walk]
[datascript.core :as datascript]))
(defn NaN? [x]
;; NaN is the only number that isn't equal to itself.
;; NB: both (= ##NaN ##NaN) and (not= ##NaN ##NaN) are false in CLJ (but not CLJS) for some reason.
(and (number? x) (false? (= x x))))
(defn strip-NaN [x]
(walk/postwalk #(if (NaN? %) 0 %) x))
(defn datom->vec [[e a v t]]
[e a v t])
(defn normalized-datoms [db]
(->> db
seq
(map datom->vec)
set
strip-NaN))
(defn datom-diff [db1 db2]
(data/diff (normalized-datoms db1) (normalized-datoms db2)))
(defn datoms= [db1 db2]
(= (normalized-datoms db1) (normalized-datoms db2)))
(defn diff-idx [idx x y]
(data/diff (idx x) (idx y)))
(defn index= [db1 db2]
(let [aevt-diff (diff-idx :aevt db1 db2)
eavt-diff (diff-idx :eavt db1 db2)
avet-diff (diff-idx :avet db1 db2)]
(= (first aevt-diff) (second aevt-diff)
(first eavt-diff) (second eavt-diff)
(first avet-diff) (second avet-diff)
nil)))
(defn schema= [db1 db2]
(= (:schema db1) (:schema db2)))
(defn db= [db1 db2]
(and (schema= db1 db2)
(index= db1 db2)
(datoms= db1 db2)))
(defn clone-db [db]
;; NB: (seq db) is the same as (d/datoms db :eavt)
(datascript/init-db (seq db) (:schema db)))
(defn broken-db? [db] (not (db= db (clone-db db)))) In the case I describe above, the db before the transaction returns false from There's something else interesting about the avet index diff though. Below is the way by which the indexes between the db and the clone differ. I've redacted the uids, but I think the only relevant thing was that they didn't repeat across or inside the sets. Notice how the second set has the numeric uid in the middle.
For the second set, ;; before the numeric block uid d/entity doesn't work
(datascript.core/entity broken-db-clone [:block/uid "redacted-006"]) => nil
;; after the numeric block uid d/entity works
(datascript.core/entity broken-db-clone [:block/uid "redacted-020"]) => {:db/id 4100} This behaviour makes me thing there's something related to order going on here. Worth mentioning that the diff result set is always the same, so I don't think there's any race conditions going on. I'm still trying to make an isolated repro but like I mentioned the toy example I attempted did not reproduce the issue. I was hoping you might have some hints of what I could be looking for to narrow it down to a usable repro. |
Probably not a good idea either, same problem as with NaNs: they violate comparison invariants
|
That makes sense with what I'm seeing then. I guess what's happening is that one of the protocols for search that Datascript implements is hitting that comparison and then thinks the element isn't there at all? The question then is what the intended behaviour from Datascript is at this point. Should CLJS Datascript fail on transact (like the CLJ version) when the comparison invariants are violated, or should the user attempt to enforce this? |
I think it should fail, but it’s hard to figure out which values are comparable and which are not. That’s why it was left as a “loose constraint” for users to enforce. If values are not comparable, don’t put them in the same indexed attribute (non-indexed should be fine). If you have an idea how to enforce it, I’ll be happy to implement. |
Here's the minimal repro I could make without any dependencies other than datascript. It shows how ent lookups stop working for some entities, but not all, and how the avet index is no longer equal between cloned dbs. (ns filipe.playground
(:require
[datascript.core :as ds]
[clojure.data :as data]))
;; I'm not sure why only some sets of txs reproduce this problem, but
;; guess it has to do with how Datascript implements searches.
(def txs [[{:block/uid "2LB4tlJGy"}]
[{:block/uid "2ON453J0Z"}]
[{:block/uid "2KqLLNbPg"}]
[{:block/uid "2L0dcD7yy"}]
[{:block/uid "2KqFNrhTZ"}]
[{:block/uid "2KdQmItUD"}]
[{:block/uid "2O8BcBfIL"}]
[{:block/uid "2L4ZbI7nK"}]
[{:block/uid "2KotiW36Z"}]
[{:block/uid "2O4o-y5J8"}]
[{:block/uid "2KimvuGko"}]
[{:block/uid "dTR20ficj"}]
[{:block/uid "wRmp6bXAx"}]
[{:block/uid "rfL-iQOZm"}]
[{:block/uid "tya6s422-"}]
;; This is the transaction that breaks the db.
;; It has a number instead of a string for a unique attr.
[{:block/uid 45619}]])
(def schema {:block/uid {:db/unique :db.unique/identity}})
(def conn (ds/create-conn schema))
;; Make a working db by applying all txs but the one with a different type,
;; then also make the broken db.
(doseq [tx (butlast txs)]
(ds/transact! conn tx))
(def working-db @conn)
;; CLJ will throw on this transact (can't compare long to string), but CLJS will accept it.
(ds/transact! conn (last txs))
(def broken-db @conn)
;; We can find broken lookups by attempting to look up all the datoms.
(defn broken-ent-lookups [db]
(->> (seq db)
(map (fn [[_ a v]] [a v]))
(remove #(ds/entity db %))))
;; The broken db cannot lookup a few of the entities in its datoms.
(broken-ent-lookups broken-db) ;; => ([:block/uid "wRmp6bXAx"] [:block/uid "rfL-iQOZm"])
;; The working db doesn't have this problem.
(broken-ent-lookups working-db) ;; => ()
;; Just to reiterate how adding that last transaction broke existing lookups:
(ds/entity working-db [:block/uid "wRmp6bXAx"]) ;; => #:db{:id 13}
(ds/entity broken-db [:block/uid "wRmp6bXAx"]) ;; => nil
(ds/entity working-db [:block/uid "rfL-iQOZm"]) ;; => #:db{:id 14}
(ds/entity broken-db [:block/uid "rfL-iQOZm"]) ;; => nil
;; Another way to identify that the db is broken is by diffing the avet index on db clones.
(defn clone-db [db]
(ds/init-db (seq db) (:schema db)))
(defn diff-avet-idx [x y]
(data/diff (:avet x) (:avet y)))
;; The broken db diff will say there's two datoms that look identical but are exclusive to each index.
(diff-avet-idx (clone-db broken-db) (clone-db broken-db))
;; => [#{#datascript/Datom [16 :block/uid 45619 536870928 true]
;; #datascript/Datom [14 :block/uid "rfL-iQOZm" 536870926 true]}
;; #{#datascript/Datom [16 :block/uid 45619 536870928 true]
;; #datascript/Datom [14 :block/uid "rfL-iQOZm" 536870926 true]}
;; ...]
;; The working db diff will say the indexes don't have any differences.
(diff-avet-idx (clone-db working-db) (clone-db working-db))
;; => [nil nil ...] |
The first thing that comes to mind is But I'm not sure if this has some weird unforeseen interaction that ends up being a regression. And I think the principle behind this change should be to prevent cases where the indexes are broken without affecting any other cases. So maybe the better approach would be to verify the invariant, and throw if it is not ensured when adding the index entry. e.g. when adding |
Also relevant is how such a change would affect serialization with How would a user recover from such a case? Probably by trying to remove the problematic datoms. But I'm not sure if This is especially bad because a user can have a partially broken index without knowing, since the symptom is that some entities are unavailable for lookups, but others are still fine. |
The issue is very nasty, but tricky part is that it’s hard to figure out that values are comparable in Clojure. Same class might implement comparison incorrectly. Different classes might be comparable (e.g. Vector and Seq). Different classes might be not comparable and still return some value (string and number). It’s also tricky to decide where this check should be inserted. When adding new value, which values should I compare it with? With all already added is too expensive. Even a check during a binary search is taxing. The best would probably be to fix type in the schema, just like Datomic does. That would limit some valid use-cases (e.g. storing vectors and seqs next to each other, or storing user-defined types), but we’ll solve that issue. Unfortunately, this route is too backwards-incompatible to actually implement. |
I'm looking around the Datascript codebase to see if there's something I could do, and I noticed something interesting: trying to evaluate the code below in a CIDER cljs lein nodes repl actually errors out: (let [conn (d/create-conn {:block/uid {:db/unique :db.unique/identity}})
tx1 [{:block/uid "tya6s422-"}]
_ (d/transact! conn tx1)
tx2 [{:block/uid 45619}]
_ (d/transact! conn tx2)]
)
The verbiage around the real fns themselves is java, but the eval'd string looks like it's javascript. I'm not really sure what throws the error either. But this looked odd and interesting. |
Another different approach here might be to actually allows these "bad" comparisons via the fallback hash in I'm still trying to understand where the comparison goes wrong, because that hash compare is there, but this seems like an interesting approach. |
I'm looking at this fn:
Should this be using |
This compares attribute keywords themselves, not the values |
Oh I think I figured out why in #386 (comment) I saw the value comparison error - I was on an old version of the Datascript repo, 0.18.8 actually. In the latest master there's no error. So in 0.18.8 transacting different data types was actually an error but it's not anymore. Is this issue a regression then? |
Also figured out where the datascript/src/datascript/db.cljc Lines 368 to 373 in a769e0f
|
That's interesting, they are comparable by hash just fine. Then this probably has nothing to do with insertion, and is about the entity lookup instead. Maybe that's not using the hash comparison? Will look into that next. |
Sure. It’s not the question of comparing hashes, that’s the simple part. The hard part is figuring out that your default compare is no good and you need to fall back to hashes |
I'm a bit confused about that part. My impression from reading bits around (defn value-compare [x y]
(cond
(= x y) 0
#?@(:clj [(instance? Number x) (clojure.lang.Numbers/compare x y)])
#?@(:clj [(instance? Comparable x) (.compareTo ^Comparable x y)]
:cljs [(satisfies? IComparable x) (-compare x y)])
#?@(:cljs [(and (or (string? x) (array? x) (true? x) (false? x))
(identical? (type x) (type y))) (garray/defaultCompare x y)])
:else (- (hash x) (hash y)))) This in turn does the fallback to hashes for the string/number case, and seems to compare them correctly: (value-compare 45619 "tya6s422-") ;; => -458678613
(value-compare "tya6s422-" 45619) ;; => 458678613
(- (hash 45619) (hash "tya6s422-")) ;; => -458678613
(- (hash "tya6s422-") (hash 45619)) ;; => 458678613 So at first blush this seems like it should be fine? That's the part that confuses me. Is there some other value comparator elsewhere that I'm missing? |
Hmm, yes, if it gets to hash, it should be fine. |
So far I've traced the bad behaviour down to I added some logging to the comparator fn:
Then, still using the repro code in #386 (comment), I called the following:
I find odd here is that the comparator is being called a second set of times after finding the right thing in the first case. But then I looked at the source in https://github.com/tonsky/persistent-sorted-set/blob/e29c02cb1991e29f968c78eb9760566f3d7231b3/src-clojure/me/tonsky/persistent_sorted_set.cljs#L744-L768 and The other thing thing that crossed my mind was "it's common enough to get - operations the other way around so why don't I just try flipping the And that did the trick. ;; Now the broken-db lookups work instead of retuning nil
(d/entity working-db [:block/uid "wRmp6bXAx"]) ;; => #:db{:id 13}
(d/entity broken-db [:block/uid "wRmp6bXAx"]) ;; => #:db{:id 13}
(d/entity working-db [:block/uid "rfL-iQOZm"]) ;; => #:db{:id 14}
(d/entity broken-db [:block/uid "rfL-iQOZm"]) ;; => #:db{:id 13} I guess this never came up in tests because the problem was the flipped comparator value needed to take you down a wrong path in a b-tree to reproduce. Putting together a PR and tests. |
Thinking about it some more, I'm still confused as to why the hash comparison needed to be flipped. As long as it was self-consistent, there shouldn't be a problem right? |
I haven't done the PR because I'm pretty sure flipping the hash comparison is the wrong fix and would just move the problem to some other path of the binary tree. Not really sure how to proceed from here, but at this point the problem looks related to where some values are ending up in the binary tree given the comparator. It's interesting that the two lookups that fail share a lot of the comparisons with the insertion of the number value:
|
I suppose hash comparison is not going to work out. We lose transitivity:
E.g. comparing string-string and number-number gives you different order than comparing string-number or number-string. To make this order correct I think we should do this: compare type first (as string, for example), if equal, then hashes inside that type. That should give us a total order and get rid of the error. Can you test it out? Something like (defn value-compare [x y]
(cond
(= x y) 0
#?@(:clj [(instance? Number x) (clojure.lang.Numbers/compare x y)])
#?@(:clj [(instance? Comparable x) (.compareTo ^Comparable x y)]
:cljs [(satisfies? IComparable x) (-compare x y)])
(not (identical? (type x) (type y))) (garray/defaultCompare (type x) (type y))
#?@(:cljs [(or (string? x) (array? x) (true? x) (false? x)) (garray/defaultCompare x y)])
:else (- (hash x) (hash y)))) |
I think I see the transitivity problem. So the solution you're proposing is basically saying "anything in this type is always bigger/smaller than anything in that type", right? Going to give it a try. |
Yes, well put! |
Yes I can confirm that in the sample I put above this approach does indeed fix it. I'll put up a PR with the fix, and a test with the repro. |
Here it is, PTAL: #388 Also, it was very nice to have the back and forth here with you while trying to get to the bottom of this, thank you 😀 . |
Fixed in 1.0.7 |
Hi there 👋
I'm seeing some odd behaviour with Datascript where two databases with the same datoms and schema show different results for entity lookups.
For some background context, we keep a persistent log of serialized datascript transactions on a server and also save a serialized version of database every ~500 txs. Serialization used
datascript-transit
.When the app starts, it will load the most recent serialized db and then apply all txs since then.
We noticed this problem because the we saw a serialized db render differently than the previously db plus the applied transactions.
In the code examples below,
old-db
is the db that had a series of transactions applied to it in-memory, andnew-db
is that same db after a serialization round trip.To check these two databases were the same, I compared the datoms and schema.
I tracked it down to the entity lookup on these two databases.
I thought this was really weird so then I compared the indexes for a sanity check.
Worth mentioning in case it's relevant: this db includes a few
js/NaN
values that I filtered out of the results, and nested maps as values.I'm trying to get a standalone minimal repro up but this is not trivial since this DB contains a lot of sensitive information. I think I'm able to pare it down though.
I wanted to open the issue as soon as possible in case you had some tips on what might be happening or what I should focus on for the repro.
I couldn't find any other open issues that seemed to be the same as this one, but #373 sounds similar.
I think this is a bug because I expect databases that have the same datoms and schema to also have the same index, and because the entity lookup should return the entity if it exists.
If there are legitimate cases that don't conform to these expectations, please let me know.
Cheers, and thanks for the awesome database!
The text was updated successfully, but these errors were encountered: