Inquiry on optimizing local cache sync when resuming a query #12270
Background
There is a prevalent technique used with the Firestore NoSQL database to reduce the number of billed document reads. It requires adding two fields to every document.
Whenever a document is updated, a new timestamp is issued. When a client reruns a query, only updated documents are received. This is done with the following steps:
if cache is empty, get from server:
The problem with the above is that pagination becomes complex, and you cannot delete documents from the server, because they are needed to tell clients that they have been deleted. Otherwise the cache will be out of date and the client will still see deleted documents. So we need to modify step 3 from above:
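A minimal sketch of such a soft-delete sync could look like the following. It assumes an in-memory stand-in for the server rather than the Firestore SDK, and every name in it (`Server`, `Doc`, `sync`, `changed_since`) is hypothetical. Timestamps are simple integers issued by the simulated server.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    data: dict
    last_updated: int      # logical timestamp of the last write
    deleted: bool = False  # soft-delete flag (tombstone)


@dataclass
class Server:
    """In-memory stand-in for a collection; not the Firestore SDK."""
    docs: dict = field(default_factory=dict)
    clock: int = 0
    reads: int = 0  # documents returned to clients, i.e. billed reads

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def write(self, doc_id: str, data: dict) -> None:
        self.docs[doc_id] = Doc(data, self._tick())

    def soft_delete(self, doc_id: str) -> None:
        # "Deleting" only flips the flag and bumps last_updated.
        doc = self.docs[doc_id]
        doc.deleted = True
        doc.last_updated = self._tick()

    def changed_since(self, ts: int) -> dict:
        # Similar in spirit to a query filtered on lastUpdated > ts.
        hits = {k: d for k, d in self.docs.items() if d.last_updated > ts}
        self.reads += len(hits)
        return hits


def sync(server: Server, cache: dict, last_sync: int) -> int:
    """Pull only changed documents into the cache; return the new cursor."""
    for doc_id, doc in server.changed_since(last_sync).items():
        if doc.deleted:
            cache.pop(doc_id, None)  # tombstone: drop from the local cache
        else:
            cache[doc_id] = doc.data
        last_sync = max(last_sync, doc.last_updated)
    return last_sync


server = Server()
for i in range(10):
    server.write(f"user{i}", {"n": i})

cache: dict = {}
cursor = sync(server, cache, 0)       # empty cache: all 10 documents are read
server.write("user3", {"n": 33})      # one update
server.soft_delete("user7")           # one soft delete
cursor = sync(server, cache, cursor)  # only the 2 changed documents are read
print(len(cache), server.reads)
```

In this simulation the second sync reads two documents instead of ten, and the tombstone is what lets the client drop `user7` from its cache.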
So now, deleting a document no longer removes it: the delete operation sets the 'deleted' field to true and updates the 'lastUpdated' timestamp.
Question
#11457 Firestore: Optimize local cache sync when resuming a query that had docs deleted
Also from the same link
It says exactly what I described above, but done automatically, so keeping deleted documents is no longer required: we can now delete documents completely on the server and the cache will still be synced. Right? So does that mean implementing the above technique is obsolete now?
Testing
I have 10 documents in the "Users" collection.
Am I actually getting only 2 documents from the server, with the SDK then merging them into the cache automatically, like it does in a listener?
Thank you.
Replies: 2 comments 1 reply
Thanks for the question, @KhaledShehadeh. I'll reply on Monday when I'm back to work.
tl;dr No, the PR you mentioned, #11457, does not make the lastUpdated/deleted workaround obsolete. That workaround still has the benefit of reducing billed document reads.
The optimization in #11457 improves upon how Firestore internally implements a solution to the problem that your workaround solves using the deleted property. It does not, however, provide the optimization gained by using the lastUpdated property.
Background Information
Whenever Firestore executes a query, the server's response includes a "resume token". This resume token is saved into the client's local persistence along with the document data that was received. If the query is later executed again, the client includes the saved resume token in the request sent to the server. If the server sees a resume token in the request, its response only includes documents that have been created or modified since the resume token was sent to the client.
The problem with deleted documents
This algorithm works fine as long as documents are never deleted (or modified such that they no longer match the query's filters). If a document is deleted, it is simply omitted from the server's response. The client then has no way to tell whether the document was deleted or simply was not modified since the resume token. To solve this, the server includes a document count in its response to indicate the total number of documents that matched the query, even though only a subset of those documents may have actually been sent by the server. The client can then compare this count with the count of documents in its local cache. If the counts match, everything is good; if they do not match, one or more documents must have been deleted. This is internally called an "existence filter mismatch".
The full requery and limbo resolution
The client first needs to figure out which documents were deleted, which were modified to no longer match the query's filters, and which were not modified at all. To do this, the client re-runs the entire query from scratch to get the full result set from the server. This is internally referred to as a "full requery", and it is the costly step that the PR attempts to avoid.
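The resume-token exchange and the count comparison described above can be sketched as a small simulation. This is not Firestore's implementation: the function names are hypothetical, the resume token is modelled as a plain version number, and documents that stop matching the query's filters are not modelled.

```python
def server_response(server_docs, resume_token):
    """Simulated server: send only documents changed after the resume token,
    plus the total number of documents that currently match the query."""
    changed = {name: ver for name, ver in server_docs.items() if ver > resume_token}
    return changed, len(server_docs)


def existence_filter_mismatch(cached_names, changed, count):
    """Client side: merge the delta into the cache, then compare counts."""
    merged = set(cached_names) | set(changed)
    return len(merged) != count


# The client cached a, b and c and holds a resume token of 3.
cached = {"a", "b", "c"}
token = 3

# Case 1: b was modified (version 4); nothing was deleted.
changed, count = server_response({"a": 1, "b": 4, "c": 3}, token)
no_delete = existence_filter_mismatch(cached, changed, count)

# Case 2: b was modified and c was deleted. The delta looks identical
# to case 1; only the count reveals that something is missing.
changed2, count2 = server_response({"a": 1, "b": 4}, token)
with_delete = existence_filter_mismatch(cached, changed2, count2)

print(changed == changed2, no_delete, with_delete)
```

The count says that something is missing but not which document, which is why a further step is needed to identify it.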
With the full result set, the client can determine which documents in its local cache were deleted or modified to no longer match the query's filters. To bring the local cache back into sync, the client issues individual document reads for each of these documents, which is internally called "limbo resolution". The server's response to the individual document reads tells the client whether the documents were deleted or modified to no longer match the query. The client updates its local cache and is then back in sync with the server.
How the PR solves the full requery using a bloom filter
The "full requery" mentioned above can be quite costly, both in terms of bytes sent over the network and the number of billed document reads. The PR #11457 nearly eliminates the need for the full requery, allowing the client to go straight to limbo resolution. It achieves this by adding a "bloom filter" to the server's response. The bloom filter contains the names of all documents that would have been returned by the full requery and, using that information, the client can determine which documents need to undergo limbo resolution without having to run a full requery. But since bloom filters are probabilistic data structures, occasionally they don't work as desired, and the client falls back to a full requery in these cases.
A problem in the workaround due to clock synchronization
One problem that I think the workaround you linked to at betterprogramming.pub suffers from is that of clock synchronization. That algorithm assumes that the clocks on all devices are synchronized; however, in practice that is not necessarily true. If the clocks on different devices are out of sync, the algorithm could miss updates. It could even be a problem if only 1 device is used, due to clock skew such as the time going back 1 hour due to daylight savings or the user manually changing the device's time. To fix this, the clients need to use a shared clock or shared distributed counter.
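The missed-update scenario can be illustrated with a small simulation. It assumes integer timestamps and a hypothetical `delta_query` helper that stands in for a query filtered on lastUpdated; it does not use the Firestore SDK.

```python
import itertools


def delta_query(docs, cursor):
    """Names of documents whose lastUpdated is newer than the cursor."""
    return {name for name, ts in docs.items() if ts > cursor}


# --- Device clocks: each writer stamps lastUpdated with its own clock ---
docs = {}
docs["a"] = 1000                    # device A writes at real time 1000
cursor = max(docs.values())         # a reader syncs and remembers 1000

skew = -60                          # device B's clock runs 60 units behind
docs["b"] = 1010 + skew             # real time 1010, but stamped as 950
missed = delta_query(docs, cursor)  # "b" is older than the cursor: missed

# --- Shared counter: every write takes the next value from one source ---
counter = itertools.count(1)
docs2 = {"a": next(counter)}
cursor2 = max(docs2.values())
docs2["b"] = next(counter)          # always larger than any earlier stamp
seen = delta_query(docs2, cursor2)

print(missed, seen)
```

With device clocks the later write is silently skipped, while with a single ordering source it always lands after the reader's cursor.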
Both a shared clock and a shared distributed counter would require the device to be online. Another solution could leverage Firestore's "serverTimestamp" field value. That way, when you create or modify the document, the lastUpdated value is assigned by the server rather than by the device's clock.
Anyways, I hope this information is helpful. Thanks again for the question. It's clear that you've done your research and did your best to understand the PR.