Inquiry on optimizing local cache sync when resuming a query #12270

KhaledShehadeh · 2024-01-13T15:49:41Z

Background

There is a common technique for the Firestore NoSQL database that reduces the number of billed document reads.

This requires two fields added to every document.

lastUpdated : Timestamp
deleted : Bool

Whenever a document is updated, its lastUpdated field is set to a new timestamp.

When a client reruns a query, it receives only the documents that were updated. This is done with the following steps:

1. From the cache, get the latest 'lastUpdated' value:

CollectionRef.order(by: "lastUpdated", descending: true).limit(to: 1)

2. From the server, get the documents updated on or after that date:

CollectionRef.whereField("lastUpdated", isGreaterThanOrEqualTo: latestDate)

3. From the cache, get the merged documents:

CollectionRef.getDocuments()

4. If the cache is empty, get from the server instead:

CollectionRef.whereField("deleted", isEqualTo: false)

For more details, check out this article:
https://betterprogramming.pub/firebase-firestore-cut-costs-by-reducing-reads-edfccb538285

The problem with the above is that pagination becomes complex, and you cannot delete documents from the server, because they are needed to tell clients that they have been deleted. Otherwise the cache goes out of date and clients still see deleted documents.

So we need to modify step 3 from above:
3. From the cache, get only the documents that are not marked as deleted:

CollectionRef.whereField("deleted", isEqualTo: false)

So now, deleting a document no longer removes it. Instead, it switches the 'deleted' field to true and updates the 'lastUpdated' timestamp.
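
To make the flow concrete, here is a minimal in-memory sketch of the technique in plain Swift. It does not use the Firestore SDK: `SyncDoc`, `FakeServer`, `LocalCache`, and `syncCollection` are hypothetical names, timestamps are modeled as plain integers, and `docsReturned` only counts documents handed back by the fake server, which is an assumption and not Firestore's actual billing rule.

```swift
// Hypothetical in-memory model of the lastUpdated/deleted technique.
// Nothing here is Firestore API; integers stand in for Timestamp values.
struct SyncDoc {
    let id: String
    var name: String
    var lastUpdated: Int
    var deleted: Bool
}

// Stand-in for the server-side collection.
struct FakeServer {
    var docs: [String: SyncDoc] = [:]
    var docsReturned = 0  // rough proxy for reads, not a billing formula

    // Mirrors whereField("lastUpdated", isGreaterThanOrEqualTo: latest).
    mutating func fetchUpdated(since latest: Int) -> [SyncDoc] {
        let result = docs.values.filter { $0.lastUpdated >= latest }
        docsReturned += result.count
        return result
    }

    // Mirrors whereField("deleted", isEqualTo: false) for an empty cache.
    mutating func fetchAllLive() -> [SyncDoc] {
        let result = docs.values.filter { !$0.deleted }
        docsReturned += result.count
        return result
    }
}

// Stand-in for the client's local cache.
struct LocalCache {
    var docs: [String: SyncDoc] = [:]

    // Step 1: latest 'lastUpdated' value known locally, if any.
    var latestTimestamp: Int? {
        return docs.values.map { $0.lastUpdated }.max()
    }

    mutating func merge(_ incoming: [SyncDoc]) {
        for doc in incoming { docs[doc.id] = doc }
    }

    // Modified step 3: hide soft-deleted documents.
    var visible: [SyncDoc] {
        return docs.values.filter { !$0.deleted }.sorted { $0.id < $1.id }
    }
}

func syncCollection(cache: inout LocalCache, server: inout FakeServer) -> [SyncDoc] {
    if let latest = cache.latestTimestamp {
        cache.merge(server.fetchUpdated(since: latest))  // step 2
    } else {
        cache.merge(server.fetchAllLive())               // step 4
    }
    return cache.visible
}
```

Because step 2 uses "greater than or equal", the document that holds the latest timestamp is fetched again on every sync, so in this model a resync returns at least one document even when nothing changed.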

Question

#11457 Firestore: Optimize local cache sync when resuming a query that had docs deleted

This update is linked in the Firebase iOS SDK release notes, version 10.12.0, under 'Cloud Firestore', 'FEATURE':
https://firebase.google.com/support/release-notes/ios

Also from the same link, the entry in the Firebase iOS SDK release notes, version 10.19.0, under 'Cloud Firestore', 'FIXED', is relevant to my question too.

It seems to describe exactly what I described above, but done automatically, so keeping deleted documents is no longer required: we can now delete documents completely on the server and the deletion will be synced. Right?

So does that mean implementing the above technique is obsolete now?

Testing

I have 10 documents in the "Users" collection.
I fetch all documents and receive 10.
I change 2 documents.
I rerun the query and receive 10 again.

Am I actually getting only 2 documents from the server, with the SDK then merging them with the cache automatically, like it does in a listener?
So am I being billed for 2 documents or 10 here? From my understanding of the above linked #11457, I am only getting 2.

@dconeybe

Thank you.

Answered by dconeybe

Jan 15, 2024

dconeybe · 2024-01-13T16:05:56Z

Collaborator

Thanks for the question, @KhaledShehadeh. I'll reply on Monday when I'm back to work.

dconeybe · 2024-01-15T18:05:30Z

Collaborator

tl;dr No, the PR you mentioned, #11457, does not make the lastUpdated/deleted workaround obsolete. That workaround still has the benefit of reducing billed document reads.

The optimization in #11457 improves how Firestore internally implements a solution to the problem that your workaround solves using the deleted property. It does not, however, provide the read savings gained by using the lastUpdated property.

Background Information

Whenever Firestore executes a query, the server's response includes a "resume token". This resume token is saved into the client's local persistence along with the document data that was received. If the query is later executed again then the client includes the saved resume token in the request sent to the server. If the server sees a resume token in the request then its response only includes documents that have been created or modified since the resume token was sent to the client.
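
As a rough illustration of the idea (not the SDK's implementation), here is a toy model in plain Swift. Real resume tokens are opaque values produced by the backend; modeling the token as an integer write counter, and the names `VersionedDoc`, `QueryResponse`, `ToyServer`, and `ToyClient`, are assumptions made only for this sketch.

```swift
// Toy model only: a resume token is modeled as the server's write counter.
struct VersionedDoc {
    let id: String
    var data: String
    var updateVersion: Int  // counter value when the document was last written
}

struct QueryResponse {
    let changed: [VersionedDoc]  // documents created or modified since the token
    let resumeToken: Int         // token the client saves for the next run
}

struct ToyServer {
    var docs: [String: VersionedDoc] = [:]
    var version = 0

    mutating func write(id: String, data: String) {
        version += 1
        docs[id] = VersionedDoc(id: id, data: data, updateVersion: version)
    }

    // Without a token everything is returned; with a token only newer writes.
    func runQuery(resumeToken: Int?) -> QueryResponse {
        let since = resumeToken ?? 0
        let changed = docs.values
            .filter { $0.updateVersion > since }
            .sorted { $0.id < $1.id }
        return QueryResponse(changed: changed, resumeToken: version)
    }
}

struct ToyClient {
    var cache: [String: VersionedDoc] = [:]
    var savedToken: Int? = nil

    // Runs the query, merges the response into the cache, saves the new
    // token, and returns how many documents the server actually sent.
    mutating func run(against server: ToyServer) -> Int {
        let response = server.runQuery(resumeToken: savedToken)
        for doc in response.changed { cache[doc.id] = doc }
        savedToken = response.resumeToken
        return response.changed.count
    }
}
```

In this model, a client that has already fetched 10 documents and reruns the query after 2 of them were rewritten is sent only those 2, while its cache still holds all 10.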

The problem with deleted documents

This algorithm works fine, as long as documents are never deleted (or modified such that they no longer match the query's filters). If a document is deleted then it is simply omitted from the server's response. The client, then, has no way to tell if the document was deleted or if it simply was not modified since the resume token. To solve this, the server includes a document count in its response to indicate the total number of documents that matched the query, even though only a subset of those documents may have actually been sent by the server. The client, then, can compare this count with the count of documents in its local cache. If the count matches then everything is good; however, if the count does not match then one or more documents must have been deleted. This is internally called an "existence filter mismatch".
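
The count comparison can be sketched in a few lines of Swift. The function name and parameters below are hypothetical, and the real client tracks considerably more state than two sets of IDs.

```swift
// Sketch of the count check only; names and shapes are assumptions.
// cachedIDs: documents the client already holds for this query.
// changedIDs: documents included in the server's (partial) response.
// serverMatchCount: total number of matching documents reported by the server.
func hasExistenceFilterMismatch(cachedIDs: Set<String>,
                                changedIDs: Set<String>,
                                serverMatchCount: Int) -> Bool {
    // After merging the response, this is what the client believes matches.
    let localIDs = cachedIDs.union(changedIDs)
    return localIDs.count != serverMatchCount
}
```

For example, with three cached documents, an empty response, and a server count of 2, the function returns true: one cached document must have been deleted or stopped matching, but the count alone does not say which one.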

The full requery and limbo resolution

The client first needs to figure out which documents were deleted, which were modified to no longer match the query's filters, and which were not modified at all. To do this, the client re-runs the entire query from scratch to get the full result set from the server. This is internally referred to as a "full requery", and is the costly thing that the PR attempts to avoid. With the full result set, the client can determine which documents in its local cache were deleted or modified to no longer match the query's filters. To bring the local cache back into sync, the client issues individual document reads for each of these documents, which is internally called "limbo resolution". The server's response to the individual document reads tells the client if the documents were deleted or modified to no longer match the query. The client updates its local cache and is then back in sync with the server.
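
Here is a simplified sketch of those two phases in plain Swift. `LimboOutcome`, `limboDocumentIDs`, and `resolveLimbo` are hypothetical names, and the `existsOnServer` closure is a stand-in for an individual document read, so treat this as an illustration of the logic and not as the SDK's code.

```swift
// Illustrative only; the real client works on richer document state.
enum LimboOutcome: Equatable {
    case deleted          // the document no longer exists on the server
    case noLongerMatches  // the document exists but left the query's result set
}

// Cached documents that are missing from the full requery result are "in limbo".
func limboDocumentIDs(cachedIDs: Set<String>, fullResultIDs: Set<String>) -> Set<String> {
    return cachedIDs.subtracting(fullResultIDs)
}

// Limbo resolution: one individual lookup per limbo document.
// `existsOnServer` is a hypothetical stand-in for a single-document read.
func resolveLimbo(_ limboIDs: Set<String>,
                  existsOnServer: (String) -> Bool) -> [String: LimboOutcome] {
    var outcomes: [String: LimboOutcome] = [:]
    for id in limboIDs {
        outcomes[id] = existsOnServer(id) ? .noLongerMatches : .deleted
    }
    return outcomes
}
```

With cached IDs u1 through u4 and a full result of u1 and u3, the limbo set is u2 and u4, and each of those costs one more lookup to classify.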

How the PR solves the full requery using a bloom filter

The "full requery" mentioned above can be quite costly, both in terms of bytes sent over the network and the number of billed document reads. The PR #11457 nearly eliminates the need for the full requery, allowing the client to go straight to limbo resolution. It achieves this by adding a "bloom filter" to the server's response. The bloom filter contains the names of all documents that would have been returned by the full requery, and, using that information, the client can determine which documents need to undergo limbo resolution without having to run a full requery. But since bloom filters are probabilistic data structures, occasionally they don't work as desired and the client falls back to a full requery in these cases.
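
To show why a bloom filter is enough to pick limbo candidates, here is a toy version in plain Swift. It is not the filter format the server sends: the FNV-1a hashing, the sizes, and the names `ToyBloomFilter`, `limboCandidates`, and `needsFullRequery` are assumptions for illustration, and the count-based fallback check is just one plausible way to notice a false positive.

```swift
// Toy bloom filter; hashing scheme and sizes are assumptions for illustration.
struct ToyBloomFilter {
    private var bits: [Bool]
    let hashCount: Int

    init(bitCount: Int, hashCount: Int) {
        precondition(bitCount > 0 && hashCount > 0)
        self.bits = Array(repeating: false, count: bitCount)
        self.hashCount = hashCount
    }

    // Deterministic FNV-1a hash so every party computes the same bit positions.
    private static func fnv1a(_ text: String, seed: UInt64) -> UInt64 {
        var hash: UInt64 = 0xcbf29ce484222325 ^ seed
        for byte in text.utf8 {
            hash ^= UInt64(byte)
            hash = hash &* 0x100000001b3
        }
        return hash
    }

    // Double hashing: the i-th position is (h1 + i * h2) mod bitCount.
    private func indexes(for name: String) -> [Int] {
        let h1 = Self.fnv1a(name, seed: 0)
        let h2 = Self.fnv1a(name, seed: 0x9e3779b97f4a7c15) | 1
        return (0..<hashCount).map { i in
            Int((h1 &+ UInt64(i) &* h2) % UInt64(bits.count))
        }
    }

    mutating func insert(_ name: String) {
        for index in indexes(for: name) { bits[index] = true }
    }

    // false means "definitely absent"; true means "possibly present".
    func mightContain(_ name: String) -> Bool {
        return indexes(for: name).allSatisfy { bits[$0] }
    }
}

// Cached documents that the filter says are definitely absent go to limbo.
func limboCandidates(cachedNames: [String], filter: ToyBloomFilter) -> [String] {
    return cachedNames.filter { !filter.mightContain($0) }
}

// If the counts still disagree after removing the candidates, the filter hid
// a deleted document behind a false positive, so fall back to a full requery.
func needsFullRequery(cachedCount: Int, limboCount: Int, serverMatchCount: Int) -> Bool {
    return cachedCount - limboCount != serverMatchCount
}
```

A bloom filter never reports an inserted name as absent, so a document that still matches the query is never sent to limbo by mistake. The failure mode is the opposite one: a deleted document can look "possibly present", which is the case the fallback covers.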

A problem in the workaround due to clock synchronization

One problem that I think the workaround you linked to at betterprogramming.pub suffers from is that of clock synchronization. That algorithm assumes that the clocks on all devices are synchronized; however, in practice that is not necessarily true. If the clocks on different devices are out of sync then the algorithm could miss updates. It could even be a problem if only 1 device is used due to clock skew, such as the time going back 1 hour due to daylight savings or the user manually changing the device's time. To fix this, the clients need to use a shared clock or shared distributed counter. Both of these would require the device to be online. Another solution could leverage Firestore's "serverTimestamp" field value. That way, when you create or modify the lastUpdated field, instead of setting it to a concrete value you set it to FieldValue.serverTimestamp(). Then, when the document is saved to the server the value will be replaced by the server's globally-shared and globally consistent timestamp. The challenge would be that clients would not know this value until the write is committed and the write response from the server is received (which could be a long time if the device is offline).
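
A small plain-Swift sketch of the failure mode, with timestamps modeled as integer seconds. The names `StampedDoc`, `updatedSince`, `stampWithDeviceClock`, and `stampWithServerClock` are hypothetical, and the "server clock" is simply modeled as the true time, which is the role FieldValue.serverTimestamp() would play.

```swift
// Toy model of clock skew; integers stand in for timestamps in seconds.
struct StampedDoc {
    let id: String
    let lastUpdated: Int
}

// Mirrors the workaround's "lastUpdated >= latest" delta query.
func updatedSince(_ latest: Int, in docs: [StampedDoc]) -> [String] {
    return docs.filter { $0.lastUpdated >= latest }.map { $0.id }
}

// A write stamped with the writer's own clock, which may be skewed.
func stampWithDeviceClock(id: String, trueTime: Int, deviceSkew: Int) -> StampedDoc {
    return StampedDoc(id: id, lastUpdated: trueTime + deviceSkew)
}

// A write stamped by one shared clock (modeled here as the true time).
func stampWithServerClock(id: String, trueTime: Int) -> StampedDoc {
    return StampedDoc(id: id, lastUpdated: trueTime)
}
```

If a reader's latest known timestamp is 1000 and another device writes at true time 1010 with a clock that is an hour behind, the write is stamped below 1000 and the delta query never returns it. The same write stamped by the shared clock is picked up.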

Anyways, I hope this information is helpful. Thanks again for the question. It's clear that you've done your research and did your best to understand the PR.

1 reply

KhaledShehadeh Jan 16, 2024
Author

Thank you for this complete answer.
