Replies: 3 comments
-
What is your accuracy measure? Normally, the 1-recall@1 should improve for fixed nprobe and nlist when the index size increases.
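For reference, a minimal sketch of how 1-recall@1 is typically measured against an exact brute-force ground truth. The array names, sizes, and the scaled-down factory string are illustrative only, not taken from the discussion:

```python
import numpy as np
import faiss

# Synthetic stand-ins: xb = database vectors, xq = query vectors (float32).
d = 128
rng = np.random.default_rng(0)
xb = rng.random((100_000, d), dtype=np.float32)
xq = rng.random((1_000, d), dtype=np.float32)

# Exact ground truth: true nearest neighbor of each query by brute force.
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, gt = gt_index.search(xq, 1)          # gt[i, 0] = true NN of query i

# Approximate search with an OPQ+IVF+PQ index (scaled-down configuration).
index = faiss.index_factory(d, "OPQ16_64,IVF1024,PQ16")
index.train(xb)
index.add(xb)
faiss.extract_index_ivf(index).nprobe = 16
_, approx = index.search(xq, 1)

# 1-recall@1: fraction of queries whose top result is the true nearest neighbor.
recall_at_1 = (approx[:, 0] == gt[:, 0]).mean()
print(f"1-recall@1 = {recall_at_1:.3f}")
```

Repeating this measurement at different database sizes with the same nprobe and nlist makes the comparison with the numbers quoted in the question straightforward.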
-
@mdouze, thanks for the fast answer)
-
Is the ground truth a brute-force search ground truth, or some application-level ground truth?
-
Summary
Hi!
In my project I have to deal with a rapidly growing collection of objects (the indexing process is ongoing), which is indexed into a Faiss index. To be more precise, the training set (the initial collection) of ~5-7 million objects is about 100 times smaller than the potential size of the whole collection (once indexing finishes). According to the guidelines, for this use case I should use something like OPQ32_128,IVF262144,PQ32. But it turns out that the search quality falls rapidly as the index grows: for a 10 million collection the recall is 85%, while for 52 million it drops to 69%. Seems a bit too fast to me)
I also discovered that only about 60% of all centroids are non-empty, which means the index is quite unbalanced. Could this be some kind of issue with the Faiss build I use?
Would really appreciate any advice on index choice and hyperparameter tuning or any other suggestions.
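On the centroid-occupancy point, here is a rough way to inspect how vectors are spread over the inverted lists of an IVF-based index (it works through pre-transforms such as OPQ). The demo data and the scaled-down factory string are illustrative; the configuration in the question would be OPQ32_128,IVF262144,PQ32 trained on the 5-7 million initial vectors:

```python
import numpy as np
import faiss

def ivf_occupancy(index):
    """Report how evenly vectors are spread over the inverted lists."""
    index_ivf = faiss.extract_index_ivf(index)
    sizes = np.array([index_ivf.invlists.list_size(i)
                      for i in range(index_ivf.nlist)])
    non_empty = int((sizes > 0).sum())
    print(f"non-empty lists: {non_empty}/{index_ivf.nlist} "
          f"({100.0 * non_empty / index_ivf.nlist:.1f}%)")
    print(f"largest list: {sizes.max()}, "
          f"mean non-empty size: {sizes[sizes > 0].mean():.1f}")

# Tiny demo with synthetic data and a scaled-down factory string.
d = 128
rng = np.random.default_rng(0)
xt = rng.random((50_000, d), dtype=np.float32)   # training vectors
xb = rng.random((200_000, d), dtype=np.float32)  # database vectors

index = faiss.index_factory(d, "OPQ16_64,IVF1024,PQ16")
index.train(xt)
index.add(xb)
ivf_occupancy(index)
```

With training data that matches the added vectors, most lists usually end up non-empty; a large fraction of empty lists often hints that the training set's distribution differs from the vectors added later, rather than a problem with the build itself.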
Faiss version: 1.7.3
Installed from: conda
Running on: CPU
Interface: Python