Replies: 3 comments
-
What is your accuracy measure? Normally, the 1-recall@1 should improve for fixed nprobe and nlist when the index size increases.
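For reference, a minimal sketch of how 1-recall@1 is typically measured against an exact brute-force ground truth. The array names, sizes, and the scaled-down factory string are illustrative only, not taken from the discussion:

```python
import numpy as np
import faiss

# Synthetic stand-ins: xb = database vectors, xq = query vectors (float32).
d = 128
rng = np.random.default_rng(0)
xb = rng.random((100_000, d), dtype=np.float32)
xq = rng.random((1_000, d), dtype=np.float32)

# Exact ground truth: true nearest neighbor of each query by brute force.
gt_index = faiss.IndexFlatL2(d)
gt_index.add(xb)
_, gt = gt_index.search(xq, 1)          # gt[i, 0] = true NN of query i

# Approximate search with an OPQ+IVF+PQ index (scaled-down configuration).
index = faiss.index_factory(d, "OPQ16_64,IVF1024,PQ16")
index.train(xb)
index.add(xb)
faiss.extract_index_ivf(index).nprobe = 16
_, approx = index.search(xq, 1)

# 1-recall@1: fraction of queries whose top result is the true nearest neighbor.
recall_at_1 = (approx[:, 0] == gt[:, 0]).mean()
print(f"1-recall@1 = {recall_at_1:.3f}")
```

Repeating this measurement at different database sizes with the same nprobe and nlist makes the comparison with the numbers quoted in the question straightforward.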
-
@mdouze, thanks for the fast answer)
-
Is the ground truth a brute-force search ground truth, or some application-level ground truth?
-
Summary
Hi!
In my project I have to deal with a rapidly growing collection of objects (the indexing process is ongoing), which is indexed into a Faiss index. To be more precise, the training set (the initial collection) of ~5-7 million objects is about 100 times smaller than the potential size of the whole collection (once indexing finishes). According to the guidelines, for this use case I should use something like OPQ32_128,IVF262144,PQ32. But it turns out that the search quality falls rapidly as the index grows: for a 10 million collection the recall is 85%, while for 52 million it drops to 69%. Seems a bit too fast to me)
I also discovered that only about 60% of all centroids are non-empty, which means the index is quite unbalanced. Could this be some kind of issue with the Faiss build I use?
Would really appreciate any advice on index choice and hyperparameter tuning or any other suggestions.
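On the centroid-occupancy point, here is a rough way to inspect how vectors are spread over the inverted lists of an IVF-based index (it works through pre-transforms such as OPQ). The demo data and the scaled-down factory string are illustrative; the configuration in the question would be OPQ32_128,IVF262144,PQ32 trained on the 5-7 million initial vectors:

```python
import numpy as np
import faiss

def ivf_occupancy(index):
    """Report how evenly vectors are spread over the inverted lists."""
    index_ivf = faiss.extract_index_ivf(index)
    sizes = np.array([index_ivf.invlists.list_size(i)
                      for i in range(index_ivf.nlist)])
    non_empty = int((sizes > 0).sum())
    print(f"non-empty lists: {non_empty}/{index_ivf.nlist} "
          f"({100.0 * non_empty / index_ivf.nlist:.1f}%)")
    print(f"largest list: {sizes.max()}, "
          f"mean non-empty size: {sizes[sizes > 0].mean():.1f}")

# Tiny demo with synthetic data and a scaled-down factory string.
d = 128
rng = np.random.default_rng(0)
xt = rng.random((50_000, d), dtype=np.float32)   # training vectors
xb = rng.random((200_000, d), dtype=np.float32)  # database vectors

index = faiss.index_factory(d, "OPQ16_64,IVF1024,PQ16")
index.train(xt)
index.add(xb)
ivf_occupancy(index)
```

With training data that matches the added vectors, most lists usually end up non-empty; a large fraction of empty lists often hints that the training set's distribution differs from the vectors added later, rather than a problem with the build itself.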
Faiss version: 1.7.3
Installed from: conda
Running on: CPU
Interface: Python