Performance difference between files getting opened with IOContext.RANDOM vs IOContext.READ during merges #13920
Comments
Thanks for opening the issue. I already made a similar suggestion in another PR and also on the mailing list. I'd go that route and temporarily change the IOContext to SEQUENTIAL. This may of course slow down random reads, but on the other hand, once the whole file is merged away (and was therefore read) it should be in the FS cache anyway. If not, you have too little memory; like @s1monw says: "Add more RUM" :-) Users of the old segment which was merged away will only use it until the next IndexReader reopen, so by signaling that we read it only once it's a good idea to get rid of it from the cache soon. So my proposal is:
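A minimal sketch of the proposed pattern: flip the advice to SEQUENTIAL when the merge starts, optionally revert when it finishes. All names here (ReadAdvice, updateReadAdvice, FakeInput) are illustrative stand-ins, not Lucene's actual API.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the proposal: before merging, switch the file's
// read advice to SEQUENTIAL so the merge's linear scan is cache-friendly,
// then revert afterwards. Not Lucene code; names are made up for illustration.
public class MergeAdviceSketch {
    enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL }

    static class FakeInput {
        // Shared mutable advice, standing in for a per-file madvise setting.
        final AtomicReference<ReadAdvice> advice;
        FakeInput(ReadAdvice initial) { this.advice = new AtomicReference<>(initial); }
        void updateReadAdvice(ReadAdvice a) { advice.set(a); }
    }

    public static void main(String[] args) {
        FakeInput vecFile = new FakeInput(ReadAdvice.RANDOM); // .vec opened for search
        vecFile.updateReadAdvice(ReadAdvice.SEQUENTIAL);      // merge starts
        System.out.println("during merge: " + vecFile.advice.get());
        vecFile.updateReadAdvice(ReadAdvice.RANDOM);          // merge done: revert
        System.out.println("after merge: " + vecFile.advice.get());
    }
}
```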
@jpountz ping.
That makes sense to me. I wonder if we actually need to revert the advice back to normal in the end, or if we can optimistically assume that it's unlikely we'll need to reclaim that RAM for something else before the next refresh picks up this segment. In terms of hooking into existing APIs, the
@uschindler On a high level it makes sense to me. I have a couple of questions so that I understand this better:
I am interested in understanding when it's appropriate to clone. Based on the javadoc for IndexInput, for multithreaded use IndexInput must be cloned. My understanding is that merges will have a separate thread. Since the Readers are pooled and the same instance is used, I cloned it when I was trying to benchmark the solution. I would appreciate it if you could give insights on why the IndexInput shouldn't be cloned in this case.
In the benchmarking code, I did not revert it back, thinking the reader will be closed and a new reader will be opened with the intended IOContext (in this case RANDOM). Would you be able to share insights on reverting it back, considering there will be a new reader?
Cloning a reader won't clone any input, which is a fully different thing. Don't worry about cloning or not for implementing this issue. The Javadocs and usage pattern (when to clone) are more about stateful use of IndexInputs (they have a read position which can't be updated from multiple threads). If you need pure random access, you can get a view on it as RandomAccessInput (which is sometimes used by Lucene). In all cases: the cloned inputs use exactly the same MemorySegments behind the scenes (in former days it was ByteBuffers; those were also duplicated for clones). What I wanted to say is: when you change the read advice, it will affect all clones, too. Therefore it is not needed to create a clone of the IndexInput. So basically it simplifies things: the CodecReader that is used for merging (and used at the same time also for searching) can just be instructed to change the read advice on its backing IndexInput. That's relatively simple to implement and won't affect the current behaviour of how merging works.
As this is just an advice: when done with merging, just revert back. Opening new readers is too expensive and mostly not useful. In addition, you can't control whether searchers using the index in parallel reopen the readers soon. @jpountz already discussed that. This largely depends on how often you reopen IndexReaders. So in general it would be a good idea to revert back to the original state if the IndexReader is still used for a longer time. This really depends on the usage pattern. Reverting to normal use is as simple to implement as the initial change. Just change the advice on any of the open IndexInputs, no matter if it is shared by multiple readers, and revert back at the end. That's totally uncoupled from the internals of IndexReaders.
Uwe
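An illustrative model (not Lucene code) of the point about clones: each clone keeps its own read position, but all clones share the same backing storage and the same read advice, so changing the advice through one clone is visible through all of them.

```java
import java.util.concurrent.atomic.AtomicReference;

// Mock of the clone semantics described above. FakeIndexInput is a made-up
// class: "backing" stands in for the shared MemorySegments, "advice" for the
// shared madvise setting, and "position" for the per-clone read position.
public class CloneSharingSketch {
    enum ReadAdvice { RANDOM, SEQUENTIAL }

    static class FakeIndexInput {
        final byte[] backing;                     // shared, like MemorySegments
        final AtomicReference<ReadAdvice> advice; // shared across clones
        long position = 0;                        // per-clone state

        FakeIndexInput(byte[] backing, AtomicReference<ReadAdvice> advice) {
            this.backing = backing;
            this.advice = advice;
        }

        FakeIndexInput cloneInput() {
            return new FakeIndexInput(backing, advice); // fresh position, shared rest
        }
    }

    public static void main(String[] args) {
        FakeIndexInput original = new FakeIndexInput(
            new byte[16], new AtomicReference<>(ReadAdvice.RANDOM));
        FakeIndexInput forMerge = original.cloneInput();

        forMerge.position = 8;                      // does NOT affect the original
        forMerge.advice.set(ReadAdvice.SEQUENTIAL); // DOES affect the original

        System.out.println("original position: " + original.position);
        System.out.println("original advice: " + original.advice.get());
    }
}
```

This is why no clone is needed to implement the issue: changing the advice on any one input is enough.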
Understood, so by not cloning we avoid any ambiguity about whether the original and other clones are getting affected or not. Not cloning makes it clear that any thread using the reader will be affected. With regards to reverting, it's restoring the previous state. The wrapper approach sounds good to me, I will try it out. Thanks @uschindler and @jpountz for your inputs!
@uschindler and @jpountz thanks for your inputs and detailed explanation. @shatejas I think all the required details are present, so are you going to raise a PR for this? @uschindler and @jpountz, one more thing I would like to confirm once the PR is raised: to what version of Lucene can the change be backported? Ideally I would like to get it backported to version 9.12. But I would like to know your inputs too.
@uschindler one question on this: is the reason you say opening new readers is expensive that readers mostly open a new IndexInput on a file? Also, one more question not particularly related to this GH issue: does doing multiple times
Yeah, I am working on it. I have the changes and I am trying to figure out a good way to benchmark Lucene.
Please stick with the approach: plan to implement an API to tell an existing IndexReader to switch to "merge" mode, and the underlying codec can then optionally change the madvise for the already-open IndexInputs. When done, switch back.
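A sketch of what that API shape could look like. The interface name, method, and reader class below are hypothetical, invented only to illustrate the "merge mode" toggle; the actual Lucene API may differ.

```java
// Hypothetical "merge mode" API: the reader used for merging exposes a
// toggle, and the codec decides whether it can change the advice on its
// already-open inputs. All names here are illustrative, not Lucene's.
public class MergeModeSketch {
    interface MergeAware {
        void setMergeMode(boolean merging);
    }

    static class FakeFlatVectorsReader implements MergeAware {
        String advice = "RANDOM"; // default advice for vector search

        @Override
        public void setMergeMode(boolean merging) {
            // The codec optionally re-advises its open inputs here.
            advice = merging ? "SEQUENTIAL" : "RANDOM";
        }
    }

    public static void main(String[] args) {
        FakeFlatVectorsReader reader = new FakeFlatVectorsReader();
        reader.setMergeMode(true);   // merge starts
        System.out.println("merge: " + reader.advice);
        reader.setMergeMode(false);  // merge done, back to search mode
        System.out.println("search: " + reader.advice);
    }
}
```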
@uschindler, sorry for the confusion. The plan was never to open multiple files. The question was more of a general question from my side to understand the behavior. The implementation plan is still the same one you recommended. I hope this clarifies things.
Description
For the past month we have been testing the performance difference between files opened with IOContext.RANDOM vs IOContext.READ, especially during merges, with Lucene version 9.11.1 and OpenSearch version 2.17. We started this deep-dive when we saw an increase in time for our force merges.
Background
The OpenSearch k-NN plugin provides vector search capabilities for OpenSearch. The architecture of the k-NN plugin is very similar to how Lucene implements vector search, with a few small but key differences. Before OpenSearch version 2.17, OpenSearch used .dvd files to store the raw vectors, and graph files were stored separately. These graphs are not built using Lucene HNSW but with external libraries like Faiss.
In OpenSearch version 2.17, we started using KNNFlatVectorFormat to store and read the raw vectors in place of .dvd files, as reading vectors from .vec files as float[] is more efficient than reading byte[] and then converting to float[]. Ref: opensearch-project/k-NN#1853
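The efficiency claim above can be illustrated with plain java.nio buffers standing in for the actual file I/O: a bulk float read avoids the intermediate byte[] copy and per-element decode step.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustration (not the actual Lucene/k-NN code paths): reading vector data
// directly into a float[] versus reading a byte[] first and converting.
public class FloatReadSketch {
    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocate(4 * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (float f : new float[] {1f, 2f, 3f, 4f}) raw.putFloat(f);
        raw.flip();

        // Path 1: bulk read straight into float[] via a float view.
        float[] direct = new float[4];
        raw.asFloatBuffer().get(direct);

        // Path 2: read raw bytes, then convert — an extra copy and decode pass.
        byte[] bytes = new byte[16];
        raw.get(bytes);
        float[] converted = new float[4];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer().get(converted);

        System.out.println("values equal: " + (direct[2] == converted[2]));
    }
}
```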
Observations
After making the switch, we observed that our merge time for a 10M 768D dataset increased by 20%. We did an extensive deep-dive/experiments on the root cause (ref1, ref2) and the code difference between the .dvd and .vec file formats, and were able to see that IOContext.RANDOM on .vec files is causing this regression.
This regression comes because during merges, for every Lucene99FlatVectorsReader, several operations happen, like checkIntegrity (which checksums the whole file) and reading all vector values to create the new segment, which are sequential rather than random reads.
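To see why an integrity check is inherently a sequential read, here is a minimal illustration with java.util.zip.CRC32 over an in-memory buffer standing in for the .vec file (Lucene's actual checksum and file access code differ; this only shows the access pattern):

```java
import java.util.zip.CRC32;

// A checksum must stream the entire file from start to end — a strictly
// linear access pattern that benefits from SEQUENTIAL rather than RANDOM
// madvise. The byte[] here is a stand-in for the on-disk .vec file.
public class ChecksumSketch {
    public static void main(String[] args) {
        byte[] file = new byte[1 << 20]; // pretend this is a 1 MiB .vec file
        CRC32 crc = new CRC32();
        int chunk = 8192;
        for (int off = 0; off < file.length; off += chunk) {
            // Each update reads the next chunk in order; no seeks, no skips.
            crc.update(file, off, Math.min(chunk, file.length - off));
        }
        System.out.println("bytes scanned: " + file.length);
    }
}
```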
I do believe that having a RANDOM madvise on the .vec file is more beneficial for search and graph merges, as Lucene uses this file as a way to store raw vectors for HNSW. BTW, this PR added the capability of the RANDOM IOContext for .vec files: #13267.

Solutions Tried:
We have been trying multiple solutions (all on Lucene version 9.11.1) and have been in touch with @uschindler and @mikemccand over [email protected]:

- Opening the .vec file … added extra latency for Lucene HNSW search. We saw a 2x increase in latency because of this on a 10M 768D dataset. (Code ref)
- … SEQUENTIAL. One of the biggest cons is that we are creating 2 IndexInputs for the same file in different threads, which is not recommended as per this.

What is an ideal solution?
In my mind, an ideal solution is one that takes advantage of the different types of madvise and changes the madvise for the file based on need (if a merge of flat vectors is happening, use SEQUENTIAL; but if the HNSW graph is being built/searched, flip back to RANDOM). I am not sure what the consequences of this would be and would like to know what the community thinks about it. Similar to option 4.
Also, I do believe that since Lucene99FlatVectorsFormat now extends KNNVectorsFormat thanks to this PR: #13469, having the ability to change the madvise from consumers of this format is needed, so that Lucene99FlatVectorsFormat can be used independently and is not always tied to HNSW.
FAQ
On what machines were the benchmarks with the 10M 768D dataset performed?
Since OpenSearch is distributed Lucene, the setup was like this:
Were the benchmarks performed on the Lucene library independently?
No, we have not performed any benchmarks with the Lucene library independently, but I am working on a reproducible setup. If there is an easy way to set up and reproduce, please share.
cc: @shatejas, @vamshin, @jmazanec15