Move vector search from IndexInput to RandomAccessInput #13938

jpountz · 2024-10-21T07:04:11Z

Description

Vector search currently loads vectors from disk by issuing a seek() followed by a readFloats(). We should instead:

- Add an absolute readFloats() method to RandomAccessInput.
- Refactor the latest vector search file format to use RandomAccessInput instead of IndexInput to read vectors from disk.
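To make the difference between the two read styles concrete, here is a minimal sketch. It does not use Lucene's classes: `StatefulInput` and `AbsoluteInput` are hypothetical stand-ins backed by a `ByteBuffer`, and the absolute `readFloats(long, float[], int, int)` signature is only an assumption about what the proposed method might look like.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical stand-ins (not Lucene classes) contrasting stateful and absolute float reads. */
public class VectorReadSketch {

  /** Stateful style: seek() moves a cursor that the next readFloats() depends on. */
  static final class StatefulInput {
    private final ByteBuffer data;
    private int position; // mutable cursor, shared by every caller of this instance

    StatefulInput(ByteBuffer data) {
      this.data = data;
    }

    void seek(long pos) {
      position = Math.toIntExact(pos);
    }

    void readFloats(float[] dst, int offset, int len) {
      for (int i = 0; i < len; i++) {
        dst[offset + i] = data.getFloat(position); // absolute get on the buffer
        position += Float.BYTES;                   // but the cursor still advances
      }
    }
  }

  /** Absolute style: the file offset is an argument, so no cursor is kept between calls. */
  static final class AbsoluteInput {
    private final ByteBuffer data;

    AbsoluteInput(ByteBuffer data) {
      this.data = data;
    }

    // Assumed signature, for illustration only.
    void readFloats(long pos, float[] dst, int offset, int len) {
      int base = Math.toIntExact(pos);
      for (int i = 0; i < len; i++) {
        dst[offset + i] = data.getFloat(base + i * Float.BYTES);
      }
    }
  }

  /** Packs the vectors back to back as little-endian floats. */
  static ByteBuffer encode(float[][] vectors) {
    int total = 0;
    for (float[] v : vectors) {
      total += v.length;
    }
    ByteBuffer buf = ByteBuffer.allocate(total * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
    for (float[] v : vectors) {
      for (float f : v) {
        buf.putFloat(f);
      }
    }
    return buf;
  }

  public static void main(String[] args) {
    int dim = 3;
    ByteBuffer data = encode(new float[][] {{1f, 2f, 3f}, {4f, 5f, 6f}});
    long offsetOfVector1 = 1L * dim * Float.BYTES;

    // Current pattern: two calls, with hidden state in between.
    float[] viaSeek = new float[dim];
    StatefulInput stateful = new StatefulInput(data);
    stateful.seek(offsetOfVector1);
    stateful.readFloats(viaSeek, 0, dim);

    // Proposed pattern: one call, position passed explicitly.
    float[] viaAbsolute = new float[dim];
    new AbsoluteInput(data).readFloats(offsetOfVector1, viaAbsolute, 0, dim);

    System.out.println(Arrays.toString(viaSeek) + " " + Arrays.toString(viaAbsolute));
  }
}
```

Both paths read the same vector; the only difference in this sketch is whether the reader instance carries a cursor between calls.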


dungba88 · 2024-10-31T08:58:16Z

Hi, I'm learning Lucene KNN and this seems to be a workable PR for a beginner. Just curious about the motivation behind this change: is it only for cleaner code, or are we also supposed to get a latency improvement from the absolute readFloats() method compared to the current seek() + readFloats()?

msokolov · 2024-10-31T12:22:44Z

I think this will be helpful since currently we cannot share these readers across threads -- they retain the state information about the current position. Not sure how much benefit that will be since they must still typically maintain some local temporary storage to retain the value that is read

dungba88 · 2024-11-01T06:11:29Z

> I think this will be helpful since currently we cannot share these readers across threads -- they retain the state information about the current position. Not sure how much benefit that will be since they must still typically maintain some local temporary storage to retain the value that is read

Gotcha, the current usage of seek() + readFloats() requires the reader to keep the seek position. When we change to RandomAccessInput, we expect the operation to have no side effects on the reader, so readers will be shareable.

dungba88 · 2024-11-06T03:43:03Z

I looked at some implementations of RandomAccessInput, such as BufferedIndexInput. This particular class holds a single buffer for all reads, so it cannot be shared. If we used a temporary buffer per read (to make it shareable), it would defeat the purpose of the single buffer, which is to avoid excessive temporary buffers and GC. So side effects in reads are unavoidable.
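A small sketch of the point above: an absolute read can still mutate the reader when it goes through one shared internal buffer. `SharedBufferSketch` is a hypothetical class written for illustration, not Lucene's BufferedIndexInput, and the `refills` counter exists only to make the side effect visible.

```java
/** Hypothetical sketch (not Lucene code): an absolute read that still mutates shared state. */
public class SharedBufferSketch {
  private final byte[] file;     // stands in for the underlying file
  private final byte[] buffer;   // the single internal buffer reused by every read
  private long bufferStart = 0;  // file offset of buffer[0]
  private int bufferLength = 0;  // number of valid bytes currently in the buffer
  int refills = 0;               // illustration only: counts buffer rewrites

  SharedBufferSketch(byte[] file, int bufferSize) {
    this.file = file;
    this.buffer = new byte[bufferSize];
  }

  /** No cursor is kept, yet a buffer miss overwrites the buffer that other callers rely on. */
  byte readByte(long pos) {
    long index = pos - bufferStart;
    if (index < 0 || index >= bufferLength) {
      // Miss: reposition and refill the shared buffer (the side effect).
      bufferStart = pos;
      bufferLength = (int) Math.min(buffer.length, file.length - pos);
      System.arraycopy(file, (int) pos, buffer, 0, bufferLength);
      refills++;
      index = 0;
    }
    return buffer[(int) index];
  }

  public static void main(String[] args) {
    byte[] file = new byte[64];
    for (int i = 0; i < file.length; i++) {
      file[i] = (byte) i;
    }
    SharedBufferSketch in = new SharedBufferSketch(file, 8);
    in.readByte(0);  // miss: buffer now covers offsets 0..7
    in.readByte(3);  // hit: served from the buffer
    in.readByte(40); // miss: buffer now covers offsets 40..47
    System.out.println("refills = " + in.refills);
  }
}
```

If two threads called `readByte` on the same instance, one thread's refill could replace the buffer contents between the other thread's bounds check and its read, which is why a reader built this way cannot simply be shared.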

jpountz added the type:task label Oct 21, 2024

dungba88 added a commit to dungba88/lucene that referenced this issue Nov 8, 2024

Move vector search from IndexInput to RandomAccessInput (apache#13938)

b97aadb

This was referenced Nov 8, 2024

[DRAFT] Move vector search from IndexInput to RandomAccessInput (#13938) dungba88/lucene#28

Closed

[DRAFT] Change vector input from IndexInput to RandomAccessInput #13981

Open
