perf: teach SegmentReader
to lazily open/read its various SegmentComponents
#20
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This overhauls
SegmentReader
to put its various components behindOnceLock
s such that they can be opened and read on their first use, as oppoed when a SegmentReader is constructed -- which is once for every segment when an Index is opened.This has a negative impact on some of Tantivy's expectations in that an existing SegementReader can still read from physical files that were deleted by a merge. This isn't true now that the segment's physical files aren't opened until needed. As such, I've
#[ignore]
'd six tests that expose this problem.From our (pg_search's) side of things, we don't really have physical files and don't need to rely on the filesystem/kernel to allow reading unlinked files that are still open.
Overall, this cuts down a signficiant number of disk reads during pg_search's query planning. With my test data it goes from 808 individual reads totalling 999,799 bytes, to 18 reads totalling 814,514 bytes.
This reduces the time it takes to plan a simple query from about 1.4ms to 0.436ms -- roughly a 3.2x improvement.