[PERF]: better locking of uncommitted tracking maps (decrease compaction time by 3x) #2736
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving.
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
rust/index/src/fulltext/types.rs
@@ -92,60 +93,71 @@ impl<'me> FullTextIndexWriter<'me> {

    async fn populate_frequencies_and_posting_lists_from_previous_version(
        &self,
        token: &str,
        tokens: &Vec<Token>,
This is slightly weird, but the tantivy Token type is already leaking out of our abstraction, and this avoids having to remap the tokens to strs.
Not strongly opinionated here; I'm OK with going the remapping route.
    tracing::error!(
        "Error populating frequencies and posting lists from previous version"
    );
    return Err(FullTextIndexError::InvariantViolation);
I can add an extra Vec to track seen tokens in the above loop so we can retain this error, if it's important.
I think it is fine; it's not worth the extra overhead.
Thank you for taking care of the comments!
Description of changes
Instead of acquiring 2 locks for every token, we now acquire the locks at the document level. This improves compaction time by up to 3x.
I ran an experiment where I:
The partition size was 10,000.
Results (also includes changes from #2729 up to 14a744b):
Before:
After:
Here's a trace file for the two runs: locking-changes.trace.zip.
(Run 1 is after these changes, run 2 is before these changes.)
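To illustrate the locking change described above, here is a minimal sketch that contrasts per-token and per-document lock acquisition. It is not the code from this PR: `UncommittedMaps`, its two fields, and both methods are hypothetical names, and it uses the blocking `std::sync::Mutex` for simplicity, whereas the real writer may use a different lock type. Each method returns the number of lock acquisitions it performed so the difference is easy to see.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for the writer's uncommitted tracking maps.
pub struct UncommittedMaps {
    /// token -> number of occurrences recorded so far
    pub frequencies: Mutex<HashMap<String, u32>>,
    /// token -> list of (document id, position within the document)
    pub postings: Mutex<HashMap<String, Vec<(u32, u32)>>>,
}

impl UncommittedMaps {
    pub fn new() -> Self {
        Self {
            frequencies: Mutex::new(HashMap::new()),
            postings: Mutex::new(HashMap::new()),
        }
    }

    /// Per-token locking: both locks are taken and released for every token.
    /// Returns the number of lock acquisitions.
    pub fn add_document_per_token(&self, doc_id: u32, tokens: &[&str]) -> usize {
        let mut acquisitions = 0;
        for (position, &token) in tokens.iter().enumerate() {
            {
                let mut frequencies = self.frequencies.lock().unwrap();
                acquisitions += 1;
                *frequencies.entry(token.to_string()).or_insert(0) += 1;
            } // frequencies guard is dropped here
            {
                let mut postings = self.postings.lock().unwrap();
                acquisitions += 1;
                postings
                    .entry(token.to_string())
                    .or_default()
                    .push((doc_id, position as u32));
            } // postings guard is dropped here
        }
        acquisitions
    }

    /// Per-document locking: both locks are taken once and held while
    /// every token of the document is processed.
    /// Returns the number of lock acquisitions.
    pub fn add_document_per_document(&self, doc_id: u32, tokens: &[&str]) -> usize {
        // Always lock in the same order (frequencies, then postings)
        // so two writers cannot deadlock on each other.
        let mut frequencies = self.frequencies.lock().unwrap();
        let mut postings = self.postings.lock().unwrap();
        for (position, &token) in tokens.iter().enumerate() {
            *frequencies.entry(token.to_string()).or_insert(0) += 1;
            postings
                .entry(token.to_string())
                .or_default()
                .push((doc_id, position as u32));
        }
        2
    }
}
```

For a document with n tokens, the per-token variant performs 2n acquisitions while the per-document variant performs 2, and both leave the maps in the same state. The trade-off is that each lock is held for longer, so this helps most when many tokens are written per document and contention between writers is low. The sketch does not attempt to reproduce the timings reported in this PR.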
Test plan
How are these changes tested?
covered by existing tests
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?
n/a