Optimize slice calculation in IndexSearcher a little #13860
Conversation
Fewer volatile reads, less indirection, and a fast path for when there's no executor. Also, saving some copies, sorting an array instead of a list, and saving allocations all around. This PR is obviously not a big win, but in aggregate it's quite measurable and mostly deals with tiny regressions introduced recently. So opening this as a suggestion for dealing with that boiling frog :) Luceneutil over 40 rounds does show small but significant improvements.
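To illustrate the "sorting array instead of list" part of the description: copying the leaves once into an array and sorting in place avoids an intermediate list and its iteration overhead. A minimal sketch, with a made-up `Leaf` record standing in for Lucene's leaf contexts (this is not the actual code):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SortExample {
  // Hypothetical stand-in for a leaf; not Lucene's LeafReaderContext.
  record Leaf(int maxDoc) {}

  static Leaf[] sortedBySizeDescending(List<Leaf> leaves) {
    // Copy once into an array and sort in place, instead of building an
    // intermediate ArrayList and sorting that: one fewer copy overall.
    Leaf[] arr = leaves.toArray(new Leaf[0]);
    Arrays.sort(arr, Comparator.comparingInt(Leaf::maxDoc).reversed());
    return arr;
  }
}
```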
.map(LeafReaderContextPartition::createForEntireSegment)
.toList()));
for (List<LeafReaderContextPartition> currentGroup : groupedLeafPartitions) {
  slices[upto] = new LeafSlice(currentGroup);
I am fine with extracting the partitioning bit to a separate method. Ideally that is made as a single mechanical change though; the diff is hard to review otherwise.
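For reference, the kind of mechanical extraction being asked for could look like the sketch below; `Partition` and `Slice` are simplified stand-ins, not Lucene's `LeafReaderContextPartition` and `LeafSlice`:

```java
import java.util.List;

class SlicingExample {
  // Simplified stand-ins for Lucene's partition and slice types.
  record Partition(int ord) {}
  record Slice(List<Partition> partitions) {}

  // The loop from the diff, pulled out as a single mechanical change:
  // each group of partitions becomes one slice.
  static Slice[] slicesFromGroups(List<List<Partition>> groupedLeafPartitions) {
    Slice[] slices = new Slice[groupedLeafPartitions.size()];
    int upto = 0;
    for (List<Partition> currentGroup : groupedLeafPartitions) {
      slices[upto++] = new Slice(currentGroup);
    }
    return slices;
  }
}
```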
return res;
}
private synchronized LeafSlice[] computeAndCacheSlices() { |
This looks ok to me, and potentially simpler than the supplier. Nitpicking, I would prefer that this were done alone in its own PR. This does not affect when the below error gets thrown compared to before, right? It is still thrown the first time the slices are retrieved.
Right, behavior is completely unchanged. It still provides the same guarantees around ordering as before :)
Cool, can you open a separate PR for it? Easier to review that way.
Sure thing, finally got around to it in #13893 :)
I approved it, sorry about the lag. I will look again at this one once it's rebased.
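For context, the pattern under discussion above is a lazily computed slice cache: one volatile read on the hot path, with a synchronized slow path that computes and publishes the array once. A sketch under assumed names (this is not the actual IndexSearcher code):

```java
import java.util.List;

class SliceCacheExample {
  record Slice(List<Integer> leafOrds) {}

  private volatile Slice[] cachedSlices; // read once per search on the hot path

  Slice[] getSlices() {
    // Single volatile read; only fall into the synchronized slow path
    // the first time slices are requested.
    Slice[] slices = cachedSlices;
    return slices != null ? slices : computeAndCacheSlices();
  }

  private synchronized Slice[] computeAndCacheSlices() {
    Slice[] slices = cachedSlices; // re-check under the monitor
    if (slices == null) {
      slices = computeSlices();
      cachedSlices = slices; // safe publication via the volatile write
    }
    return slices;
  }

  private Slice[] computeSlices() {
    // Placeholder: any exception thrown here still surfaces the first
    // time slices are retrieved, matching the behavior discussed above.
    return new Slice[] {new Slice(List.of(0))};
  }
}
```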
final C collector = collectorManager.newCollector();
collectors.add(collector);
final Weight weight = createWeight(rewrite(query, scoreMode.needsScores()), scoreMode, 1);
if (leafSlices.length == 1) { |
I don't know how I feel about specializing the single-slice codepath and having a searchMultipleSlices method. I would prefer to avoid that, I think, and share the same codepath between these two scenarios.
I think I'm a big fan of having separate paths here. Single-slice vs. multi-slice execution simply has an extreme performance impact depending on the query.
With wikimedium in Luceneutil, this is running 8 queries in parallel across 8 threads, slicing into up to 8 slices vs. always slicing into a single slice:
Task                          QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff  p-value
HighIntervalsOrdered 2.64 (5.0%) 0.64 (1.3%) -75.7% ( -78% - -73%) 0.000
HighSpanNear 5.97 (4.4%) 2.01 (0.9%) -66.4% ( -68% - -63%) 0.000
MedSloppyPhrase 7.31 (5.7%) 2.76 (1.6%) -62.2% ( -65% - -58%) 0.000
HighTermTitleBDVSort 12.58 (6.3%) 4.90 (2.4%) -61.1% ( -65% - -55%) 0.000
MedSpanNear 14.89 (6.4%) 6.93 (1.2%) -53.4% ( -57% - -48%) 0.000
LowIntervalsOrdered 20.75 (11.2%) 10.13 (1.8%) -51.2% ( -57% - -43%) 0.000
IntNRQ 39.41 (15.3%) 23.34 (8.5%) -40.8% ( -56% - -19%) 0.000
MedIntervalsOrdered 32.83 (8.3%) 19.60 (3.9%) -40.3% ( -48% - -30%) 0.000
HighSloppyPhrase 29.22 (7.4%) 18.36 (2.9%) -37.2% ( -44% - -28%) 0.000
OrHighMedDayTaxoFacets 8.60 (5.6%) 5.63 (3.8%) -34.5% ( -41% - -26%) 0.000
AndHighHighDayTaxoFacets 15.40 (8.5%) 10.55 (2.7%) -31.5% ( -39% - -22%) 0.000
AndHighMedDayTaxoFacets 27.94 (7.9%) 22.57 (2.4%) -19.2% ( -27% - -9%) 0.000
LowPhrase 45.52 (12.6%) 38.08 (1.8%) -16.4% ( -27% - -2%) 0.000
AndHighHigh 36.50 (14.6%) 30.68 (5.2%) -15.9% ( -31% - 4%) 0.000
MedTermDayTaxoFacets 15.19 (8.4%) 13.38 (5.1%) -11.9% ( -23% - 1%) 0.000
OrHighHigh 48.10 (14.5%) 43.38 (8.4%) -9.8% ( -28% - 15%) 0.019
BrowseDateSSDVFacets 1.20 (6.7%) 1.11 (12.1%) -7.3% ( -24% - 12%) 0.035
BrowseMonthTaxoFacets 11.02 (38.3%) 10.28 (40.7%) -6.7% ( -61% - 117%) 0.634
BrowseDayOfYearTaxoFacets 5.24 (10.1%) 5.03 (7.0%) -3.9% ( -19% - 14%) 0.200
BrowseDateTaxoFacets 5.17 (10.2%) 4.96 (7.0%) -3.9% ( -19% - 14%) 0.207
LowSpanNear 62.43 (5.1%) 60.97 (3.1%) -2.3% ( -10% - 6%) 0.117
Wildcard 78.35 (5.8%) 76.52 (2.3%) -2.3% ( -9% - 6%) 0.137
MedPhrase 42.79 (8.5%) 42.43 (2.3%) -0.8% ( -10% - 10%) 0.702
BrowseRandomLabelTaxoFacets 4.27 (4.3%) 4.28 (5.3%) 0.4% ( -8% - 10%) 0.824
PKLookup 242.91 (1.7%) 245.39 (2.7%) 1.0% ( -3% - 5%) 0.203
Respell 45.76 (0.9%) 46.34 (2.1%) 1.3% ( -1% - 4%) 0.025
Prefix3 304.37 (1.8%) 308.85 (3.6%) 1.5% ( -3% - 7%) 0.145
HighPhrase 32.28 (9.5%) 32.90 (2.5%) 1.9% ( -9% - 15%) 0.434
BrowseDayOfYearSSDVFacets 4.37 (5.2%) 4.48 (10.1%) 2.5% ( -12% - 18%) 0.371
BrowseMonthSSDVFacets 4.40 (9.0%) 4.51 (18.2%) 2.6% ( -22% - 32%) 0.612
Fuzzy1 92.79 (0.5%) 95.29 (2.3%) 2.7% ( 0% - 5%) 0.000
Fuzzy2 88.78 (0.7%) 91.62 (2.2%) 3.2% ( 0% - 6%) 0.000
LowSloppyPhrase 78.35 (3.8%) 80.93 (3.7%) 3.3% ( -4% - 11%) 0.013
AndHighMed 55.83 (6.4%) 58.27 (5.7%) 4.4% ( -7% - 17%) 0.041
BrowseRandomLabelSSDVFacets 3.23 (4.8%) 3.39 (7.4%) 4.8% ( -7% - 17%) 0.028
OrHighMed 101.82 (5.0%) 108.12 (6.6%) 6.2% ( -5% - 18%) 0.003
AndHighLow 604.08 (3.4%) 654.10 (4.8%) 8.3% ( 0% - 17%) 0.000
OrHighLow 461.81 (4.5%) 516.22 (4.8%) 11.8% ( 2% - 22%) 0.000
HighTerm 361.41 (6.6%) 449.94 (12.5%) 24.5% ( 5% - 46%) 0.000
OrNotHighLow 435.40 (3.6%) 557.47 (7.0%) 28.0% ( 16% - 40%) 0.000
OrHighNotMed 275.75 (6.1%) 376.17 (12.5%) 36.4% ( 16% - 58%) 0.000
OrNotHighHigh 183.01 (5.4%) 250.78 (12.3%) 37.0% ( 18% - 57%) 0.000
OrNotHighMed 273.62 (4.1%) 377.47 (9.9%) 38.0% ( 23% - 54%) 0.000
OrHighNotLow 290.01 (6.9%) 422.33 (14.0%) 45.6% ( 23% - 71%) 0.000
LowTerm 610.48 (3.9%) 906.71 (9.5%) 48.5% ( 33% - 64%) 0.000
OrHighNotHigh 293.17 (5.7%) 438.61 (13.0%) 49.6% ( 29% - 72%) 0.000
MedTerm 309.52 (6.4%) 544.79 (16.0%) 76.0% ( 50% - 105%) 0.000
TermDTSort 91.38 (3.3%) 201.06 (16.2%) 120.0% ( 97% - 144%) 0.000
HighTermMonthSort 982.63 (2.5%) 2855.09 (9.7%) 190.6% ( 174% - 207%) 0.000
HighTermDayOfYearSort 153.03 (4.5%) 609.28 (18.0%) 298.1% ( 263% - 335%) 0.000
HighTermTitleSort 36.27 (10.4%) 200.12 (18.3%) 451.8% ( 383% - 536%) 0.000
Depending on what kind of query you run, parallelism will either get you a huge boost from better resource utilisation or a huge slowdown from contention and/or redundant work.
I think the two should go through different code paths to (a) allow for more optimisations down the line and (b) make profiling easier (at least for me personally, there is a lot of value in knowing whether something did or didn't get forked or sliced :)). A sketch of that split follows below.
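As a hedged sketch of the split being argued for: execute on the caller thread when there is exactly one slice, and only fork when there are multiple slices. Names and signatures here are hypothetical, not Lucene's search internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class SearchDispatchExample {
  // Hypothetical sketch, not IndexSearcher.search(): run directly on the
  // caller thread when there is a single slice, and only pay for task
  // submission and result gathering when there are multiple slices.
  static <R> List<R> collectAllSlices(List<Callable<R>> perSliceTasks, ExecutorService executor)
      throws Exception {
    if (perSliceTasks.size() == 1) {
      // Fast path: no forking, no Future allocations, and profiles show
      // unambiguously that nothing was sliced or forked.
      return List.of(perSliceTasks.get(0).call());
    }
    List<Future<R>> futures = new ArrayList<>(perSliceTasks.size());
    for (Callable<R> task : perSliceTasks) {
      futures.add(executor.submit(task));
    }
    List<R> results = new ArrayList<>(futures.size());
    for (Future<R> future : futures) {
      results.add(future.get()); // simplified: propagates the first failure
    }
    return results;
  }
}
```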
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!