-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve #3008: Support Lucene index scrubbing #3009
base: main
Are you sure you want to change the base?
Conversation
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
b754ff7
to
8e23ece
Compare
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
To validate Lucene index validity, support "Report Only" scrubbing for: Dangling Lucene index entries: Iterate "all entries" (similar toLuceneScanAllEntriesTest), validate that all pointers lead to existing records. Missing Lucene index entries: iterate all records, validate that their primary keys are represented in the “primary key to Lucene segment” map, and that the Lucene segment exists
0f8f0d8
to
df17ad5
Compare
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
Result of fdb-record-layer-pr on Linux CentOS 7
|
@@ -33,7 +33,7 @@ Our API stability annotations have been updated to reflect greater API instabili | |||
* **Feature** Feature 2 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN) | |||
* **Feature** Feature 3 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN) | |||
* **Feature** Feature 4 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN) | |||
* **Feature** Feature 5 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN) | |||
* **Feature** Support Lucene index scrubbing [(Issue #3008)](https://github.com/FoundationDB/fdb-record-layer/issues/3008) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this now only supports missing entries, should the issue be changed, or should a new sub-issue be created for missing entries only?
The PR description also seems to need updating.
* | ||
* This source file is part of the FoundationDB open source project | ||
* | ||
* Copyright 2015-2024 Apple Inc. and the FoundationDB project authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Copyright 2015-2024 Apple Inc. and the FoundationDB project authors | |
* Copyright 2015-2025 Apple Inc. and the FoundationDB project authors |
* Index Scrubbing Toolbox for a Lucene index maintainer. Scrub missing value index entries - i.e. detect record(s) that should | ||
* cannot be found in the segment index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Index Scrubbing Toolbox for a Lucene index maintainer. Scrub missing value index entries - i.e. detect record(s) that should | |
* cannot be found in the segment index. | |
* Index Scrubbing Toolbox for a Lucene index maintainer. Scrub missing value index entries - i.e. detect record(s) that should be in the Lucene index, but cannot be found. |
|
||
|
||
@Override | ||
public void presetCommonParams(Index index, boolean allowRepair, boolean isSynthetic, Collection<RecordType> types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to record isSynthetic
|
||
|
||
@Override | ||
public void presetCommonParams(Index index, boolean allowRepair, boolean isSynthetic, Collection<RecordType> types) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this throw an error if allowRepair==true
?
return true; | ||
} | ||
} catch (IOException ex) { | ||
// Here: probably an fdb exception. Unwrap and rethrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this comment is correct, but I can't think of anything where calling toRecordCoreException
isn't the right thing to do.
@@ -56,7 +56,7 @@ public void finishDocument() throws IOException { | |||
public void writeField(FieldInfo info, IndexableField field) throws IOException { | |||
super.writeField(info, field); | |||
try { | |||
if (LuceneIndexMaintainer.PRIMARY_KEY_FIELD_NAME.equals(info.name)) { | |||
if (LuceneIndexMaintainer.PRIMARY_KEY_FIELD_NAME.equals(info.name) && lucenePrimaryKeySegmentIndex != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this being added? This class should never be used if the primaryKeySegementIndex isn't enabled.
} | ||
|
||
@Test | ||
void luceneIndexScrubMissingSimpleNoIssues() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you switched this test to use LuceneIndexTestDataModel
, it could make it substantially easier to create an @ParameterizedTest
that runs across a variety of different configurations.
Particularly, I think we need to test both the happy, and un-happy paths with:
- synthetic vs non-synthetic
- partitioned vs non-partitioned
- grouped vs non-grouped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can look at #3010 for how to do this. I don't think you should need any of the changes in there.
|
||
final InjectedFailureRepository injectedFailures = new InjectedFailureRepository(); | ||
final TestingIndexMaintainerRegistry registry = new TestingIndexMaintainerRegistry(); | ||
registry.overrideFactory(new MockedLuceneIndexMaintainerFactory(injectedFailures)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be excessively low-level, and I gather is causing you to have to change the production code to handle a non-existent situation.
I think having this test override the factory with a no-op factory, and then add a new document would get sufficient coverage, and be more in-line with the types of issues the scrubbing is looking for. You wouldn't cover the situation where it is stored in the partition metadata, in a partitioned index, but not the segment index, but that's probably fine.
context.commit(); | ||
} | ||
|
||
try (final FDBRecordContext context = openContext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should probably remove the factory override before this code.
To validate Lucene index validity, support "Report Only" scrubbing for:
Dangling Lucene index entries: Iterate "all entries" (similar toLuceneScanAllEntriesTest), validate that all pointers lead to existing records.
Missing Lucene index entries: iterate all records, validate that their primary keys are represented in the “primary key to Lucene segment” map, and that the Lucene segment exists