Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve #3008: Support Lucene index scrubbing #3009

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

jjezra
Copy link
Contributor

@jjezra jjezra commented Dec 18, 2024

To validate Lucene index validity, support "Report Only" scrubbing for:
Dangling Lucene index entries: Iterate "all entries" (similar toLuceneScanAllEntriesTest), validate that all pointers lead to existing records.
Missing Lucene index entries: iterate all records, validate that their primary keys are represented in the “primary key to Lucene segment” map, and that the Lucene segment exists

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0769dbc
  • Duration 0:37:04
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 00d9136
  • Duration 0:46:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: b754ff7
  • Duration 0:37:57
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 8e23ece
  • Duration 0:36:28
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: a31ae21
  • Duration 0:36:29
  • Result: ❌ FAILED
  • Error: Error while executing command: ./gradlew --no-daemon --console=plain -b ./build.gradle build destructiveTest -PcoreNotStrict -PreleaseBuild=false -PpublishBuild=false -PspotbugsEnableHtmlReport. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 1d6bab9
  • Duration 0:49:10
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0f8f0d8
  • Duration 0:46:40
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

  To validate Lucene index validity, support "Report Only" scrubbing for:
     Dangling Lucene index entries: Iterate "all entries" (similar toLuceneScanAllEntriesTest), validate that all pointers lead to existing records.
     Missing Lucene index entries: iterate all records, validate that their primary keys are represented in the “primary key to Lucene segment” map, and that the Lucene segment exists
@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: df17ad5
  • Duration 0:52:29
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 1dade63
  • Duration 0:53:11
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: e66cd65
  • Duration 0:54:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@jjezra jjezra requested a review from ScottDugas January 13, 2025 22:17
@jjezra jjezra marked this pull request as ready for review January 13, 2025 22:17
@foundationdb-ci
Copy link
Contributor

Result of fdb-record-layer-pr on Linux CentOS 7

  • Commit ID: 0e049d9
  • Duration 0:52:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@@ -33,7 +33,7 @@ Our API stability annotations have been updated to reflect greater API instabili
* **Feature** Feature 2 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN)
* **Feature** Feature 3 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN)
* **Feature** Feature 4 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN)
* **Feature** Feature 5 [(Issue #NNN)](https://github.com/FoundationDB/fdb-record-layer/issues/NNN)
* **Feature** Support Lucene index scrubbing [(Issue #3008)](https://github.com/FoundationDB/fdb-record-layer/issues/3008)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this now only supports missing entries, should the issue be changed, or should a new sub-issue be created for missing entries only?
The PR description also seems to need updating.

*
* This source file is part of the FoundationDB open source project
*
* Copyright 2015-2024 Apple Inc. and the FoundationDB project authors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Copyright 2015-2024 Apple Inc. and the FoundationDB project authors
* Copyright 2015-2025 Apple Inc. and the FoundationDB project authors

Comment on lines +58 to +59
* Index Scrubbing Toolbox for a Lucene index maintainer. Scrub missing value index entries - i.e. detect record(s) that should
* cannot be found in the segment index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Index Scrubbing Toolbox for a Lucene index maintainer. Scrub missing value index entries - i.e. detect record(s) that should
* cannot be found in the segment index.
* Index Scrubbing Toolbox for a Lucene index maintainer. Scrub missing value index entries - i.e. detect record(s) that should be in the Lucene index, but cannot be found.



@Override
public void presetCommonParams(Index index, boolean allowRepair, boolean isSynthetic, Collection<RecordType> types) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to record isSynthetic



@Override
public void presetCommonParams(Index index, boolean allowRepair, boolean isSynthetic, Collection<RecordType> types) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this throw an error if allowRepair==true?

return true;
}
} catch (IOException ex) {
// Here: probably an fdb exception. Unwrap and rethrow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this comment is correct, but I can't think of anything where calling toRecordCoreException isn't the right thing to do.

@@ -56,7 +56,7 @@ public void finishDocument() throws IOException {
public void writeField(FieldInfo info, IndexableField field) throws IOException {
super.writeField(info, field);
try {
if (LuceneIndexMaintainer.PRIMARY_KEY_FIELD_NAME.equals(info.name)) {
if (LuceneIndexMaintainer.PRIMARY_KEY_FIELD_NAME.equals(info.name) && lucenePrimaryKeySegmentIndex != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being added? This class should never be used if the primaryKeySegementIndex isn't enabled.

}

@Test
void luceneIndexScrubMissingSimpleNoIssues() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you switched this test to use LuceneIndexTestDataModel, it could make it substantially easier to create an @ParameterizedTest that runs across a variety of different configurations.

Particularly, I think we need to test both the happy, and un-happy paths with:

  • synthetic vs non-synthetic
  • partitioned vs non-partitioned
  • grouped vs non-grouped

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can look at #3010 for how to do this. I don't think you should need any of the changes in there.


final InjectedFailureRepository injectedFailures = new InjectedFailureRepository();
final TestingIndexMaintainerRegistry registry = new TestingIndexMaintainerRegistry();
registry.overrideFactory(new MockedLuceneIndexMaintainerFactory(injectedFailures));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be excessively low-level, and I gather is causing you to have to change the production code to handle a non-existent situation.
I think having this test override the factory with a no-op factory, and then add a new document would get sufficient coverage, and be more in-line with the types of issues the scrubbing is looking for. You wouldn't cover the situation where it is stored in the partition metadata, in a partitioned index, but not the segment index, but that's probably fine.

context.commit();
}

try (final FDBRecordContext context = openContext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably remove the factory override before this code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants