Fix nested field missing sub embedding field #913

Open
wants to merge 5 commits into
base: main
from

Conversation

wdongyu

@wdongyu wdongyu commented Sep 20, 2024

Description

See #909 for a detailed description. This pull request adds an empty check before filling in the result.
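
One way an index mismatch of the kind reported in #909 can arise is when embeddings are written back to nested entries by position while some entries never contributed any text. The sketch below illustrates an empty check of that sort. It is an assumption about the failure mode, not the code in this PR: the class name, the method name, and the field names `text` and `embedding` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FillResultSketch {
    // Hypothetical field names, used only for this illustration
    static final String TEXT_KEY = "text";
    static final String EMBEDDING_KEY = "embedding";

    // Writes one embedding into each nested entry that actually had text.
    // Entries without text are skipped, so the embedding index only advances
    // for entries that were sent for inference.
    static void fillEmbeddings(List<Map<String, Object>> nestedEntries, List<float[]> embeddings) {
        int next = 0;
        for (Map<String, Object> entry : nestedEntries) {
            Object text = entry.get(TEXT_KEY);
            if (!(text instanceof String) || ((String) text).isEmpty()) {
                continue; // empty check: nothing was inferred for this entry
            }
            entry.put(EMBEDDING_KEY, embeddings.get(next++));
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> entries = new ArrayList<>();
        Map<String, Object> first = new HashMap<>();
        first.put(TEXT_KEY, "first");
        Map<String, Object> third = new HashMap<>();
        third.put(TEXT_KEY, "third");
        entries.add(first);
        entries.add(new HashMap<>()); // nested entry with no text field
        entries.add(third);

        List<float[]> embeddings = new ArrayList<>();
        embeddings.add(new float[] { 0.1f });
        embeddings.add(new float[] { 0.2f });

        fillEmbeddings(entries, embeddings);
        System.out.println(entries.get(1).containsKey(EMBEDDING_KEY));
    }
}
```

Without the check, indexing the two embeddings by the position of the three entries would run past the end of the embedding list.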

Related Issues

Resolves #909

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@martin-gaievski
Member

@wdongyu is this PR a work in progress or ready for review? A bunch of tests are failing...

@wdongyu
Author

wdongyu commented Sep 23, 2024

@martin-gaievski will rerun and fix them... :(

@yuye-aws
Member

@wdongyu
Author

wdongyu commented Sep 26, 2024

@yuye-aws Work in progress. I have found the problem and am trying to reproduce it on my dev box. It's related to inference text that is null but is still sent to mlCommonsClientAccessor.

@yuye-aws
Member

Great! Looking forward to seeing your new commits.

@wdongyu
Author

wdongyu commented Sep 26, 2024

In the last commit, I have done:

  1. Filter out null entries in createInferenceListForMapTypeInput, so that we avoid sending null inference text to mlCommonsClientAccessor.
  2. Do not filter out null entries in createInferenceList when sourceValue is a List, because any null entry in a List causes an IllegalArgumentException to be thrown. I also added a test to confirm that.
private List<String> createInferenceList(Map<String, Object> knnKeyMap) {
    ...
    if (sourceValue instanceof List) {
        texts.addAll(((List<String>) sourceValue));  // null entries are not filtered out
    } else if (sourceValue instanceof Map) {
        createInferenceListForMapTypeInput(sourceValue, texts);  // null entries are filtered out
    }
    ...
}
  3. Refine two tests: one makes sure the failed doc count is correct, the other makes sure the model is uploaded completely.

Please correct me if I am wrong. @martin-gaievski @yuye-aws
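
The distinction described in this comment can be sketched in a self-contained form. This is an illustration only, not the PR's implementation: the class name, the helper `collectFromMap`, and the handling of non-String leaf values are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class InferenceListSketch {
    // Map input: walk nested maps and keep only String leaves.
    // Null entries (and any other non-String leaf) are skipped in this sketch.
    static void collectFromMap(Object value, List<String> texts) {
        if (value instanceof Map) {
            for (Object nested : ((Map<?, ?>) value).values()) {
                collectFromMap(nested, texts);
            }
        } else if (value instanceof String) {
            texts.add((String) value);
        }
    }

    static List<String> createInferenceList(Map<String, Object> knnKeyMap) {
        List<String> texts = new ArrayList<>();
        for (Object sourceValue : knnKeyMap.values()) {
            if (sourceValue instanceof List) {
                // List input: entries are copied as-is, including nulls,
                // on the assumption that validation rejects them elsewhere.
                for (Object item : (List<?>) sourceValue) {
                    texts.add((String) item);
                }
            } else if (sourceValue instanceof Map) {
                collectFromMap(sourceValue, texts);
            } else if (sourceValue instanceof String) {
                texts.add((String) sourceValue);
            }
        }
        return texts;
    }

    public static void main(String[] args) {
        Map<String, Object> nested = new LinkedHashMap<>();
        nested.put("missing", null);
        nested.put("present", "hello");

        Map<String, Object> knnKeyMap = new LinkedHashMap<>();
        knnKeyMap.put("list_field", Arrays.asList("a", null));
        knnKeyMap.put("map_field", nested);

        System.out.println(createInferenceList(knnKeyMap));
    }
}
```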

@yuye-aws
Member

yuye-aws commented Sep 26, 2024

Thanks! Will review this PR tomorrow. @martin-gaievski please start the CI workflow.

@@ -203,6 +203,7 @@ protected void loadModel(final String modelId) throws Exception {
isComplete = checkComplete(taskQueryResult);
Thread.sleep(DEFAULT_TASK_RESULT_QUERY_INTERVAL_IN_MILLISECOND);
}
assertTrue(isComplete);
Member

Nice catch!
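
The value of asserting after the loop is that a poll which runs out of attempts is reported as a failure instead of passing silently. A minimal sketch of that pattern, assuming a bounded number of attempts; the class name, method name, and parameters are hypothetical and do not come from the test base class in this repository:

```java
import java.util.function.BooleanSupplier;

class PollingSketch {
    // Polls the check up to maxAttempts times, sleeping between failed attempts.
    // Returns the final completion state so the caller can assert on it.
    static boolean pollUntilComplete(BooleanSupplier check, int maxAttempts, long intervalMillis) {
        boolean isComplete = false;
        for (int attempt = 0; attempt < maxAttempts && !isComplete; attempt++) {
            isComplete = check.getAsBoolean();
            if (!isComplete) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return isComplete;
    }

    public static void main(String[] args) {
        int[] calls = { 0 };
        boolean done = pollUntilComplete(() -> ++calls[0] >= 3, 5, 10L);
        // Fail loudly if the task never completed, instead of continuing
        if (!done) {
            throw new AssertionError("task did not complete in time");
        }
        System.out.println("completed after " + calls[0] + " checks");
    }
}
```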

@yuye-aws
Member

@wdongyu Please run ./gradlew spotlessApply to fix the linting issue.

Member

@yuye-aws yuye-aws left a comment

@wdongyu It is nice of you to implement high-quality UT and IT code. You just need to resolve the following comments.

@vibrantvarun
Member

Triggered checks. I will review the PR today

@martin-gaievski martin-gaievski added the "backport 2.x" label (Label will add auto workflow to backport PR to 2.x branch) Sep 28, 2024
@wdongyu
Author

wdongyu commented Sep 29, 2024

Resolved all of the conversations above. @yuye-aws @martin-gaievski @vibrantvarun

@yuye-aws
Member

Good. Taking a look

@yuye-aws
Member

The code looks good to me. Thanks for the contribution @wdongyu . Can @martin-gaievski or @vibrantvarun start the CI workflow?

@martin-gaievski
Member

Changes look good to me, thank you! @wdongyu can you please rebase on main? You need the changes from #916 in order to fix the CI. Once that's done and CI is green, I'll approve and merge this PR.

@vibrantvarun
Member

vibrantvarun commented Oct 1, 2024

Please rebase on main. LGTM, thanks.

@wdongyu wdongyu force-pushed the fix_nested_field_missing_sub_embedding_field branch from d3dae05 to de1a736 Compare October 8, 2024 09:49
@wdongyu
Author

wdongyu commented Oct 8, 2024

Rebase done, please start the CI workflow when you are available @martin-gaievski @vibrantvarun.

@yuye-aws
Member

yuye-aws commented Oct 8, 2024

> Rebase done, please start the CI workflow when you are available @martin-gaievski @vibrantvarun.

Thank you! CI is running now. Will approve your PR once all CI checks pass.

@wdongyu
Author

wdongyu commented Oct 8, 2024

> Thank you! CI is running now. Will approve your PR once all CI checks pass.

A bunch of tests are still failing. Most of them fail with NoClassDefFoundError, and one of them is testAgainstOldCluster. Am I missing something? @yuye-aws

@yuye-aws
Member

yuye-aws commented Oct 8, 2024

Hi @wdongyu! I do not think you have missed anything. The following error log seems to point to a problem in the k-NN repo. I might have time to check it tomorrow.

Caused by:
java.lang.ExceptionInInitializerError: Exception java.util.ServiceConfigurationError: org.apache.lucene.codecs.Codec: Provider org.opensearch.knn.index.codec.KNN910Codec.KNN910Codec could not be instantiated [in thread "SUITE-MLCommonsClientAccessorTests-seed#[E12FC70B18037AED]-worker"]
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:813)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403)
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:68)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:52)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:38)
at org.apache.lucene.codecs.Codec$Holder.<clinit>(Codec.java:45)
... 18 more
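
The trace above is cut off at "... 18 more", so the underlying failure is not visible here. When debugging this kind of wrapped error locally, walking the cause chain is one way to surface it. A minimal sketch; the class and method names are hypothetical and are not part of neural-search or k-NN:

```java
class RootCauseSketch {
    // Follows getCause() until there is no further cause and returns the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException(
            "outer",
            new IllegalStateException("middle", new ClassNotFoundException("innermost"))
        );
        Throwable root = rootCause(wrapped);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```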

@yuye-aws
Member

yuye-aws commented Oct 9, 2024

A PR has been merged into the k-NN repo: opensearch-project/k-NN#2195. Hopefully the new k-NN artifact will fix the CI error. I will rerun the CI later.

@martin-gaievski
Member

You need to wait for #927 to be merged. CI will keep failing until then.

Labels
backport 2.x (Label will add auto workflow to backport PR to 2.x branch), bug (Something isn't working), v2.18.0
Development

Successfully merging this pull request may close these issues.

[BUG] Nested field missing sub embedding field will cause the IndexOutOfBoundsException
4 participants