[BUG] unable to run knn search with neural query on OS 2.16.0 #2838
Does it happen when you run a knn query directly as well, or only when you query the knn field through the neural plugin?
This issue is coming from ml-commons through pytorch/djl. Similar issue: #2563. @IanMenendez, can you please share the configuration and mapping details of the models and indices to replicate the issue? @ylwu-amzn, can you please take a look at the above issue and confirm whether it is coming from ml-commons? Thanks!
Tested this, and it is only happening with the neural query, so I think it's better to move this to the neural-search or ml-commons repo. Can you do this?
Yes, I updated the issue description
I figured out that the issue resolves itself if you restart the cluster; for some reason the libraries then seem to be there. The problem is that we use the neural query in our testing pipeline, and we cannot just restart a cluster in the middle of the testing pipeline.
@IanMenendez per the analysis provided by @naveentatikonda, the issue seems to be coming from ML Commons, so ideally it should be moved there. But we don't have permission to transfer the issue to ML Commons. @opensearch-project/admin, can you move this issue to the ML Commons repo?
What operating system did you use to produce this error? @IanMenendez
@Zhangxunmt I replicated the issue in the OS 2.16 Docker container, but it's also happening on my local machine with
From the error, it looks like the KNN plugin can't find the library. @IanMenendez, can you run the predict API directly?
If that works, then it's not from ml-commons, and the KNN team can help take a look.
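For reference, a direct call to the ml-commons predict API (as suggested above) can be sketched as below; the model ID is a placeholder, and the exact request shape depends on the model type:

```
POST /_plugins/_ml/_predict/text_embedding/<model_id>
{
  "text_docs": ["hello world"],
  "return_number": true,
  "target_response": ["sentence_embedding"]
}
```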
@ylwu-amzn Using a neural query crashes the cluster
@navneet1v, since the predict API works correctly, I think the issue is from KNN; the log also shows it's related to KNN. I have no permission on the k-NN plugin repo, so I can't transfer it there.
I'm also running into the same issue. Can you take a look, @navneet1v @martin-gaievski?
@IanMenendez Maybe you can try this workaround: https://forum.opensearch.org/t/issue-with-opensearch-knn/12633/3
Here is my error log
@yuye-aws this error looks specific to the native knn engines. Can you check which one you're using? If it's one of the native ones (faiss or nmslib), can you try defining
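For reference, the workaround discussed in this thread (pinning the engine to lucene in the knn_vector mapping) might look like the sketch below; the index name, field name, and dimension are placeholders:

```
PUT /testing
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "my_embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2"
        }
      }
    }
  }
}
```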
@martin-gaievski, can you transfer this issue to the k-NN plugin repo? I think changing to lucene can work, but we should also support the other two native engines. I suggest the k-NN team try this; it seems very easy to reproduce.
My problem has been resolved after this index mapping change. Thank you @martin-gaievski! Do you think the original issue can be resolved in a similar manner?
Changing the index mapping to use lucene as suggested worked. But I still think OS should not crash the cluster if libstdc++.so.6 is not found; it should catch the exception and return an error.
Any update on this issue? I had a similar error, this time when indexing documents into an index that uses the faiss engine.
The error sometimes occurs and crashes the cluster; after the cluster restarts, the library seems to be found for some reason. The workaround of switching to Lucene doesn't work in my case because of opensearch-project/k-NN#2347. The OpenSearch version on my cluster is 2.17.
@CorentinLimier Could you confirm whether the issue happens when you use the knn field directly, without neural?
@heemin32 Sadly I can't. We rolled the index back to the lucene engine because this happened on a production cluster, and switching all the searches from neural to the knn field would be too costly. I'm also still unsure when this error happened: my first lead was that it occurred while indexing documents, but since we use neural queries as well, maybe a neural query was launched in parallel?
Model :
We will try to upgrade to 2.18.0 and see if we still have the issue.
@IanMenendez So the hot fix you can take is to build a new image based on the one you're using, then upgrade GCC to 7 inside (actually, any version later than 5 will work).
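A sketch of that hot fix, assuming an RPM-based base image; the package names and the right way to obtain a newer libstdc++ depend on the distro, so treat this as a starting point rather than a tested recipe:

```dockerfile
# Hypothetical hot-fix image: rebuild on top of the affected image and
# pull in a newer libstdc++ so the missing GLIBCXX symbol versions are
# available at runtime.
FROM opensearchproject/opensearch:2.16.0

USER root
# Assumption: the base image is RPM-based and its repos carry a new
# enough libstdc++; otherwise copy the library in from a builder stage.
RUN yum update -y libstdc++ && yum clean all

USER opensearch
```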
Could you share the output of running |
@peterzhuamazon
@0ctopus13prime awesome, thanks a lot for the lead. I will try that and tell you if it works, along with the output of
One question though: from the logs I can see that it uses libstdc++.so.6 from this path:
Full log:
Would updating the image and installing another version of libstdc++.so.6 fix the one used by OpenSearch there? Thanks 🙏
In 2.16 we switched the build image from CentOS 7 (glibc 2.17, gcc 9), as it was deprecated, to AL2 (glibc 2.26, gcc 10). It is well-documented here:
In 2.19 we plan to upgrade to gcc 13 for a new knn feature soon:
In June 2025 we plan to move again from AL2 to AlmaLinux 8 (glibc 2.28, gcc 11/13) due to AL2 deprecation.
The issue here is very similar to this libstdc++ one: cc: @ylwu-amzn
Indeed, when doing
we can see that GLIBCXX stops at version 3.4.19 (which explains the missing 3.4.21). We installed the latest libstdc++ and updated the LD_LIBRARY_PATH env var in the Docker image, then rebuilt.
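A quick way to check which GLIBCXX symbol versions a given libstdc++ exports is sketched below; the search paths are assumptions and may differ in your image (the JDK bundled with OpenSearch can also carry its own copy):

```shell
# Locate the system libstdc++ and list the highest GLIBCXX versions it
# provides, to confirm whether e.g. GLIBCXX_3.4.21 is present.
LIBSTDCXX=$(find /usr/lib /usr/lib64 -name 'libstdc++.so.6*' 2>/dev/null | head -n 1)
echo "Inspecting: $LIBSTDCXX"
# grep -a scans the binary as text, so binutils/strings is not required
grep -ao 'GLIBCXX_3\.4\.[0-9]*' "$LIBSTDCXX" | sort -uV | tail -n 5
```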
We have two data nodes, and while both have a correct libstdc++ in
Both link to the exact same Docker image. I feel the file
We are considering trying to replace
What do you think?
We solved this issue by replacing
The issue seems to remain in 2.18.0.
What is the bug?
Searching with a neural query brings down OS 2.16.0.
This is happening in the OS 2.16.0 image
FROM opensearchproject/opensearch:2.16.0
but not in 2.15.0 or lower.
How to reproduce the bug?
POST /testing/_doc
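For context, a neural query against such an index generally looks like the sketch below; the field name and model ID are placeholders:

```
GET /testing/_search
{
  "query": {
    "neural": {
      "my_embedding": {
        "query_text": "example text",
        "model_id": "<model_id>",
        "k": 10
      }
    }
  }
}
```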
What is the expected behavior?
Don't crash, and perform the search.
What is your host/environment?
Docker with the
FROM opensearchproject/opensearch:2.16.0
image.
NOTE: This issue only happens the first time you run a neural query. After running a neural query some time after the first one, the cluster seems to get the libraries. I do not know what changed from OS 2.15 to 2.16.