Change the Artifact Build Process for k-NN Plugin #4386
Comments
[Triage] Removing the untriage label here since @peterzhuamazon @prudhvigodithi @bbarani are aware of the issue and have discussed the possible next steps. Would be helpful if someone can post them here. |
Worked with Naveen to help him disable the AVX instruction sets (avx / avx2 / avx512...) on an existing EC2 instance:
Before disabling AVX:
After disabling AVX:
@naveentatikonda will take a look and test both lib files to confirm the results. Also, will post comparisons on the SIMD disabled vs. enabled performance metrics. Thanks. |
Tagging @dblock @elfisher: k-NN will be adding an optional experimental feature flag. @naveentatikonda Do we need to add any documentation around this flag? If so, can you please create a documentation issue and associated PRs in the documentation-website repo? |
@bbarani I have opened a PR for Faiss SQFP16 feature documentation. Will include the documentation for these changes in that PR. |
The cluster crashed with the error below when I copied the AVX2-enabled Faiss library onto a machine with AVX2 disabled, spun up the cluster, and ran a benchmarking test that hits the AVX2 logic in Faiss. I then replaced the Faiss library with the default one (without AVX2) and spun up the cluster again. It works fine when I resume the same test, and there is no loss of data.
Similarly, I ran another test on an AVX2-enabled machine. I spun up the cluster with the default Faiss library (without AVX2) and ingested some data. Later, I terminated the cluster, replaced the Faiss library with the AVX2-enabled one, and spun up the cluster again. I don't see any data loss, and the test ran successfully without any drop in recall. |
After talking to @prudhvigodithi @naveentatikonda @navneet1v here is the path forward:
Thanks. |
@peterzhuamazon any specific naming convention for the subfolders ? |
sure 👍 |
Changed to
|
If this change has a significant impact on the build process, we should revisit whether 2.12 is the right version, given the short notice. |
I agree. This requires a good amount of effort from the build team to accommodate and impacts other RC-related activities. I would recommend moving this feature to the 2.13.0 release, as it is an experimental feature and requires additional testing before it can become a GA feature. CC: @vamshin @naveentatikonda |
A couple of options we see here:
Option 1: Remove the SIMD library and ship the fp_16 feature in 2.12.
Option 2: Move the feature to 2.13 and have the SIMD library shipped in the artifacts.
Considering the impact on search performance with Option 1, we are OK to defer to the 2.13 release and have the SIMD library support for a better customer experience. For 2.13 we cannot afford to have this as an experimental release, since we now have enough of a head start to test the mechanism for shipping these libraries. cc: @bbarani @Pallavi-AWS |
@elfisher @dblock We need your input on the above options. Basically, we would need to ship a new SIMD library along with the existing library to support the new feature opensearch-project/k-NN#1138 through AVX2 on x86 architecture and NEON on ARM architecture. This might be a breaking change for users who are upgrading from version 2.11 to 2.12 if their system or processor doesn't support these optimizations, which might result in a crash. To solve this, we were planning to build two different versions of the Faiss library (with the same name) in the k-NN plugin, one with SIMD optimization enabled and the other without, and allow the user to choose between them. We need your input before we finalize this approach for the 2.13.0 release. |
How does this break users if both libraries are packaged? |
By default, the lib without SIMD optimization is loaded at runtime. @naveentatikonda Can you confirm the experience if a user accesses the new feature opensearch-project/k-NN#1138 using the library without SIMD optimization? For users who know that their system supports these latest optimizations and want to try them, the plan was to enable it using a flag (we initially thought of using an experimental flag due to the time crunch, but I think we should add a dedicated flag for the 2.13.0 release). |
Without SIMD optimization, users can still use the Faiss SQFP16 feature, and there is no change in the UX or the API input parameters. But there is a considerable drop in search performance, up to 3x or 4x (depending on the dataset). |
These are some of the benchmarking results on x86 for Faiss SQFP16 comparing with and without AVX2. We can see a performance boost in Indexing Throughput and Search after enabling AVX2.
|
I don't think I understand the concern: after reading this there's no breaking change, and a new feature flag to enable an optimization.
So there's no breaking change to the user.
This is like all feature flags. For the long run you want to flip this when SIMD support is available, dynamically. Meaning that the software should be smart enough to pick the optimized path when possible (packaging both libraries and dynamically loading the right one). |
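As a rough illustration of the "pick the optimized path when possible" idea, here is a minimal Java sketch that inspects Linux `/proc/cpuinfo` text for the `avx2` flag (x86) or the `asimd` flag (how aarch64 Linux reports NEON) and picks a library subdirectory. The class name, method names, and the `simd` / `default` subdirectory names are hypothetical and are not taken from the k-NN plugin; this is not the plugin's actual loading code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: not part of the k-NN plugin.
public class SimdLibrarySelector {

    // Returns true if a "flags" (x86) or "Features" (ARM) line lists the exact flag token.
    public static boolean hasCpuFlag(String cpuInfo, String flag) {
        for (String line : cpuInfo.split("\n")) {
            String lower = line.toLowerCase();
            if (!lower.startsWith("flags") && !lower.startsWith("features")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            List<String> flags = Arrays.asList(line.substring(colon + 1).trim().split("\\s+"));
            if (flags.contains(flag)) {
                return true;
            }
        }
        return false;
    }

    // Assumed layout: one subdirectory per library variant ("simd" and "default").
    public static String chooseSubdirectory(String arch, String cpuInfo) {
        boolean x86 = arch.equals("amd64") || arch.equals("x86_64");
        boolean arm = arch.equals("aarch64");
        if (x86 && hasCpuFlag(cpuInfo, "avx2")) {
            return "simd";
        }
        if (arm && hasCpuFlag(cpuInfo, "asimd")) {
            return "simd";
        }
        // Fall back to the non-SIMD build whenever support cannot be confirmed.
        return "default";
    }

    public static void main(String[] args) throws IOException {
        Path cpuInfoPath = Paths.get("/proc/cpuinfo");
        String cpuInfo = Files.exists(cpuInfoPath)
            ? new String(Files.readAllBytes(cpuInfoPath), StandardCharsets.UTF_8)
            : "";
        System.out.println(chooseSubdirectory(System.getProperty("os.arch"), cpuInfo));
    }
}
```

Falling back to the non-SIMD build when detection is inconclusive keeps the default behavior safe on machines where the flags cannot be read.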
After today's discussion with @peterzhuamazon and @prudhvigodithi we figured out these four options which could be feasible:
|
Update centos7 to support arm64 neon simd for knnlib, with gcc9: |
Close this as all the necessary changes are done for 2.13.0. |
@prudhvigodithi I disabled AVX2 as mentioned in the above steps. But when I tried to re-enable it by reverting those changes, AVX2 was not enabled again. Can you please suggest how to resolve this? cc: @naveentatikonda |
@navneet1v Can you please open a separate issue to track this issue since this issue is for the implementation? |
It's not an issue with the implementation. I just have a specific question about how to revert the change that disabled SIMD on a machine. |
I was finally able to resolve the issue, thanks to @naveentatikonda. So after removing the |
Problem Statement
For the 2.12 release, the k-NN plugin is adding support for a new feature, Faiss SQFP16, which also includes SIMD (Single Instruction, Multiple Data) optimization through AVX2 on x86 architecture and NEON on ARM architecture. This SIMD optimization boosts overall performance significantly. But it might be a breaking change for users who are upgrading from version 2.11 to 2.12 if their system or processor doesn't support these optimizations, which might result in a crash.
Proposed Solution
To solve this issue, we will build two different versions of the Faiss library (with the same name) in the k-NN plugin, one with SIMD optimization enabled and the other without, and store them in different subdirectories (as they both have the same name). By default, we will load the lib without SIMD optimization at runtime.
Users who know that their system supports these latest optimizations and want to try them can enable them with an optional experimental feature flag, something like
-Dopensearch.kn.experimental.feature.faiss.simd.enabled=true
To do this, they need to stop the existing cluster and pass the above optional flag during bootup, which will load the other Faiss library with SIMD optimization (AVX2 or NEON, depending on architecture).
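To make the proposal concrete, here is a minimal Java sketch of how a startup flag could select between two same-named libraries kept in separate subdirectories. The property name is copied from the example flag above. The subdirectory names (`simd`, `default`), the file name `libknn_faiss.so`, the root path, and the class itself are hypothetical placeholders and not the plugin's real layout.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: illustrates the proposal, not the plugin's actual code.
public class FaissLibraryResolver {

    // System property name from the example flag (passed as -D<name>=true at startup).
    static final String SIMD_FLAG = "opensearch.kn.experimental.feature.faiss.simd.enabled";

    // Both variants share one file name, so the subdirectory is what tells them apart.
    public static Path resolve(Path libRoot, boolean simdEnabled) {
        String subdir = simdEnabled ? "simd" : "default"; // assumed directory names
        return libRoot.resolve(subdir).resolve("libknn_faiss.so"); // assumed file name
    }

    public static void main(String[] args) {
        // Boolean.getBoolean returns false when the property is unset,
        // so the non-SIMD library remains the default.
        boolean simdEnabled = Boolean.getBoolean(SIMD_FLAG);
        Path lib = resolve(Paths.get("/tmp/knnlib"), simdEnabled);
        System.out.println("Would load: " + lib.toAbsolutePath());
        // A real loader would then call System.load(lib.toAbsolutePath().toString());
    }
}
```

Because the choice is read once at startup, switching variants means restarting the node with or without the flag, which matches the stop-and-reboot step described above.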