Add changes for AVX-512 support in k-NN. #2110

Merged

Conversation

akashsha1
Contributor

@akashsha1 akashsha1 commented Sep 16, 2024

Description

This change adds support for the AVX512 instruction set to speed up vector search and indexing in Faiss.

Related Issues

Resolves #2056

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@akashsha1 akashsha1 mentioned this pull request Sep 16, 2024
5 tasks
@assanedi
Contributor

assanedi commented Sep 18, 2024

The benchmark was run using opensearch-benchmark with the Cohere dataset (768 dimensions).
Here are some configuration details for indexing:
{
"target_index_name": "target_index",
"target_field_name": "target_field",
"target_index_body": "indices/faiss-index.json",
"target_index_primary_shards": 4,
"target_index_replica_shards": 1,
"target_index_dimension": 768,
"target_index_space_type": "innerproduct",
"target_index_bulk_size": 100,
"target_index_bulk_index_data_set_format": "hdf5",
"target_index_bulk_index_data_set_path": "/mnt/nvme1/documents-1m.hdf5",
"target_index_bulk_indexing_clients": 20,
"target_index_max_num_segments": 1,
"hnsw_ef_search": 256,
"hnsw_ef_construction": 256
}

Here are some configuration details for search:
{
"target_index_name": "target_index",
"target_field_name": "target_field",
"query_k": 100,
"query_body": {
"docvalue_fields" : ["_id"],
"stored_fields" : "none"
},
"query_data_set_format": "hdf5",
"query_data_set_path": "/mnt/nvme1/queries-1m-100k.hdf5",
"query_count": 30000,
"search_clients": 20
}

A force merge that reduces the number of segments to 1 (max_num_segments=1) is executed via the API before the search.

The OpenSearch cluster was deployed with 2 data nodes (r7i.2xlarge), 1 replica, and 4 shards.
With this setup, AVX512 shows a 15% improvement over AVX2 on indexing and 7% on search, as shown below:

[Image: AVX512 vs. AVX2 benchmark results]

@naveentatikonda
Member

The benchmark was run using opensearch-benchmark with the Cohere dataset (768 dimensions). The OpenSearch cluster was deployed with 2 data nodes (r7i.2xlarge), 1 replica, and 4 shards. With this setup, AVX512 shows a 15% improvement over AVX2 on indexing and 7% on search, as shown below:

[Image: AVX512 vs. AVX2 benchmark results]

@assanedi Can you also please add other configuration details such as the indexing clients, query clients, ef_construction, ef_search, etc.?

Member

@naveentatikonda naveentatikonda left a comment

LGTM! Thanks @akashsha1 @assanedi

@naveentatikonda
Member

"target_index_bulk_indexing_clients": 20,
"target_index_max_num_segments": 10,
"hnsw_ef_search": 256,
"hnsw_ef_construction": 256

@assanedi Wasn't max_num_segments set to 1 during the forcemerge?

@assanedi
Contributor

"target_index_bulk_indexing_clients": 20,
"target_index_max_num_segments": 10,
"hnsw_ef_search": 256,
"hnsw_ef_construction": 256

@assanedi Wasn't max_num_segments set to 1 during the forcemerge?

Yes, I ran the forcemerge API. Here is the result:
curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1
{"_shards":{"total":8,"successful":8,"failed":0}}
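
The shard count in that response lines up with the cluster layout described earlier: 4 primary shards with 1 replica each gives 4 × (1 + 1) = 8 shard copies. As a minimal sketch of how such a response could be checked automatically, the Java snippet below parses the `_shards` object with a regular expression. The class and method names are hypothetical, the field order is assumed to match the response shown above, and the check assumes `target_index` is the only index on the cluster (the request above targets all indices).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenSearch or the k-NN plugin.
final class ForcemergeCheck {
    // Assumes the field order total, successful, failed, as in the response above.
    private static final Pattern SHARDS = Pattern.compile(
        "\"_shards\"\\s*:\\s*\\{\\s*\"total\"\\s*:\\s*(\\d+)\\s*,"
            + "\\s*\"successful\"\\s*:\\s*(\\d+)\\s*,"
            + "\\s*\"failed\"\\s*:\\s*(\\d+)\\s*\\}");

    /** Returns true when every expected shard copy was merged without failures. */
    static boolean allShardsMerged(String responseBody, int primaries, int replicas) {
        Matcher m = SHARDS.matcher(responseBody);
        if (!m.find()) {
            throw new IllegalArgumentException("No _shards object found in response");
        }
        int total = Integer.parseInt(m.group(1));
        int successful = Integer.parseInt(m.group(2));
        int failed = Integer.parseInt(m.group(3));
        // Each primary has `replicas` copies, so the expected total is primaries * (1 + replicas).
        int expectedTotal = primaries * (1 + replicas);
        return total == expectedTotal && successful == total && failed == 0;
    }

    public static void main(String[] args) {
        String body = "{\"_shards\":{\"total\":8,\"successful\":8,\"failed\":0}}";
        System.out.println(allShardsMerged(body, 4, 1)); // prints "true"
    }
}
```

A regular expression keeps the sketch dependency-free; a real client would more likely use a JSON parser, which would also remove the field-order assumption.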

@naveentatikonda
Member

Yes, I ran the forcemerge API. Here is the result: curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1 {"_shards":{"total":8,"successful":8,"failed":0}}

Yes, but the configuration you posted lists target_index_max_num_segments as 10 instead of 1.

@naveentatikonda
Member

For FP32 we don't need to make any changes in Faiss, as it relies on auto-vectorization to get the AVX512 optimization. For scalar quantization, however, Intel has raised a PR to Faiss that is under review:
facebookresearch/faiss#3853

@assanedi
Contributor

Yes, I ran the forcemerge API. Here is the result: curl -X POST -k --user admin:admin http://10.0.0.80:9200/_forcemerge?max_num_segments=1 {"_shards":{"total":8,"successful":8,"failed":0}}

Yes, but the configuration you posted lists target_index_max_num_segments as 10 instead of 1.

I have updated the configuration details.


```
# While building OpenSearch k-NN
./gradlew build -Dsimd.enabled=true
./gradlew build -Davx2.enabled=true -Davx512.enabled=true
```
Member

Shouldn't these be mutually exclusive?

Contributor Author

Ideally we want SIMD (AVX2 or AVX512, whichever is the highest level present) to be enabled by default. The order of checks would be:
if (AVX512 enabled and present) { use avx512 }
else if (AVX2 enabled and present) { use avx2 }
else { use generic version }

If the two flags were mutually exclusive, the intermediate step (falling back to AVX2 when AVX512 is enabled but not present) could not be achieved.
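
The order of checks described above can be sketched in Java as follows. This is only an illustration of the fallback logic under stated assumptions: the class, enum, and parameter names are hypothetical and are not taken from the k-NN plugin, and how "enabled" and "present" are actually determined (build flags, settings, CPU detection) is outside the sketch.

```java
// Hypothetical sketch: none of these names come from the k-NN plugin.
final class SimdSelector {
    enum Level { AVX512, AVX2, GENERIC }

    /**
     * Picks the highest SIMD level that is both enabled and supported by the CPU,
     * falling back one step at a time: AVX512, then AVX2, then the generic build.
     */
    static Level select(boolean avx512Enabled, boolean avx512Present,
                        boolean avx2Enabled, boolean avx2Present) {
        if (avx512Enabled && avx512Present) {
            return Level.AVX512;
        } else if (avx2Enabled && avx2Present) {
            return Level.AVX2;
        }
        return Level.GENERIC;
    }

    public static void main(String[] args) {
        // Both flags enabled, but the CPU only supports AVX2: the AVX2 step is used.
        System.out.println(select(true, false, true, true)); // prints "AVX2"
    }
}
```

With both flags enabled, a machine without AVX512 still reaches the AVX2 branch, which is the intermediate step the comment refers to.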

Member

I think Jack's point is to update it to something like ./gradlew build -Davx2.enabled=false -Davx512.enabled=true, since we can't use both of them at the same time.

@@ -499,6 +512,22 @@ public static boolean isFaissAVX2Disabled() {
        }
    }

    public static boolean isFaissAVX512Disabled() {
        try {
            return KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED);
Member

Can we do proper null checks here? In general, I think it's best to avoid catching all exceptions.

Contributor Author

It looks like a Java boolean cannot be null, so a null check won't be possible.
Your second point on exceptions is valid, and this code shouldn't throw exceptions because a default value is set. I've removed the try/catch block.

Member

@akashsha1 As users will set this setting manually in opensearch.yml, there is a chance of getting null. To avoid that, shall we change it to:
return Booleans.parseBoolean(KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED).toString(), false);

Member

Booleans.parseBoolean will do the null validation and, if the value is null, return the default value.

Contributor Author

Spoke with Naveen on Slack, and updated to:
return Booleans.parseBoolean(KNNSettings.state().getSettingValue(KNNSettings.KNN_FAISS_AVX512_DISABLED).toString(), KNN_DEFAULT_FAISS_AVX512_DISABLED_VALUE);
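
To illustrate the idea of parsing a setting with a fallback default, here is a minimal self-contained Java sketch. It does not reproduce OpenSearch's `Booleans.parseBoolean`; the class and method names are hypothetical, and the strict handling of unrecognized values is an assumption made for the example. Because calling `.toString()` on a null reference throws a `NullPointerException` in Java, the sketch accepts the raw value as an `Object` and checks for null before converting it.

```java
// Hypothetical helper, not part of OpenSearch or the k-NN plugin.
final class SettingParsing {
    /**
     * Returns the boolean encoded in rawValue, or defaultValue when rawValue
     * is null or blank. Anything other than "true" or "false" is rejected.
     */
    static boolean parseBooleanOrDefault(Object rawValue, boolean defaultValue) {
        if (rawValue == null) {
            return defaultValue; // setting absent: fall back to the default
        }
        String text = rawValue.toString().trim();
        if (text.isEmpty()) {
            return defaultValue;
        }
        if (text.equals("true")) {
            return true;
        }
        if (text.equals("false")) {
            return false;
        }
        // Assumption for this sketch: fail loudly on typos rather than guess.
        throw new IllegalArgumentException("Expected 'true' or 'false' but got [" + text + "]");
    }

    public static void main(String[] args) {
        System.out.println(parseBooleanOrDefault(null, false));   // prints "false"
        System.out.println(parseBooleanOrDefault("true", false)); // prints "true"
    }
}
```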

src/main/java/org/opensearch/knn/jni/FaissService.java (outdated, resolved)
@naveentatikonda naveentatikonda added enhancement backport 2.x Features Introduces a new unit of functionality that satisfies a requirement and removed enhancement labels Sep 18, 2024
Member

@jmazanec15 jmazanec15 left a comment

LGTM thanks

@naveentatikonda naveentatikonda merged commit 5423cc1 into opensearch-project:main Sep 19, 2024
38 of 39 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Sep 19, 2024
* changes for AVX-512. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* add cpu detection logic to security workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* add cpu detection logic to backward compat test workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix bwc  workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* address PR feedback. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* update KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

---------

Signed-off-by: Akash Shankaran <[email protected]>
(cherry picked from commit 5423cc1)
ryanbogan pushed a commit that referenced this pull request Sep 20, 2024
* changes for AVX-512. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* add cpu detection logic to security workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* add cpu detection logic to backward compat test workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix bwc  workflow. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* address PR feedback. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

* update KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

Signed-off-by: Akash Shankaran <[email protected]>

---------

Signed-off-by: Akash Shankaran <[email protected]>
(cherry picked from commit 5423cc1)
Signed-off-by: Ryan Bogan <[email protected]>
naveentatikonda pushed a commit that referenced this pull request Sep 23, 2024
* changes for AVX-512. Signed-off by: Akash Shankaran <[email protected]>

* add cpu detection logic to security workflow. Signed-off by: Akash Shankaran <[email protected]>

* add cpu detection logic to backward compat test workflow. Signed-off by: Akash Shankaran <[email protected]>

* fix bwc  workflow. Signed-off by: Akash Shankaran <[email protected]>

* address PR feedback. Signed-off by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

* fix a bug in KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

* update KNNSettings. Signed-off by: Akash Shankaran <[email protected]>

---------

(cherry picked from commit 5423cc1)

Signed-off-by: Akash Shankaran <[email protected]>
Signed-off-by: Ryan Bogan <[email protected]>
Co-authored-by: akashsha1 <[email protected]>
Labels
backport 2.x Features Introduces a new unit of functionality that satisfies a requirement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add support for FAISS AVX512
4 participants