# Add documentation for k-NN Faiss SQfp16 #6249
@@ -11,10 +11,62 @@ has_children: false
The k-NN plugin introduces a custom data type, the `knn_vector`, that allows users to ingest their k-NN vectors into an OpenSearch index and perform different kinds of k-NN search. The `knn_vector` field is highly configurable and can serve many different k-NN workloads. For more information, see [k-NN vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/).

To create a k-NN index, set the `settings.index.knn` parameter to `true`:

```json
PUT /test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
## Lucene byte vector

Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of storage space needed. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
## SIMD optimization for the Faiss engine

Starting with version 2.13, the k-NN plugin supports [Single Instruction Multiple Data (SIMD)](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) processing if the underlying hardware supports SIMD instructions (AVX2 on the x64 architecture and Neon on the ARM64 architecture). SIMD is supported by default only on Linux machines and only for the Faiss engine. SIMD helps boost overall performance by improving indexing throughput and reducing search latency.
<!-- vale off -->
### x64 architecture
<!-- vale on -->
For the x64 architecture, two different versions of the Faiss library are built and shipped with the artifact:

- `libopensearchknn_faiss.so`: The non-optimized Faiss library without SIMD instructions.
- `libopensearchknn_faiss_avx2.so`: The Faiss library that contains AVX2 SIMD instructions.

If your hardware supports AVX2, the k-NN plugin loads the `libopensearchknn_faiss_avx2.so` library at runtime.

To disable AVX2 and load the non-optimized Faiss library (`libopensearchknn_faiss.so`), set the `knn.faiss.avx2.disabled` static setting to `true` in `opensearch.yml` (default is `false`). Note that to update a static setting, you must stop the cluster, change the setting, and restart the cluster. For more information, see [Static settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings).
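As a minimal sketch, the corresponding `opensearch.yml` entry might look like the following. Only the `knn.faiss.avx2.disabled` setting name comes from the text above; the comments are explanatory.

```yaml
# Static setting: changing it requires stopping the cluster,
# editing opensearch.yml, and restarting the cluster.
# Disables the AVX2-optimized Faiss library so that the
# non-optimized libopensearchknn_faiss.so is loaded instead.
knn.faiss.avx2.disabled: true
```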
### ARM64 architecture

For the ARM64 architecture, only one performance-boosting Faiss library (`libopensearchknn_faiss.so`) is built and shipped. The library contains Neon SIMD instructions and cannot be disabled.
## Method definitions

A method definition refers to the underlying configuration of the approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
@@ -48,7 +100,7 @@ For nmslib, *ef_search* is set in the [index settings](#index-settings).

An index created in OpenSearch version 2.11 or earlier will still use the old `ef_construction` value (`512`).
{: .note}
### Supported Faiss methods

Method name | Requires training | Supported spaces | Description
:--- | :--- | :--- | :---
@@ -107,25 +159,21 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e
{: .note}
```json
{
  "type": "knn_vector",
  "dimension": 100,
  "method": {
    "name": "hnsw",
    "engine": "lucene",
    "space_type": "l2",
    "parameters": {
      "m": 2048,
      "ef_construction": 245
    }
  }
}
```
### Supported Faiss encoders

You can use encoders to reduce the memory footprint of a k-NN index at the expense of search accuracy. Faiss has several encoder types, but the plugin currently only supports `flat`, `pq`, and `sq` encoding.

> Comment: "k-NN plugin currently supports". Reply: ack.
The following example method definition specifies the `hnsw` method and a `pq` encoder:
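The example body itself is elided at this point in the diff. As a minimal sketch of what such a definition might look like (all parameter values here are illustrative assumptions, not taken from the PR):

```json
"method": {
  "name": "hnsw",
  "engine": "faiss",
  "space_type": "l2",
  "parameters": {
    "encoder": {
      "name": "pq",
      "parameters": {
        "m": 2,
        "code_size": 8
      }
    },
    "ef_construction": 256,
    "m": 8
  }
}
```

Note that the top-level `m` (the HNSW graph degree) is distinct from the encoder's `m` (the number of PQ subvectors). Because `pq` requires training (see the encoder table that follows), a definition like this is typically supplied to the train API rather than directly to a field mapping.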
@@ -151,11 +199,27 @@ The `hnsw` method supports the `pq` encoder for OpenSearch versions 2.10 and lat

Encoder name | Requires training | Description
:--- | :--- | :---
`flat` (Default) | false | Encode vectors as floating-point arrays. This encoding does not reduce memory footprint.
`pq` | true | An abbreviation for _product quantization_, it is a lossy compression technique that uses clustering to encode a vector into a fixed size of bytes, with the goal of minimizing the drop in k-NN search accuracy. At a high level, vectors are broken up into `m` subvectors, and then each subvector is represented by a `code_size` code obtained from a code book produced during training. For more information about product quantization, see [this blog post](https://medium.com/dotstar/understanding-faiss-part-2-79d90b1e5388).
`sq` | false | Stands for _scalar quantization_. Starting with k-NN plugin version 2.13, you can use the `sq` encoder to quantize 32-bit floating-point vectors into 16-bit floats. In version 2.13, the built-in `sq` encoder is the SQfp16 Faiss encoder. The encoder reduces memory footprint with a minimal loss of precision and improves performance by using SIMD optimization (using AVX2 on the x86 architecture or Neon on the ARM architecture). For more information, see [Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization#faiss-scalar-quantization).
#### PQ parameters

Parameter name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
`m` | false | 1 | false | Determines the number of subvectors into which to break the vector. Subvectors are encoded independently of each other. The dimension of the vector must be divisible by `m`. Maximum value is 1,024.
`code_size` | false | 8 | false | Determines the number of bits into which to encode a subvector. Maximum value is 8. For IVF, this value must be less than or equal to 8. For HNSW, this value can only be 8.

> Comment: Should IVF be defined?
#### SQ parameters

Parameter name | Required | Default | Updatable | Description
:--- | :--- | :--- | :--- | :---
`type` | false | `fp16` | false | The type of scalar quantization used to encode 32-bit float vectors into the corresponding type. As of OpenSearch 2.13, only the `fp16` encoder type is supported. For the `fp16` encoder, vector values must be in the [-65504.0, 65504.0] range.
`clip` | false | `false` | false | If `true`, any vector values that are outside of the supported range for the specified vector type are rounded so that they fall within the range. If `false`, the request is rejected if any vector values are out of the supported range. Setting `clip` to `true` may decrease recall.

> Comment: "By default ...". Also, let's add the above as a note and probably bold/highlight it. Reply: ack.
> Comment: We normally don't format sentences as a note in the parameter table. Reply: Got it. Shall we add a note about this inside the Faiss scalar quantization section?
For more information and examples, see [Using Faiss scalar quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#using-faiss-scalar-quantization).
#### Examples

The following example uses the `ivf` method without specifying an encoder (by default, OpenSearch uses the `flat` encoder):
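The example body is not shown in this diff. A minimal sketch of such a method definition might look like the following (the `nlist` value is an illustrative assumption):

```json
"method": {
  "name": "ivf",
  "engine": "faiss",
  "space_type": "l2",
  "parameters": {
    "nlist": 4
  }
}
```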
@@ -204,12 +268,46 @@ The following example uses the `hnsw` method without specifying an encoder (by d
The following example uses the `hnsw` method with an `sq` encoder of type `fp16` with `clip` enabled:
```json
"method": {
  "name": "hnsw",
  "engine": "faiss",
  "space_type": "l2",
  "parameters": {
    "encoder": {
      "name": "sq",
      "parameters": {
        "type": "fp16",
        "clip": true
      }
    },
    "ef_construction": 256,
    "m": 8
  }
}
```
The following example uses the `ivf` method with an `sq` encoder of type `fp16`:
```json
"method": {
  "name": "ivf",
  "engine": "faiss",
  "space_type": "l2",
  "parameters": {
    "encoder": {
      "name": "sq",
      "parameters": {
        "type": "fp16",
        "clip": false
      }
    },
    "nprobes": 2
  }
}
```
### Choosing the right method
@@ -221,6 +319,8 @@ If you want to use less memory and index faster than HNSW, while maintaining sim

If memory is a concern, consider adding a PQ encoder to your HNSW or IVF index. Because PQ is a lossy encoding, query quality will drop.
If you want to reduce the memory requirements by a factor of 2 (with a very minimal loss in search quality) or by a factor of 4 (with a significant drop in search quality), consider vector quantization. To learn more about vector quantization options, see [k-NN vector quantization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/).

> Comment: You can reduce the memory footprint by a factor of 2 by using the fp16 encoder technique (provide link?) with minimal loss in search quality. If your vector dimensions fit in the byte range [-128, 128], we recommend using the byte quantizer (provide link?) to cut down the memory footprint by a factor of 4. Reply: ack.
> Comment: The byte range is [-128, 127], correct? Reply: Yes, the byte range is [-128, 127].
### Memory estimation

In a typical OpenSearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates

@@ -230,6 +330,9 @@ the `circuit_breaker_limit` cluster setting. By default, the limit is set at 50%

Having a replica doubles the total number of vectors.
{: .note }
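As an illustrative aside, the limit mentioned in the context line can be adjusted with a cluster settings call along the following lines. This is a sketch: it assumes the full setting name `knn.memory.circuit_breaker.limit`, which is not spelled out in the diff itself.

```json
PUT /_cluster/settings
{
  "persistent": {
    "knn.memory.circuit_breaker.limit": "50%"
  }
}
```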
For memory estimation when using vector quantization, see the [vector quantization documentation]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-vector-quantization/#memory-estimation).
{: .note }

> Comment: "For information about using memory estimation with vector quantization"?
#### HNSW memory estimation

The memory required for HNSW is estimated to be `1.1 * (4 * dimension + 8 * M)` bytes/vector.
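As an illustrative check, assume the same workload used in the worked examples later in this PR, 1 million vectors with a dimension of 256 and an `M` of 16 (these values are assumptions for illustration):

```bash
1.1 * (4 * 256 + 8 * 16) * 1,000,000 = 1,267,200,000 bytes ~= 1.18 GiB
```

Compared with the 0.656 GB estimated later for the same workload with 16-bit vectors, the saving from quantization is somewhat less than a full 2x because the `8 * M` graph overhead is unchanged.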
@@ -0,0 +1,156 @@

---
layout: default
title: k-NN vector quantization
nav_order: 27
parent: k-NN search
grand_parent: Search methods
has_children: false
has_math: true
---
# k-NN vector quantization

By default, the k-NN plugin supports indexing and querying vectors of type `float`, where each dimension of the vector occupies 4 bytes of memory. For use cases that require ingestion on a large scale, keeping `float` vectors is expensive because OpenSearch needs to construct, load, save, and search graphs (for the native `nmslib` and `faiss` engines). To reduce the memory footprint, you can use vector quantization.
## Lucene byte vector

Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `lucene` engine in order to reduce the amount of required memory. This requires quantizing the vectors outside of OpenSearch before ingesting them into an OpenSearch index. For more information, see [Lucene byte vector]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector#lucene-byte-vector).
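For orientation, a byte-vector field mapping might look like the following minimal sketch. It assumes the `data_type` mapping parameter documented on the linked page; the field name and other values are illustrative:

```json
"my_vector": {
  "type": "knn_vector",
  "dimension": 3,
  "data_type": "byte",
  "method": {
    "name": "hnsw",
    "engine": "lucene",
    "space_type": "l2"
  }
}
```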
## Faiss scalar quantization

Starting with version 2.13, the k-NN plugin supports performing vector quantization for the Faiss engine within OpenSearch. Within the Faiss engine, a scalar quantizer (SQfp16) performs the conversion between 32-bit and 16-bit vectors. At ingestion time, when you upload 32-bit floating-point vectors to OpenSearch, SQfp16 quantizes them into 16-bit floating-point vectors and stores the quantized vectors in a k-NN index. At search time, SQfp16 decodes the vector values back into 32-bit floating-point values for distance computation. SQfp16 vector quantization can decrease the memory footprint by a factor of 2, with minimal loss in recall when vector values are not very similar. When used with [SIMD optimization]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index#simd-optimization-for-the-faiss-engine), vector quantization can also significantly reduce search latencies and improve indexing throughput.
### Using Faiss scalar quantization

To use Faiss scalar quantization, set the `method.parameters.encoder.name` to `sq` for the [k-NN vector field]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/) when creating a k-NN index:

> Comment: "To use Faiss scalar quantization, set the k-NN vector field's ...". Reply: Reworded.
```json
PUT /test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "encoder": {
              "name": "sq"
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
Optionally, you can specify the parameters in `method.parameters.encoder`. For more information about parameters within the `encoder` object, see [SQ parameters]({{site.url}}{{site.baseurl}}/search-plugins/knn/knn-index/#sq-parameters).
The `fp16` encoder converts 32-bit vectors into their 16-bit counterparts. For this encoder type, the vector values must be in the [-65504.0, 65504.0] range. To define how out-of-range values are handled, you can specify the `clip` parameter. By default, this parameter is `false`, and any vectors containing out-of-range values are rejected. When `clip` is set to `true`, out-of-range vector values are rounded up or down so that they fall within the supported range. For example, if the original 32-bit vector is `[65510.82, -65504.1]`, the vector will be indexed as the 16-bit vector `[65504.0, -65504.0]`.

> Comment: What do we mean by "To define handling"? Reply: Reworded.

We recommend setting `clip` to `true` only if very few elements lie outside the supported range. Rounding the values might cause a drop in recall.
{: .note}
The following example method definition specifies the Faiss SQfp16 encoder, which rejects any indexing request that contains out-of-range vector values (because the `clip` parameter is `false` by default):
```json
PUT /test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": {
            "encoder": {
              "name": "sq",
              "parameters": {
                "type": "fp16"
              }
            },
            "ef_construction": 256,
            "m": 8
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
During ingestion, make sure each dimension of the vector is in the supported range ([-65504.0, 65504.0]):

```json
PUT test-index/_doc/1
{
  "my_vector1": [-65504.0, 65503.845, 55.82]
}
```
{% include copy-curl.html %}
During querying, there is no range limitation for the query vector:

```json
GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [265436.876, -120906.256, 99.84],
        "k": 2
      }
    }
  }
}
```
{% include copy-curl.html %}
## Memory estimation

In the best-case scenario, 16-bit vectors produced by the Faiss SQfp16 quantizer require 50% of the memory that 32-bit vectors require.
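To make the 50% figure concrete, here is a back-of-the-envelope comparison of the raw vector data alone, assuming a dimension of 256 (an illustrative value; graph overhead is handled by the formulas below):

```bash
# 32-bit floats: 4 bytes per dimension -> 4 * 256 = 1,024 bytes per vector
# 16-bit floats: 2 bytes per dimension -> 2 * 256 = 512 bytes per vector (50%)
```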
#### HNSW memory estimation

The memory required for HNSW is estimated to be `1.1 * (2 * dimension + 8 * M)` bytes/vector.

As an example, assume you have a million vectors with a dimension of 256 and an M of 16. The memory requirement can be estimated as follows:

```bash
1.1 * (2 * 256 + 8 * 16) * 1,000,000 ~= 0.656 GB
```
#### IVF memory estimation

The memory required for IVF is estimated to be `1.1 * (((2 * dimension) * num_vectors) + (4 * nlist * dimension))` bytes.

As an example, assume you have a million vectors with a dimension of 256 and an `nlist` of 128. The memory requirement can be estimated as follows:

```bash
1.1 * (((2 * 256) * 1,000,000) + (4 * 128 * 256)) ~= 0.525 GB
```
> Comment: SIMD should be CPU architecture dependent, right? Why do we say only Linux machines?
> Reply: Yes, SIMD is CPU architecture dependent. But right now we are running into some issues on Windows OS due to some limitations with the compiler, so we support SIMD for Linux OS and macOS (for development only). That's the reason we explicitly call out that it works on Linux.