faiss interface refactoring to support multiple methods #344

jmazanec15 · 2021-04-20T19:31:30Z

Issue #, if available:
#225

Description of changes:
This PR refactors the interface on the current faiss-support branch to support several additional features, including:

IVF index type - a cell-probe-based method that lets a user reduce the search space using a k-means clustering algorithm. It takes "ncentroids" and "nprobes" as parameters.
Product quantization - a method that encodes vectors to reduce their size. It takes "code_size" as a parameter.
Composite indices - the ability to combine different faiss features into a single index
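
To make the cell-probe idea concrete, here is a minimal sketch of how "nprobes" narrows the search space once centroids exist. It is a toy illustration, not faiss's implementation: the class and method names are hypothetical, and the centroids are assumed to have been produced by k-means beforehand.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of IVF-style cell probing; names are hypothetical, not faiss code.
final class IvfProbeSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static float l2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the ids of the nprobes cells whose centroids are closest to the query.
    // Only vectors assigned to these cells would then be compared against the query.
    static List<Integer> cellsToProbe(float[][] centroids, float[] query, int nprobes) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < centroids.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble(i -> l2(centroids[i], query)));
        return new ArrayList<>(ids.subList(0, Math.min(nprobes, ids.size())));
    }
}
```

A larger "nprobes" visits more cells, which generally trades search speed for recall.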

The interface looks like:

{
   "my_vector":{
      "type":"knn_vector",
      "dimension":4,
      "method":{
         "name":"ivf",
         "engine":"faiss",
         "coarse_quantizer":{
            "name":"ivf",
            "parameters":{
               "ncentroids":15
            }
         },
         "encoder":{
            "name":"pq",
            "parameters":{
               "code_size":8
            }
         },
         "parameters":{
            "ncentroids":128
         }
      }
   }
}

The main logic where the interface has been refactored can be found in:

KNNVectorFieldMapper - where the parsing between the user-provided method and the plugin occurs
KNNMethodContext - stored structure of the user-provided method configuration
KNNMethod - structure of a given method supported by a particular engine
KNNLibrary - interface for a particular library. Includes implementations for nmslib and faiss
KNNEngine - enum mapping name to KNNLibrary
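
As a rough sketch of the last two items, an enum can resolve an engine name to a library implementation along these lines. The types and members below (LibrarySketch, EngineSketch, description, fromName) are hypothetical stand-ins chosen for illustration; they are not the classes in this PR.

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-library interface; not the PR's KNNLibrary.
interface LibrarySketch {
    String description();
}

// Hypothetical enum that maps a user-supplied engine name to a library.
enum EngineSketch {
    NMSLIB("nmslib", () -> "nmslib library (sketch)"),
    FAISS("faiss", () -> "faiss library (sketch)");

    private final String engineName;
    private final LibrarySketch library;

    EngineSketch(String engineName, LibrarySketch library) {
        this.engineName = engineName;
        this.library = library;
    }

    LibrarySketch getLibrary() {
        return library;
    }

    // Looks up an engine by name, ignoring case; rejects unknown or null names.
    static EngineSketch fromName(String name) {
        return Arrays.stream(values())
                .filter(e -> e.engineName.equalsIgnoreCase(name))
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unsupported engine: " + name));
    }
}
```

With this shape, the "engine" value from the mapping (for example "faiss") would be the only thing the parser needs in order to find the library that validates the rest of the method configuration.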

A lot of code was changed in order to support these additional features:

Because we use faiss's index factory, only some of the parameters can be configured through the index factory string description. To support additional parameters (for example, ef_construction for HNSW), this PR adds functionality to pass an extra parameter map to the JNI layer to be parsed.
Because IVF and PQ require training, this PR implements a training approach in the JNI save index function where a subset of the data to be indexed is used for training. This is inherently inefficient because each segment must be trained before data can be added to it. In the future, we will introduce a training API that trains before indexing, to work around this.
Several other minor changes to make the refactor cleaner and easier.
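
The split between the factory description and the extra parameter map can be sketched as follows. This is a minimal illustration under assumptions: the class and method names are hypothetical, and it assumes "ncentroids" and "code_size" map directly onto the IVF and PQ tokens of a faiss index factory string such as "IVF15,PQ8".

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper; not the PR's code.
final class FaissParamsSketch {

    // Builds a factory-style description from the parameters it can express.
    // Assumption: ncentroids and code_size map directly to the IVF and PQ tokens.
    static String ivfPqDescription(int ncentroids, int codeSize) {
        return "IVF" + ncentroids + ",PQ" + codeSize;
    }

    // Collects every parameter that is NOT encoded in the description, so it
    // can be handed to the native layer separately as an extra parameter map.
    static Map<String, Object> extraParameters(Map<String, Object> all, Set<String> inDescription) {
        Map<String, Object> extra = new TreeMap<>();
        for (Map.Entry<String, Object> entry : all.entrySet()) {
            if (!inDescription.contains(entry.getKey())) {
                extra.put(entry.getKey(), entry.getValue());
            }
        }
        return extra;
    }
}
```

For an HNSW method, for instance, a parameter covered by the description would be listed in inDescription, while ef_construction would end up in the extra map.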

Testing
For testing, this PR focuses on adding tests that exercise the interface, as opposed to adding end-to-end tests of each JNI library's functionality. This is because that functionality will change in the future; right now, it is just a placeholder to get the interface working. That being said, the following test refactoring was done:

Added unit tests for the faiss interface
Refactored old tests so that gradle build passes

Future Development

Introduce training API
Add additional end-to-end tests
Investigate storing data exclusively with faiss (as opposed to storing vectors in doc values in Lucene)

Notes
We are in the process of migrating from ODFE to OpenSearch. Included in this will be porting over the faiss-support branch to OpenSearch. Because porting requires significant refactoring, we will merge this PR and then port the faiss-support branch to OpenSearch.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

src/main/java/com/amazon/opendistroforelasticsearch/knn/common/KNNConstants.java

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNSettings.java

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNWeight.java

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/MethodComponent.java

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/Parameter.java

...com/amazon/opendistroforelasticsearch/knn/index/codec/KNN80Codec/KNN80DocValuesConsumer.java

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/IndexUtil.java

VijayanB · 2021-04-30T20:23:39Z

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/IndexUtil.java

+     * @return length of the file in kilobytes
+     */
+    public static long getFileSizeInKB(String filePath) {
+        if (filePath == null || filePath.isEmpty()) {


This will not differentiate an empty file from an invalid or null file path. Is this intended?

So I guess it would say an empty file has a size of 1 KB, whereas a non-existent file has a size of 0.
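
One possible way to make that distinction explicit, sketched here under assumptions rather than taken from the PR: return an empty OptionalLong for a null, blank, or non-existent path, and a rounded-up size in KB otherwise. The class name and the round-up rule are hypothetical, and note that this variant reports 0 KB (not 1 KB) for an empty file.

```java
import java.io.File;
import java.util.OptionalLong;

// Hypothetical variant of a file-size helper; not the method in this PR.
final class FileSizeSketch {

    // Returns the file size in KB, rounded up, or empty if the path is
    // null, blank, or does not point at a regular file.
    static OptionalLong fileSizeInKB(String filePath) {
        if (filePath == null || filePath.isEmpty()) {
            return OptionalLong.empty();
        }
        File file = new File(filePath);
        if (!file.isFile()) {
            return OptionalLong.empty();
        }
        // Round up: a 1-byte file reports 1 KB, an empty file reports 0 KB.
        return OptionalLong.of((file.length() + 1023) / 1024);
    }
}
```

Callers can then tell "no such file" (empty result) apart from "file exists but is empty" (a present value of 0).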

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNIndex.java

src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNIndexCache.java

jmazanec15 · 2021-05-18T19:56:18Z

Closing PR now. Will continue work on OpenSearch repo.

jmazanec15 added 30 commits March 8, 2021 20:17

Refactoring plugin code for supporting multiple engines

63a8a9f

Refactor spaceType to be flat in plugin

56a4c73

Refactor ANN scoring to support multiple engines

4bbfc26

Refactor inner product score translation

7e6efd8

Add method parameter for parsing method config

b16abea

Add support for faiss indices that require training

16eaf35

add PQ encoding support for faiss

96cb855

Modify training to use index data

629d205

Switch to debug log statements

1aa25f9

Add support for faiss flat index

632d8dc

Adjust training points to 5K

80f8f9e

Add support for extra parameters in jni and clean code

705f794

Clean up lib versioning

786ca2d

Remove unnecessary params from faiss jni

210a948

Dont generate extra parameters for nmslib

45f80ed

Set default parameter values for faiss

fa84683

Refactor structure of engine functions

fd33d64

Rename course to coarse

d1417a9

Support method context for nmslib hnsw parameters

fdcdb43

Pull strings out into constants

62cd44d

Refactor spaceType passing logic

952dfd6

Fix case for null parameter

cafc36e

Rename FAISSLibVersion to FaissLibVersion

aa77e34

Improve parsing implementation

dc381fb

Make training limits configurable

608c8c1

Allow pq for flat faiss index

6695cb8

Add extra params for hnsw

dcc05d1

Minor clean up

75e43d8

Refactor engine logic

3dce15d

Minor refactoring to validation logic

99ebf56

jmazanec15 added 4 commits April 28, 2021 10:26

Minor refactoring

fbcb76d

Move parameter and methodcomponent into individual file

18ca977

Use builder to build MethodComponent

22501e3

Add builder for KNNMethod

5139e7a