Add support for asymmetric embedding models #710
base: main
Conversation
Signed-off-by: br3no <[email protected]>
@br3no can you add an entry in the changelog?
@br3no Thanks for raising the PR. I am wondering: do we actually require this change? In the MLCommons repository a generic MLInference processor is being launched, which is supposed to handle inference for any kind of model, both during ingestion and search. RFC: opensearch-project/ml-commons#2173. That capability is being built as of now. Do you think we still need this feature then?
@navneet1v I have been loosely following the discussions in the mentioned RFC. It's a large change that I don't expect to be stable soon – the PR is very much in flux. Also, I don't see the use-case of asymmetric embedding models being addressed there. This PR is much smaller in comparison and is not in any way in conflict with the RFC work. If, once the work on the ML Inference Processors is finished, the use-case is addressed there as well, we can deprecate and eventually remove this functionality again. Until then, this PR offers users the chance to use more modern local embeddings. I'm eager to take this for a spin, tbh.
If that is the case, I would recommend posting the same on the RFC to ensure that your use case is handled. On the other hand, I do agree this is an interesting feature. I would like to get some eyes on this change, mainly on whether it should be added given that a more generic processor is around the corner. As far as my opinion is concerned, the main reason for the generic processor was to avoid creating new processors or updating existing ones to support new model types, which is what is happening in this PR. Thoughts? @jmazanec15 , @martin-gaievski , @vamshin , @vibrantvarun . Let me add some PMs for the opensearch-project too to get their thoughts. @dylan-tong-aws
Codecov Report — Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
##               main     #710      +/-   ##
============================================
- Coverage     85.02%   84.41%    -0.61%
+ Complexity      790      785        -5
============================================
  Files            60       59        -1
  Lines          2430     2464       +34
  Branches        410      409        -1
============================================
+ Hits           2066     2080       +14
- Misses          202      215       +13
- Partials        162      169        +7
@navneet1v I have added a comment earlier today to the RFC (cf. opensearch-project/ml-commons#2173 (comment)). Sure, let's open the discussion and get some PMs into it. I really don't mind leaving this out if the support is introduced in another PR in 2.14. I'm concerned that opensearch-project/ml-commons#2173 is a much larger effort that won't be ready that quickly... It's not about my contribution; I need the feature. 🙃
I can see the feature is marked for the 2.14 release of OpenSearch. Let me add maintainers from the ML team too. @mingshl , @ylwu-amzn
@mingshl @ylwu-amzn, I'd really like to have this feature in 2.14. Do you think this use-case will be fully supported by opensearch-project/ml-commons#2173? Cf. opensearch-project/ml-commons#2173 (comment). If not, I'd be happy to help this PR get merged as an interim solution! Let me know what you think!
@br3no the ML inference processor is targeting support for remote models first. How do you usually connect this model? Is it local or remote? If remote, can you please provide a SageMaker deployment code snippet so I can quickly test it in a 2.14 test cluster? Thanks
@mingshl sorry for taking so long to answer! The use-case for now is to use a local, asymmetric model such as https://huggingface.co/intfloat/multilingual-e5-small. This PR is the last puzzle piece to allow one to use these kinds of models, and it should in principle also work with remote models. It makes sure that the neural-search plugin uses the correct inference parameters when embedding passages and queries with asymmetric models. Regardless of whether the model is local or remote, if you are using asymmetric models, you will need to provide this information anyway. The thing is that asymmetric models need to know at inference time what exactly they are embedding. OpenSearch currently treats embedding models as symmetric, meaning that regardless of whether the text being embedded is a query or a passage, the embedding will always be the same. Asymmetric models require content "hints" for the text being embedded; the model exemplified above uses string prefixes for this. In opensearch-project/ml-commons#1799 we added the concept of asymmetric models to ml-commons, introducing the AsymmetricTextEmbeddingParameters class. I would really be happy to get this merged as an interim solution until the ml inference processor fully supports this use-case.
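To make the "hints" concrete, here is a minimal sketch of how an asymmetric, e5-style model distinguishes queries from passages. The `query: `/`passage: ` prefix strings are an assumption based on the multilingual-e5 model card, and the class and method names are hypothetical, not the plugin's actual API:

```java
// Illustrative sketch only: an asymmetric embedding model prepends a
// content-type hint to the text before computing the embedding, so the
// same string yields different vectors depending on its role.
class AsymmetricPrefixSketch {
    enum EmbeddingContentType { QUERY, PASSAGE }

    // Assumed e5-style prefixes; real models declare theirs in the model config.
    static String applyPrefix(EmbeddingContentType type, String text) {
        return (type == EmbeddingContentType.QUERY ? "query: " : "passage: ") + text;
    }
}
```

A symmetric model would skip this step entirely, which is why the plugin must know the model type before calling inference.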
I also vote for this PR; we need this functionality.
@br3no would it be possible for you to contribute local model support back into the MLInference processor? Is that even an option?
@navneet1v you mean making sure this works there as well? Sure, I can commit to that. I'd propose then to merge this PR now and then start the work to eventually replace this once the MLInference processor supports this use case... |
The problem is that once this is released it cannot be deprecated until a major version release. Hence I am a bit hesitant to have this feature in the neural-search plugin.
This PR has a very small surface. It doesn't change any APIs. So I believe this is not something to worry about, actually. Once the MLInference processor supports asymmetric models, the neural-search plugin can be changed to use it instead of what I built here. This would not be a breaking change, only an internal implementation detail.
Thanks @br3no. Can you show some examples of a neural search query with an asymmetric embedding model? The ML inference processor can support asymmetric embedding models. We are working on unifying the interface so we can also support local models in the ML inference processor. I think @navneet1v's concern is valid; we should consider the deprecation effort. Let's check what the user experience looks like with this PR.
@ylwu-amzn as I said in the comment above, there is no API change in this PR. You would use the neural-search plugin in the same way it is used today; that is what is implemented here.
@br3no, sorry that I didn't have enough time to read the details. Took a quick look: it seems the code will specify the "QUERY" type when running a query and set the "PASSAGE" type during ingest. That means no API change; customers can use the same ingest pipeline and neural search query, so switching to an asymmetric model is seamless. I think this is a good design: it supports BWC and can be deprecated/migrated together with the current text embedding APIs. I'm good with merging this.
Exactly! Great! |
@ylwu-amzn could you drive the review process forward then? |
Sorry, missed your comment. Asking neural-search maintainers if they have other concerns. Update: pinged neural-search plugin owner SDM @vamshin; he will ask the team to help review.
Mockito.verifyNoMoreInteractions(singleSentenceResultListener);
}

public void testInferenceSentences_whenGetModelException_thenFailure() {
we need to test the scenario where we're retrying 1-2 times. I only see a scenario where the first request fails with an error that isn't retryable; this isn't full coverage
@martin-gaievski thanks for pointing this out.
I'm wondering, though, whether we should make this retryable at all. Let me elaborate:
In my understanding, inference requests are retried because they tend to fail more often than regular operations in OpenSearch. I don't know the history and complete reasoning behind this, so I speculate it has to do with the fact that the inference is done natively and that many things can go wrong there.
With my change, if fetching the model information fails (mlClient.getModel(modelId, ...)), there is no retry. Model information is fetched the first time inference is requested with a particular model. After that, the result is cached and the method behaves exactly as before this PR.
So my argument is: should we really add retry logic to this relatively simple operation? If getModel fails, it is most likely to fail again, so retrying wouldn't make sense. Otherwise, one could argue that all operations in OpenSearch should be wrapped in retry logic.
 * @param mlAlgoParams {@link MLAlgoParams} which will be used to run the inference
 * @param listener {@link ActionListener} which will be called when prediction is completed or errored out
 */
public void inferenceSentence(
We need to refactor this method to accept a single POJO that can be built with the builder pattern, similar to the ml-commons class with algorithm params. Reason: we cannot add a new method every time we need to add a new parameter to the inference ml client.
I'm fine if we do it in a separate PR; please create a GH issue and post the link here.
Okay. Cf. #790
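For reference, the suggested refactor might look roughly like the following sketch. All names here (TextInferenceRequestSketch, the builder methods) are hypothetical, and MLAlgoParams is stood in by a plain Object to keep the sketch self-contained:

```java
import java.util.List;

// Hypothetical sketch of a single request POJO built via a builder, so adding
// a new inference parameter doesn't require a new method overload on the
// ml client accessor. Not the plugin's actual API.
class TextInferenceRequestSketch {
    final String modelId;
    final List<String> inputTexts;
    final Object mlAlgoParams; // MLAlgoParams in the real client; optional

    private TextInferenceRequestSketch(Builder b) {
        this.modelId = b.modelId;
        this.inputTexts = b.inputTexts;
        this.mlAlgoParams = b.mlAlgoParams;
    }

    static Builder builder() { return new Builder(); }

    static class Builder {
        private String modelId;
        private List<String> inputTexts;
        private Object mlAlgoParams;

        Builder modelId(String v) { modelId = v; return this; }
        Builder inputTexts(List<String> v) { inputTexts = v; return this; }
        Builder mlAlgoParams(Object v) { mlAlgoParams = v; return this; }

        // New parameters become new builder methods; existing call sites stay untouched.
        TextInferenceRequestSketch build() { return new TextInferenceRequestSketch(this); }
    }
}
```

A caller would then write `builder().modelId(...).inputTexts(...).build()` and only set `mlAlgoParams` when needed.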
@@ -40,6 +48,7 @@
 public class MLCommonsClientAccessor {
     private static final List<String> TARGET_RESPONSE_FILTERS = List.of("sentence_embedding");
     private final MachineLearningNodeClient mlClient;
+    private final Map<String, Boolean> modelAsymmetryCache = new ConcurrentHashMap<>();
please add a comment about this cache's behavior and usage. You can start from the following:
- it's local to the data node
- how we invalidate and evict, if we are ever going to do this
- on a cache miss, what is the latency to retrieve the value via an API call and put it into the cache
- how big a single object is; that also implies some eviction strategy, as we cannot grow indefinitely
- what the behavior is in case of node restart/drop or the model getting redeployed
- Yes, it is local to the data node.
- We never evict entries.
- The latency is that of fetching the model configuration and parsing it.
- The cache maps modelId to Boolean; the size requirement is roughly 20 chars + 1 Boolean per entry.
- If the node restarts, the cache will be empty; the first inference request will repopulate it.
- If the node drops out of the cluster, the inference request will fail. The cache will not be emptied.
- If the model gets redeployed, the user will need to request inference with a new modelId, which will lead to a new cache entry. The old one will remain.
The only scenario I can think of where this design could be problematic is if a malicious actor floods the cluster with billions of inference requests for nonexistent models. This would lead to an increase in heap usage that would never be GC'd.
Let me know if you think this should be changed.
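The cache behavior described in the bullets above can be sketched as follows. This is an illustrative stand-in, not the plugin's actual code; in particular, fetchIsAsymmetric is a hypothetical placeholder for the getModel call plus config parsing:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a node-local, never-evicted cache mapping
// modelId -> isAsymmetric, populated on the first inference request
// for a given model.
class ModelAsymmetryCacheSketch {
    private final Map<String, Boolean> modelAsymmetryCache = new ConcurrentHashMap<>();

    // Stand-in for calling mlClient.getModel(modelId, ...) and checking the
    // model config for passage_prefix / query_prefix. The "asym-" convention
    // below is purely for this sketch.
    private boolean fetchIsAsymmetric(String modelId) {
        return modelId.startsWith("asym-");
    }

    boolean isAsymmetric(String modelId) {
        // A cache miss pays one getModel round-trip; hits are plain map lookups.
        return modelAsymmetryCache.computeIfAbsent(modelId, this::fetchIsAsymmetric);
    }
}
```

Note that `computeIfAbsent` also gives per-key atomicity, so concurrent first requests for the same model trigger at most one fetch per node.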
Is there any progress or news regarding this PR? It is still a small change without any API change, and the last puzzle piece to enable asymmetric embedding models.
Description
This PR adds support for asymmetric embedding models such as https://huggingface.co/intfloat/multilingual-e5-small to the neural-search plugin.
It builds on the work done in opensearch-project/ml-commons#1799.
Asymmetric embedding models behave differently when embedding passages and queries. To that end, the model must "know" at inference time what kind of data it is embedding.
The changes are:
1. src/main/java/org/opensearch/neuralsearch/processor/TextEmbeddingProcessor.java
The processor signals that it is embedding passages by passing the new AsymmetricTextEmbeddingParameters with the content type EmbeddingContentType.PASSAGE.
2. src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java
Analogously, the query builder uses EmbeddingContentType.QUERY.
3. src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java
Here is where most of the work was done. The class has been extended in a backwards-compatible way with inference methods that allow one to pass MLAlgoParams objects. Usage of AsymmetricTextEmbeddingParameters (which implements MLAlgoParams) is mandatory for asymmetric models; at the same time, symmetric models do not accept them. The only way to know whether a model is asymmetric or symmetric is by reading its model configuration: if the model's configuration contains a passage_prefix and/or a query_prefix, it is asymmetric; otherwise it is symmetric.
The MLCommonsClientAccessor class deals with this, keeping the complexity in one place and not requiring any API change to the neural-search plugin (as proposed in #620). When calling the inference methods, clients (such as the TextEmbeddingProcessor) may pass the AsymmetricTextEmbeddingParameters object without caring whether the model they are using is symmetric or asymmetric. The accessor class will first read the model's configuration (by calling the getModel API of the mlClient) and act appropriately.
To avoid adding this extra roundtrip to every inference call, the asymmetry information is kept in an in-memory cache.
Issues Resolved
#620
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.