Add support for asymmetric embedding models #710
base: main
Conversation
Signed-off-by: br3no <[email protected]>
@br3no can you add an entry in the changelog?
@br3no Thanks for raising the PR. I am wondering: do we actually require this change? In the MLCommons repository a generic MLInference processor is being launched, which is supposed to handle inference for any kind of model, both during ingestion and search. RFC: opensearch-project/ml-commons#2173. That capability is being built as of now. Do you think we still need this feature then?
@navneet1v I have been loosely following the discussions in the mentioned RFC. It's a large change that I don't expect to be stable soon – the PR is very much in flux. Also, I don't see the use-case of asymmetric embedding models being addressed there. This PR is much smaller in comparison and is not in any way in conflict with the RFC work. If, once the work on the ML Inference Processors is finished, the use-case is addressed there as well, we can deprecate and eventually remove this functionality again. Until then, this PR offers users the chance to use more modern local embeddings. I'm eager to take this for a spin, tbh.
If that is the case, I would recommend posting the same on the RFC to ensure that your use case is handled. On the other hand, I do agree this is an interesting feature. I would like to get some eyes on this change, mainly on whether it should be added given that a more generic processor is around the corner. As far as my opinion is concerned, the main reason for the generic processor was to avoid creating new processors or updating existing ones to support new model types, which is what is happening in this PR. Thoughts? @jmazanec15 , @martin-gaievski , @vamshin , @vibrantvarun . Let me add some PMs for the opensearch-project too to get their thoughts. @dylan-tong-aws
Codecov Report — Attention: Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
##               main     #710      +/-   ##
============================================
- Coverage     85.02%   84.41%    -0.61%
+ Complexity      790      785        -5
============================================
  Files            60       59        -1
  Lines          2430     2464       +34
  Branches        410      409        -1
============================================
+ Hits           2066     2080       +14
- Misses          202      215       +13
- Partials        162      169        +7
@navneet1v I have added a comment earlier today to the RFC (cf. opensearch-project/ml-commons#2173 (comment)). Sure, let's open the discussion and get some PMs into it. I really don't mind leaving this out if the support is introduced in another PR in 2.14. I'm concerned that opensearch-project/ml-commons#2173 is a much larger effort that won't be ready that quickly... It's not about my contribution; I need the feature. 🙃
I can see the feature is marked for the 2.14 release of OpenSearch. Let me add maintainers from the ML team too. @mingshl , @ylwu-amzn
@mingshl @ylwu-amzn, I'd really like to have this feature in 2.14. Do you think this use-case will be fully supported by opensearch-project/ml-commons#2173? Cf. opensearch-project/ml-commons#2173 (comment). If not, I'd be happy to help this PR get merged as an interim solution! Let me know what you think!
@br3no the ML inference processor is targeting support for remote models first. How do you usually connect this model? Is it local or remote? If remote, can you please provide a SageMaker deployment code snippet so I can quickly test it in a 2.14 test cluster? Thanks
@mingshl sorry for taking so long to answer! The use-case for now is to use a local, asymmetric model such as https://huggingface.co/intfloat/multilingual-e5-small. This PR is the last puzzle piece to allow one to use these kinds of models, and it should in principle also work with remote models. It makes sure that the neural-search plugin uses the correct inference parameters when embedding passages and queries with asymmetric models. Regardless of whether the model is local or remote, if you are using asymmetric models, you will need to provide this information anyway. The thing is that asymmetric models need to know at inference time what exactly they are embedding. OpenSearch currently treats embedding models as symmetric, meaning that regardless of whether the text being embedded is a query or a passage, the embedding will always be the same. Asymmetric models require content "hints" for the text being embedded; the model exemplified above uses string prefixes for this. In opensearch-project/ml-commons#1799 we added the concept of asymmetric models to ml-commons, introducing the AsymmetricTextEmbeddingParameters class. I would really be happy to get this merged as an interim solution until the ml inference processor fully supports this use-case.
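To make the "hints" concrete, here is a minimal sketch of how an asymmetric, e5-style model distinguishes queries from passages. The `query: `/`passage: ` prefix strings are an assumption based on the multilingual-e5 model card, and the class and method names are hypothetical, not the plugin's actual API:

```java
// Illustrative sketch only: an asymmetric embedding model prepends a
// content-type hint to the text before computing the embedding, so the
// same string yields different vectors depending on its role.
class AsymmetricPrefixSketch {
    enum EmbeddingContentType { QUERY, PASSAGE }

    // Assumed e5-style prefixes; real models declare theirs in the model config.
    static String applyPrefix(EmbeddingContentType type, String text) {
        return (type == EmbeddingContentType.QUERY ? "query: " : "passage: ") + text;
    }
}
```

A symmetric model would skip this step entirely, which is why the plugin must know the model type before calling inference.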
I also vote for this PR; we need this functionality.
@br3no would it be possible for you to contribute local model support back into the MLInference processor? Is that even an option?
@navneet1v you mean making sure this works there as well? Sure, I can commit to that. I'd propose then to merge this PR now and then start the work to eventually replace this once the MLInference processor supports this use case... |
The problem is that once this is released it cannot be deprecated until a major version release. Hence I am a bit hesitant to have this feature in the neural-search plugin.
This PR has a very small surface. It doesn't change any APIs. So I believe this is not something to worry about, actually. Once the MLInference processor supports asymmetric models, the neural-search plugin can be changed to use it instead of what I built here. This would not be a breaking change, only an internal implementation detail.
Thanks @br3no. Can you show some examples of a neural search query with an asymmetric embedding model? The ML inference processor can support asymmetric embedding models. We are working on unifying the interface so we can also support local models in the ML inference processor. I think @navneet1v's concern is valid; we should consider the deprecation effort. Let's check what the user experience looks like with this PR.
@ylwu-amzn as I said in the comment above, there is no API change in this PR. You would use the neural-search plugin in the same way it is used today; that is what is implemented here.
@br3no, sorry that I didn't have enough time to read the details. Took a quick look: it seems the code will specify the "QUERY" type when running a query and set the "PASSAGE" type during ingest. That means no API change; customers can use the same ingest pipeline and neural search query, so switching to an asymmetric model is seamless. I think this is a good design: it supports BWC and can be deprecated/migrated together with the current text embedding APIs. I'm good with merging this.
Exactly! Great! |
@ylwu-amzn could you drive the review process forward then? |
Sorry, missed your comment. Asking neural-search maintainers if they have other concerns. Update: pinged neural-search plugin owner SDM @vamshin; he will ask the team to help review.
Mockito.verifyNoMoreInteractions(singleSentenceResultListener);
}

public void testInferenceSentences_whenGetModelException_thenFailure() {
we need to test the scenario where we're retrying 1-2 times. I only see a scenario where the first request fails with an error that isn't retryable; this isn't full coverage
@martin-gaievski thanks for pointing this out.
I'm wondering, though, whether we should make this retryable at all. Let me elaborate:
In my understanding, inference requests are retried because they tend to fail more often than regular operations in OpenSearch. I don't know the history and complete reasoning behind this, so I speculate it has to do with the fact that the inference is done natively and that many things can go wrong there.
With my change, if fetching the model information fails (mlClient.getModel(modelId, ...)), there is no retry. Model information is fetched the first time inference is requested with a particular model. After that, the result is cached and the method behaves exactly as before this PR.
So my argument is: should we really add retry logic to this relatively simple operation? If getModel fails, it is most likely to fail again, so retrying wouldn't make sense. Otherwise, one could argue that all operations in OpenSearch should be wrapped in retry logic.
 * @param mlAlgoParams {@link MLAlgoParams} which will be used to run the inference
 * @param listener {@link ActionListener} which will be called when prediction is completed or errored out
 */
public void inferenceSentence(
We need to refactor this method to accept a single POJO that can be built with the builder pattern, similar to the ml-commons class with algorithm params. Reason: we cannot add a new method every time we need to add a new parameter to the inference ml client.
I'm fine if we do it in a separate PR; please create a GH issue and post the link here.
Okay. Cf. #790
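For reference, the suggested refactor might look roughly like the following sketch. All names here (TextInferenceRequestSketch, the builder methods) are hypothetical, and MLAlgoParams is stood in by a plain Object to keep the sketch self-contained:

```java
import java.util.List;

// Hypothetical sketch of a single request POJO built via a builder, so adding
// a new inference parameter doesn't require a new method overload on the
// ml client accessor. Not the plugin's actual API.
class TextInferenceRequestSketch {
    final String modelId;
    final List<String> inputTexts;
    final Object mlAlgoParams; // MLAlgoParams in the real client; optional

    private TextInferenceRequestSketch(Builder b) {
        this.modelId = b.modelId;
        this.inputTexts = b.inputTexts;
        this.mlAlgoParams = b.mlAlgoParams;
    }

    static Builder builder() { return new Builder(); }

    static class Builder {
        private String modelId;
        private List<String> inputTexts;
        private Object mlAlgoParams;

        Builder modelId(String v) { modelId = v; return this; }
        Builder inputTexts(List<String> v) { inputTexts = v; return this; }
        Builder mlAlgoParams(Object v) { mlAlgoParams = v; return this; }

        // New parameters become new builder methods; existing call sites stay untouched.
        TextInferenceRequestSketch build() { return new TextInferenceRequestSketch(this); }
    }
}
```

A caller would then write `builder().modelId(...).inputTexts(...).build()` and only set `mlAlgoParams` when needed.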
@@ -40,6 +48,7 @@
 public class MLCommonsClientAccessor {
     private static final List<String> TARGET_RESPONSE_FILTERS = List.of("sentence_embedding");
     private final MachineLearningNodeClient mlClient;
+    private final Map<String, Boolean> modelAsymmetryCache = new ConcurrentHashMap<>();
please add a comment about this cache's behavior and usage. You can start from the following:
- it's local to the data node
- how we invalidate and evict, if we are ever going to do this
- on a cache miss, what is the latency to retrieve the value via an API call and put it into the cache
- how big a single object is; that also implies some eviction strategy, as we cannot grow indefinitely
- what the behavior is in case of node restart/drop or the model getting redeployed
- Yes, it is local to the data node.
- We never evict entries.
- The latency is that of fetching the model configuration and parsing it.
- The cache maps modelId to Boolean; the size requirement is roughly 20 chars + 1 Boolean per entry.
- If the node restarts, the cache will be empty; the first inference request will repopulate it.
- If the node drops out of the cluster, the inference request will fail. The cache will not be emptied.
- If the model gets redeployed, the user will need to request inference with a new modelId, which will lead to a new cache entry. The old one will remain.
The only scenario I can think of where this design could be problematic is if a malicious actor floods the cluster with billions of inference requests for nonexistent models. This would lead to an increase in heap usage that would never be GC'd.
Let me know if you think this should be changed.
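The cache behavior described in the bullets above can be sketched as follows. This is an illustrative stand-in, not the plugin's actual code; in particular, fetchIsAsymmetric is a hypothetical placeholder for the getModel call plus config parsing:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: a node-local, never-evicted cache mapping
// modelId -> isAsymmetric, populated on the first inference request
// for a given model.
class ModelAsymmetryCacheSketch {
    private final Map<String, Boolean> modelAsymmetryCache = new ConcurrentHashMap<>();

    // Stand-in for calling mlClient.getModel(modelId, ...) and checking the
    // model config for passage_prefix / query_prefix. The "asym-" convention
    // below is purely for this sketch.
    private boolean fetchIsAsymmetric(String modelId) {
        return modelId.startsWith("asym-");
    }

    boolean isAsymmetric(String modelId) {
        // A cache miss pays one getModel round-trip; hits are plain map lookups.
        return modelAsymmetryCache.computeIfAbsent(modelId, this::fetchIsAsymmetric);
    }
}
```

Note that `computeIfAbsent` also gives per-key atomicity, so concurrent first requests for the same model trigger at most one fetch per node.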
Is there any progress or news regarding this PR? It is still a small change without any API change, and the last puzzle piece to enable asymmetric embedding models.
Description
This PR adds support for asymmetric embedding models such as https://huggingface.co/intfloat/multilingual-e5-small to the neural-search plugin.
It builds on the work done in opensearch-project/ml-commons#1799.
Asymmetric embedding models behave differently when embedding passages and queries. To that end, the model must "know" at inference time what kind of data it is embedding.
The changes are:
1. src/main/java/org/opensearch/neuralsearch/processor/TextEmbeddingProcessor.java
The processor signals that it is embedding passages by passing the new AsymmetricTextEmbeddingParameters with the content type EmbeddingContentType.PASSAGE.
2. src/main/java/org/opensearch/neuralsearch/query/NeuralQueryBuilder.java
Analogously, the query builder uses EmbeddingContentType.QUERY.
3. src/main/java/org/opensearch/neuralsearch/ml/MLCommonsClientAccessor.java
Here is where most of the work was done. The class has been extended in a backwards-compatible way with inference methods that allow one to pass MLAlgoParams objects. Usage of AsymmetricTextEmbeddingParameters (which implements MLAlgoParams) is mandatory for asymmetric models; at the same time, symmetric models do not accept them. The only way to know whether a model is asymmetric or symmetric is by reading its model configuration: if the model's configuration contains a passage_prefix and/or a query_prefix, it is asymmetric; otherwise it is symmetric.
The MLCommonsClientAccessor class deals with this, keeping the complexity in one place and not requiring any API change to the neural-search plugin (as proposed in #620). When calling the inference methods, clients (such as the TextEmbeddingProcessor) may pass the AsymmetricTextEmbeddingParameters object without caring whether the model they are using is symmetric or asymmetric. The accessor class will first read the model's configuration (by calling the getModel API of the mlClient) and act appropriately.
To avoid adding this extra roundtrip to every inference call, the asymmetry information is kept in an in-memory cache.
Issues Resolved
#620
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.