Add qa model and new settings in ml-commons (#6749)
* add qa model and settings documentation in ml-commons

Signed-off-by: Bhavana Ramaram <[email protected]>

* add more settings

Signed-off-by: Bhavana Ramaram <[email protected]>

* Doc review

Signed-off-by: Fanit Kolchina <[email protected]>

* Tech review comments

Signed-off-by: Fanit Kolchina <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>

* Apply suggestions from code review

Signed-off-by: kolchfa-aws <[email protected]>

---------

Signed-off-by: Bhavana Ramaram <[email protected]>
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
Co-authored-by: Fanit Kolchina <[email protected]>
Co-authored-by: kolchfa-aws <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
4 people authored Mar 25, 2024
1 parent 88242fa commit 7e8026c
Showing 2 changed files with 204 additions and 3 deletions.
144 changes: 143 additions & 1 deletion _ml-commons-plugin/cluster-settings.md
```
plugins.ml_commons.native_memory_threshold: 90
```

### Values

- Default value: 90
- Value range: [0, 100]

## Set JVM heap memory threshold
Sets a circuit breaker that checks JVM heap memory usage before running an ML task. If the heap usage exceeds the threshold, OpenSearch triggers a circuit breaker and throws an exception to maintain optimal performance.
Values represent the percentage of JVM heap memory in use. When set to `0`, no ML tasks can run. When set to `100`, the threshold is never reached, so the circuit breaker never trips.
### Setting
```
plugins.ml_commons.jvm_heap_memory_threshold: 85
```
### Values
- Default value: 85
- Value range: [0, 100]
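
Many ML Commons settings, including this one, can also be updated at runtime through the Cluster Settings API rather than `opensearch.yml`. A minimal sketch, assuming the setting is dynamic in your OpenSearch version:

```
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.jvm_heap_memory_threshold": 90
  }
}
```
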
## Exclude node names
Use this setting to specify the names of nodes on which you don't want to run ML tasks. The value must be a valid node name or a comma-separated list of node names.
### Setting
```
plugins.ml_commons.exclude_nodes._name: node1, node2
```
## Allow custom deployment plans
When enabled, this setting allows users to deploy models to specific ML nodes according to their permissions.
```
plugins.ml_commons.allow_custom_deployment_plan: false
```

### Values

- Default value: `false`
- Valid values: `false`, `true`

## Enable auto deploy
This setting applies when you send a prediction request for an externally hosted model that has not yet been deployed. When set to `true`, the model is automatically deployed to the cluster before the request runs.
### Setting
```
plugins.ml_commons.model_auto_deploy.enable: false
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
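
For example, with auto deploy enabled, a Predict API call like the following deploys the externally hosted model first, if needed, and then runs inference. The model ID is a placeholder, and the `parameters` body depends on the model's connector:

```
POST /_plugins/_ml/models/<model_id>/_predict
{
  "parameters": {
    "prompt": "Hello"
  }
}
```
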
## Enable auto redeploy
This setting automatically redeploys deployed or partially deployed models after a cluster failure. If all ML nodes in a cluster crash, the model switches to the `DEPLOY_FAILED` state and must be deployed manually.
```
plugins.ml_commons.connector_access_control_enabled: true
```
### Values
- Default value: `false`
- Valid values: `false`, `true`
## Enable a local model
This setting allows a cluster admin to enable running local models on the cluster. When this setting is `false`, users will not be able to run register, deploy, or predict operations on any local model.
### Setting
```
plugins.ml_commons.local_model.enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
## Node roles that can run externally hosted models
This setting allows a cluster admin to control the types of nodes on which externally hosted models can run.
### Setting
```
plugins.ml_commons.task_dispatcher.eligible_node_role.remote_model: ["ml"]
```
### Values
- Default value: `["data", "ml"]`, which allows externally hosted models to run on data nodes and ML nodes.
## Node roles that can run local models
This setting allows a cluster admin to control the types of nodes on which local models can run. If `plugins.ml_commons.only_run_on_ml_node` is set to `true`, local models always run on ML nodes. If it is set to `false`, local models run on the nodes defined in the `plugins.ml_commons.task_dispatcher.eligible_node_role.local_model` setting.
### Setting
```
plugins.ml_commons.task_dispatcher.eligible_node_role.local_model: ["ml"]
```
### Values
- Default value: `["data", "ml"]`
## Enable remote inference
This setting allows a cluster admin to enable remote inference on the cluster. If this setting is `false`, users will not be able to run register, deploy, or predict operations on any externally hosted model or create a connector for remote inference.
### Setting
```
plugins.ml_commons.remote_inference.enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
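
With remote inference enabled, users can create connectors for externally hosted models. A minimal connector sketch, assuming the OpenAI chat completions endpoint and a placeholder credential:

```
POST /_plugins/_ml/connectors/_create
{
  "name": "OpenAI chat connector",
  "description": "Example connector for remote inference",
  "version": 1,
  "protocol": "http",
  "parameters": {
    "endpoint": "api.openai.com",
    "model": "gpt-3.5-turbo"
  },
  "credential": {
    "openAI_key": "<your API key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/v1/chat/completions",
      "headers": {
        "Authorization": "Bearer ${credential.openAI_key}"
      },
      "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": ${parameters.messages} }"
    }
  ]
}
```
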
## Enable agent framework
When set to `true`, this setting enables the agent framework (including agents and tools) on the cluster and allows users to run register, execute, delete, get, and search operations on an agent.
### Setting
```
plugins.ml_commons.agent_framework_enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
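
With the agent framework enabled, users can register and run agents. A minimal flow agent sketch; the tool choice (`CatIndexTool`) is only an illustration:

```
POST /_plugins/_ml/agents/_register
{
  "name": "demo_flow_agent",
  "type": "flow",
  "description": "Example flow agent with a single tool",
  "tools": [
    {
      "type": "CatIndexTool"
    }
  ]
}
```

The returned agent ID can then be passed to `POST /_plugins/_ml/agents/<agent_id>/_execute`.
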
## Enable memory
When set to `true`, this setting enables conversational memory, which stores all messages from a conversation for conversational search.
### Setting
```
plugins.ml_commons.memory_feature_enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
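
With memory enabled, a conversation can be created through the Memory API. A minimal sketch:

```
POST /_plugins/_ml/memory
{
  "name": "Demo conversation"
}
```
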
## Enable RAG pipeline
When set to `true`, this setting enables the search processors for retrieval-augmented generation (RAG). RAG enhances query results by generating responses using relevant information from memory and previous conversations.
### Setting
```
plugins.ml_commons.rag_pipeline_feature_enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
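
When this setting is enabled, the RAG response processor can be attached to a search pipeline. A minimal sketch using the `retrieval_augmented_generation` processor; the model ID and context field are placeholders:

```
PUT /_search/pipeline/rag_pipeline
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "model_id": "<model_id>",
        "context_field_list": ["text"]
      }
    }
  ]
}
```
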
63 changes: 61 additions & 2 deletions _ml-commons-plugin/custom-local-models.md
As of OpenSearch 2.11, OpenSearch supports local sparse encoding models.

As of OpenSearch 2.12, OpenSearch supports local cross-encoder models.

As of OpenSearch 2.13, OpenSearch supports local question answering models.

Running local models on the CentOS 7 operating system is not supported. Moreover, not all local models can run on all hardware and operating systems.
{: .important}

## Preparing a model

For all models, you must provide a tokenizer JSON file within the model zip file.

For sparse encoding models, make sure your output format is `{"output":<sparse_vector>}` so that ML Commons can post-process the sparse vector.

```json
POST /_plugins/_ml/models/_register
```
{% include copy.html %}

For descriptions of Register API parameters, see [Register a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/). The `model_task_type` corresponds to the model type. For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. For question answering models, set this parameter to `QUESTION_ANSWERING`.

OpenSearch returns the task ID of the register operation:

The response contains the tokens and weights:
## Step 5: Use the model for search

To learn how to use the model for vector search, see [Using an ML model for neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#using-an-ml-model-for-neural-search).

## Question answering models

A question answering model extracts the answer to a question from a given context. ML Commons supports context in `text` format.

To register a question answering model, send a request in the following format. Specify the `function_name` as `QUESTION_ANSWERING`:

```json
POST /_plugins/_ml/models/_register
{
  "name": "question_answering",
  "version": "1.0.0",
  "function_name": "QUESTION_ANSWERING",
  "description": "test model",
  "model_format": "TORCH_SCRIPT",
  "model_group_id": "lN4AP40BKolAMNtR4KJ5",
  "model_content_hash_value": "e837c8fc05fd58a6e2e8383b319257f9c3859dfb3edc89b26badfaf8a4405ff6",
  "model_config": {
    "model_type": "bert",
    "framework_type": "huggingface_transformers"
  },
  "url": "https://github.com/opensearch-project/ml-commons/blob/main/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/question_answering/question_answering_pt.zip?raw=true"
}
```
{% include copy-curl.html %}

Then send a request to deploy the model:

```json
POST _plugins/_ml/models/<model_id>/_deploy
```
{% include copy-curl.html %}
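
The deploy operation is asynchronous. One way to confirm that deployment completed (a sketch assuming the standard Get Model API) is to retrieve the model and check that its `model_state` is `DEPLOYED`:

```json
GET /_plugins/_ml/models/<model_id>
```
{% include copy-curl.html %}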

To test a question answering model, send the following request. It requires a `question` and the relevant `context` from which the answer will be extracted:

```json
POST /_plugins/_ml/_predict/question_answering/<model_id>
{
  "question": "Where do I live?",
  "context": "My name is John. I live in New York"
}
```
{% include copy-curl.html %}

The response provides the answer based on the context:

```json
{
  "inference_results": [
    {
      "output": [
        {
          "result": "New York"
        }
      ]
    }
  ]
}
```
