Add qa model and new settings in ml-commons #6749

Merged 7 commits on Mar 25, 2024.
144 changes: 143 additions & 1 deletion _ml-commons-plugin/cluster-settings.md
```
plugins.ml_commons.native_memory_threshold: 90
```

### Values

- Default value: 90
- Value range: [0, 100]
## Set JVM heap memory threshold
Sets a circuit breaker that checks JVM heap memory usage before running an ML task. If heap usage exceeds the threshold, OpenSearch trips the circuit breaker and throws an exception in order to preserve cluster performance.
Values are percentages of available JVM heap memory. When set to `0`, no ML tasks will run. When set to `100`, the circuit breaker is disabled and no threshold applies.
### Setting
```
plugins.ml_commons.jvm_heap_memory_threshold: 85
```
### Values
- Default value: 85
- Value range: [0, 100]
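
As a sketch, a dynamic setting such as this one can also be updated at runtime through the Cluster Settings API instead of `opensearch.yml` (confirm against your OpenSearch version whether a given ML Commons setting is dynamic):

```json
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.jvm_heap_memory_threshold": 90
  }
}
```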
## Exclude node names
Use this setting to specify the names of nodes on which you don't want to run ML tasks. The value must be a valid node name or a comma-separated list of node names.
### Setting
```
plugins.ml_commons.exclude_nodes._name: node1, node2
```
## Allow custom deployment plans
When enabled, this setting grants users the ability to deploy models to specific ML nodes according to that user's permissions.
### Setting

```
plugins.ml_commons.allow_custom_deployment_plan: false
```

### Values

- Default value: `false`
- Valid values: `false`, `true`
## Enable auto deploy
This setting applies when you send a prediction request to an externally hosted model that has not yet been deployed. When set to `true`, the model is automatically deployed to the cluster before the request is served.
### Setting
```
plugins.ml_commons.model_auto_deploy.enable: false
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
## Enable auto redeploy
This setting automatically redeploys deployed or partially deployed models upon cluster failure. If all ML nodes inside a cluster crash, the model switches to the `DEPLOY_FAILED` state and must be deployed manually.
## Enable connector access control

### Setting

```
plugins.ml_commons.connector_access_control_enabled: true
```

### Values

- Default value: `false`
- Valid values: `false`, `true`
## Enable a local model
This setting allows a cluster admin to enable running local models on the cluster. When this setting is `false`, users will not be able to run register, deploy, or predict operations on any local model.
### Setting
```
plugins.ml_commons.local_model.enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
## Node roles that can run externally hosted models
This setting allows a cluster admin to control the types of nodes on which externally hosted models can run.
### Setting
```
plugins.ml_commons.task_dispatcher.eligible_node_role.remote_model: ["ml"]
```
### Values
- Default value: `["data", "ml"]`, which allows externally hosted models to run on data nodes and ML nodes.
## Node roles that can run local models
This setting allows a cluster admin to control the types of nodes on which local models can run. It works together with `plugins.ml_commons.only_run_on_ml_node`: if `only_run_on_ml_node` is set to `true`, local models always run on ML nodes; if it is set to `false`, local models run on the nodes defined in the `plugins.ml_commons.task_dispatcher.eligible_node_role.local_model` setting. The sketch following the values below shows this interaction.
### Setting
```
plugins.ml_commons.task_dispatcher.eligible_node_role.local_model: ["ml"]
```
### Values
- Default value: `["data", "ml"]`
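
As a minimal sketch of the interaction described above (values are illustrative), the following `opensearch.yml` lines disable the hard ML-node restriction and instead restrict local models to ML nodes through the dispatcher setting:

```
plugins.ml_commons.only_run_on_ml_node: false
plugins.ml_commons.task_dispatcher.eligible_node_role.local_model: ["ml"]
```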
## Enable remote inference
This setting allows a cluster admin to enable remote inference on the cluster. If this setting is `false`, users will not be able to run register, deploy, or predict operations on any externally hosted model or create a connector for remote inference.
### Setting
```
plugins.ml_commons.remote_inference.enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
## Enable agent framework
When set to `true`, this setting enables the agent framework (including agents and tools) on the cluster and allows users to run register, execute, delete, get, and search operations on an agent.
### Setting
```
plugins.ml_commons.agent_framework_enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
## Enable memory
When set to `true`, this setting enables conversational memory, which stores all messages from a conversation for conversational search.
### Setting
```
plugins.ml_commons.memory_feature_enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
## Enable RAG pipeline
When set to `true`, this setting enables the search processors for retrieval-augmented generation (RAG). RAG enhances query results by generating responses using relevant information from memory and previous conversations.
### Setting
```
plugins.ml_commons.rag_pipeline_feature_enabled: true
```
### Values
- Default value: `true`
- Valid values: `false`, `true`
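
To check how these flags are currently set on a running cluster, one option (a sketch using the standard Cluster Settings API; the `filter_path` pattern is illustrative) is to read them back along with their defaults:

```json
GET _cluster/settings?include_defaults=true&filter_path=**.ml_commons
```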
63 changes: 61 additions & 2 deletions _ml-commons-plugin/custom-local-models.md
As of OpenSearch 2.11, OpenSearch supports local sparse encoding models.

As of OpenSearch 2.12, OpenSearch supports local cross-encoder models.

As of OpenSearch 2.13, OpenSearch supports local question answering models.

Running local models on the CentOS 7 operating system is not supported. Moreover, not all local models can run on all hardware and operating systems.
{: .important}

## Preparing a model

For all models, you must provide a tokenizer JSON file within the model zip file.


For sparse encoding models, make sure your output format is `{"output":<sparse_vector>}` so that ML Commons can post-process the sparse vector.

```
POST /_plugins/_ml/models/_register
```
{% include copy.html %}

For descriptions of Register API parameters, see [Register a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/). The `model_task_type` corresponds to the model type. For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. For question answering models, set this parameter to `QUESTION_ANSWERING`.

OpenSearch returns the task ID of the register operation:
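
A representative response (the `task_id` value here is illustrative):

```json
{
  "task_id": "hA8P44MBhyWuIwnfvDIH",
  "status": "CREATED"
}
```

You can poll the Tasks API with the returned ID to obtain the `model_id` once registration completes:

```json
GET /_plugins/_ml/tasks/<task_id>
```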

## Step 5: Use the model for search

To learn how to use the model for vector search, see [Using an ML model for neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#using-an-ml-model-for-neural-search).

## Question answering models

A question answering model extracts the answer to a question from a given context. ML Commons supports context in `text` format.

To register a question answering model, send a request in the following format. Specify the `function_name` as `QUESTION_ANSWERING`:

```json
POST /_plugins/_ml/models/_register
{
"name": "question_answering",
"version": "1.0.0",
"function_name": "QUESTION_ANSWERING",
"description": "test model",
"model_format": "TORCH_SCRIPT",
"model_group_id": "lN4AP40BKolAMNtR4KJ5",
"model_content_hash_value": "e837c8fc05fd58a6e2e8383b319257f9c3859dfb3edc89b26badfaf8a4405ff6",
"model_config": {
"model_type": "bert",
"framework_type": "huggingface_transformers"
},
"url": "https://github.com/opensearch-project/ml-commons/blob/main/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/question_answering/question_answering_pt.zip?raw=true"
}
```
{% include copy-curl.html %}

Then send a request to deploy the model:

```json
POST _plugins/_ml/models/<model_id>/_deploy
```
{% include copy-curl.html %}
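
Deployment is asynchronous. As a sketch, you can verify that the model has reached the `DEPLOYED` state by reading it back with the Get Model API:

```json
GET /_plugins/_ml/models/<model_id>
```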

To test a question answering model, send the following request. It requires a `question` and the relevant `context` from which the answer will be extracted:

```json
POST /_plugins/_ml/_predict/question_answering/<model_id>
{
"question": "Where do I live?"
"context": "My name is John. I live in New York"
}
```
{% include copy-curl.html %}

The response provides the answer based on the context:

```json
{
  "inference_results": [
    {
      "output": [
        {
          "result": "New York"
        }
      ]
    }
  ]
}
```