Merge branch 'main' into add_knn_sqfp16

Signed-off-by: kolchfa-aws <[email protected]>
opensearch-project · Mar 29, 2024 · 900478d
2 parents 9a6c4e2 + 2e41a57
commit 900478d
Showing 130 changed files with 4,377 additions and 476 deletions.
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -1 +1 @@
-*  @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99
+*  @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh
diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -26,6 +26,7 @@ Boolean
 Dev
 [Dd]iscoverability
 Distro
+[Dd]ownvote(s|d)?
 [Dd]uplicative
 [Ee]gress
 [Ee]num
@@ -124,6 +125,7 @@ stdout
 [Ss]ubvector
 [Ss]ubwords?
 [Ss]uperset
+[Ss]yslog
 tebibyte
 [Tt]emplated
 [Tt]okenization
@@ -140,6 +142,7 @@ tebibyte
 [Uu]nregister(s|ed|ing)?
 [Uu]pdatable
 [Uu]psert
+[Uu]pvote(s|d)?
 [Ww]alkthrough
 [Ww]ebpage
 xy
diff --git a/.github/workflows/vale.yml b/.github/workflows/vale.yml
@@ -20,4 +20,5 @@ jobs:
           reporter: github-pr-check
           filter_mode: added
           vale_flags: "--no-exit"
-          version: 2.28.0
+          version: 2.28.0
+        continue-on-error: true
diff --git a/MAINTAINERS.md b/MAINTAINERS.md
@@ -1,6 +1,6 @@
 ## Overview
 
-This document contains a list of maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) that explains what the role of maintainer means, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing, and becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md).
+This document lists the maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) for information about the role of a maintainer, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing or becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md).  
 
 ## Current Maintainers
 
@@ -9,8 +9,9 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
 | Heather Halter   | [hdhalter](https://github.com/hdhalter)         | Amazon      |
 | Fanit Kolchina   | [kolchfa-aws](https://github.com/kolchfa-aws)   | Amazon      |
 | Nate Archer      | [Naarcha-AWS](https://github.com/Naarcha-AWS)   | Amazon      |
-| Nate Bower       | [natebower](https://github.com/natebower)       | Amazon      |
+| Nathan Bower     | [natebower](https://github.com/natebower)       | Amazon      |
 | Melissa Vagi     | [vagimeli](https://github.com/vagimeli)         | Amazon      |
 | Miki Barahmand   | [AMoo-Miki](https://github.com/AMoo-Miki)       | Amazon      |
 | David Venable    | [dlvenable](https://github.com/dlvenable)       | Amazon      | 
 | Stephen Crawford | [scraw99](https://github.com/scrawfor99)        | Amazon      |
+| Eric Pugh        | [epugh](https://github.com/epugh)               | OpenSource Connections  | 
diff --git a/TERMS.md b/TERMS.md
@@ -236,6 +236,8 @@ Do not use *disable* to refer to users.
 
 Always hyphenated. Don’t use _double click_.
 
+**downvote**
+
 **dropdown list**
 
 **due to**
@@ -586,6 +588,10 @@ Use % in headlines, quotations, and tables or in technical copy.
 
 An agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM).
 
+**plaintext, plain text**
+
+Use *plaintext* only to refer to nonencrypted or decrypted text in content about encryption. Use *plain text* to refer to ASCII files.
+
 **please**
 
 Avoid using except in quoted text.
@@ -700,6 +706,8 @@ Never hyphenated. Use _startup_ as a noun (for example, “The following startup
 
 **Stochastic Gradient Descent (SGD)**
 
+**syslog**
+
 ## T
 
 **term frequency–inverse document frequency (TF–IDF)**
@@ -746,6 +754,8 @@ A storage tier that you can use to store and analyze your data with Elasticsearc
 
 Hyphenate as adjectives. Use instead of *top left* and *top right*, unless the field name uses *top*. For example, "The upper-right corner."
 
+**upvote**
+
 **US**
 
 No periods, as specified in the Chicago Manual of Style.

diff --git a/_analyzers/token-filters/index.md b/_analyzers/token-filters/index.md
diff --git a/_api-reference/document-apis/reindex.md b/_api-reference/document-apis/reindex.md
@@ -73,10 +73,11 @@ slice | Whether to manually or automatically slice the reindex operation so it e
 _source | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true.
 id | The ID to associate with manual slicing.
 max | Maximum number of slices.
-dest | Information about the destination index. Valid values are `index`, `version_type`, and `op_type`.
+dest | Information about the destination index. Valid values are `index`, `version_type`, `op_type`, and `pipeline`.
 index | Name of the destination index.
 version_type | The indexing operation's version type. Valid values are `internal`, `external`, `external_gt` (retrieve the document if the specified version number is greater than the document’s current version), and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document’s current version).
 op_type | Whether to copy over documents that are missing in the destination index. Valid values are `create` (ignore documents with the same ID from the source index) and `index` (copy everything from the source index).
+pipeline | Which ingest pipeline to utilize during the reindex.
 script | A script that OpenSearch uses to apply transformations to the data during the reindex operation.
 source | The actual script that OpenSearch runs.
 lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`.
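
The new `pipeline` field sits inside `dest`, alongside `index`. The following is a minimal sketch of building such a request body in Python. The index names and the pipeline name (`my-pipeline`) are placeholders chosen for illustration, and the helper `build_reindex_body` is hypothetical, not part of any OpenSearch client.

```python
import json


def build_reindex_body(source_index, dest_index, pipeline=None):
    """Build a _reindex request body; `pipeline` is only added when given."""
    dest = {"index": dest_index}
    if pipeline is not None:
        # Route reindexed documents through the named ingest pipeline.
        dest["pipeline"] = pipeline
    return {"source": {"index": source_index}, "dest": dest}


# Placeholder names, for illustration only.
body = build_reindex_body("source-index", "destination-index", pipeline="my-pipeline")
print("POST /_reindex")
print(json.dumps(body, indent=2))
```

Leaving `pipeline` unset keeps the body identical to a reindex request without a pipeline.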

diff --git a/_api-reference/index-apis/force-merge.md b/_api-reference/index-apis/force-merge.md
@@ -72,6 +72,7 @@ The following table lists the available query parameters. All query parameters a
 | `ignore_unavailable` | Boolean | If `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the force merge operation encounters missing or closed indexes. Default is `false`. |
 | `max_num_segments` | Integer | The number of larger segments into which smaller segments are merged. Set this parameter to `1` to merge all segments into one segment. The default behavior is to perform the merge as necessary. |
 | `only_expunge_deletes` | Boolean | If `true`, the merge operation only expunges segments containing a certain percentage of deleted documents. The percentage is 10% by default and is configurable in the `index.merge.policy.expunge_deletes_allowed` setting. Prior to OpenSearch 2.12, `only_expunge_deletes` ignored the `index.merge.policy.max_merged_segment` setting. Starting with OpenSearch 2.12, using `only_expunge_deletes` does not produce segments larger than `index.merge.policy.max_merged_segment` (by default, 5 GB). For more information, see [Deleted documents](#deleted-documents). Default is `false`. |
+| `primary_only` | Boolean | If set to `true`, then the merge operation is performed only on the primary shards of an index. This can be useful when you want to take a snapshot of the index after the merge is complete. Snapshots only copy segments from the primary shards. Merging the primary shards can reduce resource consumption. Default is `false`. |
 
 #### Example request: Force merge a specific index
 
@@ -101,6 +102,13 @@ POST /.testindex-logs/_forcemerge?max_num_segments=1
 ```
 {% include copy-curl.html %}
 
+#### Example request: Force merge primary shards
+
+```json
+POST /.testindex-logs/_forcemerge?primary_only=true
+```
+{% include copy-curl.html %}
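
As a rough sketch, the request line above can be assembled programmatically. The helper name `force_merge_request` is hypothetical; only the `_forcemerge` path and the `primary_only` and `max_num_segments` query parameters come from the documentation in this diff.

```python
from typing import Optional
from urllib.parse import urlencode


def force_merge_request(index: str,
                        primary_only: bool = False,
                        max_num_segments: Optional[int] = None) -> str:
    """Return the method and path for a force merge request on `index`."""
    params = {}
    if primary_only:
        # Merge only primary shards, for example before taking a snapshot.
        params["primary_only"] = "true"
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    path = f"/{index}/_forcemerge"
    query = urlencode(params)
    return f"POST {path}?{query}" if query else f"POST {path}"


print(force_merge_request(".testindex-logs", primary_only=True))
```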
+
 #### Example response
 
 ```json

diff --git a/_api-reference/nodes-apis/nodes-stats.md b/_api-reference/nodes-apis/nodes-stats.md
@@ -731,7 +731,10 @@ Select the arrow to view the example response.
         "nxLWtMdXQmWA-ZBVWU8nwA": {
           "timestamp": 1698401391000,
           "cpu_utilization_percent": "0.1",
-          "memory_utilization_percent": "3.9"
+          "memory_utilization_percent": "3.9",
+          "io_usage_stats": {
+            "max_io_utilization_percent": "99.6"
+          }
         }
       },
       "admission_control": {
@@ -742,6 +745,14 @@ Select the arrow to view the example response.
               "indexing": 1
             }
           }
+        },
+        "global_io_usage": {
+          "transport": {
+            "rejection_count": {
+              "search": 3,
+              "indexing": 1
+            }
+          }
         }
       }
     }
@@ -1252,16 +1263,20 @@ The `resource_usage_stats` object contains the resource usage statistics. Each e
 Field | Field type | Description
 :--- |:-----------| :---
 timestamp | Integer    | The last refresh time for the resource usage statistics, in milliseconds since the epoch.
-cpu_utilization_percent | Float      | Statistics for the average CPU usage of OpenSearch process within the time period configured in the `node.resource.tracker.global_cpu_usage.window_duration` setting.
+cpu_utilization_percent | Float      | Statistics for the average CPU usage of any OpenSearch processes within the time period configured in the `node.resource.tracker.global_cpu_usage.window_duration` setting.
 memory_utilization_percent | Float      | The node JVM memory usage statistics within the time period configured in the `node.resource.tracker.global_jvmmp.window_duration` setting.
+max_io_utilization_percent  | Float     |  (Linux only) Statistics for the average IO usage of any OpenSearch processes within the time period configured in the `node.resource.tracker.global_io_usage.window_duration` setting.
 
 ### `admission_control`
 
 The `admission_control` object contains the rejection count of search and indexing requests based on resource consumption and has the following properties.
+
 Field | Field type | Description
 :--- | :--- | :---
-admission_control.global_cpu_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node CPU usage limit was breached. In this case, additional search requests are rejected until the system recovers.
-admission_control.global_cpu_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node CPU usage limit was breached. In this case, additional indexing requests are rejected until the system recovers.
+admission_control.global_cpu_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node CPU usage limit was met. In this case, additional search requests are rejected until the system recovers. The CPU usage limit is configured in the `admission_control.search.cpu_usage.limit` setting.
+admission_control.global_cpu_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node CPU usage limit was met. Any additional indexing requests are rejected until the system recovers. The CPU usage limit is configured in the `admission_control.indexing.cpu_usage.limit` setting.
+admission_control.global_io_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node IO usage limit was met. Any additional search requests are rejected until the system recovers. The IO usage limit is configured in the `admission_control.search.io_usage.limit` setting (Linux only).
+admission_control.global_io_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node IO usage limit was met. Any additional indexing requests are rejected until the system recovers. The IO usage limit is configured in the `admission_control.indexing.io_usage.limit` setting (Linux only).
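
To illustrate how the CPU and IO rejection counters relate, the following sketch totals the `rejection_count` values across every tracker in an `admission_control` object. The sample values are made up for illustration, and `total_rejections` is a hypothetical helper, not an OpenSearch API.

```python
def total_rejections(admission_control):
    """Sum search and indexing rejections across all usage trackers."""
    totals = {"search": 0, "indexing": 0}
    for tracker in admission_control.values():
        # Each tracker (for example, global_cpu_usage) nests its counts
        # under transport.rejection_count.
        counts = tracker.get("transport", {}).get("rejection_count", {})
        for kind in totals:
            totals[kind] += counts.get(kind, 0)
    return totals


# Illustrative values shaped like the example response in this diff.
sample = {
    "global_cpu_usage": {"transport": {"rejection_count": {"search": 3, "indexing": 1}}},
    "global_io_usage": {"transport": {"rejection_count": {"search": 3, "indexing": 1}}},
}
print(total_rejections(sample))
```

On non-Linux nodes, where the IO counters may be absent, the helper simply skips the missing tracker.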
 
 ## Required permissions
 

diff --git a/_api-reference/snapshots/get-snapshot-status.md b/_api-reference/snapshots/get-snapshot-status.md
@@ -29,9 +29,9 @@ Three request variants provide flexibility:
 
 * `GET _snapshot/_status` returns the status of all currently running snapshots in all repositories.
 
-* `GET _snapshot/<repository>/_status` returns the status of only currently running snapshots in the specified repository. This is the preferred variant.
+* `GET _snapshot/<repository>/_status` returns all currently running snapshots in the specified repository. This is the preferred variant.
 
-* `GET _snapshot/<repository>/<snapshot>/_status` returns the status of all snapshots in the specified repository whether they are running or not.
+* `GET _snapshot/<repository>/<snapshot>/_status` returns detailed status information for a specific snapshot in the specified repository, regardless of whether it's currently running or not. 
 
 Using the API to return state for snapshots other than those currently running can be very costly in terms of (1) machine resources and (2) processing time if running in the cloud. For each snapshot, each request causes file reads from all of the snapshot's shards.
 {: .warning}
@@ -420,4 +420,4 @@ All property values are Integers.
 :--- | :--- | :--- |
 | shards_stats | Object | See [Shard stats](#shard-stats). |
 | stats | Object | See [Snapshot file stats](#snapshot-file-stats). |
-| shards | list of Objects | List of objects containing information about the shards that include the snapshot. Properies of the shards are listed below in bold text. <br /><br /> **stage**: Current state of shards in the snapshot. Shard states are: <br /><br /> * DONE: Number of shards in the snapshot that were successfully stored in the repository. <br /><br /> * FAILURE: Number of shards in the snapshot that were not successfully stored in the repository. <br /><br /> * FINALIZE: Number of shards in the snapshot that are in the finalizing stage of being stored in the repository. <br /><br />* INIT: Number of shards in the snapshot that are in the initializing stage of being stored in the repository.<br /><br />* STARTED:  Number of shards in the snapshot that are in the started stage of being stored in the repository.<br /><br /> **stats**: See [Snapshot file stats](#snapshot-file-stats). <br /><br /> **total**: Total number and size of files referenced by the snapshot. <br /><br /> **start_time_in_millis**: Time (in milliseconds) when snapshot creation began. <br /><br /> **time_in_millis**: Total time (in milliseconds) that the snapshot took to complete.  |
+| shards | list of Objects | List of objects containing information about the shards that include the snapshot. OpenSearch returns the following properties about the shards. <br /><br /> **stage**: Current state of shards in the snapshot. Shard states are: <br /><br /> * DONE: Number of shards in the snapshot that were successfully stored in the repository. <br /><br /> * FAILURE: Number of shards in the snapshot that were not successfully stored in the repository. <br /><br /> * FINALIZE: Number of shards in the snapshot that are in the finalizing stage of being stored in the repository. <br /><br />* INIT: Number of shards in the snapshot that are in the initializing stage of being stored in the repository.<br /><br />* STARTED:  Number of shards in the snapshot that are in the started stage of being stored in the repository.<br /><br /> **stats**: See [Snapshot file stats](#snapshot-file-stats). <br /><br /> **total**: Total number and size of files referenced by the snapshot. <br /><br /> **start_time_in_millis**: Time (in milliseconds) when snapshot creation began. <br /><br /> **time_in_millis**: Total time (in milliseconds) that the snapshot took to complete.  |
diff --git a/_automating-configurations/api/create-workflow.md b/_automating-configurations/api/create-workflow.md
@@ -7,9 +7,6 @@ nav_order: 10
 
 # Create or update a workflow
 
-This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).    
-{: .warning}
-
 Creating a workflow adds the content of a workflow template to the flow framework system index. You can provide workflows in JSON format (by specifying `Content-Type: application/json`) or YAML format (by specifying `Content-Type: application/yaml`). By default, the workflow is validated to help identify invalid configurations, including:
 
 * Workflow steps requiring an OpenSearch plugin that is not installed.
@@ -19,6 +16,8 @@ Creating a workflow adds the content of a workflow template to the flow framewor
 
 To obtain the validation template for workflow steps, call the [Get Workflow Steps API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-steps/).
 
+You can include placeholder expressions in the value of workflow step fields. For example, you can specify a credential field in a template as `openAI_key: '${{ openai_key }}'`. The expression will be substituted with the user-provided value during provisioning, using the format `${{ <value> }}`. You can pass the actual key as a parameter using the [Provision Workflow API]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/) or using this API with the `provision` parameter set to `true`.
+
 Once a workflow is created, provide its `workflow_id` to other APIs.
 
 The `POST` method creates a new workflow. The `PUT` method updates an existing workflow. 
@@ -59,12 +58,13 @@ POST /_plugins/_flow_framework/workflow?validation=none
 ```
 {% include copy-curl.html %}
 
-The following table lists the available query parameters. All query parameters are optional.
+The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`.
 
 | Parameter | Data type | Description |
 | :--- | :--- | :--- |
 | `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. |
 | `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. |
+| User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). |
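
The `${{ <value> }}` substitution described above can be pictured with a short sketch. This is an illustration of the idea only, not the Flow Framework implementation: the regular expression, the `substitute` helper, and the choice to leave unmatched placeholders untouched are all assumptions.

```python
import re

# Matches expressions such as ${{ openai_key }}, with optional inner spaces.
PLACEHOLDER = re.compile(r"\$\{\{\s*(\w+)\s*\}\}")


def substitute(template: str, params: dict) -> str:
    """Replace each ${{ name }} with params[name]; leave unknown names as-is."""
    def repl(match):
        key = match.group(1)
        return str(params.get(key, match.group(0)))
    return PLACEHOLDER.sub(repl, template)


# The key value here is a placeholder, not a real credential.
print(substitute("openAI_key: '${{ openai_key }}'", {"openai_key": "example-key"}))
```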
 
 ## Request fields
 

diff --git a/_automating-configurations/api/delete-workflow.md b/_automating-configurations/api/delete-workflow.md
@@ -7,9 +7,6 @@ nav_order: 80
 
 # Delete a workflow
 
-This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).    
-{: .warning}
-
 When you no longer need a workflow template, you can delete it by calling the Delete Workflow API. 
 
 Note that deleting a workflow only deletes the stored template but does not deprovision its resources.  

diff --git a/_automating-configurations/api/deprovision-workflow.md b/_automating-configurations/api/deprovision-workflow.md
@@ -7,9 +7,6 @@ nav_order: 70
 
 # Deprovision a workflow
 
-This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).    
-{: .warning}
-
 When you no longer need a workflow, you can deprovision its resources. Most workflow steps that create a resource have corresponding workflow steps to reverse that action. To retrieve all resources currently created for a workflow, call the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). When you call the Deprovision Workflow API, resources included in the `resources_created` field of the Get Workflow Status API response will be removed using a workflow step corresponding to the one that provisioned them.
 
 The workflow executes the provisioning workflow steps in reverse order. If failures occur because of resource dependencies, such as preventing deletion of a registered model if it is still deployed, the workflow attempts retries.

diff --git a/_automating-configurations/api/get-workflow-status.md b/_automating-configurations/api/get-workflow-status.md
@@ -7,9 +7,6 @@ nav_order: 40
 
 # Get a workflow status
 
-This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).    
-{: .warning}
-
 [Provisioning a workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/) may take a significant amount of time, particularly when the action is associated with OpenSearch indexing operations. The Get Workflow Status API permits monitoring of the provisioning deployment status until it is complete.
 
 ## Path and HTTP methods