Commit

Merge branch 'main' into add_knn_sqfp16
Signed-off-by: kolchfa-aws <[email protected]>
kolchfa-aws authored Mar 29, 2024
2 parents 9a6c4e2 + 2e41a57 commit 900478d
Showing 130 changed files with 4,377 additions and 476 deletions.
2 changes: 1 addition & 1 deletion .github/CODEOWNERS
@@ -1 +1 @@
* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99
* @hdhalter @kolchfa-aws @Naarcha-AWS @vagimeli @AMoo-Miki @natebower @dlvenable @scrawfor99 @epugh
3 changes: 3 additions & 0 deletions .github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -26,6 +26,7 @@ Boolean
Dev
[Dd]iscoverability
Distro
[Dd]ownvote(s|d)?
[Dd]uplicative
[Ee]gress
[Ee]num
@@ -124,6 +125,7 @@ stdout
[Ss]ubvector
[Ss]ubwords?
[Ss]uperset
[Ss]yslog
tebibyte
[Tt]emplated
[Tt]okenization
@@ -140,6 +142,7 @@ tebibyte
[Uu]nregister(s|ed|ing)?
[Uu]pdatable
[Uu]psert
[Uu]pvote(s|d)?
[Ww]alkthrough
[Ww]ebpage
xy
3 changes: 2 additions & 1 deletion .github/workflows/vale.yml
@@ -20,4 +20,5 @@ jobs:
reporter: github-pr-check
filter_mode: added
vale_flags: "--no-exit"
version: 2.28.0
version: 2.28.0
continue-on-error: true
5 changes: 3 additions & 2 deletions MAINTAINERS.md
@@ -1,6 +1,6 @@
## Overview

This document contains a list of maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) that explains what the role of maintainer means, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing, and becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md).
This document lists the maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) for information about the role of a maintainer, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing or becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md).

## Current Maintainers

@@ -9,8 +9,9 @@ This document contains a list of maintainers in this repo. See [opensearch-proje
| Heather Halter | [hdhalter](https://github.com/hdhalter) | Amazon |
| Fanit Kolchina | [kolchfa-aws](https://github.com/kolchfa-aws) | Amazon |
| Nate Archer | [Naarcha-AWS](https://github.com/Naarcha-AWS) | Amazon |
| Nate Bower | [natebower](https://github.com/natebower) | Amazon |
| Nathan Bower | [natebower](https://github.com/natebower) | Amazon |
| Melissa Vagi | [vagimeli](https://github.com/vagimeli) | Amazon |
| Miki Barahmand | [AMoo-Miki](https://github.com/AMoo-Miki) | Amazon |
| David Venable | [dlvenable](https://github.com/dlvenable) | Amazon |
| Stephen Crawford | [scraw99](https://github.com/scrawfor99) | Amazon |
| Eric Pugh | [epugh](https://github.com/epugh) | OpenSource Connections |
10 changes: 10 additions & 0 deletions TERMS.md
@@ -236,6 +236,8 @@ Do not use *disable* to refer to users.

Always hyphenated. Don’t use _double click_.

**downvote**

**dropdown list**

**due to**
@@ -586,6 +588,10 @@ Use % in headlines, quotations, and tables or in technical copy.

An agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM).

**plaintext, plain text**

Use *plaintext* only to refer to nonencrypted or decrypted text in content about encryption. Use *plain text* to refer to ASCII files.

**please**

Avoid using except in quoted text.
@@ -700,6 +706,8 @@ Never hyphenated. Use _startup_ as a noun (for example, “The following startup

**Stochastic Gradient Descent (SGD)**

**syslog**

## T

**term frequency–inverse document frequency (TF–IDF)**
@@ -746,6 +754,8 @@ A storage tier that you can use to store and analyze your data with Elasticsearc

Hyphenate as adjectives. Use instead of *top left* and *top right*, unless the field name uses *top*. For example, "The upper-right corner."

**upvote**

**US**

No periods, as specified in the Chicago Manual of Style.
71 changes: 36 additions & 35 deletions _analyzers/token-filters/index.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion _api-reference/document-apis/reindex.md
@@ -73,10 +73,11 @@ slice | Whether to manually or automatically slice the reindex operation so it e
_source | Whether to reindex source fields. Specify a list of fields to reindex or true to reindex all fields. Default is true.
id | The ID to associate with manual slicing.
max | Maximum number of slices.
dest | Information about the destination index. Valid values are `index`, `version_type`, and `op_type`.
dest | Information about the destination index. Valid values are `index`, `version_type`, `op_type`, and `pipeline`.
index | Name of the destination index.
version_type | The indexing operation's version type. Valid values are `internal`, `external`, `external_gt` (retrieve the document if the specified version number is greater than the document’s current version), and `external_gte` (retrieve the document if the specified version number is greater than or equal to the document’s current version).
op_type | Whether to copy over documents that are missing in the destination index. Valid values are `create` (ignore documents with the same ID from the source index) and `index` (copy everything from the source index).
pipeline | Which ingest pipeline to utilize during the reindex.
script | A script that OpenSearch uses to apply transformations to the data during the reindex operation.
source | The actual script that OpenSearch runs.
lang | The scripting language. Valid options are `painless`, `expression`, `mustache`, and `java`.
8 changes: 8 additions & 0 deletions _api-reference/index-apis/force-merge.md
@@ -72,6 +72,7 @@ The following table lists the available query parameters. All query parameters a
| `ignore_unavailable` | Boolean | If `true`, OpenSearch ignores missing or closed indexes. If `false`, OpenSearch returns an error if the force merge operation encounters missing or closed indexes. Default is `false`. |
| `max_num_segments` | Integer | The number of larger segments into which smaller segments are merged. Set this parameter to `1` to merge all segments into one segment. The default behavior is to perform the merge as necessary. |
| `only_expunge_deletes` | Boolean | If `true`, the merge operation only expunges segments containing a certain percentage of deleted documents. The percentage is 10% by default and is configurable in the `index.merge.policy.expunge_deletes_allowed` setting. Prior to OpenSearch 2.12, `only_expunge_deletes` ignored the `index.merge.policy.max_merged_segment` setting. Starting with OpenSearch 2.12, using `only_expunge_deletes` does not produce segments larger than `index.merge.policy.max_merged_segment` (by default, 5 GB). For more information, see [Deleted documents](#deleted-documents). Default is `false`. |
| `primary_only` | Boolean | If set to `true`, then the merge operation is performed only on the primary shards of an index. This can be useful when you want to take a snapshot of the index after the merge is complete. Snapshots only copy segments from the primary shards. Merging the primary shards can reduce resource consumption. Default is `false`. |

#### Example request: Force merge a specific index

@@ -101,6 +102,13 @@ POST /.testindex-logs/_forcemerge?max_num_segments=1
```
{% include copy-curl.html %}

#### Example request: Force merge primary shards

```json
POST /.testindex-logs/_forcemerge?primary_only=true
```
{% include copy-curl.html %}
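
The request path above can also be assembled programmatically. The following is a minimal Python sketch, not part of the documented API: the helper name `force_merge_path` is hypothetical, and it only builds the path and query string without sending a request.

```python
from urllib.parse import urlencode


def force_merge_path(index: str, **params) -> str:
    """Build the path and query string for a force merge request.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    # Render Python booleans as lowercase true/false, matching the examples.
    query = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    path = f"/{index}/_forcemerge"
    return f"{path}?{urlencode(query)}" if query else path


print(force_merge_path(".testindex-logs", primary_only=True))
```

Passing `max_num_segments=1` instead of `primary_only=True` would produce the path used in the preceding example request.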

#### Example response

```json
23 changes: 19 additions & 4 deletions _api-reference/nodes-apis/nodes-stats.md
@@ -731,7 +731,10 @@ Select the arrow to view the example response.
"nxLWtMdXQmWA-ZBVWU8nwA": {
"timestamp": 1698401391000,
"cpu_utilization_percent": "0.1",
"memory_utilization_percent": "3.9"
"memory_utilization_percent": "3.9",
"io_usage_stats": {
"max_io_utilization_percent": "99.6"
}
}
},
"admission_control": {
@@ -742,6 +745,14 @@ Select the arrow to view the example response.
"indexing": 1
}
}
},
"global_io_usage": {
"transport": {
"rejection_count": {
"search": 3,
"indexing": 1
}
}
}
}
}
@@ -1252,16 +1263,20 @@ The `resource_usage_stats` object contains the resource usage statistics. Each e
Field | Field type | Description
:--- |:-----------| :---
timestamp | Integer | The last refresh time for the resource usage statistics, in milliseconds since the epoch.
cpu_utilization_percent | Float | Statistics for the average CPU usage of OpenSearch process within the time period configured in the `node.resource.tracker.global_cpu_usage.window_duration` setting.
cpu_utilization_percent | Float | Statistics for the average CPU usage of any OpenSearch processes within the time period configured in the `node.resource.tracker.global_cpu_usage.window_duration` setting.
memory_utilization_percent | Float | The node JVM memory usage statistics within the time period configured in the `node.resource.tracker.global_jvmmp.window_duration` setting.
max_io_utilization_percent | Float | (Linux only) Statistics for the average IO usage of any OpenSearch processes within the time period configured in the `node.resource.tracker.global_io_usage.window_duration` setting.

### `admission_control`

The `admission_control` object contains the rejection count of search and indexing requests based on resource consumption and has the following properties.

Field | Field type | Description
:--- | :--- | :---
admission_control.global_cpu_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node CPU usage limit was breached. In this case, additional search requests are rejected until the system recovers.
admission_control.global_cpu_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node CPU usage limit was breached. In this case, additional indexing requests are rejected until the system recovers.
admission_control.global_cpu_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node CPU usage limit was met. In this case, additional search requests are rejected until the system recovers. The CPU usage limit is configured in the `admission_control.search.cpu_usage.limit` setting.
admission_control.global_cpu_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node CPU usage limit was met. Any additional indexing requests are rejected until the system recovers. The CPU usage limit is configured in the `admission_control.indexing.cpu_usage.limit` setting.
admission_control.global_io_usage.transport.rejection_count.search | Integer | The total number of search rejections in the transport layer when the node IO usage limit was met. Any additional search requests are rejected until the system recovers. The IO usage limit is configured in the `admission_control.search.io_usage.limit` setting (Linux only).
admission_control.global_io_usage.transport.rejection_count.indexing | Integer | The total number of indexing rejections in the transport layer when the node IO usage limit was met. Any additional indexing requests are rejected until the system recovers. The IO usage limit is configured in the `admission_control.indexing.io_usage.limit` setting (Linux only).
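
As an illustration of how these fields might be consumed, the following minimal Python sketch totals the rejection counts across the CPU and IO trackers. It assumes the `admission_control` object has already been extracted from a nodes stats response and parsed into a dictionary; the function name `rejection_totals` is hypothetical, and the sample values are copied from the example response.

```python
def rejection_totals(admission_control: dict) -> dict:
    """Sum search and indexing rejections across all resource trackers."""
    totals = {"search": 0, "indexing": 0}
    for tracker in admission_control.values():
        # Each tracker (for example, global_cpu_usage) nests its counts under
        # transport.rejection_count; missing keys are treated as zero.
        counts = tracker.get("transport", {}).get("rejection_count", {})
        for kind in totals:
            totals[kind] += counts.get(kind, 0)
    return totals


sample = {
    "global_cpu_usage": {"transport": {"rejection_count": {"search": 3, "indexing": 1}}},
    "global_io_usage": {"transport": {"rejection_count": {"search": 3, "indexing": 1}}},
}
print(rejection_totals(sample))  # {'search': 6, 'indexing': 2}
```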

## Required permissions

6 changes: 3 additions & 3 deletions _api-reference/snapshots/get-snapshot-status.md
@@ -29,9 +29,9 @@ Three request variants provide flexibility:

* `GET _snapshot/_status` returns the status of all currently running snapshots in all repositories.

* `GET _snapshot/<repository>/_status` returns the status of only currently running snapshots in the specified repository. This is the preferred variant.
* `GET _snapshot/<repository>/_status` returns all currently running snapshots in the specified repository. This is the preferred variant.

* `GET _snapshot/<repository>/<snapshot>/_status` returns the status of all snapshots in the specified repository whether they are running or not.
* `GET _snapshot/<repository>/<snapshot>/_status` returns detailed status information for a specific snapshot in the specified repository, regardless of whether it's currently running or not.

Using the API to return the state of snapshots other than those currently running can be very costly in terms of (1) machine resources and (2) processing time if running in the cloud. For each snapshot, each request causes file reads from all of the snapshot's shards.
{: .warning}
@@ -420,4 +420,4 @@ All property values are Integers.
:--- | :--- | :--- |
| shards_stats | Object | See [Shard stats](#shard-stats). |
| stats | Object | See [Snapshot file stats](#snapshot-file-stats). |
| shards | list of Objects | List of objects containing information about the shards that include the snapshot. Properies of the shards are listed below in bold text. <br /><br /> **stage**: Current state of shards in the snapshot. Shard states are: <br /><br /> * DONE: Number of shards in the snapshot that were successfully stored in the repository. <br /><br /> * FAILURE: Number of shards in the snapshot that were not successfully stored in the repository. <br /><br /> * FINALIZE: Number of shards in the snapshot that are in the finalizing stage of being stored in the repository. <br /><br />* INIT: Number of shards in the snapshot that are in the initializing stage of being stored in the repository.<br /><br />* STARTED: Number of shards in the snapshot that are in the started stage of being stored in the repository.<br /><br /> **stats**: See [Snapshot file stats](#snapshot-file-stats). <br /><br /> **total**: Total number and size of files referenced by the snapshot. <br /><br /> **start_time_in_millis**: Time (in milliseconds) when snapshot creation began. <br /><br /> **time_in_millis**: Total time (in milliseconds) that the snapshot took to complete. |
| shards | list of Objects | List of objects containing information about the shards that include the snapshot. OpenSearch returns the following properties about the shards. <br /><br /> **stage**: Current state of shards in the snapshot. Shard states are: <br /><br /> * DONE: Number of shards in the snapshot that were successfully stored in the repository. <br /><br /> * FAILURE: Number of shards in the snapshot that were not successfully stored in the repository. <br /><br /> * FINALIZE: Number of shards in the snapshot that are in the finalizing stage of being stored in the repository. <br /><br />* INIT: Number of shards in the snapshot that are in the initializing stage of being stored in the repository.<br /><br />* STARTED: Number of shards in the snapshot that are in the started stage of being stored in the repository.<br /><br /> **stats**: See [Snapshot file stats](#snapshot-file-stats). <br /><br /> **total**: Total number and size of files referenced by the snapshot. <br /><br /> **start_time_in_millis**: Time (in milliseconds) when snapshot creation began. <br /><br /> **time_in_millis**: Total time (in milliseconds) that the snapshot took to complete. |
8 changes: 4 additions & 4 deletions _automating-configurations/api/create-workflow.md
@@ -7,9 +7,6 @@ nav_order: 10

# Create or update a workflow

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).
{: .warning}

Creating a workflow adds the content of a workflow template to the flow framework system index. You can provide workflows in JSON format (by specifying `Content-Type: application/json`) or YAML format (by specifying `Content-Type: application/yaml`). By default, the workflow is validated to help identify invalid configurations, including:

* Workflow steps requiring an OpenSearch plugin that is not installed.
@@ -19,6 +16,8 @@ Creating a workflow adds the content of a workflow template to the flow framewor

To obtain the validation template for workflow steps, call the [Get Workflow Steps API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-steps/).

You can include placeholder expressions in the value of workflow step fields. For example, you can specify a credential field in a template as `openAI_key: '${{ openai_key }}'`. The expression will be substituted with the user-provided value during provisioning, using the format `${{ <value> }}`. You can pass the actual key as a parameter using the [Provision Workflow API]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/) or using this API with the `provision` parameter set to `true`.
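
To make the substitution behavior concrete, the following minimal Python sketch replaces `${{ <value> }}` expressions with user-provided values. It is only an illustration of the idea and not the plugin's implementation: the regular expression, the `substitute` function name, the rule of leaving unknown names untouched, and the sample key are all assumptions.

```python
import re

# Matches expressions such as ${{ openai_key }}. The exact grammar accepted
# by the plugin is an assumption here.
PLACEHOLDER = re.compile(r"\$\{\{\s*(\w+)\s*\}\}")


def substitute(template_text: str, params: dict) -> str:
    """Replace each ${{ name }} with params[name]; leave unknown names as-is."""

    def repl(match: re.Match) -> str:
        name = match.group(1)
        return params.get(name, match.group(0))

    return PLACEHOLDER.sub(repl, template_text)


step_field = "openAI_key: '${{ openai_key }}'"
print(substitute(step_field, {"openai_key": "sk-example"}))
```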

Once a workflow is created, provide its `workflow_id` to other APIs.

The `POST` method creates a new workflow. The `PUT` method updates an existing workflow.
@@ -59,12 +58,13 @@ POST /_plugins/_flow_framework/workflow?validation=none
```
{% include copy-curl.html %}

The following table lists the available query parameters. All query parameters are optional.
The following table lists the available query parameters. All query parameters are optional. User-provided parameters are only allowed if the `provision` parameter is set to `true`.

| Parameter | Data type | Description |
| :--- | :--- | :--- |
| `provision` | Boolean | Whether to provision the workflow as part of the request. Default is `false`. |
| `validation` | String | Whether to validate the workflow. Valid values are `all` (validate the template) and `none` (do not validate the template). Default is `all`. |
| User-provided substitution expressions | String | Parameters matching substitution expressions in the template. Only allowed if `provision` is set to `true`. Optional. If `provision` is set to `false`, you can pass these parameters in the [Provision Workflow API query parameters]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/#query-parameters). |

## Request fields

3 changes: 0 additions & 3 deletions _automating-configurations/api/delete-workflow.md
@@ -7,9 +7,6 @@ nav_order: 80

# Delete a workflow

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).
{: .warning}

When you no longer need a workflow template, you can delete it by calling the Delete Workflow API.

Note that deleting a workflow only deletes the stored template but does not deprovision its resources.
3 changes: 0 additions & 3 deletions _automating-configurations/api/deprovision-workflow.md
@@ -7,9 +7,6 @@ nav_order: 70

# Deprovision a workflow

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).
{: .warning}

When you no longer need a workflow, you can deprovision its resources. Most workflow steps that create a resource have corresponding workflow steps to reverse that action. To retrieve all resources currently created for a workflow, call the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). When you call the Deprovision Workflow API, resources included in the `resources_created` field of the Get Workflow Status API response will be removed using a workflow step corresponding to the one that provisioned them.

The workflow executes the provisioning workflow steps in reverse order. If failures occur because of resource dependencies, such as preventing deletion of a registered model if it is still deployed, the workflow attempts retries.
3 changes: 0 additions & 3 deletions _automating-configurations/api/get-workflow-status.md
@@ -7,9 +7,6 @@ nav_order: 40

# Get a workflow status

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/flow-framework/issues/475).
{: .warning}

[Provisioning a workflow]({{site.url}}{{site.baseurl}}/automating-configurations/api/provision-workflow/) may take a significant amount of time, particularly when the action is associated with OpenSearch indexing operations. The Get Workflow Status API permits monitoring of the provisioning deployment status until it is complete.

## Path and HTTP methods