Merge branch 'main' into agent_tools
kolchfa-aws authored Jun 13, 2024
2 parents 6e5120d + 76b4f78 commit 28e7127
Showing 4 changed files with 56 additions and 2 deletions.
3 changes: 3 additions & 0 deletions _automating-configurations/workflow-templates.md
@@ -138,5 +138,8 @@ The following table lists the supported workflow templates. To use a workflow te
| `multimodal_search_with_bedrock_titan` | Deploys an Amazon Bedrock multimodal model and configures an ingest pipeline with a `text_image_embedding` processor and a k-NN index for [multimodal search]({{site.url}}{{site.baseurl}}/search-plugins/multimodal-search/). You must provide your AWS credentials. | `create_connector.credential.access_key`, `create_connector.credential.secret_key`, `create_connector.credential.session_token` |[Defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/multimodal-search-bedrock-titan-defaults.json) |
| `hybrid_search` | Configures [hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/): <br> - Creates an ingest pipeline, a k-NN index, and a search pipeline with a `normalization_processor`. You must provide the model ID of the text embedding model to be used. | `create_ingest_pipeline.model_id` |[Defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/hybrid-search-defaults.json) |
| `conversational_search_with_llm_deploy` | Deploys a large language model (LLM) (by default, Cohere Chat) and configures a search pipeline with a `retrieval_augmented_generation` processor for [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | `create_connector.credential.key` |[Defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/conversational-search-defaults.json) |
| `semantic_search_with_reindex` | Configures [semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) with a newly deployed Cohere embedding model and reindexes a source index into the newly configured k-NN index. You must provide the API key for the Cohere model along with the source index to be reindexed. | `create_connector.credential.key`, `reindex.source_index`|[Defaults](https://github.com/opensearch-project/flow-framework/blob/main/src/main/resources/defaults/semantic-search-with-reindex-defaults.json) |
| `semantic_search_with_local_model` | Configures [semantic search]({{site.url}}{{site.baseurl}}/search-plugins/semantic-search/) and deploys a pretrained model (`msmarco-distilbert-base-tas-b`). Adds a [`query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) search processor that sets a default model ID for neural queries and creates a linked k-NN index called `my-nlp-index`. | None | [Defaults](https://github.com/opensearch-project/flow-framework/blob/main/src/main/resources/defaults/semantic-search-with-local-model-defaults.json) |
| `hybrid_search_with_local_model` | Configures [hybrid search]({{site.url}}{{site.baseurl}}/search-plugins/hybrid-search/) and deploys a pretrained model (`msmarco-distilbert-base-tas-b`). Creates an ingest pipeline, a k-NN index, and a search pipeline with a `normalization_processor`. | None | [Defaults](https://github.com/opensearch-project/flow-framework/blob/main/src/main/resources/defaults/hybrid-search-with-local-model-defaults.json) |
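
Each template is provisioned through the same pattern: pass the template name in the `use_case` query parameter of the workflow API and supply the template's required fields in the request body. The following is a minimal sketch for the `semantic_search_with_reindex` template; the API key placeholder and the `my-source-index` name are illustrative, not part of this change:

```json
POST /_plugins/_flow_framework/workflow?use_case=semantic_search_with_reindex&provision=true
{
  "create_connector.credential.key": "<your Cohere API key>",
  "reindex.source_index": "my-source-index"
}
```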


@@ -67,6 +67,7 @@ Use the following configuration options with the `obfuscate` processor.
| `source` | Yes | The source field to obfuscate. |
| `target` | No | The new field in which to store the obfuscated value. This leaves the original source field unchanged. When no `target` is provided, the source field is updated with the obfuscated value. |
| `patterns` | No | A list of regex patterns that allow you to obfuscate specific parts of a field. Only the parts that match a regex pattern are obfuscated. When not provided, the processor obfuscates the whole field. |
| `single_word_only` | No | When set to `true`, a word boundary `\b` is added to the pattern so that obfuscation is applied only to standalone words in the input text. Default is `false`, meaning the obfuscation patterns are applied to all occurrences. Available in Data Prepper 2.8 and later. |
| `obfuscate_when` | No | Specifies the condition under which the `obfuscate` processor performs matching. By default, no condition is applied. |
| `tags_on_match_failure` | No | The tag to add to an event if the `obfuscate` processor fails to match the pattern. |
| `action` | No | The obfuscation action. As of Data Prepper 2.3, only the `mask` action is supported. |
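
To see how these options combine, the following minimal pipeline sketch masks standalone email addresses using the new `single_word_only` option. The pipeline name, `http` source, `stdout` sink, and email regex are illustrative placeholders:

```yml
obfuscation-pipeline:
  source:
    http:                        # placeholder source; any source works
  processor:
    - obfuscate:
        source: "log"            # field to obfuscate
        target: "masked_log"     # keeps the original field unchanged
        patterns:
          - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}"  # matches email addresses
        single_word_only: true   # only standalone matches are masked (Data Prepper 2.8+)
        action:
          mask:
            mask_character: "#"
            mask_character_length: 6
  sink:
    - stdout:
```
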
2 changes: 1 addition & 1 deletion _data-prepper/pipelines/configuration/sinks/opensearch.md
@@ -65,7 +65,7 @@ Option | Required | Type | Description
`connect_timeout` | No | Integer| The timeout value, in milliseconds, when requesting a connection from the connection manager. A timeout value of `0` is interpreted as an infinite timeout. If this timeout value is negative or not set, the underlying Apache HttpClient will rely on operating system settings to manage connection timeouts.
`insecure` | No | Boolean | Whether or not to verify SSL certificates. If set to `true`, then certificate authority (CA) certificate verification is disabled and insecure HTTP requests are sent instead. Default is `false`.
`proxy` | No | String | The address of the [forward HTTP proxy server](https://en.wikipedia.org/wiki/Proxy_server). The format is `"<hostname or IP>:<port>"` (for example, `"example.com:8100"`, `"http://example.com:8100"`, `"112.112.112.112:8100"`). The port number cannot be omitted.
`index` | Conditionally | String | The name of the export index. Only required when the `index_type` is `custom`. The index can be a plain string, such as `my-index-name`; contain [Java date-time patterns](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html), such as `my-index-%{yyyy.MM.dd}` or `my-%{yyyy-MM-dd-HH}-index`; be formatted using field values, such as `my-index-${/my_field}`; or use [Data Prepper expressions](https://opensearch.org/docs/latest/data-prepper/pipelines/expression-syntax/), such as `my-index-${getMetadata(\"my_metadata_field\")}`. All formatting options can be combined to provide flexibility when creating static, dynamic, and rolling indexes.
`index_type` | No | String | Tells the sink plugin what type of data it is handling. Valid values are `custom`, `trace-analytics-raw`, `trace-analytics-service-map`, or `management-disabled`. Default is `custom`.
`template_type` | No | String | Defines what type of OpenSearch template to use. Available options are `v1` and `index-template`. The default value is `v1`, which uses the original OpenSearch templates available at the `_template` API endpoints. The `index-template` option uses composable [index templates]({{site.url}}{{site.baseurl}}/opensearch/index-templates/), which are available through the OpenSearch `_index_template` API. Composable index templates offer more flexibility than the default and are necessary when an OpenSearch cluster contains existing index templates. Composable templates are available for all versions of OpenSearch and some later versions of Elasticsearch. When `distribution_version` is set to `es6`, Data Prepper enforces the `template_type` as `v1`.
`template_file` | No | String | The path to a JSON [index template]({{site.url}}{{site.baseurl}}/opensearch/index-templates/) file, such as `/your/local/template-file.json`, when `index_type` is set to `custom`. For an example template file, see [otel-v1-apm-span-index-template.json](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/opensearch/src/main/resources/otel-v1-apm-span-index-template.json). If you supply a template file, then it must match the template format specified by the `template_type` parameter.
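
For context, a minimal `opensearch` sink sketch using the `%{}` date-time syntax described above might look as follows; the host, credentials, and index name are illustrative placeholders:

```yml
sink:
  - opensearch:
      hosts: ["https://localhost:9200"]        # placeholder cluster endpoint
      username: "admin"                        # placeholder credentials
      password: "admin"
      index: "application-logs-%{yyyy.MM.dd}"  # daily rolling index
      index_type: custom                       # the default; allows custom index names
```
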
@@ -207,7 +207,7 @@ You will most likely not need to specify any parameters except for `location`. F
You will most likely not need to specify any parameters except for `bucket` and `base_path`. For allowed request parameters, see [Register or update snapshot repository API](https://opensearch.org/docs/latest/api-reference/snapshots/create-repository/).


### Registering a Microsoft Azure storage account using Helm

Use the following steps to register a snapshot repository backed by an Azure storage account for an OpenSearch cluster deployed using Helm.

@@ -296,6 +296,56 @@ Use the following steps to register a snapshot repository backed by an Azure sto
}
```

### Set up Microsoft Azure Blob Storage

To use Azure Blob Storage as a snapshot repository, follow these steps:

1. Install the `repository-azure` plugin on all nodes with the following command:

```bash
./bin/opensearch-plugin install repository-azure
```

1. After the `repository-azure` plugin is installed, define your Azure Blob Storage settings before initializing the node. Start by adding your Azure Storage account name using the following secure setting:

```bash
./bin/opensearch-keystore add azure.client.default.account
```

Choose one of the following options for setting up your Azure Blob Storage authentication credentials.

#### Using an Azure Storage account key

Use the following setting to specify your Azure Storage account key:

```bash
./bin/opensearch-keystore add azure.client.default.key
```

#### Using a shared access signature

Use the following setting when accessing Azure with a shared access signature (SAS):

```bash
./bin/opensearch-keystore add azure.client.default.sas_token
```

#### Using an Azure token credential

Starting in OpenSearch 2.15, you have the option to configure a token credential authentication flow in `opensearch.yml`. This method is distinct from connection string authentication, which requires a SAS or an account key.

If you choose token credential authentication, you must also select a token credential type. Although Azure offers multiple token credential types, as of OpenSearch 2.15, only [managed identity](https://learn.microsoft.com/en-us/entra/identity/managed-identities-azure-resources/overview) is supported.

To use managed identity, add your token credential type to `opensearch.yml` using either the `managed` or `managed_identity` value. This indicates that managed identity is being used to perform token credential authentication:

```yml
azure.client.default.token_credential_type: "managed_identity"
```

Note the following when using Azure token credentials:

- Token credential support is disabled in `opensearch.yml` by default.
- A token credential takes precedence over an Azure Storage account key or a SAS when multiple options are configured.
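
After the client settings are configured and the nodes restarted so that the keystore and `opensearch.yml` changes take effect, you can register the repository against an existing blob container. The following is a minimal sketch, not part of this change, assuming a container named `my-snapshot-container` already exists in the storage account:

```json
PUT _snapshot/my-azure-repository
{
  "type": "azure",
  "settings": {
    "container": "my-snapshot-container",
    "base_path": "snapshots"
  }
}
```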

## Take snapshots

You specify two pieces of information when you create a snapshot:
