diff --git a/.gitignore b/.gitignore index 446d1deda6..da3cf9d144 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ Gemfile.lock .idea *.iml .jekyll-cache +.project diff --git a/_api-reference/index-apis/rollover.md b/_api-reference/index-apis/rollover.md new file mode 100644 index 0000000000..722dfe196c --- /dev/null +++ b/_api-reference/index-apis/rollover.md @@ -0,0 +1,195 @@ +--- +layout: default +title: Rollover Index +parent: Index APIs +nav_order: 63 +--- + +# Rollover Index +Introduced 1.0 +{: .label .label-purple } + +The Rollover Index API creates a new index for a data stream or index alias based on the `wait_for_active_shards` setting. + +## Path and HTTP methods + +```json +POST //_rollover/ +POST //_rollover/ +``` + +## Rollover types + +You can roll over a data stream, an index alias with one index, or an index alias with a write index. + +### Data stream + +When you perform a rollover operation on a data stream, the API generates a fresh write index for that stream. Simultaneously, the stream's preceding write index transforms into a regular backing index. Additionally, the rollover process increments the generation count of the data stream. Data stream rollovers do not support specifying index settings in the request body. + +### Index alias with one index + +When initiating a rollover on an index alias associated with a single index, the API generates a new index and disassociates the original index from the alias. + +### Index alias with a write index + +When an index alias references multiple indexes, one index must be designated as the write index. During a rollover, the API creates a new write index with its `is_write_index` property set to `true` while updating the previous write index by setting its `is_write_index property` to `false.` + +## Incrementing index names for an alias + +During the index alias rollover process, if you don't specify a custom name and the current index's name ends with a hyphen followed by a number (for example, `my-index-000001` or `my-index-3`), then the rollover operation will automatically increment that number for the new index's name. For instance, rolling over `my-index-000001` will generate `my-index-000002`. The numeric portion is always padded with leading zeros to ensure a consistent length of six characters. + +## Using date math with index rollovers + +When using an index alias for time-series data, you can leverage [date math](https://opensearch.org/docs/latest/field-types/supported-field-types/date/) in the index name to track the rollover date. For example, you can create an alias pointing to `my-index-{now/d}-000001`. If you create an alias on June 11, 2029, then the index name would be `my-index-2029.06.11-000001`. For a rollover on June 12, 2029, the new index would be named `my-index-2029.06.12-000002`. See [Roll over an index alias with a write index](#rolling-over-an-index-alias-with-a-write-index) for a practical example. + +## Path parameters + +The Rollover Index API supports the parameters listed in the following table. + +Parameter | Type | Description +:--- | :--- | :--- +`` | String | The name of the data stream or index alias to roll over. Required. | +`` | String | The name of the index to create. Supports date math. Data streams do not support this parameter. If the name of the alias's current write index does not end with `-` and a number, such as `my-index-000001` or `my-index-2`, then the parameter is required. + +## Query parameters + +The following table lists the supported query parameters. + +Parameter | Type | Description +:--- | :--- | :--- +`cluster_manager_timeout` | Time | The amount of time to wait for a connection to the cluster manager node. Default is `30s`. +`timeout` | Time | The amount of time to wait for a response. Default is `30s`. +`wait_for_active_shards` | String | The number of active shards that must be available before OpenSearch processes the request. Default is `1` (only the primary shard). You can also set to `all` or a positive integer. Values greater than `1` require replicas. For example, if you specify a value of `3`, then the index must have two replicas distributed across two additional nodes in order for the operation to succeed. + +## Request body + +The following request body parameters are supported. + +### `alias` + +The `alias` parameter specifies the alias name as the key. It is required when the `template` option exists in the request body. The object body contains the following optional parameters. + + +Parameter | Type | Description +:--- | :--- | :--- +`filter` | Query DSL object | The query that limits the number of documents that the alias can access. +`index_routing` | String | The value that routes indexing operations to a specific shard. When specified, overwrites the `routing` value for indexing operations. +`is_hidden` | Boolean | Hides or unhides the alias. When `true`, the alias is hidden. Default is `false`. Indexes for the alias must have matching values for this setting. +`is_write_index` | Boolean | Specifies the write index. When `true`, the index is the write index for the alias. Default is `false`. +`routing` | String | The value used to route index and search operations to a specific shard. +`search_routing` | String | Routes search operations to a specific shard. When specified, it overwrites `routing` for search operations. + +### `mappings` + +The `mappings` parameter specifies the index field mappings. It is optional. See [Mappings and field types](https://opensearch.org/docs/latest/field-types/) for more information. + +### `conditions` + +The `conditions` parameter is an optional object defining criteria for triggering the rollover. When provided, OpenSearch only rolls over if the current index satisfies one or more specified conditions. If omitted, then the rollover occurs unconditionally without prerequisites. + +The object body supports the following parameters. + +Parameter | Type | Description +:--- | :--- | :--- +| `max_age` | Time units | Triggers a rollover after the maximum elapsed time since index creation is reached. The elapsed time is always calculated since the index creation time, even if the index origination date is configured to a custom date, such as when using the `index.lifecycle.parse_origination_date` or `index.lifecycle.origination_date` settings. Optional. | +`max_docs` | Integer | Triggers a rollover after the specified maximum number of documents, excluding documents added since the last refresh and documents in replica shards. Optional. +`max_size` | Byte units | Triggers a rollover when the index reaches a specified size, calculated as the total size of all primary shards. Replicas are not counted. Use the `_cat indices` API and check the `pri.store.size` value to see the current index size. Optional. +`max_primary_shard_size` | Byte units | Triggers a rollover when the largest primary shard in the index reaches a certain size. This is the maximum size of the primary shards in the index. As with `max_size`, replicas are ignored. To see the current shard size, use the `_cat shards` API. The `store` value shows the size of each shard, and `prirep` indicates whether a shard is a primary (`p`) or a replica (`r`). Optional. + +### `settings` + +The `settings` parameter specifies the index configuration options. See [Index settings](https://opensearch.org/docs/latest/install-and-configure/configuring-opensearch/index-settings/) for more information. + +## Example requests + +The following examples illustrate using the Rollover Index API. A rollover occurs when one or more of the specified conditions are met: + +- The index was created 5 or more days ago. +- The index contains 500 or more documents. +- The index's largest primary shard is 100 GB or larger. + +### Rolling over a data stream + +The following request rolls over the data stream if the current write index meets any of the specified conditions: + +```json +POST my-data-stream/_rollover +{ + "conditions": { + "max_age": "5d", + "max_docs": 500, + "max_primary_shard_size": "100gb" + } +} +``` +{% include copy-curl.html %} + +### Rolling over an index alias with a write index + +The following request creates a date-time index and sets it as the write index for `my-alias`: + +```json +PUT +PUT %3Cmy-index-%7Bnow%2Fd%7D-000001%3E +{ + "aliases": { + "my-alias": { + "is_write_index": true + } + } +} +``` +{% include copy-curl.html %} + +The next request performs a rollover using the alias: + +```json +POST my-alias/_rollover +{ + "conditions": { + "max_age": "5d", + "max_docs": 500, + "max_primary_shard_size": "100gb" + } +} +``` +{% include copy-curl.html %} + +### Specifying settings during a rollover + +In most cases, you can use an index template to automatically configure the indexes created during a rollover operation. However, when rolling over an index alias, you can use the Rollover Index API to introduce additional index settings or override the settings defined in the template by sending the following request: + +```json +POST my-alias/_rollover +{ + "settings": { + "index.number_of_shards": 4 + } +} +``` +{% include copy-curl.html %} + + +## Example response + +OpenSearch returns the following response confirming that all conditions except `max_primary_shard_size` were met: + +```json +{ + "acknowledged": true, + "shards_acknowledged": true, + "old_index": ".ds-my-data-stream-2029.06.11-000001", + "new_index": ".ds-my-data-stream-2029.06.12-000002", + "rolled_over": true, + "dry_run": false, + "conditions": { + "[max_age: 5d]": true, + "[max_docs: 500]": true, + "[max_primary_shard_size: 100gb]": false + } +} +``` + + + + diff --git a/_api-reference/render-template.md b/_api-reference/render-template.md index 16bada0290..409fde5e4a 100644 --- a/_api-reference/render-template.md +++ b/_api-reference/render-template.md @@ -44,7 +44,7 @@ Both of the following request examples use the search template with the template "source": { "query": { "match": { - "play_name": "{{play_name}}" + "play_name": "{% raw %}{{play_name}}{% endraw %}" } } }, @@ -76,11 +76,11 @@ If you don't want to use a saved template, or want to test a template before sav ``` { "source": { - "from": "{{from}}{{^from}}10{{/from}}", - "size": "{{size}}{{^size}}10{{/size}}", + "from": "{% raw %}{{from}}{{^from}}0{{/from}}{% endraw %}", + "size": "{% raw %}{{size}}{{^size}}10{{/size}}{% endraw %}", "query": { "match": { - "play_name": "{{play_name}}" + "play_name": "{% raw %}{{play_name}}{% endraw %}" } } }, diff --git a/_automating-configurations/api/deprovision-workflow.md b/_automating-configurations/api/deprovision-workflow.md index e9219536ce..98c944a9d4 100644 --- a/_automating-configurations/api/deprovision-workflow.md +++ b/_automating-configurations/api/deprovision-workflow.md @@ -9,7 +9,9 @@ nav_order: 70 When you no longer need a workflow, you can deprovision its resources. Most workflow steps that create a resource have corresponding workflow steps to reverse that action. To retrieve all resources currently created for a workflow, call the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). When you call the Deprovision Workflow API, resources included in the `resources_created` field of the Get Workflow Status API response will be removed using a workflow step corresponding to the one that provisioned them. -The workflow executes the provisioning workflow steps in reverse order. If failures occur because of resource dependencies, such as preventing deletion of a registered model if it is still deployed, the workflow attempts retries. +The workflow executes the provisioning steps in reverse order. If a failure occurs because of a resource dependency, such as trying to delete a registered model that is still deployed, then the workflow retries the failing step as long as at least one resource was deleted. + +To prevent data loss, resources created using the `create_index`, `create_search_pipeline`, and `create_ingest_pipeline` steps require the resource ID to be included in the `allow_delete` parameter. ## Path and HTTP methods @@ -24,6 +26,7 @@ The following table lists the available path parameters. | Parameter | Data type | Description | | :--- | :--- | :--- | | `workflow_id` | String | The ID of the workflow to be deprovisioned. Required. | +| `allow-delete` | String | A comma-separated list of resource IDs to be deprovisioned. Required if deleting resources of type `index_name` or `pipeline_id`. | ### Example request @@ -53,6 +56,14 @@ If deprovisioning did not completely remove all resources, OpenSearch responds w In some cases, the failure happens because of another dependent resource that took some time to be removed. In this case, you can attempt to send the same request again. {: .tip} +If deprovisioning required the `allow_delete` parameter, then OpenSearch responds with a `403 (FORBIDDEN)` status and identifies the resources that were not deprovisioned: + +```json +{ + "error": "These resources require the allow_delete parameter to deprovision: [index_name my-index]." +} +``` + To obtain a more detailed deprovisioning status than is provided by the summary in the error response, query the [Get Workflow Status API]({{site.url}}{{site.baseurl}}/automating-configurations/api/get-workflow-status/). On success, the workflow returns to a `NOT_STARTED` state. If some resources have not yet been removed, they are provided in the response. \ No newline at end of file diff --git a/_data-prepper/pipelines/configuration/processors/convert_entry_type.md b/_data-prepper/pipelines/configuration/processors/convert_entry_type.md index 2fc9fdb9bd..c2c46260ed 100644 --- a/_data-prepper/pipelines/configuration/processors/convert_entry_type.md +++ b/_data-prepper/pipelines/configuration/processors/convert_entry_type.md @@ -14,10 +14,22 @@ The `convert_entry_type` processor converts a value type associated with the spe You can configure the `convert_entry_type` processor with the following options. + + | Option | Required | Description | | :--- | :--- | :--- | -| `key`| Yes | Keys whose value needs to be converted to a different type. | -| `type` | No | Target type for the key-value pair. Possible values are `integer`, `double`, `string`, and `Boolean`. Default value is `integer`. | +| `key`| Yes | Key whose value needs to be converted to a different type. | +| `keys`| Yes | Keys whose value needs to be converted to a different type. | +| `type` | No | Target type for the key-value pair. Possible values are `integer`, `long`, `double`, `big_decimal`, `string`, and `boolean`. Default value is `integer`. | +| `null_values` | No | String representation of what constitutes a `null` value. If the field value equals one of these strings, then the value is considered `null` and is converted to `null`. | +| `scale` | No | Modifies the scale of the `big_decimal` when converting to a `big_decimal`. The default value is `0`. | +| `tags_on_failure` | No | A list of tags to be added to the event metadata when the event fails to convert. | +| `convert_when` | No | Specifies a condition using a [Data Prepper expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/) for performing the `convert_entry_type` operation. If specified, the `convert_entry_type` operation runs only when the expression evaluates to `true`. | ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/delay.md b/_data-prepper/pipelines/configuration/processors/delay.md new file mode 100644 index 0000000000..c4e9d8e973 --- /dev/null +++ b/_data-prepper/pipelines/configuration/processors/delay.md @@ -0,0 +1,27 @@ +--- +layout: default +title: delay +parent: Processors +grand_parent: Pipelines +nav_order: 41 +--- + +# delay + +This processor will add a delay into the processor chain. Typically, you should use this only for testing, experimenting, and debugging. + +## Configuration + +Option | Required | Type | Description +:--- | :--- | :--- | :--- +`for` | No | Duration | The duration of time to delay. Defaults to `1s`. + +## Usage + +The following example shows using the `delay` processor to delay for 2 seconds. + +```yaml +processor: + - delay: + for: 2s +``` diff --git a/_data-prepper/pipelines/configuration/processors/delete-entries.md b/_data-prepper/pipelines/configuration/processors/delete_entries.md similarity index 99% rename from _data-prepper/pipelines/configuration/processors/delete-entries.md rename to _data-prepper/pipelines/configuration/processors/delete_entries.md index 33c54a0b29..c9a93a1f3e 100644 --- a/_data-prepper/pipelines/configuration/processors/delete-entries.md +++ b/_data-prepper/pipelines/configuration/processors/delete_entries.md @@ -3,7 +3,7 @@ layout: default title: delete_entries parent: Processors grand_parent: Pipelines -nav_order: 41 +nav_order: 43 --- # delete_entries diff --git a/_data-prepper/pipelines/configuration/processors/mutate-event.md b/_data-prepper/pipelines/configuration/processors/mutate-event.md index 9b3b2afb33..1afb34a970 100644 --- a/_data-prepper/pipelines/configuration/processors/mutate-event.md +++ b/_data-prepper/pipelines/configuration/processors/mutate-event.md @@ -13,7 +13,7 @@ Mutate event processors allow you to modify events in Data Prepper. The followin * [add_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/add-entries/) allows you to add entries to an event. * [convert_entry_type]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/convert_entry_type/) allows you to convert value types in an event. * [copy_values]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/copy-values/) allows you to copy values within an event. -* [delete_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/delete-entries/) allows you to delete entries from an event. +* [delete_entries]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/delete_entries/) allows you to delete entries from an event. * [list_to_map]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/list-to-map) allows you to convert list of objects from an event where each object contains a `key` field into a map of target keys. * `map_to_list` allows you to convert a map of objects from an event, where each object contains a `key` field, into a list of target keys. * [rename_keys]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/rename-keys/) allows you to rename keys in an event. diff --git a/_data-prepper/pipelines/configuration/processors/parse-ion.md b/_data-prepper/pipelines/configuration/processors/parse_ion.md similarity index 61% rename from _data-prepper/pipelines/configuration/processors/parse-ion.md rename to _data-prepper/pipelines/configuration/processors/parse_ion.md index 0edd446c42..8360eaa296 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-ion.md +++ b/_data-prepper/pipelines/configuration/processors/parse_ion.md @@ -14,12 +14,22 @@ The `parse_ion` processor parses [Amazon Ion](https://amazon-ion.github.io/ion-d You can configure the `parse_ion` processor with the following options. + + | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | The field in the `event` that is parsed. Default value is `message`. | | `destination` | No | String | The destination field of the parsed JSON. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only `string` because these are not valid `event` fields. | | `pointer` | No | String | A JSON pointer to the field to be parsed. There is no `pointer` by default, meaning that the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event`. If the key that is pointed to already exists in the `event` and the `destination` is the root, then the pointer uses the entire path of the key. | -| `tags_on_failure` | No | String | A list of strings that specify the tags to be set in the event that the processors fails or an unknown exception occurs while parsing. +| `parse_when` | No | String | Specifies under which conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults is `true`. | +| `delete_source` | No | Boolean | If set to `true`, then the source field is deleted. Defaults is `false`. | +| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/parse-json.md b/_data-prepper/pipelines/configuration/processors/parse_json.md similarity index 70% rename from _data-prepper/pipelines/configuration/processors/parse-json.md rename to _data-prepper/pipelines/configuration/processors/parse_json.md index 2cbce4782e..894d5dba42 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-json.md +++ b/_data-prepper/pipelines/configuration/processors/parse_json.md @@ -15,11 +15,22 @@ The `parse_json` processor parses JSON data for an event, including any nested f You can configure the `parse_json` processor with the following options. + + | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | The field in the `event` that will be parsed. Default value is `message`. | | `destination` | No | String | The destination field of the parsed JSON. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only `string` because these are not valid `event` fields. | | `pointer` | No | String | A JSON pointer to the field to be parsed. There is no `pointer` by default, meaning the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid then the entire `source` data is parsed into the outgoing `event`. If the key that is pointed to already exists in the `event` and the `destination` is the root, then the pointer uses the entire path of the key. | +| `parse_when` | No | String | Specifies under which conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | +| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults to `true`. | +| `delete_source` | No | Boolean | If set to `true` then this will delete the source field. Defaults to `false`. | +| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/parse-xml.md b/_data-prepper/pipelines/configuration/processors/parse_xml.md similarity index 70% rename from _data-prepper/pipelines/configuration/processors/parse-xml.md rename to _data-prepper/pipelines/configuration/processors/parse_xml.md index 861705da2b..c8c9f3eebf 100644 --- a/_data-prepper/pipelines/configuration/processors/parse-xml.md +++ b/_data-prepper/pipelines/configuration/processors/parse_xml.md @@ -14,13 +14,22 @@ The `parse_xml` processor parses XML data for an event. You can configure the `parse_xml` processor with the following options. + + | Option | Required | Type | Description | | :--- | :--- | :--- | :--- | | `source` | No | String | Specifies which `event` field to parse. | | `destination` | No | String | The destination field of the parsed XML. Defaults to the root of the `event`. Cannot be `""`, `/`, or any white-space-only string because these are not valid `event` fields. | | `pointer` | No | String | A JSON pointer to the field to be parsed. The value is null by default, meaning that the entire `source` is parsed. The `pointer` can access JSON array indexes as well. If the JSON pointer is invalid, then the entire `source` data is parsed into the outgoing `event` object. If the key that is pointed to already exists in the `event` object and the `destination` is the root, then the pointer uses the entire path of the key. | | `parse_when` | No | String | Specifies under what conditions the processor should perform parsing. Default is no condition. Accepts a Data Prepper expression string following the [Data Prepper Expression Syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). | -| `tags_on_failure` | No | String | A list of strings that specify the tags to be set if the processor fails or an unknown exception occurs while parsing. +| `overwrite_if_destination_exists` | No | Boolean | Overwrites the destination if set to `true`. Set to `false` to prevent changing a destination value that exists. Defaults to `true`. | +| `delete_source` | No | Boolean | If set to `true` then this will delete the source field. Defaults to `false`. | +| `tags_on_failure` | No | String | A list of strings specifying the tags to be set in the event that the processor fails or an unknown exception occurs during parsing. ## Usage diff --git a/_data-prepper/pipelines/configuration/processors/write_json.md b/_data-prepper/pipelines/configuration/processors/write_json.md index 8f1e6851da..20414b4672 100644 --- a/_data-prepper/pipelines/configuration/processors/write_json.md +++ b/_data-prepper/pipelines/configuration/processors/write_json.md @@ -11,8 +11,15 @@ nav_order: 56 The `write_json` processor converts an object in an event into a JSON string. You can customize the processor to choose the source and target field names. -Option | Description | Example -:--- | :--- | :--- + + +Option | Description | Example +:--- | :--- | :--- source | Mandatory field that specifies the name of the field in the event containing the message or object to be parsed. | If `source` is set to `"message"` and the input is `{"message": {"key1":"value1", "key2":{"key3":"value3"}}}`, then the `write_json` processor outputs the event as `"{\"key1\":\"value1\",\"key2\":{\"key3\":\"value3\"}}"`. target | An optional field that specifies the name of the field in which the resulting JSON string should be stored. If `target` is not specified, then the `source` field is used. | `key1` diff --git a/_data-prepper/pipelines/configuration/sources/kafka.md b/_data-prepper/pipelines/configuration/sources/kafka.md index 4df72cfdd6..e8452a93c3 100644 --- a/_data-prepper/pipelines/configuration/sources/kafka.md +++ b/_data-prepper/pipelines/configuration/sources/kafka.md @@ -120,7 +120,7 @@ Use the following options when setting SSL encryption. Option | Required | Type | Description :--- | :--- | :--- | :--- `type` | No | String | The encryption type. Use `none` to disable encryption. Default is `ssl`. -`Insecure` | No | Boolean | A Boolean flag used to turn off SSL certificate verification. If set to `true`, certificate authority (CA) certificate verification is turned off and insecure HTTP requests are sent. Default is `false`. +`insecure` | No | Boolean | A Boolean flag used to turn off SSL certificate verification. If set to `true`, certificate authority (CA) certificate verification is turned off and insecure HTTP requests are sent. Default is `false`. #### AWS diff --git a/_data-prepper/pipelines/configuration/sources/s3.md b/_data-prepper/pipelines/configuration/sources/s3.md index 5a7d9986e5..7b1599f838 100644 --- a/_data-prepper/pipelines/configuration/sources/s3.md +++ b/_data-prepper/pipelines/configuration/sources/s3.md @@ -138,7 +138,7 @@ The `codec` determines how the `s3` source parses each Amazon S3 object. For inc ### `newline` codec -The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse-json/) processor to parse each line. +The `newline` codec parses each single line as a single log event. This is ideal for most application logs because each event parses per single line. It can also be suitable for S3 objects that have individual JSON objects on each line, which matches well when used with the [parse_json]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/parse_json/) processor to parse each line. Use the following options to configure the `newline` codec. diff --git a/_field-types/supported-field-types/geo-shape.md b/_field-types/supported-field-types/geo-shape.md index cbf63551df..b7b06a0d04 100644 --- a/_field-types/supported-field-types/geo-shape.md +++ b/_field-types/supported-field-types/geo-shape.md @@ -68,7 +68,7 @@ PUT testindex/_doc/1 { "location" : { "type" : "point", - "coordinates" : [74.00, 40.71] + "coordinates" : [74.0060, 40.7128] } } ``` @@ -126,10 +126,12 @@ PUT testindex/_doc/3 "location" : { "type" : "polygon", "coordinates" : [ - [[74.0060, 40.7128], - [71.0589, 42.3601], - [73.7562, 42.6526], - [74.0060, 40.7128]] + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ] ] } } @@ -159,15 +161,18 @@ PUT testindex/_doc/4 "location" : { "type" : "polygon", "coordinates" : [ - [[74.0060, 40.7128], - [71.0589, 42.3601], - [73.7562, 42.6526], - [74.0060, 40.7128]], - - [[72.6734,41.7658], - [72.6506, 41.5623], - [73.0515, 41.5582], - [72.6734, 41.7658]] + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ], + [ + [72.6734,41.7658], + [73.0515, 41.5582], + [72.6506, 41.5623], + [72.6734, 41.7658] + ] ] } } @@ -179,12 +184,12 @@ Index a polygon (triangle) with a triangular hole in WKT format: ```json PUT testindex/_doc/4 { - "location" : "POLYGON ((40.7128 74.0060, 42.3601 71.0589, 42.6526 73.7562, 40.7128 74.0060), (41.7658 72.6734, 41.5623 72.6506, 41.5582 73.0515, 41.7658 72.6734))" + "location" : "POLYGON ((74.0060 40.7128, 71.0589 42.3601, 73.7562 42.6526, 74.0060 40.7128), (72.6734 41.7658, 72.6506 41.5623, 73.0515 41.5582, 72.6734 41.7658))" } ``` {% include copy-curl.html %} -In OpenSearch, you can specify a polygon by listing its vertices clockwise or counterclockwise. This works well for polygons that do not cross the date line (are narrower than 180°). However, a polygon that crosses the date line (is wider than 180°) might be ambiguous because WKT does not impose a specific order on vertices. Thus, you must specify polygons that cross the date line by listing their vertices counterclockwise. +You can specify a polygon in OpenSearch by listing its vertices in clockwise or counterclockwise order. This works well for polygons that do not cross the date line (that are narrower than 180°). However, a polygon that crosses the date line (is wider than 180°) might be ambiguous because WKT does not impose a specific order on vertices. Thus, you must specify polygons that cross the date line by listing their vertices in counterclockwise order. You can define an [`orientation`](#parameters) parameter to specify the vertex traversal order at mapping time: @@ -295,23 +300,28 @@ PUT testindex/_doc/4 "type" : "multipolygon", "coordinates" : [ [ - [[74.0060, 40.7128], - [71.0589, 42.3601], - [73.7562, 42.6526], - [74.0060, 40.7128]], - - [[72.6734,41.7658], - [72.6506, 41.5623], - [73.0515, 41.5582], - [72.6734, 41.7658]] + [ + [74.0060, 40.7128], + [73.7562, 42.6526], + [71.0589, 42.3601], + [74.0060, 40.7128] + ], + [ + [73.0515, 41.5582], + [72.6506, 41.5623], + [72.6734, 41.7658], + [73.0515, 41.5582] + ] ], [ - [[73.9776, 40.7614], - [73.9554, 40.7827], - [73.9631, 40.7812], - [73.9776, 40.7614]] + [ + [73.9146, 40.8252], + [73.8871, 41.0389], + [73.6853, 40.9747], + [73.9146, 40.8252] ] ] + ] } } ``` @@ -322,7 +332,7 @@ Index a multipolygon in WKT format: ```json PUT testindex/_doc/4 { - "location" : "MULTIPOLYGON (((40.7128 74.0060, 42.3601 71.0589, 42.6526 73.7562, 40.7128 74.0060), (41.7658 72.6734, 41.5623 72.6506, 41.5582 73.0515, 41.7658 72.6734)), ((73.9776 40.7614, 73.9554 40.7827, 73.9631 40.7812, 73.9776 40.7614)))" + "location" : "MULTIPOLYGON (((74.0060 40.7128, 71.0589 42.3601, 73.7562 42.6526, 74.0060 40.7128), (72.6734 41.7658, 72.6506 41.5623, 73.0515 41.5582, 72.6734 41.7658)), ((73.9146 40.8252, 73.6853 40.9747, 73.8871 41.0389, 73.9146 40.8252)))" } ``` {% include copy-curl.html %} @@ -400,5 +410,5 @@ Parameter | Description :--- | :--- `coerce` | A Boolean value that specifies whether to automatically close unclosed linear rings. Default is `false`. `ignore_malformed` | A Boolean value that specifies to ignore malformed GeoJSON or WKT geoshapes and not to throw an exception. Default is `false` (throw an exception when geoshapes are malformed). -`ignore_z_value` | Specific to points with three coordinates. If `ignore_z_value` is `true`, the third coordinate is not indexed but is still stored in the _source field. If `ignore_z_value` is `false`, an exception is thrown. Default is `true`. +`ignore_z_value` | Specific to points with three coordinates. If `ignore_z_value` is `true`, then the third coordinate is not indexed but is still stored in the `_source` field. If `ignore_z_value` is `false`, then an exception is thrown. Default is `true`. `orientation` | Specifies the traversal order of the vertices in the geoshape's list of coordinates. `orientation` takes the following values:
1. RIGHT: counterclockwise. Specify RIGHT orientation by using one of the following strings (uppercase or lowercase): `right`, `counterclockwise`, `ccw`.
2. LEFT: clockwise. Specify LEFT orientation by using one of the following strings (uppercase or lowercase): `left`, `clockwise`, `cw`. This value can be overridden by individual documents.
Default is `RIGHT`. diff --git a/_im-plugin/ism/api.md b/_im-plugin/ism/api.md index 441f737e6f..e0fbb904cd 100644 --- a/_im-plugin/ism/api.md +++ b/_im-plugin/ism/api.md @@ -553,13 +553,13 @@ Introduced 2.4 ISM allows you to run an action automatically. However, running an action can fail for a variety of reasons. You can use error prevention validation to test an action in order to rule out failures. -To enable error prevention validation, set the `plugins.index_state_management.validation_service.enabled` setting to `true`: +To enable error prevention validation, set the `plugins.index_state_management.action_validation.enabled` setting to `true`: ```bash PUT _cluster/settings { "persistent":{ - "plugins.index_state_management.validation_action.enabled": true + "plugins.index_state_management.action_validation.enabled": true } } ``` @@ -692,4 +692,4 @@ GET _plugins/_ism/explain/test-000001 }, "total_managed_indices" : 1 } -``` \ No newline at end of file +``` diff --git a/_im-plugin/ism/error-prevention/api.md b/_im-plugin/ism/error-prevention/api.md index a273d25cfb..c03a62d868 100644 --- a/_im-plugin/ism/error-prevention/api.md +++ b/_im-plugin/ism/error-prevention/api.md @@ -12,7 +12,7 @@ The ISM Error Prevention API allows you to enable Index State Management (ISM) e ## Enable error prevention validation -You can configure error prevention validation by setting the `plugins.index_state_management.validation_service.enabled` parameter. +You can configure error prevention validation by setting the `plugins.index_state_management.action_validation.enabled` parameter. #### Example request @@ -20,7 +20,7 @@ You can configure error prevention validation by setting the `plugins.index_stat PUT _cluster/settings { "persistent":{ - "plugins.index_state_management.validation_action.enabled": true + "plugins.index_state_management.action_validation.enabled": true } } ``` diff --git a/_im-plugin/refresh-analyzer.md b/_im-plugin/refresh-analyzer.md index 2e50f06dc0..bff54b739f 100644 --- a/_im-plugin/refresh-analyzer.md +++ b/_im-plugin/refresh-analyzer.md @@ -10,7 +10,7 @@ redirect_from: # Refresh search analyzer -With ISM installed, you can refresh search analyzers in real time with the following API: +You can refresh search analyzers in real time using the following API. This requires the [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) (ISM) plugin to be installed. For more information, see [Installing plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/). ```json POST /_plugins/_refresh_search_analyzers/ diff --git a/_ingest-pipelines/processors/split.md b/_ingest-pipelines/processors/split.md index c424ef671c..cdb0cfe3de 100644 --- a/_ingest-pipelines/processors/split.md +++ b/_ingest-pipelines/processors/split.md @@ -30,7 +30,7 @@ Parameter | Required/Optional | Description :--- | :--- | :--- `field` | Required | The field containing the string to be split. `separator` | Required | The delimiter used to split the string. This can be a regular expression pattern. -`preserve_field` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, empty trailing fields are removed from the resulting array. Default is `false`. +`preserve_trailing` | Optional | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`. `target_field` | Optional | The field where the array of substrings is stored. If not specified, then the field is updated in-place. `ignore_missing` | Optional | Specifies whether the processor should ignore documents that do not contain the specified field. If set to `true`, then the processor ignores missing values in the field and leaves the `target_field` unchanged. Default is `false`. `description` | Optional | A brief description of the processor. diff --git a/_install-and-configure/install-opensearch/index.md b/_install-and-configure/install-opensearch/index.md index 5423032797..ae01239013 100644 --- a/_install-and-configure/install-opensearch/index.md +++ b/_install-and-configure/install-opensearch/index.md @@ -131,5 +131,5 @@ Property | Description `opensearch.xcontent.string.length.max=` | By default, OpenSearch does not impose any limits on the maximum length of the JSON/YAML/CBOR/Smile string fields. To protect your cluster against potential distributed denial-of-service (DDoS) or memory issues, you can set the `opensearch.xcontent.string.length.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.string.length.max=5000000`. | `opensearch.xcontent.fast_double_writer=[true|false]` | By default, OpenSearch serializes floating-point numbers using the default implementation provided by the Java Runtime Environment. Set this value to `true` to use the Schubfach algorithm, which is faster but may lead to small differences in precision. Default is `false`. | `opensearch.xcontent.name.length.max=` | By default, OpenSearch does not impose any limits on the maximum length of the JSON/YAML/CBOR/Smile field names. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.name.length.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.name.length.max=50000`. | -`opensearch.xcontent.depth.max=` | By default, OpenSearch does not impose any limits on the maximum nesting depth for JSON/YAML/CBOR/Smile documents. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.depth.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.name.length.max=1000`. | +`opensearch.xcontent.depth.max=` | By default, OpenSearch does not impose any limits on the maximum nesting depth for JSON/YAML/CBOR/Smile documents. To protect your cluster against potential DDoS or memory issues, you can set the `opensearch.xcontent.depth.max` system property to a reasonable limit (the maximum is 2,147,483,647), for example, `-Dopensearch.xcontent.depth.max=1000`. | `opensearch.xcontent.codepoint.max=` | By default, OpenSearch imposes a limit of `52428800` on the maximum size of the YAML documents (in code points). To protect your cluster against potential DDoS or memory issues, you can change the `opensearch.xcontent.codepoint.max` system property to a reasonable limit (the maximum is 2,147,483,647). For example, `-Dopensearch.xcontent.codepoint.max=5000000`. | diff --git a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md index 9fee4dcbd2..9014c585c8 100644 --- a/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/neural-sparse-tool.md @@ -212,6 +212,7 @@ Parameter | Type | Required/Optional | Description `name` | String | Optional | The tool name. Useful when an LLM needs to select an appropriate tool for a task. `description` | String | Optional | A description of the tool. Useful when an LLM needs to select an appropriate tool for a task. `doc_size` | Integer | Optional | The number of documents to fetch. Default is `2`. +`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`. ## Execute parameters diff --git a/_ml-commons-plugin/agents-tools/tools/rag-tool.md b/_ml-commons-plugin/agents-tools/tools/rag-tool.md index 1f6fafe49a..c88c2d047b 100644 --- a/_ml-commons-plugin/agents-tools/tools/rag-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/rag-tool.md @@ -136,6 +136,7 @@ Parameter | Type | Required/Optional | Description `prompt` | String | Optional | The prompt to provide to the LLM. `k` | Integer | Optional | The number of nearest neighbors to search for when performing neural search. Default is 10. `enable_Content_Generation` | Boolean | Optional | If `true`, returns results generated by an LLM. If `false`, returns results directly without LLM-assisted content generation. Default is `true`. +`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`. ## Execute parameters diff --git a/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md b/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md index 9093541cbb..70d7e19321 100644 --- a/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md +++ b/_ml-commons-plugin/agents-tools/tools/vector-db-tool.md @@ -225,6 +225,7 @@ Parameter | Type | Required/Optional | Description `input` | String | Required for flow agent | Runtime input sourced from flow agent parameters. If using a large language model (LLM), this field is populated with the LLM response. `doc_size` | Integer | Optional | The number of documents to fetch. Default is `2`. `k` | Integer | Optional | The number of nearest neighbors to search for when performing neural search. Default is `10`. +`nested_path` | String | Optional | The path to the nested object for the nested query. Only used for nested fields. Default is `null`. ## Execute parameters diff --git a/_observing-your-data/alerting/per-cluster-metrics-monitors.md b/_observing-your-data/alerting/per-cluster-metrics-monitors.md index baea9c626b..bcaa03cc0c 100644 --- a/_observing-your-data/alerting/per-cluster-metrics-monitors.md +++ b/_observing-your-data/alerting/per-cluster-metrics-monitors.md @@ -91,7 +91,7 @@ The `script` parameter points the `source` to the Painless script `for (cluster "path": "_cluster/health/", "path_params": "", "url": "http://localhost:9200/_cluster/health/", - "cluster": ["cluster-1", "cluster-2"] + "clusters": ["cluster-1", "cluster-2"] } } ], diff --git a/_search-plugins/cross-cluster-search.md b/_search-plugins/cross-cluster-search.md index 947097e8b3..7d3ff72efb 100644 --- a/_search-plugins/cross-cluster-search.md +++ b/_search-plugins/cross-cluster-search.md @@ -9,7 +9,7 @@ redirect_from: # Cross-cluster search -You can use the cross-cluster search feature in OpenSearch to search and analyze data across multiple clusters, enabling you to gain insights from distributed data sources. Cross-cluster search is available by default with the Security plugin, but you need to configure each cluster to allow remote connections from other clusters. This involves setting up remote cluster connections and configuring access permissions. +You can use cross-cluster search (CCS) in OpenSearch to search and analyze data across multiple clusters, enabling you to gain insights from distributed data sources. Cross-cluster search is available by default with the Security plugin, but you need to configure each cluster to allow remote connections from other clusters. This involves setting up remote cluster connections and configuring access permissions. --- diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md index 144365166f..fa1b4096c7 100644 --- a/_search-plugins/knn/approximate-knn.md +++ b/_search-plugins/knn/approximate-knn.md @@ -141,7 +141,7 @@ The following table provides examples of the number of results returned by vario 10 | 1 | 1 | 4 | 4 | 1 10 | 10 | 1 | 4 | 10 | 10 10 | 1 | 2 | 4 | 8 | 2 - + The number of results returned by Faiss/NMSLIB differs from the number of results returned by Lucene only when `k` is smaller than `size`. If `k` and `size` are equal, all engines return the same number of results. Starting in OpenSearch 2.14, you can use `k`, `min_score`, or `max_distance` for [radial search]({{site.url}}{{site.baseurl}}/search-plugins/knn/radial-search-knn/). @@ -253,7 +253,54 @@ POST _bulk ... ``` -After data is ingested, it can be search just like any other `knn_vector` field! +After data is ingested, it can be searched in the same way as any other `knn_vector` field. + +### Additional query parameters + +Starting with version 2.16, you can provide `method_parameters` in a search request: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2, + "method_parameters" : { + "ef_search": 100 + } + } + } + } +} +``` +{% include copy-curl.html %} + +These parameters are dependent on the combination of engine and method used to create the index. The following sections provide information about the supported `method_parameters`. + +#### `ef_search` + +You can provide the `ef_search` parameter when searching an index created using the `hnsw` method. The `ef_search` parameter specifies the number of vectors to examine in order to find the top k nearest neighbors. Higher `ef_search` values improve recall at the cost of increased search latency. The value must be positive. + +The following table provides information about the `ef_search` parameter for the supported engines. + +Engine | Radial query support | Notes +:--- | :--- | :--- +`nmslib` | No | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. +`faiss` | Yes | If `ef_search` is present in a query, it overrides the `index.knn.algo_param.ef_search` index setting. +`lucene` | No | When creating a search query, you must specify `k`. If you provide both `k` and `ef_search`, then the larger value is passed to the engine. If `ef_search` is larger than `k`, you can provide the `size` parameter to limit the final number of results to `k`. + +#### `nprobes` + +You can provide the `nprobes` parameter when searching an index created using the `ivf` method. The `nprobes` parameter specifies the number of `nprobes` clusters to examine in order to find the top k nearest neighbors. Higher `nprobes` values improve recall at the cost of increased search latency. The value must be positive. + +The following table provides information about the `nprobes` parameter for the supported engines. + +Engine | Notes +:--- | :--- +`faiss` | If `nprobes` is present in a query, it overrides the value provided when creating the index. ### Using approximate k-NN with filters diff --git a/_search-plugins/search-pipelines/deleting-search-pipeline.md b/_search-plugins/search-pipelines/deleting-search-pipeline.md new file mode 100644 index 0000000000..3f113f7688 --- /dev/null +++ b/_search-plugins/search-pipelines/deleting-search-pipeline.md @@ -0,0 +1,26 @@ +--- +layout: default +title: Deleting search pipelines +nav_order: 30 +has_children: false +parent: Search pipelines +grand_parent: Search +--- + +# Deleting search pipelines + +Use the following request to delete a pipeline. + +To delete a specific search pipeline, pass the pipeline ID as a parameter: + +```json +DELETE /_search/pipeline/ +``` +{% include copy-curl.html %} + +To delete all search pipelines in a cluster, use the wildcard character (`*`): + +```json +DELETE /_search/pipeline/* +``` +{% include copy-curl.html %} diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md index 4630ab950c..ad515cc541 100644 --- a/_search-plugins/search-pipelines/search-processors.md +++ b/_search-plugins/search-pipelines/search-processors.md @@ -37,13 +37,16 @@ The following table lists all supported search response processors. Processor | Description | Earliest available version :--- | :--- | :--- +[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12 [`personalize_search_ranking`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/personalize-search-ranking/) | Uses [Amazon Personalize](https://aws.amazon.com/personalize/) to rerank search results (requires setting up the Amazon Personalize service). | 2.9 -[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12) [`rename_field`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rename-field-processor/)| Renames an existing field. | 2.8 [`rerank`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rerank-processor/)| Reranks search results using a cross-encoder model. | 2.12 -[`collapse`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/collapse-processor/)| Deduplicates search hits based on a field value, similarly to `collapse` in a search request. | 2.12 +[`retrieval_augmented_generation`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/rag-processor/) | Used for retrieval-augmented generation (RAG) in [conversational search]({{site.url}}{{site.baseurl}}/search-plugins/conversational-search/). | 2.10 (generally available in 2.12) +[`sort`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/sort-processor/)| Sorts an array of items in either ascending or descending order. | 2.16 +[`split`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/split-processor/)| Splits a string field into an array of substrings based on a specified delimiter. | 2.16 [`truncate_hits`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/truncate-hits-processor/)| Discards search hits after a specified target count is reached. Can undo the effect of the `oversample` request processor. | 2.12 + ## Search phase results processors A search phase results processor runs between search phases at the coordinating node level. It intercepts the results retrieved from one search phase and transforms them before passing them to the next search phase. diff --git a/_search-plugins/search-pipelines/sort-processor.md b/_search-plugins/search-pipelines/sort-processor.md new file mode 100644 index 0000000000..dde05c1b3a --- /dev/null +++ b/_search-plugins/search-pipelines/sort-processor.md @@ -0,0 +1,251 @@ +--- +layout: default +title: Sort +nav_order: 32 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# Sort processor + +The `sort` processor sorts an array of items in either ascending or descending order. Numeric arrays are sorted numerically, while string or mixed arrays (strings and numbers) are sorted lexicographically. The processor throws an error if the input is not an array. + +## Request fields + +The following table lists all available request fields. + +Field | Data type | Description +:--- | :--- | :--- +`field` | String | The field to be sorted. Must be an array. Required. +`order` | String | The sort order to apply. Accepts `asc` for ascending or `desc` for descending. Default is `asc`. +`target_field` | String | The name of the field in which the sorted array is stored. If not specified, then the sorted array is stored in the same field as the original array (the `field` variable). +`tag` | String | The processor's identifier. +`description` | String | A description of the processor. +`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. + +## Example + +The following example demonstrates using a search pipeline with a `sort` processor. + +### Setup + +Create an index named `my_index` and index a document with the field `message` that contains an array of strings: + +```json +POST /my_index/_doc/1 +{ + "message": ["one", "two", "three", "four"], + "visibility": "public" +} +``` +{% include copy-curl.html %} + +### Creating a search pipeline + +Create a search pipeline with a `sort` response processor that sorts the `message` field and stores the sorted results in the `sorted_message` field: + +```json +PUT /_search/pipeline/my_pipeline +{ + "response_processors": [ + { + "sort": { + "field": "message", + "target_field": "sorted_message" + } + } + ] +} +``` +{% include copy-curl.html %} + +### Using a search pipeline + +Search for documents in `my_index` without a search pipeline: + +```json +GET /my_index/_search +``` +{% include copy-curl.html %} + +The response contains the field `message`: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 1, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 1, + "_source": { + "message": [ + "one", + "two", + "three", + "four" + ], + "visibility": "public" + } + } + ] + } +} +``` +
+ +To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter: + +```json +GET /my_index/_search?search_pipeline=my_pipeline +``` +{% include copy-curl.html %} + +The `sorted_message` field contains the strings from the `message` field sorted alphabetically: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 3, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 1, + "_source": { + "visibility": "public", + "sorted_message": [ + "four", + "one", + "three", + "two" + ], + "message": [ + "one", + "two", + "three", + "four" + ] + } + } + ] + } +} +``` +
+ +You can also use the `fields` option to search for specific fields in a document: + +```json +POST /my_index/_search?pretty&search_pipeline=my_pipeline +{ + "fields": ["visibility", "message"] +} +``` +{% include copy-curl.html %} + +In the response, the `message` field is sorted and the results are stored in the `sorted_message` field: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 2, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 1, + "_source": { + "visibility": "public", + "sorted_message": [ + "four", + "one", + "three", + "two" + ], + "message": [ + "one", + "two", + "three", + "four" + ] + }, + "fields": { + "visibility": [ + "public" + ], + "sorted_message": [ + "four", + "one", + "three", + "two" + ], + "message": [ + "one", + "two", + "three", + "four" + ] + } + } + ] + } +} +``` +
\ No newline at end of file diff --git a/_search-plugins/search-pipelines/split-processor.md b/_search-plugins/search-pipelines/split-processor.md new file mode 100644 index 0000000000..6830f81ec3 --- /dev/null +++ b/_search-plugins/search-pipelines/split-processor.md @@ -0,0 +1,234 @@ +--- +layout: default +title: Split +nav_order: 33 +has_children: false +parent: Search processors +grand_parent: Search pipelines +--- + +# Split processor + +The `split` processor splits a string field into an array of substrings based on a specified delimiter. + +## Request fields + +The following table lists all available request fields. + +Field | Data type | Description +:--- | :--- | :--- +`field` | String | The field containing the string to be split. Required. +`separator` | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required. +`preserve_trailing` | Boolean | If set to `true`, preserves empty trailing fields (for example, `''`) in the resulting array. If set to `false`, then empty trailing fields are removed from the resulting array. Default is `false`. +`target_field` | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place. +`tag` | String | The processor's identifier. +`description` | String | A description of the processor. +`ignore_failure` | Boolean | If `true`, then OpenSearch [ignores any failure]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/creating-search-pipeline/#ignoring-processor-failures) of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is `false`. + +## Example + +The following example demonstrates using a search pipeline with a `split` processor. + +### Setup + +Create an index named `my_index` and index a document containing the field `message`: + +```json +POST /my_index/_doc/1 +{ + "message": "ingest, search, visualize, and analyze data", + "visibility": "public" +} +``` +{% include copy-curl.html %} + +### Creating a search pipeline + +The following request creates a search pipeline with a `split` response processor that splits the `message` field and stores the results in the `split_message` field: + +```json +PUT /_search/pipeline/my_pipeline +{ + "response_processors": [ + { + "split": { + "field": "message", + "separator": ", ", + "target_field": "split_message" + } + } + ] +} +``` +{% include copy-curl.html %} + +### Using a search pipeline + +Search for documents in `my_index` without a search pipeline: + +```json +GET /my_index/_search +``` +{% include copy-curl.html %} + +The response contains the field `message`: + +
+ + Response + + {: .text-delta} +```json +{ + "took": 3, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 1, + "_source": { + "message": "ingest, search, visualize, and analyze data", + "visibility": "public" + } + } + ] + } +} +``` +
+ +To search with a pipeline, specify the pipeline name in the `search_pipeline` query parameter: + +```json +GET /my_index/_search?search_pipeline=my_pipeline +``` +{% include copy-curl.html %} + +The `message` field is split and the results are stored in the `split_message` field: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 6, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 1, + "_source": { + "visibility": "public", + "message": "ingest, search, visualize, and analyze data", + "split_message": [ + "ingest", + "search", + "visualize", + "and analyze data" + ] + } + } + ] + } +} +``` +
+ +You can also use the `fields` option to search for specific fields in a document: + +```json +POST /my_index/_search?pretty&search_pipeline=my_pipeline +{ + "fields": ["visibility", "message"] +} +``` +{% include copy-curl.html %} + +In the response, the `message` field is split and the results are stored in the `split_message` field: + +
+ + Response + + {: .text-delta} + +```json +{ + "took": 7, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 1, + "hits": [ + { + "_index": "my_index", + "_id": "1", + "_score": 1, + "_source": { + "visibility": "public", + "message": "ingest, search, visualize, and analyze data", + "split_message": [ + "ingest", + "search", + "visualize", + "and analyze data" + ] + }, + "fields": { + "visibility": [ + "public" + ], + "message": [ + "ingest, search, visualize, and analyze data" + ], + "split_message": [ + "ingest", + "search", + "visualize", + "and analyze data" + ] + } + } + ] + } +} +``` +
\ No newline at end of file diff --git a/_search-plugins/sql/settings.md b/_search-plugins/sql/settings.md index d4aaac7f40..4842f98449 100644 --- a/_search-plugins/sql/settings.md +++ b/_search-plugins/sql/settings.md @@ -78,6 +78,7 @@ Setting | Default | Description `plugins.sql.cursor.keep_alive` | 1 minute | Configures how long the cursor context is kept open. Cursor contexts are resource-intensive, so we recommend a low value. `plugins.query.memory_limit` | 85% | Configures the heap memory usage limit for the circuit breaker of the query engine. `plugins.query.size_limit` | 200 | Sets the default size of index that the query engine fetches from OpenSearch. +`plugins.query.datasources.enabled` | true | Change to `false` to disable support for data sources in the plugin. ## Spark connector settings