diff --git a/_api-reference/index-apis/create-index-template.md b/_api-reference/index-apis/create-index-template.md
index 2a92e3f4c4..ea71126210 100644
--- a/_api-reference/index-apis/create-index-template.md
+++ b/_api-reference/index-apis/create-index-template.md
@@ -45,7 +45,7 @@ Parameter | Type | Description
`priority` | Integer | A number that determines which index templates take precedence during the creation of a new index or data stream. OpenSearch chooses the template with the highest priority. When no priority is given, the template is assigned a `0`, signifying the lowest priority. Optional.
`template` | Object | The template that includes the `aliases`, `mappings`, or `settings` for the index. For more information, see [#template]. Optional.
`version` | Integer | The version number used to manage index templates. Version numbers are not automatically set by OpenSearch. Optional.
-
+`context` | Object | (Experimental) The `context` parameter provides use-case-specific predefined templates that can be applied to an index. Among all settings and mappings declared for a template, context templates hold the highest priority. For more information, see [Index context]({{site.url}}{{site.baseurl}}/im-plugin/index-context/).

### Template

diff --git a/_api-reference/index-apis/create-index.md b/_api-reference/index-apis/create-index.md
index 2f4c1041bc..7f7d26815f 100644
--- a/_api-reference/index-apis/create-index.md
+++ b/_api-reference/index-apis/create-index.md
@@ -50,7 +50,7 @@ timeout | Time | How long to wait for the request to return. Default is `30s`.

## Request body

-As part of your request, you can optionally specify [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/), [mappings]({{site.url}}{{site.baseurl}}/field-types/index/), and [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/) for your newly created index.
+As part of your request, you can optionally specify [index settings]({{site.url}}{{site.baseurl}}/im-plugin/index-settings/), [mappings]({{site.url}}{{site.baseurl}}/field-types/index/), [aliases]({{site.url}}{{site.baseurl}}/opensearch/index-alias/), and [index context]({{site.url}}{{site.baseurl}}/opensearch/index-context/).

## Example request

diff --git a/_api-reference/snapshots/get-snapshot-status.md b/_api-reference/snapshots/get-snapshot-status.md
index c7f919bcb3..9636b40d64 100644
@@ -22,8 +22,9 @@ Path parameters are optional.

| Parameter | Data type | Description |
:--- | :--- | :---
-| repository | String | Repository containing the snapshot. |
-| snapshot | String | Snapshot to return. |
+| repository | String | The repository containing the snapshot. |
+| snapshot | List | The snapshot(s) to return. |
+| index | List | The indexes to include in the response. |

Three request variants provide flexibility:

@@ -31,16 +32,23 @@ Three request variants provide flexibility:

* `GET _snapshot/<repository>/_status` returns all currently running snapshots in the specified repository. This is the preferred variant.

-* `GET _snapshot/<repository>/<snapshot>/_status` returns detailed status information for a specific snapshot in the specified repository, regardless of whether it's currently running or not.
+* `GET _snapshot/<repository>/<snapshot>/_status` returns detailed status information for the specified snapshot(s) in the specified repository, regardless of whether they're currently running.
-Using the API to return state for other than currently running snapshots can be very costly for (1) machine machine resources and (2) processing time if running in the cloud. For each snapshot, each request causes file reads from all a snapshot's shards.
+* `GET /_snapshot/<repository>/<snapshot>/<index>/_status` returns detailed status information only for the specified indexes in a specific snapshot in the specified repository. Note that this endpoint works only for indexes belonging to a specific snapshot.
+
+Snapshot API calls only work if the total number of shards across the requested resources, such as snapshots and indexes created from snapshots, is smaller than the limit specified by the following cluster setting:
+
+- `snapshot.max_shards_allowed_in_status_api` (Dynamic, integer): The maximum number of shards that can be included in the Snapshot Status API response. Default is `200000`. Not applicable for [shallow snapshots v2]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/snapshot-interoperability#shallow-snapshot-v2), where the total number and sizes of files are returned as 0.
+
+
+Using the API to return the state of snapshots that are not currently running can be very costly in terms of both machine resources and processing time when querying data in the cloud. For each snapshot, each request causes a file read of all of the snapshot's shards.
{: .warning}

## Request fields

| Field | Data type | Description |
:--- | :--- | :---
-| ignore_unavailable | Boolean | How to handles requests for unavailable snapshots. If `false`, the request returns an error for unavailable snapshots. If `true`, the request ignores unavailable snapshots, such as those that are corrupted or temporarily cannot be returned. Defaults to `false`.|
+| ignore_unavailable | Boolean | How to handle requests for unavailable snapshots and indexes. If `false`, the request returns an error for unavailable snapshots and indexes. If `true`, the request ignores unavailable snapshots and indexes, such as those that are corrupted or temporarily cannot be returned. Default is `false`.|

## Example request

@@ -375,18 +383,18 @@ The `GET _snapshot/my-opensearch-repo/my-first-snapshot/_status` request returns
:--- | :--- | :---
| repository | String | Name of repository that contains the snapshot. |
| snapshot | String | Snapshot name. |
-| uuid | String | Snapshot Universally unique identifier (UUID). |
+| uuid | String | A snapshot's universally unique identifier (UUID). |
| state | String | Snapshot's current status. See [Snapshot states](#snapshot-states). |
| include_global_state | Boolean | Whether the current cluster state is included in the snapshot. |
| shards_stats | Object | Snapshot's shard counts. See [Shard stats](#shard-stats). |
-| stats | Object | Details of files included in the snapshot. `file_count`: number of files. `size_in_bytes`: total of all fie sizes. See [Snapshot file stats](#snapshot-file-stats). |
+| stats | Object | Information about files included in the snapshot. `file_count`: number of files. `size_in_bytes`: total size of all files. See [Snapshot file stats](#snapshot-file-stats). |
| index | list of Objects | List of objects that contain information about the indices in the snapshot. See [Index objects](#index-objects).|

##### Snapshot states

| State | Description |
:--- | :--- |
-| FAILED | The snapshot terminated in an error and no data was stored. |
| IN_PROGRESS | The snapshot is currently running. |
| PARTIAL | The global cluster state was stored, but data from at least one shard was not stored. The `failures` property of the [Create snapshot]({{site.url}}{{site.baseurl}}/api-reference/snapshots/create-snapshot) response contains additional details. |
| SUCCESS | The snapshot finished and all shards were stored successfully. |

@@ -420,4 +428,4 @@ All property values are Integers.

:--- | :--- | :--- |
| shards_stats | Object | See [Shard stats](#shard-stats). |
| stats | Object | See [Snapshot file stats](#snapshot-file-stats). |
-| shards | list of Objects | List of objects containing information about the shards that include the snapshot. OpenSearch returns the following properties about the shards. <br> <br> **stage**: Current state of shards in the snapshot. Shard states are: <br> <br> * DONE: Number of shards in the snapshot that were successfully stored in the repository. <br> <br> * FAILURE: Number of shards in the snapshot that were not successfully stored in the repository. <br> <br> * FINALIZE: Number of shards in the snapshot that are in the finalizing stage of being stored in the repository. <br> <br> * INIT: Number of shards in the snapshot that are in the initializing stage of being stored in the repository. <br> <br> * STARTED: Number of shards in the snapshot that are in the started stage of being stored in the repository. <br> <br> **stats**: See [Snapshot file stats](#snapshot-file-stats). <br> <br> **total**: Total number and size of files referenced by the snapshot. <br> <br> **start_time_in_millis**: Time (in milliseconds) when snapshot creation began. <br> <br> **time_in_millis**: Total time (in milliseconds) that the snapshot took to complete. |
+| shards | List of objects | Contains information about the shards included in the snapshot. OpenSearch returns the following properties about the shard: <br> <br> **stage**: The current state of shards in the snapshot. Shard states are: <br> <br> * DONE: The number of shards in the snapshot that were successfully stored in the repository. <br> <br> * FAILURE: The number of shards in the snapshot that were not successfully stored in the repository. <br> <br> * FINALIZE: The number of shards in the snapshot that are in the finalizing stage of being stored in the repository. <br> <br> * INIT: The number of shards in the snapshot that are in the initializing stage of being stored in the repository. <br> <br> * STARTED: The number of shards in the snapshot that are in the started stage of being stored in the repository. <br> <br> **stats**: See [Snapshot file stats](#snapshot-file-stats). <br> <br> **total**: The total number and sizes of files referenced by the snapshot. <br> <br> **start_time_in_millis**: The time (in milliseconds) when snapshot creation began. <br> <br> 
**time_in_millis**: The total amount of time (in milliseconds) that the snapshot took to complete. |

diff --git a/_im-plugin/index-context.md b/_im-plugin/index-context.md
new file mode 100644
index 0000000000..be0dbd527d
--- /dev/null
+++ b/_im-plugin/index-context.md
@@ -0,0 +1,175 @@
+---
+layout: default
+title: Index context
+nav_order: 14
+redirect_from:
+  - /opensearch/index-context/
+---
+
+# Index context
+
+This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature, or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
+{: .warning}
+
+Index context declares the use case for an index. Using the context information, OpenSearch applies a predetermined set of settings and mappings, which provides the following benefits:
+
+- Optimized performance
+- Settings tuned to your specific use case
+- Accurate mappings and aliases based on [OpenSearch Integrations]({{site.url}}{{site.baseurl}}/integrations/)
+
+The settings and metadata configurations applied through component templates are automatically loaded when your cluster starts. To prevent configuration issues, component templates that start with `@abc_template@`, or Application-Based Configuration (ABC) templates, can only be used through a `context` object declaration.
+{: .warning}
+
+
+## Installation
+
+To install the index context feature:
+
+1. Install the `opensearch-system-templates` plugin on all nodes in your cluster using one of the [installation methods]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#install).
+
+2. Set the feature flag `opensearch.experimental.feature.application_templates.enabled` to `true`. For more information about enabling and disabling feature flags, see [Enabling experimental features]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+
+3. Set the `cluster.application_templates.enabled` setting to `true`. For instructions on how to configure OpenSearch, see [configuring settings]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/#static-settings). A configuration sketch showing both settings appears before the examples below.
+
+## Using the `context` setting
+
+Use the `context` setting with the Index API to add use-case-specific context.
+
+### Considerations
+
+Consider the following when using the `context` parameter during index creation:
+
+1. If you use the `context` parameter to create an index, you cannot include any settings declared in the index context during index creation or dynamic settings updates.
+2. The index context becomes permanent when set on an index or index template.
+
+When you adhere to these limitations, suggested configurations or mappings are uniformly applied to indexed data within the specified context.
+
+### Examples
+
+The following examples show how to use index context.
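Both examples assume that the experimental feature has been enabled as described in the installation steps above. The following is a minimal, hypothetical `opensearch.yml` sketch; the setting names are taken from the installation steps, but where and how you set them can vary by deployment:

```yml
# Hypothetical opensearch.yml sketch -- set on every node, then restart.
# Enables the experimental application templates feature flag.
opensearch.experimental.feature.application_templates.enabled: true
# Enables context (application) templates on the cluster.
cluster.application_templates.enabled: true
```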
+
+
+#### Create an index
+
+The following example request creates an index in which to store metric data by declaring a `metrics` mapping as the context:
+
+```json
+PUT /my-metrics-index
+{
+  "context": {
+    "name": "metrics"
+  }
+}
+```
+{% include copy-curl.html %}
+
+After creation, the context is added to the index and the corresponding settings are applied:
+
+
+**GET request**
+
+```json
+GET /my-metrics-index
+```
+{% include copy-curl.html %}
+
+
+**Response**
+
+```json
+{
+  "my-metrics-index": {
+    "aliases": {},
+    "mappings": {},
+    "settings": {
+      "index": {
+        "codec": "zstd_no_dict",
+        "refresh_interval": "60s",
+        "number_of_shards": "1",
+        "provided_name": "my-metrics-index",
+        "merge": {
+          "policy": "log_byte_size"
+        },
+        "context": {
+          "created_version": "1",
+          "current_version": "1"
+        },
+        ...
+      }
+    },
+    "context": {
+      "name": "metrics",
+      "version": "_latest"
+    }
+  }
+}
+```
+
+
+#### Create an index template
+
+You can also use the `context` parameter when creating an index template. The following example request creates an index template with the context information as `logs`:
+
+```json
+PUT _index_template/my-logs
+{
+  "context": {
+    "name": "logs",
+    "version": "1"
+  },
+  "index_patterns": [
+    "my-logs-*"
+  ]
+}
+```
+{% include copy-curl.html %}
+
+All indexes created using this index template will receive the metadata provided by the associated component template. The following request and response show how `context` is added to the template:
+
+**Get index template**
+
+```json
+GET _index_template/my-logs
+```
+{% include copy-curl.html %}
+
+**Response**
+
+```json
+{
+  "index_templates": [
+    {
+      "name": "my-logs",
+      "index_template": {
+        "index_patterns": [
+          "my-logs-*"
+        ],
+        "context": {
+          "name": "logs",
+          "version": "1"
+        }
+      }
+    }
+  ]
+}
+```
+
+If there is a conflict between any settings, mappings, or aliases declared directly in your template and those in the backing component template for the context, the latter takes precedence during index creation.
+
+
+## Available context templates
+
+As of OpenSearch 2.17, the following templates are available for use through the `context` parameter:
+
+- `logs`
+- `metrics`
+- `nginx-logs`
+- `amazon-cloudtrail-logs`
+- `amazon-elb-logs`
+- `amazon-s3-logs`
+- `apache-web-logs`
+- `k8s-logs`
+
+For more information about these templates, see the [OpenSearch system templates repository](https://github.com/opensearch-project/opensearch-system-templates/tree/main/src/main/resources/org/opensearch/system/applicationtemplates/v1).
+
+To view the current version of these templates on your cluster, use `GET /_component_template`.

diff --git a/_install-and-configure/configuring-opensearch/availability-recovery.md b/_install-and-configure/configuring-opensearch/availability-recovery.md
index 94960ebe0a..d25396a63f 100644
--- a/_install-and-configure/configuring-opensearch/availability-recovery.md
+++ b/_install-and-configure/configuring-opensearch/availability-recovery.md
@@ -16,7 +16,6 @@ Availability and recovery settings include settings for the following:

- [Shard indexing backpressure](#shard-indexing-backpressure-settings)
- [Segment replication](#segment-replication-settings)
- [Cross-cluster replication](#cross-cluster-replication-settings)
-- [Workload management](#workload-management-settings)

To learn more about static and dynamic settings, see [Configuring OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/index/).
@@ -71,7 +70,3 @@ For information about segment replication backpressure settings, see [Segment re ## Cross-cluster replication settings For information about cross-cluster replication settings, see [Replication settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/replication-plugin/settings/). - -## Workload management settings - -Workload management is a mechanism that allows administrators to organize queries into distinct groups. For more information, see [Workload management settings]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/workload-management/#workload-management-settings). diff --git a/_layouts/search_layout.html b/_layouts/search_layout.html index 47b8f25d1c..67e877fcb8 100644 --- a/_layouts/search_layout.html +++ b/_layouts/search_layout.html @@ -38,12 +38,16 @@

Filter results

- +
- - + + +
+
+ +
@@ -97,10 +101,7 @@

element.value).join(','); const urlPath = window.location.pathname; const versionMatch = urlPath.match(/(\d+\.\d+)/); const docsVersion = versionMatch ? versionMatch[1] : "latest"; @@ -139,11 +140,12 @@

{ + categoryBlog.addEventListener('change', () => { + updateAllCheckbox(); + triggerSearch(searchInput.value.trim()); + }); + categoryEvent.addEventListener('change', () => { updateAllCheckbox(); triggerSearch(searchInput.value.trim()); }); diff --git a/_tools/index.md b/_tools/index.md index 108f10da97..c9d446a81a 100644 --- a/_tools/index.md +++ b/_tools/index.md @@ -18,6 +18,7 @@ This section provides documentation for OpenSearch-supported tools, including: - [OpenSearch CLI](#opensearch-cli) - [OpenSearch Kubernetes operator](#opensearch-kubernetes-operator) - [OpenSearch upgrade, migration, and comparison tools](#opensearch-upgrade-migration-and-comparison-tools) +- [Sycamore](#sycamore) for AI-powered extract, transform, load (ETL) on complex documents for vector and hybrid search For information about Data Prepper, the server-side data collector for filtering, enriching, transforming, normalizing, and aggregating data for downstream analytics and visualization, see [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/index/). @@ -122,3 +123,9 @@ The OpenSearch Kubernetes Operator is an open-source Kubernetes operator that he OpenSearch migration tools facilitate migrations to OpenSearch and upgrades to newer versions of OpenSearch. These can help you can set up a proof-of-concept environment locally using Docker containers or deploy to AWS using a one-click deployment script. This empowers you to fine-tune cluster configurations and manage workloads more effectively before migration. For more information about OpenSearch migration tools, see the documentation in the [OpenSearch Migration GitHub repository](https://github.com/opensearch-project/opensearch-migrations/tree/capture-and-replay-v0.1.0). + +## Sycamore + +[Sycamore](https://github.com/aryn-ai/sycamore) is an open-source, AI-powered document processing engine designed to prepare unstructured data for retrieval-augmented generation (RAG) and semantic search using Python. Sycamore supports chunking and enriching a wide range of complex document types, including reports, presentations, transcripts, and manuals. Additionally, Sycamore can extract and process embedded elements, such as tables, figures, graphs, and other infographics. It can then load the data into target indexes, including vector and keyword indexes, using an [OpenSearch connector](https://sycamore.readthedocs.io/en/stable/sycamore/connectors/opensearch.html). + +For more information, see [Sycamore]({{site.url}}{{site.baseurl}}/tools/sycamore/). diff --git a/_tools/sycamore.md b/_tools/sycamore.md new file mode 100644 index 0000000000..9b3986dbf3 --- /dev/null +++ b/_tools/sycamore.md @@ -0,0 +1,48 @@ +--- +layout: default +title: Sycamore +nav_order: 210 +has_children: false +--- + +# Sycamore + +[Sycamore](https://github.com/aryn-ai/sycamore) is an open-source, AI-powered document processing engine designed to prepare unstructured data for retrieval-augmented generation (RAG) and semantic search using Python. Sycamore supports chunking and enriching a wide range of complex document types, including reports, presentations, transcripts, and manuals. Additionally, Sycamore can extract and process embedded elements, such as tables, figures, graphs, and other infographics. It can then load the data into target indexes, including vector and keyword indexes, using a connector like the [OpenSearch connector](https://sycamore.readthedocs.io/en/stable/sycamore/connectors/opensearch.html). 
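The following is a rough, hypothetical sketch of such a pipeline: it reads PDFs into a DocSet, partitions them into elements, creates embeddings, and loads the result into OpenSearch. The class names, parameters, and client arguments shown here (for example, `ArynPartitioner`, `SentenceTransformerEmbedder`, and `os_client_args`) are drawn from the Sycamore documentation and may differ between versions, so treat this as a starting point rather than a verified script:

```python
# Hypothetical end-to-end Sycamore sketch: PDFs -> elements -> embeddings -> OpenSearch.
# Assumes: pip install sycamore-ai[opensearch,local-inference]
import sycamore
from sycamore.transforms.partition import ArynPartitioner
from sycamore.transforms.embed import SentenceTransformerEmbedder

ctx = sycamore.init()

docset = (
    ctx.read.binary(paths=["./pdfs/"], binary_format="pdf")  # read raw PDFs into a DocSet
    .partition(partitioner=ArynPartitioner())                # split each document into structured elements
    .explode()                                               # promote elements to top-level documents (chunks)
    .embed(embedder=SentenceTransformerEmbedder(             # embed each chunk with a local model
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    ))
)

# Hypothetical connection settings -- adjust the host, port, auth, and index name for your cluster.
docset.write.opensearch(
    os_client_args={"hosts": [{"host": "localhost", "port": 9200}], "use_ssl": False},
    index_name="sycamore-demo",
)
```

Note that `ArynPartitioner` uses the Aryn Partitioning Service by default; the `local-inference` extra described in the installation section below allows partitioning and embedding to run locally instead.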
+ +To get started, visit the [Sycamore documentation](https://sycamore.readthedocs.io/en/stable/sycamore/get_started.html). + +## Sycamore ETL pipeline structure + +A Sycamore extract, transform, load (ETL) pipeline applies a series of transformations to a [DocSet](https://sycamore.readthedocs.io/en/stable/sycamore/get_started/concepts.html#docsets), which is a collection of documents and their constituent elements (for example, tables, blocks of text, or headers). At the end of the pipeline, the DocSet is loaded into OpenSearch vector and keyword indexes. + +A typical pipeline for preparing unstructured data for vector or hybrid search in OpenSearch consists of the following steps: + +* Read documents into a [DocSet](https://sycamore.readthedocs.io/en/stable/sycamore/get_started/concepts.html#docsets). +* [Partition documents](https://sycamore.readthedocs.io/en/stable/sycamore/transforms/partition.html) into structured JSON elements. +* Extract metadata and filter and clean data using [transforms](https://sycamore.readthedocs.io/en/stable/sycamore/APIs/docset.html). +* Create [chunks](https://sycamore.readthedocs.io/en/stable/sycamore/transforms/merge.html) from groups of elements. +* Embed the chunks using the model of your choice. +* [Load](https://sycamore.readthedocs.io/en/stable/sycamore/connectors/opensearch.html) the embeddings, metadata, and text into OpenSearch vector and keyword indexes. + +For an example pipeline that uses this workflow, see [this notebook](https://github.com/aryn-ai/sycamore/blob/main/notebooks/opensearch_docs_etl.ipynb). + + +## Install Sycamore + +We recommend installing the Sycamore library using `pip`. The connector for OpenSearch can be specified and installed using extras. For example: + +```bash +pip install sycamore-ai[opensearch] +``` +{% include copy.html %} + +By default, Sycamore works with the Aryn Partitioning Service to process PDFs. To run inference locally for partitioning or embedding, install Sycamore with the `local-inference` extra as follows: + +```bash +pip install sycamore-ai[opensearch,local-inference] +``` +{% include copy.html %} + +## Next steps + +For more information, visit the [Sycamore documentation](https://sycamore.readthedocs.io/en/stable/sycamore/get_started.html). diff --git a/_tuning-your-cluster/availability-and-recovery/workload-management.md b/_tuning-your-cluster/availability-and-recovery/workload-management.md deleted file mode 100644 index 1c6d9baf46..0000000000 --- a/_tuning-your-cluster/availability-and-recovery/workload-management.md +++ /dev/null @@ -1,60 +0,0 @@ ---- -layout: default -title: Workload management -nav_order: 60 -has_children: false -parent: Availability and recovery ---- - -# Workload management - -Workload management is a mechanism that allows administrators to organize queries into distinct groups, referred to as _query groups_. These query groups enable admins to limit the cumulative resource usage of each group, ensuring more balanced and fair resource distribution between them. This mechanism provides greater control over resource consumption so that no single group can monopolize cluster resources at the expense of others. - -## Query group - -A query group is a logical construct designed to manage search requests within defined virtual resource limits. The query group service tracks and aggregates resource usage at the node level for different groups, enforcing restrictions to ensure that no group exceeds its allocated resources. 
Depending on the configured containment mode, the system can limit or terminate tasks that surpass these predefined thresholds. - -Because the definition of a query group is stored in the cluster state, these resource limits are enforced consistently across all nodes in the cluster. - -### Schema - -Query groups use the following schema: - -```json -{ - "_id": "fafjafjkaf9ag8a9ga9g7ag0aagaga", - "resource_limits": { - "memory": 0.4, - "cpu": 0.2 - }, - "resiliency_mode": "enforced", - "name": "analytics", - "updated_at": 4513232415 -} -``` - -### Resource type - -Resource types represent the various system resources that are monitored and managed across different query groups. The following resource types are supported: - -- CPU usage -- JVM memory usage - -### Resiliency mode - -Resiliency mode determines how the assigned resource limits relate to the actual allowed resource usage. The following resiliency modes are supported: - -- **Soft mode** -- The query group can exceed the query group resource limits if the node is not under duress. -- **Enforced mode** -- The query group will never exceed the assigned limits and will be canceled as soon as the limits are exceeded. -- **Monitor mode** -- The query group will not cause any cancellations and will only log the eligible task cancellations. - -## Workload management settings - -Workload management settings allow you to define thresholds for rejecting or canceling tasks based on resource usage. Adjusting the following settings can help to maintain optimal performance and stability within your OpenSearch cluster. - -Setting | Default | Description -:--- | :--- | :--- -`wlm.query_group.node.memory_rejection_threshold` | `0.8` | The memory-based rejection threshold for query groups at the node level. Tasks that exceed this threshold will be rejected. The maximum allowed value is `0.9`. -`wlm.query_group.node.memory_cancellation_threshold` | `0.9` | The memory-based cancellation threshold for query groups at the node level. Tasks that exceed this threshold will be canceled. The maximum allowed value is `0.95`. -`wlm.query_group.node.cpu_rejection_threshold` | `0.8` | The CPU-based rejection threshold for query groups at the node level. Tasks that exceed this threshold will be rejected. The maximum allowed value is `0.9`. -`wlm.query_group.node.cpu_cancellation_threshold` | `0.9` | The CPU-based cancellation threshold for query groups at the node level. Tasks that exceed this threshold will be canceled. The maximum allowed value is `0.95`. diff --git a/assets/js/search.js b/assets/js/search.js index 8d9cab2ec5..4d4fce62f3 100644 --- a/assets/js/search.js +++ b/assets/js/search.js @@ -319,7 +319,7 @@ window.doResultsPageSearch = async (query, type, version) => { searchResultsContainer.appendChild(resultElement); const breakline = document.createElement('hr'); - breakline.style.border = '.5px solid #ccc'; + breakline.style.borderTop = '.5px solid #ccc'; breakline.style.margin = 'auto'; searchResultsContainer.appendChild(breakline); });