Merge branch 'main' into expr-doc-fix

opensearch-project · Oct 9, 2024 · 2d870c9
2 parents 8730369 + cd31d82

Showing 209 changed files with 7,936 additions and 1,139 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -2,7 +2,7 @@
 _Describe what this change achieves._
 
 ### Issues Resolved
-_List any issues this PR will resolve, e.g. Closes [...]._
+Closes #[_insert issue number_]
 
 ### Version
 _List the OpenSearch version to which this PR applies, e.g. 2.14, 2.12--2.14, or all._

diff --git a/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt b/.github/vale/styles/Vocab/OpenSearch/Words/accept.txt
@@ -77,9 +77,11 @@ Levenshtein
 [Mm]ultivalued
 [Mm]ultiword
 [Nn]amespace
-[Oo]versamples?
+[Oo]ffline
 [Oo]nboarding
+[Oo]versamples?
 pebibyte
+p\d{2}
 [Pp]erformant
 [Pp]laintext
 [Pp]luggable
@@ -101,8 +103,10 @@ pebibyte
 [Rr]eenable
 [Rr]eindex
 [Rr]eingest
+[Rr]eprovision(ed|ing)?
 [Rr]erank(er|ed|ing)?
 [Rr]epo
+[Rr]escor(e|ed|ing)?
 [Rr]ewriter
 [Rr]ollout
 [Rr]ollup
@@ -126,6 +130,7 @@ stdout
 [Ss]ubvector
 [Ss]ubwords?
 [Ss]uperset
+[Ss]uperadmins?
 [Ss]yslog
 tebibyte
 [Tt]emplated
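
The entries added to `accept.txt` above are regular expressions. As a rough sanity check of which spellings each new entry admits, the following Python sketch tests sample words with `re.fullmatch`. Whole-word matching and Python regex semantics are assumptions made for illustration; Vale applies its own matching rules, so this is not a description of Vale's behavior.

```python
import re

# Entries added to accept.txt in this change.
NEW_ENTRIES = [
    r"[Oo]ffline",
    r"p\d{2}",
    r"[Rr]eprovision(ed|ing)?",
    r"[Rr]escor(e|ed|ing)?",
    r"[Ss]uperadmins?",
]


def accepted(word: str) -> bool:
    """Return True if any entry matches the whole word (assumed semantics)."""
    return any(re.fullmatch(pattern, word) for pattern in NEW_ENTRIES)


if __name__ == "__main__":
    for word in ["p99", "Rescoring", "superadmins", "rescor", "p9"]:
        print(word, accepted(word))
```

Because the suffix group in `[Rr]escor(e|ed|ing)?` is optional, the bare stem `rescor` is admitted under this reading as well.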

diff --git a/.gitignore b/.gitignore
@@ -7,3 +7,4 @@ Gemfile.lock
 *.iml
 .jekyll-cache
 .project
+vendor/bundle
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -78,6 +78,8 @@ Follow these steps to set up your local copy of the repository:
 
 1. Navigate to your cloned repository.
 
+##### Building using locally installed packages 
+
 1. Install [Ruby](https://www.ruby-lang.org/en/) if you don't already have it. We recommend [RVM](https://rvm.io/), but you can use any method you prefer:
 
    ```
@@ -98,6 +100,14 @@ Follow these steps to set up your local copy of the repository:
    bundle install
    ```
 
+##### Building using containerization
+
+If you have Docker Compose installed, run the following command:
+
+   ```
+   docker compose -f docker-compose.dev.yml up
+   ```
+
 #### Troubleshooting
 
 Try the following troubleshooting steps if you encounter an error when trying to build the documentation website:  

diff --git a/_about/index.md b/_about/index.md
@@ -22,16 +22,21 @@ This section contains documentation for OpenSearch and OpenSearch Dashboards.
 
 ## Getting started
 
-- [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/intro/)
-- [Quickstart]({{site.url}}{{site.baseurl}}/quickstart/)
+To get started, explore the following documentation:
+
+- [Getting started guide]({{site.url}}{{site.baseurl}}/getting-started/): 
+  - [Intro to OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/intro/)
+  - [Installation quickstart]({{site.url}}{{site.baseurl}}/getting-started/quickstart/)
+  - [Communicate with OpenSearch]({{site.url}}{{site.baseurl}}/getting-started/communicate/)
+  - [Ingest data]({{site.url}}{{site.baseurl}}/getting-started/ingest-data/)
+  - [Search data]({{site.url}}{{site.baseurl}}/getting-started/search-data/)
+  - [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/)
 - [Install OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/)
 - [Install OpenSearch Dashboards]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/index/)
-- [See the FAQ](https://opensearch.org/faq)
+- [FAQ](https://opensearch.org/faq)
 
 ## Why use OpenSearch?
 
-With OpenSearch, you can perform the following use cases:
-
 <table style="table-layout: auto ; width: 100%;">
 <tbody>
 <tr style="text-align: center; vertical-align:center;">
@@ -41,35 +46,38 @@ With OpenSearch, you can perform the following use cases:
 <td><img src="{{site.url}}{{site.baseurl}}/images/4_tracking.png" class="no-border" alt="Operational health tracking" height="100"/></td>
 </tr>
 <tr style="text-align: left; vertical-align:top; font-weight: bold; color: rgb(0,59,92)">
-<td>Fast, Scalable Full-text Search</td>
-<td>Application and Infrastructure Monitoring</td>
-<td>Security and Event Information Management</td>
-<td>Operational Health Tracking</td>
+<td>Fast, scalable full-text search</td>
+<td>Application and infrastructure monitoring</td>
+<td>Security and event information management</td>
+<td>Operational health tracking</td>
 </tr>
 <tr style="text-align: left; vertical-align:top;">
 <td>Help users find the right information within your application, website, or data lake catalog. </td>
-<td>Easily store and analyze log data, and set automated alerts for underperformance.</td>
+<td>Easily store and analyze log data, and set automated alerts for performance issues.</td>
 <td>Centralize logs to enable real-time security monitoring and forensic analysis.</td>
-<td>Use observability logs, metrics, and traces to monitor your applications and business in real time.</td>
+<td>Use observability logs, metrics, and traces to monitor your applications in real time.</td>
 </tr>
 </tbody>
 </table>
 
-**Additional features and plugins:**
+## Key features
+
+OpenSearch provides several features to help index, secure, monitor, and analyze your data:
 
-OpenSearch has several features and plugins to help index, secure, monitor, and analyze your data. Most OpenSearch plugins have corresponding OpenSearch Dashboards plugins that provide a convenient, unified user interface.
-- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) - Identify atypical data and receive automatic notifications
-- [KNN]({{site.url}}{{site.baseurl}}/search-plugins/knn/) - Find “nearest neighbors” in your vector data
-- [Performance Analyzer]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) - Monitor and optimize your cluster
-- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) - Use SQL or a piped processing language to query your data
-- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) - Automate index operations
-- [ML Commons plugin]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) - Train and execute machine-learning models
-- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) - Run search requests in the background
-- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) - Replicate your data across multiple OpenSearch clusters
+- [Anomaly detection]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/) -- Identify atypical data and receive automatic notifications.
+- [SQL]({{site.url}}{{site.baseurl}}/search-plugins/sql/index/) -- Use SQL or Piped Processing Language (PPL) to query your data.
+- [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/) -- Automate index operations.
+- [Search methods]({{site.url}}{{site.baseurl}}/search-plugins/knn/) -- From traditional lexical search to advanced vector and hybrid search, discover the optimal search method for your use case.
+- [Machine learning]({{site.url}}{{site.baseurl}}/ml-commons-plugin/index/) -- Integrate machine learning models into your workloads.
+- [Workflow automation]({{site.url}}{{site.baseurl}}/automating-configurations/index/) -- Automate complex OpenSearch setup and preprocessing tasks.
+- [Performance evaluation]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/) -- Monitor and optimize your cluster.
+- [Asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/) -- Run search requests in the background.
+- [Cross-cluster replication]({{site.url}}{{site.baseurl}}/replication-plugin/index/) -- Replicate your data across multiple OpenSearch clusters.
 
 
 ## The secure path forward
-OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords.
+
+OpenSearch includes a demo configuration so that you can get up and running quickly, but before using OpenSearch in a production environment, you must [configure the Security plugin manually]({{site.url}}{{site.baseurl}}/security/configuration/index/) with your own certificates, authentication method, users, and passwords. To get started, see [Getting started with OpenSearch security]({{site.url}}{{site.baseurl}}/getting-started/security/).
 
 ## Looking for the Javadoc?
 

diff --git a/_about/version-history.md b/_about/version-history.md
@@ -9,6 +9,8 @@ permalink: /version-history/
 
 OpenSearch version | Release highlights | Release date  
 :--- | :--- | :--- 
+[2.17.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.1.md) | Includes bug fixes for ML Commons, anomaly detection, k-NN, and security analytics. Adds various infrastructure and maintenance updates. For a full list of release highlights, see the Release Notes. | 1 October 2024
+[2.17.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.17.0.md) | Includes disk-optimized vector search, binary quantization, and byte vector encoding in k-NN. Adds asynchronous batch ingestion for ML tasks. Provides search and query performance enhancements and a new custom trace source in trace analytics. Includes application-based configuration templates. For a full list of release highlights, see the Release Notes. | 17 September 2024
 [2.16.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.16.0.md) | Includes built-in byte vector quantization and binary vector support in k-NN. Adds new sort, split, and ML inference search processors for search pipelines. Provides application-based configuration templates and additional plugins to integrate multiple data sources in OpenSearch Dashboards. Includes an experimental Batch Predict ML Commons API. For a full list of release highlights, see the Release Notes. | 06 August 2024
 [2.15.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.15.0.md) | Includes parallel ingestion processing, SIMD support for exact search, and the ability to disable doc values for the k-NN field. Adds wildcard and derived field types. Improves performance for single-cardinality aggregations, rolling upgrades to remote-backed clusters, and more metrics for top N queries. For a full list of release highlights, see the Release Notes. | 25 June 2024
 [2.14.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.14.0.md) | Includes performance improvements to hybrid search and date histogram queries with multi-range traversal, ML model integration within the Ingest API, semantic cache for LangChain applications, low-level vector query interface for neural sparse queries, and improved k-NN search filtering. Provides an experimental tiered cache feature. For a full list of release highlights, see the Release Notes. | 14 May 2024
@@ -31,6 +33,7 @@ OpenSearch version | Release highlights | Release date
 [2.0.1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.1.md) | Includes bug fixes and maintenance updates for Alerting and Anomaly Detection. | 16 June 2022
 [2.0.0](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0.md) | Includes document-level monitors for alerting, OpenSearch Notifications plugins, and Geo Map Tiles in OpenSearch Dashboards. Also adds support for Lucene 9 and bug fixes for all OpenSearch plugins. For a full list of release highlights, see the Release Notes. | 26 May 2022
 [2.0.0-rc1](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-2.0.0-rc1.md) | The Release Candidate for 2.0.0. This version allows you to preview the upcoming 2.0.0 release before the GA release. The preview release adds document-level alerting, support for Lucene 9, and the ability to use term lookup queries in document level security. | 03 May 2022
+[1.3.19](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.19.md) | Includes bug fixes and maintenance updates for OpenSearch security, OpenSearch Dashboards security, and anomaly detection. | 27 August 2024
 [1.3.18](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.18.md) | Includes maintenance updates for OpenSearch security. | 16 July 2024
 [1.3.17](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.17.md) | Includes maintenance updates for OpenSearch security and OpenSearch Dashboards security. | 06 June 2024
 [1.3.16](https://github.com/opensearch-project/opensearch-build/blob/main/release-notes/opensearch-release-notes-1.3.16.md) | Includes bug fixes and maintenance updates for OpenSearch security, index management, performance analyzer, and reporting. | 23 April 2024

diff --git a/_analyzers/index-analyzers.md b/_analyzers/index-analyzers.md
@@ -2,6 +2,7 @@
 layout: default
 title: Index analyzers
 nav_order: 20
+parent: Analyzers
 ---
 
 # Index analyzers

diff --git a/_analyzers/index.md b/_analyzers/index.md
@@ -45,20 +45,9 @@ An analyzer must contain exactly one tokenizer and may contain zero or more char
 
 There is also a special type of analyzer called a ***normalizer***. A normalizer is similar to an analyzer except that it does not contain a tokenizer and can only include specific types of character filters and token filters. These filters can perform only character-level operations, such as character or pattern replacement, and cannot perform operations on the token as a whole. This means that replacing a token with a synonym or stemming is not supported. See [Normalizers]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) for further details.
 
-## Built-in analyzers
+## Supported analyzers
 
-The following table lists the built-in analyzers that OpenSearch provides. The last column of the table contains the result of applying the analyzer to the string `It’s fun to contribute a brand-new PR or 2 to OpenSearch!`.
-
-Analyzer | Analysis performed | Analyzer output 
-:--- | :--- | :---
-**Standard** (default) | - Parses strings into tokens at word boundaries <br> - Removes most punctuation <br> - Converts tokens to lowercase | [`it’s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-**Simple** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Converts tokens to lowercase  | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `to`, `opensearch`]
-**Whitespace** | - Parses strings into tokens on white space | [`It’s`, `fun`, `to`, `contribute`, `a`,`brand-new`, `PR`, `or`, `2`, `to`, `OpenSearch!`]
-**Stop** | - Parses strings into tokens on any non-letter character <br> - Removes non-letter characters <br> - Removes stop words <br> - Converts tokens to lowercase | [`s`, `fun`, `contribute`, `brand`, `new`, `pr`, `opensearch`]
-**Keyword** (no-op) | - Outputs the entire string unchanged | [`It’s fun to contribute a brand-new PR or 2 to OpenSearch!`]
-**Pattern** | - Parses strings into tokens using regular expressions <br> - Supports converting strings to lowercase <br> - Supports removing stop words | [`it`, `s`, `fun`, `to`, `contribute`, `a`,`brand`, `new`, `pr`, `or`, `2`, `to`, `opensearch`]
-[**Language**]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/) | Performs analysis specific to a certain language (for example, `english`). | [`fun`, `contribut`, `brand`, `new`, `pr`, `2`, `opensearch`]
-**Fingerprint** | - Parses strings on any non-letter character <br> - Normalizes characters by converting them to ASCII <br> - Converts tokens to lowercase <br> - Sorts, deduplicates, and concatenates tokens into a single token <br> - Supports removing stop words | [`2 a brand contribute fun it's new opensearch or pr to`] <br> Note that the apostrophe was converted to its ASCII counterpart.
+For a list of supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
 
 ## Custom analyzers
 
@@ -170,6 +159,29 @@ The response provides information about the analyzers for each field:
 }
 ```
 
+## Normalizers
+Tokenization divides text into individual terms, but it does not address variations in token forms. Normalization resolves these issues by converting tokens into a standard format. This ensures that similar terms are matched appropriately, even if they are not identical.
+
+### Normalization techniques
+
+The following normalization techniques can help address variations in token forms:
+1. **Case normalization**: Converts all tokens to lowercase to ensure case-insensitive matching. For example, "Hello" is normalized to "hello".
+
+2. **Stemming**: Reduces words to their root form. For instance, "cars" is stemmed to "car", and "running" is stemmed to "run".
+
+3. **Synonym handling**: Treats synonyms as equivalent. For example, "jogging" and "running" can be indexed under a common term, such as "run".
+
+### Normalization examples
+
+A search for `Hello` will match documents containing `hello` because of case normalization.
+
+A search for `cars` will also match documents containing `car` because of stemming.
+
+A query for `running` can retrieve documents containing `jogging` using synonym handling.
+
+Normalization ensures that searches are not limited to exact term matches, allowing for more relevant results. For instance, a search for `Cars running` can be normalized to match `car run`.
+
 ## Next steps
 
-- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
+- Learn more about specifying [index analyzers]({{site.url}}{{site.baseurl}}/analyzers/index-analyzers/) and [search analyzers]({{site.url}}{{site.baseurl}}/analyzers/search-analyzers/).
+- See the list of [supported analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
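
The three normalization techniques added to `_analyzers/index.md` above (case normalization, stemming, and synonym handling) can be sketched as a toy pipeline. The suffix rules and the synonym map below are illustrative assumptions that cover only the examples in the text; they are not OpenSearch's stemmer or synonym filter.

```python
# Toy normalization pipeline. The suffix rules and synonym map are
# illustrative assumptions, not OpenSearch's stemmer or synonym filter.
SYNONYMS = {"jog": "run"}


def stem(token: str) -> str:
    """Naive suffix stripping, sufficient only for the examples in the text."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        # Collapse a doubled final consonant: "runn" -> "run", "jogg" -> "jog".
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]
    elif token.endswith("s") and len(token) > 3:
        token = token[:-1]
    return token


def normalize(text: str) -> list[str]:
    tokens = text.lower().split()               # case normalization
    stems = [stem(t) for t in tokens]           # stemming
    return [SYNONYMS.get(s, s) for s in stems]  # synonym handling


if __name__ == "__main__":
    for query in ["Hello", "cars", "jogging", "Cars running"]:
        print(query, "->", normalize(query))
```

Applying the same function to both the indexed text and the query is what lets `Cars running` match `car run` in this sketch.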
diff --git a/_analyzers/language-analyzers.md b/_analyzers/language-analyzers.md
@@ -1,14 +1,15 @@
 ---
 layout: default
 title: Language analyzers
-nav_order: 10
+nav_order: 100
+parent: Analyzers
 redirect_from:
   - /query-dsl/analyzers/language-analyzers/
 ---
 
-# Language analyzer
+# Language analyzers
 
-OpenSearch supports the following language values with the `analyzer` option:
+OpenSearch supports the following language analyzers:
 `arabic`, `armenian`, `basque`, `bengali`, `brazilian`, `bulgarian`, `catalan`, `czech`, `danish`, `dutch`, `english`, `estonian`, `finnish`, `french`, `galician`, `german`, `greek`, `hindi`, `hungarian`, `indonesian`, `irish`, `italian`, `latvian`, `lithuanian`, `norwegian`, `persian`, `portuguese`, `romanian`, `russian`, `sorani`, `spanish`, `swedish`, `turkish`, and `thai`.
 
 To use the analyzer when you map an index, specify the value within your query. For example, to map your index with the French language analyzer, specify the `french` value for the analyzer field:
@@ -40,4 +41,4 @@ PUT my-index
 }
 ```
 
-<!-- TO do: each of the options needs its own section with an example. Convert table to individual sections, and then give a streamlined list with valid values. -->
+<!-- TO do: each of the options needs its own section with an example. Convert table to individual sections, and then give a streamlined list with valid values. -->
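
As a minimal sketch of the mapping that `_analyzers/language-analyzers.md` describes, the following Python snippet builds a request body that assigns a language analyzer to a text field and rejects names that are not in the list above. The field name `text_entry` and the helper `language_mapping` are hypothetical; the resulting body is the kind of JSON that would be sent with a `PUT my-index` request.

```python
import json

# Language analyzers listed in the changed file.
LANGUAGE_ANALYZERS = {
    "arabic", "armenian", "basque", "bengali", "brazilian", "bulgarian",
    "catalan", "czech", "danish", "dutch", "english", "estonian", "finnish",
    "french", "galician", "german", "greek", "hindi", "hungarian",
    "indonesian", "irish", "italian", "latvian", "lithuanian", "norwegian",
    "persian", "portuguese", "romanian", "russian", "sorani", "spanish",
    "swedish", "turkish", "thai",
}


def language_mapping(field: str, language: str) -> dict:
    """Build an index-creation body that maps `field` to a language analyzer."""
    if language not in LANGUAGE_ANALYZERS:
        raise ValueError(f"unsupported language analyzer: {language}")
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": language}
            }
        }
    }


if __name__ == "__main__":
    # "text_entry" is a hypothetical field name used for illustration.
    print(json.dumps(language_mapping("text_entry", "french"), indent=2))
```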
diff --git a/_analyzers/search-analyzers.md b/_analyzers/search-analyzers.md
@@ -2,6 +2,7 @@
 layout: default
 title: Search analyzers
 nav_order: 30
+parent: Analyzers
 ---
 
 # Search analyzers
@@ -42,7 +43,7 @@ GET shakespeare/_search
 ```
 {% include copy-curl.html %}
 
-Valid values for [built-in analyzers]({{site.url}}{{site.baseurl}}/analyzers/index#built-in-analyzers) are `standard`, `simple`, `whitespace`, `stop`, `keyword`, `pattern`, `fingerprint`, or any supported [language analyzer]({{site.url}}{{site.baseurl}}/analyzers/language-analyzers/).
+For more information about supported analyzers, see [Analyzers]({{site.url}}{{site.baseurl}}/analyzers/supported-analyzers/index/).
 
 ## Specifying a search analyzer for a field