From 942c7d725e1e17bcfc72db709e487757c4f2dfa5 Mon Sep 17 00:00:00 2001 From: Kaituo Li Date: Mon, 9 Sep 2024 14:13:19 -0700 Subject: [PATCH 01/36] Add documentation for rule-based anomaly detection and imputation This PR introduces new documentation for rule-based anomaly detection (AD) and imputation options, providing detailed guidance on configuring these features. It also updates the maximum shingle size information and enhances the documentation for window delay settings. Testing done: - Successfully ran Jekyll build and reviewed the updated documentation to ensure all changes are correctly displayed. Signed-off-by: Kaituo Li --- _observing-your-data/ad/index.md | 41 ++++++++++++- _observing-your-data/ad/result-mapping.md | 75 +++++++++++++++++++++++ 2 files changed, 115 insertions(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 5dfa1b8f1a..5dbd23e3ac 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -76,6 +76,8 @@ The query is designed to retrieve documents in which the `urlPath.keyword` field - (Optional) To add extra processing time for data collection, specify a **Window delay** value. - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay. - For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time. + - To avoid missing any data, set the **Window delay** to the upper bound of the expected ingestion delay. This ensures the detector accounts for all data during its interval, reducing the chances of missing relevant information. While setting a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will always be looking further back in time. Strike a balance to maintain both data accuracy and timely detection. + 1. Specify custom results index. - The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. To enable this, select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. This alias points to an actual index with a name containing the date and a sequence number, like `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored. @@ -164,7 +166,44 @@ This formula serves as a starting point. Make sure to test it with a representat Set the number of aggregation intervals from your data stream to consider in a detection window. It’s best to choose this value based on your actual data to see which one leads to the best results for your use case. -The anomaly detector expects the shingle size to be in the range of 1 and 60. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. 
Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal. +The anomaly detector expects the shingle size to be in the range of 1 and 128. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal. + +#### (Advanced settings) Set an imputation option + +The Imputation option allows you to address missing data in your streams. You can choose from the following methods to handle gaps: + +- **Ignore Missing Data (Default):** The system continues without factoring in missing data points, maintaining the existing data flow. +- **Fill with Custom Values:** Specify a custom value for each feature to replace missing data points, allowing for targeted imputation tailored to your data. +- **Fill with Zeros:** Replace missing values with zeros, ideal when the absence of data itself indicates a significant event, such as a drop to zero in event counts. +- **Use Previous Values:** Fill gaps with the last observed value, maintaining continuity in your time series data. This method treats missing data as non-anomalous, carrying forward the previous trend. + +Using these options can improve recall in anomaly detection. For instance, if you're monitoring for drops in event counts, including both partial and complete drops, filling missing values with zeros helps detect significant data absences, improving detection recall. + +Note: Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Remember, quality input is critical—poor data quality will lead to poor model performance. You can determine whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. For more information, see [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping). + +#### (Advanced settings) Suppressing Anomalies with Threshold-Based Rules + +You can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations. + +Suppose you want to detect substantial changes in log volume while ignoring small variations that aren't meaningful. Without customized settings, the system might generate false alerts for minor changes, making it difficult to identify true anomalies. By setting suppression rules, you can filter out minor deviations and hone in on genuinely anomalous patterns. + +If you want to suppress anomalies for deviations smaller than 30% from the expected value, you can set the following rules: + +``` +Ignore anomalies for feature logVolume when the actual value is no more than 30% above the expected value. +Ignore anomalies for feature logVolume when the actual value is no more than 30% below the expected value. +``` + +Note: Ensure that a feature (e.g., logVolume) is properly defined in your model, as suppression rules are tied to specific features. 
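+
+For example, if the expected value of `logVolume` for an interval is 1,000, the rules above suppress anomalies for actual values between 700 and 1,300, that is, within 30% of the expected value. An actual value of 1,400, which is 40% above the expected value, is still reported as an anomaly. The numbers in this example are illustrative only.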
+ +If you expect that the log volume should differ by at least 10,000 from the expected value before being considered an anomaly, you can set absolute thresholds: + +``` +Ignore anomalies for feature logVolume when the actual value is no more than 10000 above the expected value. +Ignore anomalies for feature logVolume when the actual value is no more than 10000 below the expected value. +``` + +If no custom suppression rules are set, the system defaults to a filter that ignores anomalies with deviations of less than 20% from the expected value for each enabled feature. #### Preview sample anomalies diff --git a/_observing-your-data/ad/result-mapping.md b/_observing-your-data/ad/result-mapping.md index 7e1482a013..0ee8d02a97 100644 --- a/_observing-your-data/ad/result-mapping.md +++ b/_observing-your-data/ad/result-mapping.md @@ -80,6 +80,81 @@ Field | Description `model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity. `threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold. +When the imputation option is enabled, the anomaly result output will include a `feature_imputed` array, indicating whether each feature has been imputed. This information helps you understand which features were modified during the anomaly detection process due to missing data. If no features were imputed, the feature_imputed array will be omitted from the results. + +In the following example, the feature processing_bytes_max was imputed, as indicated by the `imputed: true` status: + +```json +{ + "detector_id": "kzcZ43wBgEQAbjDnhzGF", + "schema_version": 5, + "data_start_time": 1635898161367, + "data_end_time": 1635898221367, + "feature_data": [ + { + "feature_id": "processing_bytes_max", + "feature_name": "processing bytes max", + "data": 2322 + }, + { + "feature_id": "processing_bytes_avg", + "feature_name": "processing bytes avg", + "data": 1718.6666666666667 + }, + { + "feature_id": "processing_bytes_min", + "feature_name": "processing bytes min", + "data": 1375 + }, + { + "feature_id": "processing_bytes_sum", + "feature_name": "processing bytes sum", + "data": 5156 + }, + { + "feature_id": "processing_time_max", + "feature_name": "processing time max", + "data": 31198 + } + ], + "execution_start_time": 1635898231577, + "execution_end_time": 1635898231622, + "anomaly_score": 1.8124904404395776, + "anomaly_grade": 0, + "confidence": 0.9802940756605277, + "entity": [ + { + "name": "process_name", + "value": "process_3" + } + ], + "model_id": "kzcZ43wBgEQAbjDnhzGF_entity_process_3", + "threshold": 1.2368549346675202, + "feature_imputed": [ + { + "feature_id": "processing_bytes_max", + "imputed": true + }, + { + "feature_id": "processing_bytes_avg", + "imputed": false + }, + { + "feature_id": "processing_bytes_min", + "imputed": false + }, + { + "feature_id": "processing_bytes_sum", + "imputed": false + }, + { + "feature_id": "processing_time_max", + "imputed": false + } + ] +} +``` + If an anomaly detector detects an anomaly, the result has the following format: ```json From b2af679adb041505fbb57ef15867a02fb4ad8af1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Tue, 10 Sep 2024 16:39:30 -0600 Subject: [PATCH 02/36] Doc review Signed-off-by: Melissa Vagi --- 
_observing-your-data/ad/index.md | 84 +++++++++++++---------- _observing-your-data/ad/result-mapping.md | 8 +-- 2 files changed, 52 insertions(+), 40 deletions(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 5dbd23e3ac..6c2e4edc35 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -10,30 +10,36 @@ redirect_from: # Anomaly detection -An anomaly in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you uncover early signs of a system failure. +An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you identify early signs of a system failure. -It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and isn't adaptive to data that exhibits organic growth or seasonal behavior. +It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior. Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9). You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected. -To get started, choose **Anomaly Detection** in OpenSearch Dashboards. -To first test with sample streaming data, you can try out one of the preconfigured detectors with one of the sample datasets. +## Using OpenSearch Dashboards anomaly detection + +To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Anomaly Detection**. OpenSearch Dashboards contains sample datasets. You can use these datasets with their preconfigured detectors to try out the feature. + +The following tutorial guides you through using anomaly detection with your OpenSearch data. ## Step 1: Define a detector -A detector is an individual anomaly detection task. You can define multiple detectors, and all the detectors can run simultaneously, with each analyzing data from different sources. +A _detector_ is an individual anomaly detection task. You can define multiple detectors. All the detectors can run simultaneously, with each analyzing data from different sources. 1. Choose **Create detector**. -1. Add in the detector details. - - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the purpose of the detector. -1. 
Specify the data source. +1. Add the detector details. + - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the detector's purpose. +1. Specify the data source. - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes. - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL). -#### Example filter using query DSL -The query is designed to retrieve documents in which the `urlPath.keyword` field matches one of the following specified values: +--- + +#### Example: Filter using query DSL + +The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values: - /domain/{id}/short - /sub_dir/{id}/short @@ -62,9 +68,12 @@ The query is designed to retrieve documents in which the `urlPath.keyword` field } } ``` + {% include copy-curl.html %} -1. Specify a timestamp. - - Select the **Timestamp field** in your index. +--- + +1. Specify a timestamp. + - Select the **Timestamp field** in the index. 1. Define operation settings. - For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data. - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model. @@ -76,7 +85,7 @@ The query is designed to retrieve documents in which the `urlPath.keyword` field - (Optional) To add extra processing time for data collection, specify a **Window delay** value. - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay. - For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time. - - To avoid missing any data, set the **Window delay** to the upper bound of the expected ingestion delay. This ensures the detector accounts for all data during its interval, reducing the chances of missing relevant information. While setting a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will always be looking further back in time. Strike a balance to maintain both data accuracy and timely detection. + - To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will look further back in time. 
Find a balance to maintain both data accuracy and timely detection. 1. Specify custom results index. - The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. To enable this, select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. This alias points to an actual index with a name containing the date and a sequence number, like `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored. @@ -111,31 +120,32 @@ After you define the detector, the next step is to configure the model. ## Step 2: Configure the model -#### Add features to your detector +1. Add features to your detector. -A feature is the field in your index that you want to check for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly. +A _feature_ is the field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly. For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting. -{: .note } +{: .note} To configure an anomaly detection model based on an aggregation method, follow these steps: -1. On the **Configure Model** page, enter the **Feature name** and check **Enable feature**. +1. On the **Configure model** page, enter the **Feature name** and select the **Enable feature** checkbox. 1. For **Find anomalies based on**, select **Field Value**. 1. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. 1. For **Field**, select from the available options. To configure an anomaly detection model based on a JSON aggregation query, follow these steps: -1. On the **Configure Model** page, enter the **Feature name** and check **Enable feature**. -1. For **Find anomalies based on**, select **Custom expression**. You will see the JSON editor window open up. + +1. On the **Configure Model** page, enter the **Feature name** and select the **Enable feature** checkbox. +1. For **Find anomalies based on**, select **Custom expression**. The JSON editor window will open. 1. Enter your JSON aggregation query in the editor. 
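+
+For example, a custom expression that tracks the maximum of a hypothetical numeric field named `bytes` might look like the following sketch. The aggregation name and field name are illustrative; replace them with values from your own index:
+
+```json
+{
+  "max_bytes": {
+    "max": {
+      "field": "bytes"
+    }
+  }
+}
+```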
For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) -{: .note } +{: .note} -#### (Optional) Set category fields for high cardinality +### (Optional) Set category fields for high cardinality You can categorize anomalies based on a keyword or IP field type. @@ -162,39 +172,41 @@ If the actual total number of unique entities is higher than the number that you This formula serves as a starting point. Make sure to test it with a representative workload. You can find more information in the [Improving Anomaly Detection: One million entities in one minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/) blog post. {: .note } -#### (Advanced settings) Set a shingle size +### (Advanced settings) Set a shingle size Set the number of aggregation intervals from your data stream to consider in a detection window. It’s best to choose this value based on your actual data to see which one leads to the best results for your use case. -The anomaly detector expects the shingle size to be in the range of 1 and 128. The default shingle size is 8. We recommend that you don't choose 1 unless you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also false positives. Larger values might be useful for ignoring noise in a signal. +The anomaly detector expects the shingle size to be in the range of 1 and 128. The default shingle size is `8`. Choose `1` only if you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also increase false positives. Larger values might be useful for ignoring noise in a signal. -#### (Advanced settings) Set an imputation option +### (Advanced settings) Set an imputation option -The Imputation option allows you to address missing data in your streams. You can choose from the following methods to handle gaps: +The imputation option allows you to address missing data in your streams. You can choose from the following methods to handle gaps: -- **Ignore Missing Data (Default):** The system continues without factoring in missing data points, maintaining the existing data flow. +- **Ignore Missing Data (Default):** The system continues without considering missing data points, keeping the existing data flow. - **Fill with Custom Values:** Specify a custom value for each feature to replace missing data points, allowing for targeted imputation tailored to your data. -- **Fill with Zeros:** Replace missing values with zeros, ideal when the absence of data itself indicates a significant event, such as a drop to zero in event counts. -- **Use Previous Values:** Fill gaps with the last observed value, maintaining continuity in your time series data. This method treats missing data as non-anomalous, carrying forward the previous trend. +- **Fill with Zeros:** Replace missing values with zeros. This is ideal when the absence of data indicates a significant event, such as a drop to zero in event counts. +- **Use Previous Values:** Fill gaps with the last observed value to maintain continuity in your time-series data. This method treats missing data as non-anomalous, carrying forward the previous trend. -Using these options can improve recall in anomaly detection. For instance, if you're monitoring for drops in event counts, including both partial and complete drops, filling missing values with zeros helps detect significant data absences, improving detection recall. 
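+
+As a hypothetical illustration, consider a count feature with one missing interval. The values below are invented to show how the two fill methods differ:
+
+```
+Observed values:      120, (missing), 115
+Fill with Zeros:      120, 0, 115
+Use Previous Values:  120, 120, 115
+```
+
+With **Fill with Zeros**, the gap becomes a drop to zero that the detector can flag; with **Use Previous Values**, the gap is treated as a continuation of the prior trend and is not flagged.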
+Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, filling missing values with zeros helps detect significant data absences, improving detection recall. -Note: Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Remember, quality input is critical—poor data quality will lead to poor model performance. You can determine whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. For more information, see [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping). +Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping) for more information. +{: note} -#### (Advanced settings) Suppressing Anomalies with Threshold-Based Rules +### (Advanced settings) Suppressing anomalies with threshold-based rules You can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations. -Suppose you want to detect substantial changes in log volume while ignoring small variations that aren't meaningful. Without customized settings, the system might generate false alerts for minor changes, making it difficult to identify true anomalies. By setting suppression rules, you can filter out minor deviations and hone in on genuinely anomalous patterns. +Suppose you want to detect substantial changes in log volume while ignoring small variations that are not meaningful. Without customized settings, the system might generate false alerts for minor changes, making it difficult to identify true anomalies. By setting suppression rules, you can ignore minor deviations and focus on real anomalous patterns. -If you want to suppress anomalies for deviations smaller than 30% from the expected value, you can set the following rules: +To suppress anomalies for deviations smaller than 30% from the expected value, you can set the following rules: ``` Ignore anomalies for feature logVolume when the actual value is no more than 30% above the expected value. Ignore anomalies for feature logVolume when the actual value is no more than 30% below the expected value. ``` -Note: Ensure that a feature (e.g., logVolume) is properly defined in your model, as suppression rules are tied to specific features. +Ensure that a feature, for example, `logVolume`, is properly defined in your model. Suppression rules are tied to specific features. +{: .note} If you expect that the log volume should differ by at least 10,000 from the expected value before being considered an anomaly, you can set absolute thresholds: @@ -203,9 +215,9 @@ Ignore anomalies for feature logVolume when the actual value is no more than 100 Ignore anomalies for feature logVolume when the actual value is no more than 10000 below the expected value. 
``` -If no custom suppression rules are set, the system defaults to a filter that ignores anomalies with deviations of less than 20% from the expected value for each enabled feature. +If no custom suppression rules are set, then the system defaults to a filter that ignores anomalies with deviations of less than 20% from the expected value for each enabled feature. -#### Preview sample anomalies +### Preview sample anomalies Preview sample anomalies and adjust the feature settings if needed. For sample previews, the Anomaly Detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results. diff --git a/_observing-your-data/ad/result-mapping.md b/_observing-your-data/ad/result-mapping.md index 0ee8d02a97..24e7711620 100644 --- a/_observing-your-data/ad/result-mapping.md +++ b/_observing-your-data/ad/result-mapping.md @@ -9,9 +9,9 @@ redirect_from: # Anomaly result mapping -If you enabled custom result index, the anomaly detection plugin stores the results in your own index. +If you enabled custom result index, the Anomaly Detection plugin stores the results in your own index. -If the anomaly detector doesn’t detect an anomaly, the result has the following format: +If the anomaly detector does not detect an anomaly, the result has the following format: ```json { @@ -80,9 +80,9 @@ Field | Description `model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity. `threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold. -When the imputation option is enabled, the anomaly result output will include a `feature_imputed` array, indicating whether each feature has been imputed. This information helps you understand which features were modified during the anomaly detection process due to missing data. If no features were imputed, the feature_imputed array will be omitted from the results. +When the imputation option is enabled, the anomaly result output includes a `feature_imputed` array, showing which features have been imputed. This information helps you identify which features were modified during the anomaly detection process due to missing data. If no features were imputed, then the `feature_imputed` array is excluded from the results. 
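+
+For example, the following query sketch returns results in which at least one feature was imputed. It assumes a hypothetical custom result index alias named `opensearch-ad-plugin-result-abc` and a default object mapping for `feature_imputed`; if the field is mapped as `nested` in your index, wrap the clause in a `nested` query instead:
+
+```json
+GET opensearch-ad-plugin-result-abc/_search
+{
+  "query": {
+    "term": {
+      "feature_imputed.imputed": true
+    }
+  }
+}
+```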
-In the following example, the feature processing_bytes_max was imputed, as indicated by the `imputed: true` status: +In this example, the feature `processing_bytes_max` was imputed, as indicated by the `imputed: true` status: ```json { From e2c656eb7899e68e32cc7a0ec3223088306d20c7 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 11:23:45 -0600 Subject: [PATCH 03/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 6c2e4edc35..5e75c9c03f 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -30,7 +30,7 @@ A _detector_ is an individual anomaly detection task. You can define multiple de 1. Choose **Create detector**. 1. Add the detector details. - - Enter a name and brief description. Make sure the name is unique and descriptive enough to help you to identify the detector's purpose. + - Enter a name that describes the detector's intended use. 1. Specify the data source. - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes. - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL). From fe79e71fdb69a3370d8fad29b91d3bd728c00ccf Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 13:30:23 -0600 Subject: [PATCH 04/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 5e75c9c03f..78e604e324 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -12,7 +12,7 @@ redirect_from: An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you identify early signs of a system failure. -It can be challenging to discover anomalies using conventional methods such as creating visualizations and dashboards. You could configure an alert based on a static threshold, but this requires prior domain knowledge and is not adaptive to data that exhibits organic growth or seasonal behavior. +Conventional techniques like visualizations and dashboards can make it difficult to uncover anomalies. Configuring alerts based on static thresholds is possible, but this approach requires prior domain knowledge and may not adapt to data with organic growth or seasonal trends. Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. 
For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9). From dcbce5ab520c05472f0902e4e91be3ed5b9a7d7b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 13:35:18 -0600 Subject: [PATCH 05/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 78e604e324..1b51399883 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -176,7 +176,7 @@ This formula serves as a starting point. Make sure to test it with a representat Set the number of aggregation intervals from your data stream to consider in a detection window. It’s best to choose this value based on your actual data to see which one leads to the best results for your use case. -The anomaly detector expects the shingle size to be in the range of 1 and 128. The default shingle size is `8`. Choose `1` only if you have two or more features. Smaller values might increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also increase false positives. Larger values might be useful for ignoring noise in a signal. +The anomaly detector requires the shingle size to be between 1 and 128. The default is `8`. Use `1` only if you have at least two features. Values less than `8` may increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall), but also increase false positives. Values greater than `8` may be useful for ignoring noise in a signal. ### (Advanced settings) Set an imputation option From 23bcea3e960db48577e7de01fc8350d84ce394bf Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 14:59:08 -0600 Subject: [PATCH 06/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 1b51399883..ffcdda79fc 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -132,7 +132,7 @@ A multi-feature model correlates anomalies across all its features. The [curse o To configure an anomaly detection model based on an aggregation method, follow these steps: 1. On the **Configure model** page, enter the **Feature name** and select the **Enable feature** checkbox. -1. For **Find anomalies based on**, select **Field Value**. +5. For **Find anomalies based on**, select **Field Value**. 1. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. 1. For **Field**, select from the available options. From 2754b3b695e335be2c9ea135c8164af6be67908d Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 14:59:20 -0600 Subject: [PATCH 07/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index ffcdda79fc..6cd9137d04 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -131,7 +131,10 @@ A multi-feature model correlates anomalies across all its features. 
The [curse o To configure an anomaly detection model based on an aggregation method, follow these steps: -1. On the **Configure model** page, enter the **Feature name** and select the **Enable feature** checkbox. +1. On the **Detectors** page, select the desired detector and then select the **Actions** button to activate the dropdown menu. +2. From the dropdown menu, select **Edit model configuration**. +3. On the **Edit model configuration** page, select the **Add another feature** button. +4. Enter a name in the **Feature name** and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. 1. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. 1. For **Field**, select from the available options. From 1a5120da561cf4f849746978a20460226ad6beb9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 14:59:41 -0600 Subject: [PATCH 08/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 6cd9137d04..8b389bff37 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -136,7 +136,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. -1. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. +6. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. 1. For **Field**, select from the available options. To configure an anomaly detection model based on a JSON aggregation query, follow these steps: From 6c3326d785e557bc3283b4ff2a8051341cbbb9f0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:00:30 -0600 Subject: [PATCH 09/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 8b389bff37..fdce70e9b4 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -137,7 +137,8 @@ To configure an anomaly detection model based on an aggregation method, follow t 4. Enter a name in the **Feature name** and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. 6. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. -1. For **Field**, select from the available options. +7. For **Field**, select from the available options. +8. Select the **Save changes** button. 
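+
+If you manage detectors through the REST API rather than OpenSearch Dashboards, a feature configured this way corresponds to an entry in the detector's `feature_attributes` array. The following is a minimal sketch with illustrative feature and field names; refer to the Anomaly Detection API documentation for the full request format:
+
+```json
+"feature_attributes": [
+  {
+    "feature_name": "sum_of_bytes",
+    "feature_enabled": true,
+    "aggregation_query": {
+      "sum_of_bytes": {
+        "sum": { "field": "bytes" }
+      }
+    }
+  }
+]
+```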
To configure an anomaly detection model based on a JSON aggregation query, follow these steps: From 614b660edcbaaa9b6d5306cf1f05be3b60fe12a3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:03:11 -0600 Subject: [PATCH 10/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index fdce70e9b4..f3f68554d3 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -136,7 +136,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. -6. For **aggregation method**, select either **average()**, **count()**, **sum()**, **min()**, or **max()**. +6. For **Aggregation method**, select the desired method. 7. For **Field**, select from the available options. 8. Select the **Save changes** button. From 3ab815fbaee46d8e75f9514f434d530a70d1e272 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:03:48 -0600 Subject: [PATCH 11/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index f3f68554d3..845aece955 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -136,7 +136,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. -6. For **Aggregation method**, select the desired method. +6. For **Aggregation method**, select the desired aggregation from the dropdown menu. 7. For **Field**, select from the available options. 8. Select the **Save changes** button. From 596adfa77ee59bb9c6d7afdd9d9dbbbf1dcc6aad Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:04:11 -0600 Subject: [PATCH 12/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 845aece955..40e72c8f81 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -134,7 +134,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 1. On the **Detectors** page, select the desired detector and then select the **Actions** button to activate the dropdown menu. 2. From the dropdown menu, select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. -4. Enter a name in the **Feature name** and select the **Enable feature** checkbox. +4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. 6. For **Aggregation method**, select the desired aggregation from the dropdown menu. 7. For **Field**, select from the available options. 
From bee1f4ce1a61de9aeef93cbf1ef54eb72253a598 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:05:49 -0600 Subject: [PATCH 13/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 40e72c8f81..89b812f949 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -131,7 +131,8 @@ A multi-feature model correlates anomalies across all its features. The [curse o To configure an anomaly detection model based on an aggregation method, follow these steps: -1. On the **Detectors** page, select the desired detector and then select the **Actions** button to activate the dropdown menu. +1. On the **Detectors** page, select the desired detector from the listed options. +2. One the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. 2. From the dropdown menu, select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. From f8ee3d96f84f1409005076272f31a4c25ccc9d9e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:06:11 -0600 Subject: [PATCH 14/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 1 - 1 file changed, 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 89b812f949..07a85e60a3 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -133,7 +133,6 @@ To configure an anomaly detection model based on an aggregation method, follow t 1. On the **Detectors** page, select the desired detector from the listed options. 2. One the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. -2. From the dropdown menu, select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. For **Find anomalies based on**, select **Field Value**. From 28a6b7753392818e2f5210c4d7cc049161be60d6 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:08:21 -0600 Subject: [PATCH 15/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 07a85e60a3..8da734e4ae 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -135,7 +135,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 2. One the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. -5. For **Find anomalies based on**, select **Field Value**. +5. Select **Field value** from the dropdown menu under **Find anomalies based on**. 6. 
For **Aggregation method**, select the desired aggregation from the dropdown menu. 7. For **Field**, select from the available options. 8. Select the **Save changes** button. From c318ece521d4b55a35d042302f67ec01511557a4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:08:58 -0600 Subject: [PATCH 16/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 8da734e4ae..bbc343d5b5 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -136,7 +136,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. Select **Field value** from the dropdown menu under **Find anomalies based on**. -6. For **Aggregation method**, select the desired aggregation from the dropdown menu. +6. Select the desired aggregation from the dropdown menu under **Aggregation method**. 7. For **Field**, select from the available options. 8. Select the **Save changes** button. From 4189083638a7ae759bce19cf630a22f5e8d6436f Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:10:11 -0600 Subject: [PATCH 17/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index bbc343d5b5..13718f0d26 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -137,7 +137,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. Select **Field value** from the dropdown menu under **Find anomalies based on**. 6. Select the desired aggregation from the dropdown menu under **Aggregation method**. -7. For **Field**, select from the available options. +7. Select the desired field from the available options in the dropdown menu under **Field**. 8. Select the **Save changes** button. To configure an anomaly detection model based on a JSON aggregation query, follow these steps: From 199fbc39181906b2012c9d8638ba0ba8da739e81 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:10:35 -0600 Subject: [PATCH 18/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 13718f0d26..f0836be88a 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -137,7 +137,7 @@ To configure an anomaly detection model based on an aggregation method, follow t 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. Select **Field value** from the dropdown menu under **Find anomalies based on**. 6. Select the desired aggregation from the dropdown menu under **Aggregation method**. -7. Select the desired field from the available options in the dropdown menu under **Field**. +7. Select the desired field from the available options listed in the dropdown menu under **Field**. 8. 
Select the **Save changes** button. To configure an anomaly detection model based on a JSON aggregation query, follow these steps: From 595b45a528a7e477a10f4ed9a31be6f3c7d39544 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:12:06 -0600 Subject: [PATCH 19/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index f0836be88a..0e01287741 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -142,7 +142,8 @@ To configure an anomaly detection model based on an aggregation method, follow t To configure an anomaly detection model based on a JSON aggregation query, follow these steps: -1. On the **Configure Model** page, enter the **Feature name** and select the **Enable feature** checkbox. +1. On the **Edit model configuration** page, select the **Add another feature** button. +2. Enter a name in the **Feature name** field and select the **Enable feature** checkbox.. 1. For **Find anomalies based on**, select **Custom expression**. The JSON editor window will open. 1. Enter your JSON aggregation query in the editor. From 66c48c4c63383bba120e4ac0f29678a3ba06bf0b Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:12:27 -0600 Subject: [PATCH 20/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 0e01287741..810566578c 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -143,7 +143,7 @@ To configure an anomaly detection model based on an aggregation method, follow t To configure an anomaly detection model based on a JSON aggregation query, follow these steps: 1. On the **Edit model configuration** page, select the **Add another feature** button. -2. Enter a name in the **Feature name** field and select the **Enable feature** checkbox.. +2. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 1. For **Find anomalies based on**, select **Custom expression**. The JSON editor window will open. 1. Enter your JSON aggregation query in the editor. From d6913fb5d52bf7dc6e5acce91a97251754ae8f3e Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:29:11 -0600 Subject: [PATCH 21/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 810566578c..807c73d4a5 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -144,7 +144,7 @@ To configure an anomaly detection model based on a JSON aggregation query, follo 1. On the **Edit model configuration** page, select the **Add another feature** button. 2. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. -1. For **Find anomalies based on**, select **Custom expression**. The JSON editor window will open. +3. Select **Custom expression** from the dropdown menu under **Find anomalies based on**. The JSON editor window will open. 1. Enter your JSON aggregation query in the editor. 
For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) From 45443b9c19b4e6fd36de9e3e79a5df1778e2efc9 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:34:04 -0600 Subject: [PATCH 22/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 807c73d4a5..9675961cf1 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -183,6 +183,8 @@ Set the number of aggregation intervals from your data stream to consider in a d The anomaly detector requires the shingle size to be between 1 and 128. The default is `8`. Use `1` only if you have at least two features. Values less than `8` may increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall), but also increase false positives. Values greater than `8` may be useful for ignoring noise in a signal. +To set the shingle size, select **Show** on the **Advanced settings** pane. Enter the desired size in the **intervals** field. + ### (Advanced settings) Set an imputation option The imputation option allows you to address missing data in your streams. You can choose from the following methods to handle gaps: From cfc37097ca6656a1fb8007a6f2a9e9ef3b8e0340 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Wed, 11 Sep 2024 15:42:07 -0600 Subject: [PATCH 23/36] Update _observing-your-data/ad/result-mapping.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/result-mapping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/result-mapping.md b/_observing-your-data/ad/result-mapping.md index 24e7711620..bf28b568a5 100644 --- a/_observing-your-data/ad/result-mapping.md +++ b/_observing-your-data/ad/result-mapping.md @@ -9,7 +9,7 @@ redirect_from: # Anomaly result mapping -If you enabled custom result index, the Anomaly Detection plugin stores the results in your own index. +If **Custom result index** is enabled, the Anomaly Detection plugin stores the results in your own index. If the anomaly detector does not detect an anomaly, the result has the following format: From 8a3b25d7426167ab846a3eb8a0336bb17daf3f1a Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 12 Sep 2024 14:56:00 -0600 Subject: [PATCH 24/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 9675961cf1..1669ef5228 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -26,7 +26,7 @@ The following tutorial guides you through using anomaly detection with your Open ## Step 1: Define a detector -A _detector_ is an individual anomaly detection task. You can define multiple detectors. All the detectors can run simultaneously, with each analyzing data from different sources. +A _detector_ is an individual anomaly detection task. You can define multiple detectors, and all detectors can run simultaneously, with each analyzing data from different sources. 1. Choose **Create detector**. 1. Add the detector details. 
From bc9488ad67009ca61235412a95923c50559009a0 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 12 Sep 2024 17:23:45 -0600 Subject: [PATCH 25/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 1669ef5228..529b7a3bcd 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -132,7 +132,7 @@ A multi-feature model correlates anomalies across all its features. The [curse o To configure an anomaly detection model based on an aggregation method, follow these steps: 1. On the **Detectors** page, select the desired detector from the listed options. -2. One the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. +2. On the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. Select **Field value** from the dropdown menu under **Find anomalies based on**. From 50eff8ba29bc83ac650a70143111ca2a9a7ccb92 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 12 Sep 2024 17:25:56 -0600 Subject: [PATCH 26/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 529b7a3bcd..e35bc8cd37 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -129,6 +129,8 @@ For example, if you choose `min()`, the detector focuses on finding anomalies ba A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting. {: .note} +### Configuring a model based on an aggregation method + To configure an anomaly detection model based on an aggregation method, follow these steps: 1. On the **Detectors** page, select the desired detector from the listed options. From 2c2e06ca213dda127e539a60a9612a05611237a1 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 12 Sep 2024 17:26:40 -0600 Subject: [PATCH 27/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index e35bc8cd37..368831c2a7 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -142,6 +142,8 @@ To configure an anomaly detection model based on an aggregation method, follow t 7. 
Select the desired field from the available options listed in the dropdown menu under **Field**. 8. Select the **Save changes** button. +### Configuring a model based on a JSON aggregation query + To configure an anomaly detection model based on a JSON aggregation query, follow these steps: 1. On the **Edit model configuration** page, select the **Add another feature** button. From 894efeeca6859fd0036f454a677861638a6f60f3 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Thu, 12 Sep 2024 21:41:51 -0600 Subject: [PATCH 28/36] Update index.md Copy edit documentation Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 170 +++++++++++++------------------ 1 file changed, 72 insertions(+), 98 deletions(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 368831c2a7..0b1ea5ce47 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -10,34 +10,29 @@ redirect_from: # Anomaly detection -An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric might help you identify early signs of a system failure. +An _anomaly_ in OpenSearch is any unusual behavior change in your time-series data. Anomalies can provide valuable insights into your data. For example, for IT infrastructure data, an anomaly in the memory usage metric can help identify early signs of a system failure. Conventional techniques like visualizations and dashboards can make it difficult to uncover anomalies. Configuring alerts based on static thresholds is possible, but this approach requires prior domain knowledge and may not adapt to data with organic growth or seasonal trends. -Anomaly detection automatically detects anomalies in your OpenSearch data in near real-time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an `anomaly grade` and `confidence score` value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9). +Anomaly detection automatically detects anomalies in your OpenSearch data in near real time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an _anomaly grade_ and _confidence score_ value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9). You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected. +{: .note} -## Using OpenSearch Dashboards anomaly detection - -To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Anomaly Detection**. OpenSearch Dashboards contains sample datasets. You can use these datasets with their preconfigured detectors to try out the feature. 
+## Getting started with anomaly detection in OpenSearch Dashboards -The following tutorial guides you through using anomaly detection with your OpenSearch data. +To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Anomaly Detection**. ## Step 1: Define a detector A _detector_ is an individual anomaly detection task. You can define multiple detectors, and all detectors can run simultaneously, with each analyzing data from different sources. -1. Choose **Create detector**. -1. Add the detector details. - - Enter a name that describes the detector's intended use. -1. Specify the data source. - - For **Data source**, choose the index you want to use as the data source. You can optionally use index patterns to choose multiple indexes. - - (Optional) For **Data filter**, filter the index you chose as the data source. From the **Data filter** menu, choose **Add data filter**, and then design your filter query by selecting **Field**, **Operator**, and **Value**, or choose **Use query DSL** and add your own JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL). - ---- +1. On the **Anomaly detection** page, select the **Create detector** button. +2. On the **Define detector** page, enter the required information on the **Detector details** pane. +3. On the **Select data** pane, specify the data source by choosing a source from the **Index** dropdown menu. You can choose an index, index patterns, or alias. +4. (Optional) Filter the data source by selecting **Add data filter** and then entering the conditions for **Field**, **Operator**, and **Value**. Alternatively, you can choose **Use query DSL** and add your JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL). -#### Example: Filter using query DSL +#### Example: Filtering data using query DSL The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values: @@ -70,43 +65,36 @@ The following example query retrieves documents where the `urlPath.keyword` fiel ``` {% include copy-curl.html %} ---- - -1. Specify a timestamp. - - Select the **Timestamp field** in the index. -1. Define operation settings. - - For **Operation settings**, define the **Detector interval**, which is the time interval at which the detector collects data. - - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model. - The shorter you set this interval, the fewer data points the detector aggregates. - The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals. - - - We recommend setting the detector interval based on your actual data. If it's too long it might delay the results, and if it's too short it might miss some data. It also won't have a sufficient number of consecutive data points for the shingle process. +5. On the **Timestamp** pane, select a field from the **Timestamp field** dropdown menu. - - (Optional) To add extra processing time for data collection, specify a **Window delay** value. +6. On the **Operation settings** pane, define the **Detector interval**, which is the time interval at which the detector collects data. 
+ - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model. The shorter you set this interval, the fewer data points the detector aggregates. The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals. + - You should set the detector interval based on your actual data. If the detector interval is too long, then it might delay the results. If the detector interval is too short, then it might miss some data. The detector interval also will not have a sufficient number of consecutive data points for the shingle process. + - (Optional) To add extra processing time for data collection, specify a **Window delay** value. - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay. - - For example, say the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time. + - For example, the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time. - To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection. -1. Specify custom results index. - - The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. To enable this, select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. This alias points to an actual index with a name containing the date and a sequence number, like `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored. +7. Specify custom results index. + - The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. Select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. 
This alias points to an actual index with a name containing the date and a sequence number, such as `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored. - You can use the dash “-” sign to separate the namespace to manage custom results index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the results index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the "financial" department at a granular level for the "us" area. + You can use `-` to separate the namespace to manage custom results index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the results index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the `financial` department at a granular level for the `us` group. {: .note } - When the Security plugin (fine-grained access control) is enabled, the default results index becomes a system index and is no longer accessible through the standard Index or Search APIs. To access its content, you must use the Anomaly Detection RESTful API or the dashboard. As a result, you cannot build customized dashboards using the default results index if the Security plugin is enabled. However, you can create a custom results index in order to build customized dashboards. - If the custom index you specify does not exist, the Anomaly Detection plugin will create it when you create the detector and start your real-time or historical analysis. - If the custom index already exists, the plugin will verify that the index mapping matches the required structure for anomaly results. In this case, ensure that the custom index has a valid mapping as defined in the [`anomaly-results.json`](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json) file. - - To use the custom results index option, you need the following permissions: - - `indices:admin/create` - The Anomaly Detection plugin requires the ability to create and roll over the custom index. - - `indices:admin/aliases` - The Anomaly Detection plugin requires access to create and manage an alias for the custom index. - - `indices:data/write/index` - You need the `write` permission for the Anomaly Detection plugin to write results into the custom index for a single-entity detector. - - `indices:data/read/search` - You need the `search` permission because the Anomaly Detection plugin needs to search custom results indexes to show results on the Anomaly Detection UI. - - `indices:data/write/delete` - Because the detector might generate a large number of anomaly results, you need the `delete` permission to delete old data and save disk space. - - `indices:data/write/bulk*` - You need the `bulk*` permission because the Anomaly Detection plugin uses the bulk API to write results into the custom index. - - Managing the custom results index: - - The anomaly detection dashboard queries all detectors’ results from all custom results indexes. Having too many custom results indexes might impact the performance of the Anomaly Detection plugin. - - You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to rollover old results indexes. You can also manually delete or archive any old results indexes. We recommend reusing a custom results index for multiple detectors. 
- - The Anomaly Detection plugin also provides lifecycle management for custom indexes. It rolls an alias over to a new index when the custom results index meets any of the conditions in the following table. + - To use the custom results index option, you must have the following permissions: + - `indices:admin/create` - The `create` permission is required in order to create and roll over the custom index. + - `indices:admin/aliases` - The `aliases` permission is required in order to create and manage an alias for the custom index. + - `indices:data/write/index` - The `write` permission is required in order to write results into the custom index for a single-entity detector. + - `indices:data/read/search` - The `search` permission is required in order to search custom results indexes to show results on the Anomaly Detection interface. + - `indices:data/write/delete` - The detector may generate many anomaly results. The `delete` permission is required to delete old data and save disk space. + - `indices:data/write/bulk*` - The `bulk*` permission is required because the plugin uses the bulk API to write results into the custom index. + - When managing the custom results index, consider the following: + - The anomaly detection dashboard queries all detector results from all custom results indexes. Having too many custom results indexes can impact the plugin's performance. + - You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to roll over old results indexes. You can also manually delete or archive any old results indexes. Reusing a custom results index for multiple detectors is recommended. + - The plugin provides lifecycle management for custom indexes. It rolls over an alias to a new index when the custom results index meets any of the conditions in the following table. Parameter | Description | Type | Unit | Example | Required :--- | :--- |:--- |:--- |:--- |:--- @@ -114,7 +102,7 @@ The following example query retrieves documents where the `urlPath.keyword` fiel `result_index_min_age` | The minimum index age required for rollover, calculated from its creation time to the current time. | `integer` |`day` | `7` | No `result_index_ttl` | The minimum age required to permanently delete rolled-over indexes. | `integer` | `day` | `60` | No -1. Choose **Next**. +8. Choose **Next**. After you define the detector, the next step is to configure the model. @@ -126,7 +114,7 @@ A _feature_ is the field in your index that you want to analyze for anomalies. A For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. -A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features might negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data might further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is 5. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting. +A multi-feature model correlates anomalies across all its features. 
The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features can negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data can further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is `5`. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting. {: .note} ### Configuring a model based on an aggregation method @@ -149,18 +137,17 @@ To configure an anomaly detection model based on a JSON aggregation query, follo 1. On the **Edit model configuration** page, select the **Add another feature** button. 2. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 3. Select **Custom expression** from the dropdown menu under **Find anomalies based on**. The JSON editor window will open. -1. Enter your JSON aggregation query in the editor. +4. Enter your JSON aggregation query in the editor. +5. Select the **Save changes** button. For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) {: .note} -### (Optional) Set category fields for high cardinality +### Setting category fields for high cardinality -You can categorize anomalies based on a keyword or IP field type. +You can categorize anomalies based on a keyword or IP field type. The category field categorizes or slices the source time series with a dimension, such as IP addresses, product IDs, and country codes. This gives you a granular view of anomalies within each entity of the category field to help isolate and debug issues. -The category field categorizes or slices the source time series with a dimension like IP addresses, product IDs, country codes, and so on. This helps to see a granular view of anomalies within each entity of the category field to isolate and debug issues. - -To set a category field, choose **Enable a category field** and select a field. You can’t change the category fields after you create the detector. +To set a category field, choose **Enable a category field** and select a field. You cannot change the category fields after you create the detector. Only a certain number of unique entities are supported in the category field. Use the following equation to calculate the recommended total number of entities supported in a cluster: @@ -168,44 +155,42 @@ Only a certain number of unique entities are supported in the category field. Us (data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector) ``` -To get the entity model size of a detector, use the [profile detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting. +To get the detector's entity model size, use the [Profile Detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting. -Consider a cluster with 3 data nodes, each with 8 GB of JVM heap size and the default 10% memory allocation. 
With an entity model size of 1 MB, the following formula calculates the estimated number of unique entities:
Consider a cluster with three data nodes, each with 8 GB (8,192 MB) of JVM heap size and the default 10% memory allocation. With an entity model size of 1 MB, the following formula calculates the estimated number of unique entities:

```
(8192 MB * 0.1 / 1 MB) * 3 = 2457.6
```

If the actual total number of unique entities is higher than the number that you calculate (in this case, approximately 2,458), the anomaly detector attempts to model the extra entities. The detector prioritizes entities that occur more often and are more recent.

This formula serves as a starting point. Make sure to test it with a representative workload. See the OpenSearch blog [Improving Anomaly Detection: One million entities in one minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/).
{: .note }

### Setting a shingle size

On the **Advanced settings** pane, you can set the number of aggregation intervals from your data stream to include in the detection window. Choose this value based on your actual data to find the optimal setting for your use case. To set the shingle size, select **Show** on the **Advanced settings** pane and enter the desired size in the **intervals** field.

The anomaly detector requires the shingle size to be between 1 and 128. The default is `8`. Use `1` only if you have at least two features. Values less than `8` may increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also may increase false positives. Values greater than `8` may be useful for ignoring noise in a signal.

### Setting an imputation option

On the **Advanced settings** pane, you can set the imputation option. This allows you to handle missing data in your streams. The options include the following:

- **Ignore Missing Data (Default):** The system continues without considering missing data points, keeping the existing data flow.
- **Fill with Custom Values:** Specify a custom value for each feature to replace missing data points, allowing for targeted imputation tailored to your data.
- **Fill with Zeros:** Replace missing values with zeros. This is ideal when the absence of data indicates a significant event, such as a drop to zero in event counts. - **Use Previous Values:** Fill gaps with the last observed value to maintain continuity in your time-series data. This method treats missing data as non-anomalous, carrying forward the previous trend. -Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, filling missing values with zeros helps detect significant data absences, improving detection recall. +Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, then filling missing values with zeros helps detect significant data absences, improving detection recall. Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping) for more information. {: note} -### (Advanced settings) Suppressing anomalies with threshold-based rules +### Suppressing anomalies with threshold-based rules -You can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations. +On the **Advanced settings** pane, you can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations. Suppose you want to detect substantial changes in log volume while ignoring small variations that are not meaningful. Without customized settings, the system might generate false alerts for minor changes, making it difficult to identify true anomalies. By setting suppression rules, you can ignore minor deviations and focus on real anomalous patterns. @@ -228,70 +213,59 @@ Ignore anomalies for feature logVolume when the actual value is no more than 100 If no custom suppression rules are set, then the system defaults to a filter that ignores anomalies with deviations of less than 20% from the expected value for each enabled feature. -### Preview sample anomalies +### Previewing sample anomalies -Preview sample anomalies and adjust the feature settings if needed. -For sample previews, the Anomaly Detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. It loads this sample dataset into the detector. The detector uses this sample dataset to generate a sample preview of anomaly results. - -Examine the sample preview and use it to fine-tune your feature configurations (for example, enable or disable features) to get more accurate results. +You can preview anomalies based on a sample feature input and adjust the feature settings as needed. 
The Anomaly Detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. The sample dataset is loaded into the detector, which then uses it to generate a preview of the anomalies.

1. Choose **Preview sample anomalies**.
   - If sample anomaly results are not displayed, then check the detector interval and verify that the entities have 400 or more data points during the preview date range.
2. Select the **Next** button.

## Step 3: Setting up detector jobs

To start a real-time detector to find anomalies in your data in near real time, check **Start real-time detector automatically (recommended)**.

Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), select the **Run historical analysis detection** box and select a date range (at least 128 detection intervals).

Analyzing historical data familiarizes you with the Anomaly Detection plugin. For example, you can evaluate the performance of a detector with historical data to fine-tune it.

You can experiment with historical analysis using different feature sets and check the precision before moving on to real-time detectors.

## Step 4: Reviewing detector settings

Review your detector settings and model configurations to confirm that they are valid and then select **Create detector**.

If validation errors occur, then edit the settings to fix the errors and return to the detector page.
{: .note }

## Step 5: Observing the results

Choose the **Real-time results** or **Historical analysis** tab. Real-time results can take some time to appear. For example, if the detector interval is 10 minutes, then the detector may take an hour to start because it is waiting for sufficient data to generate anomalies.
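While a real-time detector is initializing, you can check how far along it is by querying its profile. The following is a sketch; the detector ID is hypothetical, so substitute the ID of your own detector:

```
GET _plugins/_anomaly_detection/detectors/VEHKTXwBwf_U8gjUXY2s/_profile/init_progress
```
{% include copy-curl.html %}

The response reports an estimated completion percentage, which can help you decide whether the detector simply needs more time or whether the interval and data density should be revisited.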
-A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner. -Use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to make sure you have sufficient data points. +A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner. You can use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to ensure you have enough data points. -If you see the detector pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find a lot of missing data points from the aggregated data, consider increasing the detector interval. +If the detector is pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find many missing data points from the aggregated data, consider increasing the detector interval. -Choose and drag over the anomaly line chart to zoom in and see a more detailed view of an anomaly. +Choose and drag over the anomaly line chart to zoom in and see a detailed view of an anomaly. {: .note } -Analyze anomalies with the following visualizations: +You can analyze anomalies with the following visualizations: -- **Live anomalies** (for real-time results) displays live anomaly results for the last 60 intervals. For example, if the interval is 10, it shows results for the last 600 minutes. The chart refreshes every 30 seconds. -- **Anomaly overview** (for real-time results) / **Anomaly history** (for historical analysis in the **Historical analysis** tab) plots the anomaly grade with the corresponding measure of confidence. This pane includes: +- **Live anomalies** (for real-time results) displays live anomaly results for the last 60 intervals. For example, if the interval is `10`, it shows results for the last 600 minutes. The chart refreshes every 30 seconds. +- **Anomaly overview** (for real-time results) or **Anomaly history** (for historical analysis in the **Historical analysis** tab) plots the anomaly grade with the corresponding measure of confidence. The pane includes: - The number of anomaly occurrences based on the given data-time range. - - The **Average anomaly grade**, a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of 0 represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly. + - The **Average anomaly grade**, a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of `0` represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly. - **Confidence** estimate of the probability that the reported anomaly grade matches the expected anomaly grade. Confidence increases as the model observes more data and learns the data behavior and trends. Note that confidence is distinct from model accuracy. - **Last anomaly occurrence** is the time at which the last anomaly occurred. -Underneath **Anomaly overview**/**Anomaly history** are: +Underneath **Anomaly overview** or **Anomaly history** are: - **Feature breakdown** plots the features based on the aggregation method. You can vary the date-time range of the detector. 
Selecting a point on the feature line chart shows the **Feature output**, the number of times a field appears in your index, and the **Expected value**, a predicted value for the feature output. Where there is no anomaly, the output and expected values are equal. - ![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/feature-contribution-ad.png) - - **Anomaly occurrences** shows the `Start time`, `End time`, `Data confidence`, and `Anomaly grade` for each detected anomaly. Selecting a point on the anomaly line chart shows **Feature Contribution**, the percentage of a feature that contributes to the anomaly -![Anomaly detection results]({{site.url}}{{site.baseurl}}/images/feature-contribution-ad.png) - - If you set the category field, you see an additional **Heat map** chart. The heat map correlates results for anomalous entities. This chart is empty until you select an anomalous entity. You also see the anomaly and feature line chart for the time period of the anomaly (`anomaly_grade` > 0). @@ -311,7 +285,7 @@ To see all the configuration settings for a detector, choose the **Detector conf 1. To make any changes to the detector configuration, or fine tune the time interval to minimize any false positives, go to the **Detector configuration** section and choose **Edit**. - You need to stop real-time and historical analysis to change its configuration. Confirm that you want to stop the detector and proceed. -1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**. +2. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**. ## Step 8: Manage your detectors From 5738739bdf1c628ec393c25c3bd994edbd3b2553 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 13 Sep 2024 07:56:46 -0600 Subject: [PATCH 29/36] Update result-mapping.md Doc review complete Signed-off-by: Melissa Vagi --- _observing-your-data/ad/result-mapping.md | 26 +++++++++++------------ 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/_observing-your-data/ad/result-mapping.md b/_observing-your-data/ad/result-mapping.md index bf28b568a5..67a8c3b07d 100644 --- a/_observing-your-data/ad/result-mapping.md +++ b/_observing-your-data/ad/result-mapping.md @@ -9,9 +9,7 @@ redirect_from: # Anomaly result mapping -If **Custom result index** is enabled, the Anomaly Detection plugin stores the results in your own index. - -If the anomaly detector does not detect an anomaly, the result has the following format: +When you select the **Enable custom result index** box on the **Custom result index** pane, the Anomaly Detection plugin will store the results in your own index. When the anomaly detector does not detect an anomaly, the result format is as follows: ```json { @@ -61,6 +59,7 @@ If the anomaly detector does not detect an anomaly, the result has the following "threshold": 1.2368549346675202 } ``` +{% include copy-curl.html %} ## Response body fields @@ -80,9 +79,9 @@ Field | Description `model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity. 
`threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold. -When the imputation option is enabled, the anomaly result output includes a `feature_imputed` array, showing which features have been imputed. This information helps you identify which features were modified during the anomaly detection process due to missing data. If no features were imputed, then the `feature_imputed` array is excluded from the results. +When the imputation option is enabled, the anomaly result includes a `feature_imputed` array, showing which features were modified due to missing data. If no features were imputed, then this is excluded. -In this example, the feature `processing_bytes_max` was imputed, as indicated by the `imputed: true` status: +In the following example anomaly result output, the `processing_bytes_max` feature was imputed, as shown by the `imputed: true` status: ```json { @@ -154,8 +153,9 @@ In this example, the feature `processing_bytes_max` was imputed, as indicated by ] } ``` +{% include copy-curl.html %} -If an anomaly detector detects an anomaly, the result has the following format: +When an anomaly is detected, the result has the following format: ```json { @@ -254,24 +254,23 @@ If an anomaly detector detects an anomaly, the result has the following format: "execution_start_time": 1635898427803 } ``` +{% include copy-curl.html %} -You can see the following additional fields: +Note that the result includes the following additional field: Field | Description :--- | :--- `relevant_attribution` | Represents the contribution of each input variable. The sum of the attributions is normalized to 1. `expected_values` | The expected value for each feature. -At times, the detector might detect an anomaly late. -Let's say the detector sees a random mix of the triples {1, 2, 3} and {2, 4, 5} that correspond to `slow weeks` and `busy weeks`, respectively. For example 1, 2, 3, 1, 2, 3, 2, 4, 5, 1, 2, 3, 2, 4, 5, ... and so on. -If the detector comes across a pattern {2, 2, X} and it's yet to see X, the detector infers that the pattern is anomalous, but it can't determine at this point which of the 2's is the cause. If X = 3, then the detector knows it's the first 2 in that unfinished triple, and if X = 5, then it's the second 2. If it's the first 2, then the detector detects the anomaly late. +The detector may detect an anomaly late. For example, the detector observes a sequence of data that alternates between "slow weeks" (represented by the triples {1, 2, 3}) and "busy weeks" (represented by the triples {2, 4, 5}). If the detector comes across a pattern {2, 2, X}, where it has not yet seen the value that X will take, the detector infers that the pattern is anomalous. However, it cannot determine which of the 2's is the cause. If X = 3, then the first 2 is the anomaly. If X = 5, then the second 2 is the anomaly. If it is the first 2, then the detector would detect the anomaly late. -If a detector detects an anomaly late, the result has the following additional fields: +When a detector detects an anomaly late, the result includes the following additional fields: Field | Description :--- | :--- -`past_values` | The actual input that triggered an anomaly. If `past_values` is null, the attributions or expected values are from the current input. 
If `past_values` is not null, the attributions or expected values are from a past input (for example, the previous two steps of the data [1,2,3]). -`approx_anomaly_start_time` | The approximate time of the actual input that triggers an anomaly. This field helps you understand when a detector flags an anomaly. Both single-stream and high-cardinality detectors don't query previous anomaly results because these queries are expensive operations. The cost is especially high for high-cardinality detectors that might have a lot of entities. If the data is not continuous, the accuracy of this field is low and the actual time that the detector detects an anomaly can be earlier. +`past_values` | The actual input that triggered an anomaly. If `past_values` is null, then the attributions or expected values are from the current input. If `past_values` is not null, then the attributions or expected values are from a past input (for example, the previous two steps of the data [1,2,3]). +`approx_anomaly_start_time` | The approximate time of the actual input that triggers an anomaly. This field helps you understand when a detector flags an anomaly. Both single-stream and high-cardinality detectors do not query previous anomaly results because these queries are costly operations. The cost is especially high for high-cardinality detectors that may have many entities. If the data is not continuous, then the accuracy of this field is low and the actual time that the detector detects an anomaly can be earlier. ```json { @@ -394,3 +393,4 @@ Field | Description "approx_anomaly_start_time": 1635883620000 } ``` +{% include copy-curl.html %} From 14dc45481ce2ad698b49b0d3cacec03c3c7f8f31 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 13 Sep 2024 08:04:15 -0600 Subject: [PATCH 30/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 0b1ea5ce47..79eadb240f 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -122,7 +122,7 @@ A multi-feature model correlates anomalies across all its features. The [curse o To configure an anomaly detection model based on an aggregation method, follow these steps: 1. On the **Detectors** page, select the desired detector from the listed options. -2. On the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. +2. On the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. Select **Field value** from the dropdown menu under **Find anomalies based on**. 
From 4ad9e020256807c09ab8bde4768ff806a64a8514 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Fri, 13 Sep 2024 08:05:29 -0600
Subject: [PATCH 31/36] Update _observing-your-data/ad/index.md

Signed-off-by: Melissa Vagi
---
 _observing-your-data/ad/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md
index 79eadb240f..44dda9fa91 100644
--- a/_observing-your-data/ad/index.md
+++ b/_observing-your-data/ad/index.md
@@ -185,7 +185,7 @@ On the **Advanced settings** pane, you can set the imputation option. This allow

 Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, then filling missing values with zeros helps detect significant data absences, improving detection recall.

-Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping) for more information.
+Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/) for more information.
 {: note}

 ### Suppressing anomalies with threshold-based rules

From 9afca30323401f403fd19c38c894fd002995beb5 Mon Sep 17 00:00:00 2001
From: Melissa Vagi
Date: Fri, 13 Sep 2024 08:20:18 -0600
Subject: [PATCH 32/36] Fix links

Signed-off-by: Melissa Vagi
---
 _observing-your-data/ad/dashboards-anomaly-detection.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_observing-your-data/ad/dashboards-anomaly-detection.md b/_observing-your-data/ad/dashboards-anomaly-detection.md
index 679237094a..ad6fa5950b 100644
--- a/_observing-your-data/ad/dashboards-anomaly-detection.md
+++ b/_observing-your-data/ad/dashboards-anomaly-detection.md
@@ -18,12 +18,12 @@ You can connect data visualizations to OpenSearch datasets and then create, run,

 Before getting started, you must have:

- Installed OpenSearch and OpenSearch Dashboards version 2.9 or later. See [Installing OpenSearch]({{site.url}}{{site.baseurl}}/install-and-configure/install-opensearch/index/).
-- Installed the Anomaly Detection plugin version 2.9 or later. See [Installing OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins).
+- Installed the Anomaly Detection plugin version 2.9 or later. See [Installing OpenSearch plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/).
- Installed the Anomaly Detection Dashboards plugin version 2.9 or later. See [Managing OpenSearch Dashboards plugins]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/plugins/) to get started.

## General requirements for anomaly detection visualizations

-Anomaly detection visualizations are displayed as time-series charts that give you a snapshot of when anomalies have occurred from different anomaly detectors you have configured for the visualization.
You can display up to 10 metrics on your chart, and each series can be shown as a line on the chart. Note that only real-time anomalies will be visible on the chart. For more information on real-time and historical anomaly detection, see [Anomaly detection, Step 3: Set up detector jobs]({{site.url}}{{site.baseurl}}/observing-your-data/ad/index/#step-3-set-up-detector-jobs). +Anomaly detection visualizations are displayed as time-series charts that give you a snapshot of when anomalies have occurred from different anomaly detectors you have configured for the visualization. You can display up to 10 metrics on your chart, and each series can be shown as a line on the chart. Note that only real-time anomalies will be visible on the chart. For more information about real-time and historical anomaly detection, see [Anomaly detection, Step 3: Set up detector jobs]({{site.url}}{{site.baseurl}}/observing-your-data/ad/index/#step-3-setting-up-detector-jobs). Keep in mind the following requirements when setting up or creating anomaly detection visualizations. The visualization: From 0067b5d44ecd4efa0a7c636e504305fc79182609 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 13 Sep 2024 08:41:32 -0600 Subject: [PATCH 33/36] Fix links Signed-off-by: Melissa Vagi --- _observing-your-data/ad/result-mapping.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/result-mapping.md b/_observing-your-data/ad/result-mapping.md index 67a8c3b07d..4e8dc41494 100644 --- a/_observing-your-data/ad/result-mapping.md +++ b/_observing-your-data/ad/result-mapping.md @@ -9,7 +9,7 @@ redirect_from: # Anomaly result mapping -When you select the **Enable custom result index** box on the **Custom result index** pane, the Anomaly Detection plugin will store the results in your own index. When the anomaly detector does not detect an anomaly, the result format is as follows: +When you select the **Enable custom result index** box on the **Custom result index** pane, the Anomaly Detection plugin will save the results to an index of your choosing. When the anomaly detector does not detect an anomaly, the result format is as follows: ```json { From a99969b70230ae3394d3eb4d72731385eedad7a4 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 13 Sep 2024 11:03:40 -0600 Subject: [PATCH 34/36] Address editorial feedback Signed-off-by: Melissa Vagi --- _observing-your-data/ad/result-mapping.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/_observing-your-data/ad/result-mapping.md b/_observing-your-data/ad/result-mapping.md index 4e8dc41494..967b185684 100644 --- a/_observing-your-data/ad/result-mapping.md +++ b/_observing-your-data/ad/result-mapping.md @@ -79,7 +79,7 @@ Field | Description `model_id` | A unique ID that identifies a model. If a detector is a single-stream detector (with no category field), it has only one model. If a detector is a high-cardinality detector (with one or more category fields), it might have multiple models, one for each entity. `threshold` | One of the criteria for a detector to classify a data point as an anomaly is that its `anomaly_score` must surpass a dynamic threshold. This field records the current threshold. -When the imputation option is enabled, the anomaly result includes a `feature_imputed` array, showing which features were modified due to missing data. If no features were imputed, then this is excluded. 
+When the imputation option is enabled, the anomaly results include a `feature_imputed` array showing which features were modified due to missing data. If no features were imputed, then this array is excluded.

In the following example anomaly result output, the `processing_bytes_max` feature was imputed, as shown by the `imputed: true` status:

-When an anomaly is detected, the result has the following format:
+When an anomaly is detected, the result is provided in the following format:

-Note that the result includes the following additional field:
+Note that the result includes the following additional fields.

Field | Description
:--- | :---
`relevant_attribution` | Represents the contribution of each input variable. The sum of the attributions is normalized to 1.
`expected_values` | The expected value for each feature.

-The detector may detect an anomaly late. For example, the detector observes a sequence of data that alternates between "slow weeks" (represented by the triples {1, 2, 3}) and "busy weeks" (represented by the triples {2, 4, 5}). If the detector comes across a pattern {2, 2, X}, where it has not yet seen the value that X will take, the detector infers that the pattern is anomalous. However, it cannot determine which of the 2's is the cause. If X = 3, then the first 2 is the anomaly. If X = 5, then the second 2 is the anomaly. If it is the first 2, then the detector would detect the anomaly late.
+The detector may be late in detecting an anomaly. For example, the detector observes a sequence of data that alternates between "slow weeks" (represented by the triples {1, 2, 3}) and "busy weeks" (represented by the triples {2, 4, 5}). If the detector comes across a pattern {2, 2, X}, where it has not yet seen the value that X will take, then the detector infers that the pattern is anomalous. However, it cannot determine which 2 is the cause. If X = 3, then the first 2 is the anomaly. If X = 5, then the second 2 is the anomaly. If it is the first 2, then the detector will be late in detecting the anomaly.

-When a detector detects an anomaly late, the result includes the following additional fields:
+When a detector is late in detecting an anomaly, the result includes the following additional fields.

Field | Description
:--- | :---
-`past_values` | The actual input that triggered an anomaly. If `past_values` is `null`, then the attributions or expected values are from the current input.
If `past_values` is not `null`, then the attributions or expected values are from a past input (for example, the previous two steps of the data [1,2,3]). +`approx_anomaly_start_time` | The approximate time of the actual input that triggered an anomaly. This field helps you understand the time at which a detector flags an anomaly. Both single-stream and high-cardinality detectors do not query previous anomaly results because these queries are costly operations. The cost is especially high for high-cardinality detectors that may have many entities. If the data is not continuous, then the accuracy of this field is low and the actual time at which the detector detects an anomaly can be earlier. ```json { From 7ea3d63e0a936fb3164f5fdf1c97ab8d0c669326 Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 13 Sep 2024 11:21:09 -0600 Subject: [PATCH 35/36] Address editorial feedback Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 98 ++++++++++++++++---------------- 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 44dda9fa91..ccab07388e 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -14,7 +14,7 @@ An _anomaly_ in OpenSearch is any unusual behavior change in your time-series da Conventional techniques like visualizations and dashboards can make it difficult to uncover anomalies. Configuring alerts based on static thresholds is possible, but this approach requires prior domain knowledge and may not adapt to data with organic growth or seasonal trends. -Anomaly detection automatically detects anomalies in your OpenSearch data in near real time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an _anomaly grade_ and _confidence score_ value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Random Cut Forests](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9). +Anomaly detection automatically detects anomalies in your OpenSearch data in near real time using the Random Cut Forest (RCF) algorithm. RCF is an unsupervised machine learning algorithm that models a sketch of your incoming data stream to compute an _anomaly grade_ and _confidence score_ value for each incoming data point. These values are used to differentiate an anomaly from normal variations. For more information about how RCF works, see [Robust Random Cut Forest Based Anomaly Detection on Streams](https://www.semanticscholar.org/paper/Robust-Random-Cut-Forest-Based-Anomaly-Detection-on-Guha-Mishra/ecb365ef9b67cd5540cc4c53035a6a7bd88678f9). You can pair the Anomaly Detection plugin with the [Alerting plugin]({{site.url}}{{site.baseurl}}/monitoring-plugins/alerting/) to notify you as soon as an anomaly is detected. {: .note} @@ -25,16 +25,16 @@ To get started, go to **OpenSearch Dashboards** > **OpenSearch Plugins** > **Ano ## Step 1: Define a detector -A _detector_ is an individual anomaly detection task. You can define multiple detectors, and all detectors can run simultaneously, with each analyzing data from different sources. +A _detector_ is an individual anomaly detection task. 
You can define multiple detectors, and all detectors can run simultaneously, with each analyzing data from different sources. You can define a detector by following these steps: 1. On the **Anomaly detection** page, select the **Create detector** button. -2. On the **Define detector** page, enter the required information on the **Detector details** pane. -3. On the **Select data** pane, specify the data source by choosing a source from the **Index** dropdown menu. You can choose an index, index patterns, or alias. +2. On the **Define detector** page, enter the required information in the **Detector details** pane. +3. In the **Select data** pane, specify the data source by choosing a source from the **Index** dropdown menu. You can choose an index, index patterns, or an alias. 4. (Optional) Filter the data source by selecting **Add data filter** and then entering the conditions for **Field**, **Operator**, and **Value**. Alternatively, you can choose **Use query DSL** and add your JSON filter query. Only [Boolean queries]({{site.url}}{{site.baseurl}}/query-dsl/compound/bool/) are supported for query domain-specific language (DSL). #### Example: Filtering data using query DSL -The following example query retrieves documents where the `urlPath.keyword` field matches any of the specified values: +The following example query retrieves documents in which the `urlPath.keyword` field matches any of the specified values: - /domain/{id}/short - /sub_dir/{id}/short @@ -65,17 +65,17 @@ The following example query retrieves documents where the `urlPath.keyword` fiel ``` {% include copy-curl.html %} -5. On the **Timestamp** pane, select a field from the **Timestamp field** dropdown menu. +5. In the **Timestamp** pane, select a field from the **Timestamp field** dropdown menu. -6. On the **Operation settings** pane, define the **Detector interval**, which is the time interval at which the detector collects data. - - The detector aggregates the data in this interval, then feeds the aggregated result into the anomaly detection model. The shorter you set this interval, the fewer data points the detector aggregates. The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process needs a certain number of aggregated data points from contiguous intervals. +6. In the **Operation settings** pane, define the **Detector interval**, which is the interval at which the detector collects data. + - The detector aggregates the data at this interval and then feeds the aggregated result into the anomaly detection model. The shorter the interval, the fewer data points the detector aggregates. The anomaly detection model uses a shingling process, a technique that uses consecutive data points to create a sample for the model. This process requires a certain number of aggregated data points from contiguous intervals. - You should set the detector interval based on your actual data. If the detector interval is too long, then it might delay the results. If the detector interval is too short, then it might miss some data. The detector interval also will not have a sufficient number of consecutive data points for the shingle process. - (Optional) To add extra processing time for data collection, specify a **Window delay** value. - This value tells the detector that the data is not ingested into OpenSearch in real time but with a certain delay. Set the window delay to shift the detector interval to account for this delay. 
- For example, the detector interval is 10 minutes and data is ingested into your cluster with a general delay of 1 minute. Assume the detector runs at 2:00. The detector attempts to get the last 10 minutes of data from 1:50 to 2:00, but because of the 1-minute delay, it only gets 9 minutes of data and misses the data from 1:59 to 2:00. Setting the window delay to 1 minute shifts the interval window to 1:49--1:59, so the detector accounts for all 10 minutes of the detector interval time. - - To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, setting it too high can hinder real-time anomaly detection, as the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection. + - To avoid missing any data, set the **Window delay** to the upper limit of the expected ingestion delay. This ensures that the detector captures all data during its interval, reducing the risk of missing relevant information. While a longer window delay helps capture all data, too long of a window delay can hinder real-time anomaly detection because the detector will look further back in time. Find a balance to maintain both data accuracy and timely detection. -7. Specify custom results index. +7. Specify a custom results index. - The Anomaly Detection plugin allows you to store anomaly detection results in a custom index of your choice. Select **Enable custom results index** and provide a name for your index, for example, `abc`. The plugin then creates an alias prefixed with `opensearch-ad-plugin-result-` followed by your chosen name, for example, `opensearch-ad-plugin-result-abc`. This alias points to an actual index with a name containing the date and a sequence number, such as `opensearch-ad-plugin-result-abc-history-2024.06.12-000002`, where your results are stored. You can use `-` to separate the namespace to manage custom results index permissions. For example, if you use `opensearch-ad-plugin-result-financial-us-group1` as the results index, you can create a permission role based on the pattern `opensearch-ad-plugin-result-financial-us-*` to represent the `financial` department at a granular level for the `us` group. @@ -85,12 +85,12 @@ The following example query retrieves documents where the `urlPath.keyword` fiel - If the custom index you specify does not exist, the Anomaly Detection plugin will create it when you create the detector and start your real-time or historical analysis. - If the custom index already exists, the plugin will verify that the index mapping matches the required structure for anomaly results. In this case, ensure that the custom index has a valid mapping as defined in the [`anomaly-results.json`](https://github.com/opensearch-project/anomaly-detection/blob/main/src/main/resources/mappings/anomaly-results.json) file. - To use the custom results index option, you must have the following permissions: - - `indices:admin/create` - The `create` permission is required in order to create and roll over the custom index. - - `indices:admin/aliases` - The `aliases` permission is required in order to create and manage an alias for the custom index. - - `indices:data/write/index` - The `write` permission is required in order to write results into the custom index for a single-entity detector. 
- - `indices:data/read/search` - The `search` permission is required in order to search custom results indexes to show results on the Anomaly Detection interface. - - `indices:data/write/delete` - The detector may generate many anomaly results. The `delete` permission is required to delete old data and save disk space. - - `indices:data/write/bulk*` - The `bulk*` permission is required because the plugin uses the bulk API to write results into the custom index. + - `indices:admin/create` -- The `create` permission is required in order to create and roll over the custom index. + - `indices:admin/aliases` -- The `aliases` permission is required in order to create and manage an alias for the custom index. + - `indices:data/write/index` -- The `write` permission is required in order to write results into the custom index for a single-entity detector. + - `indices:data/read/search` -- The `search` permission is required in order to search custom results indexes to show results on the Anomaly Detection interface. + - `indices:data/write/delete` -- The detector may generate many anomaly results. The `delete` permission is required in order to delete old data and save disk space. + - `indices:data/write/bulk*` -- The `bulk*` permission is required because the plugin uses the Bulk API to write results into the custom index. - When managing the custom results index, consider the following: - The anomaly detection dashboard queries all detector results from all custom results indexes. Having too many custom results indexes can impact the plugin's performance. - You can use [Index State Management]({{site.url}}{{site.baseurl}}/im-plugin/ism/index/) to roll over old results indexes. You can also manually delete or archive any old results indexes. Reusing a custom results index for multiple detectors is recommended. @@ -110,24 +110,24 @@ After you define the detector, the next step is to configure the model. 1. Add features to your detector. -A _feature_ is the field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly. +A _feature_ is any field in your index that you want to analyze for anomalies. A detector can discover anomalies across one or more features. You must choose an aggregation method for each feature: `average()`, `count()`, `sum()`, `min()`, or `max()`. The aggregation method determines what constitutes an anomaly. For example, if you choose `min()`, the detector focuses on finding anomalies based on the minimum values of your feature. If you choose `average()`, the detector finds anomalies based on the average values of your feature. -A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely for multi-feature models to identify smaller anomalies as compared to a single-feature model. Adding more features can negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data can further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is `5`. You can adjust this limit with the `plugins.anomaly_detection.max_anomaly_features` setting. 
+A multi-feature model correlates anomalies across all its features. The [curse of dimensionality](https://en.wikipedia.org/wiki/Curse_of_dimensionality) makes it less likely that multi-feature models will identify smaller anomalies as compared to a single-feature model. Adding more features can negatively impact the [precision and recall](https://en.wikipedia.org/wiki/Precision_and_recall) of a model. A higher proportion of noise in your data can further amplify this negative impact. Selecting the optimal feature set is usually an iterative process. By default, the maximum number of features for a detector is `5`. You can adjust this limit using the `plugins.anomaly_detection.max_anomaly_features` setting. {: .note} ### Configuring a model based on an aggregation method To configure an anomaly detection model based on an aggregation method, follow these steps: -1. On the **Detectors** page, select the desired detector from the listed options. +1. On the **Detectors** page, select the desired detector from the list. 2. On the detector's details page, select the **Actions** button to activate the dropdown menu and then select **Edit model configuration**. 3. On the **Edit model configuration** page, select the **Add another feature** button. 4. Enter a name in the **Feature name** field and select the **Enable feature** checkbox. 5. Select **Field value** from the dropdown menu under **Find anomalies based on**. 6. Select the desired aggregation from the dropdown menu under **Aggregation method**. -7. Select the desired field from the available options listed in the dropdown menu under **Field**. +7. Select the desired field from the options listed in the dropdown menu under **Field**. 8. Select the **Save changes** button. ### Configuring a model based on a JSON aggregation query @@ -140,14 +140,14 @@ To configure an anomaly detection model based on a JSON aggregation query, follo 4. Enter your JSON aggregation query in the editor. 5. Select the **Save changes** button. -For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/) +For acceptable JSON query syntax, see [OpenSearch Query DSL]({{site.url}}{{site.baseurl}}/opensearch/query-dsl/index/). {: .note} -### Setting category fields for high cardinality +### Setting categorical fields for high cardinality -You can categorize anomalies based on a keyword or IP field type. The category field categorizes or slices the source time series with a dimension, such as IP addresses, product IDs, and country codes. This gives you a granular view of anomalies within each entity of the category field to help isolate and debug issues. +You can categorize anomalies based on a keyword or IP field type. You can enable the **Categorical fields** option to categorize, or "slice," the source time series using a dimension, such as an IP address, a product ID, or a country code. This gives you a granular view of anomalies within each entity of the category field to help isolate and debug issues. -To set a category field, choose **Enable a category field** and select a field. You cannot change the category fields after you create the detector. +To set a category field, choose **Enable categorical fields** and select a field. You cannot change the category fields after you create the detector. Only a certain number of unique entities are supported in the category field. 
Use the following equation to calculate the recommended total number of entities supported in a cluster: @@ -155,28 +155,28 @@ Only a certain number of unique entities are supported in the category field. Us (data nodes * heap size * anomaly detection maximum memory percentage) / (entity model size of a detector) ``` -To get the detector's entity model size, use the [Profile Detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage with the `plugins.anomaly_detection.model_max_size_percent` setting. +To get the detector's entity model size, use the [Profile Detector API]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api/#profile-detector). You can adjust the maximum memory percentage using the `plugins.anomaly_detection.model_max_size_percent` setting. -Consider a cluster with three data nodes, each with 8 GB of JVM heap size and the default 10% memory allocation. With an entity model size of 1 MB, the following formula calculates the estimated number of unique entities: +Consider a cluster with 3 data nodes, each with 8 GB of JVM heap size and the default 10% memory allocation. With an entity model size of 1 MB, the following formula calculates the estimated number of unique entities: ``` (8096 MB * 0.1 / 1 MB ) * 3 = 2429 ``` -If the actual total number of unique entities is higher than the number that you calculate (in this case, 2,429), the anomaly detector attempts to model the extra entities. The detector prioritizes entities that occur more often and are more recent. +If the actual total number of unique entities is higher than the number that you calculate (in this case, 2,429), then the anomaly detector attempts to model the extra entities. The detector prioritizes both entities that occur more often and are more recent. -This formula serves as a starting point. Make sure to test it with a representative workload. See the OpenSearch blog [Improving Anomaly Detection: One million entities in one minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/). +This formula serves as a starting point. Make sure to test it with a representative workload. See the OpenSearch blog post [Improving Anomaly Detection: One million entities in one minute](https://opensearch.org/blog/one-million-enitities-in-one-minute/) for more information. {: .note } ### Setting a shingle size -On the **Advanced settings** pane, you can set the number of aggregation intervals from your data stream to include in the detection window. Choose this value based on your actual data to find the optimal setting for your use case. To set the shingle size, select **Show** on the **Advanced settings** pane. Enter the desired size in the **intervals** field. +In the **Advanced settings** pane, you can set the number of data stream aggregation intervals to include in the detection window. Choose this value based on your actual data to find the optimal setting for your use case. To set the shingle size, select **Show** in the **Advanced settings** pane. Enter the desired size in the **intervals** field. -The anomaly detector requires the shingle size to be between 1 and 128. The default is `8`. Use `1` only if you have at least two features. Values less than `8` may increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also may increase false positives. Values greater than `8` may be useful for ignoring noise in a signal. +The anomaly detector requires the shingle size to be between 1 and 128. 
The default is `8`. Use `1` only if you have at least two features. Values of less than `8` may increase [recall](https://en.wikipedia.org/wiki/Precision_and_recall) but also may increase false positives. Values greater than `8` may be useful for ignoring noise in a signal. ### Setting an imputation option -On the **Advanced settings** pane, you can set the imputation option. This allows you to handle missing data in your streams. The options include the following: +In the **Advanced settings** pane, you can set the imputation option. This allows you to manage missing data in your streams. The options include the following: - **Ignore Missing Data (Default):** The system continues without considering missing data points, keeping the existing data flow. - **Fill with Custom Values:** Specify a custom value for each feature to replace missing data points, allowing for targeted imputation tailored to your data. @@ -185,16 +185,16 @@ On the **Advanced settings** pane, you can set the imputation option. This allow Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, then filling missing values with zeros helps detect significant data absences, improving detection recall. -Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly result index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/) for more information. +Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly results index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/) for more information. {: note} ### Suppressing anomalies with threshold-based rules -On the **Advanced settings** pane, you can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations. +In the **Advanced settings** pane, you can suppress anomalies by setting rules that define acceptable differences between the expected and actual values, either as an absolute value or a relative percentage. This helps reduce false anomalies caused by minor fluctuations, allowing you to focus on significant deviations. Suppose you want to detect substantial changes in log volume while ignoring small variations that are not meaningful. Without customized settings, the system might generate false alerts for minor changes, making it difficult to identify true anomalies. By setting suppression rules, you can ignore minor deviations and focus on real anomalous patterns. -To suppress anomalies for deviations smaller than 30% from the expected value, you can set the following rules: +To suppress anomalies for deviations of less than 30% from the expected value, you can set the following rules: ``` Ignore anomalies for feature logVolume when the actual value is no more than 30% above the expected value. 
@@ -215,46 +215,46 @@ If no custom suppression rules are set, then the system defaults to a filter tha ### Previewing sample anomalies -You can preview anomalies based on a sample feature input and adjust the feature settings as needed. The Anomaly Detection plugin selects a small number of data samples---for example, one data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. The sample dataset is loaded into the detector that then uses the sample dataset to generate a preview of the anomalies. +You can preview anomalies based on sample feature input and adjust the feature settings as needed. The Anomaly Detection plugin selects a small number of data samples---for example, 1 data point every 30 minutes---and uses interpolation to estimate the remaining data points to approximate the actual feature data. The sample dataset is loaded into the detector, which then uses the sample dataset to generate a preview of the anomalies. 1. Choose **Preview sample anomalies**. - - If sample anomaly results are not displayed, then check the detector interval to verify that 400 or more data points are set for the entities during the preview date range. + - If sample anomaly results are not displayed, check the detector interval to verify that 400 or more data points are set for the entities during the preview date range. 2. Select the **Next** button. ## Step 3: Setting up detector jobs -To start a real-time detector to find anomalies in your data in near real time, check **Start real-time detector automatically (recommended)**. +To start a detector to find anomalies in your data in near real time, select **Start real-time detector automatically (recommended)**. -Alternatively, if you want to perform historical analysis and find patterns in long historical data windows (weeks or months), select the **Run historical analysis detection** box and select a date range (at least 128 detection intervals). +Alternatively, if you want to perform historical analysis and find patterns in longer historical data windows (weeks or months), select the **Run historical analysis detection** box and select a date range of at least 128 detection intervals. -Analyzing historical data familiarizes you with the Anomaly Detection plugin. For example, you can evaluate the performance of a detector with historical data to fine-tune it. +Analyzing historical data can help to familiarize you with the Anomaly Detection plugin. For example, you can evaluate the performance of a detector against historical data in order to fine-tune it. -You can experiment with historical analysis using different feature sets and checking the precision before moving on to real-time detectors. +You can experiment with historical analysis by using different feature sets and checking the precision before using real-time detectors. ## Step 4: Reviewing detector settings -Review your detector settings and model configurations to confirm they are valid and then select **Create detector**. +Review your detector settings and model configurations to confirm that they are valid and then select **Create detector**. -If validation errors occur, then edit the settings to fix the error and return to the detector page. +If a validation error occurs, edit the settings to correct the error and return to the detector page. {: .note } ## Step 5: Observing the results -Choose the **Real-time results** or **Historical analysis** tab. 
For real-time results, it will take time to display the anomaly results. For example, if the detector interval is 10 minutes, then the detector may take an hour to start because it is waiting for sufficient data to generate anomalies. +Choose either the **Real-time results** or **Historical analysis** tab. For real-time results, it will take some time to display the anomaly results. For example, if the detector interval is 10 minutes, then the detector may take an hour to initiate because it is waiting for sufficient data to be able to generate anomalies. -A shorter interval means the model passes the shingle process more quickly and starts to generate the anomaly results sooner. You can use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to ensure you have enough data points. +A shorter interval results in the model passing the shingle process more quickly and generating anomaly results sooner. You can use the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation to ensure that you have enough data points. -If the detector is pending in "initialization" for longer than a day, aggregate your existing data using the detector interval to check for any missing data points. If you find many missing data points from the aggregated data, consider increasing the detector interval. +If the detector is pending in "initialization" for longer than 1 day, aggregate your existing data and use the detector interval to check for any missing data points. If you find many missing data points, consider increasing the detector interval. -Choose and drag over the anomaly line chart to zoom in and see a detailed view of an anomaly. +Click and drag over the anomaly line chart to zoom in and see a detailed view of an anomaly. {: .note } -You can analyze anomalies with the following visualizations: +You can analyze anomalies using the following visualizations: - **Live anomalies** (for real-time results) displays live anomaly results for the last 60 intervals. For example, if the interval is `10`, it shows results for the last 600 minutes. The chart refreshes every 30 seconds. -- **Anomaly overview** (for real-time results) or **Anomaly history** (for historical analysis in the **Historical analysis** tab) plots the anomaly grade with the corresponding measure of confidence. The pane includes: +- **Anomaly overview** (for real-time results) or **Anomaly history** (for historical analysis on the **Historical analysis** tab) plot the anomaly grade with the corresponding measure of confidence. The pane includes: - The number of anomaly occurrences based on the given data-time range. - - The **Average anomaly grade**, a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of `0` represents “not an anomaly,” and a non-zero value represents the relative severity of the anomaly. + - The **Average anomaly grade**, a number between 0 and 1 that indicates how anomalous a data point is. An anomaly grade of `0` represents "not an anomaly," and a non-zero value represents the relative severity of the anomaly. - **Confidence** estimate of the probability that the reported anomaly grade matches the expected anomaly grade. Confidence increases as the model observes more data and learns the data behavior and trends. Note that confidence is distinct from model accuracy. - **Last anomaly occurrence** is the time at which the last anomaly occurred. 
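+
+For example, if a detector seems stuck in initialization, you can check how close it is to being fully initialized by requesting the `init_progress` profile type from the [profile detector]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/api#profile-detector) operation. The following request is a sketch---replace `<detectorId>` with your own detector ID:
+
+```json
+GET _plugins/_anomaly_detection/detectors/<detectorId>/_profile/init_progress
+```
+
+A response similar to `{ "init_progress": { "percentage": "92%", "estimated_minutes_left": 10, "needed_shingles": 10 } }` suggests that the detector needs roughly 10 more intervals of data before it can generate anomaly results. Treat the exact response fields shown here as illustrative and confirm them against the API reference.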
From ca49c0cda69708c4ff867323cee11e2b7ad04abd Mon Sep 17 00:00:00 2001 From: Melissa Vagi Date: Fri, 13 Sep 2024 17:39:49 -0600 Subject: [PATCH 36/36] Update _observing-your-data/ad/index.md Signed-off-by: Melissa Vagi --- _observing-your-data/ad/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_observing-your-data/ad/index.md b/_observing-your-data/ad/index.md index 70889af32b..657c3c90cb 100644 --- a/_observing-your-data/ad/index.md +++ b/_observing-your-data/ad/index.md @@ -196,7 +196,7 @@ In the **Advanced settings** pane, you can set the imputation option. This allow Using these options can improve recall in anomaly detection. For instance, if you are monitoring for drops in event counts, including both partial and complete drops, then filling missing values with zeros helps detect significant data absences, improving detection recall. -Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly results index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/) for more information. +Be cautious when imputing extensively missing data, as excessive gaps can compromise model accuracy. Quality input is critical---poor data quality leads to poor model performance. The confidence score also decreases when imputations occur. You can check whether a feature value has been imputed using the `feature_imputed` field in the anomaly results index. See [Anomaly result mapping]({{site.url}}{{site.baseurl}}/monitoring-plugins/ad/result-mapping/) for more information. {: note} ### Suppressing anomalies with threshold-based rules
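
Because imputed values lower the confidence score, it can be useful to filter stored anomaly results by confidence when reviewing them. The following query is a sketch against the example custom results alias `opensearch-ad-plugin-result-abc`; it assumes the standard anomaly result fields `anomaly_grade`, `confidence`, and `data_end_time`, so confirm the field names against your results index mapping:

```json
GET opensearch-ad-plugin-result-abc/_search
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        { "range": { "anomaly_grade": { "gt": 0 } } },
        { "range": { "confidence": { "gte": 0.9 } } }
      ]
    }
  },
  "sort": [
    { "data_end_time": { "order": "desc" } }
  ]
}
```

A higher `confidence` threshold filters out more of the results that were computed from sparse or imputed data; lower it if you want to review those results as well.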