Trace analytics update #7362

Merged
merged 70 commits into from
Jun 20, 2024
Changes from 64 commits
Commits
93516cf
update the integration page to reflect the integration catalog and ad…
YANG-DB Jun 6, 2024
2615a67
update the integration documentation
YANG-DB Jun 7, 2024
49c42c0
Update schema section
Swiddis Jun 10, 2024
f311e7d
Merge pull request #1 from Swiddis/schema-rewrite
YANG-DB Jun 10, 2024
cfcec50
Merge pull request #2 from opensearch-project/main
YANG-DB Jun 10, 2024
1cfd6ff
Merge branch 'main' into integration-catalog-update
YANG-DB Jun 11, 2024
272d5ee
Merge branch 'main' into integration-catalog-update
YANG-DB Jun 11, 2024
c78a878
update the metrics analytics documentation
YANG-DB Jun 11, 2024
569da98
Merge remote-tracking branch 'origin/integration-catalog-update' into…
YANG-DB Jun 11, 2024
c08bc63
update the trace analytics documentation
YANG-DB Jun 11, 2024
588620d
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
7ff2e0f
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
e7f32c9
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
10f5bc5
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
4c87982
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
aab2347
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
25e7bc1
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
1e0e1c7
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
b2960ed
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
72d0d9e
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
b00b006
Update ta-dashboards.md
vagimeli Jun 11, 2024
393e2e1
Merge branch 'main' into trace-analytics-update
vagimeli Jun 11, 2024
1521795
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
0e59e6d
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 11, 2024
1f4b111
Merge branch 'opensearch-project:main' into trace-analytics-update
YANG-DB Jun 12, 2024
2de6d49
update service correlation index naming convention
YANG-DB Jun 12, 2024
a392b63
Update ta-dashboards.md
vagimeli Jun 13, 2024
9ee3d81
Update ta-dashboards.md
vagimeli Jun 13, 2024
70e321f
Update ta-dashboards.md
vagimeli Jun 13, 2024
95a01cf
Update ta-dashboards.md
vagimeli Jun 13, 2024
6d3f9a4
Merge branch 'main' into trace-analytics-update
vagimeli Jun 13, 2024
01aaf20
Update ta-dashboards.md
vagimeli Jun 18, 2024
af81b46
Update ta-dashboards.md
vagimeli Jun 19, 2024
a0e0764
Update ta-dashboards.md
vagimeli Jun 19, 2024
0832eb1
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
1a4ed0a
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
c0fed9f
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
22a6bf1
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
59cb066
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
4556fa3
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
66e88ec
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
97244d1
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
6f9cd44
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
8d72dbf
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
ea35cc0
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
41e87e3
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
7233f31
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
fb5903a
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
08c4b4a
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
214a707
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
622d796
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
dfa57f0
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
83ef1f3
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
edc0f18
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
31888a1
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
41b1a78
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
9bbd084
Update _observing-your-data/trace/ta-dashboards.md
YANG-DB Jun 19, 2024
a51d01b
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
9f3beb2
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
1cc94d6
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
11b9c58
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
b7b12f0
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
3c0ddab
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
9e59974
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
59e9e7e
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
22fd303
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
5509571
Update _observing-your-data/trace/ta-dashboards.md
vagimeli Jun 19, 2024
0018777
Merge branch 'main' into trace-analytics-update
vagimeli Jun 19, 2024
6cc7072
Merge branch 'main' into trace-analytics-update
vagimeli Jun 20, 2024
e5e42bc
Merge branch 'main' into trace-analytics-update
vagimeli Jun 20, 2024
137 changes: 128 additions & 9 deletions _observing-your-data/trace/ta-dashboards.md
@@ -1,25 +1,144 @@
---
layout: default
title: OpenSearch Dashboards plugin
title: Trace Analytics plugin for OpenSearch Dashboards
parent: Trace Analytics
nav_order: 50
redirect_from:
- /observability-plugin/trace/ta-dashboards/
- /monitoring-plugins/trace/ta-dashboards/
---

# Trace Analytics OpenSearch Dashboards plugin
# Trace Analytics plugin for OpenSearch Dashboards

The Trace Analytics plugin for OpenSearch Dashboards provides at-a-glance visibility into your application performance, along with the ability to drill down on individual traces. For installation instructions, see [Standalone OpenSearch Dashboards plugin install]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/plugins/).
The Trace Analytics plugin offers at-a-glance visibility into application performance based on [OpenTelemetry (OTel)](https://opentelemetry.io/) protocol data. OTel standardizes instrumentation for collecting telemetry data from cloud-native software.

The **Dashboard** view groups traces together by HTTP method and path so that you can see the average latency, error rate, and trends associated with a particular operation. For a more focused view, try filtering by trace group name.
## Installing the plugin

![Dashboard view]({{site.url}}{{site.baseurl}}/images/ta-dashboard.png)
See [Standalone OpenSearch Dashboards plugin install]({{site.url}}{{site.baseurl}}/install-and-configure/install-dashboards/plugins/) for instructions on how to install the Trace Analytics plugin.

To drill down on the traces that make up a trace group, choose the number of traces in the column on the right. Then choose an individual trace for a detailed summary.
## Setting up the OpenTelemetry Demo

![Detailed trace view]({{site.url}}{{site.baseurl}}/images/ta-trace.png)
The [OpenTelemetry Demo with OpenSearch](https://github.com/opensearch-project/opentelemetry-demo) simulates a distributed application generating real-time telemetry data, providing you with a practical environment in which to explore features available with the Trace Analytics plugin before implementing it in your environment.

The **Services** view lists all services in the application, plus an interactive map that shows how the various services connect to each other. In contrast to the dashboard, which helps identify problems by operation, the service map helps identify problems by service. Try sorting by error rate or latency to get a sense of potential problem areas of your application.

![Service view]({{site.url}}{{site.baseurl}}/images/ta-services.png)
**Step 1: Set up the OpenTelemetry Demo**

- Clone the [OpenTelemetry Demo with OpenSearch](https://github.com/opensearch-project/opentelemetry-demo) repository: `git clone https://github.com/opensearch-project/opentelemetry-demo`.
Collaborator: It looks like there should be a "the" preceding "OpenSearch".

Member Author: I don't think so...

Contributor: The repo name is "OpenTelemetry Demo with OpenSearch."

- Follow the [Getting Started](https://github.com/opensearch-project/opentelemetry-demo/blob/main/tutorial/GettingStarted.md) instructions to deploy the demo application using Docker, which runs multiple microservices generating telemetry data.

**Step 2: Ingest telemetry data**

- Configure the OTel collectors to send telemetry data (traces, metrics, logs) to your OpenSearch cluster, using the [preexisting setup](https://github.com/opensearch-project/opentelemetry-demo/tree/main/src/otelcollector).
- Confirm that [Data Prepper](https://github.com/opensearch-project/opentelemetry-demo/tree/main/src/dataprepper) is set up to process the incoming data, handle trace analytics and service map pipelines, submit data to required indexes, and perform preaggregated calculations.

**Step 3: Explore Trace Analytics in OpenSearch Dashboards**

The **Trace Analytics** application includes two options, **Services** and **Traces**:

- **Services** lists all services in the application, plus an interactive map that shows how the various services connect to each other. In contrast to the dashboard (which helps identify problems by operation), the **Service map** helps you identify problems by service based on error rates and latency. To access this option, go to **Trace Analytics** > **Services**.
- **Traces** groups traces together by HTTP method and path so that you can see the average latency, error rate, and trends associated with a particular operation. For a more focused view, try filtering by trace group name. To access this option, go to **Trace Analytics** > **Traces**. From the **Trace Groups** panel, you can review the traces that comprise a trace group. From the **Traces** panel, you can analyze individual traces for a detailed summary.

**Step 4: Perform correlation analysis**

- Select **Services correlation** to display connections between various telemetry signals. This allows you to navigate from the logical service level to the associated metrics and logs for that specific service.

---
vagimeli marked this conversation as resolved.

## Schema dependencies and assumptions

The plugin requires you to use [Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/) to process and visualize OTel data and relies on the following Data Prepper pipelines for OTel correlations and service map calculations:

- [Trace analytics pipeline]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/)
- [Service map pipeline]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/service-map-stateful/)

### Standardized telemetry data

The plugin requires telemetry data to follow the OTel schema conventions, including the structure and naming of spans, traces, and metrics as specified by OTel, and to be implemented using the [Simple Schema for Observability]({{site.url}}{{site.baseurl}}/observing-your-data/ss4o/).

### Service names and dependency map

For accurate service mapping and correlation analysis, adhere to the following guidelines:

- Service names must be unique and used consistently across application components.
- The `serviceName` field must be populated using the Data Prepper pipeline.
- Services must be ingested with predefined upstream and downstream dependencies to construct accurate service maps and understand service relationships.
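
As a rough illustration of how upstream and downstream dependencies can be read off span data, the following Python sketch derives caller-to-callee edges from parent/child span links. It is not how Data Prepper's service map pipeline is implemented. The `spanId`, `parentSpanId`, and `serviceName` keys and the `build_service_edges` helper are assumptions chosen for this example.

```python
from collections import defaultdict


def build_service_edges(spans):
    """Derive caller -> callee service edges from parent/child span links.

    Each span is a dict with hypothetical keys: spanId, parentSpanId, serviceName.
    """
    service_by_span = {span["spanId"]: span["serviceName"] for span in spans}
    edges = defaultdict(set)
    for span in spans:
        parent_service = service_by_span.get(span.get("parentSpanId"))
        child_service = span["serviceName"]
        # Record an edge only when the parent span belongs to a different service.
        if parent_service and parent_service != child_service:
            edges[parent_service].add(child_service)
    return {caller: sorted(callees) for caller, callees in edges.items()}


# Illustrative sample data, not taken from the demo application.
sample_spans = [
    {"spanId": "a1", "parentSpanId": None, "serviceName": "frontend"},
    {"spanId": "b2", "parentSpanId": "a1", "serviceName": "checkout"},
    {"spanId": "c3", "parentSpanId": "b2", "serviceName": "payment"},
    {"spanId": "d4", "parentSpanId": "b2", "serviceName": "checkout"},
]
print(build_service_edges(sample_spans))
```

If two components report the same work under different service names, the sketch produces separate nodes and misleading edges, which is why consistent naming matters.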

### Trace and span IDs

Traces and spans must have consistently generated and maintained unique identifiers across distributed systems to enable end-to-end tracing and accurate performance insights.
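
The OTel specification defines a trace ID as 16 bytes and a span ID as 8 bytes, each with at least one non-zero byte and commonly rendered as lowercase hexadecimal. The following Python sketch checks IDs against that format. The helper names are hypothetical, and the sketch assumes that IDs reach the check as hex strings.

```python
import re
import secrets

TRACE_ID_PATTERN = re.compile(r"[0-9a-f]{32}")  # 16 bytes as lowercase hex
SPAN_ID_PATTERN = re.compile(r"[0-9a-f]{16}")   # 8 bytes as lowercase hex


def is_valid_trace_id(value):
    """Return True for a 32-character lowercase hex string that is not all zeros."""
    return bool(TRACE_ID_PATTERN.fullmatch(value)) and value != "0" * 32


def is_valid_span_id(value):
    """Return True for a 16-character lowercase hex string that is not all zeros."""
    return bool(SPAN_ID_PATTERN.fullmatch(value)) and value != "0" * 16


# Generate random IDs of the right length for demonstration purposes only.
example_trace_id = secrets.token_hex(16)
example_span_id = secrets.token_hex(8)
print(is_valid_trace_id(example_trace_id), is_valid_span_id(example_span_id))
```

In practice, OTel SDKs generate and propagate these IDs. A check like this is mainly useful for spotting malformed data produced by custom instrumentation.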

### RED metrics adherence

The plugin expects metric data to include rate, error, and duration (RED) indicators for each service, either preaggregated using the Data Prepper pipeline or calculated dynamically based on spans. This allows for effective computation and display of key performance indicators.
vagimeli marked this conversation as resolved.
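
To make the RED indicators concrete, the following Python sketch calculates rate, error, and duration values per service from a list of spans. It is a simplified illustration, not the plugin's or Data Prepper's actual calculation. It treats every span as one request, and the `serviceName`, `durationInNanos`, and `error` keys and the `red_metrics` helper are assumptions made for the example.

```python
from collections import defaultdict


def red_metrics(spans, window_seconds):
    """Compute per-service rate, error rate, and average duration.

    Each span is a dict with hypothetical keys: serviceName, durationInNanos, error.
    """
    totals = defaultdict(lambda: {"count": 0, "errors": 0, "duration_ns": 0})
    for span in spans:
        entry = totals[span["serviceName"]]
        entry["count"] += 1
        entry["errors"] += 1 if span["error"] else 0
        entry["duration_ns"] += span["durationInNanos"]
    return {
        service: {
            "rate_per_second": entry["count"] / window_seconds,           # rate
            "error_rate": entry["errors"] / entry["count"],               # errors
            "avg_duration_ms": entry["duration_ns"] / entry["count"] / 1e6,  # duration
        }
        for service, entry in totals.items()
    }


# Illustrative sample data covering a 2-second window.
red_sample = [
    {"serviceName": "checkout", "durationInNanos": 2_000_000, "error": False},
    {"serviceName": "checkout", "durationInNanos": 4_000_000, "error": True},
    {"serviceName": "payment", "durationInNanos": 1_000_000, "error": False},
]
print(red_metrics(red_sample, window_seconds=2))
```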

### Correlation fields

Certain fields, such as `serviceName`, must be present to perform correlation analysis. These fields enable the plugin to link related telemetry data and provide a holistic view of service interactions and dependencies.

### Correlation indexes

Navigating from the service dialog to its corresponding traces or logs requires that the correlating fields exist and that the target indexes (for example, logs) follow the specified naming conventions, as described in [Simple Schema for Observability](https://opensearch.org/docs/latest/observing-your-data/ss4o/).

---

## Trace analytics with OTel protocol analytics
Introduced 2.15
{: .label .label-purple }

Trace analytics with OTel protocol analytics provides comprehensive insights into distributed systems. You can visualize and analyze the following:
vagimeli marked this conversation as resolved.

- [Service](https://opentelemetry.io/docs/specs/semconv/resource/#service): The components of a distributed application. Services are the logical building blocks that you measure and monitor in order to validate the system's health.
- [Traces](https://opentelemetry.io/docs/concepts/signals/traces/): A visual representation of a request's path across services, offering insights into latency and performance issues.
- [RED metrics](https://opentelemetry.io/docs/specs/otel/metrics/api/): Metrics for service health and performance, measured as requests per second (rate), failed requests (errors), and request processing time (duration).

### Trace analytics visualizations

**Services** visualizations, such as a table or map, help you logically analyze service behavior and accuracy. The following visualizations can help you identify anomalies and errors:

- **Services table**
- The table columns show the RED indicators for each service, along with its connected upstream and downstream services and other actions. An example **Services** table is shown in the following image.

![Services table]({{site.url}}{{site.baseurl}}/images/trace-analytics/services-table.png)

- General-purpose filter selection is used for field or filter composition. The following image shows this filter.

![Services filter selection]({{site.url}}{{site.baseurl}}/images/trace-analytics/services-filter-selection.png)

- The **Services** throughput tooltip provides an at-a-glance overview of a service's incoming request trend for the past 24 hours. The following image shows an example tooltip.

![Services throughput tooltip]({{site.url}}{{site.baseurl}}/images/trace-analytics/service-throughput-tooltip.png)

- The **Services** correlation dialog window provides an at-a-glance overview of a service's details, including its 24-hour throughput trend. You can use these details to analyze correlated logs or traces by filtering based on the `serviceName` field. The following image shows this window.

![Services correlation dialog window]({{site.url}}{{site.baseurl}}/images/trace-analytics/single-service-correlation-dialog.png)

- The **Services** RED metrics dialog window provides an at-a-glance overview of a service's RED metrics indicators, including 24-hour error, duration, and throughput rate. The following image shows this window.
Collaborator: "rate" => "rates"?

Member Author: It should be singular.

![Services RED metrics for duration]({{site.url}}{{site.baseurl}}/images/trace-analytics/single-service-RED-metrics.png)

- The **Span details** dialog window provides the details of a trace. You can use this information to further analyze a trace's elements, such as attributes and associated logs. The following image shows this window.

![Services Span details dialog window]({{site.url}}{{site.baseurl}}/images/trace-analytics/span-details-fly-out.png)

- **Service map**
- The **Service map** displays nodes, each representing a service. The node color indicates the RED indicator severity for that service and its dependencies. The following image shows a map.

![Services map tooltip]({{site.url}}{{site.baseurl}}/images/trace-analytics/service-details-tooltip.png)

- You can select a node to open a detailed dialog window for that service. This interactive map visualizes service interconnections, helping identify problems by service, unlike dashboards that identify issues by operation. You can sort by error rate or latency to pinpoint potential problem areas.
vagimeli marked this conversation as resolved.

- In the **Service map** dialog window, nodes represent connected downstream services dependent on the selected service. The node color indicates the RED indicator severity for that service and its downstream dependencies. The following image shows this dialog window.

![Service map dialog window]({{site.url}}{{site.baseurl}}/images/trace-analytics/single-service-fly-out.png)

- **Trace groups**
- Traces are grouped by their HTTP API name, that is, by HTTP method and path. This clusters traces by business functional unit, the logical grouping of functionality that reflects the application's purpose. Each group displays the average latency, error rate, and trends associated with a particular operation. You can filter by trace group name. The following image shows the **Trace Groups** window.
Collaborator: Is "business functional unit" the best wording for what we mean?

Member Author: I think so. Any suggestions?

Contributor: Business functional unit refers to the logical grouping of functionality based on the application's purpose. For example, HTTP APIs differ by application, such as a bank application, an e-commerce application, or a health system application.

![Trace Groups window]({{site.url}}{{site.baseurl}}/images/trace-analytics/trace-group-RED-metrics.png)

- In the **Trace Groups** window, you can filter by group name or apply other filters. You can also analyze associated traces. To drill down on the traces that comprise a group, select the number of traces in the right-hand column and then choose an individual trace to see a detailed summary.

![Trace group dialog window]({{site.url}}{{site.baseurl}}/images/ta-dashboard.png)

- The **Trace details** window displays a breakdown of a single trace, including its corresponding spans, associated service names, and a waterfall chart of the spans' time and duration interactions. The following image shows this view.

![Trace details window]({{site.url}}{{site.baseurl}}/images/ta-trace.png)
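
The trace grouping described in the **Trace groups** items above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the plugin's implementation. The `method`, `path`, `latency_ms`, and `error` keys and the `summarize_trace_groups` helper are hypothetical names used only for this example.

```python
from collections import defaultdict
from statistics import mean


def summarize_trace_groups(traces):
    """Group traces by HTTP method and path, then summarize each group.

    Each trace is a dict with hypothetical keys: method, path, latency_ms, error.
    """
    groups = defaultdict(list)
    for trace in traces:
        group_name = f'{trace["method"]} {trace["path"]}'
        groups[group_name].append(trace)

    summary = {}
    for group_name, members in groups.items():
        summary[group_name] = {
            "traces": len(members),
            "avg_latency_ms": mean(t["latency_ms"] for t in members),
            "error_rate": sum(1 for t in members if t["error"]) / len(members),
        }
    return summary


# Illustrative sample data, not taken from the demo application.
sample_traces = [
    {"method": "GET", "path": "/api/cart", "latency_ms": 100.0, "error": False},
    {"method": "GET", "path": "/api/cart", "latency_ms": 300.0, "error": True},
    {"method": "POST", "path": "/api/checkout", "latency_ms": 50.0, "error": False},
]
print(summarize_trace_groups(sample_traces))
```

Sorting the resulting groups by `error_rate` or `avg_latency_ms` mirrors the way that the **Trace Groups** window can be used to find problem operations.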
Binary file added images/trace-analytics/services-table.png