[DOC] Updates from backport 2563 #2580

Merged · 4 commits · Jun 27, 2023

12 changes: 9 additions & 3 deletions docs/sources/tempo/configuration/_index.md
@@ -243,8 +243,14 @@ ingester:
For more information on configuration options, see [here](https://github.com/grafana/tempo/blob/main/modules/generator/config.go).

The metrics-generator processes spans and writes metrics using the Prometheus remote write protocol.
For more information on the metrics-generator, refer to the [Metrics-generator documentation]({{< relref "../metrics-generator" >}}).

Metrics-generator processors are disabled by default. To enable processors for a specific tenant, set `metrics_generator_processors` in the [overrides](#overrides) section.
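
For example, a per-tenant entry in the overrides block might look like the following sketch; the tenant ID and the processor list shown are illustrative:

```yaml
# Per-tenant overrides (a sketch; tenant ID and processors are examples)
overrides:
  "tenant-a":
    metrics_generator_processors:
      - service-graphs
      - span-metrics
```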

You can use `metrics_ingestion_time_range_slack` to limit metrics generation to spans with start times that occur within the configured duration.
In Grafana Cloud, this value defaults to 30 seconds, so all spans sent to the metrics-generator with start times more than 30 seconds in the past are discarded or rejected.

```yaml
# Metrics-generator configuration block
# ...
# @@ -361,8 +367,8 @@ metrics_generator:
[- <Prometheus remote write config>]

# This option only allows spans with start times that occur within the configured duration to be
# considered in metrics generation.
# This is to filter out spans that are outdated.
[metrics_ingestion_time_range_slack: <duration> | default = 30s]
```

4 changes: 2 additions & 2 deletions docs/sources/tempo/metrics-generator/active-series.md
@@ -9,12 +9,12 @@ weight: 100

# Active series

An active series is a time series that receives new data points or samples. When you stop writing new data points to a time series, shortly afterwards it is no longer considered active.

Metrics generated by Tempo's metrics generator can provide both RED (Rate/Error/Duration) metrics and interdependency graphs between services in a trace (the Service Graph functionality in Grafana).
These capabilities rely on a set of generated span metrics and service metrics.

Any spans that are ingested by Tempo can create many metric series. However, this doesn't mean that every time a span is ingested that a new active series is created.

The number of active series generated depends on the label pairs generated from span data that are associated with the metrics, similar to other Prometheus-formatted data. For example, a span metric labeled by service name and span name creates one active series for each unique combination of those two values.

60 changes: 17 additions & 43 deletions docs/sources/tempo/operations/best-practices.md
@@ -11,66 +11,40 @@ This page provides some general best practices for tracing.

## Span and resource attributes

[Traces]({{< relref "../traces" >}}) are built from spans, which denote units of work such as a call to, or from, an upstream service. Spans are constructed primarily of span and resource attributes.
Spans also have a hierarchy, where parent spans can have children or siblings.

In the screenshot below, the left side of the screen (1) shows the list of results for the query. The right side (2) lists each span that makes up the selected trace.

<p align="center"><img src="getting-started/assets/trace-explore-spans.png" alt="Trace example"></p>

A **span attribute** is a key/value pair that provides context for its span. For example, if the span deals with calling another service via HTTP, an attribute could include the HTTP URL (maybe as the span attribute key `http.url`) and the HTTP status code returned (as the span attribute `http.status_code`). Span attributes can consist of varying, non-null types.

Unlike a span attribute, a **resource attribute** is a key/value pair that describes the context of how the span was collected. Generally, these attributes describe the process that created the span.
For example, this could be a set of resource attributes concerning a Kubernetes cluster, in which case you may see resource attributes, for example: `k8s.namespace`, `k8s.container_name`, and `k8s.cluster`.
These can also include information on the libraries that were used to instrument the spans for a trace, or any other infrastructure information.
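
The following Go sketch, using the OpenTelemetry SDK, illustrates the distinction between the two kinds of attributes; the service name, namespace, and attribute values are illustrative, not values prescribed by this documentation:

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	ctx := context.Background()

	// Resource attributes describe the process that created the span;
	// every span emitted by this provider shares them.
	res, err := resource.New(ctx,
		resource.WithAttributes(
			attribute.String("service.name", "checkout-service"),
			attribute.String("k8s.namespace", "prod"),
		),
	)
	if err != nil {
		panic(err)
	}

	tp := sdktrace.NewTracerProvider(sdktrace.WithResource(res))
	otel.SetTracerProvider(tp)
	defer func() { _ = tp.Shutdown(ctx) }()

	// Span attributes provide context for this one unit of work.
	_, span := otel.Tracer("example").Start(ctx, "call-downstream")
	span.SetAttributes(
		attribute.String("http.url", "https://example.com/api/v1/query"),
		attribute.Int("http.status_code", 200),
	)
	span.End()
}
```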

For more information, read the [Attribute and Resource](https://opentelemetry.io/docs/specs/otel/overview/) sections in the OpenTelemetry specification.

### Naming conventions for span and resource attributes

When naming attributes, use consistent, nested namespaces to ensure that attribute keys are obvious to anyone observing the spans of a trace and that common attributes can be shared by spans.
Using our example from above, the `http` prefix of the attribute is the namespace, and `url` and `status_code` are keys within that namespace.
Attributes can also be nested, for example `http.url.protocol` might be `HTTP` or `HTTPS`, whereas `http.url.path` might be `/api/v1/query`.
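
As a brief Go sketch of this convention (the function name and attribute values are hypothetical):

```go
package example

import (
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// annotateHTTPCall keeps related keys under the shared http.url namespace.
func annotateHTTPCall(span trace.Span) {
	span.SetAttributes(
		attribute.String("http.url.protocol", "HTTPS"),
		attribute.String("http.url.path", "/api/v1/query"),
	)
}
```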

For more details around semantic naming conventions, refer to the [Recommendations for OpenTelemetry Authors](https://opentelemetry.io/docs/specs/otel/common/attribute-naming/#recommendations-for-opentelemetry-authors) documentation.

Some third-party libraries provide auto-instrumentation that generates spans and span attributes when included in a code base.

For more information about instrumenting your app for tracing, refer to the [Instrument for distributed tracing](/docs/tempo/latest/getting-started/instrumentation/) documentation.


## Determining where to add spans

When instrumenting, determine the smallest piece of work that you need to observe in a trace for it to be of value, so that you don't over- or under-instrument.

Creating a new span for any work that has a relatively significant duration allows the observation of a trace to immediately show where significant amounts of time are spent during the processing of a request into your application or system.

For example, a span for a call to another service (whether instrumented or not) may take an unknown amount of time to complete; separating out this work shows when services are taking longer than expected.

Adding a span for a piece of work that might call many other functions in a loop is a good signal of how long that loop is taking (you might add a span attribute that counts how many times the loop runs to determine if the duration is acceptable).

However, adding a span for each method or function call in that loop might not, as it might produce hundreds or thousands of worthless spans.
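
A brief Go sketch of this guidance follows; the tracer name and the per-item helper are hypothetical:

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

// processItems wraps the whole loop in a single span instead of one span
// per iteration, and records the iteration count as a span attribute.
func processItems(ctx context.Context, items []string) {
	ctx, span := otel.Tracer("example").Start(ctx, "process-items")
	defer span.End()

	for _, item := range items {
		handleItem(ctx, item) // hypothetical per-item work; no span per call
	}

	// The count helps judge whether the loop's duration is acceptable.
	span.SetAttributes(attribute.Int("items.processed", len(items)))
}
```
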
29 changes: 29 additions & 0 deletions docs/sources/tempo/traces.md
@@ -49,8 +49,37 @@

Metrics, logs, and traces form the three pillars of observability. Metrics provide the health data about the state of a system. Logs provide an audit trail of activity that create an informational context. Traces tell you what happens at each step or action in a data pathway.

## Tracing versus profiling

Tracing provides an overview of tasks performed by an operation or set of work.
Profiling provides a code-level view of what was going on.
Generally, tracing is done at a much higher level specific to one transaction, and profiling is sampled over time, aggregated over many transactions.

The superpower of tracing is seeing how a thing in one program invoked another program.

The superpower of profiling is seeing function-level or line-level detail.

For example, let’s say you want to gather trace data on how long it takes to enter and start a car. The trace would contain multiple spans:

- Walking from the residence to the car
- Unlocking the car
- Adjusting the seat
- Starting the ignition

This trace data is collected every time the car is entered and started.
You can track variations between each operation to help pinpoint when issues happen.
If the driver forgot their keys, then that would show up as an outlying longer duration span.
In this same example, profiling gives the code stack, in minute detail: get-to-car invoked step-forward, which invoked lift-foot, which invoked contract-muscle, etc.
This extra detail provides the context that informs the data provided by a trace.

## Terminology

Active series
: A time series that receives new data points or samples.

Cardinality
: The number of unique combinations of key/value pairs, such as labels and label values, generated for a given metric series or log stream.

Data source
: A file, database, or service that provides data, such as a database, a flat file, or even live references or measurements from a device. For example, trace data is imported into Grafana by configuring and enabling a Tempo data source.
