Skip to content

Commit

Permalink
[DOC] Release notes for 2.6 (#4048)
Browse files Browse the repository at this point in the history
  • Loading branch information
knylander-grafana authored Sep 4, 2024
1 parent 3e26204 commit c14b2ab
Show file tree
Hide file tree
Showing 2 changed files with 373 additions and 3 deletions.
247 changes: 247 additions & 0 deletions docs/sources/tempo/release-notes/v2-6.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,247 @@
---
title: Version 2.6 release notes
menuTitle: V2.6
description: Release notes for Grafana Tempo 2.6
weight: 30
---

# Version 2.6 release notes

The Tempo team is pleased to announce the release of Tempo 2.5.

This release gives you:

* Additions to the TraceQL language, including the ability to search by span events, links, and arrays
* Additions to TraceQL metric query-types including a compare function and the ability to do instant queries (which will return faster than range queries).
* Performance and stability enhancements

<!-- add link when blog is live
Read the [Tempo 2.6 blog post](https://grafana.com/blog/2024/06/03/grafana-tempo-2.5-release-vparquet4-streaming-endpoints-and-more-metrics/) for more examples and details about these improvements.
-->

These release notes highlight the most important features and bugfixes. For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases).

<!-- add video from blog {{< youtube id=" " >}} -->

## Features and enhancements

The most important features and enhancements in Tempo 2.6 are highlighted below.

### Additional TraceQL metrics (experimental)

In this release, we’ve added several [TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/). In Tempo 2.6, TraceQL metrics adds:

* Exemplars [[PR 3824](https://github.com/grafana/tempo/pull/3824), [documentation](https://grafana.com/docs/tempo/next/traceql/metrics-queries/#exemplars)]
* Instant metrics queries using `/api/metrics/query` [[PR 3859](https://github.com/grafana/tempo/pull/3859), [documentation](https://grafana.com/docs/tempo/next/api_docs/#traceql-metrics)]
* A `q` parameter to tag-name filtering the search v2 API [[PR 3822](https://github.com/grafana/tempo/pull/3822), [documentation](https://grafana.com/docs/tempo/next/api_docs/#traceql-metrics)]
* A new `compare()` metrics function [[PR 3695](https://github.com/grafana/tempo/pull/3695), [documentation](https://grafana.com/docs/tempo/latest/traceql/metrics-queries/#functions)]

Additionally, we're working on refactoring the replication factor. Refer to the [Operational change for TraceQL metrics](#operational-change-for-traceql-metrics) section for details.

Note that using TraceQL metrics may require additional system resources.

For more information, refer to the [TraceQL metrics queries](https://grafana.com/docs/tempo/latest/traceql/metrics-queries/) and [Configure TraceQL metrics](https://grafana.com/docs/tempo/latest/operations/traceql-metrics/).

### TraceQL improvements

Unique to Tempo, TraceQL is a query language that lets you perform custom queries into your tracing data. To learn more about the TraceQL syntax, refer to the [TraceQL documentation](https://grafana.com/docs/tempo/latest/traceql/).

We’ve added event attributes and link scopes. Like spans, they both have instrinsics and attributes.

The `event` scope lets you query events that happen within a span. A span event is a unique point in time during the span’s duration. While spans help build the structural hierarchy of your services, span events can provide a deeper level of granularity to help debug your application faster and maintain optimal performance. To learn more about how you can use span events, read the [What are span events?](https://grafana.com/blog/2024/08/15/all-about-span-events-what-they-are-and-how-to-query-them/) blog post. [PRs [3708](https://github.com/grafana/tempo/pull/3708), [3708](https://github.com/grafana/tempo/pull/3748), [3908](https://github.com/grafana/tempo/pull/3908)]

If you've instrumented your traces for span links, you can use the `link` scope to search for an attribute within a span link. A span link associates one span with one or more other spans. [PRs [3814](https://github.com/grafana/tempo/pull/3814), [3741](https://github.com/grafana/tempo/pull/3741)]

For more information on span links, refer to the [Span Links](https://opentelemetry.io/docs/concepts/signals/traces/#span-links) documentation in the Open Telemetry project.

You can search for an attribute in your link:

```
{ link.opentracing.ref_type = "child_of" }
```

![A TraceQL example showing `link` scope](/media/docs/grafana/data-sources/tempo/query-editor/traceql-link-example.png)

We’ve also added autocomplete support for `events` and `links`. [[PR 3846](https://github.com/grafana/tempo/pull/3846)]

Tempo 2.6 improves TraceQL performance with these updates:

* Performance improvement for `rate() by ()` queries [[PR 3719](https://github.com/grafana/tempo/pull/3719)]
* Add caching to query range queries [[PR 3796](https://github.com/grafana/tempo/pull/3796)]
* Only stream diffs on metrics queries [[PR 3808](https://github.com/grafana/tempo/pull/3808)]
* Tag value lookup use protobuf internally for improved latency [[PR 3731](https://github.com/grafana/tempo/pull/3731)]
* TraceQL metrics queries use protobuf internally for improved latency [[PR 3745](https://github.com/grafana/tempo/pull/3745)]
* TraceQL search and other endpoints use protobuf internally for improved latency and resource usage [[PR 3944](https://github.com/grafana/tempo/pull/3944)]
* Add local disk caching of metrics queries in local-blocks processor [[PR 3799](https://github.com/grafana/tempo/pull/3799)]
* Performance improvement for queries using trace-level intrinsics [[PR 3920](https://github.com/grafana/tempo/pull/3920)]
* Use multiple goroutines to unmarshal responses in parallel in the query frontend. [[PR 3713](https://github.com/grafana/tempo/pull/3713)]

### Native histogram support

The metrics-generator can produce native histograms for high-resolution data. [PR 3789](https://github.com/grafana/tempo/pull/3789)

Native histograms are a data type in Prometheus that can produce, store, and query high-resolution histograms of observations. It usually offers higher resolution and more straightforward instrumentation than classic histograms.

To learn more, refer to the [Native histogram](https://grafana.com/docs/tempo/&lt;TEMPO_VERSION>/metrics-generator/#native-histograms) documentation.

### Performance improvements

One of our major improvements in Tempo 2.6 is the reduction of memory usage due to polling improvements. [PRs [3950](https://github.com/grafana/tempo/pull/3950), [3951](https://github.com/grafana/tempo/pull/3951), [3952](https://github.com/grafana/tempo/pull/3952])

![Comparison graph showing the reduction of memory usage due to the polling improvements](/media/docs/tempo/tempo-2-6-poll-improvement.png)

This improvement is a result of some of these changes:

* Add data quality metric to measure traces without a root [[PR 3812](https://github.com/grafana/tempo/pull/3812)]
* Reduce memory consumption of query-frontend [[PR 3888](https://github.com/grafana/tempo/pull/3888)]
* Reduce allocs of caching middleware [[PR 3976](https://github.com/grafana/tempo/pull/3976)]
* Reduce allocs building queriers sharded requests [[PR 3932](https://github.com/grafana/tempo/pull/3932)]
* Improve trace id lookup from Tempo Vulture by selecting a date range [[PR 3874](https://github.com/grafana/tempo/pull/3874)]

### Other enhancements and improvements

This release also has these notable updates:

* Bring back OTel receiver metrics. [[PR 3917](https://github.com/grafana/tempo/pull/3917)]
* Add a `q` parameter to `/api/v2/search/tags` for tag name filtering. [[PR 3822](https://github.com/grafana/tempo/pull/3822)]
* Add middleware to block matching URLs. [[PR 3963](https://github.com/grafana/tempo/pull/3963)]
* Add data quality metric to measure traces without a root. [[PR 3812](https://github.com/grafana/tempo/pull/3812)]
* Implement polling tenants concurrently. [[PR 3647](https://github.com/grafana/tempo/pull/3647)]
* Add [native histograms](https://grafana.com/docs/tempo/next/metrics-generator/#native-histograms) for internal metrics [[PR 3870](https://github.com/grafana/tempo/pull/3870)]
* Add a Tempo CLI command to drop traces by id by rewriting blocks. [[PR 3856](https://github.com/grafana/tempo/pull/3856), [documentation](https://grafana.com/docs/tempo/next/operations/tempo_cli/#drop-trace-by-id)]
* Add new OTel compatible Traces API V2. [[PR 3912](https://github.com/grafana/tempo/pull/3912), [documentation](https://grafana.com/docs/tempo/next/api_docs/#query-v2)]
* Rename `Batches` to `ResourceSpans`. [[PR 3895](https://github.com/grafana/tempo/pull/3895)]

## Upgrade considerations

When [upgrading](https://grafana.com/docs/tempo/latest/setup/upgrade/) to Tempo 2.6, be aware of these considerations and breaking changes.

### Operational change for TraceQL metrics

We've changed to an RF1 (Replication Factor 1) pattern for TraceQL metrics as we were unable to hit performance goals for RF3 de-duplication. This requires some operational changes to query TraceQL metrics.

TraceQL metrics are still considered experimental. We hope to mark them GA soon when we productionize a complete RF1 write-read path. [PRs [3628](https://github.com/grafana/tempo/pull/3628), [3691]([https://github.com/grafana/tempo/pull/3691](https://github.com/grafana/tempo/pull/3691)), [3723]([https://github.com/grafana/tempo/pull/3723](https://github.com/grafana/tempo/pull/3723)), [3995]([https://github.com/grafana/tempo/pull/3995](https://github.com/grafana/tempo/pull/3995))]

**For recent data**

The local-blocks processor must be enabled to start using metrics queries like `{ } | rate()`. If not enabled metrics queries fail with the error `localblocks processor not found`. Enabling the local-blocks processor can be done either per tenant or in all tenants.

* Per-tenant in the per-tenant overrides:

```yaml
overrides:
'tenantID':
metrics_generator_processors:
- local-blocks
```
* By default, for all tenants in the main config:
```yaml
overrides:
defaults:
metrics_generator:
processors: [local-blocks]
```
Add this configuration to run TraceQL metrics queries against all spans (and not just server spans):
```yaml
metrics_generator:
processor:
local_blocks:
filter_server_spans: false
```
**For historical data**
To run metrics queries on historical data, you must configure the local-blocks processor to flush rf1 blocks to object storage:
```yaml
metrics_generator:
processor:
local_blocks:
flush_to_storage: true
```
### Transition to vParquet4
vParquet4 format is now the default block format.
It's production ready and we highly recommend switching to it for improved query performance. [PR [3810](https://github.com/grafana/tempo/pull/3810)]
Upgrading to Tempo 2.6 modifies the Parquet block format.
Although you can use Tempo 2.6 with vParquet2 or vParquet3, you can only use Tempo 2.6 with vParquet3.
You can also use the `tempo-cli analyse blocks` command to query vParquet4 blocks. [PR 3868](https://github.com/grafana/tempo/pull/3868)].
Refer to the [Tempo CLI ](https://grafana.com/docs/tempo/next/operations/tempo_cli/#analyse-blocks)documentation for more information.

For information on upgrading, refer to [Upgrade to Tempo 2.6](https://grafana.com/docs/tempo/next/setup/upgrade/) and [Choose a different block format](https://grafana.com/docs/tempo/next/configuration/parquet/#choose-a-different-block-format).

### Updated, removed, or renamed configuration parameters

<table>
<tr>
<td><strong>Parameter</strong>
</td>
<td><strong>Comments</strong>
</td>
</tr>
<tr>
<td>
<code>storage:<br />
&nbsp;&nbsp; azure:<br />
&nbsp;&nbsp;&nbsp;&nbsp;use_v2_sdk: </code>
</td>
<td>Removed. Azure v2 is the only and primary Azure backend [PR <a href="https://github.com/grafana/tempo/pull/3875">3875</a>]
</td>
</tr>
<tr>
<td><code>autocomplete_filtering_enabled</code>
</td>
<td>The feature flag option has been removed. The feature is always enabled. [PR <a href="https://github.com/grafana/tempo/pull/3729">3729</a>]
</td>
</tr>
<tr>
<td><code>completedfilepath</code> and <code>blocksfilepath</code>
</td>
<td>Removed unused WAL configuration options. [PR <a href="https://github.com/grafana/tempo/pull/3911">3911</a>]
</td>
</tr>
<tr>
<td><code>compaction_disabled</code>
</td>
<td>New. Allow compaction disablement per-tenant. [PR <a href="https://github.com/grafana/tempo/pull/3965">3965</a>, <a href="https://grafana.com/docs/tempo/next/configuration/#overrides">documentation</a>]
</td>
</tr>
<tr>
<td>
<code>
Storage:<br />
&nbsp;&nbsp;s3:<br />
&nbsp;&nbsp;&nbsp;&nbsp;[enable_dual_stack: &lt;bool>]</code>
</td>
<td>Boolean flag to activate or deactivate <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/dual-stack-endpoints.html">dualstack mode</a> on the Storage block configuration for S3. [PR <a href="https://github.com/grafana/tempo/pull/3721">3721</a>, <a href="https://grafana.com/docs/tempo/next/configuration/#standard-overrides">documentation</a>]
</td>
</tr>
</table>

## Bugfixes

For a complete list, refer to the [Tempo changelog](https://github.com/grafana/tempo/releases).

* Fix panic in certain metrics queries using `rate()` with `by`. [[PR 3847](https://github.com/grafana/tempo/pull/3847)]
* Fix metrics queries when grouping by attributes that may not exist. [[PR 3734](https://github.com/grafana/tempo/pull/3734)]
* Fix metrics query histograms and quantiles on `traceDuration`. [[PR 3879](https://github.com/grafana/tempo/pull/3879)]
* Fix divide by 0 bug in query frontend exemplar calculations. [[PR 3936](https://github.com/grafana/tempo/pull/3936)]
* Fix autocomplete of a query using scoped instrinsics. [[PR 3865](https://github.com/grafana/tempo/pull/3865)]
* Improved handling of complete blocks in localblocks processor after enabling flushing. [[PR 3805](https://github.com/grafana/tempo/pull/3805)]
* Fix double appending the primary iterator on second pass with event iterator. [[PR 3903](https://github.com/grafana/tempo/pull/3903)]
* Fix frontend parsing error on cached responses [[PR 3759](https://github.com/grafana/tempo/pull/3759)]
* `max_global_traces_per_user`: take into account `ingestion.tenant_shard_size` when converting to local limit. [[PR 3618](https://github.com/grafana/tempo/pull/3618)]
* Fix HTTP connection reuse on GCP and AWS by reading `io.EOF` through the `http` body. [[PR 3760](https://github.com/grafana/tempo/pull/3760)]
* Handle out of boundaries spans kinds. [[PR 3861](https://github.com/grafana/tempo/pull/3861)]
* Maintain previous tenant blocklist on tenant errors. [PR 3860](https://github.com/grafana/tempo/pull/3860)
* Fix prefix handling in Azure backend `Find()` call. [[PR 3875](https://github.com/grafana/tempo/pull/3875)]
* Correct block end time when the ingested traces are outside the ingestion slack. [[PR 3954](https://github.com/grafana/tempo/pull/3954)]
* Fix race condition where a streaming response could be marshaled while being modified in the combiner resulting in a panic. [[PR 3961](https://github.com/grafana/tempo/pull/3961)]
* Pass search options to the backend for `SearchTagValuesBlocksV2` requests. [[PR 3971](https://github.com/grafana/tempo/pull/3971)]
Loading

0 comments on commit c14b2ab

Please sign in to comment.