Add doc about long-running traces #3977

Merged 10 commits on Aug 22, 2024
3 changes: 2 additions & 1 deletion docs/sources/tempo/troubleshooting/_index.md
@@ -26,7 +26,8 @@ In addition, the [Tempo runbook](https://github.com/grafana/tempo/blob/main/oper
- [Queries fail with 500 and "error using pageFinder"]({{< relref "./bad-blocks" >}})
- [I can search traces, but there are no service name or span name values available]({{< relref "./search-tag" >}})
- [Error message `response larger than the max (<number> vs <limit>)`]({{< relref "./response-too-large" >}})
- [Search results don't match trace lookup results with long-running traces]({{< relref "./long-running-traces" >}})

## Metrics Generator

- [Metrics or service graphs seem incomplete]({{< relref "./metrics-generator" >}})
62 changes: 62 additions & 0 deletions docs/sources/tempo/troubleshooting/long-running-traces.md
@@ -0,0 +1,62 @@
---
title: Long-running traces
description: Troubleshoot search results when using long-running traces
weight: 479
aliases:
- ../operations/troubleshooting/long-running-traces/
---

# Long-running traces

Long-running traces are created when Tempo receives spans for a trace,
followed by a delay, and then Tempo receives additional spans for the same
trace. If the delay between spans is great enough, the spans end up in
different blocks, which can lead to inconsistency in a few ways:

1. When using TraceQL search, the duration information pertains only to a
   subset of the blocks that contain a trace. This happens because Tempo
   consults only enough blocks to identify the trace IDs of the matching
   spans. A trace ID lookup, by contrast, searches all matching blocks for
   every part of a trace, which yields a more accurate combined result.

1. When using [`spanset`
   operators](https://grafana.com/docs/tempo/latest/traceql/#combining-spansets),
   Tempo evaluates only the contiguous portion of the trace in the current
   block. For a single block, the conditions may evaluate to false even though
   considering all parts of the trace across all blocks would evaluate to
   true, as illustrated below.
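
For example, consider this TraceQL query, which uses the `&&` spanset
operator. The service names are illustrative:

```
{ resource.service.name = "frontend" } && { resource.service.name = "backend" }
```

If the frontend and backend spans of a trace are flushed to different blocks,
neither block alone satisfies both conditions, so a search can miss a trace
that matches when taken as a whole.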

You can tune the `ingester.trace_idle_period` configuration for greater
control over when traces are written to a block. Extending this beyond the
default `10s` can allow a long-running trace to be co-located in the same
block, but take into account the additional memory consumption on the
ingesters. Currently, this setting isn't per-tenant, so adjusting it affects
all ingester instances.
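
As a minimal sketch, the relevant section of the Tempo configuration file
looks like this. The `30s` value is illustrative; weigh it against the memory
headroom of your ingesters:

```
ingester:
  # How long to wait after the last span arrives before flushing a trace
  # to the WAL. The default is 10s.
  trace_idle_period: 30s
```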

Tempo publishes a `tempo_warnings_total` metric from several components, which
can help you understand when this situation arises. In particular, the
following query shows what percentage of traces flushed to the WAL are
connected:

```
1 - sum(rate(tempo_warnings_total{reason="disconnected_trace_flushed_to_wal"}[5m])) / sum(rate(tempo_ingester_traces_created_total{}[5m]))
```

If you have long-running traces, you may also be interested in the
`rootless_trace_flushed_to_wal` reason, which indicates when a trace is
flushed to the WAL without a root span.
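
Following the same pattern as the preceding query, you can track the rate of
rootless flushes:

```
sum(rate(tempo_warnings_total{reason="rootless_trace_flushed_to_wal"}[5m]))
```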

You can use the `reason` label for discovery with this query:

```
sum(rate(tempo_warnings_total{}[5m])) by (reason)
```

In general, Tempo performs best when all parts of a trace are stored in as
few blocks as possible. However, there is a wide variety of tracing patterns
in the wild, which makes it impossible to optimize for all of them.

While the preceding information can help you determine what Tempo is doing,
it may also be worth modifying your usage pattern. For example, you can use
[span
links](https://opentelemetry.io/docs/concepts/signals/traces/#span-links) to
split up long-running traces, allowing one trace to complete while pointing to
the next trace in the causal chain. This lets each trace finish in a shorter
duration and increases the chances that it ends up in a single block.
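
As a minimal sketch using the OpenTelemetry Go SDK, the following starts the
second stage as a brand-new trace and links it back to the first, rather than
keeping one trace open across both stages. The tracer and span names are
illustrative, and a real service would also configure an exporter pointed at
Tempo:

```
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

func main() {
	tracer := otel.Tracer("example")

	// Stage 1 runs as its own short trace and ends promptly.
	_, first := tracer.Start(context.Background(), "stage-1")
	first.End()

	// Stage 2 starts a new root trace (no parent context) and carries a
	// span link back to stage-1 instead of extending the same trace.
	_, second := tracer.Start(
		context.Background(),
		"stage-2",
		trace.WithLinks(trace.Link{SpanContext: first.SpanContext()}),
	)
	second.End()
}
```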