Releases: openzipkin/zipkin
Zipkin 1.25
Zipkin 1.25 lets you disable the query api when deploying collector-only services. It also lets you log http requests sent to Elasticsearch. Finally, it fixes a bug where a non-default MySQL schema would fail health checks.
Disabling the UI and Query api for collector-only servers
@SirTyro's security team wants collectors deployed separately, in a way that reduces exposure if compromised. You can now disable the api and UI by setting QUERY_ENABLED=false. Thanks to @shakuzen for help implementing this.
Understanding Zipkin's requests to Elasticsearch
Reflecting on a troubleshooting session with @ezraroi, we could have used more data to understand why an Elasticsearch index template was missing. This would have saved us time. You can now set ES_HTTP_LOGGING=BASIC to see what traffic is sent from zipkin to Elasticsearch. Other options include HEADERS and BODY. Thanks to OkHttp for the underlying interceptor that does this.
Fixed health check when you have a non-default MySQL schema
@zhanglc stumbled upon a bug where the health check misreported a service as unhealthy if it had a non-default schema. This is now fixed.
Zipkin 1.24
Zipkin 1.24 enables search by binary annotation (aka tag) key. It also adds a helper for those parsing IP addresses. Finally, it fixes a bug where server-side service names weren't indexed.
Search by binary annotation (aka tag) key
Before, you could only search by exact key=value match on binary annotations (aka tags).
Thanks to @kellabyte for noticing we can gain a lot of value by allowing search on tag
key. Ex "error" search now returns any traces that include an error, regardless of the
message.
This change is now in, and here's the impact:
- Cassandra will index a bit more: once per unique service/tag key
- Elasticsearch now does two nested queries when looking for a key
- MySQL now considers all annotation rows when looking for a key
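The difference between the two search forms can be sketched in memory as follows. This is only an illustration: the class TagQuery and its method are hypothetical names, and the real storage backends implement the lookup with their own indexes and queries.

```java
import java.util.Map;

/** Hypothetical in-memory matcher; not Zipkin's storage code. */
final class TagQuery {
  /**
   * Returns true when the tags satisfy the query. A query of the form
   * "key=value" needs an exact match; a bare "key" matches any value.
   */
  static boolean matches(Map<String, String> tags, String query) {
    int eq = query.indexOf('=');
    if (eq == -1) return tags.containsKey(query); // key-only search
    String key = query.substring(0, eq);
    String value = query.substring(eq + 1);
    return value.equals(tags.get(key)); // exact key=value search
  }
}
```

With this sketch, a query of "error" matches a span tagged error=timeout, while "error=500" does not.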
Helps tracers be more safe about IP Addresses
Before, tracers including Brave and Finagle blindly assumed addresses
and strings were IPv4. While that's usually the case, it can lead to
very late problems, such as runtime exceptions.
Zipkin 1.24 adds a utility to encourage safe parsing practice of potentially
null inputs. This re-uses code from guava (without adding a dependency),
avoiding troublesome IP from name service lookups.
Ex. if your input is an HttpServletRequest, the following is safe:
if (!builder.parseIp(input.getHeader("X-Forwarded-For"))) {
  builder.parseIp(input.getRemoteAddr());
}
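To show why literal-only parsing avoids name service lookups, here is a minimal sketch of the idea. The class IpLiterals and its methods are hypothetical and handle only dotted-quad IPv4; this is not the utility's actual implementation, and it ignores IPv6 and comma-separated X-Forwarded-For values.

```java
/** Hypothetical helper illustrating literal-only IPv4 parsing; not Zipkin's implementation. */
final class IpLiterals {
  /** Returns the 4 address bytes, or null if input is null or not a dotted-quad IPv4 literal. */
  static byte[] parseIpv4(String input) {
    if (input == null) return null;
    String[] parts = input.split("\\.", -1);
    if (parts.length != 4) return null;
    byte[] result = new byte[4];
    for (int i = 0; i < 4; i++) {
      String part = parts[i];
      if (part.isEmpty() || part.length() > 3) return null;
      int octet = 0;
      for (int j = 0; j < part.length(); j++) {
        char c = part.charAt(j);
        if (c < '0' || c > '9') return null; // never resolves host names
        octet = octet * 10 + (c - '0');
      }
      if (octet > 255) return null;
      result[i] = (byte) octet;
    }
    return result;
  }

  /** Tries the forwarded header value first, then falls back to the remote address. */
  static byte[] clientIp(String forwardedFor, String remoteAddr) {
    byte[] ip = parseIpv4(forwardedFor);
    return ip != null ? ip : parseIpv4(remoteAddr);
  }
}
```

Because invalid or null input yields null instead of an exception, callers can chain fallbacks the same way the parseIp example does.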
Fixed mid-tier service name indexing
@garyd203 stumbled upon a bug where we weren't indexing mid-tier service names.
Basically, you couldn't search for a service that wasn't itself a client of something else.
Surprisingly, this affected all data stores. Lots of thanks to Gary for writing the test,
which made implementation a breeze.
Zipkin 1.23
Zipkin 1.23 improves Elasticsearch performance and simplifies image assembly
Elasticsearch performance
The zipkin UI has drop-downs for service and span name. These queries can become troublesome as the data set grows. Particularly in Elasticsearch, we had poor performance because nested terms queries are expensive. We now pre-index a "servicespan" type which flattens the query, making it far more performant. You have no action to take except to upgrade. Thanks especially to @semyonslepov and @devinsba for rolling this out to production and verifying this improves things.
Simplified image assembly
Besides normal advantages of updating, Spring Boot 1.5 cleaned up an integration pattern we use for plugging in azure and aws support into our stock server. These layered docker images are easier to understand and simpler now. Thanks to @dsyer for championing our cause upstream.
Zipkin 1.22
Zipkin 1.22 disables the Scribe collector and removes the Elasticsearch native transport. It also includes some bug fixes and new knobs. Many thanks to our new contributors for help on this release.
Scribe disabled by default (#1540)
Scribe (thrift RPC listening port 9410) was the original span transport. Most sites stopped using it after we added http and Kafka support (and more recently AWS and Azure collectors). Meanwhile, people have been confused about port 9410, accidentally sending http traffic to it. Moreover, those wanting a single-port for zipkin were thwarted by this. Starting with Zipkin 1.22, we disable scribe by default, which means zipkin only listens on port 9411. Those who want to turn on scribe must set SCRIBE_ENABLED=true.
Elasticsearch native transport dropped (#1547)
Most tools in the Elasticsearch world use HTTP as means to insert or query data. Before, we supported storage commands via the http transport (port 9200) or the native transport (port 9300). Polling users, we found no resistance to removing the native transport. By removing this, we deleted 3.6K lines of code and simplified the dependencies of the project, allowing for easier maintenance moving forward. To reduce impact, if you have configuration pointing to port 9300, the server will attempt to use 9200 instead.
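The port fallback described above could look roughly like the sketch below. EsHosts and toHttpUrl are hypothetical names, and the assumption that a bare host:port entry maps to plain http is ours; the server's real logic may differ.

```java
/** Hypothetical sketch of the port fallback; the server's real logic may differ. */
final class EsHosts {
  /** Converts a legacy "host:9300" entry into an http URL on port 9200. */
  static String toHttpUrl(String host) {
    // Entries that already name a scheme are left untouched.
    if (host.startsWith("http://") || host.startsWith("https://")) return host;
    if (host.endsWith(":9300")) {
      host = host.substring(0, host.length() - ":9300".length()) + ":9200";
    }
    return "http://" + host; // assumption: plain http for bare host entries
  }
}
```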
Elasticsearch X-Pack (formerly Shield) security (#1548)
A number of users requested the ability to authenticate connections to secured Elasticsearch sites (which use X-Pack, formerly Shield, security). @StephenWithPH championed this issue, and @NithinMadhavanpillai implemented it (as his first pull request!). Thanks to these folks, you can now set ES_USERNAME and ES_PASSWORD accordingly.
Zipkin UI sort order is now preserved (#1543)
Another lingering nag we had was that when people selected a sort order in the UI, that order wasn't preserved in follow-up queries. You can imagine this was annoying. @joel-airspring picked this up off the backlog and implemented a fix to the problem (first pull request to Zipkin!). Fewer mouse clicks are thanks to him!
Other small fixes
Zipkin 1.20
Zipkin 1.20 focuses on Elasticsearch
There are two main changes:
@ImFlog added a new parameter which helps when you can't use the hyphenated date format
ES_DATE_SEPARATOR: The separator used when generating dates in the index name.
Defaults to '-' so the queried index looks like zipkin-yyyy-MM-dd.
Could for example be changed to '.' to give zipkin-yyyy.MM.dd
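As a rough illustration of how the separator shapes daily index names, consider the sketch below. IndexNames and dailyIndex are hypothetical names chosen for this example; the storage component's own naming code may differ.

```java
import java.time.LocalDate;
import java.util.Locale;

/** Hypothetical sketch of daily index naming; names here are illustrative. */
final class IndexNames {
  /** Formats "prefix-yyyy<sep>MM<sep>dd" for the given day. */
  static String dailyIndex(String prefix, char separator, LocalDate day) {
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    return String.format(Locale.ROOT, "%s-%04d%c%02d%c%02d", prefix,
        day.getYear(), separator, day.getMonthValue(), separator, day.getDayOfMonth());
  }
}
```

Note that the hyphen after the prefix stays fixed; only the separators inside the date change.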
Moving to http for communication between Zipkin and Elasticsearch
We also decoupled elasticsearch from the transport protocol. Those using the server won't see
impact as it is transparent.
Those not yet using http should switch as soon as possible as we won't support
the transport protocol going forward (#1511).
If you are using Amazon or the zipkin-dependencies spark job, you are unaffected
as they always used http.
If you are using Elasticsearch with zipkin-server, you'd transition like below:
# this implicitly uses the transport protocol
$ STORAGE_TYPE=elasticsearch ES_HOSTS=1.2.3.4:9300,5.6.7.8:9300 ...
# change to this, for the http protocol
$ STORAGE_TYPE=elasticsearch ES_HOSTS=http://1.2.3.4:9200,http://5.6.7.8:9200 ...
If you are using Zipkin's Elasticsearch storage library directly, you'd transition like below:
// this implicitly uses the transport protocol
es = ElasticsearchStorage.builder()
    .index("my_custom_prefix").build();
// change to this, for the http protocol
es = ElasticsearchHttpStorage.builder()
    .index("my_custom_prefix").build();
Note that ElasticsearchHttpStorage works with Elasticsearch 2.x+ and only has library dependencies on OkHttp, Moshi, and Zipkin itself. Unlike its predecessor ElasticsearchStorage, you aren't pinned to a specific ES or Guava library version. (#1431)
Zipkin 1.19
Zipkin 1.19 includes UI and server improvements and controls in-flight requests to Elasticsearch
Our first release of the new year includes code from a couple Zipkin newcomers: Jakub and Nomy
- @jakubhava beautified the UI when json is used as a tag (aka binary annotation) value (#1458)
- @naoman fixed span merge logic (#1443) and an edge case in very large tags (#1451)
We're also lucky that Chris and Jeanneret continue to fix issues for the rest of us
- @fedj solved a couple hard-to-diagnose errors
- @cburroughs continues to pare down the UI issue backlog
We also have a new feature for Elasticsearch users (those using http to connect)
Before, we limited in-flight http connections per Elasticsearch host to 5. This is now 64 and can be adjusted by the ES_MAX_REQUESTS variable. (#1450)
Zipkin 1.18
Zipkin 1.18 includes a number of UI fixes and exposes arbitrary Kafka configuration
Thanks to @cburroughs, many Zipkin UI glitches are addressed, including the ability to escape out of dialog boxes and fix the default trace view to Expand All services. Chris also reduced its minified size from 2.2 MiB to 821KiB, which will improve load performance and also reduce bandwidth usage.
Also notable in 1.18 is Kafka configuration. You can now override any Kafka consumer property using the zipkin.collector.kafka.overrides prefix, as a CLI argument or system property.
For example, to override "auto.offset.reset", you can set a prefixed system property:
$ KAFKA_ZOOKEEPER=127.0.0.1:2181 java -Dzipkin.collector.kafka.overrides.auto.offset.reset=largest -jar zipkin.jar
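A prefix-based override like this can be sketched as below. KafkaOverrides and extract are hypothetical names for illustration only; the collector's actual binding code is not shown in these notes.

```java
import java.util.Properties;

/** Hypothetical sketch of prefix-based overrides; not the collector's actual code. */
final class KafkaOverrides {
  static final String PREFIX = "zipkin.collector.kafka.overrides.";

  /** Copies every property starting with PREFIX into the result, minus the prefix. */
  static Properties extract(Properties source) {
    Properties result = new Properties();
    for (String name : source.stringPropertyNames()) {
      if (name.startsWith(PREFIX)) {
        result.setProperty(name.substring(PREFIX.length()), source.getProperty(name));
      }
    }
    return result;
  }
}
```

Calling KafkaOverrides.extract(System.getProperties()) after launching with the command above would yield a single entry, auto.offset.reset=largest.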
Thanks to our volunteers for continued improvements, and to our users for improvement suggestions.
Zipkin 1.16
Zipkin 1.16 includes support for Elasticsearch 5, as well as some useful features from new contributors.
- @ys added SSL support for Cassandra, accessed via the CASSANDRA_USE_SSL variable
- @cburroughs made our json parser lenient with regards to IPv4 mapped addresses in json
- All docker images are bumped to JRE 1.8.0_112
elasticsearch-http storage type now supports Elasticsearch 5...
Some expressed interest in the new ingest pipeline feature of ES 5. In the context of Zipkin, pipelines affect spans after they are collected, but before they are indexed. For example, pipelines could correct service names, delete sensitive information, or derive lookup keys. In some cases, this feature obviates the need for a custom zipkin build.
For example, the following pipeline adds a timestamp of when a span got to elasticsearch. You could use that to plot reporting lag. Since the pipeline below is named zipkin, you'd set ES_PIPELINE=zipkin to enable it.
$ curl -X PUT -s your_elasticsearch_node:9200/_ingest/pipeline/zipkin -d '{
  "description" : "add es_timestamp",
  "processors" : [
    {
      "set" : {
        "field": "es_timestamp",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}'
Zipkin 1.15
Zipkin 1.15 completes the transition to support 128-bit trace IDs, notably considering high resolution ids when querying and grouping traces.
Regular zipkin usage is unaffected as this is all behind the scenes. However, the details below will be interesting to some and are particularly of note during any transition from 64 to 128-bit trace IDs.
128-bit trace IDs
Zipkin supports 64 and 128-bit trace identifiers, typically serialized
as 16 or 32 character hex strings. By default, spans reported to zipkin
with the same trace ID will be considered in the same trace.
For example, 463ac35c9f6413ad48485a3953bb6124 is a 128-bit trace ID,
while 48485a3953bb6124 is a 64-bit one.
Note: Span (or parent) IDs within a trace are 64-bit regardless of the
length or value of their trace ID.
Migrating from 64 to 128-bit trace IDs
Unless you only issue 128-bit traces when all applications support them,
the process of updating applications from 64 to 128-bit trace IDs results
in a mixed state. This mixed state is mitigated by the setting
STRICT_TRACE_ID=false, explained below. Once a migration is complete,
remove the setting STRICT_TRACE_ID=false or set it to true.
Here are a few trace IDs that help explain what happens with this setting.
- Trace ID A: 463ac35c9f6413ad48485a3953bb6124
- Trace ID B: 48485a3953bb6124
- Trace ID C: 463ac35c9f6413adf1a48a8cff464e0e
- Trace ID D: 463ac35c9f6413ad
In a 64-bit environment, trace IDs will look like B or D above. When an
application upgrades to 128-bit instrumentation and decides to create a
128-bit trace, its trace IDs will look like A or C above.
Applications that aren't yet 128-bit capable typically only retain the
right-most 16 characters of the trace ID. When this happens, the same
trace could be reported as trace ID A or trace ID B.
By default, Zipkin will think these are different trace IDs, as they are
different strings. During a transition from 64 to 128-bit trace IDs, spans
would appear split across two IDs. For example, it might start as trace
ID A, but the next hop might truncate it to trace ID B. This would render
the system unusable for applications performing upgrades.
One way to address this problem is to not use 128-bit trace IDs until
all applications support them. This prevents a mixed scenario at the cost
of coordination. Another way is to set STRICT_TRACE_ID=false.
When STRICT_TRACE_ID=false, only the right-most 16 of a 32 character
trace ID are considered when grouping or retrieving traces. This setting
should only be applied when transitioning from 64 to 128-bit trace IDs
and removed once the transition is complete.
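The grouping behavior described above can be sketched as follows, using trace IDs A through D from the list. TraceIdGrouping is a hypothetical class written for this illustration; it is not how any Zipkin storage component is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of trace ID grouping; not Zipkin's storage code. */
final class TraceIdGrouping {
  /** Returns the grouping key: the full ID when strict, else its right-most 16 hex characters. */
  static String groupKey(String traceId, boolean strictTraceId) {
    if (strictTraceId || traceId.length() <= 16) return traceId;
    return traceId.substring(traceId.length() - 16);
  }

  /** Groups trace IDs by key, preserving first-seen order. */
  static Map<String, List<String>> group(List<String> traceIds, boolean strictTraceId) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    for (String id : traceIds) {
      result.computeIfAbsent(groupKey(id, strictTraceId), k -> new ArrayList<>()).add(id);
    }
    return result;
  }
}
```

With strictTraceId set to false, A and B land in the same group because A's right-most 16 characters equal B, while C and D stay separate. With it set to true, all four IDs are distinct.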
See openzipkin/b3-propagation#6 for the status
of known open source libraries on 128-bit trace identifiers.
Cassandra
There's no impact to the cassandra (Cassandra 2.x) schema. The experimental cassandra3 schema has changed and needs to be recreated.
Elasticsearch
When STRICT_TRACE_ID=false, the indexing template will be less efficient as it tokenizes trace IDs. Don't set STRICT_TRACE_ID=false unless you really need to.
MySQL
There are no schema changes since the last version, but you'll likely want to add indexes in consideration of 128-bit trace IDs.
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`, `id`);
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`);
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`);
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`);
Java Api
The STRICT_TRACE_ID variable above corresponds to zipkin.storage.StorageComponent.Builder.strictTraceId. Those using storage components directly will want to set this to false under similar circumstances to those described above.
We've added methods to SpanStore, in support of high-resolution gets. Traces with 64-bit ids are retrieved by simply passing 0 as traceIdHigh.
@Nullable
List<Span> getTrace(long traceIdHigh, long traceIdLow);
@Nullable
List<Span> getRawTrace(long traceIdHigh, long traceIdLow);
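Callers holding a hex trace ID need to split it into the two longs these methods take. The helper below is a minimal sketch of one way to do that; HexTraceId is a hypothetical class, not part of Zipkin's API, and it assumes well-formed hex input of at most 32 characters.

```java
/** Hypothetical helper for splitting a hex trace ID; not part of Zipkin's API. */
final class HexTraceId {
  /** Upper 64 bits of a 32-character ID, or 0 for a 16-character (64-bit) ID. */
  static long high(String hex) {
    if (hex.length() <= 16) return 0L;
    return Long.parseUnsignedLong(hex.substring(0, hex.length() - 16), 16);
  }

  /** Lower 64 bits: the right-most 16 hex characters. */
  static long low(String hex) {
    int start = Math.max(0, hex.length() - 16);
    return Long.parseUnsignedLong(hex.substring(start), 16);
  }
}
```

Given a SpanStore named spanStore (an assumed variable), a lookup would then read spanStore.getTrace(HexTraceId.high(id), HexTraceId.low(id)), which passes 0 as traceIdHigh for 64-bit IDs.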
Zipkin 1.14
Zipkin 1.14 introduces support for 128-bit trace identifiers
Most zipkin sites store traces for a limited amount of time (like 2 days) and also trace a small percentage of operations (via sampling). For these reasons and also those of simplicity, 64-bit trace identifiers have been the norm since zipkin started over 4 years ago.
Starting with Zipkin 1.14, 128-bit trace identifiers are also supported. This can be useful in sites that have very large traffic volume, persist traces forever, or are re-using externally generated 128-bit IDs as trace IDs. You can also use 128-bit trace ids to interop with other 128-bit systems such as Google Stackdriver Trace. Note: span IDs within a trace are still 64-bit.
When 128-bit trace ids are propagated, they will be twice as long as before. For example, the X-B3-TraceId header will hold a 32-character value like 163ac35c9f6413ad48485a3953bb6124. Prior to Zipkin 1.14, we updated all major tracing libraries to silently truncate long trace ids to 64-bit. With the example noted, its 64-bit counterpart would be 48485a3953bb6124. For the foreseeable future, you will be able to look up a trace by either its 128-bit or 64-bit ID. This allows you to upgrade your instrumentation and environment in steps.
Should you want to use 128-bit tracing today, you'll need to update to latest Zipkin, and if using MySQL, issue the following DDL update:
ALTER TABLE zipkin_spans ADD `trace_id_high` BIGINT NOT NULL DEFAULT 0;
ALTER TABLE zipkin_annotations ADD `trace_id_high` BIGINT NOT NULL DEFAULT 0;
ALTER TABLE zipkin_spans
DROP INDEX trace_id,
ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `id`) COMMENT 'ignore insert on duplicate';
ALTER TABLE zipkin_annotations
DROP INDEX trace_id,
ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
Next, you'll need to use a library that supports generating 128-bit ids. The first two to support this are zipkin-go-opentracing v0.2 and Brave (java) v3.5. The supporting change in thrift is a new trace_id_high field.
If you have any further questions on this feature, reach out to us on gitter: https://gitter.im/openzipkin/zipkin