Releases: openzipkin/brave

Brave 4.17

04 Mar 07:23

Brave 4.17 adds Dubbo and RabbitMQ instrumentation and a Bill of Materials (BOM) for easier versioning

Dubbo instrumentation

Dubbo is an RPC framework originally written by Alibaba, now in the Apache Incubator. A Dubbo Provider is a server, and a Dubbo Consumer is a client in Zipkin terminology.

We've just added a tracing filter which creates RPC spans based on communication between your services.

It is easy to configure: make sure a Brave Tracing object is available in your Spring context, then indicate you want tracing on like so:

<!-- a provider -->
<dubbo:service filter="tracing" interface="com.alibaba.dubbo.demo.DemoService" ref="demoService"/>

<!-- or a consumer -->
<dubbo:reference filter="tracing" id="demoService" check="false" interface="com.alibaba.dubbo.demo.DemoService"/>

Note: layered projects such as dubbo-spring-boot-starter provide non-XML means to do the same.

Thank you to @blacklau, @sdcuike and @jessyZu, who wrote Dubbo filters for Brave in the past.

Spring RabbitMQ instrumentation

RabbitTemplate commands and @RabbitListener-driven message consumers are now automatically instrumented with brave-spring-rabbit. This is due to a lot of hard work at Tyro and after work, too, by @jonathan-lo. Notably message tracing is tricky, so Jonathan deserves a big shout-out for this.

To set this up, wrap your various bits with SpringRabbitTracing like so (or use a tool that does this automatically):

@Bean
public SpringRabbitTracing springRabbitTracing(Tracing tracing) {
  return SpringRabbitTracing.newBuilder(tracing)
                            .remoteServiceName("my-mq-service")
                            .build();
}

@Bean
public RabbitTemplate rabbitTemplate(
    ConnectionFactory connectionFactory,
    SpringRabbitTracing springRabbitTracing
) {
  RabbitTemplate rabbitTemplate = springRabbitTracing.newRabbitTemplate(connectionFactory);
  // other customizations as required
  return rabbitTemplate;
}

@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
    ConnectionFactory connectionFactory,
    SpringRabbitTracing springRabbitTracing
) {
  return springRabbitTracing.newSimpleMessageListenerContainerFactory(connectionFactory);
}

Afterwards, you'll see PRODUCER and CONSUMER spans coming out of rabbit that look similar to our kafka instrumentation.

Easier versions with Bill of Materials (BOM)

When using multiple brave components, you'll want to align versions in
one place. This allows you to more safely upgrade, with less worry about
conflicts. Thanks to @jorgheymans and @marcingrzejszczak for putting
this together!

You can use our Maven instrumentation BOM (Bill of Materials) for this:

For example, import the BOM in your dependencyManagement section like this:

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>io.zipkin.brave</groupId>
        <artifactId>brave-bom</artifactId>
        <version>${brave.version}</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

Now, you can leave off the version when choosing any supported
instrumentation. Also any indirect use will have versions aligned:

<dependency>
  <groupId>io.zipkin.brave</groupId>
  <artifactId>brave-instrumentation-okhttp3</artifactId>
</dependency>

With the above in place, you can use the properties brave.version,
zipkin-reporter.version or zipkin.version to override dependency
versions coherently. This is most commonly done to test a new feature or fix.

Note: If you override a version, always double check that your version
is valid (equal to or later than the one you are overriding). This will avoid
class conflicts.

Brave 4.16

26 Feb 07:28

Brave 4.16 uses the http route when naming spans. It also adds Jersey server instrumentation.

http route

Zipkin 2.5 introduced a new tag "http.route", which is like "http.path" except it includes variable tokens. This keeps the tag the same for requests that differ only by the IDs in their paths.

An "http.route" could look like "/items/:itemId". Brave 4.16 adds support for parsing this, employing it as the default span name policy when present. A route-based span name looks like "get /users/{userId}", "get not_found" or "get redirected". Please look at the docs for more details.
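As a rough illustration of the naming policy described above, here is a minimal Java sketch. It is not Brave's parser: the RouteSpanNames class, its method, and the exact status-code conditions (3xx maps to "redirected" and 404 to "not_found", both only when no route matched) are assumptions made for illustration.

```java
import java.util.Locale;

/** Hypothetical helper, not part of Brave: derives a span name from an http route. */
final class RouteSpanNames {
  /**
   * @param method http method, ex "GET"
   * @param route matched route template, or empty when nothing matched
   * @param statusCode http status code of the response
   */
  static String spanName(String method, String route, int statusCode) {
    String prefix = method.toLowerCase(Locale.ROOT);
    // A matched route keeps the name stable across requests with different IDs
    if (!route.isEmpty()) return prefix + " " + route;
    // Assumption: with no matched route, fall back to a low-cardinality constant
    if (statusCode >= 300 && statusCode < 400) return prefix + " redirected";
    if (statusCode == 404) return prefix + " not_found";
    // Assumption: otherwise only the method is known
    return prefix;
  }

  public static void main(String[] args) {
    System.out.println(spanName("GET", "/users/{userId}", 200));
    System.out.println(spanName("GET", "", 404));
    System.out.println(spanName("GET", "", 302));
  }
}
```

The point of the fallback constants is cardinality: naming a span after the raw path of a request that matched nothing would create a new span name per bad URL.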

support

The following brave-based instrumentation support route-based span names. More soon!

credits

Both ratpack-zipkin and play-zipkin-tracing formerly used route-based span names. The mechanics for how we parse the name (for example, on the response path) came from a technique @llinder and @hyleung worked out in ratpack. This insight saved a lot of effort. The default naming strategy, notably the redirect and not_found constants, came via micrometer from @jkschneider.

Jersey server instrumentation

JAX-RS is a helpful specification, but it does not make error handling easy, nor does it allow you to access the path template declared on service interfaces.

Through discussion with @jplock, we decided to add jersey-server instrumentation.

This uses an application event listener to do a better job tracing
Jersey servers than what can be done with JAX-RS. For example, the event
listener can handle errors natively, and act more precisely with regards
to jersey-specifics like ManagedAsync.

This instrumentation is used by default in DropWizard Zipkin.

Other updates

The following are some other changes you may want to be aware of

  • "http.method" tag is now added by default (as it is not necessarily the same as the span name anymore)
  • @takezoe fixed generic result types of HttpServerHandler and HttpClientHandler.create() (helps in scala)
  • You can now parse more from the response via HttpAdapter.methodFromResponse and HttpAdapter.route. These are used for the span name.
  • You can now parse the status code as an integer instead of boxing via statusCodeAsInt
  • The JAX-RS client filter is now public so you can install it alongside jersey server instrumentation, and without the container filter.

Brave 4.15

22 Feb 13:28

Brave 4.15 adds support for Vert.x Web

brave-instrumentation-vertx-web contains a routing context handler for Vert.x Web. This extracts trace state from incoming requests. Then, it reports to Zipkin how long each request takes, along with relevant tags like the http url. Register this as a failure handler to ensure any errors are also sent to Zipkin.

To enable tracing you need to set order, handler and failureHandler hooks:

vertxWebTracing = VertxWebTracing.create(httpTracing);
routingContextHandler = vertxWebTracing.routingContextHandler();
router.route()
      .order(-1) // applies before routes
      .handler(routingContextHandler)
      .failureHandler(routingContextHandler);

// any routes you add are now traced, such as the below
router.route("/foo").handler(ctx -> {
    ctx.response().end("bar");
});

Other notes

  • Kafka 1.0 libraries can now be used thx @ImFlog

Brave 4.14

22 Feb 13:18

Brave 4.14 stops publishing packages under "com.github.kristofa.brave" and handles partial trace context extraction.

No longer publishing com.github.kristofa.brave

We released the "Brave 4" apis, like brave.Tracer a year ago, including an adapter for moving off the "com.github.kristofa.brave" apis. Until now, we've continued to publish the old libraries. Please migrate to newer libraries as we are no longer maintaining or publishing the old ones.

Partial trace extraction

@tabdulradi noticed when we extracted traces, we could skip "extra" data when there was no incoming trace, such as additional AWS propagation fields. We now take care to preserve this data.

Other notes

  • Brave 4.14 deprecates Tracing.Builder.localEndpoint for Tracing.Builder.endpoint to avoid zipkin v1 compile dependency
  • Memory leak fixed when using ExtraFieldPropagation thx @aldex32 and @aukevanleeuwen for hunting this down

Brave 4.13

22 Feb 13:04

Brave 4.13 contains a number of improvements, notably in support of Camel and Spring Cloud Sleuth.

Most notably, ExtraFieldPropagation now supports field prefixing. Thanks @aldex32 for championing this feature:

For example, if you have fields prefixed with "baggage-", you can whitelist them like so:

tracingBuilder.propagationFactory(
    ExtraFieldPropagation.newFactoryBuilder(B3Propagation.FACTORY)
        .addField("x-vcap-request-id")
        .addPrefixedFields("baggage-", Arrays.asList("country-code"))
        .build()
);
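To make the effect of prefixing concrete, here is a small Java sketch. It does not use Brave: the PrefixedFieldNames helper is hypothetical, and it assumes that a prefixed field such as "country-code" travels on the wire under the prefix plus its name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Brave: maps field names to prefixed header names. */
final class PrefixedFieldNames {
  /** Assumption: each field is propagated under prefix + name. */
  static List<String> headerNames(String prefix, List<String> fieldNames) {
    List<String> result = new ArrayList<>();
    for (String name : fieldNames) {
      result.add(prefix + name);
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> names = new ArrayList<>();
    names.add("country-code");
    System.out.println(headerNames("baggage-", names));
  }
}
```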

A few edge-case fixes and supporting changes are included in this release as well:

  • Adds Tracing.clock(context) to allow custom scoping of clocks
  • Adds TracingAsyncClientHttpRequestInterceptor to brave-instrumentation-spring-web (thx @marcingrzejszczak)
  • ThreadLocalSpan now supports stacking
  • Trace context parsing is more efficient (by avoiding Long boxing and using latest zipkin library)
  • Fixes bug where an incoming context missing sampled flag didn't sample (thx @marcingrzejszczak)
  • Fixes a number of bugs and performance problems with ExtraFieldPropagation (thx @mpetazzoni)
  • Fixes out-of-date OSGi references and reduces internal package sharing (thx @oscerd)
  • Stops propagating TraceContext.shared (as it isn't read)

Brave 4.12

14 Dec 04:01

Brave 4.12 re-introduces support for Spring WebMVC 2.5 and reduces overhead under load

Spring WebMVC 2.5

Brave once supported Spring WebMVC 2.5, but this fell off radar as many applications updated to Spring 3 or later. Through your multiple requests, we realized Spring WebMVC 2.5 is still important.

Now, brave-instrumentation-spring-webmvc is usable on XML-driven Spring WebMVC 2.5 apps. We've also introduced zipkin-reporter-spring-beans which lets you more flexibly configure things like kafka topics. A number of small changes were made to ensure older libraries work, including maven invoker tests and a new example.

Thanks for your patience, and remember.. asking for what you want is the best start at getting it. You can find us on gitter or watch our repo to see what others are asking for.

Less overhead under load

Through coordinated effort in zipkin-reporter, Brave 4.12 performs much better under heavy load. This means by simply upgrading you will have a lot less overhead when you get a surge of requests. Thanks to @tramchamploo and @wu-sheng for keeping us honest.

Long story on overhead under heavy load

In the past, particularly in the zipkin-reporter project, users like @tramchamploo raised concerns about locking and the amount of spans one can send to zipkin. @adriancole dismissed some of these concerns, due to the unlikelihood of being able to query very large volumes of spans and relatively good JMH scores on the related components.

This was unfortunate, because the data collection concern exists regardless of intent when a system is under load, and JMH benchmarks don't reflect how systems behave under load. This led to a problem going dormant until the next person @wu-sheng noticed overhead, and our focus changed.

Lately, users have asked for a "firehose mode" where 100% of data is collected and reported to something non-zipkin, like a stats aggregator. This would be independent of the sampling mechanism. Before building this, we had to understand whether it was affordable. We benchmarked our example apps using wrk and found something quite odd: an order of magnitude latency spike when tracing 100% under load.

Here's an example app tested with tracing disabled via forced "not sampled" decision

$ wrk -t4 -c64 -d30s http://localhost:8081 -H'X-B3-Sampled: 0'
Running 30s test @ http://localhost:8081
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.93ms   32.24ms 541.71ms   94.02%
    Req/Sec     1.38k   482.57     2.42k    82.08%
  80567 requests in 30.08s, 11.15MB read
  Non-2xx or 3xx responses: 596
Requests/sec:   2678.03
Transfer/sec:    379.67KB

Here's the same app with 100% sampled. Notice the order of magnitude difference in latency:

$ wrk -t4 -c64 -d30s http://localhost:8081
Running 30s test @ http://localhost:8081
  4 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   135.80ms  143.86ms 945.27ms   84.16%
    Req/Sec   207.60    150.83   630.00     69.07%
  19370 requests in 30.07s, 2.63MB read
Requests/sec:    644.11
Transfer/sec:     89.42KB

We expect overhead to increase when sending to zipkin, as we are collecting data like IP addresses and that has a cost. What we expected was a percentage, not an order of magnitude increase. Discovering this latency spike was well timed, as thanks to Sheng, we were already thinking about encoding.

The order of magnitude bump wasn't due to encoding on the client threads; rather, it was caused by a separate thread which bundles encoded spans into messages. The problem was that this bundling happened under a lock shared with client threads. While "performance people" can often spot this immediately, it can also be detected with contended integrated benchmarks, even benchmarks against silly hello world example apps. The fix wasn't hard, but the story is still worth telling.
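To illustrate the kind of fix described above, here is a minimal Java sketch. It is not the zipkin-reporter code: the SpanBuffer class and its methods are hypothetical. Client threads and the bundling thread share a lock only long enough to append to or swap out a pending list, and the slower work of building a message happens after the lock is released.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer, not zipkin-reporter's implementation. */
final class SpanBuffer {
  private final Object lock = new Object();
  private List<byte[]> pending = new ArrayList<>();

  /** Called by client threads: holds the lock only for a list append. */
  void add(byte[] encodedSpan) {
    synchronized (lock) {
      pending.add(encodedSpan);
    }
  }

  /** Called by the reporting thread: swaps the list under the lock, bundles outside it. */
  byte[] drainToMessage() {
    List<byte[]> drained;
    synchronized (lock) {
      drained = pending;
      pending = new ArrayList<>();
    }
    // Bundling happens here, so client threads are not blocked while it runs
    int length = 0;
    for (byte[] span : drained) {
      length += span.length;
    }
    byte[] message = new byte[length];
    int position = 0;
    for (byte[] span : drained) {
      System.arraycopy(span, 0, message, position, span.length);
      position += span.length;
    }
    return message;
  }
}
```

A benchmark with several writer threads calling add while one thread drains is the kind of contended test that would have exposed the original problem.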

The moral of the story is: don't make the same mistake @adriancole did. If you get a complaint about performance in a performance-sensitive library, look at multiple angles before dismissing. Most importantly, use realistic benchmarks before prioritizing. If you don't have time to do that, ask the requestor to, or file a bug so that someone else (even if later you) can.

If you like this sort of work, please join us! For example, we are looking for a disruptor based alternative to further reduce overhead. Even if you aren't writing zipkin code, comments help as we all have a lot to learn.

Brave 4.11

05 Dec 23:14

Brave 4.11 is about time. It takes advantage of Java 9's new clock (when present) and is generally smarter about time.

More precise time when using Java 9

Brave remains bytecode compatible with Java 6. This is important because tracing is just as valuable in applications written a decade ago, and on older versions of Android. That said, we have some platform detection which allows us to leverage libraries only present in newer JVMs. OpenJDK 9 includes support for more precise (usually at least microsecond) resolution timestamps. We now use these by default when available.
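As a sketch of what microsecond resolution looks like with the standard library (this is not Brave's platform-detection code, and the EpochMicros class is hypothetical): on Java 9 and later, java.time.Instant.now() can carry sub-millisecond precision where the platform clock provides it, and it converts to epoch microseconds like this.

```java
import java.time.Instant;

/** Hypothetical helper, not part of Brave: converts an Instant to epoch microseconds. */
final class EpochMicros {
  static long toEpochMicros(Instant instant) {
    // Whole seconds scaled to microseconds, plus the nanosecond part truncated to microseconds
    return instant.getEpochSecond() * 1_000_000L + instant.getNano() / 1_000L;
  }

  public static void main(String[] args) {
    // On Java 8 this value usually only changes once per millisecond
    System.out.println(toEpochMicros(Instant.now()));
  }
}
```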

Smarter timestamp caching

Before, we simulated a microsecond-granularity clock by synchronizing System.currentTimeMillis() with a reading of System.nanoTime(). This was great insofar as operations could be at a stable offset from each other, allowing you to tell which parallel command started before another at microsecond granularity.

The tradeoff of this design was lack of coherence with system time. If you ran NTP, the tracer wouldn't notice updates. This created an ironic situation, introducing clock skew inside a single host!

We've revised this design, reducing the "tick fixing" to per-trace, as opposed to per-tracer. This gives the same advantage as before, where all timestamps in a trace for a host are both sequential and coherent. However, this reduced scope allows visibility of clock corrections between traces.
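The idea can be sketched in a few lines of Java. This is an illustration under assumptions, not Brave's implementation: the PerTraceClock class and its method names are hypothetical. One wall-clock reading and one System.nanoTime() reading are captured when a trace starts, and every later timestamp in that trace is derived from elapsed ticks.

```java
/** Hypothetical sketch of a per-trace clock, not Brave's implementation. */
final class PerTraceClock {
  private final long baseEpochMicros; // wall-clock reading taken once, when the trace starts
  private final long baseTickNanos;   // System.nanoTime() reading taken at the same moment

  PerTraceClock(long baseEpochMicros, long baseTickNanos) {
    this.baseEpochMicros = baseEpochMicros;
    this.baseTickNanos = baseTickNanos;
  }

  /** Captures both readings now. Intended to be called once per trace. */
  static PerTraceClock startingNow() {
    return new PerTraceClock(System.currentTimeMillis() * 1000L, System.nanoTime());
  }

  /** Timestamp for a given tick: base wall-clock time plus elapsed ticks. */
  long microsAt(long tickNanos) {
    return baseEpochMicros + (tickNanos - baseTickNanos) / 1000L;
  }

  long currentTimeMicros() {
    return microsAt(System.nanoTime());
  }
}
```

Because a new PerTraceClock is created for each trace, a clock correction made between traces shows up in the next trace's base reading, while timestamps within one trace never jump backwards.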

Many thanks to @jorgheymans for testing various revisions leading to this design

Brave 4.10

29 Nov 03:17

Brave v4.10 adds Netty and Apache HC caching instrumentation, improves MDC integration and adds tracing tools.

Netty Http Server instrumentation

Netty is a popular I/O library used underneath popular frameworks such as gRPC,
CXF and Ratpack. If you are using Brave with a Netty library now, you likely
configure a framework-specific hook. If you aren't writing a Netty framework or
configuring an http pipeline, you can skip this section.

Due to popular demand, we've added trace instrumentation for any
netty-codec-http pipeline. Add NettyHttpTracing.serverHandler() between
infrastructure and application handlers to precisely measure the server-side of
http calls. This is a more precise alternative to doing so at higher layers,
such as servlet, because it is closer to the wire. Thanks to @songxin1990 for
the hard work on this, as well as @normanmaurer and @nicmunroe for review.

Depend on io.zipkin.brave:brave-instrumentation-netty-codec-http and
configure your handlers like so to transparently trace http server calls:

NettyHttpTracing nettyHttpTracing = NettyHttpTracing.create(httpTracing);
ChannelPipeline pipeline = ch.pipeline();
// ... add your infrastructure handlers, in particular HttpRequestDecoder and HttpResponseEncoder
pipeline.addLast("tracing", nettyHttpTracing.serverHandler());
// ... add your application handlers

Adding parentId to logging contexts

Brave 4.x not only provides tracing, but also logging integration. Even when
a request isn't sampled for tracing, you can see stable trace identifiers in
logs for correlation purposes or for integration with other tools.

Before, we attached traceId and spanId to logging contexts such as SLF4J,
log4j or log4j2. Some integrations, such as PCF metrics, parse logs to create
traces independent of Zipkin. We added parentId to the trace context to
ensure tooling like this can place spans at the right place in the trace tree.

Apache httpclient-cache integration

While commonly done, Apache 4.x HttpClient interceptors cannot reliably attach
trace identifiers to logging contexts. Instead, Brave decorates
HttpClientBuilder to allow both tracing and logging to work properly. One
downside to decorating builders is that there is more than one to decorate.

You can now substitute TracingCachingHttpClientBuilder to trace and integrate
logs with caching http clients.

New tools for instrumentors

Brave serves at least two audiences: those who want to configure tracing and
those who provide trace instrumentation to others. This section is for those
writing tracing code on behalf of others.

Better Tracer.toString()

Troubleshooting state can be difficult due to overlapping scopes. We've added
a couple things to Tracer.toString() which can significantly demystify state
when debugging. Note: this format is subject to change based on your feedback!

Tracer.toString() always includes the reporter (where spans are going).

Tracer{reporter=AsyncReporter(OkHttpSender(http://myhost:9411/api/v2/spans))}

When spans are in-flight, Tracer.toString() includes a snapshot of each even
if the thread calling toString() is not actively tracing. This part might
change based on feedback. For example, it could be moved to a utility which you
can plumb to an http endpoint (as there could be very many spans in flight!).

Tracer{inFlight=[{"traceId":"48485a3953bb61246b221d5bc9e6496c","id":"6b221d5bc9e6496c","timestamp":1461750491274000,"localEndpoint":{"serviceName":"my-service"}}], reporter=...

When the current thread is tracing, Tracer.toString() includes the current
trace context. Most tracing bugs are about this part.. for example, a callback
wasn't scoped properly. Knowing what's in scope helps those writing tracing
code fix bugs faster.

Tracer{currentSpan=48485a3953bb61246b221d5bc9e6496c/6b221d5bc9e6496c, reporter=...

ThreadLocalSpan

Sometimes you have to instrument a library where there's no attribute namespace
shared across request and response. For this scenario, you can use
ThreadLocalSpan to temporarily store the span between callbacks.

Here's an example:

class MyFilter extends Filter {
  final ThreadLocalSpan threadLocalSpan;

  public void onStart(Request request) {
    // Assume you have code to start the span and add relevant tags...

    // We now set the span in scope so that any code between here and
    // the end of the request can see it with Tracer.currentSpan()
    threadLocalSpan.set(span);
  }

  public void onFinish(Response response, Attributes attributes) {
    // as long as we are on the same thread, we can read the span started above
    Span span = threadLocalSpan.remove();
    if (span == null) return;

    // Assume you have code to complete the span
  }
}

Brave 4.9

17 Oct 16:54

Brave 4.9 adds Amazon X-Ray interop, extra field propagation and refined Kafka tracing

Extra Field propagation

We've had requests in the past to propagate extra fields, such as a request ID or experimental group flags.
For example, if you are in a Cloud Foundry environment, your edge request includes a "x-vcap-request-id" field. You can now use ExtraFieldPropagation to portably push arbitrary fields across your call graph.

// when you initialize the builder, define the extra field you want to propagate.
// fields here are added alongside trace identifiers in http or message headers.
tracingBuilder.propagationFactory(
  ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "x-vcap-request-id")
);

// You can also access these ad-hoc for tagging or log correlation
requestId = ExtraFieldPropagation.current("x-vcap-request-id");

Thanks to @jcchavezs for the help vetting this idea

Amazon X-Ray interop

Amazon X-Ray is a distributed tracing service that shares a lot in common with Zipkin. Brave 4.9 adds the ability to pass-through or actively participate in X-Ray services. Thanks very much to Abhishek Singh from Amazon for weekends of support on this and @cemo for early testing.

AWSPropagation

AWSPropagation switches header format from B3 to x-amzn-trace-id, used by services like
ALB, API Gateway, and Lambda. Notably, this format uses a single field, not several. Using AWS propagation means you can read incoming traces that pass through Amazon infrastructure such as ALBs. It also means other Amazon services such as Lambda can join your traces, as can anything instrumented with Amazon's X-Ray SDK.

To switch to this header format, use Tracing.Builder.propagationFactory(AWSPropagation.FACTORY)

XRayUDPReporter

XRayUDPReporter converts spans from Zipkin v2 format to Amazon X-Ray format, and sends them via UDP messages. Sending to X-Ray is an alternative to normal Zipkin storage, and is advised if you are running in Amazon's cloud and using AWSPropagation. If you don't, traces in X-Ray and Zipkin will have gaps for spans sent to one and not the other.

To report spans to X-Ray, use Tracing.Builder.spanReporter(XRayUDPReporter.create())

Note XRayUDPReporter is a part of zipkin-aws and can be used by non-brave applications as well.

Lambda Tracing

You can use Brave to trace Java Lambda functions. If you do, you'll need to extract the root trace ID differently, as the trace context ends up in environment variables. If you are interested in more, please raise an issue!

Here's a snippet from a more complete lambda example:

  @Override
  public O handleRequest(I input, Context context) {
    /** Always start lambda functions from the root in the env */
    Span span = tracer.nextSpan(AWSPropagation.extractLambda())
        .name(context.getFunctionName())
        .start();

Pass through tracing

You may want to pass-through amazon trace headers as opposed to joining those traces. This is a hybrid way, where for example most of your data will be in Zipkin, yet brave won't break X-Ray traces either.

To enable pass-through, just add "x-amzn-trace-id" to extra fields

tracingBuilder.propagationFactory(
  ExtraFieldPropagation.newFactory(B3Propagation.FACTORY, "x-amzn-trace-id")
);

With this in place, you can tag the amazon formatted trace ID for correlation purposes like so:

// will look like 1-67891233-abcdef012345678912345678
String awsTraceId = AWSPropagation.currentTraceId();
if (awsTraceId != null) span.tag("aws.trace_id", awsTraceId);
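The format in the comment above can be sketched as follows. This is an illustration, not the zipkin-aws code: it assumes a 128-bit trace ID written as 32 lower-hex characters whose first 8 characters are epoch seconds, and the XRayTraceIdFormat class is hypothetical.

```java
/** Hypothetical helper, not part of zipkin-aws: formats a 128-bit trace ID the way X-Ray writes it. */
final class XRayTraceIdFormat {
  /** Assumption: the input is 32 hex characters, the first 8 being epoch seconds. */
  static String toXRayFormat(String traceIdHex) {
    if (traceIdHex.length() != 32) {
      throw new IllegalArgumentException("expected 32 hex characters");
    }
    // version 1, then 8 hex characters of time, then the remaining 24 hex characters
    return "1-" + traceIdHex.substring(0, 8) + "-" + traceIdHex.substring(8);
  }

  public static void main(String[] args) {
    System.out.println(toXRayFormat("67891233abcdef012345678912345678"));
  }
}
```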

Putting it all together!

If you want to send to X-Ray, here's the minimum setup in Java and Spring XML:

    return Tracing.newBuilder()
        .localServiceName("your_service_name")
        .propagationFactory(AWSPropagation.FACTORY)
        .spanReporter(XRayUDPReporter.create()).build();

  <bean id="tracing" class="brave.spring.beans.TracingFactoryBean">
    <property name="localServiceName" value="your_service_name"/>
    <property name="propagationFactory">
      <util:constant static-field="brave.propagation.aws.AWSPropagation.FACTORY"/>
    </property>
    <property name="spanReporter">
      <bean class="zipkin.reporter.xray_udp.XRayUDPReporter" factory-method="create"/>
    </property>
  </bean>

Refined Kafka tracing

Before, we created a "consumer span" for each message in a bulk poll. This can create a lot of spans, as for example the default max records per poll is 500. We now create one consumer span per poll/topic.

When you are ready to process a message, use KafkaTracing.nextSpan(record) to create a span.
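To show why this reduces span volume, here is a small Java sketch that only counts. It is not the Kafka instrumentation itself: the ConsumerSpansPerPoll class is hypothetical, and records are represented by their topic names alone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Brave: groups the records of one poll by topic. */
final class ConsumerSpansPerPoll {
  /** Returns topic to record count. Each entry stands for one consumer span for that poll. */
  static Map<String, Integer> recordsPerTopic(List<String> recordTopics) {
    Map<String, Integer> result = new LinkedHashMap<>();
    for (String topic : recordTopics) {
      result.merge(topic, 1, Integer::sum);
    }
    return result;
  }
}
```

With the default of up to 500 records per poll, a poll that returns records from two topics would yield two consumer spans under this scheme rather than up to 500.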

Please check out the docs for more. Thanks very much to @ImFlog for the help on design and brainstorming this.

Other notes

  • To support a lot of the above, we made a new library hook: Tracer.nextSpan(extractedContext)
    • Use this if you are creating custom RPC or messaging instrumentation, as it automates some dancing
  • We dropped the JSR 305 @Nullable annotations as they interfere both with OSGi and Java 9
    • Pay attention to docs and our source-retention annotations, if you are writing custom code

Brave 4.8

24 Sep 06:01

Brave v4.8 supports customizable gRPC tracing, pluggable propagation (headers), and running without zipkin v1 types. You should also check out zipkin-php which is a PHP port of the Brave v4 apis.

Customizable gRPC tracing

Our gRPC tracing is tested against the version range 1.2-1.6. This is great for helping understand requests while the framework evolves. With effort from @jorgheymans, this is better now. Before, you couldn't declare a span customization policy, like we have in http. Now you can.

For example, this adds a tag of the gRPC message sent from a client to a server, and renames the span to something of lower cardinality:

grpcTracing = GrpcTracing.newBuilder(tracing)
    .clientParser(new GrpcClientParser() {
      @Override protected <M> void onMessageSent(M message, SpanCustomizer span) {
        span.tag("grpc.message_sent", message.toString());
      }

      @Override protected <ReqT, RespT> String spanName(MethodDescriptor<ReqT, RespT> methodDescriptor) {
        return methodDescriptor.getType().name();
      }
    })
    .build();

For more info, check out the README

Note: be careful with this, especially when adding payloads as span tags. Spans are typically best kept under 1KiB for transport and storage efficiency.

Pluggable propagation

While the default has always been B3, there are efforts such as trace-context to define a standards-track format. To support this, we needed to do two things:

A future version will have an experimental implementation of trace-context, and/or the pilot version already in use by gRPC via the OpenCensus project. Please keep an eye out for more on this.

Running without zipkin v1 types

In the last release, we added Tracing.Builder.spanReporter for configuring zipkin v2. Now that Zipkin Reporter 2 is out, you can manually exclude io.zipkin.java:zipkin, eliminating a 300KiB dependency.

If you want to see how to do this, you can look at our example repository.

Zipkin PHP

Through significant effort by @jcchavezs, there's now a PHP port of brave v4 tracing, called zipkin-php. Please give it a try and share any feedback you might have. If you want a quick start, you can look at the example project.