Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add OTLP/HTTP collector #13

Merged
merged 14 commits into from
Sep 16, 2024

Conversation

making
Copy link
Contributor

@making making commented Aug 29, 2024

This pull request adds an OTLP/HTTP implementation of the Zipkin Collector.

It mostly follows the following document, but does not implement fallback for remoteEndpoint name and deprecated attribute names have been changed to new ones.
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk_exporters/zipkin.md

Also, the mapping of resource attributes is TBD in this document, and is not implemented in this PR, but I would like to open a separate issue to discuss the mapping of resource attributes.

Copy link
Contributor

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookin good. did you test integrating the module yet?

return this;
}

@Override public String toString() {
@Override
public String toString() {
return "OpenTelemetryHttpCollector{}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to name this similar to the sender?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate this?

@Override
protected HttpResponse doPost(ServiceRequestContext ctx, HttpRequest req) {
CompletableCallback result = new CompletableCallback();
req.aggregate(AggregationOptions.usePooledObjects(ByteBufAllocator.DEFAULT, ctx.eventLoop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will let @anuraaga @kojilin or @minwoox review this part ;)

* Taken from OpenTelemetry codebase.
* https://github.com/open-telemetry/opentelemetry-java/blob/3e8092d086967fa24a0559044651781403033313/api/all/src/main/java/io/opentelemetry/api/internal/OtelEncodingUtils.java
*/
static class OtelEncodingUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I feel we have some utilities like HexCodec or similar somewhere..

* @param traceIdBytes the bytes (16-byte array) representation of the {@code TraceId}.
* @return the lowercase hex (base16) representation of the {@code TraceId}.
*/
static String traceIdFromBytes(byte[] traceIdBytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can convert the bytes to longs, then reduces the amount of copy/paste as we can use this? zipkin2.Span

Copy link
Contributor Author

@making making Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in ce708bc and 3dbd79a

}
Awaitility.waitAtMost(Duration.ofSeconds(5))
.untilAsserted(() -> assertThat(store.acceptedSpanCount()).isEqualTo(size));
List<List<zipkin2.Span>> received = store.getTraces(traceIds).execute();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

assertThat(span.tags()).containsEntry("int", "100");
assertThat(span.tags()).containsEntry("double", "10.5");
assertThat(span.tags()).containsEntry("boolean", "true");
assertThat(span.tags()).containsEntry("array", "[\"a\",\"b\",\"c\"]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting to serialize the json, but guess that's how otel does it. Before otel we just joined on comma as easier on indexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/common/README.md#attribute
says

For protocols that do not natively support non-string values, non-string values SHOULD be represented as JSON-encoded strings. For example, the expression int64(100) will be encoded as 100, float64(1.5) will be encoded as 1.5, and an empty array of any type will be encoded as [].

the test-case doesn't have square brackets https://github.com/open-telemetry/opentelemetry-java/blob/main/exporters/zipkin/src/test/java/io/opentelemetry/exporter/zipkin/OtelToZipkinSpanTransformerTest.java#L382-L385

I thought OtelToZipkinSpanTransformer was not implemented as documented, but do you thing aligning the existing implementation is better?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes align with existing impl and make a comment of the source line you are referring to and that it is understood to be not the same as the otel spect. The reason is that the experience is worse to json encode, since folks have comma splitters. I suspect that may be why upstream in otel they are not json encoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in 420507b

@making
Copy link
Contributor Author

making commented Aug 30, 2024

Lookin good. did you test integrating the module yet?

Not yet. I don't think I understand the concept of modules yet 😅

@codefromthecrypt
Copy link
Contributor

Not yet. I don't think I understand the concept of modules yet 😅

no worries that part is simpler for me to help with vs what you've done!

}
catch (IOException e) {
LOG.log(Level.WARNING, "Unable to parse the request:", e);
result.onError(new UncheckedIOException(e));
Copy link

@kojilin kojilin Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if not rethrow, just use result.onError(e); is better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. fixed in b162311

byte[] traceIdBytes = spanData.getTraceId().toByteArray();
long high = bytesToLong(traceIdBytes, 0);
if (high == 0) {
spanBuilder.traceId(INVALID_TRACE);
Copy link
Contributor

@codefromthecrypt codefromthecrypt Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invalid spans should end up being dropped instead I think. You can look at collectormetrics and usage for an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codefromthecrypt

What does "collectormetrics" refer to?

If I create a span with an invalid ID in Zipkin it throws an IllegalArgumentException. Does that mean that catch the exception and increment the "drop" counter metric?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Increment spanDropped if the received span is invalid in c869d69

@codefromthecrypt
Copy link
Contributor

ps when in doubt direction matters

in collector we are receiving data in a foreign format and putting it into ours. since we define our api and format, it follows that if plain comma separated is a norm in zipkin, we should coerce to that (it is)

on the other side if we are exporting to a foreign format, we should follow that format's conventions as much as possible for the same albeit visa versa reason. (e.g. bytesmessageformat for brave)

hope this helps

try {
spans.add(generateSpan(span, scope, resourceSpans.getResource()));
}
catch (RuntimeException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

personally i would capture at call site calling translator if you can, as this is more like any other issue about malformed inclusing bad trace ID but not limited to that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is caught outside of the translator call, the valid spans in the trace data may also be lost. Is that expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, as consider if it were json and one span was in incorrect format, you would lose the message. I think bugs should be fixed basically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say I'm wrong, we get an issue asking for a different behavior also. meanwhile there isn't code at possibly not the right abstraction ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IOException that occurs when converting a protobuf payload to a Java object occurs before calling translate and is handled. This runtime exception occurs when converting an otel span to a zipkin span, and may be an illegal argument or null pointer exception. If a span with an invalid value is inserted among multiple valid spans, is it okay to drop all of them? This is a rare case, so I don't really care.

Copy link
Contributor

@codefromthecrypt codefromthecrypt Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably one thing that I didn't highlight is that when in doubt we can refer to the actual collector.java which handles exception and error the same way https://github.com/openzipkin/zipkin/blob/master/zipkin-collector/core/src/main/java/zipkin2/collector/Collector.java#L171-L180

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically what I mean is that error handling and stats handling were all done in collector.java. it is true that we cannot always re-use that type, but the behavior expectations we can borrow from it anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I looked at the implementation of Collector and left incrementing up to Collector as much as possible. I thought I would accept valid spans as much as possible. However, if it's not a good design for Translator to depend on CollectorMetrics, I'll count Dropped outside of Translator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider my opinion non binding basically i feel this code is a little cleaner not injecting collectormetrics to it. i will approve either way. your call

Copy link
Contributor Author

@making making Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically i feel this code is a little cleaner not injecting collectormetrics to it.

Agreed.

I considered passing a callback to decouple, but that would be too much for some rare cases, so I simply caught the exception outside the translator.
5195c30

@codefromthecrypt codefromthecrypt merged commit 96a3550 into openzipkin-contrib:main Sep 16, 2024
2 checks passed
@codefromthecrypt
Copy link
Contributor

thanks for all the hard work @making and also thanks for the help reviewing @kojilin and @minwoox!

@making
Copy link
Contributor Author

making commented Sep 16, 2024

Thanks all!

@making making deleted the otlp-collector branch September 16, 2024 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants