Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incoming breaking changes with OpenTelemetry Java Instrumentation >= 1.25.0 #714

Closed
rapphil opened this issue May 23, 2023 · 5 comments
Closed
Labels
bug Something isn't working Stale

Comments

@rapphil
Copy link
Contributor

rapphil commented May 23, 2023

Describe the bug
There are breaking changes that were inserted in this PR in the OpenTelemetry Java Instrumentation for aws-lambda.

To describe what will happen once the lambda layer starts using Java Instrumentation >= 1.25.0 I created a test case with the following scenario:

Client -> API Gateway Rest -> lambda a -> API Gateway Rest -> Lambda b.

API Gateway and Lambda have active tracing enabled.

The following image presents the behaviour observed when ingesting the traces into AWS X-Ray with lambdas with a lambda layer wrapper using Java Instrument 1.24.0.

image

The nodes containing cogs (lambda-java-a and lambda-java-b) are generated from spans created using the lambda function instrumented with OpenTelemetry.

The following image presents the behaviour observed when ingesting the traces into AWS X-Ray with lambdas with a lambda layer wrapper using Java Instrument 1.26.0.
image

As we can see, there will be a disconnection between the AWS components and the lambda code instrumented using OpenTelemetry. That was the expected behaviour of the PR: it will only consider the headers and the spans present in the _X_AMZN_TRACE_ID environment variable will be added to the span as span links.

Having said that, I would like to challenge how this change was put in place. There is no mechanism to avoid existing users to face a breaking change.

Moreover, I'm not even sure if the current implementation makes sense, because the linked spans will not be accessible in backends other than X-Ray: the traces emitted from the AWS components are sent directly to X-Ray and therefore there is no point in always adding span links for spans that will no be accessible in case the backend is not X-Ray.

Steps to reproduce
Use a lambda

What did you expect to see?
I expected that there is a way for users to avoid facing breaking change.

What version of collector/language SDK version did you use?
OpenTelemetry Java Instrumentation v1.26.0

What language layer did you use?
Java

Additional context
We are creating this issue to raise awareness of incoming breaking changes. I understand that this behaviour is defined in the spec, however I wanted to register that this is still a breaking change that will cause pain for users.

Related PRs in other languages:
open-telemetry/opentelemetry-js-contrib#1411
open-telemetry/opentelemetry-python-contrib#1657

@pavankrish123
Copy link

pavankrish123 commented Jun 15, 2023

Moreover, I'm not even sure if the current implementation makes sense, because the linked spans will not be accessible in backends other than X-Ray: the traces emitted from the AWS components are sent directly to X-Ray and therefore there is no point in always adding span links for spans that will no be accessible in case the backend is not X-Ray.

Hi @rapphil, We are trying to setup a monitoring of lambda and read through this issue. Can you please suggest a solution here.

In case of v0.24.0 - if we are sending the data to other monitoring vendors which do not get the parent AWS spans, we will end up having a broken trace with only OTEL spans with reference to those AWS spans.

  1. is there a way to export AWS spans to other monitoring backends or 2) Is there a way to disable AWS spans from being produced by the infrastructure.

With v0.26.0 currently monitoring backend can ignore the links if we don't have those trace/spans but at least we may get a connected trace (albeit without AWS).

But given the issue / specification is under discussion we want to work with v0.24.0 for the time being. Any suggestions on how to proceed. cc: @svrnm

@carlosalberto
Copy link
Contributor

The TC briefly discussed this last week - overall, the existing solution to this 'problem' was deemed as a good one. It was recommended that, if anything breaks a vendor, the vendor should come with an actual proposal, and in the meantime, the group should keep on making progress as usual.

Given this, and given that four weeks have passed, I'd suggest:

  1. we keep this issue open
  2. Keep on making progress (that is, incorporate the existing Spec changes for Python/Lambda, as they were for Java), and
  3. Study a solution/change whenever such PR hits this repo (and then make the actual needed changes, etc)

@carlosalberto
Copy link
Contributor

Hey, just a FYI:

As we iterate (hopefully fast) on this issue, we will prepare to merge open-telemetry/opentelemetry-js-contrib#1411 and open-telemetry/opentelemetry-python-contrib#1657 - as this two incorporate the current Specification & Semantic Conventions (like Java does).

Also a good time to remember that the FaaS section of the mentioned documents is still experimental (and hence we can still get breaking changes - one more motivation to move fast and work towards stability).

Let me know in case of any doubt, and let's definitely make fast progress on this issue.

@tylerbenson
Copy link
Member

For reference, see the latest spec change here: open-telemetry/semantic-conventions#354

Copy link

This issue was marked stale. It will be closed in 30 days without additional activity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Stale
Projects
None yet
Development

No branches or pull requests

4 participants