
Question - What are the requirements to mark a trace with error=yes? #771

Closed
anghelyi opened this issue Jul 28, 2023 · 7 comments

@anghelyi

Hello! Sorry for my question, but I was not able to find an exact answer anywhere about what qualifies a trace to be marked as erroneous. I've been playing with OpenSearch + OpenTelemetry agents on and off for a few weeks, and I find the traces' error rate very low compared to what I'm used to seeing in metrics. It seems to me that having an erroneous span in the trace does not automatically mark the trace as an error. But sometimes it does, and that's why I'm confused. As I see it (and correct me if I'm wrong), if one of the spans in the trace has an exception, that marks the trace as an error. What are the other conditions? Is it configurable somewhere? Can you point me in the right direction to where I can find more information about it?
Thanks in advance!

@YANG-DB
Member

YANG-DB commented Jul 29, 2023

Hi @anghelyi
Thanks for bringing this to our attention - can you give some concrete samples that we can look deeper into?

@anghelyi
Author

anghelyi commented Aug 1, 2023

Hi!
Sorry for the late reply; I dug out two examples. Unfortunately I'm not allowed to share the full payloads, so I stripped out a lot. I hope this will be enough:

1.txt - this trace is marked as an error; it has a span with an exception.
2.txt - this one is not marked as erroneous even though it has multiple spans marked as error (a7e7dfa4e383c45f, 73249608d24d01f1, 034cb330da3ca0eb, 3cb3c1e831f54cc3). It also has a span with an exception.

1.txt
2.txt

@derek-ho derek-ho removed the untriaged label Aug 3, 2023
@derek-ho
Collaborator

derek-ho commented Aug 3, 2023

Thanks for raising the issue @anghelyi, I will take a look later today. In the meantime, can you share what you are using (trace analytics with the Data Prepper ingestion agent, or looking at Jaeger traces) so I can get back to you with a more specific answer?

@anghelyi
Author

anghelyi commented Aug 3, 2023

Hi @derek-ho! It's not an urgent issue; it's more of a question to understand how the system works and to check whether it works correctly on our side. Anyway, we are using OpenTelemetry agents with Data Prepper.
By the way, I'll be pretty much offline for the next two weeks; if you need more information or examples, I can only provide them after that.

@anghelyi
Author

Hi! Any idea where I can find more information regarding this?

@Swiddis Swiddis assigned Swiddis and YANG-DB and unassigned derek-ho Nov 15, 2023
@Swiddis
Collaborator

Swiddis commented Nov 15, 2023

Hi, so sorry for not getting back! The issue seems to have been buried. I'm looking into it now. Thanks for reaching out and being patient.

So, I'm looking at the OTEL error handling docs, and it looks like correctly identifying errors is the responsibility of the library generating the trace. In particular I'm looking at the line:

API methods MUST NOT throw unhandled exceptions when used incorrectly by end users.

I don't know much about the setup that's generating these traces, but I do see that in the trace that's not marked as an error, there are stack traces and 500s. That could be related, but it's not definitive proof.
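
To illustrate why that can happen, here is a minimal sketch (assuming the OpenTelemetry Java API, since the stack traces point at a JVM app; this is not necessarily what this particular instrumentation does). Recording an exception only attaches an event to the span; the span does not count as an error unless its status is also explicitly set to ERROR:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;

public class ErrorStatusExample {
    private static final Tracer tracer =
            GlobalOpenTelemetry.getTracer("example-instrumentation");

    public static void main(String[] args) {
        Span span = tracer.spanBuilder("handle-request").startSpan();
        try {
            doWork();
        } catch (Exception e) {
            // This only attaches an "exception" event to the span;
            // the span's status stays UNSET unless we also set it.
            span.recordException(e);

            // Without this call the span is not flagged as an error,
            // even though it carries a full stack trace.
            span.setStatus(StatusCode.ERROR, e.getMessage());
        } finally {
            span.end();
        }
    }

    private static void doWork() throws Exception {
        throw new IllegalStateException("simulated failure");
    }
}
```

So a library that records exceptions but never sets the status would produce exactly what we see in 2.txt: stack traces in the span events, but no error flag.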

I also found this related issue from Spring Sleuth, which is what shows up in the stack traces. They mention something about Brave handling it (though I don't see Brave in the stack trace). They also cite Brave's error handling rationale to explain that error=false might not actually mean there were no errors:

When considered in this context, the value of the "error" tag is less important than its existence.

Given this, I'm making a guess that checking for error=true is the wrong check to make, which would explain the low error rate.
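
If the goal is just to get the error rate closer to what the metrics show, one workaround (a sketch, assuming the application code can be touched and uses the OpenTelemetry Java API; `callAndFlagErrors` is a hypothetical helper, not something the SDK provides) is to explicitly set ERROR status on the current span wherever exceptions are handled:

```java
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;

import java.util.concurrent.Callable;

public final class SpanErrors {
    private SpanErrors() {}

    // Hypothetical helper: run a task and, if it throws, record the exception
    // and mark the current span with ERROR status before rethrowing, so tools
    // that key off span status see the failure.
    public static <T> T callAndFlagErrors(Callable<T> task) throws Exception {
        try {
            return task.call();
        } catch (Exception e) {
            Span current = Span.current();
            current.recordException(e);
            current.setStatus(StatusCode.ERROR, e.getMessage());
            throw e;
        }
    }
}
```

Whether the trace group then gets flagged depends on how the ingestion pipeline rolls span status up to the trace level, which I haven't verified here.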

@Swiddis Swiddis added the integrations Used to denote items related to the Integrations project label Dec 5, 2023
@Swiddis
Collaborator

Swiddis commented Apr 8, 2024

Closing as stale

@Swiddis Swiddis closed this as not planned Apr 8, 2024