-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ArrayIndexOutOfBoundsException in ArrayDeque.add in GrpcExporter #7019
Comments
Thanks for the report. I will investigate. |
I've been investigating too, just dropping breakpoints that evaluate at the marshalerPool.poll and marshalerPool.add calls. I see that However, the only synchronization I can see upstream of It would be much simpler to reason about if
|
ArrayDeque specifies that: > Array deques ... are not thread-safe; in the absence of external > synchronization, they do not support concurrent access by multiple > threads. `marshalerPool` is concurrently added to by the OkHttp threadpool without synchronization, along with all threads that end spans (with synchronisation in `SimpleSpanProcessor.exportLock`, which is not used to synchronize with the OkHttp threadpool). Just making the ArrayQueue synchronous internally removes all need to worry about upstream locks. Fixes open-telemetry#7019
ArrayDeque specifies that: > Array deques ... are not thread-safe; in the absence of external > synchronization, they do not support concurrent access by multiple > threads. `marshalerPool` is concurrently added to by the OkHttp threadpool without synchronization, along with all threads that end spans (with synchronisation in `SimpleSpanProcessor.exportLock`, which is not used to synchronize with the OkHttp threadpool). Just making the ArrayQueue synchronous internally removes all need to worry about upstream locks. Fixes open-telemetry#7019
I think a fix is necessary, but I wasn't sure why since
Looking at the source code for SimpleSpanProcessor#onEnd, there's an attempt to avoid concurrent calls export, but it arguably doesn't meet spirit of the spec. Its not actually clear to me whether SimpleSpanProcessor is supposed to not only call But I think you're fix is necessary because even if SimpleSpanProcessor is supposed to wait for the CompletableResultCode to complete, our OTLP exporters should be able to export multiple requests concurrently, so long as callers synchronize calls to @Mahoney out of curiosity, why are you using SImpleSpanProcessor instead of BatchSpanProcessor? SimpleSpanProcessor is really only for trivial situations. Exporting one record per request is crazy inefficient for real world situations. cc @jkwatson WDYT - should SimpleSpanProcessor wait for CompletableResultCode to complete before calling the next |
Based on the spec language:
I personally think SimpleSpanProcessor.java is following the spirit, based on the intent here:
|
Yeah I saw that language too. The counter is that SimpleSpanProcessor is inconsistent with BatchSpanProcessor, which waits for the CompletableResultCode to complete. I can't find any language suggesting that simple and batch processors are intended to be different in this respect. |
I think BatchSpanProcessor is the one that's wrong 😁: #4264 |
I don't know, lack of knowledge of the library I guess. |
Describe the bug
We are seeing this stack trace in our live system:
It seems to leave the entire exporter in a state from which it cannot recover; we then see this stack repeatedly:
and after c. 12 of those, just:
over and over again.
We are using the
OtlpGrpcSpanExporter
with the default memory mode ofREUSABLE_DATA
. It looks like somehow concurrent access is happening toSpanReusableDataMarshaler
'smarshalerPool
.Steps to reproduce
Unknown.
What version and what artifacts are you using?
Artifacts:
How did you reference these artifacts? (excerpt from your
build.gradle
,pom.xml
, etc)build.gradle
:Environment
Compiler & runtime: Temurin 23
OS: Alpine 3.20.5
Additional context
We're on
1.45.0
not1.46.0
because the latest version (2.11.0) ofopentelemetry-instrumentation-bom
andopentelemetry-instrumentation-bom-alpha
bring in1.45.0
.The text was updated successfully, but these errors were encountered: