-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid redundant shutdown in TracerProvider::drop when already shut down #2197
base: main
Are you sure you want to change the base?
Changes from 10 commits
32938be
01c970b
c08c055
82d2598
f270dcd
c8f2166
1a3dd67
36012dc
0d7d1ab
01234c5
2279f3a
293120b
21fa84e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,67 @@ | ||
//! # Trace Provider SDK | ||
//! | ||
//! ## Tracer Creation | ||
//! | ||
//! New [`Tracer`] instances are always created through a [`TracerProvider`]. | ||
//! | ||
//! All configuration objects and extension points (span processors, | ||
//! propagators) are provided by the [`TracerProvider`]. [`Tracer`] instances do | ||
//! not duplicate this data to avoid that different [`Tracer`] instances | ||
//! of the [`TracerProvider`] have different versions of these data. | ||
/// # Trace Provider SDK | ||
/// | ||
/// The `TracerProvider` handles the creation and management of [`Tracer`] instances and coordinates | ||
/// span processing. It serves as the central configuration point for tracing, ensuring consistency | ||
/// across all [`Tracer`] instances it creates. | ||
/// | ||
/// ## Tracer Creation | ||
/// | ||
/// New [`Tracer`] instances are always created through a `TracerProvider`. These `Tracer`s share | ||
/// a common configuration, which includes the [`Resource`], span processors, sampling strategies, | ||
/// and span limits. This avoids the need for each `Tracer` to maintain its own version of these | ||
/// configurations, ensuring uniform behavior across all instances. | ||
/// | ||
/// ## Cloning and Shutdown | ||
/// | ||
/// The `TracerProvider` is designed to be lightweight and clonable. Cloning a `TracerProvider` | ||
/// creates a new reference to the same provider, not a new instance. Dropping the last reference | ||
/// to the `TracerProvider` will automatically trigger its shutdown. During shutdown, the provider | ||
/// will flush all remaining spans, ensuring they are exported to the configured exporters. | ||
lalitb marked this conversation as resolved.
Show resolved
Hide resolved
|
||
/// Users can also manually trigger shutdown using the [`shutdown`](TracerProvider::shutdown) | ||
/// method, which will ensure the same behavior. | ||
/// | ||
/// Once shut down, the `TracerProvider` transitions into a disabled state. In this state, further | ||
/// operations on its associated `Tracer` instances will result in no-ops, ensuring that no spans | ||
/// are processed or exported after shutdown. | ||
/// | ||
/// ## Span Processing and Force Flush | ||
/// | ||
/// The `TracerProvider` manages the lifecycle of span processors, which are responsible for | ||
/// collecting, processing, and exporting spans. To ensure all spans are processed before shutdown, | ||
/// users can call the [`force_flush`](TracerProvider::force_flush) method at any time to trigger | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. lets remove force_flush mention here. I have seen many users doing force_flush in their code (and block their threads).. Not sure why, but lets make sure official docs don't recommend it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have reworded it so that it doesn't look as recommendation. I think it's better to at-least document since we provide it.
|
||
/// an immediate flush of all pending spans to the exporters. | ||
/// | ||
/// # Examples | ||
/// | ||
/// ``` | ||
/// use opentelemetry::global; | ||
/// use opentelemetry_sdk::trace::TracerProvider; | ||
/// use opentelemetry::trace::Tracer; | ||
/// | ||
/// fn init_tracing() -> TracerProvider { | ||
/// let provider = TracerProvider::default(); | ||
/// | ||
/// // Set the provider to be used globally | ||
/// let _ = global::set_tracer_provider(provider.clone()); | ||
/// | ||
/// provider | ||
/// } | ||
/// | ||
/// fn main() { | ||
/// let provider = init_tracing(); | ||
/// | ||
/// // create tracer.. | ||
/// let tracer = global::tracer("example/client"); | ||
/// | ||
/// // create span... | ||
/// let span = tracer | ||
/// .span_builder("test_span") | ||
/// .start(&tracer); | ||
/// | ||
/// // Explicitly shut down the provider | ||
/// provider.shutdown(); | ||
/// } | ||
/// ``` | ||
use crate::runtime::RuntimeChannel; | ||
use crate::trace::{ | ||
BatchSpanProcessor, Config, RandomIdGenerator, Sampler, SimpleSpanProcessor, SpanLimits, Tracer, | ||
|
@@ -16,7 +70,7 @@ | |
use crate::{InstrumentationLibrary, Resource}; | ||
use once_cell::sync::{Lazy, OnceCell}; | ||
use opentelemetry::trace::TraceError; | ||
use opentelemetry::{global, trace::TraceResult}; | ||
use opentelemetry::{otel_debug, trace::TraceResult}; | ||
use std::borrow::Cow; | ||
use std::sync::atomic::{AtomicBool, Ordering}; | ||
use std::sync::Arc; | ||
|
@@ -36,36 +90,60 @@ | |
span_limits: SpanLimits::default(), | ||
resource: Cow::Owned(Resource::empty()), | ||
}, | ||
is_shutdown: AtomicBool::new(true), | ||
}), | ||
is_shutdown: Arc::new(AtomicBool::new(true)), | ||
}); | ||
|
||
/// TracerProvider inner type | ||
#[derive(Debug)] | ||
pub(crate) struct TracerProviderInner { | ||
processors: Vec<Box<dyn SpanProcessor>>, | ||
config: crate::trace::Config, | ||
is_shutdown: AtomicBool, | ||
} | ||
|
||
impl Drop for TracerProviderInner { | ||
fn drop(&mut self) { | ||
for processor in &mut self.processors { | ||
impl TracerProviderInner { | ||
/// Crate-private shutdown method to be called both from explicit shutdown | ||
/// and from Drop when the last reference is released. | ||
pub(crate) fn shutdown(&self) -> Vec<TraceError> { | ||
let mut errs = vec![]; | ||
for processor in &self.processors { | ||
if let Err(err) = processor.shutdown() { | ||
global::handle_error(err); | ||
// Log at debug level because: | ||
// - The error is also returned to the user for handling (if applicable) | ||
// - Or the error occurs during `TracerProviderInner::Drop` as part of telemetry shutdown, | ||
// which is non-actionable by the user | ||
otel_debug!(name: "TracerProvider.Drop.ShutdownError", | ||
error = format!("{err}")); | ||
errs.push(err); | ||
} | ||
} | ||
errs | ||
} | ||
} | ||
|
||
impl Drop for TracerProviderInner { | ||
fn drop(&mut self) { | ||
if !self.is_shutdown.load(Ordering::Relaxed) { | ||
let _ = self.shutdown(); // errors are handled within shutdown | ||
} else { | ||
otel_debug!( | ||
name: "TracerProvider.Drop.AlreadyShutdown" | ||
); | ||
} | ||
} | ||
} | ||
|
||
/// Creator and registry of named [`Tracer`] instances. | ||
/// | ||
/// `TracerProvider` is lightweight container holding pointers to `SpanProcessor` and other components. | ||
/// Cloning and dropping them will not stop the span processing. To stop span processing, users | ||
/// must either call `shutdown` method explicitly, or drop every clone of `TracerProvider`. | ||
/// `TracerProvider` is a lightweight container holding pointers to `SpanProcessor` and other components. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not introduced in this PR, but advertising TracerProvider as lightweight is incorrect, and can lead to users repeatedly creating them, instead of doing it once. |
||
/// Cloning a `TracerProvider` instance and dropping it will not stop span processing. To stop span processing, users | ||
/// must either call the `shutdown` method explicitly or allow the last reference to the `TracerProvider` | ||
/// to be dropped. When the last reference is dropped, the shutdown process will be automatically triggered | ||
/// to ensure proper cleanup. | ||
#[derive(Clone, Debug)] | ||
pub struct TracerProvider { | ||
inner: Arc<TracerProviderInner>, | ||
is_shutdown: Arc<AtomicBool>, | ||
} | ||
|
||
impl Default for TracerProvider { | ||
|
@@ -79,7 +157,6 @@ | |
pub(crate) fn new(inner: TracerProviderInner) -> Self { | ||
TracerProvider { | ||
inner: Arc::new(inner), | ||
is_shutdown: Arc::new(AtomicBool::new(false)), | ||
} | ||
} | ||
|
||
|
@@ -101,7 +178,7 @@ | |
/// true if the provider has been shutdown | ||
/// Don't start span or export spans when provider is shutdown | ||
pub(crate) fn is_shutdown(&self) -> bool { | ||
self.is_shutdown.load(Ordering::Relaxed) | ||
self.inner.is_shutdown.load(Ordering::Relaxed) | ||
} | ||
|
||
/// Force flush all remaining spans in span processors and return results. | ||
|
@@ -153,28 +230,20 @@ | |
/// Note that shut down doesn't means the TracerProvider has dropped | ||
pub fn shutdown(&self) -> TraceResult<()> { | ||
if self | ||
.inner | ||
.is_shutdown | ||
.compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst) | ||
.is_ok() | ||
{ | ||
// propagate the shutdown signal to processors | ||
// it's up to the processor to properly block new spans after shutdown | ||
let mut errs = vec![]; | ||
for processor in &self.inner.processors { | ||
if let Err(err) = processor.shutdown() { | ||
errs.push(err); | ||
} | ||
} | ||
|
||
let errs = self.inner.shutdown(); | ||
if errs.is_empty() { | ||
Ok(()) | ||
} else { | ||
Err(TraceError::Other(format!("{errs:?}").into())) | ||
} | ||
} else { | ||
Err(TraceError::Other( | ||
"tracer provider already shut down".into(), | ||
)) | ||
Err(TraceError::AlreadyShutdown) | ||
} | ||
} | ||
} | ||
|
@@ -215,7 +284,7 @@ | |
} | ||
|
||
fn library_tracer(&self, library: Arc<InstrumentationLibrary>) -> Self::Tracer { | ||
if self.is_shutdown.load(Ordering::Relaxed) { | ||
if self.inner.is_shutdown.load(Ordering::Relaxed) { | ||
return Tracer::new(library, NOOP_TRACER_PROVIDER.clone()); | ||
} | ||
Tracer::new(library, self.clone()) | ||
|
@@ -292,7 +361,12 @@ | |
p.set_resource(config.resource.as_ref()); | ||
} | ||
|
||
TracerProvider::new(TracerProviderInner { processors, config }) | ||
let is_shutdown = AtomicBool::new(false); | ||
TracerProvider::new(TracerProviderInner { | ||
processors, | ||
config, | ||
is_shutdown, | ||
}) | ||
} | ||
} | ||
|
||
|
@@ -311,6 +385,7 @@ | |
use std::env; | ||
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering}; | ||
use std::sync::Arc; | ||
use std::sync::Mutex; | ||
|
||
// fields below is wrapped with Arc so we can assert it | ||
#[derive(Default, Debug)] | ||
|
@@ -391,6 +466,7 @@ | |
Box::from(TestSpanProcessor::new(false)), | ||
], | ||
config: Default::default(), | ||
is_shutdown: AtomicBool::new(false), | ||
}); | ||
|
||
let results = tracer_provider.force_flush(); | ||
|
@@ -534,6 +610,7 @@ | |
let tracer_provider = super::TracerProvider::new(TracerProviderInner { | ||
processors: vec![Box::from(processor)], | ||
config: Default::default(), | ||
is_shutdown: AtomicBool::new(false), | ||
}); | ||
|
||
let test_tracer_1 = tracer_provider.tracer("test1"); | ||
|
@@ -554,14 +631,136 @@ | |
|
||
// after shutdown we should get noop tracer | ||
let noop_tracer = tracer_provider.tracer("noop"); | ||
|
||
// noop tracer cannot start anything | ||
let _ = noop_tracer.start("test"); | ||
assert!(assert_handle.started_span_count(2)); | ||
// noop tracer's tracer provider should be shutdown | ||
assert!(noop_tracer.provider().is_shutdown.load(Ordering::SeqCst)); | ||
assert!(noop_tracer.provider().is_shutdown()); | ||
|
||
// existing tracer becomes noops after shutdown | ||
let _ = test_tracer_1.start("test"); | ||
assert!(assert_handle.started_span_count(2)); | ||
} | ||
|
||
#[derive(Debug)] | ||
struct CountingShutdownProcessor { | ||
shutdown_count: Arc<Mutex<i32>>, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: Atomics maybe easier here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
flush_called: Arc<Mutex<bool>>, | ||
} | ||
|
||
impl CountingShutdownProcessor { | ||
fn new(shutdown_count: Arc<Mutex<i32>>, flush_called: Arc<Mutex<bool>>) -> Self { | ||
CountingShutdownProcessor { | ||
shutdown_count, | ||
flush_called, | ||
} | ||
} | ||
} | ||
|
||
impl SpanProcessor for CountingShutdownProcessor { | ||
fn on_start(&self, _span: &mut Span, _cx: &Context) { | ||
// No operation needed for this processor | ||
} | ||
|
||
fn on_end(&self, _span: SpanData) { | ||
// No operation needed for this processor | ||
} | ||
|
||
fn force_flush(&self) -> TraceResult<()> { | ||
*self.flush_called.lock().unwrap() = true; | ||
Ok(()) | ||
} | ||
|
||
fn shutdown(&self) -> TraceResult<()> { | ||
let mut count = self.shutdown_count.lock().unwrap(); | ||
*count += 1; | ||
Ok(()) | ||
} | ||
} | ||
|
||
#[test] | ||
fn drop_test_with_multiple_providers() { | ||
let shutdown_called = Arc::new(Mutex::new(0)); | ||
let flush_called = Arc::new(Mutex::new(false)); | ||
|
||
{ | ||
// Create a shared TracerProviderInner and use it across multiple providers | ||
let shared_inner = Arc::new(TracerProviderInner { | ||
processors: vec![Box::new(CountingShutdownProcessor::new( | ||
shutdown_called.clone(), | ||
flush_called.clone(), | ||
))], | ||
config: Config::default(), | ||
is_shutdown: AtomicBool::new(false), | ||
}); | ||
|
||
{ | ||
let tracer_provider1 = super::TracerProvider { | ||
inner: shared_inner.clone(), | ||
}; | ||
let tracer_provider2 = super::TracerProvider { | ||
inner: shared_inner.clone(), | ||
}; | ||
|
||
let tracer1 = tracer_provider1.tracer("test-tracer1"); | ||
let tracer2 = tracer_provider2.tracer("test-tracer2"); | ||
|
||
let _span1 = tracer1.start("span1"); | ||
let _span2 = tracer2.start("span2"); | ||
|
||
// TracerProviderInner should not be dropped yet, since both providers and `shared_inner` | ||
// are still holding a reference. | ||
} | ||
// At this point, both `tracer_provider1` and `tracer_provider2` are dropped, | ||
// but `shared_inner` still holds a reference, so `TracerProviderInner` is NOT dropped yet. | ||
assert_eq!(*shutdown_called.lock().unwrap(), 0); | ||
} | ||
// Verify shutdown was called during the drop of the shared TracerProviderInner | ||
assert_eq!(*shutdown_called.lock().unwrap(), 1); | ||
// Verify flush was not called during drop | ||
assert!(!*flush_called.lock().unwrap()); | ||
} | ||
|
||
#[test] | ||
fn drop_after_shutdown_test_with_multiple_providers() { | ||
let shutdown_called = Arc::new(Mutex::new(0)); // Count the number of times shutdown is called | ||
let flush_called = Arc::new(Mutex::new(false)); | ||
|
||
// Create a shared TracerProviderInner and use it across multiple providers | ||
let shared_inner = Arc::new(TracerProviderInner { | ||
processors: vec![Box::new(CountingShutdownProcessor::new( | ||
shutdown_called.clone(), | ||
flush_called.clone(), | ||
))], | ||
config: Config::default(), | ||
is_shutdown: AtomicBool::new(false), | ||
}); | ||
|
||
// Create a scope to test behavior when providers are dropped | ||
{ | ||
let tracer_provider1 = super::TracerProvider { | ||
inner: shared_inner.clone(), | ||
}; | ||
let tracer_provider2 = super::TracerProvider { | ||
inner: shared_inner.clone(), | ||
}; | ||
|
||
// Explicitly shut down the tracer provider | ||
let shutdown_result = tracer_provider1.shutdown(); | ||
assert!(shutdown_result.is_ok()); | ||
|
||
// Verify that shutdown was called exactly once | ||
assert_eq!(*shutdown_called.lock().unwrap(), 1); | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should verify that shutdown was not called by asserting that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please have a look again, the tests are more focused now. |
||
// TracerProvider2 should observe the shutdown state but not trigger another shutdown | ||
let shutdown_result2 = tracer_provider2.shutdown(); | ||
assert!(shutdown_result2.is_err()); | ||
|
||
// Both tracer providers will be dropped at the end of this scope | ||
} | ||
|
||
// Verify that shutdown was only called once, even after drop | ||
assert_eq!(*shutdown_called.lock().unwrap(), 1); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think TracerProvider is lightweight. It is pretty heavy, and we expect user to create it only once. It is correct to mention cloning is cheap as it is just creating a new ref.