Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address the comments from the reviews #3

Open
wants to merge 2 commits into
base: fix/improve-startup-performance
Choose a base branch
from

Conversation

ansman
Copy link
Member

@ansman ansman commented Dec 11, 2024

No description provided.

Description
When initializing the OpenTelemetry Android SDK with disk buffering enabled, we
discovered that synchronous disk space checks were causing ANRs in production.
These checks occur during the creation of disk buffering exporters, specifically
in `DiskManager.getMaxFolderSize()`, which makes blocking IPC calls through
`StorageManager.getAllocatableBytes()` on the main thread. The issue manifests
in the following ANR stacktrace:
```
android.os.BinderProxy.transact (BinderProxy.java:662)
android.os.storage.IStorageManager$Stub$Proxy.getAllocatableBytes (IStorageManager.java:2837)
android.os.storage.StorageManager.getAllocatableBytes (StorageManager.java:2414)
android.os.storage.StorageManager.getAllocatableBytes (StorageManager.java:2404)
io.opentelemetry.android.internal.services.CacheStorage.getAvailableSpace (CacheStorage.java:66)
io.opentelemetry.android.internal.services.CacheStorage.ensureCacheSpaceAvailable (CacheStorage.java:50)
io.opentelemetry.android.internal.features.persistence.DiskManager.getMaxFolderSize (DiskManager.kt:58)
io.opentelemetry.android.OpenTelemetryRumBuilder.createStorageConfiguration (OpenTelemetryRumBuilder.java:338)
io.opentelemetry.android.OpenTelemetryRumBuilder.build (OpenTelemetryRumBuilder.java:286)
```

Our Solution
To fix this we moved initialization to run on a background executor and
buffer the data in memory until it completes.

The process works like this:
1. Initialize the SDK with `BufferDelegatingExporter` instances that can
   immediately accept telemetry data.
2. Move exporter initialization off the main thread.
3. Once async initialization completes, flush buffered signals to initialized
   exporters and delegate all future signals.

The primary goal of this solution is to be unobtrusive and prevent ANRs caused
by initialization of disk exporters, while preventing signals from being
dropped.

Testing
We have added unit tests to cover the buffering, delevation, and RUM
building. We've also verified this with both disk enabled and disk
disabled.
@ansman ansman force-pushed the fix/improve-startup-performance-changes branch from 1f5fd47 to 6bcdec0 Compare December 11, 2024 22:02
@ansman ansman force-pushed the fix/improve-startup-performance-changes branch from 6bcdec0 to afaf8e9 Compare December 12, 2024 02:06
Copy link
Collaborator

@Ejdangerfield Ejdangerfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta remove the signal files generated. When running the sdk locally they have it set up to collect signals in local files

@ansman ansman force-pushed the fix/improve-startup-performance branch from ff505fc to 5c4f0e5 Compare December 16, 2024 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants