-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Profiler] Fix crash at shutdown with the timer_create
-based CPU profiler
#6268
Conversation
Datadog ReportBranch report: ❌ 1 Failed (1 Known Flaky), 370728 Passed, 2397 Skipped, 16h 45m 54.1s Total Time ❌ Failed Tests (1)
New Flaky Tests (1)
|
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing the following branches/commits: Execution-time benchmarks measure the whole time it takes to execute a program. And are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are shown in red. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6268) - mean (70ms) : 68, 72
. : milestone, 70,
master - mean (70ms) : 68, 71
. : milestone, 70,
section CallTarget+Inlining+NGEN
This PR (6268) - mean (1,118ms) : 1101, 1136
. : milestone, 1118,
master - mean (1,116ms) : 1099, 1133
. : milestone, 1116,
gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6268) - mean (108ms) : 106, 110
. : milestone, 108,
master - mean (108ms) : 106, 110
. : milestone, 108,
section CallTarget+Inlining+NGEN
This PR (6268) - mean (774ms) : 761, 788
. : milestone, 774,
master - mean (775ms) : 762, 787
. : milestone, 775,
gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6268) - mean (92ms) : 90, 94
. : milestone, 92,
master - mean (91ms) : 90, 93
. : milestone, 91,
section CallTarget+Inlining+NGEN
This PR (6268) - mean (727ms) : 712, 743
. : milestone, 727,
master - mean (731ms) : 716, 746
. : milestone, 731,
gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.6.2)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6268) - mean (189ms) : 186, 193
. : milestone, 189,
master - mean (191ms) : 185, 196
. : milestone, 191,
section CallTarget+Inlining+NGEN
This PR (6268) - mean (1,226ms) : 1200, 1252
. : milestone, 1226,
master - mean (1,229ms) : 1208, 1250
. : milestone, 1229,
gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6268) - mean (276ms) : 272, 279
. : milestone, 276,
master - mean (275ms) : 270, 280
. : milestone, 275,
section CallTarget+Inlining+NGEN
This PR (6268) - mean (953ms) : 932, 973
. : milestone, 953,
master - mean (949ms) : 934, 964
. : milestone, 949,
gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat X
axisFormat %s
todayMarker off
section Baseline
This PR (6268) - mean (264ms) : 260, 268
. : milestone, 264,
master - mean (264ms) : 260, 268
. : milestone, 264,
section CallTarget+Inlining+NGEN
This PR (6268) - mean (927ms) : 912, 943
. : milestone, 927,
master - mean (935ms) : 913, 957
. : milestone, 935,
|
352e5e7
to
e011b72
Compare
Throughput/Crank Report ⚡Throughput results for AspNetCoreSimpleController comparing the following branches/commits: Cases where throughput results for the PR are worse than latest master (5% drop or greater), results are shown in red. Note that these results are based on a single point-in-time result for each branch. For full results, see one of the many, many dashboards! gantt
title Throughput Linux x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6268) (11.158M) : 0, 11158423
master (11.152M) : 0, 11152138
benchmarks/2.9.0 (11.033M) : 0, 11032866
section Automatic
This PR (6268) (7.280M) : 0, 7280331
master (7.262M) : 0, 7261734
benchmarks/2.9.0 (7.786M) : 0, 7785853
section Trace stats
master (7.549M) : 0, 7548778
section Manual
master (11.173M) : 0, 11173023
section Manual + Automatic
This PR (6268) (6.781M) : 0, 6781201
master (6.794M) : 0, 6793544
section DD_TRACE_ENABLED=0
master (10.214M) : 0, 10214196
gantt
title Throughput Linux arm64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6268) (9.751M) : 0, 9751363
master (9.531M) : 0, 9530515
benchmarks/2.9.0 (9.495M) : 0, 9494821
section Automatic
This PR (6268) (6.497M) : 0, 6496948
master (6.486M) : 0, 6486418
section Trace stats
master (6.867M) : 0, 6867398
section Manual
master (9.426M) : 0, 9425938
section Manual + Automatic
This PR (6268) (6.009M) : 0, 6008606
master (6.122M) : 0, 6121877
section DD_TRACE_ENABLED=0
master (8.820M) : 0, 8820217
gantt
title Throughput Windows x64 (Total requests)
dateFormat X
axisFormat %s
section Baseline
This PR (6268) (9.761M) : 0, 9761440
master (9.876M) : 0, 9875828
benchmarks/2.9.0 (10.020M) : 0, 10019592
section Automatic
This PR (6268) (6.314M) : 0, 6314238
master (6.174M) : 0, 6174285
benchmarks/2.9.0 (7.255M) : 0, 7255257
section Trace stats
master (6.971M) : 0, 6971003
section Manual
master (9.416M) : 0, 9415959
section Manual + Automatic
This PR (6268) (5.955M) : 0, 5955291
master (5.884M) : 0, 5883842
section DD_TRACE_ENABLED=0
master (8.894M) : 0, 8893965
|
Benchmarks Report for tracer 🐌Benchmarks for #6268 compared to master:
The following thresholds were used for comparing the benchmark speeds:
Allocation changes below 0.5% are ignored. Benchmark detailsBenchmarks.Trace.ActivityBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AgentWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.AspNetCoreBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.DbCommandBenchmark - Same speed ✔️ Same allocations ✔️Raw results
Benchmarks.Trace.ElasticsearchBenchmark - Slower
|
Benchmark | diff/base | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearch‑net6.0 | 1.126 | 1,172.70 | 1,320.35 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | CallElasticsearch |
net6.0 | 1.17μs | 0.384ns | 1.38ns | 0.0135 | 0 | 0 | 976 B |
master | CallElasticsearch |
netcoreapp3.1 | 1.54μs | 2.2ns | 8.52ns | 0.0133 | 0 | 0 | 976 B |
master | CallElasticsearch |
net472 | 2.69μs | 2.28ns | 8.85ns | 0.158 | 0 | 0 | 995 B |
master | CallElasticsearchAsync |
net6.0 | 1.38μs | 0.616ns | 2.39ns | 0.0131 | 0 | 0 | 952 B |
master | CallElasticsearchAsync |
netcoreapp3.1 | 1.71μs | 0.599ns | 2.16ns | 0.0137 | 0 | 0 | 1.02 KB |
master | CallElasticsearchAsync |
net472 | 2.68μs | 0.98ns | 3.67ns | 0.166 | 0 | 0 | 1.05 KB |
#6268 | CallElasticsearch |
net6.0 | 1.32μs | 0.378ns | 1.36ns | 0.0139 | 0 | 0 | 976 B |
#6268 | CallElasticsearch |
netcoreapp3.1 | 1.67μs | 1.14ns | 4.4ns | 0.0134 | 0 | 0 | 976 B |
#6268 | CallElasticsearch |
net472 | 2.61μs | 2.98ns | 10.7ns | 0.157 | 0 | 0 | 995 B |
#6268 | CallElasticsearchAsync |
net6.0 | 1.34μs | 0.616ns | 2.39ns | 0.0134 | 0 | 0 | 952 B |
#6268 | CallElasticsearchAsync |
netcoreapp3.1 | 1.6μs | 1.27ns | 4.94ns | 0.0136 | 0 | 0 | 1.02 KB |
#6268 | CallElasticsearchAsync |
net472 | 2.61μs | 2.09ns | 8.08ns | 0.166 | 0 | 0 | 1.05 KB |
Benchmarks.Trace.GraphQLBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | ExecuteAsync |
net6.0 | 1.37μs | 0.819ns | 3.06ns | 0.0131 | 0 | 0 | 952 B |
master | ExecuteAsync |
netcoreapp3.1 | 1.71μs | 0.913ns | 3.42ns | 0.0128 | 0 | 0 | 952 B |
master | ExecuteAsync |
net472 | 1.85μs | 0.496ns | 1.92ns | 0.145 | 0 | 0 | 915 B |
#6268 | ExecuteAsync |
net6.0 | 1.26μs | 0.581ns | 2.25ns | 0.0133 | 0 | 0 | 952 B |
#6268 | ExecuteAsync |
netcoreapp3.1 | 1.58μs | 1.16ns | 4.5ns | 0.0126 | 0 | 0 | 952 B |
#6268 | ExecuteAsync |
net472 | 1.81μs | 0.796ns | 2.98ns | 0.145 | 0 | 0 | 915 B |
Benchmarks.Trace.HttpClientBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | SendAsync |
net6.0 | 4.4μs | 2.26ns | 8.15ns | 0.0331 | 0 | 0 | 2.31 KB |
master | SendAsync |
netcoreapp3.1 | 5.19μs | 6.13ns | 23.7ns | 0.0382 | 0 | 0 | 2.85 KB |
master | SendAsync |
net472 | 7.5μs | 1.98ns | 7.69ns | 0.494 | 0 | 0 | 3.12 KB |
#6268 | SendAsync |
net6.0 | 4.42μs | 2.55ns | 9.55ns | 0.0331 | 0 | 0 | 2.31 KB |
#6268 | SendAsync |
netcoreapp3.1 | 5.34μs | 1.75ns | 6.77ns | 0.0374 | 0 | 0 | 2.85 KB |
#6268 | SendAsync |
net472 | 7.35μs | 2.24ns | 8.66ns | 0.493 | 0 | 0 | 3.12 KB |
Benchmarks.Trace.ILoggerBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 1.52μs | 1.19ns | 4.47ns | 0.0234 | 0 | 0 | 1.64 KB |
master | EnrichedLog |
netcoreapp3.1 | 2.16μs | 0.869ns | 3.13ns | 0.0225 | 0 | 0 | 1.64 KB |
master | EnrichedLog |
net472 | 2.59μs | 3ns | 11.6ns | 0.249 | 0 | 0 | 1.57 KB |
#6268 | EnrichedLog |
net6.0 | 1.53μs | 0.433ns | 1.62ns | 0.023 | 0 | 0 | 1.64 KB |
#6268 | EnrichedLog |
netcoreapp3.1 | 2.18μs | 1.3ns | 4.85ns | 0.0218 | 0 | 0 | 1.64 KB |
#6268 | EnrichedLog |
net472 | 2.7μs | 2.26ns | 8.76ns | 0.249 | 0 | 0 | 1.57 KB |
Benchmarks.Trace.Log4netBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 118μs | 152ns | 590ns | 0 | 0 | 0 | 4.28 KB |
master | EnrichedLog |
netcoreapp3.1 | 123μs | 176ns | 681ns | 0 | 0 | 0 | 4.28 KB |
master | EnrichedLog |
net472 | 152μs | 151ns | 585ns | 0.675 | 0.225 | 0 | 4.46 KB |
#6268 | EnrichedLog |
net6.0 | 118μs | 211ns | 816ns | 0.0591 | 0 | 0 | 4.28 KB |
#6268 | EnrichedLog |
netcoreapp3.1 | 123μs | 295ns | 1.14μs | 0.0608 | 0 | 0 | 4.28 KB |
#6268 | EnrichedLog |
net472 | 154μs | 147ns | 569ns | 0.694 | 0.231 | 0 | 4.46 KB |
Benchmarks.Trace.NLogBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 3.05μs | 1.91ns | 7.39ns | 0.0305 | 0 | 0 | 2.2 KB |
master | EnrichedLog |
netcoreapp3.1 | 4.08μs | 0.894ns | 3.46ns | 0.0285 | 0 | 0 | 2.2 KB |
master | EnrichedLog |
net472 | 4.84μs | 1.45ns | 5.61ns | 0.32 | 0 | 0 | 2.02 KB |
#6268 | EnrichedLog |
net6.0 | 3.06μs | 1.4ns | 5.24ns | 0.0306 | 0 | 0 | 2.2 KB |
#6268 | EnrichedLog |
netcoreapp3.1 | 4.23μs | 1.11ns | 4.29ns | 0.0296 | 0 | 0 | 2.2 KB |
#6268 | EnrichedLog |
net472 | 4.87μs | 0.965ns | 3.74ns | 0.32 | 0 | 0 | 2.02 KB |
Benchmarks.Trace.RedisBenchmark - Faster 🎉 Same allocations ✔️
Faster 🎉 in #6268
Benchmark
base/diff
Base Median (ns)
Diff Median (ns)
Modality
Benchmarks.Trace.RedisBenchmark.SendReceive‑net472
1.137
2,302.87
2,025.78
Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.RedisBenchmark.SendReceive‑net472 | 1.137 | 2,302.87 | 2,025.78 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | SendReceive |
net6.0 | 1.5μs | 1.01ns | 3.92ns | 0.0158 | 0 | 0 | 1.14 KB |
master | SendReceive |
netcoreapp3.1 | 1.79μs | 0.92ns | 3.44ns | 0.0151 | 0 | 0 | 1.14 KB |
master | SendReceive |
net472 | 2.3μs | 1.64ns | 6.34ns | 0.183 | 0 | 0 | 1.16 KB |
#6268 | SendReceive |
net6.0 | 1.41μs | 1.15ns | 4.44ns | 0.0161 | 0 | 0 | 1.14 KB |
#6268 | SendReceive |
netcoreapp3.1 | 1.75μs | 1.43ns | 5.54ns | 0.0157 | 0 | 0 | 1.14 KB |
#6268 | SendReceive |
net472 | 2.03μs | 0.893ns | 3.22ns | 0.183 | 0 | 0 | 1.16 KB |
Benchmarks.Trace.SerilogBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | EnrichedLog |
net6.0 | 2.69μs | 0.725ns | 2.81ns | 0.0215 | 0 | 0 | 1.6 KB |
master | EnrichedLog |
netcoreapp3.1 | 3.85μs | 1.5ns | 5.79ns | 0.0213 | 0 | 0 | 1.65 KB |
master | EnrichedLog |
net472 | 4.45μs | 3.04ns | 11.8ns | 0.322 | 0 | 0 | 2.04 KB |
#6268 | EnrichedLog |
net6.0 | 2.71μs | 0.666ns | 2.49ns | 0.0231 | 0 | 0 | 1.6 KB |
#6268 | EnrichedLog |
netcoreapp3.1 | 4.01μs | 2.09ns | 8.1ns | 0.022 | 0 | 0 | 1.65 KB |
#6268 | EnrichedLog |
net472 | 4.75μs | 2.99ns | 11.6ns | 0.324 | 0 | 0 | 2.04 KB |
Benchmarks.Trace.SpanBenchmark - Faster 🎉 Same allocations ✔️
Faster 🎉 in #6268
Benchmark
base/diff
Base Median (ns)
Diff Median (ns)
Modality
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0
1.135
558.01
491.85
Benchmark | base/diff | Base Median (ns) | Diff Median (ns) | Modality |
---|---|---|---|---|
Benchmarks.Trace.SpanBenchmark.StartFinishScope‑net6.0 | 1.135 | 558.01 | 491.85 |
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | StartFinishSpan |
net6.0 | 398ns | 0.469ns | 1.76ns | 0.00798 | 0 | 0 | 576 B |
master | StartFinishSpan |
netcoreapp3.1 | 634ns | 0.471ns | 1.83ns | 0.00767 | 0 | 0 | 576 B |
master | StartFinishSpan |
net472 | 676ns | 0.662ns | 2.56ns | 0.0917 | 0 | 0 | 578 B |
master | StartFinishScope |
net6.0 | 558ns | 0.362ns | 1.4ns | 0.00972 | 0 | 0 | 696 B |
master | StartFinishScope |
netcoreapp3.1 | 745ns | 0.72ns | 2.79ns | 0.00963 | 0 | 0 | 696 B |
master | StartFinishScope |
net472 | 904ns | 0.487ns | 1.89ns | 0.104 | 0 | 0 | 658 B |
#6268 | StartFinishSpan |
net6.0 | 411ns | 0.189ns | 0.733ns | 0.00804 | 0 | 0 | 576 B |
#6268 | StartFinishSpan |
netcoreapp3.1 | 570ns | 0.667ns | 2.58ns | 0.00775 | 0 | 0 | 576 B |
#6268 | StartFinishSpan |
net472 | 693ns | 0.613ns | 2.37ns | 0.0917 | 0 | 0 | 578 B |
#6268 | StartFinishScope |
net6.0 | 492ns | 0.388ns | 1.5ns | 0.00964 | 0 | 0 | 696 B |
#6268 | StartFinishScope |
netcoreapp3.1 | 760ns | 0.671ns | 2.6ns | 0.00941 | 0 | 0 | 696 B |
#6268 | StartFinishScope |
net472 | 910ns | 1.83ns | 7.09ns | 0.104 | 0 | 0 | 658 B |
Benchmarks.Trace.TraceAnnotationsBenchmark - Same speed ✔️ Same allocations ✔️
Raw results
Branch | Method | Toolchain | Mean | StdError | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
master | RunOnMethodBegin |
net6.0 | 666ns | 1.55ns | 5.99ns | 0.00981 | 0 | 0 | 696 B |
master | RunOnMethodBegin |
netcoreapp3.1 | 893ns | 2.04ns | 7.62ns | 0.00932 | 0 | 0 | 696 B |
master | RunOnMethodBegin |
net472 | 1.15μs | 1.27ns | 4.93ns | 0.104 | 0 | 0 | 658 B |
#6268 | RunOnMethodBegin |
net6.0 | 611ns | 0.497ns | 1.93ns | 0.00981 | 0 | 0 | 696 B |
#6268 | RunOnMethodBegin |
netcoreapp3.1 | 928ns | 0.859ns | 3.33ns | 0.00931 | 0 | 0 | 696 B |
#6268 | RunOnMethodBegin |
net472 | 1.16μs | 1.09ns | 4.2ns | 0.104 | 0 | 0 | 658 B |
b8afe0d
to
449cfb1
Compare
15b90e3
to
c0a1e84
Compare
c0a1e84
to
e0b4bb8
Compare
timer_create
-based CPU profiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary of changes
Make sure the profiler does not crash at shutdown.
Reason for change
For the
timer_create
-based CPU profiler, each managed thread is associated to timer so that every 9ms, a callback is called and we collect the thread snapshot (metadata + callstack). To store in a signal-safe manner the thread snapshot we use asynchronized_pool_resource
(https://en.cppreference.com/w/cpp/memory/synchronized_pool_resource) +mmap
as an upstream (when the pool needs more memory, it calls intommap
and notmalloc
). As an optimization, this resource uses a thread local variable so each native thread has its own pools (reduce lock contention).When an application is shutting down, the CLR follows a sequence of steps to stop and clean the resources it used. One of those steps is to notify the ICorprofiler profilers (through the callbacks). When a managed thread is terminated, it cleans up the associated resource, call the
ThreadDestroyed
callback... When the CLR is done with the thread clean up (the routine passed to the native thread ends), the libc takes on and cleans up the native resources (clean up thread local storage...).But sometimes, those
ThreadDestroyed
is not called and it happens when the Execution Engine has started to shutdown https://github.com/dotnet/runtime/blob/main/src/coreclr/vm/threads.cpp#L2777This leads up to a crash situation where:
Implementation details
pthread_create
wrapping to wrap the thread entrypoint.CorProfilerCallback
to do extra cleanup.Test coverage
Current tests should not crash at shutdown
Other details