-
Our current application suffers somewhat from thread contention, and we believe this is partly due to lock usage around caching, which is used in many places in our APIs. I thought about tracking lock acquisitions and releases in telemetry, specifically as span events: every time a lock is requested, an event would be recorded; when the lock is acquired, another event would be recorded; and finally, when the critical section is over, a lock-exit event would be recorded. From what I have checked so far, there is no way to "be notified" of those events using the … Alternatively, I wondered whether the new .NET 8 interceptors feature could achieve something like this without touching the original code, by adding the additional tracing to … My questions are as follows:
Somewhat related:
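For reference, the span-event idea from the question might look roughly like the sketch below. It assumes the code already runs inside an `Activity` (for example one started by ASP.NET Core or by an `ActivitySource`). `TracedLock` and the event names `lock.requested`, `lock.acquired` and `lock.exited` are hypothetical, not an existing API or OpenTelemetry semantic convention. It only covers call sites that are changed to use the wrapper; existing `lock` statements are not observed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical wrapper: records "requested", "acquired" and "exited" span
// events on the ambient Activity around a Monitor-based critical section.
public sealed class TracedLock
{
    private readonly object _gate = new object();
    private readonly ActivityTagsCollection _tags;

    public TracedLock(string name)
    {
        // Hypothetical tag name; not an OpenTelemetry semantic convention.
        _tags = new ActivityTagsCollection { { "lock.name", name } };
    }

    public void Run(Action criticalSection)
    {
        var activity = Activity.Current; // null when no span is active

        void Record(string eventName)
        {
            // Skip the work when this span is not recording all data.
            if (activity != null && activity.IsAllDataRequested)
                activity.AddEvent(new ActivityEvent(eventName, tags: _tags));
        }

        Record("lock.requested");
        bool taken = false;
        try
        {
            Monitor.Enter(_gate, ref taken); // blocks while another thread holds the lock
            Record("lock.acquired");
            criticalSection();
        }
        finally
        {
            if (taken)
            {
                Monitor.Exit(_gate);
                Record("lock.exited");
            }
        }
    }
}
```

Each acquisition adds three events to the current span, so on a hot path this multiplies the telemetry volume, which is the cost concern raised in the replies.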
-
For 2: I have used https://github.com/open-telemetry/opentelemetry-dotnet-contrib/tree/main/src/OpenTelemetry.Instrumentation.Runtime#processruntimedotnetmonitorlock_contentioncount to first find out that an app is suffering from too much contention. And then … For 1: I am not sure Activity is well suited for this purpose. Activity itself introduces some contention, and it may not be worth the cost (here, the cost of storing the spans somewhere, such as with a vendor). For 3: I'll tag the .NET runtime owners who can help with this.
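To check whether one specific code path contends, a rough sketch can read `Monitor.LockContentionCount`, the process-wide counter that, as far as I know, backs the `process.runtime.dotnet.monitor.lock_contention.count` metric linked above. `ContentionProbe` is a hypothetical helper. Because the counter is process-wide, contention on other threads during the measured window is included too.

```csharp
using System;
using System.Threading;

// Hypothetical helper: returns how many Monitor lock contentions occurred
// (process-wide, all threads) while `work` was running.
public static class ContentionProbe
{
    public static long CountContentions(Action work)
    {
        long before = Monitor.LockContentionCount;
        work();
        return Monitor.LockContentionCount - before;
    }
}
```

For example, wrapping a suspected hot path as `ContentionProbe.CountContentions(() => RunCacheHeavyRequest())`, where `RunCacheHeavyRequest` is a placeholder, gives a rough number to compare before and after a change, without emitting any per-acquisition telemetry.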
-
This is the contention I ultimately found in my case: https://github.com/dotnet/runtime/blob/4822e3c3aa77eb82b2fb33c9321f923cf11ddde6/src/libraries/System.Diagnostics.DiagnosticSource/src/System/Diagnostics/ActivitySource.cs#L423 (As is probably obvious by now, I was trying to find contention within the OTel SDK itself.) In your case, I believe you already know there is contention, but you want to track it better and make changes.
-
I suspect that for many scenarios the telemetry this generates would be very verbose. .NET supports workloads that acquire and release locks millions of times per second. There certainly could be other scenarios where the verbosity is much lower or where the dev is willing to accept high overheads, but my initial guess is that such scenarios would be fairly limited.
If you want to experiment with it, you can also use EventListener to listen to the same contention events @cijothomas mentioned. You could then encode the events as logs (or ActivityEvents, or anything else) and include them with other telemetry. One gotcha: because the contention events are captured from the runtime's native code implementation, they are placed in a buffer and dispatched asynchronously from another thread. All the events include a timestamp and the ID of the thread where the contention originally occurred. However, other context you might find interesting, such as a call stack or a reference to the originating thread's current Activity object, isn't something EventListener supports.

There are some other ways to get those events, depending on how far down the rabbit hole you want to go. @cijothomas mentioned dotnet-trace, which is very straightforward if you can run additional tools on the production machine. There are also the DiagnosticsClient and TraceEvent libraries (example) that dotnet-trace is built on, or the ICorProfiler APIs if you want to get down to the metal. (Traditionally, ICorProfiler has been complicated enough that very few people outside of dedicated profiling-tool authors use it.)
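A minimal sketch of the EventListener approach, assuming .NET 5 or later: it subscribes to the runtime's `Microsoft-Windows-DotNETRuntime` event source with the contention keyword (0x4000) and keeps the timestamp and OS thread ID of each ContentionStart (event ID 81) and ContentionStop (event ID 91) event. `ContentionListener` is a hypothetical name, and turning the records into logs or ActivityEvents is left out. Since the events arrive asynchronously on a dispatch thread, there is no ambient Activity or call stack to attach; only the recorded thread ID points back to the contending thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics.Tracing;

// Hypothetical listener that captures the runtime's lock contention events.
public sealed class ContentionListener : EventListener
{
    // Keyword and event IDs from the runtime's event manifest.
    private const EventKeywords ContentionKeyword = (EventKeywords)0x4000;
    public const int ContentionStartId = 81;
    public const int ContentionStopId = 91;

    // Initializers run before the base EventListener constructor, which may
    // already call OnEventSourceCreated for sources that exist at startup.
    public ConcurrentQueue<(DateTime Timestamp, long OsThreadId, int EventId)> Events { get; } = new();

    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Informational, ContentionKeyword);
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventId == ContentionStartId || e.EventId == ContentionStopId)
        {
            // This runs on a dispatch thread; OSThreadId identifies the
            // thread that actually hit the contention.
            Events.Enqueue((e.TimeStamp, e.OSThreadId, e.EventId));
        }
    }
}
```

Creating the listener once at startup (`var listener = new ContentionListener();`) and draining `Events` periodically keeps the callback itself cheap, which matters given how many events a contended process can produce.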
In terms of telemetry that would be uploaded and stored, I assume it is too high-volume to be effective for general use. For better understanding of long application pauses, hangs, or poor performance, I suspect stack sampling at somewhere between 1 and 1000 Hz would be useful while collecting a lower total amount of telemetry. I believe the OTel profiling working group is exploring scenarios like that. HTH!