Large CPU impact when running continuous profiler for dozens of applications #4498
Comments
Hello @jbparker. This is a known issue, and we do not recommend enabling the profiler for multiple applications on the same host. We have a section
That said, this is something we would like to address in the future because, as you can imagine, you are not the first to report it to us. I have some questions and maybe a lead for lowering the CPU consumption:
The main offender in the .NET profiler is the WallTime profiler (every 16 ms, we collect the stacks of 5 threads). If this profiling data is not that useful to you, you could disable it by setting the environment variable
Explanation about the observed overhead:
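As a rough back-of-the-envelope illustration (not the maintainers' own breakdown), the sampling figures quoted above can be turned into a per-host rate. The sketch below assumes each instrumented process runs its own wall-time sampler, and the process count of 36 is a hypothetical stand-in for "dozens" of services.

```python
def stacks_per_second(interval_ms: float, threads_per_tick: int) -> float:
    """Thread stacks collected per second by one wall-time sampler."""
    ticks_per_second = 1000.0 / interval_ms
    return ticks_per_second * threads_per_tick


# Figures quoted in the comment: a 16 ms period and 5 threads per tick.
per_process = stacks_per_second(16, 5)

# Hypothetical host: "dozens" of services, taken here as 36 (an assumption).
process_count = 36
host_total = per_process * process_count

print(f"{per_process} stacks/s per process")
print(f"{host_total} stacks/s across {process_count} processes")
```

The point is only that the sampling work scales linearly with the number of profiled processes on the host, which is consistent with the advice against profiling many applications on one machine.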
Got it, thanks @gleocadie. I'll definitely try with setting
Thanks for the details and the breakdown of the overhead. Assuming we see drastically less CPU consumption with the wall-time profiler disabled, this will be a very decent spot to land in. I'll follow up shortly with any further questions, or to say whether we can just close this one out.
@gleocadie I was able to put this in - CPU idle went from 5% with
Not bad at all. That said, I'm trying to figure out what exactly this might show in the UI. The method-level detail seems to have disappeared from both "Code Hotspots" and the Profiles "Flame graph" (although I guess expecting CPU overhead alone to drive that was a bit of a stretch). Is it fair to say that code we write can't have method-level CPU time in a way that might show detail similar to what shows in "Code Hotspots"? If so, I think we may be better off disabling this entirely until running it a little less "hot" becomes a possibility.
👋 @jbparker We could change some other settings if you would like to keep the wall-time/Code Hotspots data while lowering the profiler overhead in the meantime. First remove the environment variable
Describe the bug
When running the continuous profiler in dozens of Windows services on a single VM, the services consume materially more CPU at idle (~+25%) than with the profiler off.
When using the strategy described here to allow CLR profiling to occur for tracing without the DD CP, using DD_PROFILING_ENABLED=0 and COR_ENABLE_PROFILING=1 (since we are using .NET Framework), the CPU impact is minimal (~+1%).
To Reproduce
Steps to reproduce the behavior:
Set DD_PROFILING_ENABLED=1 and COR_ENABLE_PROFILING=1 on a few dozen .NET Framework 4.8 applications, each running as a Windows service.
Expected behavior
We expect some level of performance hit here, somewhere around 5-10% given this comment and this note from Reduce overhead when using the profiler:
But 25-30% seems excessive. This is especially the case when using older versions, as we experienced the same issue as #3625, where the CPU and open handles would stair-step upward and eventually lock up a VM entirely.
Overall, we need something that matches what the profiler's stated aim is so that we can run it in production without worrying that DD (and not our own code) is indeed responsible for performance problems:
Screenshots
CPU view from Azure Monitor showing CPU Percentage for the instrumented applications
Runtime environment (please complete the following information):
Additional context
We love the detail that the CP gives you without having to instrument so much of the code. If we need to plan on only enabling it via deployment when there is a problem in the application, that would be a significant loss and would prevent us from utilizing DD to the extent that we'd like, which is to replace virtually all existing observability platforms.
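For illustration, the two configurations compared in this report differ only in DD_PROFILING_ENABLED, so a deploy-time toggle of the kind described above could look roughly like the sketch below. The helper name `profiler_env` is hypothetical, and only the two variables named in this issue are covered; any other settings the tracer needs are assumed to be configured elsewhere.

```python
import os


def profiler_env(profiling_enabled: bool, base=None) -> dict:
    """Return an environment mapping with the two variables from this issue.

    `base` defaults to a copy of the current process environment.
    """
    env = dict(os.environ if base is None else base)
    # CLR profiling stays on in both cases so that tracing keeps working.
    env["COR_ENABLE_PROFILING"] = "1"
    # Only the continuous profiler itself is switched on or off.
    env["DD_PROFILING_ENABLED"] = "1" if profiling_enabled else "0"
    return env


# Tracing only (the ~+1% case reported above).
tracing_only = profiler_env(False, base={})

# Tracing plus continuous profiler (the ~+25% case reported above).
with_profiler = profiler_env(True, base={})

print(tracing_only)
print(with_profiler)
```

How the resulting mapping is applied to each Windows service depends on the deployment tooling and is left out here.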