Possible memory leak from the profiler #1025

Closed
nik-andreev opened this issue Oct 6, 2021 · 20 comments
Assignees: felixge
Labels: bug (unintended behavior that has to be fixed), needs-investigation (requires work from team member), profiler

Comments

nik-andreev commented Oct 6, 2021

Starting with version v1.33.0 there appears to be a memory leak in the profiler. I can't reproduce it yet, but I'm basing the observation on a series of profile snapshots taken from a prod application that ran overnight and was mostly idle in the last 30 minutes.

I saw that there was a large refactor in v1.33.0 (#842), but I'd rather not downgrade to v1.32.1 because we really need the other memory-related fix from #976.

[screenshots: profile snapshots taken overnight, showing profiler-related memory usage growing]

nik-andreev (Author):

If useful, I could try reverting to v1.32.1 + #976: gopkg.in/DataDog/dd-trace-go.v1 v1.32.1-0.20210813061908-394c515a0fb3

felixge self-assigned this Oct 15, 2021
felixge added the needs-investigation, profiler, and bug labels Oct 15, 2021
felixge (Member) commented Oct 15, 2021

@nik-andreev sorry for the late reply, I was on vacation. I'm also very sorry you're hitting this problem!

This is indeed an unexpected amount of memory usage and I consider it high priority to figure out the root cause for it.

  1. Did the memory usage continue to grow? Or does it sometimes shrink and eventually settle on a certain value?
  2. Is this application using a lot of recursion?
  3. Does the application have a huge code base?
  4. Can you capture a time series of BuckHashSys from https://pkg.go.dev/runtime#MemStats?

The reason I'm asking is that, because of the way Go's built-in heap profiler is designed, it keeps using more memory until the stack traces of all allocating code paths have been hit. Usually that's a small set of possible stack traces, but recursion can greatly expand it. Most people are unaware of this because the memory used by the heap profiler is not reported in the heap profile itself. However, it can be observed via MemStats.BuckHashSys, as mentioned above.
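
For reference, a minimal sketch of how such a time series could be collected (just a logging loop; adapt it to whatever metrics pipeline you use):

```go
package main

import (
	"log"
	"runtime"
	"time"
)

// reportBuckHashSys periodically reads runtime.MemStats and logs
// BuckHashSys, the memory used by the profiling bucket hash tables.
func reportBuckHashSys(interval time.Duration) {
	for range time.Tick(interval) {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms) // note: briefly stops the world
		log.Printf("BuckHashSys=%d bytes", ms.BuckHashSys)
	}
}

func main() {
	go reportBuckHashSys(20 * time.Second)
	// ... rest of the application ...
	select {}
}
```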

v1.33.0 of dd-trace-go added a new feature which ends up duplicating some of the memory already allocated by Go internally, which also ends up making it visible in the heap profile. The way it's implemented, it should not be a memory leak (beyond the leakiness that's already inherent to the built-in heap profiler), but seeing > 500 MB allocated in your last screenshots makes me wonder if there is a bug.

If you want, feel free to ping me on our public slack for direct support on this.

nik-andreev (Author):

No worries @felixge, thanks for looking!

We ended up reverting to v1.32.1 + #976 and that solved the immediate problem. I won't be able to go back to v1.33 for at least a few weeks, so the data for question 4 might be delayed.

To answer your questions:

  1. We publish new versions of our application about 30+ times per day, so the only time I can notice it is in the early mornings/weekends when there is no dev activity. In all of these cases it just continued growing, though at different rates. It could stay small for 8 hours (~100MB), then jump to 500MB and stay like that until the app is shut down in the morning.

  2. There are some places that use recursion but the depth is usually shallow (5-6 max).

  3. Yes, the code base is quite large, mostly due to generated code. As an estimate, we have seen gopls regularly taking 10-15GB+ of RAM, and the binary size is about 250MB.

  4. I'll add instrumentation for BuckHashSys, but as I mentioned before, it will be using the pre-v1.33.0 version for a while.

felixge (Member) commented Oct 15, 2021

@nik-andreev thanks for the fast response. If you can instrument BuckHashSys, there should be no need to go back to v1.33.0. I suspect it will be enough to tell us if there is a real memory leak here or not. Even if it's not a "real" memory leak, the amount of memory usage would definitely be problematic and we'll have to consider ways to mitigate it.

I just thought of another thing worth instrumenting: can you pass a statsd client to the profiler like this? If so, it'd be very interesting to see the datadog.profiler.go.uploaded_profile_bytes metric this will enable. If the problem is indeed caused by the large code base, this should show profile uploads as large as 250MB when the problem hits.
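
Roughly like this, for example (a sketch only; it assumes the profiler's WithStatsd option together with the datadog-go statsd client, and the service name and agent address are placeholders):

```go
package main

import (
	"log"

	"github.com/DataDog/datadog-go/statsd"
	"gopkg.in/DataDog/dd-trace-go.v1/profiler"
)

func main() {
	// Statsd client pointing at the local Datadog agent (placeholder address).
	client, err := statsd.New("127.0.0.1:8125")
	if err != nil {
		log.Fatal(err)
	}

	// Hand the statsd client to the profiler so it can emit internal
	// metrics such as datadog.profiler.go.uploaded_profile_bytes.
	if err := profiler.Start(
		profiler.WithService("my-service"), // placeholder service name
		profiler.WithStatsd(client),
	); err != nil {
		log.Fatal(err)
	}
	defer profiler.Stop()

	// ... rest of the application ...
}
```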

nik-andreev (Author):

@felixge, I was finally able to collect some data. The app emits the size of BuckHashSys every 20 seconds. Here is the chart over the past weekend:
[chart: BuckHashSys over the past weekend, one line per app instance]

I haven't yet changed the profiler to use a statsd client.

felixge (Member) commented Oct 28, 2021

@nik-andreev thank you so much for sharing this. Is each line representing a different instance of your app?

If so, it seems like one instance is using 160 MB of BuckHashSys and still growing towards the end (blue line). That's a lot. I just had a look at our internal apps written in Go, and the highest value I found was one app peaking at 70 MB. But even that one was a huge outlier; most apps were hovering around 10 MB. And those are all pretty big code bases.

Based on this I'd say that the profiler memory usage you're seeing in v1.33.0 makes sense. The way we've implemented delta profiles requires hanging on to the most recent allocation profile for a bit and in your case that can be ~160 MB * X where X is the overhead caused by the fact that the profile dd-trace-go uses is symbolized (BuckHashSys is not). This is very unfortunate, but it's really just an amplification of the BuckHashSys memory usage of the Go runtime itself.

Anyway, I can think of a few ways to improve the situation:

  1. We could add an option to disable delta profiles for apps like yours that produce an incredible amount of allocation profiling data. The downside is that you won't be able to use the aggregation/compare feature for allocations that is enabled by delta profiles.
  2. We could try to make the implementation of delta profiles more efficient so the X overhead factor compared to BuckHashSys becomes smaller.
  3. I could look at some of your profiles to help you understand whether your high BuckHashSys is caused by the code generation you mentioned or by recursion. That might help you make changes to your app to reduce BuckHashSys, which would also reduce the overhead of delta profiles.

Let me know if you have any preferences on the above. Personally I'd favor option 3 and/or 1.

nik-andreev (Author) commented Oct 28, 2021

Yep, each line is an instance of the app.

The first option is really appealing as we can get back to the main version of the library and experiment easily. It would be great if you could add it in the next scheduled release.

Also, I appreciate the offer in option 3! I might ping you in the Slack channel at some point, though I'm not sure when I will get back to it.

felixge (Member) commented Oct 28, 2021

@nik-andreev sounds good. I'll try to land a patch to add a toggle for delta profiles ASAP.

And my offer for option 3 isn't going away. Ping me on slack, or even better send an e-mail to [email protected] and reference my name. This would be the best way to securely exchange/access the relevant data.

felixge added a commit that referenced this issue Oct 28, 2021
Delta profiles require a certain amount of heap memory that is
proportional to MemStats.BuckHashSys (typically ~2x). For applications
that make heavy use of code generation or recursion the memory usage can
be several hundred MB.

Add WithDeltaProfiles(), which allows disabling delta profiles for
applications where this is a deal breaker. Those applications will still
see allocation profiles, but they won't be able to aggregate or compare
them.

The default value for WithDeltaProfiles() is true since applications
with high BuckHashSys seem to be rare.

See https://dtdg.co/go-delta-profile-docs for more information about
delta profiles.

Mitigates #1025 and implements PROF-4305.
felixge added a commit that referenced this issue Oct 28, 2021
felixge added a commit that referenced this issue Nov 21, 2021
felixge (Member) commented Nov 29, 2021

Closing this since #1038 mitigates this problem and will be included in the next release.

As discussed, there are more improvements that could be made here to lower the memory usage of delta profiles, but right now it seems like it would only benefit a very small number of customers (using code generation and/or lots of recursion). That being said, I'm happy to reopen this if more people chime in here.

felixge closed this as completed Nov 29, 2021
natebrennand:

Hi, just pinging with another data point as a user that still had delta profiles enabled (since it's the default setting).
Our application includes a good deal of generated code and weighs in at ~100MB.

[screenshot: heap profile showing memory attributed to delta profiles]

We're going to try the recommended mitigation of disabling the delta profiles.
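
Roughly what we're planning, for anyone following along (a sketch that assumes the WithDeltaProfiles option from the commits above; the service name is a placeholder):

```go
package main

import (
	"log"

	"gopkg.in/DataDog/dd-trace-go.v1/profiler"
)

func main() {
	// Disable delta profiles so the profiler no longer keeps the previous
	// allocation profile in memory between uploads.
	if err := profiler.Start(
		profiler.WithService("my-service"), // placeholder service name
		profiler.WithDeltaProfiles(false),
	); err != nil {
		log.Fatal(err)
	}
	defer profiler.Stop()

	// ... rest of the application ...
}
```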

felixge (Member) commented Apr 20, 2022

@natebrennand thanks, this feedback is really appreciated and helps to bump this higher on our priority list!

One Q: In your case the memory overhead seems to be just 45 MiB? Is this prohibitive for your application? What would be an acceptable amount of memory usage for you? I realize your app has low memory usage in general (just 94 MiB total), so this is quite a lot relatively speaking, but overall it seems like you might not be hurting from using too much memory?

PS: Sorry for the late reply, I was on vacation.

natebrennand:

Hi @felixge, no worries about timing, thank you for getting back to me.

Fair questions. I believe this is mainly due to the time period the screenshot covers.
Unfortunately, more than a week has passed, so I can no longer access that data in aggregate form to obtain a higher-signal snapshot.

The primary application where we observed this issue typically runs using 100-150 MB of memory; we limit it at ~500 MB.
We observed steady increases in consumed memory until it spiked and many pods were OOM killed.
The subsequent drop was our rollback after identifying the issue.

Memory usage & limits when we were affected:
[chart: memory usage and limits during the affected period, showing OOM kills and the post-rollback drop]

felixge (Member) commented Apr 22, 2022

Thanks for sharing those details @natebrennand.

Looking at the screenshot, it seems like the actual OOMs happen when there is a significant spike of memory usage, which is probably from spikes of traffic? Do you still see those spikes now? Do you have any rate/concurrency limits in place to avoid traffic spikes consuming too much memory?

Anyway, 500 MB is a somewhat tight memory limit, so if you intend to keep it you might be better off without delta profiles for now. That being said, we definitely consider this to be a problem with the feature. I've just bumped it up on our internal roadmap, so hopefully we'll get a chance to iterate on it in the near future!

pgrzesik:

Hey @felixge 👋 Was there any further development related to this? I'm observing similar behavior in one of our applications, which does not use recursion or a lot of code generation. Right now I'm seeing growing heap allocations related to deltas, as well as overall growing memory usage that is not allocated on the heap. It results in more frequent OOM kills. What would you advise in that case?

felixge (Member) commented Nov 21, 2024

We implemented #1511 which should have addressed the issue that was raised here. Below is an example data point of the impact this had on one of our internal Go applications:

  • 36x fewer allocations/min (30.8M to 851k)
  • 6.85x less allocated memory/min (3.31 GiB to 483 MiB)
  • 760x fewer heap live objects (4.8M to 6.4k)
  • 3.7x smaller heap live size (524 MiB to 140 MiB)
  • 2.3x less GC overhead, resulting in 7.5% lower overall CPU usage

If you're still seeing issues, it'd be great if you could raise a new issue with more details. But it sounds like you might have a memory leak on your end (I'm basing this on your "overall growing memory usage that is not allocated on the heap" comment)?

PrayagS commented Jan 9, 2025

@felixge Any ETA on when this fix will be present in a stable release? For the version that we're using in production, the changelog is massive: v1.67.1...1.999.0-beta.11

And as for the issue, we're also seeing similar behavior where runtime.go.mem_stats.buck_hash_sys is continuously increasing after enabling the profiler.

[chart: runtime.go.mem_stats.buck_hash_sys continuously increasing, with resets at deployment rollouts]

The resets in memory usage correspond to deployment rollouts.

nsrip-dd (Contributor) commented Jan 9, 2025

@PrayagS you might be affected by golang/go#69590. TL;DR: set the environment variable GODEBUG=profstackdepth=32. We changed the profiler in Go 1.23 to collect deeper stacks. But this can result in significantly increased memory usage for some programs. That debug setting restores the previous stack depth.

PrayagS commented Jan 9, 2025

@nsrip-dd Thanks for getting back. We're on Go 1.21.9 and I can see that the stack depth is still 32 in that version (https://go.googlesource.com/go/+/refs/tags/go1.21.9/src/runtime/mprof.go#43).

nsrip-dd (Contributor) commented Jan 9, 2025

Ah, indeed. To answer your question "Any ETA on when this fix will be present in a stable release?", we have been using more memory-efficient profile processing code by default since v1.47.0, implemented in #1511, which Felix linked above. That fix is for the processing done by this client library.

Unfortunately there's not much we can do in this library to affect the amount of memory used by the Go runtime for profiling (what you see in runtime.go.mem_stats.buck_hash_sys). The heap, block, and mutex profilers never get rid of data over the lifetime of the program. We'd need to change the design upstream. Assuming we come up with a better design and get it accepted, it'll be 6 months minimum before it gets released.

In the meantime, you could perhaps change the sampling rates for the profilers to collect data less often. For example, runtime.MemProfileRate defaults to 512*1024 (512 KiB). You could increase that value to make the heap profiler collect data less frequently. You'd still see memory usage grow over time, but more slowly. Not sure if there's any other workaround/mitigation, maybe @felixge has some ideas?
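
For illustration, raising the rate could look something like this (a sketch; the 4 MiB value is only an example, and the rate should be set once, as early as possible, before the allocations you want sampled):

```go
package main

import "runtime"

func main() {
	// Sample roughly one allocation per 4 MiB allocated instead of the
	// default of one per 512 KiB. The heap profile (and BuckHashSys)
	// grows more slowly, at the cost of coarser allocation data.
	runtime.MemProfileRate = 4 * 1024 * 1024

	// ... start the profiler and the rest of the application ...
}
```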

PrayagS commented Jan 11, 2025

Thanks for getting back. Appreciate the detailed response.

Unfortunately there's not much we can do in this library to affect the amount of memory used by the Go runtime for profiling (what you see in runtime.go.mem_stats.buck_hash_sys). The heap, block, and mutex profilers never get rid of data over the lifetime of the program. We'd need to change the design upstream. Assuming we come up with a better design and get it accepted, it'll be 6 months minimum before it gets released.

Noted. Are there any scenarios in your experience where the application code can be affecting this behavior?

Because IIRC we haven't seen this with Pyroscope's Go client. I'll try to test with it again to record more recent data; specifically runtime.go.mem_stats.buck_hash_sys since that wasn't something that we were looking at when using Pyroscope.
