-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preallocate p.stacks() before calling runtime.GoroutineProfile() #11
Closed
+18
−5
Closed
Changes from 1 commit
Commits
Show all changes
3 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I was thinking about doing this outside the
GoroutineProfile()
func, otherwise it's costing us extra cycles even after we've settled on a suitably largestacks
size. (I wasn't able to see a significant effect from this in the benchmark, but I think it's theoretically clear why this is higher overhead)Maybe something like this? And then changing all the places where we initialize a &profiler{} struct right now?
Let me know what you think!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! This is the correct way.
I was trying to avoid
make()
byif len(p.stacks) < nGoroutines {
but as you indicatedruntime.NumGoroutine()
will be redundantly called most of the time.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! Do you have interest/time in modifying your PR as suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I have done some further tests on this.
The issue is: as
runtime.NumGoroutine
is faster than callingGoroutineProfile
I thought it would make sense to optimize for this as I opened the case.But I did not knowp.stacks
is reused. My previous implementation was still not good enough as it still adds more cycles where goroutine counts do not change between calls toGoroutineProfile
.I pushed a sample implementation for the above one, but it seems like it does not make too much difference because the
newProfile
is not on the hot path anymore only initialized once. It adds more complexity with too little gain. So, it is up to you::)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree that it won't have a big impact. So if you feel like this is a bad tradeoff, then I'd also prefer to not merge the patch
Ok, I think we should close this and look for other opportunities.
I think the main issue with fgprof right now is that it doesn't scale to programs that use tons of goroutines. I'm starting a new job at Datadog's Profiler team in January that might give me the opportunity to submit proposals and patches to the Go core for it.
But meanwhile it might be nice for the profiler to detect if it's too slow (and causing issues for the profiled program). When that happens we could either dynamically adjust the sampling rate, or fail with an error.
Let me know if you're interested in playing with this idea.
Also, if you change your mind and want to merge the PR, I'd be okay with that too. It's not a lot of complexity to add IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mental model is this: Calling
runtime.GoroutineProfile()
callsstopTheWorld()
so any time we spent inside of it, we assume the users application is totally stopped, and we're "stealing" its time for profiling. The goal is to not steal more than e.g 1% of the programs wall clock time.We don't really know how stuff the program is doing when not profiling, but I don't think it's relevant for the goal of the profiler to minimize interfering with the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I understand. I forgot about the STW in
goroutineprofile
. I am still in learning phase for the runtime. Now it makes sense also looking at the wall time as well (instead of CPU time).I will do some experiments myself and think a bit more...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was too busy lately and could not find time to look/think into this too much.
But I thought about the problem a bit more and I agree that your approach of trying to bound profiler wall time into %1 seems both simple to implement and powerful enough like you suggested.
Couple of issues that came to my mind, a bit overthinking maybe for the current phase, but curious about your thoughts on these:
While I have not tested it yet: I assume when STW occurs, it might have some adverse effects on the system as it will abuse cache locality of some processors. One test that we could either do manually or automatically is to have a large number of CPU intensive goroutines executing and do some work with them. THen we calculate these both with profiler and without profiler and see if the output is also changed like %1 or so. It would be nice to test this as there might be other kind of adverse effects as well.
I think this is more major problem: what happens if
GoroutineProfile
itself is actually taking more than %1? Yeah, we could simply skip some samples on the next iterations but that might actually hurt accuracy? What if it took 30ms? 50ms? 100ms? Those numbers are just (maybe?) too big for the application? Well again: I have not yet tested it myself but I think this is not too uncommon if you have many goroutines: runtime: GoroutineProfile causes latency spikes in high-goroutine-count application golang/go#33250. Well one idea might be to make an estimation on how long aGoroutineProfile
might take based on previous data, but again: I don't know what we will do even if that is the case. Another idea: get some help from the runtime: like what ifGoroutineProfile
returns some data even iflen(p)<=n
. Even when we don't have all the goroutine data, at least we might have something(maybe some kind of randomness can be applied in the runtime for returning different GOROs everytime?). So, instead of returning nothing, we will have at least something to profile this way. Otherwise it is completely binary, right? Any idea on this?Ah one last thing:
While I was re-reading your previous comments, I could not understand this one:
Can you elaborate a bit what this means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sumerc this PR is an odd place for this discussion, so I've created a new issue and will reply there: #12
If you're okay with it, I think we should close this PR for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.