overhead negligible goroutine sampling #31
This will eventually be implemented upstream in Go. Once this is done, fgprof will automatically support it. So I think this issue can be closed?
Stopping and restarting the world is so expensive that goroutine sampling cannot be enabled continuously in production. If we use eBPF uprobes to observe goroutine status changes and then extract the trace ID from the request (its offset is fixed), we can analyze the off-CPU time (blocking or syscalls) spent on each request.
Are you sure? It should not be too expensive.
Uprobes are pretty expensive (2-3 µs each). Do you have benchmarks?
This would only be supported in Go 1.23+. But if we provide it as a Go module that can be imported, fgprof could enable it on lower Go versions.
Benchmark: run the benchmarks in src/encoding/json.
Bench code: record the tracebacks of all goroutines at 100 Hz.
Bench result (benchstat):
From the benchmark, stopping and starting the world seems to add only a small overhead to the program. But a running goroutine has to be suspended before it can run again, which can be expensive in RPC services.
I think you are mistakenly assuming that the goroutine profile does a STW while taking the stack trace of all goroutines. That has not been the case for a while now. See https://go-review.googlesource.com/c/go/+/387415
It always needs to stop the world. If we run the goroutine profile once per minute, that is OK, but running it at 100 Hz is unacceptable.
Can you provide an execution trace? I'm also not sure how to read your benchmark. Is the "old" version without goroutine profiling and the "new" version with goroutine profiling? If yes, then it seems like the version with goroutine profiling is actually faster (!).
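For anyone wanting to answer this request, an execution trace can be captured with the standard `runtime/trace` package. The sketch below is a minimal example; the helper name `captureTrace` and the placeholder workload are assumptions for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/trace"
	"time"
)

// captureTrace (hypothetical helper) records an execution trace while work
// runs and returns the raw trace bytes.
func captureTrace(work func()) ([]byte, error) {
	var buf bytes.Buffer
	// trace.Start fails if tracing is already enabled in this process.
	if err := trace.Start(&buf); err != nil {
		return nil, err
	}
	work()
	trace.Stop() // flushes all pending trace data into buf
	return buf.Bytes(), nil
}

func main() {
	data, err := captureTrace(func() {
		// Placeholder workload; replace with the code under the sampler.
		time.Sleep(10 * time.Millisecond)
	})
	if err != nil {
		fmt.Println("trace failed:", err)
		return
	}
	fmt.Printf("captured %d bytes of trace data\n", len(data))
}
```

Writing the bytes to a file and opening it with `go tool trace` shows any stop-the-world pauses and how long goroutines were suspended.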
The main cost of goroutine profiling comes from stopping the world.
In addition, each goroutine's stack has to be traversed by an expensive unwinder function.
We can optimize this by using frame pointer (fp) traceback. Even if we traverse thousands of goroutines, the cost is then negligible, so the only remaining overhead is stopping and restarting the world.
see golang/go#66915