Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timing for multithreaded use #1550

Open
FelixKrayer opened this issue Aug 1, 2024 · 2 comments
Open

Timing for multithreaded use #1550

FelixKrayer opened this issue Aug 1, 2024 · 2 comments
Labels
bug parallel Parallel Goblint

Comments

@FelixKrayer
Copy link
Contributor

FelixKrayer commented Aug 1, 2024

While implementing parallel solvers for Goblint, it became apparent that the timing functions used in Goblint are not thread-safe. Thus, issues arise, when multiple domains running in parallel call the Timing.wrap function
The execution with more than one parallel Domain often fails because of an assertion error in the exit function in the module Make of goblint_timing.ml (line 134: assert (tree.name = name)). I assume the likely cause for that is that the current stack is not thread-safe.
We need a thread-safe way of Timing in Goblint.

Furthermore, it has to be decided how timing would even work with multiple parallel Domains.
When two Domains run in parallel and both provide a time value for a specific timed section, how would these time contributions be merged? Is it more reasonable to add these values or to use the largest value? Adding would more accurately show the work done, but would ignore the potential speedup through parallelization. Thus a final timing value generated by adding parallel times could potentially be longer than the whole run of Goblint.

@sim642 sim642 added bug parallel Parallel Goblint labels Aug 1, 2024
@sim642
Copy link
Member

sim642 commented Aug 1, 2024

Indeed, the big question is how it's even supposed to work. The current implementation (improved from CIL's Stats) requires well-nested enter and exit calls globally, which wrap from multiple domains does not guarantee.

Our timing measures both CPU time and wall time. Summing CPU times across domains would make sense because that's the total amount of work done by the CPU. Although currently we just look it up by Unix.times which is about the entire process. I'm not sure if there's any way to even get CPU time used per-domain.

Measurement of wall time itself is easy, but aggregation is hard, neither sum nor max makes sense. Since parallelism is only to be used in a specific place (not parsing, etc), it might simply have to be that there's wall time for the entire solving, but not its sub-timings which have been aggregated.

We also measure allocated memory using Gc.allocated_bytes, which I also don't know how it works with multiple domains.

It would be worth trying to find others (not necessarily OCaml) that try to profile multi-threaded programs and see if they've come up with anything better.

How urgent is this? Can we get by by just disabling timing for our multicore attempts? The minimum fix might be to make the current stack in the timing implementation use DLS.

@FelixKrayer
Copy link
Contributor Author

For now, it is only important that errors are avoided in a temporary fix (as you mentioned through DLS or disabling of the timing). We can get timings for benchmarks in other ways. For example, we can measure the time of the whole solving process which should not be an issue.
However, I believe that we should find a suitable solution before merging the parallelized solvers into the master branch. There are other Issues to be solved before a merge as well, e.g. the performance issues from Ocaml 5 #1137 since this version is a prerequisite for the Domains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug parallel Parallel Goblint
Projects
None yet
Development

No branches or pull requests

2 participants