irmin-pack: max memory usage 3x, main vs 3.3.1 #1988
Some initial candidates for increase in max memory usage
A comparison of main against 3.3.1, using the tree.exe benchmark, shows only a minor difference in memory usage:
The mapping file is only 1.7MB, at least in this particular test run on main, so it is not a significant source of memory usage.
The following gives the machine setups for the benchmarks from #1959. The good-gc runs used OCaml 4.12.1, whereas the others used 4.14. Also, gc.space_overhead was 80 for good-gc but 120 for the other machines. So we can't just compare these stats directly.
The OCaml manual has this for gc.space_overhead:
The major GC speed is computed from this parameter. This is the memory that will be "wasted" because the GC does not immediately collect unreachable blocks. It is expressed as a percentage of the memory used for live data. The GC will work more (use more CPU time and collect blocks more eagerly) if space_overhead is smaller. Default: 120
So if we just look at the number, it seems that good-gc should have a more aggressive GC, and so should ideally use less memory. However, in OCaml 4.13.0, GC was changed to best-fit, and various other GC improvements were made. So, again, we can't just compare good-gc directly to the others when considering max memory usage.
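To rule out the space_overhead difference when comparing machines, the parameter can be pinned explicitly. The following is a minimal sketch using only the standard library's `Gc.get` and `Gc.set`; the helper names `current_space_overhead` and `with_space_overhead` are made up for illustration and are not part of irmin or its benchmarks.

```ocaml
(* Read the space_overhead value currently used by the runtime. *)
let current_space_overhead () = (Gc.get ()).Gc.space_overhead

(* Hypothetical helper: run [f] with space_overhead set to [pct],
   then restore the previous GC settings, even if [f] raises. *)
let with_space_overhead (pct : int) (f : unit -> 'a) : 'a =
  let saved = Gc.get () in
  Gc.set { saved with Gc.space_overhead = pct };
  Fun.protect ~finally:(fun () -> Gc.set saved) f

let () =
  Printf.printf "default space_overhead: %d\n" (current_space_overhead ());
  with_space_overhead 80 (fun () ->
      Printf.printf "inside: %d\n" (current_space_overhead ()))
```

The same value can be set without recompiling through the `o` parameter of `OCAMLRUNPARAM` (for example `OCAMLRUNPARAM=o=80`), which may be the simpler option for benchmark scripts.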
Another comparison: tree benchmark, with gc, irmin main, but OCaml 4.12 vs 4.14. Max memory usage was 1.335G with 4.12 and 1.212G with 4.14, so not much difference. Aside: wall time was 13m28s vs 11m01s, so 4.14 seems about 20% quicker on this benchmark.
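As a quick check of the figures above, the relative reductions can be computed directly. This is only a sketch of the arithmetic; `relative_reduction` and `to_seconds` are illustrative names, and the inputs are the numbers quoted in the comment.

```ocaml
(* Fraction by which [after] is lower than [before]. *)
let relative_reduction ~before ~after = (before -. after) /. before

(* Convert a wall time given as minutes and seconds to seconds. *)
let to_seconds ~minutes ~seconds = float_of_int ((minutes * 60) + seconds)

let time_412 = to_seconds ~minutes:13 ~seconds:28 (* 808 s *)
let time_414 = to_seconds ~minutes:11 ~seconds:1 (* 661 s *)

(* Wall time reduction going from 4.12 to 4.14. *)
let time_gain = relative_reduction ~before:time_412 ~after:time_414

(* Max memory reduction going from 4.12 to 4.14 (values in G). *)
let mem_gain = relative_reduction ~before:1.335 ~after:1.212

let () =
  Printf.printf "wall time: %.1f%% lower, max memory: %.1f%% lower\n"
    (100. *. time_gain) (100. *. mem_gain)
```

This gives a wall time roughly 18% lower and a max memory usage roughly 9% lower with 4.14, in line with the rounded figures in the comment.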
At this point, it seems that the tree benchmarks do not exhibit 3x memory usage (at least in the runs detailed above). So investigation should continue with the lib_context benchmarks instead.
Something else we should be careful about: I think the max memory usage reported by all our benchmarks is for the main process only. I started a bench which records the maxrss (MB) of the Gc child process at the beginning and at the end of Gc.run:
The begin value corresponds approximately to the maxrss of the parent process.
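One way to take such a measurement from inside a process is sketched below. It assumes Linux and reads the peak resident set size from the `VmHWM` field of `/proc/self/status`; this is not necessarily how the bench mentioned above records maxrss, and `parse_vmhwm_kb`, `max_rss_kb` and `with_maxrss_log` are hypothetical names, not irmin functions.

```ocaml
(* Parse one line of /proc/self/status, e.g. "VmHWM:\t  136192 kB".
   Returns the value in kB, or None if the line is not the VmHWM line. *)
let parse_vmhwm_kb (line : string) : int option =
  try Scanf.sscanf line "VmHWM: %d kB" (fun kb -> Some kb)
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> None

(* Peak resident set size of the current process in kB.
   Assumption: Linux procfs is available; otherwise returns None. *)
let max_rss_kb () : int option =
  match open_in "/proc/self/status" with
  | exception Sys_error _ -> None
  | ic ->
      let rec loop () =
        match input_line ic with
        | exception End_of_file -> None
        | line -> (
            match parse_vmhwm_kb line with
            | Some _ as found -> found
            | None -> loop ())
      in
      let result = loop () in
      close_in ic;
      result

(* Hypothetical wrapper: log maxrss before and after running [f]. *)
let with_maxrss_log ~label f =
  let show = function
    | Some kb -> string_of_int (kb / 1024) ^ " MB"
    | None -> "n/a"
  in
  Printf.eprintf "%s begin: maxrss = %s\n%!" label (show (max_rss_kb ()));
  let result = f () in
  Printf.eprintf "%s end: maxrss = %s\n%!" label (show (max_rss_kb ()));
  result
```

Wrapping the body of the child process in something like `with_maxrss_log ~label:"gc" (fun () -> ...)` would print the two values to stderr, so they can be compared with the figure reported for the main process.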
Are you saying that the main process uses 133MB maxrss at the point the child is forked, and that the child process then gets to 2442MB? The child process is supposed to use almost no memory at all (this was part of the design requirements and was, I believe, satisfied in the previous version of the code). Do you think it is possible the new "reachable" calculation has changed the memory usage? (I think there is a hashtable there that perhaps grows very large?)
Memory in the main process is now in a good spot; see #1959 (comment). GC stats for benchmarking are tracked in #2046.
This issue is based on #1959, but focuses solely on the increased memory usage. From the metrics listed in that issue:
For good-gc, the "max memory usage (bytes)" has increased roughly 3x compared to 3.3.1. We should identify the cause.