memmgr performance bug caused by high number of free vmas #2033
Comments
@adrianlut: Hi, would you like to contribute that change? If not, then I can do that and reference you in the commit for the idea ;)
Sure, I'm happy to contribute. I'll create a pull request.
Wow, great find, @adrianlut. I'm surprised we haven't noticed this before. By the way, there is another PR that may be of interest to you (I wrote it): #1795
Hi @dimakuv, thank you! I guess you haven't noticed this because having millions of vmas in the free list is rather uncommon, isn't it? Also, thank you for the pointer to your PR. This could potentially further decrease the Gramine performance penalty for Hyrise, although I think that the bookkeeping operations in memory management are mostly write operations. I'll give it a try if I find the time.
Fixes the memmgr performance bug described in gramineproject#2033 by changing the `CHECK_LIST_HEAD` macro in `common/include/list.h` in release mode to `(void)0`. Signed-off-by: Adrian Lutsch <[email protected]>
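To illustrate the idea behind this fix, here is a minimal sketch of a list sanity check that is only compiled into debug builds. It is not Gramine's actual `list.h`: the `struct node` type, the `count_bad_links()` helper, the `CHECK_LIST` macro name, and the `DEBUG` switch are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list node, not Gramine's real list types. */
struct node {
    struct node* next;
    struct node* prev;
};

/* Walks the whole list once and counts inconsistent links: O(n) in list length. */
static size_t count_bad_links(const struct node* head) {
    size_t bad = 0;
    const struct node* cur = head;
    do {
        if (cur->next->prev != cur)
            bad++;
        cur = cur->next;
    } while (cur != head);
    return bad;
}

#ifdef DEBUG
/* Debug builds (assumed switch): every list operation pays for a full traversal. */
#define CHECK_LIST(head) assert(count_bad_links(head) == 0)
#else
/* Release builds: the macro expands to nothing, so no traversal is left in the code. */
#define CHECK_LIST(head) ((void)0)
#endif

/* Inserts `n` right after `head`; the insertion itself is O(1). */
static void list_add(struct node* head, struct node* n) {
    CHECK_LIST(head);
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}
```

With the whole macro replaced by `(void)0`, the cost of a list operation no longer depends on the compiler recognizing an empty loop and removing it.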
I finally got around to working on this. I again thought about solution options and now I am wondering if just The main problem I see with this solution is the fact that it still runs the check in One option I see is only including the check in Another option would be to create a special "sanity check" compile option that is deactivated by default and could be activated for testing purposes, but this increases the scope of fixing this simple bug a lot.
I think this is intended (or at least not a big problem),
Description of the problem
While benchmarking the Hyrise DBMS inside Gramine, a student and I found the following performance issue: Hyrise regularly allocates a lot of memory with small `mmap` calls and then calls `munmap` on them. This creates a lot of free `vma`s in the `memmgr.h` free list (possibly millions).

The problem occurs due to the `CHECK_LIST_HEAD` macro in `list.h`, which is called in both `free_mem_obj_to_mgr()` and `get_mem_obj_from_mgr_enlarge()` (i.e. on most interactions with the `memmgr`). It traverses the whole free list to check its correctness while the `memmgr` is locked, blocking the whole memory system for other threads. If the free list contains millions of entries and calls to `mmap` are common, this prevents any useful work from happening.

From looking at the `CHECK_LIST_HEAD` macro, I think this is not intended to happen in release mode since the asserts in the loop are then replaced with `(void)0`. However, our performance tests show that the issue occurs both in debug and release mode. I guess that although the loop does not contain useful work, it is not removed by the compiler.

Steps to reproduce
(I will follow up with code to reproduce the issue in the coming days if necessary)
Alternative: Write a micro-benchmark
1. Call `mmap` and `munmap` and measure the required time as baseline
2. Fill the `memmgr` free list with `mmap`
3. Call `mmap` and `munmap` and measure their time with a long free list.

I guess the thread-local vma cache will probably interfere with such a simple benchmark design, but the goal should be clear.
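The micro-benchmark outlined above could be sketched roughly as follows. This is an untested-in-Gramine illustration, not a verified reproducer: the helper names `avg_pair_ns()` and `churn_mappings()` and the 4096-byte `PAGE` size are assumptions, and whether the churn step really grows the `memmgr` free list depends on Gramine internals such as the thread-local vma cache.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>

#define PAGE 4096 /* assumed page size */

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Stores the average latency of one mmap+munmap pair in *out; returns 0 on success. */
static int avg_pair_ns(size_t iters, uint64_t* out) {
    if (iters == 0)
        return -1;
    uint64_t start = now_ns();
    for (size_t i = 0; i < iters; i++) {
        void* p = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        munmap(p, PAGE);
    }
    *out = (now_ns() - start) / iters;
    return 0;
}

/* Maps `count` small regions and then unmaps them all, which is meant to leave
 * many freed vmas behind inside Gramine (an assumption). Returns 0 on success. */
static int churn_mappings(size_t count) {
    void** ptrs = malloc(count * sizeof(*ptrs));
    if (!ptrs)
        return -1;
    size_t mapped = 0;
    for (; mapped < count; mapped++) {
        ptrs[mapped] = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (ptrs[mapped] == MAP_FAILED)
            break;
    }
    for (size_t i = 0; i < mapped; i++)
        munmap(ptrs[i], PAGE);
    free(ptrs);
    return mapped == count ? 0 : -1;
}
```

A driver would call `avg_pair_ns()` once for the baseline, then `churn_mappings()` with a large count, then `avg_pair_ns()` again, and compare the two averages.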
Expected results
Throughput of Hyrise benchmark when running with Gramine > 50% of throughput running without Gramine
Alternative micro-benchmark result: latency of `mmap` and `munmap` is independent of free list length.

Actual results
Throughput of Hyrise when running with Gramine is approximately 3% of throughput running without Gramine. According to the included perf functionality of Gramine debug builds, 80 % of runtime is spent in memory management/bookkeeping functions.
Alternative micro-benchmark result: latency of `mmap` and `munmap` depends on free list length.

Gramine commit hash
91c90b4