[Enhancement] Mallctl of jemalloc support setting the memory of retain pages to not dump (backport #48796) #49252

mergify · 2024-07-31T11:57:52Z

Why I'm doing:

Life cycle of memory in jemalloc: free -> dirty -> muzzy -> retained -> OS.

Currently, in jemalloc's per-CPU mode each CPU has 4 arenas. The virtual memory of each arena is unbounded, and when `retain` is set to true in the jemalloc config it is never released back to the OS. (Virtual memory is kept rather than returned to the OS to avoid creating a large number of holes in the address space, which would degrade virtual memory allocation performance.)

In theory, the BE's virtual memory can therefore grow up to (number of cores) × 4 × (physical memory), so the core file produced by a core dump can be very large.
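Plugging the 8-core, 64G test machine from the Test section into this formula shows how loose the bound is; this is only the formula evaluated, not a measured value:

$$
V_{\max} \approx N_{\text{cores}} \times 4 \times M_{\text{phys}} = 8 \times 4 \times 64\,\text{G} = 2048\,\text{G}
$$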

The following is the state of a starrocks_be process in a user's production environment.

The machine has 256G of physical memory. The BE uses 127G of physical memory but 830G of virtual memory, so a core dump would produce an 830G core file.

Retained pages account for 700G of this virtual memory:

starrocks_be_jemalloc_active_bytes 134848319488
starrocks_be_jemalloc_allocated_bytes 126967526544
starrocks_be_jemalloc_mapped_bytes 142722424832
starrocks_be_jemalloc_metadata_bytes 3599567424
starrocks_be_jemalloc_metadata_thp 0
starrocks_be_jemalloc_resident_bytes 141923610624
starrocks_be_jemalloc_retained_bytes 700186402816 (700G)

So this PR adds an interface to jemalloc that uses madvise to mark all retained pages as DONTDUMP before a core dump is executed.

This reduces the core file size and keeps the core dump from stalling for a long time while the core file is written.

Test:

CPU cores: 8
Memory: 64G
Test set: TPC-H 100G
Test case 1: run all 22 TPC-H queries in a single thread, then trigger a core dump.
Test case 2: run the following SQL several times, then trigger a core dump: select count(b.l_orderkey), count(repeat(b.l_shipmode, 10)), count(b.l_suppkey), count(b.l_comment), count(b.l_quantity) from lineitem a join [shuffle] lineitem b on a.l_orderkey=b.l_orderkey;

Before this PR:

Core file size of test case 1: 28G
Core file size of test case 2: 77G

After this PR:

Core file size of test case 1: 5G
Core file size of test case 2: 4G
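Expressed as a relative reduction in core file size (simple arithmetic on the sizes reported above):

$$
1 - \frac{\text{after}}{\text{before}}:\qquad 1 - \frac{5}{28} \approx 82.1\%,\qquad 1 - \frac{4}{77} \approx 94.8\%
$$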

Usage example

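Mark the retained pages of all arenas as nodump:

        // MALLCTL_ARENAS_ALL addresses every arena; mallctl returns 0 on success.
        std::string str = "arena." + std::to_string(MALLCTL_ARENAS_ALL) + ".dontdump";
        int ret = je_mallctl(str.c_str(), nullptr, nullptr, nullptr, 0);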

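Mark the retained pages of arena 1 as nodump:

        // The control takes no input or output, so all pointer arguments are null.
        std::string str = "arena.1.dontdump";
        int ret = je_mallctl(str.c_str(), nullptr, nullptr, nullptr, 0);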

Todo: also exclude dirty and muzzy pages from the core dump.

What I'm doing:

Fixes #issue

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This PR needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
This is a backport PR

Bugfix cherry-pick branch check:

This is an automatic backport of pull request #48796 done by [Mergify](https://mergify.com).

…n pages to not dump (#48796) Signed-off-by: trueeyu <[email protected]> (cherry picked from commit 2d4cf3a)

[Enhancement] Mallctl of jemalloc support setting the memory of retai…

e94f20d

mergify bot mentioned this pull request Jul 31, 2024

[Enhancement] Mallctl of jemalloc support setting the memory of retain pages to not dump #48796

Merged

github-actions bot assigned trueeyu Jul 31, 2024

github-actions bot added the automerge label Jul 31, 2024

wanpengfei-git enabled auto-merge (squash) July 31, 2024 11:58

trueeyu approved these changes Aug 1, 2024

wanpengfei-git merged commit 9d93966 into branch-3.3 Aug 8, 2024
36 checks passed

wanpengfei-git deleted the mergify/bp/branch-3.3/pr-48796 branch August 8, 2024 07:08

github-actions bot added the version:3.3.3 label Aug 8, 2024


mergify bot commented Jul 31, 2024 (edited by wanpengfei-git)
