[Enhancement] Mallctl of jemalloc support setting the memory of retain pages to not dump (backport #48796) #49252
Why I'm doing:
Life cycle of memory in jemalloc: free -> dirty -> muzzy -> retained -> OS.
Currently, in jemalloc per-CPU mode, each CPU has 4 arenas. The virtual memory of each arena is unbounded and is never released when retain=true is configured. (Virtual memory is not returned to the OS to avoid creating a large number of holes in the virtual address space, which would degrade virtual memory allocation performance.) In theory, the BE's virtual memory can therefore grow to the number of cores × 4 × physical memory, so the core file produced by a core dump can be very large.
This is the current status of the process (starrocks_be) in a user's production environment: the machine has 256G of physical memory, the BE uses 127G of physical memory, and its virtual memory is 830G, so a core dump would produce an 830G core file, which is unreasonable.
Retained pages account for 700G of that memory.
So we add an interface to jemalloc that marks all retained pages as DONTDUMP using madvise before the core dump is executed. This reduces the core file size and prevents the core dump from getting stuck for a long time while writing the core file.
Test:
Cores: 8
Memory: 64G
Test set: TPC-H 100G
Test case 1: Run all 22 TPC-H queries in a single thread, then trigger a core dump.
Test case 2: Run the following SQL multiple times, then trigger a core dump:
select count(b.l_orderkey), count(repeat(b.l_shipmode, 10)), count(b.l_suppkey), count(b.l_comment), count(b.l_quantity) from lineitem a join [shuffle] lineitem b on a.l_orderkey=b.l_orderkey;
Before the PR:
After the PR:
Usage example
TODO: Also exclude dirty and muzzy pages from the core dump.
What I'm doing:
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request #48796 done by [Mergify](https://mergify.com).