[Enhancement] Mallctl of jemalloc support setting the memory of retain pages to not dump (backport #48796) #49252

Merged
merged 1 commit into from
Aug 8, 2024

Conversation

mergify[bot]
Contributor

@mergify mergify bot commented Jul 31, 2024

Why I'm doing:

Life cycle of memory in jemalloc: free -> dirty -> muzzy -> retained -> OS.

Currently, in per-CPU mode, jemalloc creates 4 arenas per CPU. The virtual memory of each arena is unbounded and is not released if the conf retain=true is set. (Virtual memory is not returned to the OS in order to avoid creating a large number of virtual memory holes, which would degrade virtual memory allocation performance.)

In theory, the virtual memory of BE can grow up to (number of cores) × 4 × (physical memory), so the core file generated by a core dump can be very large.
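To make the bound concrete, here is a minimal sketch of the arithmetic. The function name `max_virtual_gib` is hypothetical and only restates the formula above; it is not part of jemalloc or of this PR.

```cpp
#include <cstdint>

// Theoretical upper bound from the description:
// (number of cores) x (arenas per CPU, 4 in per-CPU mode) x (physical memory).
// Hypothetical helper for illustration only.
constexpr std::uint64_t max_virtual_gib(std::uint64_t cores,
                                        std::uint64_t physical_gib,
                                        std::uint64_t arenas_per_cpu = 4) {
    return cores * arenas_per_cpu * physical_gib;
}

// The test machine used in this PR: 8 cores, 64G of memory.
static_assert(max_virtual_gib(8, 64) == 2048, "8 x 4 x 64G = 2048G");
```

For the 8-core, 64G test machine this gives a theoretical ceiling of 2048G of virtual memory, far above what the machine could ever back with physical pages.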

This is the current state of the starrocks_be process in a user's production environment.

image

The machine has 256G of physical memory. BE uses 127G of physical memory and 830G of virtual memory, so a core dump would generate a core file of 830G, which is far too large.

Retained pages account for 700G of that memory:

starrocks_be_jemalloc_active_bytes 134848319488
starrocks_be_jemalloc_allocated_bytes 126967526544
starrocks_be_jemalloc_mapped_bytes 142722424832
starrocks_be_jemalloc_metadata_bytes 3599567424
starrocks_be_jemalloc_metadata_thp 0
starrocks_be_jemalloc_resident_bytes 141923610624
starrocks_be_jemalloc_retained_bytes 700186402816 (700G)
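As a rough check on these numbers, the sketch below computes the share of retained bytes. It assumes that mapped + retained approximates jemalloc's virtual footprint (jemalloc reports retained separately from mapped); the helper name `retained_percent` is hypothetical.

```cpp
#include <cstdint>

// Values copied from the metrics above.
constexpr std::uint64_t kMappedBytes = 142722424832ULL;
constexpr std::uint64_t kRetainedBytes = 700186402816ULL;

// Integer percentage of (mapped + retained) that is retained.
// Assumption: mapped + retained approximates the allocator's virtual footprint.
constexpr std::uint64_t retained_percent(std::uint64_t mapped,
                                         std::uint64_t retained) {
    return retained * 100 / (mapped + retained);
}

static_assert(retained_percent(kMappedBytes, kRetainedBytes) == 83,
              "retained pages dominate the virtual footprint");
```

Under that assumption, roughly 83% of the allocator's virtual address space consists of retained pages, which hold no live data but would still be included in the core file.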

So we add an interface to jemalloc that marks all retained pages as DONTDUMP using madvise before the core dump is executed.

This reduces the core file size and prevents the core dump from getting stuck for a long time while writing the core file.
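The underlying mechanism can be sketched on a standalone mapping. This is not the jemalloc patch itself, only an illustration of the Linux `madvise(MADV_DONTDUMP)` call it relies on; the function name `map_excluded_from_core` is hypothetical, and the code assumes Linux 3.4 or later.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Reserve `len` bytes of anonymous memory and mark the range as excluded
// from core dumps. Returns the mapping on success, nullptr on failure.
// Hypothetical helper; assumes Linux, where MADV_DONTDUMP is available.
void* map_excluded_from_core(std::size_t len) {
    assert(len > 0);
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return nullptr;
    }
    // The kernel skips ranges flagged with MADV_DONTDUMP when writing a core file.
    if (madvise(p, len, MADV_DONTDUMP) != 0) {
        munmap(p, len);
        return nullptr;
    }
    return p;
}
```

The flag can be reverted with `MADV_DODUMP` if a range needs to appear in core files again, which matters for memory that is later reused for live data.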

Test:

Core: 8
Mem: 64G
Test Set: Tpch-100G
Test case 1: Run all 22 TPC-H queries in a single thread, then trigger a core dump.
Test case 2: Run the SQL select count(b.l_orderkey), count(repeat(b.l_shipmode, 10)), count(b.l_suppkey), count(b.l_comment), count(b.l_quantity) from lineitem a join [shuffle] lineitem b on a.l_orderkey=b.l_orderkey; multiple times, then trigger a core dump.

Before the pr:

  • Core file size of test case 1: 28G
  • Core file size of test case 2: 77G

After the pr:

  • Core file size of test case 1: 5G
  • Core file size of test case 2: 4G

Usage example

  1. Set the retained pages of all arenas to dontdump:
        std::string str = "arena." + std::to_string(MALLCTL_ARENAS_ALL) + ".dontdump";
        je_mallctl(str.c_str(), NULL, NULL, NULL, 0);
  2. Set the retained pages of arena 1 to dontdump:
        std::string str = "arena.1.dontdump";
        je_mallctl(str.c_str(), NULL, NULL, NULL, 0);
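The two calls above ignore the return value. A minimal sketch of a wrapper that checks it might look like the following. The helper names `dontdump_ctl_name` and `set_arena_dontdump` are hypothetical; the mallctl function is passed in as a parameter so the sketch compiles without the patched jemalloc, and it relies only on the jemalloc convention that mallctl returns 0 on success.

```cpp
#include <cstddef>
#include <string>

// Build the control name used in the usage example, e.g. "arena.1.dontdump".
// Hypothetical helper for illustration.
inline std::string dontdump_ctl_name(unsigned arena_index) {
    return "arena." + std::to_string(arena_index) + ".dontdump";
}

// Invoke the control through a caller-supplied mallctl-compatible function
// (je_mallctl in a build with the patched jemalloc) and report success.
template <typename MallctlFn>
bool set_arena_dontdump(MallctlFn mallctl_fn, unsigned arena_index) {
    const std::string name = dontdump_ctl_name(arena_index);
    // mallctl-style functions return 0 on success and an errno value otherwise.
    return mallctl_fn(name.c_str(), nullptr, nullptr, nullptr, 0) == 0;
}
```

In a real build the call would be `set_arena_dontdump(je_mallctl, MALLCTL_ARENAS_ALL)`, and a `false` result could be logged before the core dump proceeds.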

Todo: Also exclude dirty and muzzy pages from the dump.

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

This is an automatic backport of pull request #48796 done by [Mergify](https://mergify.com).

…n pages to not dump (#48796)

Signed-off-by: trueeyu <[email protected]>
(cherry picked from commit 2d4cf3a)
@wanpengfei-git wanpengfei-git merged commit 9d93966 into branch-3.3 Aug 8, 2024
36 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-3.3/pr-48796 branch August 8, 2024 07:08