Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#2061]fix(server): Fix netty memory leak when removeBuffer and cacheShuffle… #2059

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

lwllvyb
Copy link
Contributor

@lwllvyb lwllvyb commented Aug 19, 2024

What changes were proposed in this pull request?

Fix netty memory leak when removeBuffer and cacheShuffleData happen concurrent

Why are the changes needed?

(Please clarify why the changes are needed. For instance,

  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, describe the bug.)

Fix: #2061

Does this PR introduce any user-facing change?

(Please list the user-facing changes introduced by your change, including

  1. Change in user-facing APIs.
  2. Addition or removal of property keys.)

No.

How was this patch tested?

(Please test your changes, and provide instructions on how to test it:

  1. If you add a feature or fix a bug, add a test to cover your changes.
  2. If you fix a flaky test, repeat it for many times to prove it works.)

Copy link

github-actions bot commented Aug 19, 2024

Test Results

 2 895 files   - 31   2 895 suites   - 31   5h 48m 30s ⏱️ - 14m 54s
 1 041 tests ± 0   1 038 ✅  -  1   2 💤 ±0  1 ❌ +1 
12 943 runs   - 60  12 912 ✅  - 61  30 💤 ±0  1 ❌ +1 

For more details on these failures, see this check.

Results for commit 8926722. ± Comparison against base commit 78fe934.

♻️ This comment has been updated with latest results.

@lwllvyb lwllvyb changed the title fix(server): Fix netty memory leak when removeBuffer and cacheShuffle… [#2061]fix(server): Fix netty memory leak when removeBuffer and cacheShuffle… Aug 19, 2024
Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you help describe more why this(remove/cache buffer) will happen at the same time?

@lwllvyb
Copy link
Contributor Author

lwllvyb commented Aug 26, 2024

Could you help describe more why this(remove/cache buffer) will happen at the same time?
When i test the configuration "rss.server.huge-partition.size.hard.limit", the test failed because the partition was oversized.
(in chronological order)

  1. In function ShuffleBufferManager@cacheShuffleData, getShuffleBufferEntry be called and return success.
  2. Function removeResources be triggered and appId be removed from bufferPool.
  3. In function ShuffleBufferManager@cacheShuffleData, buffer.append be called and the ByteBuf be added to the buffer. This ByteBuf will never be released.

@@ -110,7 +111,8 @@ public int getBlockCount() {
}

@Override
public void release() {
public synchronized void release() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think close is more appropriate than release. The close means that we has closed this and cannot be used again. Release just means that we releases the resource.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will to create a new PR to do this if necessary.

@lwllvyb lwllvyb force-pushed the memoryLeak02 branch 2 times, most recently from 30229b5 to 5357372 Compare October 10, 2024 07:54
@lwllvyb
Copy link
Contributor Author

lwllvyb commented Oct 10, 2024

image The failed test case is not related to the PR.

…Data happen concurrent (merge request !278)
@lwllvyb
Copy link
Contributor Author

lwllvyb commented Oct 15, 2024

Ping @jerqi @rickyma @zhengchenyu
Could you please take some time to review this PR? If it doesn't need to be merged, please let me know.

Copy link
Member

@zuston zuston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Sorry for the later reply, I missed this review.

@zuston zuston merged commit aab887d into apache:master Oct 15, 2024
40 of 43 checks passed
@zuston
Copy link
Member

zuston commented Oct 15, 2024

BTW, If I missed somthing, feel free to ping me @lwllvyb or by wechat

xianjingfeng pushed a commit that referenced this pull request Oct 15, 2024
### What changes were proposed in this pull request?

### Why are the changes needed?
Fix: #2061

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
CI
maobaolong pushed a commit to maobaolong/incubator-uniffle that referenced this pull request Nov 4, 2024
…fer and cacheShuffleData happen at the same time (apache#2059)

### What changes were proposed in this pull request?

 Fix netty memory leak when removeBuffer and cacheShuffleData happen concurrent

### Why are the changes needed?

Fix: apache#2061

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests

---

Co-authored-by: wenlongwlli <[email protected]>
(cherry picked from commit aab887d)
maobaolong pushed a commit to maobaolong/incubator-uniffle that referenced this pull request Nov 4, 2024
### What changes were proposed in this pull request?

### Why are the changes needed?
Fix: apache#2061

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
CI

(cherry picked from commit bb06eeb)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] netty memory leak when removeBuffer and cacheShuffleData happen concurrent
3 participants