OOM while running LDR under heavy load with stats disabled #134445

Open
benbardin opened this issue Nov 6, 2024 · 1 comment
Labels
A-disaster-recovery
branch-master: Failures and bugs on the master branch.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
P-3: Issues/test failures with no fix SLA
T-disaster-recovery

Comments

Collaborator

benbardin commented Nov 6, 2024

We observed OOM events twice on SHA v24.3.0-alpha.3-dev-b908a2f2bc3af2b1529b58e8242a37cfa6f6c1ca. These OOMs were not observed on the 24.3 release branch, just on master.
cockroach-health.glenn-ldr-east-0003.ubuntu.2024-10-31T23_59_12Z.009606.log.zip

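For the first event, we have a time-limited heap profile covering 5 seconds. The heap retained over that window is small, but the profile shows ~40GB allocated and freed within it. If the window is the full 5 seconds, that is roughly 8GB per second of allocation churn, which is a lot.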
pprof_1730431585.out.zip
cockroach-health.glenn-ldr-east-0003.ubuntu.2024-10-30T22_05_07Z.009606.log.zip

For the second event, we have a full heap profile. It accounts for 10GB, but runtime stats indicate that a further ~20GB is not accounted for. This suggests either objects that are unreachable but have not yet been garbage collected, or extreme sampling effects in the heap profile.

cockroach-health.log.zip
profile-full.out.zip
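
To make the accounting for the second event concrete, here is a minimal Go sketch of the comparison. It is an illustration, not how the numbers in this report were produced: `heapGap` is a hypothetical helper, the 30GB runtime total is only inferred from the 10GB and ~20GB figures above, and the report does not say which runtime stat was used. The second half uses the standard `runtime.ReadMemStats` call to show that `HeapAlloc` includes unreachable objects the collector has not yet freed, which is one way a runtime figure can exceed what a heap profile accounts for.

```go
package main

import (
	"fmt"
	"runtime"
)

const gib = 1 << 30

// heapGap returns the bytes the runtime counts as allocated heap that a
// heap profile does not account for. It clamps to zero when the profile
// covers at least as much as the runtime figure.
// Hypothetical helper, for illustration only.
func heapGap(runtimeHeapBytes, profiledBytes uint64) uint64 {
	if profiledBytes >= runtimeHeapBytes {
		return 0
	}
	return runtimeHeapBytes - profiledBytes
}

func main() {
	// Approximate figures from the report: 10GB profiled, ~20GB unaccounted.
	// The 30GB runtime total is inferred from those two numbers (assumption).
	fmt.Printf("gap from report figures: %d GiB\n", heapGap(30*gib, 10*gib)/gib)

	// For the current process: HeapAlloc counts reachable objects plus
	// unreachable objects that the garbage collector has not yet freed,
	// so it can drop after a forced collection.
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC()
	runtime.ReadMemStats(&after)
	fmt.Printf("HeapAlloc before GC: %d bytes, after GC: %d bytes\n",
		before.HeapAlloc, after.HeapAlloc)
}
```

A large drop in `HeapAlloc` across a forced GC would point toward uncollected garbage rather than profile sampling error, though this sketch cannot distinguish the two on the affected cluster.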

The profiles don't appear to show memory leaks or other unusual usage. Rather, we suspect that general memory management across multiple concurrent full-table scans regressed between the 24.3 branch cut and the above SHA.

Jira issue: CRDB-44081

@benbardin benbardin added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-disaster-recovery branch-master Failures and bugs on the master branch. T-disaster-recovery labels Nov 6, 2024

blathers-crl bot commented Nov 6, 2024

cc @cockroachdb/disaster-recovery

@exalate-issue-sync exalate-issue-sync bot added the P-3 Issues/test failures with no fix SLA label Nov 6, 2024