[Bug] OOM kill because sudden memory increase and requests #3341
Comments
Haruka:
Might be: #2970
More snarkos OOM kill incidents:
Incident Date: July 4, 2024
Incident Details
Node IP: Node E
Node: Node A
Node: Node B
Effect on Validator(s)
Incident Date: July 5, 2024
Incident Details
Node IP: Node C
Effect on Validator(s)
I'll be working on reimplementing #2970 tomorrow.
🐛 Bug Report
snarkOS has experienced several OOM kill incidents with the following characteristics:
Incident Date: June 30, 2024
Affected Service: snarkos Service
Nodes Affected: Node A, Node B, Node C
Summary
On June 30, 2024, the snarkos service on testnet beta client nodes Node A, Node B, and Node C experienced an outage due to the OOM (Out of Memory) killer being invoked around the same time on all three nodes.
Incident Details
Node A:
Node B:
Node C:
Effect on Validator(s)
Validator nodes were minimally affected by the client nodes going down, only experiencing a brief spike of 15% more memory usage than normal.
Incident Date: July 2, 2024
Affected Service: snarkos Service
Nodes Affected: Node D, Node A
Summary
On July 2, 2024, the snarkos service on testnet beta client nodes Node D and Node A experienced an outage due to the OOM (Out of Memory) killer being invoked at different times.
Incident Details
Node D:
Node A:
Effect on Validator(s)
Validator nodes were minimally affected by the client nodes going down, only experiencing a brief spike of 7% more memory usage than normal.
Steps to Reproduce
Not known, but the blocks close to the last one each client advanced to should indicate what triggered the issue and lead to a way to reproduce it.
It's been mentioned that blocks 55840, 56108, 68201, and 70928 have complex program deployments.
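To narrow down which blocks were being processed when the kills happened, one option is to pull the OOM-killer lines out of the kernel log and compare their timestamps against the node's own block-advance logs. The following is a minimal sketch, assuming the kernel message format used by recent Linux kernels (`Out of memory: Killed process <pid> (<name>) ... anon-rss:<n>kB`). The function name and the sample log line are hypothetical and were not taken from the affected nodes.

```python
import re

# Kernel OOM-killer message; the exact format is an assumption based on
# recent Linux kernels and may differ on other kernel versions.
OOM_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(log_lines, process_name="snarkos"):
    """Return (pid, anon_rss_kb, raw_line) for each OOM kill of process_name."""
    kills = []
    for line in log_lines:
        match = OOM_RE.search(line)
        if match and match.group("name") == process_name:
            kills.append(
                (int(match.group("pid")), int(match.group("anon_rss_kb")), line.strip())
            )
    return kills


# Hypothetical sample input, made up for illustration only.
SAMPLE_LOG = [
    "[12345.678] Out of memory: Killed process 4242 (snarkos) "
    "total-vm:70000000kB, anon-rss:31000000kB, file-rss:0kB, "
    "shmem-rss:0kB, UID:1000 pgtables:65000kB oom_score_adj:0",
    "[12346.000] some unrelated kernel message",
]

for pid, rss_kb, raw in find_oom_kills(SAMPLE_LOG):
    print(f"pid={pid} anon_rss={rss_kb / 1024 / 1024:.1f} GiB")
```

In practice the input lines could come from `dmesg -T` or `journalctl -k` on each affected node, so that the kill times can be lined up with the last block height the client logged.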
Expected Behavior
snarkOS should control the amount of memory it requests and not let the complexity of deployments dictate memory usage, so that this does not also become a security issue (such as a DDoS vector).
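As an illustration of what "controlling the amount that it requests" could look like, here is a minimal sketch of a memory budget with admission control: work is only started if its estimated memory cost fits under a fixed cap, and is deferred otherwise. This is not how snarkOS is implemented and not what #2970 does. The class, the function names, and the idea that a per-deployment memory estimate is available are all assumptions made for the example.

```python
import threading


class MemoryBudget:
    """Tracks reserved bytes against a fixed cap (hypothetical helper)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.reserved = 0
        self._lock = threading.Lock()

    def try_reserve(self, estimated_bytes):
        """Reserve estimated_bytes if it fits under the cap; return success."""
        with self._lock:
            if self.reserved + estimated_bytes > self.max_bytes:
                return False
            self.reserved += estimated_bytes
            return True

    def release(self, estimated_bytes):
        """Give back a previous reservation."""
        with self._lock:
            self.reserved = max(0, self.reserved - estimated_bytes)


def process_deployment(budget, estimated_bytes, verify):
    """Run verify() only if its estimated memory cost fits in the budget.

    estimated_bytes is assumed to come from some cost model for the
    deployment; how to compute it is outside the scope of this sketch.
    """
    if not budget.try_reserve(estimated_bytes):
        return "deferred"  # caller can retry later or drop the request
    try:
        verify()
        return "processed"
    finally:
        budget.release(estimated_bytes)


budget = MemoryBudget(max_bytes=8 * 1024**3)  # example cap of 8 GiB
print(process_deployment(budget, 2 * 1024**3, lambda: None))
print(process_deployment(budget, 16 * 1024**3, lambda: None))
```

The point of the design is that peak memory is bounded by the cap rather than by whatever a peer or a deployment asks for, which is also what limits the DDoS exposure mentioned above.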
Your Environment