[nftables] remediation component shuts down after a failed response #369
Comments
@LaurenceJJones: Thanks for opening an issue; it is currently awaiting triage.
@LaurenceJJones: There is no 'kind' label on this issue. A 'kind' label is needed to start the triage process.
Hello! We've run into this problem too. Is there any update on it?
UPDATE: this only happens when the bouncer is restarted. If the API stops responding while the bouncer is already running, the bouncer keeps retrying for new decisions and continues to work. One more question: why does the bouncer reset the nftables set on restart?
Yes, this is the current design: if the remediation component can't make its initial connection, that may indicate a bad configuration.
We remove the set because the initial load takes roughly ten times longer if we have to check whether each element already exists. So, to be more efficient, we remove the set and then reinstate it on restart.
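For illustration, a minimal sketch of that flush-and-reload approach using the `nft` CLI (the bouncer drives nftables differently in practice, and the table/set names here are invented): the whole reload is submitted as one `nft -f -` batch, so no per-element existence checks are needed.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// reloadSet replaces the contents of a hypothetical "blocklist" set in
// one batch: flush, then re-add every element in a single nft call.
func reloadSet(ips []string) error {
	if len(ips) == 0 {
		return nil // an empty "{ }" element list would be a syntax error
	}
	var b strings.Builder
	b.WriteString("flush set inet crowdsec blocklist\n")
	fmt.Fprintf(&b, "add element inet crowdsec blocklist { %s }\n",
		strings.Join(ips, ", "))

	cmd := exec.Command("nft", "-f", "-")
	cmd.Stdin = strings.NewReader(b.String())
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("nft batch failed: %w: %s", err, out)
	}
	return nil
}

func main() {
	if err := reloadSet([]string{"192.0.2.1", "198.51.100.7"}); err != nil {
		fmt.Println(err)
	}
}
```

A side benefit of batching is that `nft -f` applies the file as one transaction, so the set is never observed half-populated during the reload itself.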
But if the host is under attack, clearing the nftables set can negatively affect the server. It is also not entirely clear: if the bouncer clears the nftables set, why does it still pull all decisions, including outdated ones?
Yes, but this should only happen if you restart the service while under attack, and the service should normally be running long-term unless there is a reason not to. As for the decisions: that is most likely just the way crowdsec sends them; bouncers have no direct influence on what they get sent unless the query is filtered. There is no impact on performance, you just see an unnecessary log line, that's all.
If the host is under attack, it is possible that free memory runs out and the OOM killer terminates the bouncer; on restart, the bouncer then clears the table, provoking even more load on the server. I think it would be reasonable to add an option that compares the data received from the API with the existing set instead of clearing the table on restart.
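(For context, the compare-instead-of-clear option proposed here could be sketched as a simple set diff; all names below are illustrative, not bouncer code.)

```go
package main

import "fmt"

// diffSets computes which elements must be added to or removed from the
// live nftables set so that it ends up matching the LAPI decisions,
// without ever flushing the whole set.
func diffSets(current, desired map[string]bool) (toAdd, toRemove []string) {
	for ip := range desired {
		if !current[ip] {
			toAdd = append(toAdd, ip)
		}
	}
	for ip := range current {
		if !desired[ip] {
			toRemove = append(toRemove, ip)
		}
	}
	return toAdd, toRemove
}

func main() {
	current := map[string]bool{"192.0.2.1": true, "192.0.2.2": true}
	desired := map[string]bool{"192.0.2.2": true, "192.0.2.3": true}
	add, del := diffSets(current, desired)
	fmt.Println("add:", add, "remove:", del) // add: [192.0.2.3] remove: [192.0.2.1]
}
```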
A memory spike does not equal a memory leak; it just means the API is handling the requests, and because it holds decisions in memory while it queries, usage will spike. We have a feature flag for streamed decisions that may help: https://docs.crowdsec.net/docs/next/configuration/feature_flags#list-of-available-feature-flags. If you can capture the memory behaviour via pprof, we will look into it: https://docs.crowdsec.net/docs/next/observability/pprof. I understand the OOM part, and we can improve this in the future, but we currently have no resources to look at it, so contributions are welcome.
/kind enhancement
Should I enable this flag on the API server? And correct me if I'm wrong: does this feature send decisions in batches?
Exactly: instead of loading all decisions into memory, it fetches X decisions, writes them to the stream, then fetches the next batch and writes that, and so on. It may become the standard in upcoming releases; it is currently behind a feature flag because we wanted to ensure stability, but a large enterprise has been using it in production for over two minor releases with no issues reported on their side.
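Sketched in Go, that pattern looks roughly like the loop below; `fetchPage` and `writeToStream` are hypothetical stand-ins, not crowdsec's actual API.

```go
package main

// Decision is a stand-in for a crowdsec decision record.
type Decision struct{ IP string }

// streamDecisions pages through decisions in fixed-size batches, so the
// full result set is never held in memory at once.
func streamDecisions(
	fetchPage func(offset, limit int) ([]Decision, error),
	writeToStream func([]Decision) error,
) error {
	const batchSize = 1000
	for offset := 0; ; offset += batchSize {
		page, err := fetchPage(offset, batchSize)
		if err != nil {
			return err
		}
		if len(page) == 0 {
			return nil // drained: every batch has been streamed
		}
		if err := writeToStream(page); err != nil {
			return err
		}
	}
}

func main() {}
```

Peak memory is then bounded by the batch size rather than by the total number of decisions.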
And if I use MySQL as the database server, will it work there too?
Yes, it works for all databases.
What happened?
When the remediation component fails to connect to the LAPI (currently observed with nftables), the whole service shuts down and flushes the nftables set.
This is not what we want, as the IPs currently within the set are still useful to the service.
What did you expect to happen?
The remediation component should tolerate failures to connect to the LAPI after the service has started. E.g. connect first; if that fails at startup, then yes, restart, but after that it should be resilient.
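A rough sketch of that expectation, with a hypothetical `pullDecisions` standing in for the real LAPI client: failures after startup are retried with backoff instead of terminating the process.

```go
package main

import (
	"errors"
	"log"
	"time"
)

// run keeps pulling decisions forever; LAPI errors are logged and
// retried with exponential backoff rather than shutting the service
// down (and flushing the set with it).
func run(pullDecisions func() error) {
	backoff := time.Second
	for {
		if err := pullDecisions(); err != nil {
			log.Printf("LAPI error, retrying in %s: %v", backoff, err)
			time.Sleep(backoff)
			if backoff < time.Minute {
				backoff *= 2
			}
			continue
		}
		backoff = time.Second        // healthy again: reset the backoff
		time.Sleep(10 * time.Second) // normal polling interval
	}
}

func main() {
	// Stub that always fails, to show the loop surviving bad responses.
	run(func() error { return errors.New("lapi returned 502") })
}
```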
How can we reproduce it (as minimally and precisely as possible)?
Bring up a LAPI and the firewall remediation component; a user has reported that if the response code is > 500, the service shuts down.
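One hypothetical way to simulate that failing LAPI is a stub server that always answers with a 5xx status, with the bouncer's API URL pointed at it:

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Every request gets a 502, simulating a broken LAPI.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, "simulated LAPI failure", http.StatusBadGateway)
	})
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}
```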
Anything else we need to know?
No response
version
remediation component version:
crowdsec version
crowdsec version:
OS version