Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting sysctl parameters on nodes #208

Closed
johananl opened this issue Jul 26, 2019 · 14 comments
Closed

Setting sysctl parameters on nodes #208

johananl opened this issue Jul 26, 2019 · 14 comments

Comments

@johananl
Copy link
Contributor

johananl commented Jul 26, 2019

This issue is partially discussed in #116.

General

At the moment the Go version of the operator sets vm.max_map_count on worker nodes by running a privileged init container:

func newSysctlLimitsContainer(cdc *cassandraoperatorv1alpha1.CassandraDataCenter) *corev1.Container {
return &corev1.Container{
Name: "sysctl-limits",
Image: cdc.Spec.CassandraImage,
ImagePullPolicy: cdc.Spec.ImagePullPolicy,
SecurityContext: &corev1.SecurityContext{
Privileged: func() *bool { b := true; return &b }(),
},
Command: []string{"bash", "-xuec"},
Args: []string{
`sysctl -w vm.max_map_count=1048575 || true`,
},
}
}

Doing this by default can be unexpected for a user as this is a host-level setting which could potentially affect other pods on the relevant nodes. In addition, some environments may not allow running privileged containers, which can make deployments break.

The examples seem to hint this is an optional feature, however this field isn't included in the CassandraDataCenter API, which means it is ignored by the operator.

Suggested Changes

Assuming this setting is indeed necessary, I would consider the following:

  • Make this setting optional and off by default.
  • Mention in the docs that Cassandra works best when setting this option, and let the user decide whether they want to ignore it, use the provided init container or set the option at the infrastructure level.
  • Rename the option from privilegedSupported to something which better describes what it does. Example: optimizeSysctl.

Implementing the suggestions above would make less assumptions on the user's environment and avoid taking decisions on behalf of the user which may exceed the responsibility of this operator.

@zegelin
Copy link
Contributor

zegelin commented Jul 26, 2019

privilegedSupported being missing on the Go side is indeed a regression. @alourie we need to add this.

Agree with it being off by default to avoid surprises.

Also agree that perhaps privilegedSupported isn't the best name, and that something like optimizeSysctl or tuneSysctls would be better.

The original name is derived from that fact that to call sysctl the security context Privileged flag needs to be set to true.

@zegelin
Copy link
Contributor

zegelin commented Jul 26, 2019

Agree with it being off by default to avoid surprises.

Actually, I'd like to open this for discussion with @benbromhead @smiklosovic

Cassandra really likes to create a large number of memory mappings, and doesn't handle things gracefully when this fails.

I can see a few options:

  • Default to true. This results in Pod creation failure on clusters that don't support privileged containers. This gives upfront discovery of the problem, rather than having C* be unstable down the road. When supported, you get (potentially) higher performance clusters. The downside is that things may not work out-of-the-box, and we may end up with frequent issues being raised (or "me too"'ed) around this problem.

  • Default to false, but also disable memory-mapped file access via the various switches in cassandra.yaml. This gives a stable and working out-of-the-box solution. The downside of this is that C* would be, by default, running with (potentially) limited performance, with users potentially being clueless that they're running in limp-along mode.

  • Default to false, and just let C* burn baby.

@johananl
Copy link
Contributor Author

johananl commented Jul 29, 2019

I'm not a C* expert but FWIW, I'm for option 2 (default to false, memory-hungry features off, warn the users they should tweak settings for production). My rationale:

  • It is very common for OSS projects to give something that just works by default, with proper warnings in the docs (and maybe in the logs, too) regarding production readiness.
  • There is a good chance that a large portion of project visitors just want to experiment and not run a production deployment yet. We should give them a good experience.
  • Users who want a production deployment are more "committed" and will probably be fine with going through some steps to enhance performance. Forcing every user to worry about PSPs, kubelet flags and whatnot can turn people away from the project.
  • It's probably better to warn the user about potential pitfalls than take a decision for them which could potentially affect their infrastructure negatively. IMO the risk of a low-performing C* is less problematic than the risk of disrupting something else on the user's cluster.

@benbromhead
Copy link
Collaborator

I think the defaults should prioritize a stable, easy initial experience. So happy to leave it set to false and C* configured however it needs to be to make it work better.

As long as we have comments in the example yaml files and helm packages saying it should be turned on in production, maybe a WARN in the operator logs and a production documentation page.

alourie added a commit to alourie/cassandra-operator that referenced this issue Aug 1, 2019
Fixes instaclustr#213
Fixes instaclustr#208

* Fixed some comments from PR review
* Added Env to CRD to allow specifying environment for containers
(exists in java version)
* Added userConfigMap handling (instaclustr#213)
* Restored PrivelegedSupported handling (instaclustr#208)
* Backup secret volume allows providing GOOGLE_APPLICATION_CREDENTIALS
in a secret
* Cloud providers creds can be set using Env field in CRD

Signed-off-by: Alex Lourie <[email protected]>
@alourie
Copy link
Contributor

alourie commented Aug 6, 2019

@zegelin @benbromhead so, if it's disabled (by default), what corresponding params should/shouldn't be in cassandra yaml? I can do it together within #218

benbromhead pushed a commit that referenced this issue Aug 21, 2019
…andra and sidecar (#218)

* Initial work for clouds secrets and user-defined configmap

Signed-off-by: Alex Lourie <[email protected]>

* PR comments fixes

Fixes #213
Fixes #208

* Fixed some comments from PR review
* Added Env to CRD to allow specifying environment for containers
(exists in java version)
* Added userConfigMap handling (#213)
* Restored PrivelegedSupported handling (#208)
* Backup secret volume allows providing GOOGLE_APPLICATION_CREDENTIALS
in a secret
* Cloud providers creds can be set using Env field in CRD

Signed-off-by: Alex Lourie <[email protected]>

* Support TLS certificates for internal communication

Signed-off-by: Alex Lourie <[email protected]>

* PR comments, docs updates

Signed-off-by: Alex Lourie <[email protected]>

* Cleanups

Signed-off-by: Alex Lourie <[email protected]>

* Path naming update

Signed-off-by: Alex Lourie <[email protected]>

* Cleanups

* go fmt

Signed-off-by: Alex Lourie <[email protected]>
@zegelin
Copy link
Contributor

zegelin commented Aug 22, 2019

Is this finished?

@johananl
Copy link
Contributor Author

No, we are still creating C* pods with these capabilities:

SecurityContext: &corev1.SecurityContext{
Capabilities: &corev1.Capabilities{
Add: []corev1.Capability{
"IPC_LOCK", // C* wants to mlock/mlockall
"SYS_RESOURCE", // permit ulimit adjustments
},
},
},

I can create a PR to disable these but I am not sure which C* features to tune along with this change.

@smiklosovic
Copy link
Collaborator

smiklosovic commented Aug 27, 2019

@johananl

Based on the commend of @zegelin

Default to false, but also disable memory-mapped file access via the various switches in cassandra.yaml. This gives a stable and working out-of-the-box solution. The downside of this is that C* would be, by default, running with (potentially) limited performance, with users potentially being clueless that they're running in limp-along mode.

and @benbromhead

think the defaults should prioritize a stable, easy initial experience. So happy to leave it set to false and C* configured however it needs to be to make it work better.

Hence, we should set it to false and configure cassandra.yaml to disable memory-mapped files.

To my knowledge there is only one such flag: disk_access_mode. If not set to "mmap" (which would enable memory-mapped access) (1), it is set to "auto" by default, so we should be covered here.

If we do set it up to true via some CRD field / flag, we should set that to "mmap" in cassandra.yaml.

(1) https://www.oreilly.com/library/view/cassandra-high-performance/9781849515122/ch04s11.html

@johananl
Copy link
Contributor Author

@smiklosovic great, so I guess I can make the required changes in the operator, tighten up the PSPs, update the relevant C* config and open a PR. Then you folks can verify performance looks OK.

How does that sound?

@smiklosovic
Copy link
Collaborator

Sounds like a plan!

@johananl
Copy link
Contributor Author

@smiklosovic looks like we aren't setting disk_access_mode anywhere, unless I am missing something.

@smiklosovic
Copy link
Collaborator

smiklosovic commented Aug 27, 2019

no we apparently dont but we should (based on whether we use these capabilities or not), that is my understanding of that

I believe that disk_access_mode should be set to mmap only in case flag is "true", otherwise we should not do anything.

@zegelin
Copy link
Contributor

zegelin commented Aug 27, 2019 via email

@johananl
Copy link
Contributor Author

johananl commented Sep 4, 2019

Fixed in #269.

@johananl johananl closed this as completed Sep 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants