BalanceReplicas called even when PopulatePodsOnScaleUp and VacatePodsOnScaleDown are false #724

Open
sigram opened this issue Sep 23, 2024 · 3 comments

Comments

@sigram

sigram commented Sep 23, 2024

When both PopulatePodsOnScaleUp and VacatePodsOnScaleDown are set to false (in an environment where replica management is handled externally, not using Solr placement plugins), the natural expectation is that the Solr Operator will not initiate any replica movements on scale-up, scale-down, or rolling updates.

However, this is not the case in 0.8.1. When the Solr Operator discovers that the /api/cluster/replicas/balance API is available, it will call it in quite a few scenarios, causing replica movements even when both of the above options are disabled.

I can work around this by implementing a custom PlacementPluginFactory, but I think this behavior should be optional, with an option to turn it on when desired. The situations in which BalanceReplicas is called should also be better documented, listing the cases in which this API is called and how often. I think the comments in BalanceReplicasForCluster about retries on errors and async operations should be included in the main documentation, because they can result in potentially resource-intensive operations.

@HoustonPutman
Contributor

The operator definitely calls BalanceReplicas when doing a rolling restart of an ephemeral collection, but that's because we are moving replicas around anyway. We can definitely document that better, but I'm not sure I'd want to remove that feature.

Can you describe the other scenarios in which it's called?

@sigram
Author

sigram commented Sep 30, 2024

If I'm reading it right, the section at solr_cluster_ops_util.determineScaleClusterOpLockIfNecessary:200 always triggers balancing if a scaleDown fails for some reason, even if vacate/populate are both false, and regardless of the storage options.
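
A minimal sketch of the kind of guard this suggests, to make the expected behavior concrete. It is not the operator's actual code: the `ScalingOptions` and `ClusterOp` types, the `nextOpAfterFailedScaleDown` function, and the `balanceAPIAvailable` parameter are all hypothetical names used only for illustration, and whether the condition should depend on either flag or on one specific flag is an assumption.

```go
package main

// ScalingOptions mirrors the two flags from the issue title.
// Hypothetical type, not the operator's actual API struct.
type ScalingOptions struct {
	PopulatePodsOnScaleUp bool
	VacatePodsOnScaleDown bool
}

// ClusterOp is a hypothetical stand-in for the operation the
// operator would take a lock for next.
type ClusterOp string

const (
	NoOp            ClusterOp = ""
	BalanceReplicas ClusterOp = "BalanceReplicas"
)

// nextOpAfterFailedScaleDown picks the follow-up operation after a
// failed scale-down. Balancing is only chosen when the operator has
// been asked to manage replicas at all (assumption: either flag set)
// and the balance API is available on the cluster.
func nextOpAfterFailedScaleDown(opts ScalingOptions, balanceAPIAvailable bool) ClusterOp {
	operatorManagesReplicas := opts.PopulatePodsOnScaleUp || opts.VacatePodsOnScaleDown
	if operatorManagesReplicas && balanceAPIAvailable {
		return BalanceReplicas
	}
	// Replica placement is handled externally: do not move anything.
	return NoOp
}
```

With both flags false, `nextOpAfterFailedScaleDown(ScalingOptions{}, true)` returns `NoOp`, which is the behavior expected in an environment where replica management is handled externally.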

@HoustonPutman
Contributor

Ahhhh ok then yeah certainly that's a bug. Should be an easy enough fix.
