Guidance on Autoscaling with distinct_hosts Constraint #797
Comments
Hi @DTTerastar 👋 I'm not sure if I fully understood the question, so I will try to answer as best as I can, but let me know if I missed anything. The autoscaler doesn't take any job constraint into consideration; it only adjusts the count of whatever it is targeting (a job group or the cluster itself). It sounds like you want to have a 1:1 match between the number of allocations of a job and the number of clients in your cluster? If that's the case, then you need two components: something that determines one of the two numbers, and a scaling policy that adjusts the other to match.
Let's say that you will control the number of allocations manually, and so you want to have an equal number of clients. You can accomplish this with a policy like so:

```hcl
scaling "match_job" {
  enabled = true
  min     = 1
  max     = 5

  policy {
    cooldown            = "2m"
    evaluation_interval = "5m"

    check "number_of_allocs" {
      source = "prometheus"
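      # Total of running + queued allocations for the "example" job;
      # "OR on() vector(0)" falls back to 0 when the job reports no allocations.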
      query  = "sum(nomad_nomad_job_summary_queued{exported_job=~\"example\"} + nomad_nomad_job_summary_running{exported_job=~\"example\"}) OR on() vector(0)"

      strategy "pass-through" {}
    }

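    # Scale the AWS Auto Scaling group backing the Nomad clients; nodes are
    # drained for up to 15 minutes before being removed on scale-in.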
    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "hashistack-nomad_client"
      node_class          = "hashistack"
      node_drain_deadline = "15m"
    }
  }
}
```

Notice the `pass-through` strategy: the number of clients is set to whatever the query returns, i.e. the total number of running and queued allocations of the job. So if you increase the job group `count`, the extra (queued) allocations will cause the autoscaler to add clients to the cluster, and lowering the `count` lets it scale the cluster back in.

You could also try to invert this: control the number of clients some other way, and have the job follow the number of clients using a `scaling` block in the job itself:

```hcl
job "example" {
  constraint {
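    # Place every allocation of this job on a different client node.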
    operator = "distinct_hosts"
    value    = "true"
  }

  group "cache" {
    count = 3

    # Defines the "db" port referenced by the task config below.
    network {
      port "db" {
        to = 6379
      }
    }

    scaling {
      min = 1
      max = 5

      policy {
        cooldown            = "2m"
        evaluation_interval = "10s"

        check "number_of_clients" {
          source = "prometheus"
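          # Roughly the number of client nodes, since each client emits its
          # own nomad_client_allocations_running gauge.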
          query  = "count(nomad_client_allocations_running)"

          strategy "pass-through" {}
        }
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:7"
        ports = ["db"]
      }
    }
  }
}
```

Unfortunately this doesn't work as well, because the group policy will not be able to take the number of queued allocations into consideration. So you will be able to scale up the number of clients, but not down 😅

So the general idea is to try to isolate cause and effect. Instead of trying to figure out a way to execute multiple actions, focus on simple "when X happens, do Y" rules. In the example above: when the number of running and queued allocations goes up or down, add or remove clients from the cluster.

Check out this tutorial for more details: https://developer.hashicorp.com/nomad/tutorials/autoscaler/horizontal-cluster-scaling-on-demand-batch

I hope this helps!
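As a side note, a cluster policy like the one above assumes the autoscaler agent has the Prometheus APM plugin and the AWS ASG target plugin enabled. A minimal sketch of such an agent configuration, where the addresses, region, and policy directory are placeholders rather than values from this thread:

```hcl
# Nomad Autoscaler agent configuration (sketch).
nomad {
  address = "http://127.0.0.1:4646"
}

# Directory the agent reads cluster scaling policy files from.
policy {
  dir = "/etc/nomad-autoscaler/policies"
}

# Prometheus plugin backing the "prometheus" source used in the checks.
apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://127.0.0.1:9090"
  }
}

# Built-in strategy used by both policies above.
strategy "pass-through" {
  driver = "pass-through"
}

# AWS ASG plugin backing the "aws-asg" target block.
target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "us-east-1"
  }
}
```

Only the cluster policy needs to live in the policy directory; the group policy travels with the job spec itself.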
Stumbled upon this myself. However, it can be partially mitigated if the cluster has other jobs and only some of them need to be on separate hosts. So Nomad will kick off some allocations that are more lenient in their constraints and make room for those that have the constraint. Would it make sense to expose these as metrics?
I keep seeing so many metrics we need, but for some reason the architecture of Nomad makes proper metrics difficult to implement.
I am seeking advice on the best practices for using the Nomad Autoscaler in scenarios where the distinct_hosts constraint is applied in job configurations. Specifically, I'd like to understand how to effectively scale up a Nomad cluster using the autoscaler when each job instance must be placed on a separate host.
My primary concern is ensuring that the autoscaler responds appropriately to the unique requirements imposed by the distinct_hosts constraint. For instance, if a job is configured to launch instances across different hosts, how can the autoscaler be configured to ensure there are enough hosts in the cluster to accommodate scaling actions?
Any insights, recommendations, or examples of similar configurations would be greatly appreciated.
Thank you for your assistance and for the great work on Nomad Autoscaler.