Guidance on Autoscaling with distinct_hosts Constraint #797
Comments
Hi @DTTerastar 👋 I'm not sure if I fully understood the question, so I will try to answer as best as I can, but let me know if I missed anything. The autoscaler doesn't take any job constraint into consideration; it only adjusts the count of whatever it is targeting (a job group or the cluster itself). It sounds like you want to have a 1:1 match between the number of allocations of a job and the number of clients in your cluster? If that's the case, then you need two components: something that determines one of the two numbers, and a scaling policy that adjusts the other to match.
Let's say that you will control the number of allocations manually, and so you want to have an equal number of clients. You can accomplish this with a policy like so:

```hcl
scaling "match_job" {
  enabled = true
  min     = 1
  max     = 5

  policy {
    cooldown            = "2m"
    evaluation_interval = "5m"

    check "number_of_allocs" {
      source = "prometheus"
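      # Total of running + queued allocations for the "example" job;
      # "OR on() vector(0)" falls back to 0 when the job reports no allocations.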
      query  = "sum(nomad_nomad_job_summary_queued{exported_job=~\"example\"} + nomad_nomad_job_summary_running{exported_job=~\"example\"}) OR on() vector(0)"

      strategy "pass-through" {}
    }

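    # Scale the AWS Auto Scaling group backing the Nomad clients; nodes are
    # drained for up to 15 minutes before being removed on scale-in.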
    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "hashistack-nomad_client"
      node_class          = "hashistack"
      node_drain_deadline = "15m"
    }
  }
}
```

Notice the `pass-through` strategy: the number of clients is set to whatever the query returns, i.e. the total number of running and queued allocations of the job. So if you increase the job group `count`, the extra (queued) allocations will cause the autoscaler to add clients to the cluster, and lowering the `count` lets it scale the cluster back in.

You could also try to invert this: control the number of clients some other way, and have the job follow the number of clients using a `scaling` block in the job itself:

```hcl
job "example" {
  constraint {
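    # Place every allocation of this job on a different client node.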
    operator = "distinct_hosts"
    value    = "true"
  }

  group "cache" {
    count = 3

    # Defines the "db" port referenced by the task config below.
    network {
      port "db" {
        to = 6379
      }
    }

    scaling {
      min = 1
      max = 5

      policy {
        cooldown            = "2m"
        evaluation_interval = "10s"

        check "number_of_clients" {
          source = "prometheus"
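          # Roughly the number of client nodes, since each client emits its
          # own nomad_client_allocations_running gauge.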
          query  = "count(nomad_client_allocations_running)"

          strategy "pass-through" {}
        }
      }
    }

    task "redis" {
      driver = "docker"

      config {
        image = "redis:7"
        ports = ["db"]
      }
    }
  }
}
```

Unfortunately this doesn't work as well, because the group policy will not be able to take the number of queued allocations into consideration. So you will be able to scale up the number of clients, but not down 😅

So the general idea is to try to isolate cause and effect. Instead of trying to figure out a way to execute multiple actions, focus on simple "when X happens, do Y" rules. In the example above: when the number of running and queued allocations goes up or down, add or remove clients from the cluster.

Check out this tutorial for more details: https://developer.hashicorp.com/nomad/tutorials/autoscaler/horizontal-cluster-scaling-on-demand-batch

I hope this helps!
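As a side note, a cluster policy like the one above assumes the autoscaler agent has the Prometheus APM plugin and the AWS ASG target plugin enabled. A minimal sketch of such an agent configuration, where the addresses, region, and policy directory are placeholders rather than values from this thread:

```hcl
# Nomad Autoscaler agent configuration (sketch).
nomad {
  address = "http://127.0.0.1:4646"
}

# Directory the agent reads cluster scaling policy files from.
policy {
  dir = "/etc/nomad-autoscaler/policies"
}

# Prometheus plugin backing the "prometheus" source used in the checks.
apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://127.0.0.1:9090"
  }
}

# Built-in strategy used by both policies above.
strategy "pass-through" {
  driver = "pass-through"
}

# AWS ASG plugin backing the "aws-asg" target block.
target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "us-east-1"
  }
}
```

Only the cluster policy needs to live in the policy directory; the group policy travels with the job spec itself.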
Stumbled upon this myself. However, it can be partially mitigated if the cluster has other jobs and only some of them need to be on separate hosts. So Nomad will kick off some allocations that are more lenient in their constraints and make room for those that have the constraint. Would it make sense to expose these as metrics?
I keep seeing so many metrics we need, but for some reason the architecture of Nomad makes proper metrics difficult to implement.
I am seeking advice on the best practices for using the Nomad Autoscaler in scenarios where the distinct_hosts constraint is applied in job configurations. Specifically, I'd like to understand how to effectively scale up a Nomad cluster using the autoscaler when each job instance must be placed on a separate host.
My primary concern is ensuring that the autoscaler responds appropriately to the unique requirements imposed by the distinct_hosts constraint. For instance, if a job is configured to launch instances across different hosts, how can the autoscaler be configured to ensure there are enough hosts in the cluster to accommodate scaling actions?
Any insights, recommendations, or examples of similar configurations would be greatly appreciated.
Thank you for your assistance and for the great work on Nomad Autoscaler.