Commit

Add devices to NUMA section of CPU page
aimeeu committed Oct 2, 2024
1 parent 8ae7f21 commit 1e57122
Showing 1 changed file with 73 additions and 21 deletions.
94 changes: 73 additions & 21 deletions website/content/docs/concepts/cpu.mdx
@@ -4,7 +4,7 @@ page_title: CPU
description: Learn about how Nomad manages CPU resources.
---

-# Modern Processors
+# Modern processors

Every Nomad node has a Central Processing Unit (CPU) providing the computational
power needed for running operating system processes. Nomad uses the CPU to
@@ -24,7 +24,7 @@ topologies into account.

[![PE Cores](/img/nomad-pe-cores.png)](/img/nomad-pe-cores.png)

-## Calculating CPU Resources
+## Calculating CPU resources

The total CPU bandwidth of a Nomad node is the sum of the product between the
frequency of each core type and the total number of cores of that type in the
@@ -67,7 +67,7 @@ cpu.totalcompute = 56000
cpu.usablecompute = 56000
```
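
As a rough illustration of the calculation this section describes (the sum, over core types, of frequency times core count), here is a minimal Python sketch. The `CoreType` class, the `total_compute` helper, and the example frequencies and core counts are hypothetical and are not part of Nomad; frequencies are assumed to be expressed in MHz.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreType:
    """One class of core on a node (hypothetical model, not a Nomad type)."""
    name: str
    frequency_mhz: int
    count: int


def total_compute(core_types):
    """Sum of frequency x core count over every core type, in MHz."""
    return sum(ct.frequency_mhz * ct.count for ct in core_types)


# Hypothetical node: these frequencies and counts are made up for illustration.
node = [
    CoreType("performance", frequency_mhz=3000, count=4),
    CoreType("efficiency", frequency_mhz=2000, count=8),
]
print(total_compute(node))  # 4*3000 + 8*2000 = 28000
```

The numbers here are not meant to reproduce the `cpu.totalcompute` value shown above; they only show the shape of the arithmetic.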

-## Reserving CPU Resources
+## Reserving CPU resources

In the fingerprinted node attributes, `cpu.totalcompute` indicates the total
amount of CPU bandwidth the processor is capable of delivering. In some cases it
@@ -137,7 +137,7 @@ task {
Nomad Enterprise supports NUMA aware scheduling, which enables operators to
more finely control which CPU cores may be reserved for tasks.

-### CPU Hard Limits
+### CPU hard limits

Some task drivers support the configuration option `cpu_hard_limit`. If enabled
this option restricts tasks from bursting above their CPU limit even when there
@@ -146,7 +146,7 @@ A task with too few CPU resources may operate fine until another task is placed
on the node causing a reduction in available CPU bandwidth, which could cause
disruption for the underprovisioned task.

-### CPU Environment Variables
+### CPU environment variables

To help tasks understand the resources available to them, Nomad sets the
following environment variables in their runtime environment.
@@ -162,18 +162,18 @@ NOMAD_CPU_CORES=3-5
NOMAD_CPU_LIMIT=9000
```
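
A task could read these variables at startup to size its worker pool or pin threads. The sketch below is one possible way to do that in Python. It assumes `NOMAD_CPU_CORES` uses a comma-separated list of IDs and inclusive ranges, which the single range `3-5` shown above is consistent with but does not confirm. The `parse_core_list` helper is a hypothetical name.

```python
import os


def parse_core_list(spec):
    """Expand a string such as "3-5" or "0-1,8" into a sorted list of core IDs."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            cores.add(int(part))
    return sorted(cores)


# Fall back to the values from the example above when run outside a Nomad task.
reserved = parse_core_list(os.environ.get("NOMAD_CPU_CORES", "3-5"))
limit_mhz = int(os.environ.get("NOMAD_CPU_LIMIT", "9000"))
print(reserved, limit_mhz)
```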

-# NUMA
+## NUMA

Nomad clients are commonly provisioned on real hardware in an on-premise
-environment or in the cloud on large `.metal` instance types. In either case it
-is likely the underlying server is designed around a [NUMA topology][numa_wiki].
+environment or in the cloud on large `.metal` instance types. In either case, it
+is likely the underlying server is designed around a [non-uniform memory access (NUMA) topology][numa_wiki].
Servers that contain multiple CPU sockets or multiple RAM banks per CPU socket
are characterized by the non-uniform access times involved in accessing system
memory.

[![NUMA](/img/nomad-numa.png)](/img/nomad-numa.png)

-The simplified example machine above has the following topology
+The simplified example machine above has the following topology:
- 2 physical CPU sockets
- 4 system memory banks, 2 per socket
- 8 physical cpu cores (4 per socket)
@@ -229,7 +229,7 @@ node   0   1   2   3
  3:  32  32  12  10
```

-These SLIT table "node distance" values are presented as approximate relative
+These SLIT table node distance values are presented as approximate relative
ratios. The value of 10 represents an optimal situation where a memory access
is occurring from a CPU that is part of the same NUMA node. A value of 20 would
indicate a 200% performance degradation, 30 for 300%, etc.
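
To make the ratio reading concrete, the following Python sketch treats each distance as a multiple of the local value of 10 and looks up the closest remote node. The matrix is hypothetical except for its last row, which copies the `3:` row from the output above. The helper names are illustrative only.

```python
LOCAL_DISTANCE = 10  # value the text above gives for same-node access


def relative_cost(distance):
    """Express a SLIT distance as a multiple of the local-access value."""
    return distance / LOCAL_DISTANCE


def nearest_remote_node(distances, node):
    """Return the other node with the smallest distance from `node`."""
    candidates = [(d, n) for n, d in enumerate(distances[node]) if n != node]
    return min(candidates)[1]


# Hypothetical 4-node matrix; only the last row is taken from the output above.
slit = [
    [10, 12, 32, 32],
    [12, 10, 32, 32],
    [32, 32, 10, 12],
    [32, 32, 12, 10],
]
print(relative_cost(slit[3][0]))     # 3.2
print(nearest_remote_node(slit, 3))  # 2
```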
@@ -252,28 +252,42 @@ numa.node3.cores = 72-95,168-191
## NUMA aware scheduling <EnterpriseAlert inline />

Nomad Enterprise is capable of scheduling tasks in a way that is optimized for
-the NUMA topology of a client node. A task may specify a `numa` block indicating
-its NUMA optimization preference.
+the NUMA topology of a client node. Nomad is able to correlate CPU cores with
+memory nodes and assign tasks to run on specific CPU cores so as to minimize any
+cross-memory node access patterns. Additionally, Nomad is able to correlate
+devices to memory nodes and enable NUMA-aware scheduling to take device
+associativity into account when making scheduling decisions.
+
+A task may specify a `numa` block indicating its NUMA optimization preference. This example allocates a `1080ti` GPU and ensures it is on the same NUMA node as the 4 CPU cores reserved for the task.

```hcl
task {
  resources {
-    cores = 6
+    cores = 4
    memory = 2048
+    device "nvidia/gpu/1080ti" {
+      count = 1
+    }
    numa {
      affinity = "require"
-    }
+      devices = [
+        "nvidia/gpu/1080ti"
+      ]
+    }
  }
}
```

-### `affinity` Options
+### `affinity` options

-There are three supported affinity options: `none`, `prefer`, and `require`,
-each with their own advantages and tradeoffs.
+This is a required field. There are three supported affinity options:
+`none`, `prefer`, and `require`, each with their own advantages and tradeoffs.

#### option `none`

-In the `none` mode the Nomad scheduler leverages the apathy of jobs without
+In the `none` mode, the Nomad scheduler leverages the apathy of jobs without
preference of NUMA affinity to help reduce core fragmentation within NUMA nodes.
It does so by bin-packing the core request of these jobs onto the NUMA nodes
with the fewest unused cores available.
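
The bin-packing idea can be sketched in Python as follows. This is an illustration of the behavior described above, not Nomad's scheduler code. The function name, the dictionary layout, and the tie-breaking by node ID are all assumptions.

```python
def pick_cores_none(free_cores_by_node, requested):
    """Take cores from the NUMA nodes with the fewest unused cores first.

    free_cores_by_node maps a NUMA node ID to its list of unused core IDs.
    Returns the chosen core IDs, or None if the client has too few free cores.
    """
    total_free = sum(len(cores) for cores in free_cores_by_node.values())
    if requested > total_free:
        return None
    chosen = []
    # Visit the nodes with the fewest free cores first; ties break on node ID.
    order = sorted(free_cores_by_node, key=lambda n: (len(free_cores_by_node[n]), n))
    for node in order:
        for core in sorted(free_cores_by_node[node]):
            if len(chosen) == requested:
                return chosen
            chosen.append(core)
    return chosen


# Hypothetical client: node 0 is fully free, nodes 1 and 2 are partly used.
free = {0: [0, 1, 2, 3], 1: [6, 7], 2: [9]}
print(pick_cores_none(free, 2))  # [9, 6]
```

In this example the request is served from the leftover cores on nodes 2 and 1, so node 0 stays whole for a later task that needs all of its cores on one NUMA node.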
@@ -291,7 +305,7 @@ resources {

#### option `prefer`

-In the `prefer` mode the Nomad scheduler uses the hardware topology of a node
+In the `prefer` mode, the Nomad scheduler uses the hardware topology of a node
to calculate an optimized selection of available cores, but does not limit
those cores to come from a single NUMA node.

@@ -306,7 +320,7 @@ resources {

#### option `require`

-In the `require` mode the Nomad scheduler uses the topology of each potential
+In the `require` mode, the Nomad scheduler uses the topology of each potential
client to find a set of available CPU cores that belong to the same NUMA node.
If no such set of cores can be found, that node is marked exhausted for the
resource of `numa-cores`.
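
A minimal Python sketch of the `require` check described above, under the assumption that the first NUMA node (by ID) with enough free cores is used. The source does not say how Nomad chooses among several suitable nodes, and the function name and data layout are hypothetical.

```python
def pick_cores_require(free_cores_by_node, requested):
    """Return `requested` free cores from one NUMA node, or None if no node fits."""
    for node in sorted(free_cores_by_node):
        free = sorted(free_cores_by_node[node])
        if len(free) >= requested:
            return free[:requested]
    # No single node can hold the request; per the text above, the scheduler
    # would mark this client exhausted for `numa-cores`.
    return None


# Hypothetical client with six free cores split across two NUMA nodes.
free = {0: [2, 3], 1: [4, 5, 6, 7]}
print(pick_cores_require(free, 3))  # [4, 5, 6]
print(pick_cores_require(free, 5))  # None
```

The second call returns `None` even though six cores are free in total, because no single NUMA node can supply five of them.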
@@ -320,7 +334,45 @@ resources {
}
```

-## Virtual CPU Fingerprinting
+### `devices` options
+
+`devices` is an optional list of devices that must be colocated on the NUMA node
+along with allocated CPU cores.
+
+The following diagram shows how a set of devices can be correlated to CPU and memory.
+
+[![How a set of devices can be correlated to CPU and memory](/img/nomad-devices-correlate-cpu-memory.png)](/img/nomad-devices-correlate-cpu-memory.png)
+
+This example declares three devices and configures two in the `numa` block.
+
+```hcl
+task {
+  resources {
+    cores = 8
+    memory = 16384
+    device "nvidia/gpu/H100" {
+      count = 2
+    }
+    device "intel/net/XXVDA2" {
+      count = 1
+    }
+    device "xilinx/fpga/X7" {
+      count = 1
+    }
+    numa {
+      affinity = "require"
+      devices = [
+        "nvidia/gpu/H100",
+        "intel/net/XXVDA2"
+      ]
+    }
+  }
+}
+```
+
+## Virtual CPU fingerprinting

When running on a virtualized host such as Amazon EC2 Nomad makes use of the
`dmidecode` tool to detect CPU performance data. Some Linux distributions will