404
Not Found
diff --git a/master/404.html b/master/404.html index e9a1fb460e..5522174fec 100644 --- a/master/404.html +++ b/master/404.html @@ -1 +1 @@ -
Not Found
Not Found
You can reach us via the following channels:
This is a SIG-node subproject, hosted under the Kubernetes SIGs organization on GitHub. The project was established in 2016 and was migrated to Kubernetes SIGs in 2018.
This is open source software released under the Apache 2.0 License.
+ Helm · Node Feature Discovery
Deployment with Helm
Table of contents
Node Feature Discovery provides a Helm chart to manage its deployment.
NOTE: NFD is not ideal for other Helm charts to depend on as that may result in multiple parallel NFD deployments in the same cluster which is not fully supported by the NFD Helm chart.
Prerequisites
Helm package manager should be installed.
Deployment
To install the latest stable version:
export NFD_NS=node-feature-discovery
helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo update
helm install nfd/node-feature-discovery --namespace $NFD_NS --create-namespace --generate-name
@@ -41,4 +41,4 @@
helm upgrade node-feature-discovery nfd/node-feature-discovery --namespace $NFD_NS
Uninstalling the chart
To uninstall the node-feature-discovery deployment:
export NFD_NS=node-feature-discovery
helm uninstall node-feature-discovery --namespace $NFD_NS
-
The command removes all the Kubernetes components associated with the chart and deletes the release. It also runs a post-delete hook that cleans up the nodes of all labels, annotations, taints and extended resources that were created by NFD.
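One way to spot-check the cleanup afterwards is to look for node labels that still carry the `feature.node.kubernetes.io/` prefix, which is NFD's default label namespace. The following is a minimal sketch, not part of the chart: the helper name `nfd_labels` is hypothetical, and it only inspects labels, not annotations, taints or extended resources.

```shell
# Hypothetical helper: reads `kubectl get nodes --show-labels` output on stdin
# and prints the distinct label keys under NFD's default label namespace.
nfd_labels() {
  # Labels are comma-separated, so put each one on its own line,
  # then keep only the key part (everything before "=").
  tr ',' '\n' | grep -o 'feature\.node\.kubernetes\.io/[^=]*' | sort -u
}

# Usage (requires cluster access; empty output means no such labels remain):
#   kubectl get nodes --show-labels | nfd_labels
```

Labels placed in custom namespaces (for example through master.extraLabelNs) would need their own pattern.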
Chart parameters
To tailor the deployment of Node Feature Discovery to your needs, the following chart parameters are available.
General parameters
Name | Type | Default | Description |
---|---|---|---|
image.repository | string | gcr.io/k8s-staging-nfd/node-feature-discovery | NFD image repository |
image.tag | string | master | NFD image tag |
image.pullPolicy | string | Always | Image pull policy |
imagePullSecrets | array | [] | ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. More info. |
nameOverride | string | | Override the name of the chart |
fullnameOverride | string | | Override a default fully qualified app name |
featureGates.NodeFeatureGroupAPI | bool | false | Enable the NodeFeatureGroup CRD API. |
featureGates.DisableAutoPrefix | bool | false | Enable DisableAutoPrefix feature gate. Disables automatic prefixing of unprefixed labels, annotations and extended resources. |
prometheus.enable | bool | false | Specifies whether to expose metrics using prometheus operator |
prometheus.labels | dict | {} | Specifies labels for use with the prometheus operator to control how it is selected |
prometheus.scrapeInterval | string | 10s | Specifies the interval by which metrics are scraped |
priorityClassName | string | | The name of the PriorityClass to be used for the NFD pods. |
Metrics are configured to be exposed using the Prometheus Operator APIs by default. To expose metrics this way, you need to install the Prometheus Operator in your cluster.
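As an illustration, the Prometheus-related parameters can be collected in a values file and passed to Helm with `-f`. This is a sketch under assumptions: the file location and the 30s interval are arbitrary choices for the example, and the release name is the one used in the upgrade command above.

```shell
# Sketch: enable the Prometheus Operator integration through a values file.
NFD_NS=node-feature-discovery
VALUES_FILE="${TMPDIR:-/tmp}/nfd-prometheus-values.yaml"   # arbitrary location

cat > "$VALUES_FILE" <<'EOF'
prometheus:
  enable: true
  scrapeInterval: 30s   # example value; the chart default is 10s
EOF

# Print the upgrade command for review instead of running it here.
echo "helm upgrade node-feature-discovery nfd/node-feature-discovery --namespace $NFD_NS -f $VALUES_FILE"
```

Keeping overrides in a file rather than in repeated `--set` flags makes them easier to review and reuse across upgrades.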
Master pod parameters
Name | Type | Default | Description |
---|---|---|---|
master.* | dict | | NFD master deployment configuration |
master.enable | bool | true | Specifies whether nfd-master should be deployed |
master.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
master.metricsPort | integer | 8081 | Port on which to expose metrics from components to prometheus operator. DEPRECATED: will be replaced by master.port in NFD v0.18. |
master.healthPort | integer | 8082 | Port on which to expose the grpc health endpoint, will be also used for the probes. DEPRECATED: will be replaced by master.port in NFD v0.18. |
master.instance | string | | Instance name. Used to separate annotation namespaces for multiple parallel deployments |
master.resyncPeriod | string | | NFD API controller resync period. |
master.extraLabelNs | array | [] | List of allowed extra label namespaces |
master.enableTaints | bool | false | Specifies whether to enable or disable node tainting |
master.replicaCount | integer | 1 | Number of desired pods. This is a pointer to distinguish between explicit zero and not specified |
master.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
master.securityContext | dict | {} | Container security settings |
master.serviceAccount.create | bool | true | Specifies whether a service account should be created |
master.serviceAccount.annotations | dict | {} | Annotations to add to the service account |
master.serviceAccount.name | string | | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
master.rbac.create | bool | true | Specifies whether to create RBAC configuration for nfd-master |
master.resources.limits | dict | {memory: 4Gi} | NFD master pod resources limits |
master.resources.requests | dict | {cpu: 100m, memory: 128Mi} | NFD master pod resources requests. See [0] for more info |
master.tolerations | dict | Schedule to control-plane node | NFD master pod tolerations |
master.annotations | dict | {} | NFD master pod annotations |
master.affinity | dict | | NFD master pod required node affinity |
master.deploymentAnnotations | dict | {} | NFD master deployment annotations |
master.nfdApiParallelism | integer | 10 | Specifies the maximum number of concurrent node updates. |
master.config | dict | | NFD master configuration |
master.extraArgs | array | [] | Additional command line arguments to pass to nfd-master |
master.extraEnvs | array | [] | Additional environment variables to pass to nfd-master |
master.revisionHistoryLimit | integer | | Specify how many old ReplicaSets for this Deployment you want to retain. See revisionHistoryLimit. |
master.startupProbe.initialDelaySeconds | integer | 0 (by Kubernetes) | Specifies the number of seconds after the container has started before startup probes are initiated. |
master.startupProbe.failureThreshold | integer | 30 | Specifies the number of consecutive failures of startup probes before considering the pod as not ready. |
master.startupProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the startup probe. |
master.startupProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
master.livenessProbe.initialDelaySeconds | integer | 0 (by Kubernetes) | Specifies the number of seconds after the container has started before liveness probes are initiated. |
master.livenessProbe.failureThreshold | integer | 3 (by Kubernetes) | Specifies the number of consecutive failures of liveness probes before considering the pod as not ready. |
master.livenessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the liveness probe. |
master.livenessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
master.readinessProbe.initialDelaySeconds | integer | 0 (by Kubernetes) | Specifies the number of seconds after the container has started before readiness probes are initiated. |
master.readinessProbe.failureThreshold | integer | 10 | Specifies the number of consecutive failures of readiness probes before considering the pod as not ready. |
master.readinessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the readiness probe. |
master.readinessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
master.readinessProbe.successThreshold | integer | 1 (by Kubernetes) | Specifies the number of consecutive successes of readiness probes before considering the pod as ready. |
[0] Additional info for master.resources.requests: You may want to use the same value for requests.memory and limits.memory. The "requests" value affects scheduling to accommodate pods on nodes. If there is a large difference between "requests" and "limits" and nodes experience memory pressure, the kernel may invoke the OOM Killer, even if the memory does not exceed the "limits" threshold. This can cause unexpected pod evictions. Memory cannot be compressed and once allocated to a pod, it can only be reclaimed by killing the pod. See Natan Yellin (22/09/2022), who discusses this issue.
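The advice in note [0] could be applied with a values file along these lines. This is a sketch, not a recommendation from the chart: it simply raises the memory request to the default 4Gi limit, and the file location is an arbitrary choice.

```shell
# Sketch: set the nfd-master memory request equal to its memory limit.
NFD_NS=node-feature-discovery
VALUES_FILE="${TMPDIR:-/tmp}/nfd-master-resources.yaml"   # arbitrary location

cat > "$VALUES_FILE" <<'EOF'
master:
  resources:
    limits:
      memory: 4Gi     # chart default
    requests:
      cpu: 100m       # chart default
      memory: 4Gi     # raised from the 128Mi default to match the limit
EOF

# Print the upgrade command for review instead of running it here.
echo "helm upgrade node-feature-discovery nfd/node-feature-discovery --namespace $NFD_NS -f $VALUES_FILE"
```

A request this large affects where the master pod can be scheduled, so a smaller matching pair of values may suit small clusters better.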
Worker pod parameters
Name | Type | Default | Description |
---|---|---|---|
worker.* | dict | | NFD worker daemonset configuration |
worker.enable | bool | true | Specifies whether nfd-worker should be deployed |
worker.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
worker.metricsPort | int | 8081 | Port on which to expose metrics from components to prometheus operator. DEPRECATED: will be replaced by worker.port in NFD v0.18. |
worker.healthPort | int | 8082 | Port on which to expose the grpc health endpoint, will be also used for the probes. DEPRECATED: will be replaced by worker.port in NFD v0.18. |
worker.config | dict | | NFD worker configuration |
worker.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
worker.securityContext | dict | {} | Container security settings |
worker.serviceAccount.create | bool | true | Specifies whether a service account for nfd-worker should be created |
worker.serviceAccount.annotations | dict | {} | Annotations to add to the service account for nfd-worker |
worker.serviceAccount.name | string | | The name of the service account to use for nfd-worker. If not set and create is true, a name is generated using the fullname template (suffixed with -worker) |
worker.rbac.create | bool | true | Specifies whether to create RBAC configuration for nfd-worker |
worker.mountUsrSrc | bool | false | Specifies whether to allow users to mount the hostpath /usr/src. Does not work on systems without /usr/src AND a read-only /usr |
worker.resources.limits | dict | {memory: 512Mi} | NFD worker pod resources limits |
worker.resources.requests | dict | {cpu: 5m, memory: 64Mi} | NFD worker pod resources requests |
worker.nodeSelector | dict | {} | NFD worker pod node selector |
worker.tolerations | dict | {} | NFD worker pod node tolerations |
worker.priorityClassName | string | | NFD worker pod priority class |
worker.annotations | dict | {} | NFD worker pod annotations |
worker.daemonsetAnnotations | dict | {} | NFD worker daemonset annotations |
worker.extraArgs | array | [] | Additional command line arguments to pass to nfd-worker |
worker.extraEnvs | array | [] | Additional environment variables to pass to nfd-worker |
worker.revisionHistoryLimit | integer | | Specify how many old ControllerRevisions for this DaemonSet you want to retain. See revisionHistoryLimit. |
worker.livenessProbe.initialDelaySeconds | integer | 10 | Specifies the number of seconds after the container has started before liveness probes are initiated. |
worker.livenessProbe.failureThreshold | integer | 3 (by Kubernetes) | Specifies the number of consecutive failures of liveness probes before considering the pod as not ready. |
worker.livenessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the liveness probe. |
worker.livenessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
worker.readinessProbe.initialDelaySeconds | integer | 5 | Specifies the number of seconds after the container has started before readiness probes are initiated. |
worker.readinessProbe.failureThreshold | integer | 10 | Specifies the number of consecutive failures of readiness probes before considering the pod as not ready. |
worker.readinessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the readiness probe. |
worker.readinessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
worker.readinessProbe.successThreshold | integer | 1 (by Kubernetes) | Specifies the number of consecutive successes of readiness probes before considering the pod as ready. |
Topology updater parameters
Name | Type | Default | Description |
---|---|---|---|
topologyUpdater.* | dict | | NFD Topology Updater configuration |
topologyUpdater.enable | bool | false | Specifies whether the NFD Topology Updater should be created |
topologyUpdater.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
topologyUpdater.createCRDs | bool | false | Specifies whether the NFD Topology Updater CRDs should be created |
topologyUpdater.serviceAccount.create | bool | true | Specifies whether the service account for topology updater should be created |
topologyUpdater.serviceAccount.annotations | dict | {} | Annotations to add to the service account for topology updater |
topologyUpdater.serviceAccount.name | string | | The name of the service account for topology updater to use. If not set and create is true, a name is generated using the fullname template and -topology-updater suffix |
topologyUpdater.rbac.create | bool | true | Specifies whether to create RBAC configuration for topology updater |
topologyUpdater.metricsPort | integer | 8081 | Port on which to expose prometheus metrics. DEPRECATED: will be replaced by topologyUpdater.port in NFD v0.18. |
topologyUpdater.healthPort | integer | 8082 | Port on which to expose the grpc health endpoint, will be also used for the probes. DEPRECATED: will be replaced by topologyUpdater.port in NFD v0.18. |
topologyUpdater.kubeletConfigPath | string | "" | Specifies the kubelet config host path |
topologyUpdater.kubeletPodResourcesSockPath | string | "" | Specifies the kubelet sock path to read pod resources |
topologyUpdater.updateInterval | string | 60s | Time to sleep between CR updates. Non-positive value implies no CR update. |
topologyUpdater.watchNamespace | string | * | Namespace to watch pods, * for all namespaces |
topologyUpdater.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
topologyUpdater.securityContext | dict | {} | Container security settings |
topologyUpdater.resources.limits | dict | {memory: 60Mi} | NFD Topology Updater pod resources limits |
topologyUpdater.resources.requests | dict | {cpu: 50m, memory: 40Mi} | NFD Topology Updater pod resources requests |
topologyUpdater.nodeSelector | dict | {} | Topology updater pod node selector |
topologyUpdater.tolerations | dict | {} | Topology updater pod node tolerations |
topologyUpdater.annotations | dict | {} | Topology updater pod annotations |
topologyUpdater.daemonsetAnnotations | dict | {} | Topology updater daemonset annotations |
topologyUpdater.affinity | dict | {} | Topology updater pod affinity |
topologyUpdater.config | dict | | configuration |
topologyUpdater.podSetFingerprint | bool | true | Enables compute and report of pod fingerprint in NRT objects. |
topologyUpdater.kubeletStateDir | string | /var/lib/kubelet | Specifies kubelet state directory path for watching state and checkpoint files. Empty value disables kubelet state tracking. |
topologyUpdater.extraArgs | array | [] | Additional command line arguments to pass to nfd-topology-updater |
topologyUpdater.extraEnvs | array | [] | Additional environment variables to pass to nfd-topology-updater |
topologyUpdater.revisionHistoryLimit | integer | | Specify how many old ControllerRevisions for this DaemonSet you want to retain. See revisionHistoryLimit. |
topologyUpdater.livenessProbe.initialDelaySeconds | integer | 10 | Specifies the number of seconds after the container has started before liveness probes are initiated. |
topologyUpdater.livenessProbe.failureThreshold | integer | 3 (by Kubernetes) | Specifies the number of consecutive failures of liveness probes before considering the pod as not ready. |
topologyUpdater.livenessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the liveness probe. |
topologyUpdater.livenessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
topologyUpdater.readinessProbe.initialDelaySeconds | integer | 5 | Specifies the number of seconds after the container has started before readiness probes are initiated. |
topologyUpdater.readinessProbe.failureThreshold | integer | 10 | Specifies the number of consecutive failures of readiness probes before considering the pod as not ready. |
topologyUpdater.readinessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the readiness probe. |
topologyUpdater.readinessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
topologyUpdater.readinessProbe.successThreshold | integer | 1 (by Kubernetes) | Specifies the number of consecutive successes of readiness probes before considering the pod as ready. |
Garbage collector parameters
Name | Type | Default | Description |
---|---|---|---|
gc.* | dict | | NFD Garbage Collector configuration |
gc.enable | bool | true | Specifies whether the NFD Garbage Collector should be created |
gc.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
gc.serviceAccount.create | bool | true | Specifies whether the service account for garbage collector should be created |
gc.serviceAccount.annotations | dict | {} | Annotations to add to the service account for garbage collector |
gc.serviceAccount.name | string | | The name of the service account for garbage collector to use. If not set and create is true, a name is generated using the fullname template and -gc suffix |
gc.rbac.create | bool | true | Specifies whether to create RBAC configuration for garbage collector |
gc.interval | string | 1h | Time between periodic garbage collector runs |
gc.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
gc.resources.limits | dict | {memory: 1Gi} | NFD Garbage Collector pod resources limits |
gc.resources.requests | dict | {cpu: 10m, memory: 128Mi} | NFD Garbage Collector pod resources requests |
gc.metricsPort | integer | 8081 | Port on which to serve Prometheus metrics. DEPRECATED: will be replaced by gc.port in NFD v0.18. |
gc.nodeSelector | dict | {} | Garbage collector pod node selector |
gc.tolerations | dict | {} | Garbage collector pod node tolerations |
gc.annotations | dict | {} | Garbage collector pod annotations |
gc.deploymentAnnotations | dict | {} | Garbage collector deployment annotations |
gc.affinity | dict | {} | Garbage collector pod affinity |
gc.extraArgs | array | [] | Additional command line arguments to pass to nfd-gc |
gc.extraEnvs | array | [] | Additional environment variables to pass to nfd-gc |
gc.revisionHistoryLimit | integer | | Specify how many old ReplicaSets for this Deployment you want to retain. See revisionHistoryLimit. |
Node Feature Discovery master
\ No newline at end of file
+
The command removes all the Kubernetes components associated with the chart and deletes the release. It also runs a post-delete hook that cleans up the nodes of all labels, annotations, taints and extended resources that were created by NFD.
To tailor the deployment of Node Feature Discovery to your needs, the following chart parameters are available.
Name | Type | Default | Description |
---|---|---|---|
image.repository | string | gcr.io/k8s-staging-nfd/node-feature-discovery | NFD image repository |
image.tag | string | master | NFD image tag |
image.pullPolicy | string | Always | Image pull policy |
imagePullSecrets | array | [] | ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec. More info. |
nameOverride | string | | Override the name of the chart |
fullnameOverride | string | | Override a default fully qualified app name |
featureGates.NodeFeatureGroupAPI | bool | false | Enable the NodeFeatureGroup CRD API. |
featureGates.DisableAutoPrefix | bool | false | Enable DisableAutoPrefix feature gate. Disables automatic prefixing of unprefixed labels, annotations and extended resources. |
prometheus.enable | bool | false | Specifies whether to expose metrics using prometheus operator |
prometheus.labels | dict | {} | Specifies labels for use with the prometheus operator to control how it is selected |
prometheus.scrapeInterval | string | 10s | Specifies the interval by which metrics are scraped |
priorityClassName | string | | The name of the PriorityClass to be used for the NFD pods. |
Metrics are configured to be exposed using the Prometheus Operator APIs by default. To expose metrics this way, you need to install the Prometheus Operator in your cluster.
Name | Type | Default | Description |
---|---|---|---|
master.* | dict | | NFD master deployment configuration |
master.enable | bool | true | Specifies whether nfd-master should be deployed |
master.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
master.metricsPort | integer | 8081 | Port on which to expose metrics from components to prometheus operator. DEPRECATED: will be replaced by master.port in NFD v0.18. |
master.healthPort | integer | 8082 | Port on which to expose the grpc health endpoint, will be also used for the probes. DEPRECATED: will be replaced by master.port in NFD v0.18. |
master.instance | string | | Instance name. Used to separate annotation namespaces for multiple parallel deployments |
master.resyncPeriod | string | | NFD API controller resync period. |
master.extraLabelNs | array | [] | List of allowed extra label namespaces |
master.enableTaints | bool | false | Specifies whether to enable or disable node tainting |
master.replicaCount | integer | 1 | Number of desired pods. This is a pointer to distinguish between explicit zero and not specified |
master.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
master.securityContext | dict | {} | Container security settings |
master.serviceAccount.create | bool | true | Specifies whether a service account should be created |
master.serviceAccount.annotations | dict | {} | Annotations to add to the service account |
master.serviceAccount.name | string | | The name of the service account to use. If not set and create is true, a name is generated using the fullname template |
master.rbac.create | bool | true | Specifies whether to create RBAC configuration for nfd-master |
master.resources.limits | dict | {memory: 4Gi} | NFD master pod resources limits |
master.resources.requests | dict | {cpu: 100m, memory: 128Mi} | NFD master pod resources requests. See [0] for more info |
master.tolerations | dict | Schedule to control-plane node | NFD master pod tolerations |
master.annotations | dict | {} | NFD master pod annotations |
master.affinity | dict | | NFD master pod required node affinity |
master.deploymentAnnotations | dict | {} | NFD master deployment annotations |
master.nfdApiParallelism | integer | 10 | Specifies the maximum number of concurrent node updates. |
master.config | dict | | NFD master configuration |
master.extraArgs | array | [] | Additional command line arguments to pass to nfd-master |
master.extraEnvs | array | [] | Additional environment variables to pass to nfd-master |
master.revisionHistoryLimit | integer | | Specify how many old ReplicaSets for this Deployment you want to retain. See revisionHistoryLimit. |
master.startupProbe.initialDelaySeconds | integer | 0 (by Kubernetes) | Specifies the number of seconds after the container has started before startup probes are initiated. |
master.startupProbe.failureThreshold | integer | 30 | Specifies the number of consecutive failures of startup probes before considering the pod as not ready. |
master.startupProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the startup probe. |
master.startupProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
master.livenessProbe.initialDelaySeconds | integer | 0 (by Kubernetes) | Specifies the number of seconds after the container has started before liveness probes are initiated. |
master.livenessProbe.failureThreshold | integer | 3 (by Kubernetes) | Specifies the number of consecutive failures of liveness probes before considering the pod as not ready. |
master.livenessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the liveness probe. |
master.livenessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
master.readinessProbe.initialDelaySeconds | integer | 0 (by Kubernetes) | Specifies the number of seconds after the container has started before readiness probes are initiated. |
master.readinessProbe.failureThreshold | integer | 10 | Specifies the number of consecutive failures of readiness probes before considering the pod as not ready. |
master.readinessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the readiness probe. |
master.readinessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
master.readinessProbe.successThreshold | integer | 1 (by Kubernetes) | Specifies the number of consecutive successes of readiness probes before considering the pod as ready. |
[0] Additional info for master.resources.requests: You may want to use the same value for requests.memory and limits.memory. The "requests" value affects scheduling to accommodate pods on nodes. If there is a large difference between "requests" and "limits" and nodes experience memory pressure, the kernel may invoke the OOM Killer, even if the memory does not exceed the "limits" threshold. This can cause unexpected pod evictions. Memory cannot be compressed and once allocated to a pod, it can only be reclaimed by killing the pod. See Natan Yellin (22/09/2022), who discusses this issue.
Name | Type | Default | Description |
---|---|---|---|
worker.* | dict | | NFD worker daemonset configuration |
worker.enable | bool | true | Specifies whether nfd-worker should be deployed |
worker.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
worker.metricsPort | int | 8081 | Port on which to expose metrics from components to prometheus operator. DEPRECATED: will be replaced by worker.port in NFD v0.18. |
worker.healthPort | int | 8082 | Port on which to expose the grpc health endpoint, will be also used for the probes. DEPRECATED: will be replaced by worker.port in NFD v0.18. |
worker.config | dict | | NFD worker configuration |
worker.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
worker.securityContext | dict | {} | Container security settings |
worker.serviceAccount.create | bool | true | Specifies whether a service account for nfd-worker should be created |
worker.serviceAccount.annotations | dict | {} | Annotations to add to the service account for nfd-worker |
worker.serviceAccount.name | string | | The name of the service account to use for nfd-worker. If not set and create is true, a name is generated using the fullname template (suffixed with -worker) |
worker.rbac.create | bool | true | Specifies whether to create RBAC configuration for nfd-worker |
worker.mountUsrSrc | bool | false | Specifies whether to allow users to mount the hostpath /usr/src. Does not work on systems without /usr/src AND a read-only /usr |
worker.resources.limits | dict | {memory: 512Mi} | NFD worker pod resources limits |
worker.resources.requests | dict | {cpu: 5m, memory: 64Mi} | NFD worker pod resources requests |
worker.nodeSelector | dict | {} | NFD worker pod node selector |
worker.tolerations | dict | {} | NFD worker pod node tolerations |
worker.priorityClassName | string | | NFD worker pod priority class |
worker.annotations | dict | {} | NFD worker pod annotations |
worker.daemonsetAnnotations | dict | {} | NFD worker daemonset annotations |
worker.extraArgs | array | [] | Additional command line arguments to pass to nfd-worker |
worker.extraEnvs | array | [] | Additional environment variables to pass to nfd-worker |
worker.revisionHistoryLimit | integer | | Specify how many old ControllerRevisions for this DaemonSet you want to retain. See revisionHistoryLimit. |
worker.livenessProbe.initialDelaySeconds | integer | 10 | Specifies the number of seconds after the container has started before liveness probes are initiated. |
worker.livenessProbe.failureThreshold | integer | 3 (by Kubernetes) | Specifies the number of consecutive failures of liveness probes before considering the pod as not ready. |
worker.livenessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the liveness probe. |
worker.livenessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
worker.readinessProbe.initialDelaySeconds | integer | 5 | Specifies the number of seconds after the container has started before readiness probes are initiated. |
worker.readinessProbe.failureThreshold | integer | 10 | Specifies the number of consecutive failures of readiness probes before considering the pod as not ready. |
worker.readinessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the readiness probe. |
worker.readinessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
worker.readinessProbe.successThreshold | integer | 1 (by Kubernetes) | Specifies the number of consecutive successes of readiness probes before considering the pod as ready. |
Name | Type | Default | Description |
---|---|---|---|
topologyUpdater.* | dict | | NFD Topology Updater configuration |
topologyUpdater.enable | bool | false | Specifies whether the NFD Topology Updater should be created |
topologyUpdater.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
topologyUpdater.createCRDs | bool | false | Specifies whether the NFD Topology Updater CRDs should be created |
topologyUpdater.serviceAccount.create | bool | true | Specifies whether the service account for topology updater should be created |
topologyUpdater.serviceAccount.annotations | dict | {} | Annotations to add to the service account for topology updater |
topologyUpdater.serviceAccount.name | string | | The name of the service account for topology updater to use. If not set and create is true, a name is generated using the fullname template and -topology-updater suffix |
topologyUpdater.rbac.create | bool | true | Specifies whether to create RBAC configuration for topology updater |
topologyUpdater.metricsPort | integer | 8081 | Port on which to expose prometheus metrics. DEPRECATED: will be replaced by topologyUpdater.port in NFD v0.18. |
topologyUpdater.healthPort | integer | 8082 | Port on which to expose the grpc health endpoint, will be also used for the probes. DEPRECATED: will be replaced by topologyUpdater.port in NFD v0.18. |
topologyUpdater.kubeletConfigPath | string | "" | Specifies the kubelet config host path |
topologyUpdater.kubeletPodResourcesSockPath | string | "" | Specifies the kubelet sock path to read pod resources |
topologyUpdater.updateInterval | string | 60s | Time to sleep between CR updates. Non-positive value implies no CR update. |
topologyUpdater.watchNamespace | string | * | Namespace to watch pods, * for all namespaces |
topologyUpdater.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
topologyUpdater.securityContext | dict | {} | Container security settings |
topologyUpdater.resources.limits | dict | {memory: 60Mi} | NFD Topology Updater pod resources limits |
topologyUpdater.resources.requests | dict | {cpu: 50m, memory: 40Mi} | NFD Topology Updater pod resources requests |
topologyUpdater.nodeSelector | dict | {} | Topology updater pod node selector |
topologyUpdater.tolerations | dict | {} | Topology updater pod node tolerations |
topologyUpdater.annotations | dict | {} | Topology updater pod annotations |
topologyUpdater.daemonsetAnnotations | dict | {} | Topology updater daemonset annotations |
topologyUpdater.affinity | dict | {} | Topology updater pod affinity |
topologyUpdater.config | dict | | configuration |
topologyUpdater.podSetFingerprint | bool | true | Enables computing and reporting of the pod set fingerprint in NRT objects. |
topologyUpdater.kubeletStateDir | string | /var/lib/kubelet | Specifies kubelet state directory path for watching state and checkpoint files. Empty value disables kubelet state tracking. |
topologyUpdater.extraArgs | array | [] | Additional command line arguments to pass to nfd-topology-updater |
topologyUpdater.extraEnvs | array | [] | Additional environment variables to pass to nfd-topology-updater |
topologyUpdater.revisionHistoryLimit | integer | | Specify how many old ControllerRevisions for this DaemonSet you want to retain. revisionHistoryLimit |
topologyUpdater.livenessProbe.initialDelaySeconds | integer | 10 | Specifies the number of seconds after the container has started before liveness probes are initiated. |
topologyUpdater.livenessProbe.failureThreshold | integer | 3 (by Kubernetes) | Specifies the number of consecutive failures of liveness probes before considering the pod as not ready. |
topologyUpdater.livenessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the liveness probe. |
topologyUpdater.livenessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
topologyUpdater.readinessProbe.initialDelaySeconds | integer | 5 | Specifies the number of seconds after the container has started before readiness probes are initiated. |
topologyUpdater.readinessProbe.failureThreshold | integer | 10 | Specifies the number of consecutive failures of readiness probes before considering the pod as not ready. |
topologyUpdater.readinessProbe.periodSeconds | integer | 10 (by Kubernetes) | Specifies how often (in seconds) to perform the readiness probe. |
topologyUpdater.readinessProbe.timeoutSeconds | integer | 1 (by Kubernetes) | Specifies the number of seconds after which the probe times out. |
topologyUpdater.readinessProbe.successThreshold | integer | 1 (by Kubernetes) | Specifies the number of consecutive successes of readiness probes before considering the pod as ready. |
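The topology updater parameters above can be collected into a Helm values file. The following is a minimal sketch, assuming a hypothetical file name `values-topology-updater.yaml`. The keys and defaults are taken from the table; only `enable` and `createCRDs` differ from their defaults.

```yaml
# values-topology-updater.yaml (hypothetical file name)
topologyUpdater:
  enable: true          # deploy nfd-topology-updater (default: false)
  createCRDs: true      # also create the topology updater CRDs (default: false)
  updateInterval: 60s   # time to sleep between CR updates (the default)
  watchNamespace: "*"   # watch pods in all namespaces (the default)
  resources:
    limits:
      memory: 60Mi      # default limit from the table
    requests:
      cpu: 50m          # default requests from the table
      memory: 40Mi
```

Such a file would typically be passed to the `helm install` or `helm upgrade` command with the `-f` flag.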
Name | Type | Default | Description |
---|---|---|---|
gc.* | dict | | NFD Garbage Collector configuration |
gc.enable | bool | true | Specifies whether the NFD Garbage Collector should be created |
gc.hostNetwork | bool | false | Specifies whether to enable or disable running the container in the host's network namespace |
gc.serviceAccount.create | bool | true | Specifies whether the service account for garbage collector should be created |
gc.serviceAccount.annotations | dict | {} | Annotations to add to the service account for garbage collector |
gc.serviceAccount.name | string | | The name of the service account for garbage collector to use. If not set and create is true, a name is generated using the fullname template and -gc suffix |
gc.rbac.create | bool | true | Specifies whether to create RBAC configuration for garbage collector |
gc.interval | string | 1h | Time between periodic garbage collector runs |
gc.podSecurityContext | dict | {} | PodSecurityContext holds pod-level security attributes and common container settings |
gc.resources.limits | dict | {memory: 1Gi} | NFD Garbage Collector pod resources limits |
gc.resources.requests | dict | {cpu: 10m, memory: 128Mi} | NFD Garbage Collector pod resources requests |
gc.metricsPort | integer | 8081 | Port on which to serve Prometheus metrics. DEPRECATED: will be replaced by gc.port in NFD v0.18. |
gc.nodeSelector | dict | {} | Garbage collector pod node selector |
gc.tolerations | dict | {} | Garbage collector pod node tolerations |
gc.annotations | dict | {} | Garbage collector pod annotations |
gc.deploymentAnnotations | dict | {} | Garbage collector deployment annotations |
gc.affinity | dict | {} | Garbage collector pod affinity |
gc.extraArgs | array | [] | Additional command line arguments to pass to nfd-gc |
gc.extraEnvs | array | [] | Additional environment variables to pass to nfd-gc |
gc.revisionHistoryLimit | integer | | Specify how many old ReplicaSets for this Deployment you want to retain. revisionHistoryLimit |
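As an illustration of the garbage collector parameters, a values file could look roughly like the sketch below. The file name and the `30m` interval are assumptions chosen for the example; the remaining values repeat the defaults listed in the table.

```yaml
# values-gc.yaml (hypothetical file name)
gc:
  enable: true        # deploy the NFD Garbage Collector (the default)
  interval: 30m       # assumed example value; the default is 1h
  resources:
    limits:
      memory: 1Gi     # default limit from the table
    requests:
      cpu: 10m        # default requests from the table
      memory: 128Mi
```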
NFD offers two variants of the container image. Released container images are available for x86_64 and Arm64 architectures.
The default is a minimal image based on scratch and only supports running statically linked binaries.
For backwards compatibility a container image tag with suffix -minimal (e.g. gcr.io/k8s-staging-nfd/node-feature-discovery:master-minimal) is provided.
The full image is based on debian:bookworm-slim and contains a full Linux system for doing live debugging and diagnosis of the NFD images.
Its container image tag has the suffix -full (e.g. gcr.io/k8s-staging-nfd/node-feature-discovery:master-full).
Node Feature Discovery can be deployed on any recent version of Kubernetes (v1.24+).
See Image variants for description of the different NFD container images available.
Using Kustomize provides straightforward deployment with kubectl integration and declarative customization.
Using Helm provides easy management of NFD deployments with nice configuration management and easy upgrades.
Using Operator provides deployment and configuration management via CRDs.
Deployment with Kustomize
Kustomize can be used to deploy NFD. Customization of the deployment is done by maintaining declarative overlays on top of the base overlays in NFD.
To follow the deployment instructions here, kubectl v1.24 or later is required.
The kustomize overlays provided in the repo can be used directly:
kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=master"
This creates the required RBAC rules and deploys nfd-master (as a deployment) and nfd-worker (as a daemonset) in the node-feature-discovery namespace.
NOTE: nfd-topology-updater is not deployed as part of the default overlay. Refer to Master Worker Topologyupdater and Topologyupdater below.
Alternatively you can clone the repository and customize the deployment by creating your own overlays. See kustomize for more information about managing deployment configurations.
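A custom overlay can also reference the upstream overlay remotely instead of a local clone. The sketch below is an assumption-laden example, not a layout prescribed by NFD: it reuses the default overlay URL from above and adds a hypothetical JSON patch that sets a node selector on the nfd-worker daemonset.

```yaml
# kustomization.yaml (illustrative overlay on top of the NFD default overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # same remote overlay that is applied directly with "kubectl apply -k" above
  - https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=master
patches:
  # hypothetical customization: schedule nfd-worker on Linux nodes only
  - target:
      kind: DaemonSet
      name: nfd-worker
    patch: |-
      - op: add
        path: /spec/template/spec/nodeSelector
        value:
          kubernetes.io/os: linux
```

With this file in a directory of its own, the overlay would be applied by pointing `kubectl apply -k` at that directory.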
Overlays
The NFD repository hosts a set of overlays for different usages and deployment scenarios under deployment/overlays
- default: default deployment of nfd-worker as a daemonset, described above
- default-job: see Worker one-shot below
- master-worker-topologyupdater: see Master Worker Topologyupdater below
- topologyupdater: see Topology Updater below
- prometheus: see Metrics below
- prune: clean up the cluster after uninstallation, see Removing feature labels
- samples/custom-rules: an example for spicing up the default deployment with a separately managed configmap of custom labeling rules, see Custom feature source for more information about custom node labels
Worker one-shot
Feature discovery can alternatively be configured as a one-shot job. The default-job overlay may be used to achieve this:
NUM_NODES=$(kubectl get no -o jsonpath='{.items[*].metadata.name}' | wc -w)
kubectl kustomize "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default-job?ref=master" | \
sed s"/NUM_NODES/$NUM_NODES/" | \
@@ -21,4 +21,4 @@
kubectl -n $NFD_NS delete sa nfd-master
kubectl delete clusterrole nfd-master
kubectl delete clusterrolebinding nfd-master
Node Feature Discovery master
Metrics are configured to be exposed using the Prometheus Operator APIs by default. To expose metrics this way, you need to install the Prometheus Operator in your cluster. By default NFD Master and Worker expose metrics on port 8081.
The exposed metrics are:
Metric | Type | Description |
---|---|---|
nfd_master_build_info | Gauge | Version from which nfd-master was built |
nfd_worker_build_info | Gauge | Version from which nfd-worker was built |
nfd_gc_build_info | Gauge | Version from which nfd-gc was built |
nfd_topology_updater_build_info | Gauge | Version from which nfd-topology-updater was built |
nfd_master_node_update_requests_total | Counter | Number of node update requests received by the master over gRPC |
nfd_master_node_updates_total | Counter | Number of nodes updated |
nfd_master_node_feature_group_update_requests_total | Counter | Number of cluster feature update requests processed by the master |
nfd_master_node_update_failures_total | Counter | Number of node update failures |
nfd_master_node_labels_rejected_total | Counter | Number of node labels rejected by nfd-master |
nfd_master_node_extendedresources_rejected_total | Counter | Number of node extended resources rejected by nfd-master |
nfd_master_node_taints_rejected_total | Counter | Number of node taints rejected by nfd-master |
nfd_master_nodefeaturerule_processing_duration_seconds | Histogram | Time taken to process NodeFeatureRule objects |
nfd_master_nodefeaturerule_processing_errors_total | Counter | Number of errors encountered while processing NodeFeatureRule objects |
nfd_worker_feature_discovery_duration_seconds | Histogram | Time taken to discover features on a node |
nfd_topology_updater_scan_errors_total | Counter | Number of errors in scanning resource allocation of pods. |
nfd_gc_objects_deleted_total | Counter | Number of NodeFeature and NodeResourceTopology objects garbage collected. |
nfd_gc_object_delete_failures_total | Counter | Number of errors in deleting NodeFeature and NodeResourceTopology objects. |
To deploy NFD with metrics enabled using kustomize, you can use the prometheus overlay.
By default metrics are enabled when deploying NFD via Helm. To enable Prometheus to scrape metrics from NFD, you need to pass the following values to Helm:
--set prometheus.enable=true
For more info on Helm deployment, see Helm.
It is recommended to specify --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false when deploying prometheus-operator via Helm, so that prometheus-operator scrapes metrics from any PodMonitor.
Alternatively, set labels on the PodMonitor via the Helm parameter prometheus.labels to control which Prometheus instances will scrape this PodMonitor.
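The two NFD Helm parameters mentioned above can be combined in a values file. This is a minimal sketch. The file name and the `release: prometheus` label are hypothetical, and the label must match whatever PodMonitor selector your Prometheus instance actually uses.

```yaml
# values-metrics.yaml (hypothetical file name)
prometheus:
  enable: true            # same effect as --set prometheus.enable=true
  labels:
    release: prometheus   # assumed label; adjust to your Prometheus podMonitorSelector
```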
NFD contains an example Grafana dashboard. You can import examples/grafana-dashboard.json to your Grafana instance to visualize the NFD metrics.
Deployment with NFD Operator
The Node Feature Discovery Operator automates installation, configuration and updates of NFD using a specific NodeFeatureDiscovery custom resource. This also provides good support for managing NFD as a dependency of other operators.
Deployment
Deployment using the Node Feature Discovery Operator is recommended to be done via operatorhub.io.
- You need to have OLM installed. If you don't, take a look at the latest release for detailed instructions.
- Install the operator:
kubectl create -f https://operatorhub.io/install/nfd-operator.yaml
- Create NodeFeatureDiscovery object (in nfd namespace here):
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
@@ -17,4 +17,4 @@
EOF
Uninstallation
If you followed the deployment instructions above you can uninstall NFD with:
kubectl -n nfd delete NodeFeatureDiscovery my-nfd-deployment
Optionally, you can also remove the namespace:
kubectl delete ns nfd
See the node-feature-discovery-operator and OLM project documentation for instructions for uninstalling the operator and operator lifecycle manager, respectively.
Uninstallation
Follow the uninstallation instructions of the deployment method used (kustomize, helm or operator).
Removing feature labels
NOTE: This is unnecessary when using the Helm chart for deployment as it will clean up the nodes when NFD is uninstalled.
NFD-Master has a special -prune command line flag for removing all nfd-related node labels, annotations, extended resources and taints from the cluster.
kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/prune?ref=master"
kubectl -n node-feature-discovery wait job.batch/nfd-master --for=condition=complete && \
kubectl delete -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/prune?ref=master"
NOTE: You must run prune before removing the RBAC rules (serviceaccount, clusterrole and clusterrolebinding).
Developer guide
Building from source
Download the source code
git clone https://github.com/kubernetes-sigs/node-feature-discovery
cd node-feature-discovery
Docker build
Build the container image
See customizing the build below for altering the container image registry, for example.
make
Push the container image
Optional; this example uses Docker.
docker push <IMAGE_TAG>
@@ -23,4 +23,4 @@
tilt up
This will override the default value (master) of the IMAGE_TAG_NAME variable defined in the Tiltfile.
Documentation
All documentation resides under the docs directory in the source tree. It is designed to be served as an HTML site by GitHub Pages.
Building the documentation is containerized to fix the build environment. The recommended way for developing documentation is to run:
make site-serve
This will build the documentation in a container and serve it under localhost:4000/, making it easy to verify the results. Any changes made under docs/ automatically trigger a rebuild; the served content is updated and can be inspected with a browser refresh.
To just build the html documentation run:
make site-build
This will generate html documentation under docs/_site/.
Node Feature Discovery
Welcome to Node Feature Discovery – a Kubernetes add-on for detecting hardware features and system configuration!
Continue to:
- Introduction for more details on the project.
- Quick start for quick step-by-step instructions on how to get NFD running on your cluster.
Quick-start – the short-short version
$ kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=master
namespace/node-feature-discovery created
serviceaccount/nfd-master created
clusterrole.rbac.authorization.k8s.io/nfd-master created
@@ -21,4 +21,4 @@
"feature.node.kubernetes.io/cpu-cpuid.AESNI": "true",
...
Introduction
Table of contents
- NFD-Master
- NFD-Worker
- NFD-Topology-Updater
- NFD-GC
- Feature Discovery
- Node annotations
- Custom resources
This software enables node feature discovery for Kubernetes. It detects hardware features available on each node in a Kubernetes cluster, and advertises those features using node labels and optionally node extended resources, annotations and node taints. Node Feature Discovery is compatible with any recent version of Kubernetes (v1.24+).
NFD consists of four software components:
- nfd-master
- nfd-worker
- nfd-topology-updater
- nfd-gc
NFD-Master
NFD-Master is the daemon responsible for communication towards the Kubernetes API. That is, it receives labeling requests from the worker and modifies node objects accordingly.
NFD-Worker
NFD-Worker is a daemon responsible for feature detection. It then communicates the information to nfd-master which does the actual node labeling. One instance of nfd-worker is supposed to be running on each node of the cluster.
NFD-Topology-Updater
NFD-Topology-Updater is a daemon responsible for examining allocated resources on a worker node to account for resources available to be allocated to new pods on a per-zone basis (where a zone can be a NUMA node). It then creates or updates a NodeResourceTopology custom resource object specific to this node. One instance of nfd-topology-updater is supposed to be running on each node of the cluster.
NFD-GC
NFD-GC is a daemon responsible for cleaning obsolete NodeFeature and NodeResourceTopology objects.
One instance of nfd-gc is supposed to be running in the cluster.
Feature Discovery
Feature discovery is divided into domain-specific feature sources:
- CPU
- Kernel
- Memory
- Network
- PCI
- Storage
- System
- USB
- Custom (rule-based custom features)
- Local (features files)
Each feature source is responsible for detecting a set of features which, in turn, are turned into node feature labels. Feature labels are prefixed with feature.node.kubernetes.io/ and also contain the name of the feature source. Non-standard user-specific feature labels can be created with the local and custom feature sources.
An overview of the default feature labels:
{
"feature.node.kubernetes.io/cpu-<feature-name>": "true",
"feature.node.kubernetes.io/custom-<feature-name>": "true",
"feature.node.kubernetes.io/kernel-<feature name>": "<feature value>",
@@ -10,4 +10,4 @@
"feature.node.kubernetes.io/usb-<device label>.present": "<feature value>",
"feature.node.kubernetes.io/<file name>-<feature name>": "<feature value>"
}
Node annotations
NFD also annotates nodes it is running on:
Annotation | Description |
---|---|
[<instance>.]nfd.node.kubernetes.io/feature-labels | Comma-separated list of node labels managed by NFD. NFD uses this internally so must not be edited by users. |
[<instance>.]nfd.node.kubernetes.io/feature-annotations | Comma-separated list of node annotations managed by NFD. NFD uses this internally so must not be edited by users. |
[<instance>.]nfd.node.kubernetes.io/extended-resources | Comma-separated list of node extended resources managed by NFD. NFD uses this internally so must not be edited by users. |
[<instance>.]nfd.node.kubernetes.io/taints | Comma-separated list of node taints managed by NFD. NFD uses this internally so must not be edited by users. |
NOTE: the -instance command line flag affects the annotation names.
Annotations that do not apply are not created; for example, nfd.node.kubernetes.io/extended-resources is only set if NFD created some extended resources.
Custom resources
NFD makes use of some Kubernetes Custom Resources.
NodeFeature objects are used for representing node features and requesting node labels to be generated.
NFD-Master uses NodeFeatureRules for custom labeling of nodes.
NFD-Topology-Updater creates NodeResourceTopology objects that describe the hardware topology of node resources.
Quick start
Minimal steps to deploy the latest released version of NFD in your cluster.
Installation
Deploy with kustomize – creates a new namespace, service and required RBAC rules and deploys nfd-master and nfd-worker daemons.
kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=master"
Verify
Wait until NFD master and NFD worker are running.
$ kubectl -n node-feature-discovery get ds,deploy
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/nfd-worker 2 2 2 2 2 <none> 10s
@@ -40,4 +40,4 @@
NAME AGE
kind-control-plane 23s
kind-worker 23s
Feature Gates
Feature gates are a set of key-value pairs that control the behavior of NFD. They are used to enable or disable certain features of NFD. The feature gates are set using the -feature-gates command line flag or featureGates value in the Helm chart. The following feature gates are available:
Name | Default | Stage | Since | Until |
---|---|---|---|---|
NodeFeatureAPI | true | Beta | v0.14 | v0.16 |
NodeFeatureAPI | true | GA | v0.17 | |
DisableAutoPrefix | false | Alpha | v0.16 | |
NodeFeatureGroupAPI | false | Alpha | v0.16 | |
NodeFeatureAPI
The NodeFeatureAPI feature gate enables the Node Feature API. When enabled, NFD will register the Node Feature API with the Kubernetes API server. The Node Feature API is used to expose node-specific hardware and software features to the Kubernetes scheduler. The Node Feature API has been GA since v0.17 (beta from v0.14 to v0.16) and is enabled by default.
NodeFeatureGroupAPI
The NodeFeatureGroupAPI feature gate enables the Node Feature Group API. When enabled, NFD will register the Node Feature Group API with the Kubernetes API server. The Node Feature Group API is used to create node groups based on hardware and software features. The Node Feature Group API is an alpha feature and is disabled by default.
DisableAutoPrefix
The DisableAutoPrefix
feature gate controls the automatic prefixing of names. When enabled, nfd-master does not automatically add the default feature.node.kubernetes.io/
prefix to unprefixed labels, annotations and extended resources. Automatic prefixing is the default behavior in NFD v0.16 and earlier.
Note that enabling the feature gate effectively causes unprefixed names to be filtered out as NFD does not allow unprefixed names of labels, annotations or extended resources. For example, with the DisableAutoPrefix
feature gate set to false
, a NodeFeatureRule with
labels:
foo: bar
-
will turn into feature.node.kubernetes.io/foo=bar
node label. With DisableAutoPrefix
set to true
, no prefix is added and the label will be filtered out.
Note that taint keys are not affected by this feature gate.
Node Feature Discovery master
\ No newline at end of file
+
will turn into feature.node.kubernetes.io/foo=bar
node label. With DisableAutoPrefix
set to true
, no prefix is added and the label will be filtered out.
Note that taint keys are not affected by this feature gate.
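The effect of the gate on a single label name can be sketched as follows. This is a simplified model under stated assumptions, not NFD's implementation: `resolve_label_name` and `DEFAULT_PREFIX` are hypothetical names, a name counts as "prefixed" if it contains a `/`, and namespace allow/deny rules are ignored.

```python
DEFAULT_PREFIX = "feature.node.kubernetes.io/"  # default prefix named in the text above


def resolve_label_name(name, disable_auto_prefix=False):
    """Return the final label name, or None if the label would be filtered out.

    Hypothetical helper: models only the DisableAutoPrefix decision.
    """
    if "/" in name:
        # Already prefixed: the gate does not change anything.
        return name
    if disable_auto_prefix:
        # Unprefixed names are not allowed, so the label is dropped.
        return None
    # Gate disabled: the default prefix is added automatically.
    return DEFAULT_PREFIX + name
```

With the gate disabled, `resolve_label_name("foo")` gives the prefixed name from the example above; with the gate enabled the same call returns `None`, while an already prefixed name passes through unchanged.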
To quickly view available command line flags execute nfd-gc -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master \
+ Garbage Collector Cmdline Reference · Node Feature Discovery
NFD-GC Commandline Flags
Table of Contents
To quickly view available command line flags execute nfd-gc -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master \
nfd-gc -help
-h, -help
Print usage and exit.
-version
Print version and exit.
-gc-interval
The -gc-interval
flag specifies the interval between periodic garbage collector runs.
Default: 1h
Example:
nfd-gc -gc-interval=1h
-
Node Feature Discovery master
\ No newline at end of file
+
Command line and configuration reference.
To quickly view available command line flags execute nfd-master -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-master -help
+ Master cmdline reference · Node Feature Discovery
Commandline flags of nfd-master
Table of contents
- -h, -help
- -version
- -feature-gates
- -prune
- -metrics
- -instance
- -enable-leader-election
- -enable-taints
- -no-publish
- -label-whitelist
- -extra-label-ns
- -deny-label-ns
- -config
- -options
- -nfd-api-parallelism
- Logging
- -resync-period
To quickly view available command line flags execute nfd-master -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-master -help
-h, -help
Print usage and exit.
-version
Print version and exit.
-feature-gates
The -feature-gates
flag is used to enable or disable non-GA features. The list of available feature gates can be found in the feature gates documentation.
Example:
nfd-master -feature-gates NodeFeatureGroupAPI=true
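As a rough illustration of how a `-feature-gates` value maps onto gate states, the sketch below applies a `Name=bool` string on top of the defaults listed in the feature gates documentation. The comma-separated form for multiple gates follows the common Kubernetes convention and is an assumption here; `parse_feature_gates` and `DEFAULT_GATES` are hypothetical names, not part of NFD.

```python
# Defaults as listed in the feature gates table.
DEFAULT_GATES = {
    "NodeFeatureAPI": True,
    "DisableAutoPrefix": False,
    "NodeFeatureGroupAPI": False,
}


def parse_feature_gates(value, defaults=DEFAULT_GATES):
    """Return the effective gate states after applying a flag value."""
    gates = dict(defaults)
    for item in filter(None, (part.strip() for part in value.split(","))):
        name, sep, raw = item.partition("=")
        if not sep or raw.lower() not in ("true", "false"):
            raise ValueError(f"expected Name=true|false, got {item!r}")
        if name not in gates:
            raise ValueError(f"unknown feature gate {name!r}")
        gates[name] = raw.lower() == "true"
    return gates
```

Rejecting unknown gate names keeps typos from silently leaving a feature in its default state; whether NFD itself rejects or ignores them is not stated in this document.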
-prune
The -prune
flag is a sub-command-like option for cleaning up the cluster. It causes nfd-master to remove all NFD-related labels, annotations and extended resources from all Node objects of the cluster and then exit.
-metrics
DEPRECATED: Will be removed in NFD v0.17 and replaced by -port
.
The -metrics
flag specifies the port on which to expose Prometheus metrics. Setting this to 0 disables the metrics server on nfd-master.
Default: 8081
Example:
nfd-master -metrics=12345
-instance
The -instance
flag makes it possible to run multiple NFD deployments in parallel. In practice, it separates the node annotations between deployments so that each of them can store metadata independently. The instance name must start and end with an alphanumeric character and may only contain alphanumeric characters, -
, _
or .
.
Default: empty
Example:
nfd-master -instance=network
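The naming rule above can be expressed as a single regular expression. This is a minimal sketch of the stated rule only, assuming ASCII alphanumerics; `is_valid_instance_name` is a hypothetical helper, and the empty default (no instance name) is handled by simply not calling it.

```python
import re

# Starts and ends with an alphanumeric character; "-", "_" and "." are
# allowed only in between.
_INSTANCE_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?")


def is_valid_instance_name(name):
    """Check a non-empty -instance value against the documented rule."""
    return _INSTANCE_RE.fullmatch(name) is not None
```

For example, `network` and `gpu_1.a-b` pass, while `-net` and `net.` are rejected because they start or end with a non-alphanumeric character.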
@@ -12,4 +12,4 @@
-options
The -options
flag may be used to specify and override configuration file options directly from the command line. The required format is the same as in the config file i.e. JSON or YAML. Configuration options specified via this flag will override those from the configuration file:
Default: empty
Example:
nfd-master -options='{"noPublish": true}'
-nfd-api-parallelism
The -nfd-api-parallelism
flag can be used to specify the maximum number of concurrent node updates.
Default: 10
Example:
nfd-master -nfd-api-parallelism=1
Logging
The following logging-related flags are inherited from the klog package.
-add_dir_header
If true, adds the file directory to the header of the log messages.
Default: false
-alsologtostderr
Log to standard error as well as files.
Default: false
-log_backtrace_at
When logging hits line file:N, emit a stack trace.
Default: empty
-log_dir
If non-empty, write log files in this directory.
Default: empty
-log_file
If non-empty, use this log file.
Default: empty
-log_file_max_size
Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited.
Default: 1800
-logtostderr
Log to standard error instead of files
Default: true
-skip_headers
If true, avoid header prefixes in the log messages.
Default: false
-skip_log_headers
If true, avoid headers when opening log files.
Default: false
-stderrthreshold
Logs at or above this threshold go to stderr.
Default: 2
-v
Number for the log level verbosity.
Default: 0
-vmodule
Comma-separated list of pattern=N
settings for file-filtered logging.
Default: empty
-resync-period
The -resync-period
flag specifies the NFD API controller resync period. The resync means nfd-master replaying all NodeFeature and NodeFeatureRule objects, thus effectively re-syncing all nodes in the cluster (i.e. ensuring labels, annotations, extended resources and taints are in place).
Default: 1 hour.
Example:
nfd-master -resync-period=2h
-
Node Feature Discovery master
\ No newline at end of file
+
See the sample configuration file for a full example configuration.
noPublish
option disables updates to the Node objects in the Kubernetes API server, making a "dry-run" flag for nfd-master. No Labels, Annotations, Taints or ExtendedResources of nodes are updated.
Default: false
Example:
noPublish: true
+ Master config reference · Node Feature Discovery
Configuration file reference of nfd-master
Table of contents
- noPublish
- extraLabelNs
- denyLabelNs
- autoDefaultNs
- enableTaints
- labelWhiteList
- resyncPeriod
- leaderElection
- nfdApiParallelism
- klog
- restrictions (EXPERIMENTAL)
See the sample configuration file for a full example configuration.
noPublish
noPublish
option disables updates to the Node objects in the Kubernetes API server, effectively providing a "dry-run" mode for nfd-master. No labels, annotations, taints or extended resources of nodes are updated.
Default: false
Example:
noPublish: true
extraLabelNs
extraLabelNs
specifies a list of allowed feature label namespaces. This option can be used to allow other vendor or application specific namespaces for custom labels from the local and custom feature sources, even though these labels were denied using the denyLabelNs
parameter.
Default: empty
Example:
extraLabelNs: ["added.ns.io","added.kubernetes.io"]
denyLabelNs
denyLabelNs
specifies a list of excluded label namespaces. By default, nfd-master allows creating labels in all namespaces except the kubernetes.io
namespace and its sub-namespaces (i.e. *.kubernetes.io
). Note that kubernetes.io
and its sub-namespaces are always denied, regardless of this option. This option can be used to exclude some vendor or application specific namespaces.
Default: empty
Example:
denyLabelNs: ["denied.ns.io","denied.kubernetes.io"]
autoDefaultNs
DEPRECATED: Will be removed in NFD v0.17. Use the DisableAutoPrefix feature gate instead.
The autoDefaultNs
option controls the automatic prefixing of names. When set to true (the default in NFD version master) nfd-master automatically adds the default feature.node.kubernetes.io/
prefix to unprefixed labels, annotations and extended resources - this is also the default behavior in NFD v0.15 and earlier. When the option is set to false
, no prefix will be prepended to unprefixed names, effectively causing them to be filtered out (as NFD does not allow unprefixed names of labels, annotations or extended resources). The default will be changed to false
in a future release.
For example, with the autoDefaultNs
set to true
, a NodeFeatureRule with
labels:
@@ -33,4 +33,4 @@
allowOverwrite: false
restrictions.denyNodeFeatureLabels
The denyNodeFeatureLabels
option specifies whether to deny labels from 3rd party NodeFeature objects or not. NodeFeature objects created by nfd-worker are not affected.
Default: false
Example:
restrictions:
denyNodeFeatureLabels: true
-
Node Feature Discovery master
\ No newline at end of file
+
The client is in the experimental v1alpha1
version.
To quickly view available command line flags execute nfd --help
.
Print usage and exit.
Image Compatibility commands.
Perform node validation based on its associated image compatibility artifact.
The --image
flag specifies the URL of the image containing compatibility metadata.
The --plain-http
flag forces the use of HTTP protocol for all registry communications. Default: false
The --platform
flag specifies the artifact platform in the format os[/arch][/variant][:os_version]
.
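To make the `os[/arch][/variant][:os_version]` format concrete, the sketch below splits such a string into its components. It only illustrates the format as written above; `parse_platform` is a hypothetical helper and not the parser used by the nfd client.

```python
import re

# os is mandatory; arch, variant and os_version are optional, in that order.
_PLATFORM_RE = re.compile(
    r"^(?P<os>[^/:]+)"
    r"(?:/(?P<arch>[^/:]+))?"
    r"(?:/(?P<variant>[^/:]+))?"
    r"(?::(?P<os_version>.+))?$"
)


def parse_platform(spec):
    """Split a platform string into a dict; missing parts are None."""
    match = _PLATFORM_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid platform string: {spec!r}")
    return match.groupdict()
```

Because the optional parts are positional, a variant can only be given together with an architecture, which matches the bracket nesting in the format string.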
The --tags
flag specifies a list of tags that must match the tags set on the compatibility objects.
The --output-json
flag prints the output as a JSON object.
The --registry-username
flag specifies the username for the registry.
The --registry-password-stdin
flag enables reading of registry password from stdin.
The --registry-token-stdin
flag enables reading of registry token from stdin.
To quickly view available command line flags execute kubectl nfd -help
.
Print usage and exit.
Validate a NodeFeatureRule file.
The --nodefeature-file
flag specifies the path to the NodeFeatureRule file to validate.
Test a NodeFeatureRule file against a node without applying it.
The --kubeconfig
flag specifies the path to the kubeconfig file to use for CLI requests.
The --namespace
flag specifies the namespace to use for CLI requests. Default: default
.
The --nodename
flag specifies the name of the node to test the NodeFeatureRule against.
The --nodefeaturerule-file
flag specifies the path to the NodeFeatureRule file to test.
Process a NodeFeatureRule file against a NodeFeature file.
The --nodefeaturerule-file
flag specifies the path to the NodeFeatureRule file to test.
The --nodefeature-file
flag specifies the path to the NodeFeature file to test.
To quickly view available command line flags execute nfd-topology-updater -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master \
+ Topology Updater Cmdline Reference · Node Feature Discovery
NFD-Topology-Updater Commandline Flags
Table of Contents
- -h, -help
- -version
- -config
- -no-publish
- -oneshot
- -metrics
- -sleep-interval
- -watch-namespace
- -kubelet-config-uri
- -api-auth-token-file
- -podresources-socket
- -pods-fingerprint
- -kubelet-state-dir
To quickly view available command line flags execute nfd-topology-updater -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master \
nfd-topology-updater -help
-h, -help
Print usage and exit.
-version
Print version and exit.
-config
The -config
flag specifies the path of the nfd-topology-updater configuration file to use.
Default: /etc/kubernetes/node-feature-discovery/nfd-topology-updater.conf
Example:
nfd-topology-updater -config=/opt/nfd/nfd-topology-updater.conf
-no-publish
The -no-publish
flag is effectively a "dry-run" flag for nfd-topology-updater. NFD-Topology-Updater runs hardware resource topology detection normally, but NodeResourceTopology objects are not created or updated.
Default: false
Example:
nfd-topology-updater -no-publish
@@ -11,4 +11,4 @@
-podresources-socket
The -podresources-socket
flag specifies the path to the Unix socket where kubelet exports a gRPC service to enable discovery of in-use CPUs and devices, and to provide metadata for them.
Default: /host-var/lib/kubelet/pod-resources/kubelet.sock
Example:
nfd-topology-updater -podresources-socket=/var/lib/kubelet/pod-resources/kubelet.sock
-pods-fingerprint
Enables computing and reporting the pod set fingerprint in the NodeResourceTopology (NRT) object. A pod fingerprint is a compact representation of the "node state" regarding resources.
Default: true
Example:
nfd-topology-updater -pods-fingerprint=false
-kubelet-state-dir
The -kubelet-state-dir
flag specifies the path to the Kubelet state directory, where state and checkpoint files are stored. The files are mounted read-only and cannot be changed by the updater. Watching is enabled by default; passing an empty string disables it.
Default: /host-var/lib/kubelet
Example:
nfd-topology-updater -kubelet-state-dir=/var/lib/kubelet
-
Node Feature Discovery master
\ No newline at end of file
+
See the sample configuration file for a full example configuration.
The excludeList
specifies a key-value map of allocated resources that should not be examined by the topology-updater agent per node. Each key is a node name and its value is a list of resources that should not be examined by the agent for that specific node.
Default: empty
Example:
excludeList:
+ Topology-Updater config reference · Node Feature Discovery
Configuration file reference of nfd-topology-updater
Table of contents
See the sample configuration file for a full example configuration.
excludeList
The excludeList
specifies a key-value map of allocated resources that should not be examined by the topology-updater agent per node. Each key is a node name and its value is a list of resources that should not be examined by the agent for that specific node.
Default: empty
Example:
excludeList:
nodeA: [hugepages-2Mi]
nodeB: [memory]
nodeC: [cpu, hugepages-2Mi]
excludeList.*
excludeList.*
is a special key that matches all nodes. Resources listed under this key are excluded on every node.
Default: empty
Example:
excludeList:
'*': [hugepages-2Mi]
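How per-node entries and the `*` key combine can be sketched as below. The sketch assumes the two lists are merged (a resource is excluded if it appears under either key); `excluded_resources` is a hypothetical helper, not the updater's actual code.

```python
def excluded_resources(exclude_list, node_name):
    """Return the set of resources to skip on the given node.

    exclude_list mirrors the excludeList config map: node name (or "*")
    mapped to a list of resource names.
    """
    per_node = exclude_list.get(node_name, [])
    for_all_nodes = exclude_list.get("*", [])
    # Assumption: wildcard and per-node entries are combined.
    return set(per_node) | set(for_all_nodes)
```

With `{"nodeA": ["hugepages-2Mi"], "*": ["cpu"]}`, nodeA skips both `hugepages-2Mi` and `cpu`, while any other node skips only `cpu`.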
-
Node Feature Discovery master
\ No newline at end of file
+
Node Feature Discovery follows semantic versioning where the version number consists of three components, i.e. MAJOR.MINOR.PATCH.
The most recent two minor releases (or release branches) of Node Feature Discovery are supported. That is, with X being the latest release, X and X-1 are supported and X-1 reaches end-of-life when X+1 is released.
Built-in feature labels and features are supported for 2 releases after being deprecated, at minimum. That is, if a feature label is deprecated in version X, it will be supported in X+1 and X+2 and may be dropped in X+3.
Command-line flags and configuration file options are supported for 1 more release after being deprecated, at minimum. That is, if option/flag is deprecated in version X, it will be supported in X+1 and may be removed in X+2.
The same policy (support for 1 release after deprecation) also applies to Helm chart parameters.
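The deprecation arithmetic above can be written down as a small helper. This is a sketch of the stated minimums only: `earliest_removal` and the category names are hypothetical, versions are plain `vMAJOR.MINOR` strings, and the major version is assumed to stay the same.

```python
# Minimum number of further releases that keep supporting a deprecated item.
GRACE_RELEASES = {"feature-label": 2, "flag-or-option": 1, "helm-parameter": 1}


def earliest_removal(deprecated_in, kind):
    """Return the earliest release that may drop an item deprecated in `deprecated_in`."""
    major, minor = deprecated_in.lstrip("v").split(".")[:2]
    # Supported for GRACE_RELEASES[kind] more releases, removable in the next one.
    return f"v{major}.{int(minor) + GRACE_RELEASES[kind] + 1}"
```

For instance, a flag deprecated in v0.16 stays supported in v0.17 and may be removed in v0.18, while a feature label deprecated in v0.16 may be dropped in v0.19 at the earliest.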
Node Feature Discovery is compatible with Kubernetes v1.24 and later.
To quickly view available command line flags execute nfd-worker -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-worker -help
+ Worker cmdline reference · Node Feature Discovery
Commandline flags of nfd-worker
Table of contents
- -h, -help
- -version
- -feature-gates
- -config
- -options
- -kubeconfig
- -feature-sources
- -label-sources
- -metrics
- -no-publish
- -no-owner-refs
- -oneshot
- Logging
To quickly view available command line flags execute nfd-worker -help
. In a docker container:
docker run gcr.io/k8s-staging-nfd/node-feature-discovery:master nfd-worker -help
-h, -help
Print usage and exit.
-version
Print version and exit.
-feature-gates
The -feature-gates
flag is used to enable or disable non-GA features. The list of available feature gates can be found in the feature gates documentation.
Example:
nfd-worker -feature-gates NodeFeatureGroupAPI=true
-config
The -config
flag specifies the path of the nfd-worker configuration file to use.
Default: /etc/kubernetes/node-feature-discovery/nfd-worker.conf
Example:
nfd-worker -config=/opt/nfd/worker.conf
-options
The -options
flag may be used to specify and override configuration file options directly from the command line. The required format is the same as in the config file i.e. JSON or YAML. Configuration options specified via this flag will override those from the configuration file:
Default: empty
Example:
nfd-worker -options='{"sources":{"cpu":{"cpuid":{"attributeWhitelist":["AVX","AVX2"]}}}}'
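The override behavior can be pictured as a recursive merge of the `-options` value onto the configuration loaded from file. The sketch below handles JSON input only and assumes that nested maps are merged key by key while lists and scalars are replaced wholesale; `apply_options` is a hypothetical helper, not nfd-worker's actual loading code.

```python
import json


def apply_options(file_config, options_json):
    """Return file_config with the values from options_json layered on top."""

    def merge(base, override):
        merged = dict(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Descend into nested sections such as sources.cpu.cpuid.
                merged[key] = merge(merged[key], value)
            else:
                # Scalars and lists from the command line win outright.
                merged[key] = value
        return merged

    return merge(file_config, json.loads(options_json))
```

Applied to the example above, only `sources.cpu.cpuid.attributeWhitelist` changes; unrelated settings from the file, such as `core.sleepInterval`, are left as they were.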
@@ -9,4 +9,4 @@
-no-publish
The -no-publish
flag disables all communication with the nfd-master and the Kubernetes API server. It is effectively a "dry-run" flag for nfd-worker. NFD-Worker runs feature detection normally, but no labeling requests are sent to nfd-master and no NodeFeature objects are created or updated in the API server.
NOTE: This flag takes precedence over the core.noPublish
configuration file option.
Default: false
Example:
nfd-worker -no-publish
-no-owner-refs
The -no-owner-refs
flag disables setting the Pod as an owner reference of the NodeFeature object.
NOTE: This flag takes precedence over the core.noOwnerRefs
configuration file option.
Default: false
Example:
nfd-worker -no-owner-refs
-oneshot
The -oneshot
flag causes nfd-worker to exit after one pass of feature detection.
Default: false
Example:
nfd-worker -oneshot -no-publish
-
Logging
The following logging-related flags are inherited from the klog package.
NOTE: The logger setup can also be specified via the core.klog
configuration file options. However, the command line flags take precedence over any corresponding config file options specified.
-add_dir_header
If true, adds the file directory to the header of the log messages.
Default: false
-alsologtostderr
Log to standard error as well as files.
Default: false
-log_backtrace_at
When logging hits line file:N, emit a stack trace.
Default: empty
-log_dir
If non-empty, write log files in this directory.
Default: empty
-log_file
If non-empty, use this log file.
Default: empty
-log_file_max_size
Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited.
Default: 1800
-logtostderr
Log to standard error instead of files
Default: true
-skip_headers
If true, avoid header prefixes in the log messages.
Default: false
-skip_log_headers
If true, avoid headers when opening log files.
Default: false
-stderrthreshold
Logs at or above this threshold go to stderr.
Default: 2
-v
Number for the log level verbosity.
Default: 0
-vmodule
Comma-separated list of pattern=N
settings for file-filtered logging.
Default: empty
Node Feature Discovery master
\ No newline at end of file
+
The following logging-related flags are inherited from the klog package.
NOTE: The logger setup can also be specified via the
core.klog
configuration file options. However, the command line flags take precedence over any corresponding config file options specified.
If true, adds the file directory to the header of the log messages.
Default: false
Log to standard error as well as files.
Default: false
When logging hits line file:N, emit a stack trace.
Default: empty
If non-empty, write log files in this directory.
Default: empty
If non-empty, use this log file.
Default: empty
Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited.
Default: 1800
Log to standard error instead of files
Default: true
If true, avoid header prefixes in the log messages.
Default: false
If true, avoid headers when opening log files.
Default: false
Logs at or above this threshold go to stderr.
Default: 2
Number for the log level verbosity.
Default: 0
Comma-separated list of pattern=N
settings for file-filtered logging.
Default: empty
See the sample configuration file for a full example configuration.
The core
section contains common configuration settings that are not specific to any particular feature source.
core.sleepInterval
specifies the interval between consecutive passes of feature (re-)detection, and thus also the interval between node re-labeling. A non-positive value implies infinite sleep interval, i.e. no re-detection or re-labeling is done.
Default: 60s
Example:
core:
+ Worker config reference · Node Feature Discovery
Configuration file reference of nfd-worker
Table of contents
See the sample configuration file for a full example configuration.
core
The core
section contains common configuration settings that are not specific to any particular feature source.
core.sleepInterval
core.sleepInterval
specifies the interval between consecutive passes of feature (re-)detection, and thus also the interval between node re-labeling. A non-positive value implies infinite sleep interval, i.e. no re-detection or re-labeling is done.
Default: 60s
Example:
core:
sleepInterval: 60s
core.featureSources
core.featureSources
specifies the list of enabled feature sources. A special value all
enables all sources. Prefixing a source name with -
indicates that the source will be disabled instead - this is only meaningful when used in conjunction with all
. This option allows completely disabling the feature detection so that neither standard feature labels are generated nor the raw feature data is available for custom rule processing.
Default: [all]
Example:
core:
# Enable all but cpu and local sources
@@ -67,4 +67,4 @@
matchExpressions:
class: {op: In, value: ["0200"]}
vendor: {op: In, value: ["8086"]}
-
Node Feature Discovery master
\ No newline at end of file
+
NFD uses some Kubernetes custom resources.
NodeFeature is an NFD-specific custom resource for communicating node features and node labeling requests. The nfd-master pod watches for NodeFeature objects, labels nodes as specified and uses the listed features as input when evaluating NodeFeatureRules. NodeFeature objects can be used for implementing 3rd party extensions (see customization guide for more details).
apiVersion: nfd.k8s-sigs.io/v1alpha1
+ CRDs · Node Feature Discovery
Custom Resources
Table of contents
NFD uses some Kubernetes custom resources.
NodeFeature
NodeFeature is an NFD-specific custom resource for communicating node features and node labeling requests. The nfd-master pod watches for NodeFeature objects, labels nodes as specified and uses the listed features as input when evaluating NodeFeatureRules. NodeFeature objects can be used for implementing 3rd party extensions (see customization guide for more details).
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeature
metadata:
labels:
@@ -88,4 +88,4 @@
capacity: 3
allocatable: 3
available: 3
-
The NodeResourceTopology objects created by NFD can be used to gain insight into the allocatable resources along with the granularity of those resources at a per-zone level (represented by node-0 and node-1 in the above example) or can be used by an external entity (e.g. topology-aware scheduler plugin) to take an action based on the gathered information.
Node Feature Discovery master
\ No newline at end of file
+
The NodeResourceTopology objects created by NFD can be used to gain insight into the allocatable resources along with the granularity of those resources at a per-zone level (represented by node-0 and node-1 in the above example) or can be used by an external entity (e.g. topology-aware scheduler plugin) to take an action based on the gathered information.
NFD provides multiple extension points for vendor and application specific labeling:
- NodeFeature objects can be used to communicate "raw" node features and node labeling requests to nfd-master.
- NodeFeatureRule objects provide a way to deploy custom labeling rules via the Kubernetes API.
- local feature source of nfd-worker creates labels by reading text files.
- custom feature source of nfd-worker creates labels based on user-specified rules.
NodeFeature objects provide a way for 3rd party extensions to advertise custom features, both as "raw" features that serve as input to NodeFeatureRule objects and as feature labels directly.
Note that RBAC rules must be created for each extension for them to be able to create and manipulate NodeFeature objects in their namespace.
Consider the following referential example:
apiVersion: nfd.k8s-sigs.io/v1alpha1
+ Customization guide · Node Feature Discovery
Customization guide
Table of contents
- Overview
- NodeFeature custom resource
- NodeFeatureRule custom resource
- NodeFeatureGroup custom resource
- Local feature source
- Custom feature source
- Node labels
- Feature rule format
Overview
NFD provides multiple extension points for vendor and application specific labeling:
- NodeFeature objects can be used to communicate "raw" node features and node labeling requests to nfd-master.
- NodeFeatureRule objects provide a way to deploy custom labeling rules via the Kubernetes API.
- local feature source of nfd-worker creates labels by reading text files.
- custom feature source of nfd-worker creates labels based on user-specified rules.
NodeFeature custom resource
NodeFeature objects provide a way for 3rd party extensions to advertise custom features, both as "raw" features that serve as input to NodeFeatureRule objects and as feature labels directly.
Note that RBAC rules must be created for each extension for them to be able to create and manipulate NodeFeature objects in their namespace.
A NodeFeature example
Consider the following referential example:
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeature
metadata:
labels:
@@ -363,4 +363,4 @@
- pci.device:
vendor: "0fff"
device: "abcd"
-
Node Feature Discovery master
\ No newline at end of file
+
This page contains usage examples and demos.
A demo on the benefits of using node feature discovery can be found in the source code repository under demo/.
Features are advertised as labels in the Kubernetes Node object.
Label creation in nfd-worker is performed by a set of separate modules called label sources. The core.labelSources
configuration option (or -label-sources
flag) of nfd-worker controls which sources to enable for label generation.
All built-in labels use the feature.node.kubernetes.io
label namespace and have the following format.
feature.node.kubernetes.io/<feature> = <value>
-
NOTE: Consecutive runs of nfd-worker will update the labels on a given node. If a feature is not discovered on a subsequent run, the corresponding label will be removed. This includes any restrictions placed on the subsequent run, such as restricting discovered features with the
-label-whitelist
flag of nfd-master or core.labelWhiteList
option of nfd-worker.
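The update-and-remove behavior described in the note can be sketched as a diff between the labels currently on a node and the features found by the latest run. This is a simplified model: it assumes every label under the `feature.node.kubernetes.io/` prefix is managed by NFD, and `plan_label_update` is a hypothetical helper rather than NFD code.

```python
PREFIX = "feature.node.kubernetes.io/"


def plan_label_update(current_labels, discovered):
    """Return (labels to set, label keys to remove) for one detection pass.

    current_labels: all labels on the node, keyed by full label name.
    discovered: features from the latest run, keyed by unprefixed name.
    """
    desired = {PREFIX + name: value for name, value in discovered.items()}
    to_set = {k: v for k, v in desired.items() if current_labels.get(k) != v}
    # Labels in the NFD namespace that were not rediscovered are removed.
    to_remove = sorted(
        k for k in current_labels if k.startswith(PREFIX) and k not in desired
    )
    return to_set, to_remove
```

Labels outside the NFD prefix, such as `kubernetes.io/os`, are never touched by this plan; a feature filtered out by a whitelist would simply be absent from `discovered` and therefore end up in the removal list.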
Feature name | Value | Description |
---|---|---|
cpu-cpuid.<cpuid-flag> | true | CPU capability is supported. NOTE: the capability might be supported but not enabled. |
cpu-cpuid.<cpuid-attribute> | string | CPU attribute value |
cpu-hardware_multithreading | true | Hardware multithreading, such as Intel HTT, enabled (number of logical CPUs is greater than physical CPUs) |
cpu-coprocessor.nx_gzip | true | Nest Accelerator for GZIP is supported (Power). |
cpu-power.sst_bf.enabled | true | Intel SST-BF (Intel Speed Select Technology - Base frequency) enabled |
cpu-pstate.status | string | The status of the Intel pstate driver when in use and enabled, either 'active' or 'passive'. |
cpu-pstate.turbo | bool | Set to 'true' if turbo frequencies are enabled in Intel pstate driver, set to 'false' if they have been disabled. |
cpu-pstate.scaling_governor | string | The value of the Intel pstate scaling_governor when in use, either 'powersave' or 'performance'. |
cpu-cstate.enabled | bool | Set to 'true' if cstates are set in the intel_idle driver, otherwise set to 'false'. Unset if intel_idle cpuidle driver is not active. |
cpu-security.sgx.enabled | true | Set to 'true' if Intel SGX is enabled in BIOS (based on a non-zero sum value of SGX EPC section sizes). |
cpu-security.se.enabled | true | Set to 'true' if IBM Secure Execution for Linux (IBM Z & LinuxONE) is available and enabled (requires /sys/firmware/uv/prot_virt_host facility) |
cpu-security.tdx.enabled | true | Set to 'true' if Intel TDX is available on the host and has been enabled (requires /sys/module/kvm_intel/parameters/tdx). |
cpu-security.tdx.protected | true | Set to 'true' if Intel TDX was used to start the guest node, based on the existence of the "TDX_GUEST" information as part of cpuid features. |
cpu-security.sev.enabled | true | Set to 'true' if AMD SEV is available on the host and has been enabled (requires /sys/module/kvm_amd/parameters/sev). |
cpu-security.sev.es.enabled | true | Set to 'true' if AMD SEV-ES is available on the host and has been enabled (requires /sys/module/kvm_amd/parameters/sev_es). |
cpu-security.sev.snp.enabled | true | Set to 'true' if AMD SEV-SNP is available on the host and has been enabled (requires /sys/module/kvm_amd/parameters/sev_snp). |
cpu-model.vendor_id | string | Comparable CPU vendor ID. |
cpu-model.family | int | CPU family. |
cpu-model.id | int | CPU model number. |
The CPU label source is configurable, see worker configuration and sources.cpu
configuration options for details.
Flag | Description |
---|---|
ADX | Multi-Precision Add-Carry Instruction Extensions (ADX) |
AESNI | Advanced Encryption Standard (AES) New Instructions (AES-NI) |
APX_F | Intel Advanced Performance Extensions (APX) |
AVX10 | Intel Advanced Vector Extensions 10 (AVX10) |
AVX10_256, AVX10_512 | Intel AVX10 256-bit and 512-bit vector support |
AVX | Advanced Vector Extensions (AVX) |
AVX2 | Advanced Vector Extensions 2 (AVX2) |
AVXIFMA | AVX-IFMA instructions |
AVXVNNI | AVX (VEX encoded) VNNI neural network instructions |
AMXBF16 | Advanced Matrix Extension, tile multiplication operations on BFLOAT16 numbers |
AMXINT8 | Advanced Matrix Extension, tile multiplication operations on 8-bit integers |
AMXFP16 | Advanced Matrix Extension, tile multiplication operations on FP16 numbers |
AMXFP8 | Advanced Matrix Extension, tile multiplication operations on FP8 numbers |
AMXTILE | Advanced Matrix Extension, base tile architecture support |
AVX512BF16 | AVX-512 BFLOAT16 instructions |
AVX512BITALG | AVX-512 bit Algorithms |
AVX512BW | AVX-512 byte and word Instructions |
AVX512CD | AVX-512 conflict detection instructions |
AVX512DQ | AVX-512 doubleword and quadword instructions |
AVX512ER | AVX-512 exponential and reciprocal instructions |
AVX512F | AVX-512 foundation |
AVX512FP16 | AVX-512 FP16 instructions |
AVX512IFMA | AVX-512 integer fused multiply-add instructions |
AVX512PF | AVX-512 prefetch instructions |
AVX512VBMI | AVX-512 vector bit manipulation instructions |
AVX512VBMI2 | AVX-512 vector bit manipulation instructions, version 2 |
AVX512VL | AVX-512 vector length extensions |
AVX512VNNI | AVX-512 vector neural network instructions |
AVX512VP2INTERSECT | AVX-512 intersect for D/Q |
AVX512VPOPCNTDQ | AVX-512 vector population count doubleword and quadword |
AVXNECONVERT | AVX-NE-CONVERT instructions |
AVXVNNIINT8 | AVX-VNNI-INT8 instructions |
AVXVNNIINT16 | AVX-VNNI-INT16 instructions |
CMPCCXADD | CMPCCXADD instructions |
ENQCMD | Enqueue Command |
GFNI | Galois Field New Instructions |
HYPERVISOR | Running under hypervisor |
MSRLIST | Read/Write List of Model Specific Registers |
PREFETCHI | PREFETCHIT0/1 instructions |
VAES | AVX-512 vector AES instructions |
VPCLMULQDQ | Carry-less multiplication quadword |
WRMSRNS | Non-Serializing Write to Model Specific Register |
By default, the following CPUID flags have been blacklisted: AVX10 (use AVX10_VERSION instead), BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SGX, SSE, SSE2, SSE3, SSE4, SSE42, SSSE3 and TDX_GUEST. See sources.cpu
configuration options to change the behavior.
See the full list in github.com/klauspost/cpuid.
Attribute | Description |
---|---|
AVX10_VERSION | AVX10 vector ISA version (if supported) |
Flag | Description |
---|---|
IDIVA | Integer divide instructions available in ARM mode |
IDIVT | Integer divide instructions available in Thumb mode |
THUMB | Thumb instructions |
FASTMUL | Fast multiplication |
VFP | Vector floating point instruction extension (VFP) |
VFPv3 | Vector floating point extension v3 |
VFPv4 | Vector floating point extension v4 |
VFPD32 | VFP with 32 D-registers |
HALF | Half-word loads and stores |
EDSP | DSP extensions |
NEON | NEON SIMD instructions |
LPAE | Large Physical Address Extensions |
Flag | Description |
---|---|
AES | Announcing the Advanced Encryption Standard |
EVSTRM | Event Stream Frequency Features |
FPHP | Half Precision (16-bit) Floating Point Data Processing Instructions |
ASIMDHP | Half Precision (16-bit) ASIMD Data Processing Instructions |
ATOMICS | Atomic Instructions to the A64 |
ASIMRDM | Support for Rounding Double Multiply Add/Subtract |
PMULL | Optional Cryptographic and CRC32 Instructions |
JSCVT | Perform Conversion to Match JavaScript |
DCPOP | Persistent Memory Support |
Feature | Value | Description |
---|---|---|
kernel-config.<option> | true | Kernel config option is enabled (set to 'y' or 'm'). Default options are NO_HZ, NO_HZ_IDLE, NO_HZ_FULL and PREEMPT |
kernel-selinux.enabled | true | SELinux is enabled on the node |
kernel-version.full | string | Full kernel version as reported by /proc/sys/kernel/osrelease (e.g. '4.5.6-7-g123abcde') |
kernel-version.major | string | First component of the kernel version (e.g. '4') |
kernel-version.minor | string | Second component of the kernel version (e.g. '5') |
kernel-version.revision | string | Third component of the kernel version (e.g. '6') |
The kernel label source is configurable, see worker configuration and sources.kernel
configuration options for details.
Feature | Value | Description |
---|---|---|
memory-numa | true | Multiple memory nodes i.e. NUMA architecture detected |
memory-nv.present | true | NVDIMM device(s) are present |
memory-nv.dax | true | NVDIMM region(s) configured in DAX mode are present |
memory-swap.enabled | true | Swap is enabled on the node |
Feature | Value | Description |
---|---|---|
network-sriov.capable | true | Single Root Input/Output Virtualization (SR-IOV) enabled Network Interface Card(s) present |
network-sriov.configured | true | SR-IOV virtual functions have been configured |
Feature | Value | Description |
---|---|---|
pci-<device label>.present | true | PCI device is detected |
pci-<device label>.sriov.capable | true | Single Root Input/Output Virtualization (SR-IOV) enabled PCI device present |
The <device label> format is configurable and is set to <class>_<vendor> by default. For more details about configuring the PCI labels, see the sources.pci options and the worker configuration instructions.
Feature | Value | Description |
---|---|---|
usb-<device label>.present | true | USB device is detected |
The <device label> format is configurable and is set to <class>_<vendor>_<device> by default. For more details about configuring the USB labels, see the sources.usb options and the worker configuration instructions.
Feature | Value | Description |
---|---|---|
storage-nonrotationaldisk | true | Non-rotational disk, like SSD, is present in the node |
Feature | Value | Description |
---|---|---|
system-os_release.ID | string | Operating system identifier |
system-os_release.VERSION_ID | string | Operating system version identifier (e.g. '6.7') |
system-os_release.VERSION_ID.major | string | First component of the OS version id (e.g. '6') |
system-os_release.VERSION_ID.minor | string | Second component of the OS version id (e.g. '7') |
The custom label source is designed for creating user defined labels. However, it has a few statically defined built-in labels:
Feature | Value | Description |
---|---|---|
custom-rdma.capable | true | The node has an RDMA capable Network adapter |
custom-rdma.enabled | true | The node has the needed RDMA modules loaded to run RDMA traffic |
NFD has many extension points for creating vendor and application specific labels. See the customization guide for detailed documentation.
NFD is able to create extended resources, see the NodeFeatureRule CRD and its extendedResources field for more details.
Note that NFD is not a replacement for the usage of device plugins.
An example use case for extended resources could be based on a custom feature (created e.g. with feature files) that exposes the node's SGX EPC memory section size. This value is then turned into an extended resource of the node, allowing Pods to request that resource and the Kubernetes scheduler to place such Pods only on nodes that have sufficient capacity of the resource left.
Features are advertised as labels in the Kubernetes Node object.
Label creation in nfd-worker is performed by a set of separate modules called label sources. The core.labelSources
configuration option (or -label-sources
flag) of nfd-worker controls which sources to enable for label generation.
All built-in labels use the feature.node.kubernetes.io
label namespace and have the following format.
feature.node.kubernetes.io/<feature> = <value>
+
NOTE: Consecutive runs of nfd-worker will update the labels on a given node. If features are not discovered on a consecutive run, the corresponding label will be removed. This includes any restrictions placed on the consecutive run, such as restricting discovered features with the -label-whitelist flag of nfd-master or the core.labelWhiteList option of nfd-worker.
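As a minimal sketch of the core.labelSources option described above, an nfd-worker configuration file that enables only some of the label sources could look like the following. The selection of sources is an arbitrary choice for illustration.

```yaml
# nfd-worker configuration (sketch): generate labels only from the
# listed sources. The selection below is illustrative, not a recommendation.
core:
  labelSources:
    - cpu
    - kernel
    - pci
```

With such a configuration, labels from sources that are not listed (for example usb or storage) would not be generated, and per the note above any previously created labels from those sources would be removed on the next run.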
Feature name | Value | Description |
---|---|---|
cpu-cpuid.<cpuid-flag> | true | CPU capability is supported. NOTE: the capability might be supported but not enabled. |
cpu-cpuid.<cpuid-attribute> | string | CPU attribute value |
cpu-hardware_multithreading | true | Hardware multithreading, such as Intel HTT, enabled (number of logical CPUs is greater than physical CPUs) |
cpu-coprocessor.nx_gzip | true | Nest Accelerator for GZIP is supported (Power). |
cpu-power.sst_bf.enabled | true | Intel SST-BF (Intel Speed Select Technology - Base frequency) enabled |
cpu-pstate.status | string | The status of the Intel pstate driver when in use and enabled, either 'active' or 'passive'. |
cpu-pstate.turbo | bool | Set to 'true' if turbo frequencies are enabled in the Intel pstate driver, set to 'false' if they have been disabled. |
cpu-pstate.scaling_governor | string | The value of the Intel pstate scaling_governor when in use, either 'powersave' or 'performance'. |
cpu-cstate.enabled | bool | Set to 'true' if cstates are set in the intel_idle driver, otherwise set to 'false'. Unset if the intel_idle cpuidle driver is not active. |
cpu-security.sgx.enabled | true | Set to 'true' if Intel SGX is enabled in BIOS (based on a non-zero sum value of SGX EPC section sizes). |
cpu-security.se.enabled | true | Set to 'true' if IBM Secure Execution for Linux (IBM Z & LinuxONE) is available and enabled (requires the /sys/firmware/uv/prot_virt_host facility). |
cpu-security.tdx.enabled | true | Set to 'true' if Intel TDX is available on the host and has been enabled (requires /sys/module/kvm_intel/parameters/tdx). |
cpu-security.tdx.protected | true | Set to 'true' if Intel TDX was used to start the guest node, based on the existence of the "TDX_GUEST" information as part of cpuid features. |
cpu-security.sev.enabled | true | Set to 'true' if AMD SEV is available on the host and has been enabled (requires /sys/module/kvm_amd/parameters/sev). |
cpu-security.sev.es.enabled | true | Set to 'true' if AMD SEV-ES is available on the host and has been enabled (requires /sys/module/kvm_amd/parameters/sev_es). |
cpu-security.sev.snp.enabled | true | Set to 'true' if AMD SEV-SNP is available on the host and has been enabled (requires /sys/module/kvm_amd/parameters/sev_snp). |
cpu-model.vendor_id | string | Comparable CPU vendor ID. |
cpu-model.family | int | CPU family. |
cpu-model.id | int | CPU model number. |
The CPU label source is configurable, see worker configuration and sources.cpu
configuration options for details.
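To illustrate how the CPU labels from the table above can be consumed, the following sketch shows a Pod that is scheduled only on nodes advertising the AVX512F cpuid flag. The Pod name and container image are placeholders, not part of NFD.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: avx512-workload                        # hypothetical name
spec:
  nodeSelector:
    # Label values are strings, so "true" must be quoted.
    feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
```

As the table notes, the label only says that the capability is supported by the CPU; it may still be disabled on the node.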
Flag | Description |
---|---|
ADX | Multi-Precision Add-Carry Instruction Extensions (ADX) |
AESNI | Advanced Encryption Standard (AES) New Instructions (AES-NI) |
APX_F | Intel Advanced Performance Extensions (APX) |
AVX10 | Intel Advanced Vector Extensions 10 (AVX10) |
AVX10_256, AVX10_512 | Intel AVX10 256-bit and 512-bit vector support |
AVX | Advanced Vector Extensions (AVX) |
AVX2 | Advanced Vector Extensions 2 (AVX2) |
AVXIFMA | AVX-IFMA instructions |
AVXVNNI | AVX (VEX encoded) VNNI neural network instructions |
AMXBF16 | Advanced Matrix Extension, tile multiplication operations on BFLOAT16 numbers |
AMXINT8 | Advanced Matrix Extension, tile multiplication operations on 8-bit integers |
AMXFP16 | Advanced Matrix Extension, tile multiplication operations on FP16 numbers |
AMXFP8 | Advanced Matrix Extension, tile multiplication operations on FP8 numbers |
AMXTILE | Advanced Matrix Extension, base tile architecture support |
AVX512BF16 | AVX-512 BFLOAT16 instructions |
AVX512BITALG | AVX-512 bit Algorithms |
AVX512BW | AVX-512 byte and word Instructions |
AVX512CD | AVX-512 conflict detection instructions |
AVX512DQ | AVX-512 doubleword and quadword instructions |
AVX512ER | AVX-512 exponential and reciprocal instructions |
AVX512F | AVX-512 foundation |
AVX512FP16 | AVX-512 FP16 instructions |
AVX512IFMA | AVX-512 integer fused multiply-add instructions |
AVX512PF | AVX-512 prefetch instructions |
AVX512VBMI | AVX-512 vector bit manipulation instructions |
AVX512VBMI2 | AVX-512 vector bit manipulation instructions, version 2 |
AVX512VL | AVX-512 vector length extensions |
AVX512VNNI | AVX-512 vector neural network instructions |
AVX512VP2INTERSECT | AVX-512 intersect for D/Q |
AVX512VPOPCNTDQ | AVX-512 vector population count doubleword and quadword |
AVXNECONVERT | AVX-NE-CONVERT instructions |
AVXVNNIINT8 | AVX-VNNI-INT8 instructions |
AVXVNNIINT16 | AVX-VNNI-INT16 instructions |
CMPCCXADD | CMPCCXADD instructions |
ENQCMD | Enqueue Command |
GFNI | Galois Field New Instructions |
HYPERVISOR | Running under hypervisor |
MSRLIST | Read/Write List of Model Specific Registers |
PREFETCHI | PREFETCHIT0/1 instructions |
VAES | AVX-512 vector AES instructions |
VPCLMULQDQ | Carry-less multiplication quadword |
WRMSRNS | Non-Serializing Write to Model Specific Register |
By default, the following CPUID flags have been blacklisted: AVX10 (use AVX10_VERSION instead), BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SGX, SSE, SSE2, SSE3, SSE4, SSE42, SSSE3 and TDX_GUEST. See sources.cpu
configuration options to change the behavior.
See the full list in github.com/klauspost/cpuid.
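A sketch of changing the default cpuid filtering through the sources.cpu options of the nfd-worker configuration is shown below. The option names attributeBlacklist and attributeWhitelist are given here as understood from the nfd-worker configuration reference; verify them against the reference for the NFD version in use. The flags chosen are only examples.

```yaml
# nfd-worker configuration (sketch): control which cpuid flags are
# published as labels. Flag selections are illustrative.
sources:
  cpu:
    cpuid:
      # Flags listed here are not published as labels.
      attributeBlacklist:
        - "SSE"
        - "SSE2"
      # If set, only the listed flags are published
      # (the whitelist takes priority over the blacklist).
      attributeWhitelist:
        - "AVX512F"
        - "AVX512VNNI"
```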
Attribute | Description |
---|---|
AVX10_VERSION | AVX10 vector ISA version (if supported) |
Flag | Description |
---|---|
IDIVA | Integer divide instructions available in ARM mode |
IDIVT | Integer divide instructions available in Thumb mode |
THUMB | Thumb instructions |
FASTMUL | Fast multiplication |
VFP | Vector floating point instruction extension (VFP) |
VFPv3 | Vector floating point extension v3 |
VFPv4 | Vector floating point extension v4 |
VFPD32 | VFP with 32 D-registers |
HALF | Half-word loads and stores |
EDSP | DSP extensions |
NEON | NEON SIMD instructions |
LPAE | Large Physical Address Extensions |
Flag | Description |
---|---|
AES | Announcing the Advanced Encryption Standard |
EVSTRM | Event Stream Frequency Features |
FPHP | Half Precision (16-bit) Floating Point Data Processing Instructions |
ASIMDHP | Half Precision (16-bit) ASIMD Data Processing Instructions |
ATOMICS | Atomic Instructions to the A64 |
ASIMRDM | Support for Rounding Double Multiply Add/Subtract |
PMULL | Optional Cryptographic and CRC32 Instructions |
JSCVT | Perform Conversion to Match JavaScript |
DCPOP | Persistent Memory Support |
Feature | Value | Description |
---|---|---|
kernel-config.<option> | true | Kernel config option is enabled (set to 'y' or 'm'). Default options are NO_HZ, NO_HZ_IDLE, NO_HZ_FULL and PREEMPT |
kernel-selinux.enabled | true | SELinux is enabled on the node |
kernel-version.full | string | Full kernel version as reported by /proc/sys/kernel/osrelease (e.g. '4.5.6-7-g123abcde') |
kernel-version.major | string | First component of the kernel version (e.g. '4') |
kernel-version.minor | string | Second component of the kernel version (e.g. '5') |
kernel-version.revision | string | Third component of the kernel version (e.g. '6') |
The kernel label source is configurable, see worker configuration and sources.kernel
configuration options for details.
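As a hedged example of the sources.kernel configuration, the sketch below replaces the default set of kernel config options with a custom list. The configOpts option name is taken from the nfd-worker configuration reference as understood here, and the listed options are only examples.

```yaml
# nfd-worker configuration (sketch): kernel config options to publish
# as kernel-config.<option> labels. The list is illustrative.
sources:
  kernel:
    configOpts:
      - "NO_HZ"
      - "PREEMPT"
      - "X86"
```

With this configuration, a node whose kernel has X86 enabled would be expected to get the label feature.node.kubernetes.io/kernel-config.X86=true, following the label format described above.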
Feature | Value | Description |
---|---|---|
memory-numa | true | Multiple memory nodes i.e. NUMA architecture detected |
memory-nv.present | true | NVDIMM device(s) are present |
memory-nv.dax | true | NVDIMM region(s) configured in DAX mode are present |
memory-swap.enabled | true | Swap is enabled on the node |
Feature | Value | Description |
---|---|---|
network-sriov.capable | true | Single Root Input/Output Virtualization (SR-IOV) enabled Network Interface Card(s) present |
network-sriov.configured | true | SR-IOV virtual functions have been configured |
Feature | Value | Description |
---|---|---|
pci-<device label>.present | true | PCI device is detected |
pci-<device label>.sriov.capable | true | Single Root Input/Output Virtualization (SR-IOV) enabled PCI device present |
The <device label> format is configurable and is set to <class>_<vendor> by default. For more details about configuring the PCI labels, see the sources.pci options and the worker configuration instructions.
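The following sketch shows how the PCI device label format could be changed from the default <class>_<vendor> to include the device ID as well. The option names deviceClassWhitelist and deviceLabelFields are given as understood from the nfd-worker configuration reference; the class codes are only examples.

```yaml
# nfd-worker configuration (sketch): which PCI device classes to label
# and which fields make up <device label>. Values are illustrative.
sources:
  pci:
    deviceClassWhitelist:
      - "0200"   # Ethernet controllers
      - "03"     # display controllers
    deviceLabelFields:
      - "class"
      - "vendor"
      - "device"
```

With these settings the resulting labels would take the form pci-<class>_<vendor>_<device>.present.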
Feature | Value | Description |
---|---|---|
usb-<device label>.present | true | USB device is detected |
The <device label> format is configurable and is set to <class>_<vendor>_<device> by default. For more details about configuring the USB labels, see the sources.usb options and the worker configuration instructions.
Feature | Value | Description |
---|---|---|
storage-nonrotationaldisk | true | Non-rotational disk, like SSD, is present in the node |
Feature | Value | Description |
---|---|---|
system-os_release.ID | string | Operating system identifier |
system-os_release.VERSION_ID | string | Operating system version identifier (e.g. '6.7') |
system-os_release.VERSION_ID.major | string | First component of the OS version id (e.g. '6') |
system-os_release.VERSION_ID.minor | string | Second component of the OS version id (e.g. '7') |
The custom label source is designed for creating user defined labels. However, it has a few statically defined built-in labels:
Feature | Value | Description |
---|---|---|
custom-rdma.capable | true | The node has an RDMA capable Network adapter |
custom-rdma.enabled | true | The node has the needed RDMA modules loaded to run RDMA traffic |
NFD has many extension points for creating vendor and application specific labels. See the customization guide for detailed documentation.
NFD is able to create extended resources, see the NodeFeatureRule CRD and its extendedResources field for more details.
Note that NFD is not a replacement for the usage of device plugins.
An example use case for extended resources could be based on a custom feature (created e.g. with feature files) that exposes the node's SGX EPC memory section size. This value is then turned into an extended resource of the node, allowing Pods to request that resource and the Kubernetes scheduler to place such Pods only on nodes that have sufficient capacity of the resource left.
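To make the consuming side of this use case concrete, here is a sketch of a Pod requesting an extended resource. The resource name vendor.example/sgx-epc is hypothetical and stands for whatever name a NodeFeatureRule would advertise; the Pod name and image are placeholders too.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: epc-consumer                           # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # placeholder image
      resources:
        limits:
          # Hypothetical extended resource name. Extended resources are
          # requested in whole-number quantities.
          vendor.example/sgx-epc: "1"
```

The scheduler then only considers nodes whose remaining capacity of that resource covers the request.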
Image Compatibility is in the experimental v1alpha1
version.
Image compatibility metadata enables container image authors to define their image requirements using Node Feature Rules. This complementary solution allows features discovered on nodes to be matched directly from images. As a result, container requirements become discoverable and programmable, supporting various consumers and use cases where applications need a specific compatible environment.
The compatibility specification is a list of compatibility objects that contain Node Feature Rules, along with additional fields to control the execution of validation between the image and the host.
version - string
This REQUIRED property specifies the version of the API in use.
compatibilities - array of object
This REQUIRED property is a list of compatibility sets.
rules - object
This REQUIRED property is a reference to the spec of the NodeFeatureRule API. The spec allows image requirements to be described using the features discovered from NFD sources. For more details, please refer to the documentation.
weight - int
This OPTIONAL property specifies the node affinity weight.
tag - string
This OPTIONAL property allows for the grouping or separation of compatibility sets.
description - string
This OPTIONAL property provides a brief description of a compatibility set.
version: v1alpha1
+ Image Compatibility Artifact · Node Feature Discovery
Image Compatibility Artifact
Table of contents
Image Compatibility
Image Compatibility is in the experimental v1alpha1
version.
Image compatibility metadata enables container image authors to define their image requirements using Node Feature Rules. This complementary solution allows features discovered on nodes to be matched directly from images. As a result, container requirements become discoverable and programmable, supporting various consumers and use cases where applications need a specific compatible environment.
Compatibility Specification
The compatibility specification is a list of compatibility objects that contain Node Feature Rules, along with additional fields to control the execution of validation between the image and the host.
Schema
-
version - string
This REQUIRED property specifies the version of the API in use.
-
compatibilities - array of object
This REQUIRED property is a list of compatibility sets.
-
rules - object
This REQUIRED property is a reference to the spec of the NodeFeatureRule API. The spec allows image requirements to be described using the features discovered from NFD sources. For more details, please refer to the documentation.
-
weight - int
This OPTIONAL property specifies the node affinity weight.
-
tag - string
This OPTIONAL property allows for the grouping or separation of compatibility sets.
-
description - string
This OPTIONAL property provides a brief description of a compatibility set.
Example
version: v1alpha1
compatibilities:
- description: "My image requirements"
rules:
@@ -117,4 +117,4 @@
path: "<path-to-registry-public-certs>"
type: ""
name: certs
-
Node Feature Discovery master
\ No newline at end of file
+
Usage instructions.
Developer Preview This feature is currently in developer preview and subject to change. It is not recommended to use it in production environments.
The kubectl
plugin kubectl nfd
can be used to validate/dryrun and test NodeFeatureRule objects. It can be installed with the following command:
git clone https://github.com/kubernetes-sigs/node-feature-discovery
+ Kubectl plugin · Node Feature Discovery
Kubectl plugin
Table of contents
Developer Preview This feature is currently in developer preview and subject to change. It is not recommended to use it in production environments.
Overview
The kubectl
plugin kubectl nfd
can be used to validate/dryrun and test NodeFeatureRule objects. It can be installed with the following command:
git clone https://github.com/kubernetes-sigs/node-feature-discovery
cd node-feature-discovery
make build-kubectl-nfd
KUBECTL_PATH=/usr/local/bin/
@@ -13,4 +13,4 @@
*** Labels ***
vendor.io/my-sample-feature=true
NodeFeatureRule "examples/nodefeaturerule.yaml" is valid for NodeFeature "examples/nodefeature.yaml"
-
Node Feature Discovery master
\ No newline at end of file
+
NFD-GC (NFD Garbage-Collector) is preferably run as a Kubernetes deployment with one replica. It makes sure that all NodeFeature and NodeResourceTopology objects have corresponding nodes and removes stale objects for non-existent nodes.
The daemon watches for Node deletion events and removes the corresponding NodeFeature and NodeResourceTopology objects when they occur. It also runs periodically to make sure that no node delete event was missed and to remove any NodeFeature or NodeResourceTopology objects that were created without a corresponding node. The default garbage collector interval is 1h, which is the value used when no -gc-interval is specified.
In Helm deployments see garbage collector parameters for altering the nfd-gc configuration.
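For deployments managed without Helm, the interval can be set on the container command line. The fragment below is a sketch of the relevant part of a Deployment spec only; the container name, the image placeholder and the nfd-gc command name are assumptions for illustration, and the 30m value is arbitrary.

```yaml
# Fragment of a Deployment spec (sketch), not a complete manifest.
spec:
  replicas: 1                       # a single replica, as recommended above
  template:
    spec:
      containers:
        - name: nfd-gc              # assumed container name
          image: <nfd-image>        # placeholder, fill in the NFD image in use
          command: ["nfd-gc"]       # assumed binary name
          args:
            - "-gc-interval=30m"    # run periodic garbage collection every 30 minutes
```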
NFD-Master is responsible for connecting to the Kubernetes API server and updating node objects. More specifically, it modifies node labels, taints and extended resources based on requests from nfd-workers and 3rd party extensions.
The NodeFeature Controller uses NodeFeature objects as the input for the NodeFeatureRule processing pipeline. In addition, any labels listed in the NodeFeature object are created on the node (note the allowed label namespaces are controlled).
NFD-Master acts as the controller for NodeFeatureRule objects. It applies the rules specified in NodeFeatureRule objects on raw feature data and creates node labels accordingly. The feature data used as the input is received from nfd-worker instances through NodeFeature objects.
NFD-Master supports configuration through a configuration file. The default location is /etc/kubernetes/node-feature-discovery/nfd-master.conf, but this can be changed by specifying the -config command line flag.
The master configuration file is read inside the container, so Volumes and VolumeMounts are needed to make your configuration available to NFD. The preferred method is to use a ConfigMap, which provides easy deployment and re-configurability.
The provided deployment methods (Helm and Kustomize) create an empty configmap and mount it inside the nfd-master containers.
In Helm deployments, the master pod parameter master.config can be used to edit the respective configuration.
In Kustomize deployments, modify the nfd-master-conf
ConfigMap with a custom overlay.
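As a sketch of the Helm route, a values file that sets the nfd-master configuration through master.config might look like this. The extraLabelNs option is used as an example of a master configuration option as understood from the configuration file reference, and vendor.example is a hypothetical label namespace.

```yaml
# Helm values (sketch): the content of master.config is rendered into
# the nfd-master configuration file.
master:
  config:
    # Allow labels in an additional, hypothetical namespace.
    extraLabelNs:
      - "vendor.example"
```

Such a file would be passed to helm install or helm upgrade with the -f option. Because run-time reconfiguration is not supported, the change takes effect once the nfd-master pods have been restarted.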
NOTE: dynamic run-time reconfiguration was dropped in NFD v0.17. Re-configuration is handled by pod restarts.
See nfd-master configuration file reference for more details. The (empty-by-default) example config contains all available configuration options and can be used as a reference for creating a configuration.
NFD-Master runs as a deployment. By default, it prefers running on the cluster's master nodes but will run on worker nodes if no master nodes are found.
For High Availability, you should increase the replica count of the deployment object. You should also look into adding inter-pod affinity to prevent masters from running on the same node. Note, however, that inter-pod affinity is costly and is not recommended in bigger clusters.
Note: When NFD-Master is intended to run with more than one replica, it is advised to use the -enable-leader-election flag. This flag turns on leader election for NFD-Master and lets only one replica act on changes in NodeFeature and NodeFeatureRule objects.
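The high-availability advice above can be sketched as the following Deployment fragment. It uses pod anti-affinity (the "keep apart" form of inter-pod affinity) as a soft preference. The app: nfd-master label, the container name, the image placeholder and the command name are assumptions for illustration.

```yaml
# Fragment of a Deployment spec (sketch), not a complete manifest.
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: nfd-master                  # assumed pod label
    spec:
      affinity:
        podAntiAffinity:
          # Prefer, but do not require, spreading replicas across nodes.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: nfd-master
      containers:
        - name: nfd-master               # assumed container name
          image: <nfd-image>             # placeholder, fill in the NFD image in use
          command: ["nfd-master"]        # assumed binary name
          args:
            - "-enable-leader-election"  # only the elected replica acts on changes
```

A preferred rule is used rather than a required one so that replicas can still be scheduled when too few nodes are available.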
If you have RBAC authorization enabled (as is the default e.g. with clusters initialized with kubeadm) you need to configure the appropriate ClusterRoles, ClusterRoleBindings and a ServiceAccount for NFD to create node labels. The provided template will configure these for you.
NFD-Topology-Updater is preferably run as a Kubernetes DaemonSet. This ensures re-examination at regular intervals and/or on pod life-cycle events, capturing changes in the allocated resources, and hence the allocatable resources, on a per-zone basis by updating NodeResourceTopology custom resources. It makes sure that a new NodeResourceTopology instance is created for each new node that gets added to the cluster.
Because of the design and implementation of Kubernetes, only resources exclusively allocated to Guaranteed Quality of Service pods will be accounted. This includes CPU cores, memory and devices.
When run as a daemonset, nodes are re-examined for the allocated resources (to determine the information of the allocatable resources on a per-zone basis where a zone can be a NUMA node) at an interval specified using the -sleep-interval
option. The default sleep interval is set to 60s which is the value when no -sleep-interval is specified. The re-examination can be disabled by setting the sleep-interval to 0.
Another option is to configure the updater to update the allocated resources on pod life-cycle events. The updater will monitor the checkpoint file stated in -kubelet-state-dir
and trigger an update for every change that occurs in the files.
In addition, it can avoid examining specific allocated resources given a configuration of resources to exclude via -excludeList.
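A sketch of such an exclusion list in the topology updater configuration file is shown below. It assumes the list is given under an excludeList key that maps node names to resource names, with '*' matching all nodes; check the nfd-topology-updater configuration file reference for the exact format. The node name is hypothetical.

```yaml
# nfd-topology-updater configuration (sketch, assumed format).
excludeList:
  # Do not account for 2Mi hugepages on any node.
  '*': [hugepages-2Mi]
  # Additionally skip CPU accounting on one (hypothetical) node.
  node-a: [cpu]
```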
Kubelet PodResource API with the GetAllocatableResources functionality enabled is a prerequisite for nfd-topology-updater to be able to run (i.e. Kubernetes v1.21 or later is required).
Before Kubernetes v1.23, the kubelet must be started with --feature-gates=KubeletPodResourcesGetAllocatable=true.
Starting from Kubernetes v1.23, the KubeletPodResourcesGetAllocatable feature gate is enabled by default.
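On clusters older than v1.23 where the kubelet is configured through a configuration file rather than command line flags, the same feature gate can be enabled as in this sketch.

```yaml
# Kubelet configuration file (sketch) for Kubernetes versions before v1.23.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletPodResourcesGetAllocatable: true
```

The kubelet has to be restarted for the change to take effect.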
NFD-Topology-Updater supports configuration through a configuration file. The default location is /etc/kubernetes/node-feature-discovery/topology-updater.conf, but this can be changed by specifying the -config command line flag.
The topology updater configuration file is read inside the container, so Volumes and VolumeMounts are needed to make your configuration available to NFD. The preferred method is to use a ConfigMap, which provides easy deployment and re-configurability.
The provided deployment templates create an empty configmap and mount it inside the nfd-topology-updater containers.
In Helm deployments, the Topology Updater parameter topologyUpdater.config can be used to edit the respective configuration.
In Kustomize deployments, modify the nfd-worker-conf ConfigMap with a custom overlay.
See nfd-topology-updater configuration file reference for more details. The (empty-by-default) example config contains all available configuration options and can be used as a reference for creating a configuration.
NFD-Worker is preferably run as a Kubernetes DaemonSet. This ensures re-labeling at regular intervals, capturing changes in the system configuration, and makes sure that new nodes are labeled as they are added to the cluster. The worker connects to the nfd-master service to advertise hardware features.
When run as a DaemonSet, nodes are re-labeled at a default interval of 60s. This can be changed with the core.sleepInterval config option.
NFD-Worker supports configuration through a configuration file. The default location is /etc/kubernetes/node-feature-discovery/nfd-worker.conf, but this can be changed with the -config command line flag. The configuration file is re-read whenever it is modified, which makes run-time re-configuration of nfd-worker straightforward.
The worker configuration file is read inside the container, so Volumes and VolumeMounts are needed to make your configuration available to NFD. The preferred method is to use a ConfigMap, which provides easy deployment and re-configurability.
The provided deployment methods (Helm and Kustomize) create an empty ConfigMap and mount it inside the nfd-worker containers.
In Helm deployments, the worker pod parameter worker.config can be used to edit the respective configuration.
In Kustomize deployments, modify the nfd-worker-conf ConfigMap with a custom overlay.
NOTE: dynamic run-time reconfiguration was dropped in NFD v0.17. Re-configuration is handled by pod restarts.
See nfd-worker configuration file reference for more details. The (empty-by-default) example config contains all available configuration options and can be used as a reference for creating a configuration.
Configuration options can also be specified via the -options command line flag, in which case no mounts are needed. The same format as in the config file must be used, i.e. JSON (or YAML). For example:
-options='{"sources": { "pci": { "deviceClassWhitelist": ["12"] } } }'
Configuration options specified from the command line will override those read from the config file.
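Hand-writing the JSON for -options is error-prone once quoting is involved. The following Python sketch shows one way to render a configuration dictionary as a shell-safe argument; the helper name `options_flag` is hypothetical and not part of NFD, and the dictionary simply mirrors the example above.

```python
import json
import shlex


def options_flag(options: dict) -> str:
    """Render a config dict as a shell-safe -options=... argument."""
    # Compact JSON keeps the argument short; NFD accepts JSON (or YAML) here.
    payload = json.dumps(options, separators=(",", ":"))
    # shlex.quote wraps the payload in single quotes for POSIX shells.
    return "-options=" + shlex.quote(payload)


flag = options_flag({"sources": {"pci": {"deviceClassWhitelist": ["12"]}}})
print(flag)
# prints: -options='{"sources":{"pci":{"deviceClassWhitelist":["12"]}}}'
```

When the arguments are passed as a list in a container spec rather than through a shell, the shell quoting is unnecessary and the bare JSON string can be used.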
Using node labels
Nodes with specific features can be targeted using the nodeSelector field. The following example shows how to target nodes with Intel TurboBoost enabled.
apiVersion: v1
kind: Pod
metadata:
labels:
@@ -10,4 +10,4 @@
name: go1
nodeSelector:
feature.node.kubernetes.io/cpu-pstate.turbo: 'true'
For more details on targeting nodes, see node selection.
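The same nodeSelector pattern can be generated programmatically. Below is a minimal Python sketch that builds a Pod manifest pinned to a given NFD label; the helper `pod_for_label`, the pod name `turbo-demo` and the `busybox` image are illustrative assumptions, not values from the NFD documentation.

```python
import json

# Label taken from the example above.
TURBO_LABEL = "feature.node.kubernetes.io/cpu-pstate.turbo"


def pod_for_label(name: str, image: str, label: str, value: str = "true") -> dict:
    """Build a minimal Pod manifest that only schedules on nodes with label=value."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            # nodeSelector values must be strings, hence "true" rather than True.
            "nodeSelector": {label: value},
        },
    }


# Hypothetical pod name and image, for illustration only.
manifest = pod_for_label("turbo-demo", "busybox", TURBO_LABEL)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests as well as YAML, the printed output can be piped to `kubectl apply -f -`.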