Merge pull request #11223 from fabriziopandini/refactor-infra-machine-contract

📖 Refactor InfraMachine contract
k8s-ci-robot authored Sep 27, 2024
2 parents 274d7e2 + de9897b commit 187f385
Showing 5 changed files with 525 additions and 294 deletions.
4 changes: 2 additions & 2 deletions docs/book/src/developer/core/controllers/cluster.md
@@ -5,7 +5,7 @@ The Cluster controller is responsible for reconciling the Cluster resource.
In order to allow Cluster provisioning on different types of infrastructure, the Cluster resource references
an InfraCluster object, e.g. AWSCluster, GCPCluster etc.

The [InfraCluster resource contract](../../providers/contracts/infra-cluster.md) defines a set of rules a provider is expected to comply in order to allow
The [InfraCluster resource contract](../../providers/contracts/infra-cluster.md) defines a set of rules a provider is expected to comply with in order to allow
the expected interactions with the Cluster controller.

Among those rules:
@@ -18,7 +18,7 @@ Among those rules:
Similarly, in order to support different solutions for control plane management, the Cluster resource references
a ControlPlane object, e.g. KubeadmControlPlane, EKSControlPlane etc.

The [ControlPlane resource contract](../../providers/contracts/control-plane.md) defines a set of rules a provider is expected to comply in order to allow
The [ControlPlane resource contract](../../providers/contracts/control-plane.md) defines a set of rules a provider is expected to comply with in order to allow
the expected interactions with the Cluster controller.

Considering all the info above, the Cluster controller's main responsibilities are:
152 changes: 33 additions & 119 deletions docs/book/src/developer/core/controllers/machine.md
@@ -1,20 +1,39 @@
# Machine Controller
# Machine Controller

![](../../../images/cluster-admission-machine-controller.png)
The Machine controller is responsible for reconciling the Machine resource.

In order to allow Machine provisioning on different types of infrastructure, the Machine resource references
an InfraMachine object, e.g. AWSMachine, GCPMachine etc.

The [InfraMachine resource contract](../../providers/contracts/infra-machine.md) defines a set of rules a provider is expected to comply with in order to allow
the expected interactions with the Machine controller.

Among those rules:
- InfraMachine MUST report a [provider ID](../../providers/contracts/infra-machine.md#inframachine-provider-id) for the Machine
- InfraMachine SHOULD take into account the [failure domain](../../providers/contracts/infra-machine.md#inframachine-failure-domain) in which machines should be placed
- InfraMachine SHOULD surface the machine's [addresses](../../providers/contracts/infra-machine.md#inframachine-addresses) to help operators when troubleshooting issues
- InfraMachine MUST report when the Machine's infrastructure is [fully provisioned](../../providers/contracts/infra-machine.md#inframachine-initialization-completed)
- InfraMachine SHOULD report [conditions](../../providers/contracts/infra-machine.md#inframachine-conditions)
- InfraMachine SHOULD report [terminal failures](../../providers/contracts/infra-machine.md#inframachine-terminal-failures)
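
As a rough illustration of the rules above, a provider's InfraMachine types might look like the following sketch. `FooMachine`, `FooMachineSpec`, `FooMachineStatus` and `reportsProvisioned` are hypothetical names, and `MachineAddress` is redeclared locally only to keep the example self-contained; the linked contract remains the authoritative description of each field.

```go
package main

import "fmt"

// MachineAddress is a local stand-in for the Cluster API address type,
// declared here only so the sketch compiles on its own.
type MachineAddress struct {
	// Type is one of Hostname, ExternalIP, InternalIP, ExternalDNS, InternalDNS.
	Type    string `json:"type"`
	Address string `json:"address"`
}

// FooMachineSpec is a hypothetical InfraMachine spec.
type FooMachineSpec struct {
	// ProviderID identifies the machine at the infrastructure provider.
	// +optional
	ProviderID *string `json:"providerID,omitempty"`

	// FailureDomain is the failure domain the machine should be placed in.
	// +optional
	FailureDomain *string `json:"failureDomain,omitempty"`
}

// FooMachineStatus is a hypothetical InfraMachine status.
type FooMachineStatus struct {
	// Ready denotes that the foo machine infrastructure is fully provisioned.
	// +optional
	Ready bool `json:"ready"`

	// Addresses lists the addresses assigned to the machine.
	// +optional
	Addresses []MachineAddress `json:"addresses,omitempty"`
}

// FooMachine is a hypothetical InfraMachine object (metadata omitted).
type FooMachine struct {
	Spec   FooMachineSpec
	Status FooMachineStatus
}

// reportsProvisioned is an illustrative helper, not part of the contract:
// it is true only when a non-empty provider ID is set and the status is ready.
func reportsProvisioned(m FooMachine) bool {
	return m.Spec.ProviderID != nil && *m.Spec.ProviderID != "" && m.Status.Ready
}

func main() {
	id := "foo://instance-1" // made-up provider ID format
	m := FooMachine{
		Spec:   FooMachineSpec{ProviderID: &id},
		Status: FooMachineStatus{Ready: true},
	}
	fmt.Println(reportsProvisioned(m))
}
```

Pointer fields are used for the optional spec values so that "not set" can be told apart from an empty string; that is a design choice of this sketch rather than a requirement stated here.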

Similarly, in order to support different machine bootstrappers, the Machine resource references
a BootstrapConfig object, e.g. KubeadmConfig etc.

The [BootstrapConfig resource contract](../../providers/contracts/bootstrap-config.md) defines a set of rules a provider is expected to comply with in order to allow
the expected interactions with the Machine controller.

The Machine controller's main responsibilities are:
Considering all the info above, the Machine controller's main responsibilities are:

* Setting an OwnerReference on:
    * Each Machine object to the Cluster object.
    * The associated BootstrapConfig object.
    * The associated InfrastructureMachine object.
* Copy data from `BootstrapConfig.Status.DataSecretName` to `Machine.Spec.Bootstrap.DataSecretName` if
`Machine.Spec.Bootstrap.DataSecretName` is empty.
* Setting NodeRefs to be able to associate machines and Kubernetes nodes.
* Deleting Nodes in the target cluster when the associated machine is deleted.
* Cleanup of related objects.
* Keeping the Machine's Status object up to date with the InfrastructureMachine's Status object.
* Finding Kubernetes nodes matching the expected providerID in the workload cluster.
* Setting an OwnerReference on the infrastructure object referenced in `Machine.spec.infrastructureRef`.
* Setting an OwnerReference on the bootstrap object referenced in `Machine.spec.bootstrap.configRef`.
* Keeping the Machine's status in sync with the InfraMachine and BootstrapConfig's status.
* Finding Kubernetes nodes matching the expected providerID in the workload cluster.
* Setting NodeRefs to be able to associate machines and Kubernetes nodes.
* Monitoring Kubernetes nodes and propagating labels to them.
* Cleanup of all owned objects so that nothing is dangling after deletion.
* Draining nodes and waiting for volumes to be detached by CSI plugins.

![](../../../images/cluster-admission-machine-controller.png)

After the machine controller sets the OwnerReferences on the associated objects, it waits for the bootstrap
and infrastructure objects referenced by the machine to have the `Status.Ready` field set to `true`. When
@@ -25,108 +44,3 @@ The machine controller uses the kubeconfig for the new workload cluster to watch
When a node appears with `Node.Spec.ProviderID` matching `Machine.Spec.ProviderID`, the machine controller
transitions the associated machine into the `Provisioned` state. When the infrastructure ref is also
`Ready`, the machine controller marks the machine as `Running`.
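
The node matching and the two transitions described above can be sketched as follows. This is a simplified illustration and not the controller's actual code: the `node` type and the `findNodeByProviderID` and `phaseFor` functions are hypothetical, and any phases that come before a matching node appears are deliberately not modeled.

```go
package main

import "fmt"

// node is a minimal stand-in for a Kubernetes Node (hypothetical type).
type node struct {
	Name       string
	ProviderID string
}

// findNodeByProviderID returns the first node whose provider ID equals the
// one expected by the machine. An empty provider ID never matches.
func findNodeByProviderID(nodes []node, providerID string) (node, bool) {
	if providerID == "" {
		return node{}, false
	}
	for _, n := range nodes {
		if n.ProviderID == providerID {
			return n, true
		}
	}
	return node{}, false
}

// phaseFor covers only the transitions named in the text: a matching node
// moves the machine to Provisioned, and a ready infrastructure object on top
// of that moves it to Running. An empty string means "no matching node yet".
func phaseFor(nodes []node, machineProviderID string, infraReady bool) string {
	if _, ok := findNodeByProviderID(nodes, machineProviderID); !ok {
		return ""
	}
	if infraReady {
		return "Running"
	}
	return "Provisioned"
}

func main() {
	nodes := []node{{Name: "worker-0", ProviderID: "foo://instance-1"}}
	fmt.Println(phaseFor(nodes, "foo://instance-1", false))
	fmt.Println(phaseFor(nodes, "foo://instance-1", true))
}
```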

## Contracts

### Cluster API

Cluster associations are made via labels.

#### Expected labels

| what | label | value | meaning |
| --- | --- | --- | --- |
| Machine | `cluster.x-k8s.io/cluster-name` | `<cluster-name>` | Identify a machine as belonging to a cluster with the name `<cluster-name>`|
| Machine | `cluster.x-k8s.io/control-plane` | `true` | Identifies a machine as a control-plane node |
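
A minimal sketch of how the labels in the table could be checked, assuming the label keys and the `true` value exactly as listed above. The helper names `belongsToCluster` and `isControlPlane` are hypothetical and exist only for this example.

```go
package main

import "fmt"

const (
	// Label keys as listed in the table above.
	clusterNameLabel  = "cluster.x-k8s.io/cluster-name"
	controlPlaneLabel = "cluster.x-k8s.io/control-plane"
)

// belongsToCluster reports whether the labels associate an object with the
// named cluster. A missing label never matches.
func belongsToCluster(labels map[string]string, clusterName string) bool {
	name, ok := labels[clusterNameLabel]
	return ok && name == clusterName
}

// isControlPlane reports whether the labels mark a control-plane machine,
// using the value "true" from the table above.
func isControlPlane(labels map[string]string) bool {
	return labels[controlPlaneLabel] == "true"
}

func main() {
	labels := map[string]string{
		clusterNameLabel:  "my-cluster",
		controlPlaneLabel: "true",
	}
	fmt.Println(belongsToCluster(labels, "my-cluster"), isControlPlane(labels))
}
```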

### Bootstrap provider

The BootstrapConfig object **must** have a `status` object.

To override the bootstrap provider, a user (or external system) can directly set the `Machine.Spec.Bootstrap.Data`
field. This will mark the machine as ready for bootstrapping and no bootstrap data will be copied from the
BootstrapConfig object.

#### Required `status` fields

The `status` object **must** have several fields defined:

* `ready` - a boolean field indicating the bootstrap config data is generated and ready for use.
* `dataSecretName` - a string field referencing the name of the secret that stores the generated bootstrap data.

#### Optional `status` fields

The `status` object **may** define several fields that do not affect functionality if missing:

* `failureReason` - a string field explaining why a fatal error has occurred, if possible.
* `failureMessage` - a string field that holds the message contained by the error.

Note: once either `failureReason` or `failureMessage` surfaces on the machine that references the bootstrap config object,
it cannot be cleared (it is considered a terminal error; the only way to recover is to delete and recreate the machine).
Also, if the machine is under control of a MachineHealthCheck instance, the machine will be automatically remediated.

Example:

```yaml
kind: MyBootstrapProviderConfig
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
status:
  ready: true
  dataSecretName: "MyBootstrapSecret"
```

### Infrastructure provider

The InfrastructureMachine object **must** have both `spec` and `status` objects.

#### Required `spec` fields

The `spec` object **must** have at least one field defined:

* `providerID` - a cloud provider ID identifying the machine.

#### Optional `spec` fields

The `spec` object **may** define several fields that do not affect functionality if missing:

* `failureDomain` - is a string identifying the failure domain the instance is running in.

#### Required `status` fields

The `status` object **must** have at least one field defined:

* `ready` - a boolean field indicating if the infrastructure is ready to be used or not.

#### Optional `status` fields

The `status` object **may** define several fields that do not affect functionality if missing:

* `failureReason` - is a string that explains why a fatal error has occurred, if possible.
* `failureMessage` - is a string that holds the message contained by the error.
* `addresses` - is a `MachineAddresses` (a list of `MachineAddress`) which represents host names, external IP addresses, internal IP addresses,
external DNS names, and/or internal DNS names for the provider's machine instance. `MachineAddress` is
defined as:
- `type` (string): one of `Hostname`, `ExternalIP`, `InternalIP`, `ExternalDNS`, `InternalDNS`
- `address` (string)

Note: once either `failureReason` or `failureMessage` surfaces on the machine that references the infrastructureMachine object,
it cannot be cleared (it is considered a terminal error; the only way to recover is to delete and recreate the machine).
Also, if the machine is under control of a MachineHealthCheck instance, the machine will be automatically remediated.

Example:
```yaml
kind: MyMachine
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
spec:
  providerID: cloud:////my-cloud-provider-id
status:
  ready: true
```

### Secrets

The Machine controller will create a secret or use an existing secret in the following format:

| secret name | field name | content |
|:---:|:---:|---|
|`<cluster-name>-kubeconfig`|`value`|base64 encoded kubeconfig that is authenticated with the child cluster|
6 changes: 3 additions & 3 deletions docs/book/src/developer/providers/contracts/infra-cluster.md
@@ -119,7 +119,7 @@ rules:
- watch
```
Note: The write permissions allow the Cluster controller to set owner references and labels on the InfraCluster resources;
Note: The write permissions allow the Cluster controller to set owner references and labels on the InfraCluster resources;
write permissions are not used for general mutations of InfraCluster resources, unless specifically required (e.g. when
using ClusterClass and managed topologies).
@@ -271,7 +271,7 @@ Each InfraCluster MUST report when Cluster's infrastructure is fully provisioned

```go
type FooClusterStatus struct {
// Ready denotes that the foo cluster infrastructure fully provisioned.
// Ready denotes that the foo cluster infrastructure is fully provisioned.
// +optional
Ready bool `json:"ready"`

@@ -282,7 +282,7 @@ type FooClusterStatus struct {

Once `status.ready` is set, the Cluster "core" controller will bubble up this info in the Cluster's `status.infrastructureReady`;
If defined, also InfraCluster's `spec.controlPlaneEndpoint` and `status.failureDomains` will be surfaced on Cluster's
corresponding field at the same time.
corresponding fields at the same time.

<aside class="note warning">
