add MachineHealthCheck #146

Draft · 3deep5me wants to merge 5 commits into main
Conversation

@3deep5me (Contributor) commented Nov 14, 2023

Hi @sp-yduck,

this adds a MachineHealthCheck for all Machines of a new cluster. It can help when a node does not reach the Running state, e.g. because of #145, network problems during startup, or other reasons.
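For reference, the shape of such a MachineHealthCheck (a minimal sketch; the apiVersion and the unhealthyConditions values follow the Cluster API docs and are illustrative rather than copied verbatim from this PR):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: ${CLUSTER_NAME}-mhc
spec:
  clusterName: ${CLUSTER_NAME}
  # Allow remediation even if every Machine is unhealthy at once.
  maxUnhealthy: 100%
  # Match every Machine that belongs to this cluster.
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
  # Remediate nodes whose Ready condition is Unknown/False for 5 minutes.
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```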

@sp-yduck (Collaborator)

Thank you for the PR!
As you may know, this template file is used for the Quick Start, and I want to keep the Quick Start minimal. So if you want to include this, the option is (see the note on flavors below):

  1. create a new template for it, so that users can choose a specific template to try specific features
    ref: https://cluster-api.sigs.k8s.io/clusterctl/commands/generate-cluster.html?highlight=flavor#flavors
    ref: https://cluster-api.sigs.k8s.io/clusterctl/commands/generate-cluster.html?highlight=flavor#alternative-source-for-cluster-templates
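For context on how a flavor surfaces to users: `clusterctl` resolves `--flavor <name>` to a release asset named `cluster-template-<name>.yaml` (the default template is `cluster-template.yaml`). So, hypothetically, publishing a `cluster-template-mhc.yaml` asset would let users run `clusterctl generate cluster my-cluster --flavor mhc` to opt into the MachineHealthCheck; the flavor name `mhc` here is purely illustrative.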

@3deep5me (Contributor, Author)

I also changed the location of the quick start; I hope this is fine for you.
I don't know how clusterctl knows where to find the cluster templates.
Do I have to change something else to make this work?

@sp-yduck (Collaborator)

clusterctl checks the assets of the release, so the file changes are OK. The thing is, I am using make release to output these assets for each release:

  1. make release-templates
    https://github.com/sp-yduck/cluster-api-provider-proxmox/blob/bbcdd56993d21f9e3581eed558806fc909b71cec/Makefile#L219-L220

  2. make generate-e2e-templates
    https://github.com/sp-yduck/cluster-api-provider-proxmox/blob/bbcdd56993d21f9e3581eed558806fc909b71cec/Makefile#L111-L113

I think you can use kustomize build template/base or something similar for both of them.
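A sketch of what that could look like, assuming the new manifest lives next to a kustomization in templates/flavors/advanced and the quick-start base is at templates/base (paths beyond what this PR shows are assumptions):

```yaml
# templates/flavors/advanced/kustomization.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base               # existing quick-start manifests
  - machinehealthcheck.yaml  # the MachineHealthCheck added in this PR
```

Then something like `kustomize build templates/flavors/advanced > cluster-template-advanced.yaml` could be wired into both make targets above.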

templates/flavors/advanced/machinehealthcheck.yaml (review thread; earlier comments outdated and resolved)

```yaml
  maxUnhealthy: 100%
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: '${CLUSTER_NAME}'
```
@sp-yduck (Collaborator)

Is it a good idea to have a single MHC for all the nodes in a cluster?
Maybe it's better to have separate MHCs for the control plane and the others. What do you think?

@3deep5me (Contributor, Author)

I think it's more common to have two MHCs, but in this simple setup I see no reason for that.
But I could be wrong; do you have an idea how we could utilize two MHCs?

@3deep5me (Contributor, Author)

To be precise, I don't know how the two MHCs should differ.

@sp-yduck (Collaborator)

https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/healthchecking.html#creating-a-machinehealthcheck
There is an example MHC for both workers (node-unhealthy-5m) and the control plane (kcp-unhealthy-5m); a sketch of how that split could look here follows.
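Adapted from the linked docs, the two MHCs could look roughly like this (a sketch; the worker selector label `nodepool: nodepool-0` comes from the docs example, while `cluster.x-k8s.io/control-plane` is the label Cluster API sets on control-plane Machines):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: ${CLUSTER_NAME}-node-unhealthy-5m
spec:
  clusterName: ${CLUSTER_NAME}
  # Stop remediating workers if too many are unhealthy at once.
  maxUnhealthy: 40%
  selector:
    matchLabels:
      nodepool: nodepool-0
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: ${CLUSTER_NAME}-kcp-unhealthy-5m
spec:
  clusterName: ${CLUSTER_NAME}
  maxUnhealthy: 100%
  selector:
    matchLabels:
      # Set automatically on KubeadmControlPlane-owned Machines.
      cluster.x-k8s.io/control-plane: ""
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
```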

@3deep5me (Contributor, Author)

Mhhhh... okay. But the examples do not differ. We can definitely do two MHCs, though, if you want to be more Cluster API conformant.

@sp-yduck (Collaborator) commented Nov 20, 2023

By the way, I believe an MHC does not help for:

> E.g. #145 or network problems during startup or other reasons.

An MHC checks the Machine and Node objects to confirm whether the Node (in the workload cluster) is ready. So in a case like issue #145, if the VM becomes unhealthy before it joins the Kubernetes cluster, the MHC cannot find a Node associated with that unhealthy VM and cannot remediate it.

@3deep5me (Contributor, Author) commented Nov 23, 2023

I'm not sure about the detailed mechanics.
But I can confirm that if a VM does not boot, it is deleted and recreated.
(But at the moment no VM boots, because I get the error every time 😢)

I think the MHC by default also checks the status.conditions[].type Ready field. At least, that is the only way I can explain the behavior.
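For what it's worth, one mechanism that would explain remediation of VMs that never join is the MHC's nodeStartupTimeout (a MachineHealthCheckSpec field, 10 minutes by default in Cluster API): a Machine with no associated Node after that window is considered failed and remediated, no Node object required. A sketch, with an illustrative value:

```yaml
spec:
  # Machines whose Node has not appeared after this long are remediated,
  # which would cover VMs that never boot or never join the cluster.
  nodeStartupTimeout: 10m
```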
