
ClusterAPI custom resource monitoring #37

Closed
wants to merge 6 commits

Conversation


@bavarianbidi commented May 30, 2022

Signed-off-by: Mario Constanti [email protected]

Status:

Signed-off-by: Mario Constanti <[email protected]>
@bavarianbidi requested review from a team, May 30, 2022 12:21
@fiunchinho
Member

Thanks for the RFC.

The CASM upstream proposal's "future work" section mentions adding provider metrics in the future, so I guess solving this means discussing the best solution with upstream.

Talking about the solutions proposed here (and without having talked to upstream people yet), I feel like having different implementations for different providers, living in different applications/repos, is more aligned with how the CAPI project is currently designed: different providers are implemented as different applications. As mentioned in the RFC, trying to put all potential provider metrics implementations into a single app would bring many code dependencies into the project (or a complex use of unstructured objects, with conditionals to use different fields for different providers). Apart from code dependencies, it would introduce provider coupling, meaning that development for one provider could affect other providers.

Using different implementations for different providers makes deployment harder, but that's a problem we are already solving for the CAPI controllers anyway.

Signed-off-by: Mario Constanti <[email protected]>
cluster-api-state-metrics-adoption/README.md (review thread)

But as the code isn't that complex, it's very easy to extend the existing implementation to support infrastructure-provider-specific metrics.

Besides generated changes in `go.mod` and `go.sum`, only the following changes are needed to add the two new metrics `capi_openstackcluster_created` and `capi_openstackcluster_paused` to CASM.
Contributor

Would it be so complex & dirty to add different providers by config? Like passing the GVK as config and using unstructured objects in the code.

Author

> Would it be so complex & dirty to add different providers by config? Like passing the GVK as config and using unstructured objects in the code.

My assumption was to align with the already existing implementation (import the type and reuse it) as much as possible.

If we're using unstructured objects, we have to deal with different fields for all the different providers in use; I'm not sure it's worth doing.

@fiunchinho already mentioned this above:

> As mentioned in the RFC, trying to put all potential provider metrics implementations into a single app would bring many code dependencies into the project (or a complex use of unstructured objects, with conditionals to use different fields for different providers).

IMO: as the upstream implementation doesn't have a solution for provider-specific metrics yet, I guess it's worth starting with a simple solution for now (the short-term benefits for us might be huge). And once the discussion starts upstream we can take part there, as we'll have to refactor this component anyway once upstream has found a good solution. WDYT @erkanerol ?

Contributor

I think it is the simplest solution for us (OpenStack provider), but I am not sure it is the optimal solution for the whole of Giant Swarm. We can mention this in the next KaaS sync.

Nevertheless, I don't see any big risk in starting with this solution and then improving it over time. So +1 from my side.

Author

Added a note for the upcoming KaaS sync.

Contributor

What was the conclusion in that sync?

Author

> What was the conclusion in that sync?

That everyone else will take a look at this PR. Will circulate it again in the next couple of days.


What's left for this RFC now?

Author

My time to convert it into a new RFC ... still on my list.

* WIP: notes about ksm
* changed highlighting for new .go file
Signed-off-by: Mario Constanti <[email protected]>
@bavarianbidi changed the title from "Adoption of CASM" to "ClusterAPI custom resource monitoring", Jun 8, 2022
@bavarianbidi requested review from a team and erkanerol, June 8, 2022 04:55
- Scope of KSM differs between management and workload clusters
  - resource requirements
  - permissions
- Changes in the KSM app affect all clusters, even if the changes are only relevant for management clusters (e.g. a version bump)
Contributor

Can we just change those values at deployment time depending on the target cluster? I think we do that for other components too.

Contributor

My understanding is yes, and this seems like the best solution to me. We already use Helm templates to differentiate app behavior (including monitoring stuff) according to the environment.

Author

Yes, it's possible. It's more about accepting the impact if we go this way.

Member

I like this variant mixed with things from other variants 😄

I'm wondering if the following would be possible. I was thinking about one KSM instance as described in this variant, but with the CustomResourceStateMetrics resources bundled with the CAPI apps, like cluster-api-app and cluster-api-provider-$provider-app. Same as explained in variant 2.3, but without having different KSM instances. I think in this case we'd get the benefits without having to deploy several KSMs and waste resources.

Author

That's exactly the idea KSM folks mentioned at some point in Slack (TODO marioc: link to that statement): each third-party component can ship its own CustomResourceStateMetrics configuration and KSM takes care of it.

Currently, CustomResourceStateMetrics is only defined as a key in the ConfigMap which KSM reads.
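To make that concrete, here is a minimal sketch of how this wiring could look today (the ConfigMap name, metric name, and flag wiring are assumptions based on the upstream KSM docs; the exact schema depends on the KSM version):

```yaml
# Sketch: the CustomResourceStateMetrics document is just a key in a
# ConfigMap that KSM reads, e.g. via --custom-resource-state-config-file.
# Names and paths here are illustrative, not our actual setup.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-cr-config   # hypothetical name
data:
  config.yaml: |
    kind: CustomResourceStateMetrics   # the "header" mentioned above
    spec:
      resources:
        - groupVersionKind:
            group: cluster.x-k8s.io
            version: v1beta1
            kind: Cluster
          metrics:
            - name: created
              help: Unix creation timestamp of the Cluster resource
              each:
                type: Gauge
                gauge:
                  path: [metadata, creationTimestamp]
```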


- Cluster API versions must be reflected in the `CustomResourceStateMetrics` configuration. We will potentially diverge on the Cluster API versions used per infrastructure provider.

#### Variant 2.2: dedicated `kube-state-metrics` instance on management cluster per CAPI provider

For this I understand we could put a bunch of `{{ if eq .Values.provider "openstack" }}` conditionals in the Helm chart and that would do, right?

Author

With variant 1 and variant 2.1 we will have a lot of `{{ if eq .Values.provider "openstack" }}` conditionals in the repos (and therefore we have to decide how to handle the corresponding app releases, as we could break other providers' monitoring).
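For illustration, a hypothetical fragment of such a chart template (the `.Values.provider` key and the API versions are assumptions, not our actual chart):

```yaml
# Sketch of a Helm-templated KSM configuration where provider-specific
# resources are only rendered for the matching provider.
    spec:
      resources:
        # ... common CAPI core resources ...
{{- if eq .Values.provider "openstack" }}
        - groupVersionKind:
            group: infrastructure.cluster.x-k8s.io
            version: v1alpha5          # hypothetical API version
            kind: OpenStackCluster
{{- end }}
{{- if eq .Values.provider "aws" }}
        - groupVersionKind:
            group: infrastructure.cluster.x-k8s.io
            version: v1beta1           # hypothetical API version
            kind: AWSCluster
{{- end }}
```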

@pipo02mix
Contributor

I am still confused whether this will be implemented in CASM or whether KSM will phase it out.

@bavarianbidi
Author

> I am still confused whether this will be implemented in CASM or whether KSM will phase it out.

The upstream discussion about KSM is still open (kubernetes/kube-state-metrics#1755 (comment), with no feedback from the KSM folks right now), but I still assume that we will go with KSM.

My idea was to get feedback on both possible implementations, so that once upstream has made a decision we can immediately start with our implementation.

@bavarianbidi
Author

For now I would start with a dedicated app, as I won't run the kube-system-based KSM from a forked custom branch (https://github.com/kubernetes/kube-state-metrics/compare/master...chrischdi:poc-additional-metric-types-2?expand=1).

@bavarianbidi
Author

@JosephSalisbury as you're taking the CAPI monitoring topic into an upcoming "product sync", I'm currently thinking about closing this RFC (with a comment) and creating a new RFC to

- adopt variant 2.1 as the general-purpose approach for future CR monitoring in management clusters (and eventually in WCs as well, if needed)
- and, if we (as GS) believe that a solution where
  - KSM in the long term provides a CustomResourceStateMetrics CRD and watches/lists/gets all CustomResourceStateMetrics CRs
  - and each controller implementation (e.g. ClusterAPI, Flux, ...) creates/serves its own metric configuration of type CustomResourceStateMetrics

  might be the better approach, we could think about implementing/contributing this feature to KSM.

During the initial implementation of cluster-api-monitoring-app and while demonstrating the special KSM instance, I often got asked whether CustomResourceStateMetrics is a CRD in the cluster (currently it's only the "header" of the configuration and is only used as an anchor for generating it).

Until now, the idea itself has only been published once, in a Slack thread by KSM maintainer Manuel Krug:

> I'm also currently thinking about a kube-state-metrics-customresourcemetrics CRD, so each CRD could ship their own set of metric definitions with its distribution and kube-state-metric would pick it up
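Purely as a thought experiment (no such CRD exists in KSM at the time of writing; the group, version, and field path below are invented), a provider app could then ship its own metric configuration as a namespaced CR:

```yaml
# Hypothetical CustomResourceStateMetrics CR that KSM would watch/list/get,
# bundled with e.g. a cluster-api-provider-openstack app.
apiVersion: kube-state-metrics.io/v1alpha1   # invented group/version
kind: CustomResourceStateMetrics
metadata:
  name: capo-metrics
  namespace: monitoring
spec:
  resources:
    - groupVersionKind:
        group: infrastructure.cluster.x-k8s.io
        version: v1alpha5
        kind: OpenStackCluster
      metrics:
        - name: paused
          help: Whether the OpenStackCluster is paused
          each:
            type: Gauge
            gauge:
              path: [spec, paused]   # illustrative field path
```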

@JosephSalisbury
Contributor

JosephSalisbury commented Aug 1, 2022

Yeah, I reckon recreating/updating #37 to focus on variant 2.1 (i.e. present all the options, make 2.1 the one we're going to go with) would be neat, and focusing it more on CR monitoring in general.

I'm not sure yet about CustomResourceStateMetrics CRs; it feels a bit too far away for us to make a bet on. I could see us having some app that we deploy with our config in it, and then, if CustomResourceStateMetrics CRs become a thing, we break that apart. Does that sound reasonable?

@fiunchinho
Member

Should we update the RFC with the approach currently being used and try to close this PR?

@pipo02mix
Contributor

Yeah, it would be a good idea. WDYT @bavarianbidi ?

@bavarianbidi
Author

> Yeah, it would be a good idea. WDYT @bavarianbidi ?

Yep, I agree ... I already have a local branch open, but currently I'm focused on making some progress on the MC-bootstrap-behind-a-proxy thing.

Will try to invest some time this week/next week to make the current status clear.

@github-actions

github-actions bot commented Mar 4, 2023

This RFC has had no activity for 100 days. It will be closed in 1 week, unless activity occurs or the label lifecycle/keep is added.

@bavarianbidi
Author

> Will try to invest some time this week/next week to make the current status clear.

Oh man, what was I thinking 🙈 Let's give it another try in the next couple of weeks 🤞

@bavarianbidi
Author

This PR will be closed soon.
The current state is documented here: https://github.com/giantswarm/giantswarm/pull/27092

All future discussions about the possibilities of KSM's custom resource monitoring feature will be moved into a dedicated issue, as there is a lot of progress upstream at the moment.

@bavarianbidi
Author

Current CAPI monitoring is described in https://intranet.giantswarm.io/docs/monitoring/capi_monitoring/.
This RFC will be closed as we have a solution for now. Any additional improvements will be handled in giantswarm/roadmap#2513.

@pipo02mix
Contributor

But isn't it the first variant? I was wondering why not merge instead of closing, so we don't lose the work you did.

@bavarianbidi
Author

> But isn't it the first variant? I was wondering why not merge instead of closing, so we don't lose the work you did.

@pipo02mix
I didn't want to merge the PR as we never had real consensus about the implementation; due to high priorities and the operational need we started with variant 2.1, so there was no "real discussion".

I've recently documented this setup in our intranet and also referenced this PR for possible alternative implementations (https://intranet.giantswarm.io/docs/monitoring/capi_monitoring/#reason-for-current-architecture).
Also, upstream KSM is extending the current feature, which should have an impact on the entire CR monitoring topic within GS (giantswarm/roadmap#2513).
