ClusterAPI custom resource monitoring #37
Conversation
Thanks for the RFC. The CASM upstream proposal's "future work" section mentions adding provider metrics in the future, so I guess solving this means discussing with upstream about the best solution.

Talking about the proposed solutions here (and without having talked to upstream people yet), I feel like having different implementations for different providers, living in different applications/repos, is more aligned with how the CAPI project is currently designed, where different providers are implemented as different applications. As mentioned in the RFC, trying to put all potential provider metrics implementations into a single app would bring many, many code dependencies into the project (or a complex use of `unstructured` objects). Using different implementations for different providers makes it harder to deploy, but that's a problem we are already solving for CAPI controllers anyway.
> But as the code isn't that complex, it's very easy to extend the existing implementation to support infrastructure-provider-specific metrics.
>
> Besides generated changes in `go.mod` and `go.sum`, only the following changes are needed to add the two new metrics `capi_openstackcluster_created` and `capi_openstackcluster_paused` to CASM.
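The diff itself isn't reproduced in this thread. Purely as a hypothetical sketch of the two metrics (CASM actually builds on kube-state-metrics-style metric stores, so the real change looks different), plain client_golang gauge definitions might read:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical illustration only; the actual CASM change extends its
// existing kube-state-metrics-style store, not plain gauges.
var (
	openstackClusterCreated = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "capi_openstackcluster_created",
			Help: "Unix creation timestamp of the OpenStackCluster resource.",
		},
		[]string{"namespace", "name"},
	)

	openstackClusterPaused = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "capi_openstackcluster_paused",
			Help: "Whether the OpenStackCluster is paused (1) or not (0).",
		},
		[]string{"namespace", "name"},
	)
)

func init() {
	prometheus.MustRegister(openstackClusterCreated, openstackClusterPaused)
}
```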
Would it be so complex and dirty to add different providers by config? Like passing the GVK as config and using `unstructured` in the code.
> Would it be so complex and dirty to add different providers by config? Like passing the GVK as config and using `unstructured` in the code.

My assumption was to align with the already existing implementation (import the types and reuse them) as much as possible. If we're using `unstructured` objects, we have to deal with different fields for all the different providers, and I'm not sure it's worth doing. @fiunchinho already mentioned this above:

> As it's mentioned in the RFC, trying to put all potential provider metrics implementations into a single app would bring many, many code dependencies into the project (or a complex use of `unstructured` objects using conditionals to use different fields for different providers?)
IMO: as the upstream implementation doesn't have a solution for provider-specific metrics yet, I guess it's worth starting with a simple solution for now (the benefits for us might be huge in the short term). Once the discussion starts upstream, we can take part there, as we'll have to refactor this component anyway once upstream has found a good solution. WDYT @erkanerol?
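For reference, a rough sketch of what the GVK-as-config plus `unstructured` approach could look like (a hypothetical illustration only; the group/version/resource values and the `spec.paused` field path are assumptions, and the actual CASM code imports typed provider APIs instead):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The GVR would come from configuration instead of compiled-in
	// typed clients (the values here are hypothetical).
	gvr := schema.GroupVersionResource{
		Group:    "infrastructure.cluster.x-k8s.io",
		Version:  "v1alpha4",
		Resource: "openstackclusters",
	}

	list, err := client.Resource(gvr).Namespace(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, obj := range list.Items {
		// Every provider-specific field becomes a string-path lookup;
		// "spec", "paused" is a hypothetical path that differs per provider.
		paused, found, _ := unstructured.NestedBool(obj.Object, "spec", "paused")
		fmt.Printf("%s/%s created=%d paused=%v (field present: %v)\n",
			obj.GetNamespace(), obj.GetName(),
			obj.GetCreationTimestamp().Unix(), paused, found)
	}
}
```

The sketch shows the trade-off discussed above: no compile-time provider dependencies, but every field access degrades to a string path with per-provider conditionals.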
I think it is the simplest solution for us (OpenStack provider), but I am not sure it is the optimal solution for the whole of Giant Swarm. We can mention this in the next KaaS sync.
Nevertheless, I don't see any big risk in starting with this solution and then improving it over time. So +1 from my side.
Added a note to the upcoming KaaS sync.
what was the conclusion in that sync?
> what was the conclusion in that sync?

That all the others will take a look at this PR. Will bring it up again in the next couple of days.
what's left for this RFC now?
It's on me to convert it into a new RFC ... still on my list.
* WIP: notes about ksm
* changed highlighting for new .go file
> - Scope of KSM differs between management and workload clusters
>   - resource requirements
>   - permissions
> - Changes in the KSM app affect all clusters, even if changes are only done for management clusters (e.g. a version bump)
can we just change those values at deployment time depending on the target cluster? I think we do for other components too
My understanding is YES, and this seems the best solution to me. We already use Helm templates to differentiate the behavior of apps (including monitoring stuff) according to the environment.
Yes, it's possible. It's more about accepting the impact if we go this way.
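A minimal sketch of that idea, assuming hypothetical chart keys (the real KSM chart values differ): one chart, two values files, with scope, resources, and permissions tuned per target cluster.

```yaml
# values-management.yaml (hypothetical keys)
kubeStateMetrics:
  customResourceState:
    enabled: true   # CAPI custom resources only exist on management clusters
  rbac:
    extraClusterScopedResources: true
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
---
# values-workload.yaml (hypothetical keys)
kubeStateMetrics:
  customResourceState:
    enabled: false
  rbac:
    extraClusterScopedResources: false
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
```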
I like this variant mixed with things from other variants 😄

I'm wondering if the following would be possible. I was thinking about one KSM instance, as described in this variant, but with `CustomResourceStateMetrics` resources bundled with the CAPI apps like `cluster-api-app` and `cluster-api-provider-$provider-app`, same as explained in variant 2.3, but without having different KSM instances. I think in this case we get the benefits without having to deploy several KSMs and waste resources.
That's exactly one idea the KSM folks mentioned some day in Slack (TODO marioc: link to that statement): each third-party component could ship its own `CustomResourceStateMetrics` configuration and KSM would take care of it.
Currently, `CustomResourceStateMetrics` is only defined as a key in the configmap that KSM reads.
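For illustration, such a configmap key roughly follows the upstream custom-resource-state format (still evolving at the time of this discussion, so treat the field names as assumptions):

```yaml
kind: CustomResourceStateMetrics
spec:
  resources:
    - groupVersionKind:
        group: cluster.x-k8s.io
        version: v1beta1
        kind: Cluster
      metricNamePrefix: capi_cluster
      metrics:
        - name: created
          help: Unix creation timestamp of the Cluster.
          each:
            type: Gauge
            gauge:
              # Path into the object from which the gauge value is read.
              path: [metadata, creationTimestamp]
```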
> - Cluster API versions must be reflected in the `CustomResourceStateMetrics` configuration. We will potentially diverge on the used Cluster API versions per infrastructure provider.
>
> #### Variant 2.2: dedicated `kube-state-metrics` instance on the management cluster per CAPI provider
For this, I understand we could put a bunch of `{{ if eq .Values.provider "openstack" }}` blocks in the Helm chart and that would do, right?
With Variant 1 and Variant 2.1 we will have a lot of `{{ if eq .Values.provider "openstack" }}` blocks in the repos (and therefore we have to decide how to handle the corresponding app releases, as we could break other providers' monitoring).
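A sketch of such a provider-gated block (the manifest itself is hypothetical; the point is the `.Values.provider` conditional):

```yaml
{{- if eq .Values.provider "openstack" }}
# Extra RBAC that only makes sense on OpenStack management clusters
# (hypothetical resource and name).
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ksm-openstack-extra
rules:
  - apiGroups: ["infrastructure.cluster.x-k8s.io"]
    resources: ["openstackclusters"]
    verbs: ["list", "watch"]
{{- end }}
```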
I am still confused about whether this will be implemented in CASM or whether KSM will fade it out.
The upstream discussion about KSM is still open (kubernetes/kube-state-metrics#1755 (comment), and there is no feedback from the KSM folks right now), but I still assume that we will go with KSM. My idea was to get feedback about both possible implementations so that once upstream has made a decision, we can immediately start with our implementation.
For now I would start with a dedicated app, as I won't run the kube-system-based KSM from a forked custom branch (https://github.com/kubernetes/kube-state-metrics/compare/master...chrischdi:poc-additional-metric-types-2?expand=1).
@JosephSalisbury as you're taking the CAPI monitoring topic into an upcoming "product sync", I'm currently thinking about closing this RFC (with a comment) and creating a new RFC to …
Yeah, I reckon recreating / updating #37 to focus on variant 2.1 (i.e. present all the options, make 2.1 the one we're going to go with) would be neat, and to focus it more on CR monitoring in general.

I'm not sure yet about `CustomResourceStateMetrics` CRs. It feels a bit too far away for us to make a bet on. I could see us having some app that we deploy, with our config there, and then, if `CustomResourceStateMetrics` CRs become a thing, we break that apart. Does that sound reasonable?
Should we update the RFC with the approach being used atm and try to close this PR?
Yeah, it would be a good idea. WDYT @bavarianbidi?
Yep, I agree ... I already have a local branch open, but currently I'm focused on making some progress on the …. Will try to invest some time this week/next week to make the current status clear.
This RFC has had no activity for 100 days. It will be closed in 1 week, unless activity occurs or the label …
Oh man, what was I thinking 🙈 Let's give them another try in the next couple of weeks 🤞
This PR will be closed soon. All future discussions about the possibilities of the CustomResourceMonitoring feature of KSM will be moved into a dedicated issue, as there is a lot of progress upstream atm.
The current CAPI monitoring is described in https://intranet.giantswarm.io/docs/monitoring/capi_monitoring/.
But is it not the first variant? I was wondering: why not merge instead of closing, so we don't lose the work you did?
@pipo02mix I've recently documented this setup in our intranet and also referenced this PR for possible alternative implementations (https://intranet.giantswarm.io/docs/monitoring/capi_monitoring/#reason-for-current-architecture).