Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster API State Metrics implementation #6458

Closed
9 tasks done
chrischdi opened this issue Apr 28, 2022 · 20 comments
Closed
9 tasks done

Cluster API State Metrics implementation #6458

chrischdi opened this issue Apr 28, 2022 · 20 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@chrischdi
Copy link
Member

chrischdi commented Apr 28, 2022

Follow-up issue to track implementation of the merged proposal

as follow-up to

Current state is that the contribution PR needs to be done for https://github.com/mercedes-benz/cluster-api-state-metrics to the exp/state-metrics directory which is WIP at the branch https://github.com/mercedes-benz/cluster-api/tree/exp/casm-introduce .

User Story
xref proposal user stories

TODOs:

Detailed Description

[A clear and concise description of what you want to happen.]

Anything else you would like to add:

[Miscellaneous information that will assist in solving the issue.]

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 28, 2022
@fabriziopandini
Copy link
Member

/milestone v1.2

@chrischdi
Copy link
Member Author

Looks like the kube-state-metrics project made the implementation obsolete.

https://github.com/kubernetes/kube-state-metrics/releases/tag/v2.5.0 is now available with a lot of great contributions from the community! It comes with more metrics for standard components and an experimental feature to create your own metrics for CRDs! 🎉 Isn't Friday the perfect opportunity to deploy this to your cluster? 😉

Source: https://kubernetes.slack.com/archives/CJJ529RUY/p1654257314683059

Looks like we should adapt, check if we could do the same with configuration only and deploying kube-state-metrics instead :-)

@sbueringer
Copy link
Member

Oh, interesting turn of events. Yeah let's figure out if that covers our use case.

@bavarianbidi
Copy link
Contributor

As i currently prepare our internal repositories to make use of cluster-api-state-metrics and try to propose a generic description to adopt different infrastructure providers as well i just gave kube-state-metrics a short try with following config.

kind: CustomResourceStateMetrics
spec: 
  resources: 
    - groupVersionKind: 
        group: cluster.x-k8s.io
        kind: Machine
        version: v1beta1
      metrics: 
        - each: 
            path: 
              - status
              - phase
          help: "machine phase"
          name: phase

As kube-state-metrics expect a float as value (https://github.com/kubernetes/kube-state-metrics/blob/7f09ca71ba25af4d2cbb1637e13b28c1d0028159/pkg/customresourcestate/registry_factory.go#L490-L525) above config will fail with

E0607 09:12:00.168168  126676 registry_factory.go:469] "kube_cluster_x-k8s_io_v1beta1_Machine_phase" err="[status,phase]: []: strconv.ParseFloat: parsing \"Runn
ing\": invalid syntax"

Tried to circumvent the fact that a float is required with following config where i defined the path to the only numbered value field in the Machine via

kind: CustomResourceStateMetrics
spec: 
  resources: 
    - groupVersionKind: 
        group: cluster.x-k8s.io
        kind: Machine
        version: v1beta1
      metrics: 
        - each: 
            valueFrom:
              - metadata
              - generation
            path: 
              - status
              - phase
          help: "machine phase"
          name: phase

But still get an error:

E0607 09:12:00.168234  126676 registry_factory.go:469] "kube_cluster_x-k8s_io_v1beta1_Machine_phase_annotation" err="[status,phase]: [metadata,generation]: expe
cted number but found nil value"

It seems to me, that a well scoped metrics server for ClusterAPI (+ infrastructure implementations) still make sense.

@sbueringer
Copy link
Member

Just for my understanding, they only allow exposing floats directly as metrics? So something like the condition metrics just doesn't work?

@bavarianbidi
Copy link
Contributor

Just for my understanding, they only allow exposing floats directly as metrics? So something like the condition metrics just doesn't work?

Exactly, exposing the condition as metric only might work, if at least one numbered field is in the spec (but my hack via metadata.generation does not work - so i guess the field must be in the spec or status).

If KSM-folks accept a patch for this and their plans are longer support for this experimental feature this could be a very promising solution for metrics.

@sbueringer
Copy link
Member

Yup that was my thinking. Essentially what is the gap and can we upstream enough to avoid maintaining our own state metrics

@chrischdi
Copy link
Member Author

chrischdi commented Jun 7, 2022

I started a discussion in kube-state-metrics slack : https://kubernetes.slack.com/archives/CJJ529RUY/p1654593034854759 and we will move the discussion to an issue.

For the CAPI project I think we have some options here to continue:

  1. wait (and maybe help to reach that point) for kube-state-metrics to enable all kind of metrics via config we want
  2. implement custom binary for now:
    a. and migrate to configuration-only kube-state-metrics as soon as all metrics are possible to get defined via configuration (all metrics already possible via config could already get done via config)
    b. and keep custom binary/implementation

@schrej
Copy link
Member

schrej commented Jun 8, 2022

implement custom binary for now

Assuming kube-state-metrics can cover all requirements and we'll migrate to it at some point, does it make sense to even migrate cluster-api-state-metrics into this repository? I think it would be fine to leave it where it is, and then just add the kube-state-metrics configuration to this repository (as well as tilt integration etc.).

The time spent moving the code here is probably better invested into kube-state-metrics to get the custom metrics into a suitable replacement.

@sbueringer
Copy link
Member

sbueringer commented Jun 10, 2022

The time spent moving the code here is probably better invested into kube-state-metrics to get the custom metrics into a suitable replacement.

In general I agree. Depending on how fast we can get the necessary features into kube-state-metrics we could also merge it and then use more and more from the upstream kube-state-metrics until cluster-api-state-metrics is not necessary anymore (as cluster-api-state-metrics imports/extends kube-state-metrics).

But I think for now the best approach is to wait a bit with cluster-api-state-metrics to get a feeling for how fast we can make progress in extending kube-state-metrics (and thus how long it would take to get everything we need supported there).

@chrischdi
Copy link
Member Author

Some progress on the topic:

I invested some time in kubernetes/kube-state-metrics#1755 which resulted in the PR kubernetes/kube-state-metrics#1777 to start some feedback.

Running kube-state-metrics from the linked branch and using the config in kubernetes/kube-state-metrics#1777 (comment) would already cover nearly all metrics of the proposal, except the labels metrics which I did not yet take a look at.

As of now I only see one small disadvantage for using the configuration style kube-state-metrics: we may have to split the paused metric to annotation_paused and (if matches CR) spec_paused because the config (currently) does not provide the feature of combining values from different paths in the CR.

All in all I still think going forward with upstream kube-state-metrics makes way more sense then implementing a custom binary.

@sbueringer
Copy link
Member

Sounds great!

As of now I only see one small disadvantage for using the configuration style kube-state-metrics: we may have to split the paused metric to annotation_paused and (if matches CR) spec_paused because the config (currently) does not provide the feature of combining values from different paths in the CR.

I think that's fine and it's rather in the spirit of kube-state-metrics to straight forward expose the current state from resources without doing additional calculations. If desired this can always be done (at least in Prometheus) either in the alerts or in record rules.

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini added this to the v1.3 milestone Jul 29, 2022
@chrischdi
Copy link
Member Author

I will follow up the next days with a WIP PR which already integrates kube-state-metrics's helm chart into Tilt, similar to prometheus, grafana, ...

Local version is already running and seems to integrate greatly 🎉

@fabriziopandini
Copy link
Member

@chrischdi can we close this issue now and move pending items to separated issues? (I have already opened #7158)

@chrischdi
Copy link
Member Author

chrischdi commented Sep 3, 2022

@chrischdi can we close this issue now and move pending items to separated issues? (I have already opened #7158)

I've got two AI's I want to do and would have closed this issue afterwards:

  • resync the proposal to the current state
  • review again the *status_*replica* metrics for consistency purposes

Also the issue mentions the point clarify and implement release manifest, I am also ok with not doing this for now and/or creating a separate issue for that.

@sbueringer
Copy link
Member

sbueringer commented Sep 5, 2022

I think finishing the two sub-tasks that you mentioned as part of this issue + a separate issue for release manifests would be good.

I'm not sure if we want to release the manifests as long as we are writing / maintaining them manually. So this could depend on the gen tool.

@chrischdi
Copy link
Member Author

I'm not sure if we want to release the manifests as long as we are writing / maintaining them manually. So this could depend on the gen tool.

Fair point to get discussed in the follow-up issue then 👍

Ok, I will do so and also create the issue.

@chrischdi
Copy link
Member Author

xref: issue in kube-state-metrics for *_labels metrics, including example configuration to workaround: kubernetes/kube-state-metrics#1832

@chrischdi
Copy link
Member Author

Closing here by marking the last step as done 🎉 / via creation of:

/close

@k8s-ci-robot
Copy link
Contributor

@chrischdi: Closing this issue.

In response to this:

Closing here by marking the last step as done 🎉 / via creation of:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

No branches or pull requests

6 participants