From 0fa8107d86cdcb886495167a0c1f12e4d0a1dd43 Mon Sep 17 00:00:00 2001 From: Matthew Westphall Date: Fri, 16 Aug 2024 17:30:22 -0500 Subject: [PATCH 1/2] Document configuring and installing Kuantifier --- docs/other/monitor-kubernetes-kuantifier.md | 188 ++++++++++++++++++++ 1 file changed, 188 insertions(+) create mode 100644 docs/other/monitor-kubernetes-kuantifier.md diff --git a/docs/other/monitor-kubernetes-kuantifier.md b/docs/other/monitor-kubernetes-kuantifier.md new file mode 100644 index 000000000..149cb81c3 --- /dev/null +++ b/docs/other/monitor-kubernetes-kuantifier.md @@ -0,0 +1,188 @@ +title: Monitor Kubernetes Workloads with Kuantifier +DateReviewed: 2024-08-16 + +Monitoring Kubernetes Workloads with Kuantifier +=============================================== + + + +Before Starting +--------------- + +### Confirm access to a running Kubernetes cluster + +All subsequent instructions assume you have access to a running Kuberenetes cluster, and can run [kubectl](kubectl) +against that cluster. + +### Install the Helm command line tools + +Kuantifier itself, and several of its prerequisites, are installed via [helm chart](https://helm.sh/). The helm +command line tools are used to install helm charts against a running kuberentes cluster, and can be installed +as follows: + +1. Download the latest [helm release](helm-release) +1. Unpack the release blob (eg. `tar -zxvf helm-v3.0.0-linux-amd64.tar.gz`) +1. Move the `helm` binary from the archive into a location along your `$PATH` (eg. `mv linux-amd64/helm ~/.local/bin`) + +### Install Prometheus and kube-state-metrics in your Kubernetes cluster + +Kuantifier relies on [Prometheus](prometheus) with [kube-state-metrics](kube-state-metrics) to gather raw pod metrics. +There are a number of ways to install both, such as via the [promethus community helm charts](prometheus-community): + +1. 
Add the prometheus community helm repository to your local helm + + :::console + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts + helm repo update + +1. Install the kube-state-metrics and prometheus helm charts into your kubernetes cluster. + + :::console + helm install kube-state-metrics prometheus-community/kube-state-metrics + helm install prometheus prometheus-community/prometheus + + !!! note + These commands will install into the `default` namespace by default. To install into a + non-default namespace such as `monitoring`, use the `-n monitoring` flag in helm install. + +1. Ensure that the namespace where your workload pods run is properly configured. + + - Kuantifier relies on the `spec.containers[].resources.requests.cpu` field in workload pods + to determine proccessor count for GRACC reporting. Ensure a cpu request is set for pods in + your workspace. + + - Kuantifier relies on the Prometheus pod completion time metric to calculate workload job run times. + This metric is sometimes missed for pods that are spontaneously deleted, such as those created by + Deployments. For best results, run workload pods via Kubernetes Jobs. + + - (Known issue) Kuantifier currently doesn't support calculating usage metrics for workload pods + running multiple containers. Ensure that workload pods in your namespace have only one container. + +Installation +------------ + +Kuantifier itself is also installed via a helm chart, hosted at hub.opensciencegrid.org/iris-hep/kuantifier. + + +### Configuring Kuantifier's Values File + +Several instance-specific modifications to the default [Values File](values-file) provided with the chart +must be made prior to installation. For full documentation of the values in the values file, see the +[helm chart README on Github](helm-values-readme). + +1. Fetch the default values.yaml for kuantifier. This file can be obtained in several ways. 
+ - Via the helm cli: + + :::console + helm show values oci://hub.opensciencegrid.org/iris-hep/kuantifier + + - Via the [kuantifier Github repository](values-github). + + !!! note + Ensure that the release tag for values.yaml in the git repo corresponds to the version of the chart you're installing. + + +1. Update the top-level `.outputFormat` in values.yaml to output records to [GRACC](https://gracc.opensciencegrid.org/) + + :::yaml + outputFormat: "gratia" + +1. Update the `.processor.config` map with the details of your deployment. + - All of the following need to be set: + - `NAMESPACE`: The namespace of the pods for which Kuantifier will collect and report metrics. + + !!! note + Each installation of kuantifier only reports on pods in a single namespaece. You must + install multiple instances of the chart to support reporting on multiple namespaces. + + - `SITE_NAME`: The name of the site being reported. + - `SUBMIT_HOST`: Uniquely identifying name for the Kubernetes cluster where your workload pods run, in FQDN format. + - `VO_NAME`: Virtual Organization (VO) of jobs. + + - Additionally, the following may need to be set: + - `PROMETHEUS_SERVER`: The DNS name of the prometheus server installed in your kubernetes cluster. + - If Prometheus was installed in your cluster via the prometheus community helm chart in the monitoring + namespace, the DNS name will be `prometheus-server.monitoring.svc.cluster.local` + - Otherwise, [construct](https://kubernetes.io/docs/concepts/services-networking/service/#dns) the URL based on the standard Kubernetes service discovery mechanism (i.e. service name and namespace). + + - A fully configured `.processor.config` might look like: + + :::yaml + processor: + config: + NAMESPACE: workload-namespace + SITE_NAME: CHTC + VO_NAME: University of Wisconsin + SUBMIT_HOST: tiger-cluster.chtc.wisc.edu + PROMETHEUS_SERVER: prometheus-server.monitoring.svc.cluster.local + +1. 
(Optional) If Prometheus in your cluster is configured to require authentiation, an + authentication header can be specified via a key within an already-existing [Secret](kubernetes-secret) in the namespace: + + :::yaml + processor: + prometheus_auth: + secret: + key: + +1. (Optional) Update the frequency of the Kuantifier Reporting job. This may be useful for debugging. + + :::yaml + cronJob: + schedule: "@daily" + +### Installing Kuantifier + +After configuring an appropriate values file for your instance, install the chart via helm: + + :::console + helm install -f -n kuantifier oci://hub.opensciencegrid.org/iris-hep/kuantifier + +Validation +---------- + +After running helm install, ensure that the expected kubernetes objects have been created. The following commands assume +that kuantifier has been installed in the monitoring namespace. + +1. Check that a CronJob was created for running the kuantifier processor: + + :::console + kubectl -n monitoring get cronjob kuantifier-cronjob + +1. Check that a ConfigMap was created to configure processor jobs, and that the values in the ConfigMap + align with the values set in `.processor.config` in the values file: + + :::console + kubectl -n monitoring get configmap kuantifier-processor-config -o yaml + + +If the helm chart artifacts are present as expected, run a test instance of the CronJob and inspect its output. + +1. Create a new job from the CronJob, then find the Pod created by the job + + :::console + kubectl -n monitoring create job --from=cronjob/kuantifier-cronjob kuantifier-test-job + kubectl -n monitoring get pod | grep kuantifier-test-job + +1. Inspect the logs from the processor initContainer, which queries prometheus to generate output records. + + :::console + kubectl -n monitoring logs -c processor + +1. Inspect the logs from the gratia-output contaier, which sends the output records to GRACC. 
+ + :::console + kubectl -n monitoring logs -c gratia-output + +If both the procesor initContainer and gratia-output container run to completion without error, the next step +is to confirm with a member of the OSG technology team that the results are visible in GRACC. + + +[helm-values-readme]: +[values-yaml]: +[values-file]: +[prometheus-community]: +[kubectl]: +[prometheus]: +[kube-state-metrics]: +[kubernetes-secret]: From b44559a2f0eeee9ad8705a311972e254d1333b8c Mon Sep 17 00:00:00 2001 From: Matthew Westphall Date: Mon, 19 Aug 2024 09:58:00 -0500 Subject: [PATCH 2/2] Add first draft of kuantifier install guide --- docs/other/monitor-kubernetes-kuantifier.md | 23 ++++++++++++--------- 1 file changed, 13 insertions(+), 10 deletions(-) diff --git a/docs/other/monitor-kubernetes-kuantifier.md b/docs/other/monitor-kubernetes-kuantifier.md index 149cb81c3..b872a2ace 100644 --- a/docs/other/monitor-kubernetes-kuantifier.md +++ b/docs/other/monitor-kubernetes-kuantifier.md @@ -4,20 +4,22 @@ DateReviewed: 2024-08-16 Monitoring Kubernetes Workloads with Kuantifier =============================================== - +Workload jobs run via Kubernetes will not integrate with [Gratia accounting](./troubleshooting-gratia/) by default. +To report contributions to OSG made via Kubernetes, the [Kuantifier](kuantifier-github) helm chart can be installed +into your cluster. Before Starting --------------- ### Confirm access to a running Kubernetes cluster -All subsequent instructions assume you have access to a running Kuberenetes cluster, and can run [kubectl](kubectl) +All subsequent instructions assume you have access to a running Kubernetes cluster, and can run [kubectl](kubectl) against that cluster. ### Install the Helm command line tools Kuantifier itself, and several of its prerequisites, are installed via [helm chart](https://helm.sh/). 
The helm -command line tools are used to install helm charts against a running kuberentes cluster, and can be installed +command line tools are used to install helm charts against a running kubernetes cluster, and can be installed as follows: 1. Download the latest [helm release](helm-release) @@ -26,8 +28,8 @@ as follows: ### Install Prometheus and kube-state-metrics in your Kubernetes cluster -Kuantifier relies on [Prometheus](prometheus) with [kube-state-metrics](kube-state-metrics) to gather raw pod metrics. -There are a number of ways to install both, such as via the [promethus community helm charts](prometheus-community): +Kuantifier relies on [Prometheus](prometheus) with [kube-state-metrics](kube-state-metrics) to account for pod resource usage. +There are a number of ways to install both, such as via the [prometheus community helm charts](prometheus-community): 1. Add the prometheus community helm repository to your local helm @@ -48,7 +50,7 @@ There are a number of ways to install both, such as via the [promethus community 1. Ensure that the namespace where your workload pods run is properly configured. - Kuantifier relies on the `spec.containers[].resources.requests.cpu` field in workload pods - to determine proccessor count for GRACC reporting. Ensure a cpu request is set for pods in + to determine processor count for GRACC reporting. Ensure a cpu request is set for pods in your workspace. - Kuantifier relies on the Prometheus pod completion time metric to calculate workload job run times. @@ -92,7 +94,7 @@ must be made prior to installation. For full documentation of the values in the - `NAMESPACE`: The namespace of the pods for which Kuantifier will collect and report metrics. !!! note - Each installation of kuantifier only reports on pods in a single namespaece. You must + Each installation of kuantifier only reports on pods in a single namespace. You must install multiple instances of the chart to support reporting on multiple namespaces. 
- `SITE_NAME`: The name of the site being reported. @@ -116,7 +118,7 @@ must be made prior to installation. For full documentation of the values in the SUBMIT_HOST: tiger-cluster.chtc.wisc.edu PROMETHEUS_SERVER: prometheus-server.monitoring.svc.cluster.local -1. (Optional) If Prometheus in your cluster is configured to require authentiation, an +1. (Optional) If Prometheus in your cluster is configured to require authentication, an authentication header can be specified via a key within an already-existing [Secret](kubernetes-secret) in the namespace: :::yaml @@ -169,15 +171,16 @@ If the helm chart artifacts are present as expected, run a test instance of the :::console kubectl -n monitoring logs -c processor -1. Inspect the logs from the gratia-output contaier, which sends the output records to GRACC. +1. Inspect the logs from the gratia-output container, which sends the output records to GRACC. :::console kubectl -n monitoring logs -c gratia-output -If both the procesor initContainer and gratia-output container run to completion without error, the next step +If both the processor initContainer and gratia-output container run to completion without error, the next step is to confirm with a member of the OSG technology team that the results are visible in GRACC. +[kuantifier-github]: [helm-values-readme]: [values-yaml]: [values-file]: