title: Monitor Kubernetes Workloads with Kuantifier
DateReviewed: 2024-08-16

Monitoring Kubernetes Workloads with Kuantifier
===============================================

Workload jobs run via Kubernetes will not integrate with [Gratia accounting](./troubleshooting-gratia/) by default.
To report contributions to OSG made via Kubernetes, the [Kuantifier][kuantifier-github] helm chart can be installed
into your cluster.

Before Starting
---------------

### Confirm access to a running Kubernetes cluster

All subsequent instructions assume you have access to a running Kubernetes cluster, and can run [kubectl][kubectl]
against that cluster.
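
A quick way to confirm both, assuming `kubectl` is already configured with a context for the cluster, is to ask
the cluster for its control plane endpoints and node list:

:::console
kubectl cluster-info
kubectl get nodes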

### Install the Helm command line tools

Kuantifier itself, and several of its prerequisites, are installed via [helm chart](https://helm.sh/). The helm
command line tools are used to install helm charts against a running Kubernetes cluster, and can be installed
as follows:

1. Download the latest [helm release][helm-release]
1. Unpack the release archive (e.g. `tar -zxvf helm-v3.0.0-linux-amd64.tar.gz`)
1. Move the `helm` binary from the archive into a location on your `$PATH` (e.g. `mv linux-amd64/helm ~/.local/bin`)
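
To confirm that the command line tools are ready to use, check the reported client version:

:::console
helm version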

### Install Prometheus and kube-state-metrics in your Kubernetes cluster

Kuantifier relies on [Prometheus][prometheus] with [kube-state-metrics][kube-state-metrics] to account for pod resource usage.
There are a number of ways to install both, such as via the [prometheus community helm charts][prometheus-community]:

1. Add the prometheus community helm repository to your local helm installation

:::console
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

1. Install the kube-state-metrics and prometheus helm charts into your Kubernetes cluster.

:::console
helm install kube-state-metrics prometheus-community/kube-state-metrics
helm install prometheus prometheus-community/prometheus

!!! note
These commands will install into the `default` namespace by default. To install into a
non-default namespace such as `monitoring`, use the `-n monitoring` flag with `helm install`.

1. Ensure that the namespace where your workload pods run is properly configured.

- Kuantifier relies on the `spec.containers[].resources.requests.cpu` field in workload pods
to determine processor count for GRACC reporting. Ensure a CPU request is set for pods in
your namespace (see the example Job after this list).

- Kuantifier relies on the Prometheus pod completion time metric to calculate workload job run times.
This metric is sometimes missed for pods that are spontaneously deleted, such as those created by
Deployments. For best results, run workload pods via Kubernetes Jobs.

- (Known issue) Kuantifier currently doesn't support calculating usage metrics for workload pods
running multiple containers. Ensure that workload pods in your namespace have only one container.
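
As an example of the CPU request requirement above, a minimal workload Job that satisfies it might look like the
following sketch (the Job name, image, and namespace are illustrative and should be replaced with your own):

:::yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-workload          # illustrative name
  namespace: workload-namespace   # the namespace Kuantifier reports on
spec:
  template:
    spec:
      restartPolicy: Never
      # A single container, per the known issue above
      containers:
        - name: payload
          image: busybox:1.36
          command: ["sh", "-c", "echo running workload"]
          resources:
            requests:
              cpu: "1"            # required for processor-count reporting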

Installation
------------

Kuantifier itself is also installed via a helm chart, hosted at `hub.opensciencegrid.org/iris-hep/kuantifier`.
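
Before configuring anything, you can optionally confirm that the chart registry is reachable from your environment
by querying the chart metadata (this assumes a helm version with OCI support, 3.8 or later):

:::console
helm show chart oci://hub.opensciencegrid.org/iris-hep/kuantifier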


### Configuring Kuantifier's Values File

Several instance-specific modifications to the default [Values File][values-file] provided with the chart
must be made prior to installation. For full documentation of the values in the values file, see the
[helm chart README on GitHub][helm-values-readme].

1. Fetch the default values.yaml for kuantifier. This file can be obtained in several ways.
- Via the helm cli:

:::console
helm show values oci://hub.opensciencegrid.org/iris-hep/kuantifier

- Via the [Kuantifier GitHub repository][values-yaml].

!!! note
Ensure that the release tag for values.yaml in the git repo corresponds to the version of the chart you're installing.


1. Update the top-level `.outputFormat` in values.yaml to output records to [GRACC](https://gracc.opensciencegrid.org/)

:::yaml
outputFormat: "gratia"

1. Update the `.processor.config` map with the details of your deployment.
- All of the following need to be set:
- `NAMESPACE`: The namespace of the pods for which Kuantifier will collect and report metrics.

!!! note
Each installation of Kuantifier reports only on pods in a single namespace. You must
install multiple instances of the chart to report on multiple namespaces.

- `SITE_NAME`: The name of the site being reported.
- `SUBMIT_HOST`: Uniquely identifying name for the Kubernetes cluster where your workload pods run, in FQDN format.
- `VO_NAME`: Virtual Organization (VO) of the jobs being reported.

- Additionally, the following may need to be set:
- `PROMETHEUS_SERVER`: The DNS name of the Prometheus server installed in your Kubernetes cluster.
- If Prometheus was installed in your cluster via the prometheus community helm chart in the `monitoring`
namespace, the DNS name will be `prometheus-server.monitoring.svc.cluster.local`.
- Otherwise, [construct](https://kubernetes.io/docs/concepts/services-networking/service/#dns) the DNS name from the service name and namespace of your Prometheus installation (i.e. `<service-name>.<namespace>.svc.cluster.local`).

- A fully configured `.processor.config` might look like:

:::yaml
processor:
  config:
    NAMESPACE: workload-namespace
    SITE_NAME: CHTC
    VO_NAME: University of Wisconsin
    SUBMIT_HOST: tiger-cluster.chtc.wisc.edu
    PROMETHEUS_SERVER: prometheus-server.monitoring.svc.cluster.local

1. (Optional) If Prometheus in your cluster is configured to require authentication, an
authentication header can be specified via a key within an already-existing [Secret][kubernetes-secret] in the installation namespace (see the sketch after this list for creating such a Secret):

:::yaml
processor:
  prometheus_auth:
    secret: <secret name>
    key: <key in secret containing auth header>

1. (Optional) Update the schedule of the Kuantifier reporting CronJob. A more frequent schedule may be useful for debugging.

:::yaml
cronJob:
  schedule: "@daily"
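
For the optional Prometheus authentication step above, the referenced Secret must already exist in the installation
namespace. As a hypothetical sketch (the Secret name, key, and token are placeholders to substitute into
`.processor.prometheus_auth`):

:::console
kubectl -n monitoring create secret generic prometheus-auth \
    --from-literal=auth-header='Bearer <token>'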

### Installing Kuantifier

After configuring an appropriate values file for your instance, install the chart via helm:

:::console
helm install -f <values.yaml> -n <install namespace> kuantifier oci://hub.opensciencegrid.org/iris-hep/kuantifier
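
Once the command completes, helm should list the new release in the target namespace (shown here for `monitoring`):

:::console
helm -n monitoring list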

Validation
----------

After running `helm install`, ensure that the expected Kubernetes objects have been created. The following commands assume
that Kuantifier has been installed in the `monitoring` namespace.

1. Check that a CronJob was created for running the kuantifier processor:

:::console
kubectl -n monitoring get cronjob kuantifier-cronjob

1. Check that a ConfigMap was created to configure processor jobs, and that the values in the ConfigMap
align with the values set in `.processor.config` in the values file:

:::console
kubectl -n monitoring get configmap kuantifier-processor-config -o yaml


If the helm chart artifacts are present as expected, run a test instance of the CronJob and inspect its output.

1. Create a new Job from the CronJob, then find the Pod created by the Job:

:::console
kubectl -n monitoring create job --from=cronjob/kuantifier-cronjob kuantifier-test-job
kubectl -n monitoring get pod | grep kuantifier-test-job

1. Inspect the logs from the processor initContainer, which queries Prometheus to generate output records.

:::console
kubectl -n monitoring logs <test-job-pod-name> -c processor

1. Inspect the logs from the gratia-output container, which sends the output records to GRACC.

:::console
kubectl -n monitoring logs <test-job-pod-name> -c gratia-output

If both the processor initContainer and the gratia-output container run to completion without error, the next step
is to confirm with a member of the OSG technology team that the results are visible in GRACC.
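
Once validation is complete, the test Job (and the Pod it created) can be removed:

:::console
kubectl -n monitoring delete job kuantifier-test-job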


[kuantifier-github]: <https://github.com/rptaylor/kapel/>
[helm-release]: <https://github.com/helm/helm/releases>
[helm-values-readme]: <https://github.com/rptaylor/kapel/blob/master/chart/README.md>
[values-yaml]: <https://github.com/rptaylor/kapel/blob/master/chart/values.yaml>
[values-file]: <https://helm.sh/docs/chart_template_guide/values_files/>
[prometheus-community]: <https://github.com/prometheus-community/helm-charts/tree/main>
[kubectl]: <https://kubernetes.io/docs/reference/kubectl/>
[prometheus]: <https://prometheus.io/>
[kube-state-metrics]: <https://github.com/kubernetes/kube-state-metrics>
[kubernetes-secret]: <https://kubernetes.io/docs/concepts/configuration/secret/>