The Observability Operator deploys and maintains a common platform that Application Services share to monitor and report on their service components. It integrates with the Observatorium project to push metrics and logs to a central location.
Prometheus is a free software application used for event monitoring and alerting. It records real-time metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting.
Prometheus's Alertmanager is also supported, which handles alerts sent by Prometheus & takes care of deduplicating, grouping, and routing them to the correct receiver integration (such as, in our case, PagerDuty). It also takes care of silencing and inhibition of alerts.
Grafana is a multi-platform open source analytics and interactive visualization web application. When configured with supported data sources, Grafana provides charts, graphs, and other visualizations via UI dashboards.
Promtail is a log aggregator for Loki, Grafana's platform for collecting and analyzing logs. Loki is the logging backend used in Observatorium.
If you find yourself asking how to onboard your own service, chances are good a conversation has already been started in that regard, but if not, reach out! It'll likely be a bit more involved than plug-and-play and we'd like to make sure we support your needs. That being said (and know that what follows is absolutely subject to change)...
The Observability Operator is intended to support multiple application services, each of which is responsible for maintaining its own configuration repository and instantiating a Secret containing the information the operator needs in order to read from it. Currently, configuration repos are expected to reside within our 'bf2' organization, as we use a special read-only mechanism limited to our organization for access.
As an example, first take a look at the configuration repository for the first service we've onboarded, Managed Kafka. There you'll find an index file and various configuration files referenced from within. In order to use this config repo, the Observability Operator must be told about it via a Secret:

    kind: Secret
    apiVersion: v1
    metadata:
      name: kafka-observability-configuration
      namespace: kafka-observability
      labels:
        configures: observability-operator
    data:
      access_token: '<token here>'
      channel: 'resources'
      repository: 'https://api.github.com/repos/bf2fc6cc711aee1a0c2a/observability-resources-mk/contents'
      tag: <tag or branch>
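To illustrate how the `repository`, `channel`, and `tag` values might fit together, here is a minimal Python sketch that builds the URL of a channel's index file. It assumes the index lives at `<repository>/<channel>/index.json` and that the tag or branch is selected with the `ref` query parameter of GitHub's contents API. The function name `index_url` and the exact path layout are assumptions made for illustration, not confirmed operator behaviour.

```python
from urllib.parse import quote, urlencode


def index_url(repository: str, channel: str, tag: str) -> str:
    """Build the (assumed) URL of a channel's index.json at a tag or branch."""
    base = repository.rstrip("/")
    # Assumption: the index file sits directly inside the channel folder.
    path = quote(f"{channel.strip('/')}/index.json")
    # GitHub's contents API selects a branch, tag, or commit via ?ref=
    return f"{base}/{path}?{urlencode({'ref': tag})}"


if __name__ == "__main__":
    # "main" is a placeholder for whatever tag or branch the Secret specifies.
    print(index_url(
        "https://api.github.com/repos/bf2fc6cc711aee1a0c2a/observability-resources-mk/contents",
        "resources",
        "main",
    ))
```

The `access_token` value would typically be sent in an `Authorization` header alongside such a request; that part is left out here to keep the sketch free of network calls.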
The Observability Operator doesn't care too much about what namespace the Secret resides in (that's not to say that we won't have an opinion, though!). Instead, it scans all namespaces for any Secrets matching a particular label set as specified in the Observability CR (more on that in a bit):

    configurationSelector:
      matchLabels:
        configures: "observability-operator"
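As a rough illustration of the `matchLabels` semantics, the following Python sketch filters a list of Secret-like dictionaries by label. Under standard Kubernetes rules, a resource matches when every key/value pair in the selector is present in its labels. The function names `matches` and `select_secrets` are hypothetical, and the sketch does not query a real cluster.

```python
def matches(match_labels: dict, labels: dict) -> bool:
    """Return True when every selector label is present with an equal value."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_secrets(secrets: list, match_labels: dict) -> list:
    """Keep only the Secrets whose metadata.labels satisfy the selector."""
    selected = []
    for secret in secrets:
        labels = secret.get("metadata", {}).get("labels") or {}
        if matches(match_labels, labels):
            selected.append(secret)
    return selected


if __name__ == "__main__":
    selector = {"configures": "observability-operator"}
    secrets = [
        {"metadata": {"name": "kafka-observability-configuration",
                      "labels": {"configures": "observability-operator"}}},
        {"metadata": {"name": "unrelated", "labels": {"app": "other"}}},
    ]
    for secret in select_secrets(secrets, selector):
        print(secret["metadata"]["name"])
```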
Within a given resources folder, an `index.json` file containing, at a minimum, `id` and `config` fields must exist:

    {
      "id": "shiny-managed-service-development",
      "config": {...}
    }
The `id` field (specifying both the service and channel) is used by the Observability Operator to label and track various generated resources.
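A quick way to check an index file against the minimum requirements described above is sketched below in Python. The helper name `load_index` and its error messages are illustrative assumptions; the operator's own validation may differ.

```python
import json


def load_index(text: str) -> dict:
    """Parse index.json content and check the minimum required fields."""
    index = json.loads(text)
    if not isinstance(index, dict):
        raise ValueError("index.json must contain a JSON object")
    missing = [field for field in ("id", "config") if field not in index]
    if missing:
        raise ValueError("index.json is missing: " + ", ".join(missing))
    if not isinstance(index["id"], str) or not index["id"]:
        raise ValueError("'id' must be a non-empty string")
    if not isinstance(index["config"], dict):
        raise ValueError("'config' must be an object")
    return index


if __name__ == "__main__":
    sample = '{"id": "shiny-managed-service-development", "config": {}}'
    print(load_index(sample)["id"])
```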
The `config` field may contain various entries as appropriate for your service:

- `config.grafana.dashboards` expects an array of `subdirectory/file.yaml` entries, each pointing to a complete Grafana Dashboard YAML definition file:

      "grafana": {
        "dashboards": [
          "grafana/foo-dashboard.yaml",
          "grafana/bar-dashboard.yaml"
        ]
      },

- `config.promtail` specifies whether Promtail should be used and, if so, a namespace label selector for matching:

      "promtail": {
        "observatorium": "default",
        "enabled": true,
        "namespaceLabelSelector": {
          "app": "strimzi"
        }
      },

- `config.promtail.observatorium` specifies the `id` of the Observatorium config to forward logs to
- `config.alertmanager` indicates the names of two prerequisite secrets assumed to pre-exist on the cluster for configuration of Prometheus PagerDuty & Alertmanager integrations:

      "alertmanager": {
        "pagerDutySecretName": "pagerduty",
        "deadmansSnitchSecretName": "deadmanssnitch"
      },

- `config.prometheus.pod_monitors` expects an array of `sub/directory/file.yaml` entries, each pointing to a complete Prometheus PodMonitor YAML definition file:

      "prometheus": {
        "pod_monitors": [
          "prometheus/pod_monitors/foo-monitor.yaml",
          "prometheus/pod_monitors/bar-monitor.yaml"
        ],

- `config.prometheus.rules` expects an array of `sub/directory/file.yaml` entries, each pointing to a complete PrometheusRule YAML definition file:

      "rules": [
        "prometheus/prometheus-rules.yaml"
      ],

- `config.prometheus.federation` expects a single `subdirectory/file.yaml` location pointing to a file containing an array of regex patterns to be concatenated & used in instantiating a Prometheus additional scrape config secret:

      "federation": "prometheus/federation-config.yaml",

- `config.prometheus.observatorium` specifies the `id` of the Observatorium config to forward metrics to
- `config.prometheus.remoteWrite` expects a single `subdirectory/file.yaml` location pointing to a file containing an array of regex patterns to be concatenated & used in instantiating the Prometheus operand (CR):

      "remoteWrite": "prometheus/remote-write.yaml"

- `config.observatoria` is an array of Observatorium configs, each with an `id` referenced by `prometheus` and/or `promtail`:

      [
        {
          "id": "default",
          "secretName": "observatorium-configuration-red-hat-sso"
        },
        ...
      ]
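Because several `config` entries point at files or at Observatorium ids defined elsewhere in the same index, a small consistency check can catch typos before the operator reads the repository. The Python sketch below lists the file paths an index references and reports any Observatorium `id` used by `prometheus` or `promtail` that is not defined under `observatoria`. Both function names are hypothetical, and the check is an illustration rather than something the operator is known to perform.

```python
def referenced_files(config: dict) -> list:
    """Collect every file path that the config section points at."""
    prometheus = config.get("prometheus", {})
    files = []
    files.extend(config.get("grafana", {}).get("dashboards", []))
    files.extend(prometheus.get("pod_monitors", []))
    files.extend(prometheus.get("rules", []))
    # federation and remoteWrite are single paths rather than arrays.
    for key in ("federation", "remoteWrite"):
        if key in prometheus:
            files.append(prometheus[key])
    return files


def undefined_observatoria(config: dict) -> list:
    """Return Observatorium ids that are referenced but never defined."""
    defined = {entry.get("id") for entry in config.get("observatoria", [])}
    referenced = set()
    for section in ("prometheus", "promtail"):
        ref = config.get(section, {}).get("observatorium")
        if ref is not None:
            referenced.add(ref)
    return sorted(referenced - defined)


if __name__ == "__main__":
    config = {
        "grafana": {"dashboards": ["grafana/foo-dashboard.yaml"]},
        "promtail": {"observatorium": "default", "enabled": True},
        "prometheus": {
            "rules": ["prometheus/prometheus-rules.yaml"],
            "federation": "prometheus/federation-config.yaml",
            "observatorium": "default",
        },
        "observatoria": [
            {"id": "default",
             "secretName": "observatorium-configuration-red-hat-sso"},
        ],
    }
    print(referenced_files(config))
    print(undefined_observatoria(config))
```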
Additionally, an empty ConfigMap can be created in a target namespace to prevent an Observability operand (CR) from being created in that namespace.
- The ConfigMap requires the `name` to be set to `observability-operator-no-init` and the target `namespace` to be specified:

      kind: ConfigMap
      apiVersion: v1
      metadata:
        name: observability-operator-no-init
        namespace: my-target-namespace
Your spec will be required to specify, at a minimum, three things:
- a `resyncPeriod` to indicate how often external config should be re-fetched
- a `retention` to configure the lifetime of stored data
- a `configurationSelector` indicating what labels to match when scanning for external config info Secrets as previously mentioned

    apiVersion: observability.redhat.com/v1
    kind: Observability
    metadata:
      name: observability-sample
    spec:
      resyncPeriod: 1h
      retention: 45d
      configurationSelector:
        matchLabels:
          configures: "observability-operator"
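To get a feel for the `resyncPeriod` and `retention` values in the sample, the Python sketch below converts them to seconds. It assumes each value is a whole number followed by a single unit suffix (`s`, `m`, `h`, `d`, or `w`), which covers the `1h` and `45d` shown here; the formats the operator actually accepts are not specified in this document, and `to_seconds` is a hypothetical helper.

```python
import re

# Assumed unit suffixes and their length in seconds.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def to_seconds(value: str) -> int:
    """Convert a value such as '1h' or '45d' to a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    resync = to_seconds("1h")
    retention = to_seconds("45d")
    print(resync, retention)
    # Number of re-fetches that fit inside one retention window.
    print(retention // resync)
```

With the sample values, the external config would be re-fetched 1080 times over one 45-day retention window.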
- Prometheus storage config

      spec:
        storage:
          prometheus:
            volumeClaimTemplate:
              spec:
                storageClassName: ssd
                resources:
                  requests:
                    storage: 40Gi

- Node Tolerations

      spec:
        tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/infra
            operator: Exists

- Node Affinities

      spec:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: node-role.kubernetes.io/infra
                      operator: Exists
- golang 1.19+
- operator-sdk v1.4.2
- CodeReady Containers 1.18 (OCP 4.6.1) or later
- OpenShift command line tool
- We recommend targeting a local CRC instance for daily development. Though we're hedging our bets a bit with these amounts, we do feel additional memory config params are essential in having a stable platform to run against:

      crc start --pull-secret-file $HOME/.crc/cache/pull-secret.txt --cpus 6 --memory 24576
- Once your cluster is up and running, determine which namespace to target (or create a new one) with various generated files. The operator will look in this namespace for an Observability operand (CR) and, if not found, generate its own. In order for it to do so, you need to indicate the namespace in one of two ways:
  - specify the `WATCH_NAMESPACE=foo` environment variable when running
  - create a file containing only the namespace name at `/var/run/secrets/kubernetes.io/serviceaccount/namespace` to emulate a pod environment & prevent having to supply the env var every run.
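The two ways of indicating the namespace can be pictured with the short Python sketch below. It checks the `WATCH_NAMESPACE` environment variable first and falls back to the service account namespace file. That order of precedence, and the helper name `resolve_namespace`, are assumptions for illustration; the operator's actual lookup logic is not described here.

```python
import os
from pathlib import Path

# Path used inside a pod; a local file at this path emulates that environment.
SERVICE_ACCOUNT_NAMESPACE = Path(
    "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
)


def resolve_namespace(env=None, namespace_file=SERVICE_ACCOUNT_NAMESPACE):
    """Return the target namespace, or None when neither source provides one."""
    env = os.environ if env is None else env
    from_env = env.get("WATCH_NAMESPACE", "").strip()
    if from_env:
        return from_env
    try:
        from_file = Path(namespace_file).read_text().strip()
    except OSError:
        # The file is missing or unreadable, so no namespace is configured.
        return None
    return from_file or None


if __name__ == "__main__":
    print(resolve_namespace())
```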
Note that some files require values that are not supplied here or within sample files for security reasons - please review the content of each!
- PagerDuty secret:

      oc apply -f config/samples/secrets/pagerduty.yaml

- DeadmansSnitch secret:

      oc apply -f config/samples/secrets/deadmanssnitch.yaml

- Sendgrid secret:

      oc apply -f config/samples/secrets/sendgrid.yaml

- Auth provider config secret: users can choose between two auth configurations, Dex or Red Hat SSO.
  - Dex config secret

        oc apply -f config/samples/secrets/observatorium-dex-credentials.yaml

  - Red Hat SSO secret

        oc apply -f config/samples/secrets/observatorium-configuration-red-hat-sso.yaml
- External config repo secret:
  - The Observability stack requires a Personal Access Token to read externalized configuration from within the bf2 organization. For development cycles, you will need to generate a personal token for your own GitHub user (with bf2 access) and place the value in the Secret.
  - To generate a new token:
    - Follow the steps found here, making sure to check ONLY the `repo` box at the top of the scopes/permissions list (which will check each of the subcategory boxes beneath it).
    - Copy the value of your Personal Access Token to a secure private location. Once you leave the page, you cannot access the value again, and you will be forced to reset the token to receive a new value should you lose the original.
    - Take care not to push your PAT to any repository: if you do, GitHub will automatically revoke the token as soon as you push, and you'll need to follow this process again to generate a new one.
  - Apply the Secret with the token value substituted in:

        oc apply -f config/samples/secrets/observability_secret.yaml
- The required secrets listed above can be configured and deployed using:

      make deploy/secrets

  - The following parameters can be modified:
    - `NAMESPACE`: Defaults to current namespace in use.
    - `OBSERVATORIUM_TENANT`: Defaults to `managedKafka`
    - `OBSERVATORIUM_GATEWAY`: Defaults to `https://observatorium-mst.api.stage.openshift.com`
    - `OBSERVATORIUM_AUTH_TYPE`: Defaults to `redhat`
    - `OBSERVATORIUM_RHSSO_URL`: Defaults to `https://sso.redhat.com/auth/`
    - `OBSERVATORIUM_RHSSO_REALM`: Defaults to `redhat-external`
    - `OBSERVATORIUM_RHSSO_METRICS_CLIENT_ID`: No default provided
    - `OBSERVATORIUM_RHSSO_METRICS_SECRET`: No default provided
    - `OBSERVATORIUM_RHSSO_LOGS_CLIENT_ID`: No default provided
    - `OBSERVATORIUM_RHSSO_LOGS_SECRET`: No default provided
    - `GITHUB_ACCESS_TOKEN`: No default provided
  - More information about configurable parameters can be found by running:

        oc process --parameters -f ./templates/secrets-template.yml
- For users deploying to CRC, an additional secret is required in the `openshift-monitoring` namespace for Grafana datasources. This secret can be deployed with the command:

      make deploy/crc/secret
- Priority Class
  - If the Observability Operator was not installed using OLM, you need to create the required Priority Class yourself. Use the following command to add the necessary Priority Class:

        oc apply -f config/samples/prioclass.yaml
If this is the first time you've run against the cluster (or your CRD has changed and been uninstalled):

    make install

Run the operator as a local process:

    WATCH_NAMESPACE=<namespace> make run

Alternatively, you can deploy the operator's latest image to your cluster:

    make deploy

- If you encounter errors regarding PrometheusRule validation webhooks, you can use the following:

      oc delete ValidatingWebhookConfiguration prometheusrules.openshift.io
Add the following into your `launch.json` file to enable running and debugging.

    {
      "version": "0.2.0",
      "configurations": [{
        "name": "Observability Operator",
        "type": "go",
        "request": "launch",
        "mode": "auto",
        "program": "${workspaceFolder}/main.go",
        "env": {
          "WATCH_NAMESPACE": "kafka-observability",
          "KUBERNETES_CONFIG": "~/.kube/config",
          "OPERATOR_NAME": "observability-operator"
        },
        "cwd": "${workspaceFolder}",
        "args": []
      }]
    }
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
If you'd like to report an issue, feel free to use our project Issues page.
If all else fails, you can reach us at [email protected]