Deploys Kubernetes infrastructure components to multiple Kubernetes environments using Helm and Helmfile.
Helmfile declaratively manages the configuration and deployment of these charts across multiple environments; it is, in effect, helm for helm. It enables a gitflow approach to Helm chart releases, where release configuration customization is driven by files in git, such as changes to YAML values files.
- nginx-ingress
- etcd-operator
- vault
- cert-manager
- prometheus-operator
- myapp-prometheus-operator
- prometheus-pushgateway
- jenkins
- sonarqube
- strimzi (kafka)
- twistlock (Prisma container security)
- atlantis (webhook setup for https://github.com/contino/srebot)
- spinnaker
- jira
- confluence
- gitlab
- hygieia
- confluent-oss (kafka)
- gcp gke on google cloud
- azure aks on azure cloud
- ldev airgapped local k8s lower environments
- lprod airgapped local higher environments
Additional Kubernetes clusters, including on-prem Anthos, Azure, AWS, and others, can be accommodated with minimal effort.
Instructions for setting up a k8s cluster with Terraform Cloud are provided in terraformcloud.md
# brew install kubernetes-helm
# helm version
version.BuildInfo{Version:"v3.0.1", GitCommit:"7c22ef9ce89e0ebeb7125ba2ebf7d421f3e82ffa", GitTreeState:"clean", GoVersion:"go1.13.4"}
# brew install helmfile
# helmfile --version
helmfile version v0.100.2
# helm plugin install https://github.com/databus23/helm-diff
# brew install jsonnet
# pip install pyyaml (python2)
# brew install stern
# brew install kubectx
# brew install sops
# helm plugin install https://github.com/futuresimple/helm-secrets
# brew install gnu-getopt
# brew install cfssl
Kubernetes credentials for the appropriate environment must be used with helmfile/helm/kubectl.
For vk8, a KUBECONFIG file for the myapp-prometheus service account has been generated by the k8s admin.
This can be used by setting the environment var:
# export KUBECONFIG=/path/to/kubeconfig_serviceaccount
Note that kubens, kubectx and other k8s tools might affect which context is active.
Alternatively, the helmfile --kube-context flag can be used to specify the context/credentials.
For gcp, you can set up k8s GKE credentials using a command like:
gcloud container clusters get-credentials acme --zone us-central1-c --project bhood-214523
All required helm charts and configurations are defined in helmfile.yaml. You must specify the environment using -e or --environment flag.
Lint helmfile/charts:
# helmfile --environment ldev lint
Diff between your kubernetes cluster and the helmfile:
# helmfile --environment ldev diff
To apply the changes with a decision gate to validate:
# helmfile --environment lprod --interactive apply
To apply the changes without doing a diff or decision gate:
# helmfile --environment gcp sync
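The commands above assume helmfile.yaml declares the environments and releases. A minimal sketch of that structure (the environment names match this project, but the release and file paths shown here are illustrative):

```yaml
environments:
  ldev:
    values:
      - environments/ldev/values.yaml
  lprod:
    values:
      - environments/lprod/values.yaml
  gcp:
    values:
      - environments/gcp/values.yaml

releases:
  - name: nginx-ingress
    namespace: nginx
    chart: stable/nginx-ingress
    values:
      # environment-specific values rendered as a go template
      - config/nginx-ingress/{{ .Environment.Name }}.yaml.gotmpl
```

With this layout, `helmfile -e ldev diff` renders the ldev values into each release before comparing against the cluster.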
- https://github.com/roboll/helmfile#environment-secrets
- Download and install GPG: https://releases.gpgtools.org/GPG_Suite-2019.2.dmg
- Generate and upload key
- create .sops.yaml DO NOT CHECK IN
creation_rules:
# Encrypt with AWS KMS
#- kms: 'arn:aws:kms:us-east-1:222222222222:key/111b1c11-1c11-1fd1-aa11-a1c1a1sa1dsl1+arn:aws:iam::222222222222:role/helm_secrets'
# Encrypt using GCP KMS
#- gcp_kms: projects/mygcproject/locations/global/keyRings/mykeyring/cryptoKeys/thekey
# As failover encrypt with PGP
- pgp: 'TODO_XXXXXXXX'
# For more help look at https://github.com/mozilla/sops
- create secret.yaml.dec (could differ by env) with secrets DO NOT CHECK IN
grafana_adminPassword: TODO_prom-operator
elasticsearch_svc_grafana_password: TODO_RXgXXXXXXx
stackdriver_privateKey: |
-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCwshHZBbbax4Ho
TODO_REALKEYHERE
EUOLCkhgv8ukISVqlQ3oyuU=
-----END PRIVATE KEY-----
hangouts_private_key: "-----BEGIN PRIVATE KEY-----\nTODO_REALKEYHERE\nCXDGE8o2B2lYXy3jGBWacQ==\n-----END PRIVATE KEY-----\n"
- ./secret.sh
- git add /environment/XXX/secret.yaml and commit/push
- in values.yaml.gotmpl use secrets:
{{ .Environment.Values.elasticsearch_svc_grafana_password }}
{{ .Environment.Values.stackdriver_privateKey }}
{{ .Environment.Values.hangouts_private_key }}
{{ .Environment.Values.grafana_adminPassword }}
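For these template references to resolve, the sops-encrypted secrets file must be listed under each environment in helmfile.yaml; helm-secrets decrypts it at render time. A sketch (paths are illustrative):

```yaml
environments:
  ldev:
    secrets:
      # sops-encrypted; decrypted transparently by the helm-secrets plugin
      - environments/ldev/secrets.yaml
```

Keys defined in secrets.yaml then become available as `.Environment.Values.<key>` in the .gotmpl values files, as shown above.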
The Prometheus Operator provides easy monitoring for k8s services and deployments, managing Prometheus, Alertmanager and Grafana configuration while preserving configurability and keeping the configuration Kubernetes native.
When you deploy a new version of your app, k8s creates a new pod and, once the pod is ready, destroys the old one. Prometheus is on constant vigil, watching the k8s API; when it detects a change, it generates a new Prometheus configuration based on the service (pod) changes.
Prometheus-operator uses a Custom Resource Definition (CRD), named ServiceMonitor, to abstract the target configuration. As an example, let's see how to monitor an NGINX pod with a ServiceMonitor. The ServiceMonitor selects the NGINX service using the matchLabels selector; the prometheus-operator finds the matching pods and creates a Prometheus target so Prometheus scrapes the metrics endpoint.
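The NGINX example described above might look like the following (the label values and port name are illustrative, not taken from this project's charts):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx
  labels:
    release: myapp-prometheus   # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: nginx                # selects the NGINX service by label
  endpoints:
    - port: metrics             # named service port exposing /metrics
      interval: 30s
```

Once applied, the operator regenerates the Prometheus scrape configuration automatically; no Prometheus restart or manual config edit is needed.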
In vk8, the full Prometheus Operator stack (including the CRDs/operator, Alertmanager, Prometheus, and Grafana) is deployed by the vk8 platform. It is used to self-monitor the k8s cluster and to monitor platform dependencies such as Jenkins, ELK, and Kafka. Deployment in this cluster installs the Prometheus Operator helm chart with the CRDs/operator and Alertmanager disabled, as these are shared with the platform Prometheus. Only Prometheus and Grafana are installed, in a myapp-prometheus namespace using a myapp-prometheus serviceaccount.
- https://alertmanager-gcp.{{domain}}/ (shared)
- https://grafana-gcp.{{domain}}/ (platform)
- https://prometheus-gcp.{{domain}}/ (platform)
- https://pushgateway-gcp.{{domain}}/ (platform)
- https://myapp-grafana-gcp.{{domain}}/
- https://myapp-prometheus-gcp.{{domain}}/
Endpoints follow a naming convention where the "gcp" portion above is the environment name, one of lprod, ldev, or gcp.
Grafana dashboards (and other sidecar dashboards) can be added using a config map with the label 'grafana_dashboard'. Dashboards in config maps with this label are automatically discovered and added to Grafana.
To add or modify a dashboard:
- put the dashboard json in the resources/XXX-dashboards/ directories. You can edit the dashboard using the grafana web UI, then grab the "JSON Model".
- run the shell script resources/loadall.sh
- Note that this approach manages the dashboards in a separate workflow from the helmfile, so that changes can be made without any helm change.
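A dashboard ConfigMap picked up by the sidecar has the shape below (the names, namespace, and dashboard JSON are illustrative placeholders; loadall.sh generates the real ones from the resources/XXX-dashboards/ files):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-example-dashboard
  namespace: myapp-prometheus
  labels:
    grafana_dashboard: "1"      # the label the Grafana sidecar watches for
data:
  example-dashboard.json: |
    { "title": "Example", "panels": [] }
```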
Grafana datasources often include secrets which should not be checked into source control (git or bitbucket).
Environment-specific datasources are in config/myapp-prometheus-operator/{{Environment.Name}}.yaml.gotmpl
These use environment-specific environment secrets in environments/{{Environment.Name}}/secrets.yaml
Because of network restrictions, the k8s cluster might not be able to reach https://grafana.com/api to download and install plugins. The preferred approach for security and docker image startup speed is to preinstall plugins on the docker image and store the docker images in the local repo per https://github.com/grafana/grafana/blob/c344a3a66e47c180867fb77a869774b69af18a89/docs/sources/installation/docker.md
https://github.com/grafana/grafana.git
git clone https://github.com/grafana/grafana.git
cd grafana
# git checkout 6.6.1
cd packaging/docker/custom
docker build \
--build-arg "GRAFANA_VERSION=latest" \
--build-arg "GF_INSTALL_IMAGE_RENDERER_PLUGIN=true" \
--build-arg "GF_INSTALL_PLUGINS=grafana-piechart-panel,grafana-clock-panel,grafana-simple-json-datasource" \
-t dockerhub.artifactory.{{domain}}/grafana/grafana:6.6.2-1 -f Dockerfile .
docker login dockerhub.artifactory.{{domain}}
docker push dockerhub.artifactory.{{domain}}/grafana/grafana:6.6.2-1
docker run -d -p 3000:3000 --name=grafana dockerhub.artifactory.{{domain}}/grafana/grafana:6.6.2-1
Some extra files are needed for configuration that include secrets which should not be checked into source control (git or bitbucket).
To add or modify extra:
- update the files in the resources/extra/ directory. These can include secrets and should not be committed with secrets included.
- run the shell script resources/extra.sh to update the associated configmap
- Note that this approach manages the datasources in a separate workflow from the helmfile. Changes can be made without any helm change and are typically done before the helm chart is deployed to an environment.
In vk8 environments, the Alertmanager and Prometheus operator are installed and operated by the cluster administrators. The shared Alertmanager is used by myapp prometheus and potentially other cluster tenants. Alertmanager uses calert and is configured for Google Chat; its configuration must be coordinated with platform operators.
Alertmanager includes receiver configuration for gchat-notify (using calert) and route configuration that targets specific receivers based on alert labels. The severity and profile (environment) labels are used to target specific rooms. The alert annotations description, message, runbook_url and link are used to customize notification messages.
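The routing described above might be expressed in an Alertmanager configuration like this (a sketch: the receiver names, label values, and calert webhook URL are illustrative, since the actual config is owned by the platform operators):

```yaml
route:
  receiver: gchat-notify            # default receiver
  routes:
    - match:
        severity: critical
        profile: lprod              # environment label targets a specific room
      receiver: gchat-notify-lprod

receivers:
  - name: gchat-notify
    webhook_configs:
      - url: http://calert:6000/create?room_name=default
  - name: gchat-notify-lprod
    webhook_configs:
      - url: http://calert:6000/create?room_name=lprod
```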
The myapp-prometheus-operator standard alerts include PrometheusNotConnectedToAlertmanagers, which should be manually disabled for now (or configured for Blackhole). A future helm chart version could eliminate this alert when the Alertmanager is not included in the release, and/or modify it to match the tags of the configured shared Alertmanager.
SLO dashboards are generated with jsonnet using an IaC approach.
helmfile hooks trigger generation of prometheus rules, prometheus alerts, and grafana dashboards for kubeapi and myappapi specs, and these are deployed to prometheus.
A standardized RED method (Request Rate, Errors, Duration) is implemented using a data-driven IaC approach based on https://github.com/bitnami-labs/kubernetes-grafana-dashboards
See JSONNET.md for more info
SonarQube is an open-source continuous code inspection tool that empowers developers to write cleaner and safer code.
- https://github.com/sbaudoin/sonar-yaml/releases/download/v1.5.1/sonar-yaml-plugin-1.5.1.jar
- https://github.com/dependency-check/dependency-check-sonar-plugin/releases/download/2.0.2/sonar-dependency-check-plugin-2.0.2.jar
- https://github.com/SonarSource/sonar-ldap/releases/tag/2.2.0.608
Hashicorp Vault, backed by etcd
- etcd and root secrets stored in GCP KMS
- etcd is HA with TLS
- vault is HA with etcd backend
- must unseal manually, autoseal not yet configured
kubectl exec -ti -n vault vault-0 -- vault operator init
keep the 5 unseal keys and the initial root token somewhere safe
on each of vault-0, vault-1, and vault-2 you must provide at least 3 unseal keys
kubectl exec -ti -n vault vault-0 -- vault operator unseal XXXX
- https://github.com/confluentinc/cp-helm-charts
- no longer supported by Confluent and now deprecated
- https://github.com/strimzi/strimzi-kafka-operator/tree/master/helm-charts/strimzi-kafka-operator
- installs basic operator in namespace kafka
- postsync install CRD for kafka cluster and topic in namespace my-kafka-project
- NOTE: you must manually add a secret with the prometheus scrape config for strimzi or prometheus will not start: kubectl create secret generic prometheus-operator-prometheus-scrape-confg --from-file=additional-scrape-configs.yaml -n prometheus
- See strimzi.md for more info
- Helm (https://helm.sh/docs/helm/)
- Helmfile (https://github.com/roboll/helmfile)
- https://itnext.io/kubernetes-monitoring-with-prometheus-in-15-minutes-8e54d1de2e13
- https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html
- https://github.com/helm/charts/tree/master/stable/prometheus-operator
These charts use serviceaccounts and namespaces created beforehand by the cluster administrators
- serviceaccount myapp-prometheus is used for all releases (may be split out to multiple in the future)
- namespace myapp-prometheus is used for prometheus-operator and prometheus-pushgateway for the example app
- namespace sonarqube is used for sonarqube
- namespace vault is used for etcd-operator and vault
- namespace kafka is used for the Strimzi Kafka operator
- namespace my-kafka-project is used for the example Kafka app
- namespace cp is used for confluent platform kafka (deprecated)
- namespace twistlock is used for Prisma Twistlock
- namespace devops is a default (e.g. jenkins, atlantis)
For gcp, the cluster, service accounts, and namespaces must be set up outside of this helmfile project:
kubectl create ns nginx
kubectl create ns prometheus
kubectl create ns sonarqube
kubectl create ns myapp-prometheus
kubectl create ns devops
kubectl create ns cp
kubectl create ns vault
kubectl create ns kafka
kubectl create ns my-kafka-project
A static compute IP and the DNS records needed to support Ingress must also be set up
- https://cloud.google.com/kubernetes-engine/docs/tutorials/configuring-domain-name-static-ip
- https://medium.com/faun/dns-and-gke-network-configuration-on-google-cloud-platform-1bfdc74fe2e
It may be necessary to set up firewall access from the GKE control plane to your cluster nodes.
#!/bin/bash
CLUSTER_NAME=clustername
CLUSTER_REGION=europe-west1
VPC_NETWORK=$(gcloud container clusters describe $CLUSTER_NAME --region $CLUSTER_REGION --format='value(network)')
MASTER_IPV4_CIDR_BLOCK=$(gcloud container clusters describe $CLUSTER_NAME --region $CLUSTER_REGION --format='value(privateClusterConfig.masterIpv4CidrBlock)')
NODE_POOLS_TARGET_TAGS=$(gcloud container clusters describe $CLUSTER_NAME --region $CLUSTER_REGION --format='value[terminator=","](nodePools.config.tags)' --flatten='nodePools[].config.tags[]' | sed 's/,\{2,\}//g')
echo $VPC_NETWORK
echo $MASTER_IPV4_CIDR_BLOCK
echo $NODE_POOLS_TARGET_TAGS
gcloud compute firewall-rules create "allow-apiserver-to-admission-webhook-8443" \
--allow tcp:8443 \
--network="$VPC_NETWORK" \
--source-ranges="$MASTER_IPV4_CIDR_BLOCK" \
--target-tags="$NODE_POOLS_TARGET_TAGS" \
--description="Allow apiserver access to admission webhook pod on port 8443" \
--direction INGRESS
or
gcloud container clusters describe acme --region us-central1-c | yq r - ipAllocationPolicy.clusterIpv4CidrBlock
10.36.0.0/14
gcloud compute firewall-rules list \
--filter 'name~^gke-acme' \
--format 'table(
name,
network,
direction,
sourceRanges.list():label=SRC_RANGES,
allowed[].map().firewall_rule().list():label=ALLOW,
targetTags.list():label=TARGET_TAGS
)'
gcloud compute firewall-rules create allow-apiserver-to-admission-webhook-8443 \
--action ALLOW \
--direction INGRESS \
--source-ranges 10.36.0.0/14 \
--rules tcp:8443 \
--target-tags gke-acme-f12e5ab7-node
- https://github.com/roboll/helmfile
- https://www.reddit.com/r/devops/comments/awy81c/managing_helm_releases_terraform_helmsman/
- https://docs.cloudposse.com/tools/helmfile/
- https://medium.com/@naseem_60378/helmfile-its-like-a-helm-for-your-helm-74a908581599
- https://costimuraru.wordpress.com/2019/08/22/setup-your-kubernetes-cluster-with-helmfile/
- https://github.com/roboll/helmfile
- secrets other than pgp (vault)
- integrated jsonnet
- ingress
- cert-manager
- k8s_setup with terraform to create GKE cluster in trial account
- Dockerfile
- example app (webGoat)?
- helmfile operator https://github.com/mumoshu/helmfile-operator or https://github.com/fluxcd/helm-operator
- JenkinsFile
- config/setup jira/confluence/gitlab etc
- externalDNS
- LDAP/AD integration (or front-end it with oauth2-proxy and do SSO)
- sonarqube rules/qualityprofile/qualitygate as code (backup/restore?)
- prometheus monitoring and dashboards for sonar, gitlab, jira, ....
- helmfile hooks to create ns
- helmfile hooks for resources/extra.sh, dashboard loadall.sh, etc.