This module provides a simple and opinionated way to build a standard Azure AKS Kubernetes cluster with a common set of services. By providing a standard Kubernetes pattern we reduce the cognitive load on the teams who need to run these clusters and benefit from an economy of scale. The module API and behaviour are designed (as far as possible) to be common across all RSG Kubernetes implementations, which allows for greater portability between implementations.
The module follows a SemVer versioning strategy and is packaged and released as a tested pattern with a corresponding support policy. For detailed documentation and more information on the Kubernetes ecosystem please visit the RSG Kubernetes Documentation.
Support for this module isn't operational; by using this module you're agreeing that operational support will be provided to your end-users by your cluster operators and that the core engineering team will only interact with these operational teams.
At any given time the last 3 minor versions of this module are supported; this means these versions will get patch fixes for critical bugs, core service CVEs & AKS patches. It is the module operators' and end-users' responsibility to make sure that clusters are running the latest patch version of a supported minor version; failure to do this in a timely manner could expose the cluster to significant risks.
Note If there have been versions v3.0.0, v3.1.0, v3.1.1, v3.2.0, v3.3.0 & v3.3.1 released then the supported versions would be v3.1.1, v3.2.0 & v3.3.1 (the latest patch versions of the last 3 minor versions).
Before using this module, the whole README should be read and you should be familiar with the concepts in the RSG Kubernetes Documentation; some common questions are answered in the module FAQ.
If you have unanswered questions after reading the documentation, please visit RSG Kubernetes Discussions where you can either join an existing discussion or start your own.
The core engineering team are responsible for triaging bugs in a timely manner and providing fixes either in the next minor release or as a patch to supported versions. The following constraints need to be met before an issue can be reported; failure to meet them may result in the issue being closed if not addressed promptly.
- The reporter must be a cluster operator
  - The core team doesn't have the capacity to deal directly with end-users
- Clusters must be running a supported minor version with the latest patch
  - Complex issues may need to be demonstrated on a cluster running the latest version
- Only clusters deployed using this module are supported
  - Forks of this module are not supported
  - Nesting this module in a wrapper module is not supported
- Issues should only be reported where the only change is this module
  - Terraform has a number of issues when using a large graph of changes
- Issues should only be created after checking there isn't already an open issue
- Issues need to have context such as Kubernetes version, module version, region, etc
- Issues should have an example of how to replicate them
See documentation for system architecture, requirements and user guides for cluster services.
Warning AKS automatic upgrades can be sensitive to incorrectly configured workloads or transient failures in the AKS system; as a result you should closely monitor your clusters to ensure they're suitable for automatic upgrades.
The AKS module handles the Kubernetes minor version updates and the core service versions; the core service versions are upgraded as part of a module upgrade and the Kubernetes minor version is a module input. The AKS module also configures a maintenance window for when the control plane automatically upgrades and a maintenance window for when the nodes automatically upgrade; it is possible (but UNSUPPORTED) to disable these upgrades. The control plane upgrade window will also be used for upgrading the AKS managed services in the cluster, and so is always required to be configured.
You should never upgrade the AKS Kubernetes minor version outside the AKS module; however, it may be feasible to manually upgrade the patch version, though this is UNSUPPORTED until we verify that it doesn't cause adverse effects.
Always use a supported AKS module version. It's highly recommended to move to the latest supported AKS module version promptly.
Control plane upgrades either bump the Kubernetes version (patch or minor) or upgrade AKS services; automated control plane upgrades will only ever be for Kubernetes patch upgrades or AKS managed services. Any Kubernetes version bump will first upgrade the control plane and then upgrade the nodes; this sequence might disrupt some workloads, unlike some other managed Kubernetes solutions.
Node upgrades take the form of a new node image version which is used to replace nodes with the old version; this is implemented by creating new nodes (based on the surge configuration) and terminating workloads from old nodes once the new nodes are ready, which will cause some workload interruption. Unfortunately the current implementation attempts to keep the VMs underpinning the original nodes instead of the new VMs, so there is additional disruption while workloads flip-flop between "new" nodes.
AKS regularly provides new images with the latest updates; Linux node images are updated weekly and Windows node images monthly, so your maintenance window configuration should take this into account.
Core services are upgraded by running a new version of this module or by changing the Kubernetes version for the cluster; these services have been tested together to provide a simple and safe way to keep the cluster secure and functional.
A VNet can contain multiple AKS clusters and be shared with non-AKS resources; however, there should be a dedicated subnet and a unique route table for each AKS cluster. It is technically possible to host multiple AKS cluster node pools in a single subnet; this is not recommended and may cause connectivity issues, but can be achieved by passing a unique non-overlapping CIDR block to each cluster via the podnet_cidr_block input variable. Outbound network traffic from the pods can be routed through either a load balancer or a managed NAT gateway; the load balancer is configured by AKS within the module, while the NAT gateway needs to be configured externally.
Subnet configuration, in particular sizing, will largely depend on the network plugin (CNI) used. See the network model comparison for more information.
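For illustration, the networking inputs above are passed to the module roughly as follows; the module source, resource names and NAT gateway reference are placeholders, and the other required inputs are omitted for brevity.

module "aks" {
  source = "<module-source>" # placeholder, use your registry path and version

  # Existing network resources the cluster will be attached to.
  virtual_network_resource_group_name = "my-network-rg"
  virtual_network_name                = "my-vnet"
  subnet_name                         = "my-aks-snet"
  route_table_name                    = "my-aks-rt"

  # Unique, non-overlapping pod CIDR per cluster sharing the subnet/route table (kubenet).
  podnet_cidr_block = "100.66.0.0/16"

  # Optional: use a pre-created NAT gateway for egress instead of the managed load balancer.
  nat_gateway_id = azurerm_nat_gateway.egress.id
}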
v1.0.0-beta.24 introduces Thanos as a core service for Prometheus, providing high availability and long-term metrics. The backend utilizes an Azure service endpoint for secure access, improving security and decreasing internet traffic for cluster access. To use versions beyond v1.0.0-beta.24, operators must configure the Azure service endpoint in their subscription before consuming the service. See rsg-terraform-azurerm-aks/issues/861 for more details.
DNS can be configured via inputs in the core_services_config variable.
For example, the module exposes ingress endpoints for core services such as the Prometheus, Grafana and AlertManager UIs. The endpoints must be secured via TLS and DNS records must be published to Azure DNS for clients to resolve.
core_services_config = {
  cert_manager = {
    acme_dns_zones      = ["us-accurint-prod.azure.lnrsg.io"]
    default_issuer_name = "letsencrypt" # for production usage of letsencrypt
  }

  external_dns = {
    private_resource_group_name = "us-accurint-prod-dns-rg"
    private_zones               = ["us-accurint-prod.azure.lnrsg.io"]
    # public_domain_filters = ["us-accurint-prod.azure.lnrsg.io"] # use this if you use a public dns zone
  }

  ingress_internal_core = {
    domain     = "us-accurint-prod.azure.lnrsg.io"
    public_dns = false # use true if you use public_domain_filters as above
  }
}
- The cert_manager block specifies the public zone Let's Encrypt will use to validate the domain and its resource group.
- The external_dns block specifies the domain(s) that user services can expose DNS records through and their resource group; all zones managed by External DNS must be in a single resource group.
- The ingress_internal_core block specifies the domain to expose ingress resources to, consuming the DNS/TLS services above.
It's very likely the same primary domain will be configured for all services, perhaps with External DNS managing some additional domains. The resource group is a required input so the module can assign appropriate Azure role bindings. It is expected that in most cases all DNS domains will be hosted in a single resource group.
While External DNS supports both public and private zones, in split-horizon setups only the private zone should be configured, otherwise both zones will be updated with service records. The only scenario for configuring both public and private zones of the same name is to migrate public records to private records. Once this is done, the public zone should be removed and records manually deleted in the public zone.
The node group configuration provided by the node_groups input variable allows a cluster to be created with node groups that span multiple availability zones and can be configured with the specific required behaviour. The node group name prefix is the map key; at a minimum node_size & max_capacity must be provided, with the other values having a default (see Appendix C).
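For illustration, a minimal node group sketch relying on the Appendix C defaults (the group key and sizes below are arbitrary):

node_groups = {
  workers = {
    node_size    = "xlarge" # 4 CPUs, see node sizes
    max_capacity = 6
  }
}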
AKS always creates a system node pool upon cluster creation, and modifying the system node pool results in the cluster being destroyed and re-built. An "initial" bootstrap node pool allows us to modify the system node pools without requiring a cluster re-build every time a system node pool is modified. Once the cluster is ready, we attach our 3 system node pools (we need 3 to use storage) and, when they are ready, we remove the "bootstrap" node pool.
Warning Do not use this; it is likely to be deprecated in future module versions.
The single_group parameter controls whether a single node group is created that spans multiple zones, or if a separate node group is created for each zone in a cluster. When this parameter is set to true, a single node group is created that spans all zones, and the min_capacity and max_capacity settings apply to the total number of nodes across all zones. When set to false, separate node groups are created for each zone and the min_capacity and max_capacity settings apply to the number of nodes in each individual zone and must be scaled accordingly. It is advised not to use single_group unless you have a specific problem to solve and have spoken to the core engineering team.
Node sizes are based on the number of CPUs, with the other resources being dependent on the node type; not all node types support all sizes.
When creating persistent volumes in Azure, make sure you use a size supported by Azure disks. This applies to Standard and Premium disks; it doesn't apply to Premium v2 disks.
Name | CPU Count |
---|---|
large | 2 |
xlarge | 4 |
2xlarge | 8 |
4xlarge | 16 |
8xlarge | 32 |
12xlarge | 48 |
16xlarge | 64 |
18xlarge | 72 |
20xlarge | 80 |
24xlarge | 96 |
26xlarge | 104 |
Node types describe the purpose of the node and map down to the underlying Azure virtual machines. Select your node type based on the kind of workloads you expect to be running; as a rule of thumb use gp unless you have additional requirements.
Due to availability issues with specific Azure VMs, when choosing a node type you also need to select the version; newer versions may well be less available in popular regions.
All the nodes provisioned by the module support premium storage.
General purpose nodes, gp & gpd, offer a good balance of compute and memory. If you need a local temp disk, gpd provides this.
Arch | Type | Variant | Version | VM Type | Sizes |
---|---|---|---|---|---|
amd64 | gp | default | v1 | Dsv4 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge & 16xlarge |
amd64 | gp | default | v2 | Dsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 24xlarge |
amd64 | gp | amd | v2 | Dasv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 24xlarge |
arm64 | gp | default | v1 | Dpsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 24xlarge |
amd64 | gpd | default | v1 | Ddsv4 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge & 16xlarge |
amd64 | gpd | default | v2 | Ddsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 24xlarge |
amd64 | gpd | amd | v2 | Dadsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 24xlarge |
arm64 | gpd | default | v1 | Dpdsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 24xlarge |
Memory optimised nodes, mem & memd, offer a higher memory to CPU ratio than general purpose nodes. If you need a local temp disk, memd provides this.
Arch | Type | Variant | Version | VM Type | Sizes |
---|---|---|---|---|---|
amd64 | mem | default | v1 | Esv4 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge & 16xlarge |
amd64 | mem | default | v2 | Esv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge & 26xlarge |
amd64 | mem | amd | v2 | Easv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge & 26xlarge |
arm64 | mem | default | v1 | Epsv5 | large, xlarge, 2xlarge, 4xlarge & 8xlarge |
amd64 | memd | default | v1 | Edsv4 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge & 16xlarge |
amd64 | memd | default | v2 | Edsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge & 26xlarge |
amd64 | memd | amd | v2 | Eadsv5 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge, 24xlarge & 26xlarge |
arm64 | memd | default | v1 | Epdsv5 | large, xlarge, 2xlarge, 4xlarge & 8xlarge |
Compute optimised nodes, cpu, offer a higher CPU to memory ratio than general purpose nodes.
Arch | Type | Variant | Version | VM Type | Sizes |
---|---|---|---|---|---|
amd64 | cpu | default | v1 | Fsv2 | large, xlarge, 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 18xlarge |
Storage optimised nodes, stor, offer higher disk throughput and IO than general purpose nodes and come with both a local temp disk and one or more NVMe drives.
Arch | Type | Variant | Version | VM Type | Sizes |
---|---|---|---|---|---|
amd64 | stor | default | v1 | Lsv2 | 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 20xlarge |
amd64 | stor | default | v2 | Lsv3 | 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 20xlarge |
amd64 | stor | amd | v2 | Lasv3 | 2xlarge, 4xlarge, 8xlarge, 12xlarge, 16xlarge & 20xlarge |
This module currently only supports user access by users or groups passed into the module by the rbac_bindings input variable; these users and groups are linked to a ClusterRole via a ClusterRoleBinding. The following ClusterRoles can be bound to (the ClusterRoles with a * are Kubernetes defaults).
ClusterRole | Description |
---|---|
cluster-admin * | Allows super-user access to perform any action on any resource. It gives full control over every resource in the cluster and in all namespaces. |
view * | Allows read-only access to see most objects in all namespaces. It does not allow viewing roles or role bindings. This role does not allow viewing Secrets, since reading the contents of Secrets enables access to ServiceAccount credentials, which would allow API access as any ServiceAccount (a form of privilege escalation). |
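An illustrative rbac_bindings sketch following the Appendix A specification; the identifiers and Azure AD object IDs below are placeholders.

rbac_bindings = {
  cluster_admin_users = {
    # Identifier as the key, Azure AD object ID as the value.
    "jane.doe@example.com" = "00000000-0000-0000-0000-000000000000"
  }

  cluster_view_groups = [
    # Azure AD group object IDs bound to the view ClusterRole.
    "11111111-1111-1111-1111-111111111111",
  ]
}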
When you create a new cluster, you can enable FIPS 140-2 mode by setting the fips module variable to true. Keep in mind that once a cluster has been created, you cannot enable or disable FIPS mode; you will need to create a new cluster if you want to change the FIPS mode.
FIPS 140-2 mode is a security standard that specifies the security requirements for cryptographic modules used in government and industry, and enabling it on your cluster can help ensure the security and integrity of the cryptographic functions used by your cluster. However, it can also introduce additional overhead and complexity, so operators should carefully consider whether it is necessary for the use case. It is crucial to ensure that any software running on the cluster is FIPS compliant in order for the cluster to function properly in FIPS 140-2 mode. This includes any applications or services that utilize cryptographic functions, as well as any external libraries or dependencies that may utilize cryptographic functions. Failure to do so can result in errors and potential security vulnerabilities.
This module is expected to be referenced by its major version (e.g. v1) and run regularly (at least every 4 weeks) to keep the cluster configuration up to date.
The core service configuration, input variable core_services_config, allows the customisation of the core cluster services. All core services run on a dedicated system node group reserved only for these services, although DaemonSets will be scheduled on all cluster nodes.
By default cluster nodes will be auto-scaled by the Cluster Autoscaler based on the node group configuration (resources, labels and taints).
AKS cluster logging is currently split up into two parts; the control plane logs are handled as part of the AKS service and are sent to Log Analytics, while the node and pod logs are handled by a core service reading the logs directly from the node and then sending them to a cluster aggregation service which can export the logs. The log export interface is specifically linked to the aggregator implementation, however there are plans to abstract this by supporting generic OpenTelemetry in addition to specific targets configured through the module. Additionally there are plans to enhance the system by supporting the aggregation of control plane logs into the cluster, allowing all logs to be managed as a unified set.
Control plane logs are exported to one (or both) of a Log Analytics Workspace or an Azure Storage Account; these logs are retained for 30 days by default. Both of these destinations support direct querying of the logs, but sending logs to a Log Analytics Workspace will incur additional cost overhead.
The control plane logs can be configured via the logging.control_plane input variable which allows for the selection of the destinations as well as the profile (see below) and additional configuration such as changing the retention (DEPRECATED).
An AKS cluster generates a number of different control plane logs which need to be collected. To configure which log category types are collected you should specify a log profile, which can optionally be augmented with additional log category types.
The following control plane log profiles with their default log category types are supported (note that not all log category types are available in all Azure locations). You are strongly advised to use the all profile wherever possible; the empty profile is unsupported for any workload cluster and is only made available for testing purposes.
Profile | Log Category Types |
---|---|
all | ["kube-apiserver", "kube-audit", "kube-controller-manager", "kube-scheduler", "cluster-autoscaler", "cloud-controller-manager", "guard", "csi-azuredisk-controller", "csi-azurefile-controller", "csi-snapshot-controller"] |
audit-write-only | ["kube-apiserver", "kube-audit-admin", "kube-controller-manager", "kube-scheduler", "cluster-autoscaler", "cloud-controller-manager", "guard", "csi-azuredisk-controller", "csi-azurefile-controller", "csi-snapshot-controller"] |
minimal | ["kube-apiserver", "kube-audit-admin", "kube-controller-manager", "cloud-controller-manager", "guard"] |
empty | [] |
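For illustration, a logging.control_plane configuration that selects a profile per destination might look like the sketch below (see Appendix D1); when no workspace or storage account ID is set, the defaults from logging.log_analytics_workspace_config and logging.storage_account_config are used.

logging = {
  control_plane = {
    log_analytics = {
      enabled = true
      profile = "audit-write-only" # drop kube-audit read events to reduce Log Analytics cost
    }

    storage_account = {
      enabled = true
      profile = "all"
    }
  }
}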
Both node (systemd) and pod/container logs are collected from the node by the Fluent Bit collector DaemonSet; Fluent Bit adds additional metadata to the collected logs before sending them off the ephemeral node to the persistent aggregation service. The persistent aggregation service is currently provided by Fluentd running as a StatefulSet with a single pod in each cluster availability zone; Fluentd makes sure that the logs can't be lost once they've been received by backing them with persistent storage. Fluentd is then responsible for sending the logs to one or more destinations. This architecture provides high log throughput, over 15k logs a second, and resilience to both node and network outages.
Logging outputs can be configured directly against Fluentd via the module input variable core_services_config.fluentd (see Appendix F5); it is also possible to modify the logs as part of the route configuration, but modifying logs in any way more advanced than adding fields for cluster context can significantly impact the Fluentd throughput and so is strongly advised against.
In-cluster Loki support is currently experimental and can be enabled by setting logging.nodes.loki.enabled or logging.workloads.loki.enabled to true; this enables powerful log querying through the in-cluster Grafana.
The workload logging interface is to write logs to stdout/stderr; these logs will be collected and aggregated centrally in the cluster, from where they can be exported to one or more destinations. If your application creates JSON log lines, the fields of this object are extracted; otherwise there is a log field with the application log data as a string. For JSON logging we suggest using msg for the log text field.
All pod logs have a kube tag and additional fields extracted from the Kubernetes metadata; please note that using Kubernetes common labels makes the log fields more meaningful.
Pods annotated with the fluentbit.io/exclude: "true" annotation won't have their logs collected as part of the cluster logging system; this shouldn't be used unless you have an alternative way of ensuring that you're in compliance.
Pods annotated with the lnrs.io/loki-ignore: "true" annotation won't have their logs aggregated in the cluster Loki; this is advised against as it reduces log visibility but can be used to gradually integrate workloads with Loki.
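For example, a workload that must not have its logs collected or aggregated would set these annotations on its pod template; this sketch shows only the relevant metadata fragment of a Deployment spec.

spec:
  template:
    metadata:
      annotations:
        fluentbit.io/exclude: "true"
        lnrs.io/loki-ignore: "true"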
Workload logs can be shipped to an Azure storage account by setting logging.workloads.storage_account_logs to true.
Workload logs can be shipped to Loki by setting logging.workloads.loki.enabled to true.
An external storage account must be provided in the logging.storage_account_config settings for the storage account feature to function. The following is an example of the configuration required:
logging = {
  workloads = {
    # Enable workload log exporting
    storage_account_logs = true
  }

  storage_account_config = {
    # Configure the storage account
    id = azurerm_storage_account.data.id
  }
}
Cluster metrics are collected by Prometheus which is managed by a Prometheus Operator and made HA by running Thanos as a sidecar and as a cluster service. Metrics can be exported from the cluster via the Prometheus remote write protocol. It is planned to also support exporting metrics using the OpenTelemetry interface.
The workload metrics interface is to expose metrics from the workload pod in Prometheus format and then create either a ServiceMonitor (if the workload has a service) or a PodMonitor to configure how the metrics should be scraped by Prometheus. The ServiceMonitor and PodMonitor resources should be labelled with the lnrs.io/monitoring-platform: "true" label to ensure they are evaluated.
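A minimal ServiceMonitor sketch, assuming a hypothetical Service labelled app: my-app that exposes a port named metrics; the lnrs.io/monitoring-platform label is what causes it to be evaluated.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace # hypothetical namespace
  labels:
    lnrs.io/monitoring-platform: "true"
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics # name of the port exposing Prometheus metrics
      path: /metrics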
Cluster alerts are powered by AlertManager and are ignored by default; to configure the alerts you can use the module input variable core_services_config.alertmanager (see Appendix F1) to define routes and receivers.
Custom alert rules can be configured by adding additional PrometheusRule resources to the cluster with the lnrs.io/monitoring-platform: "true" and either the lnrs.io/prometheus-rule: "true" or lnrs.io/thanos-rule: "true" (only use the Thanos rule label if you need to process more than 6h of metrics) labels set.
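A hedged PrometheusRule sketch; the alert name, expression and namespace are illustrative, and the labels are the ones described above.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-app-alerts
  namespace: my-namespace # hypothetical namespace
  labels:
    lnrs.io/monitoring-platform: "true"
    lnrs.io/prometheus-rule: "true" # use lnrs.io/thanos-rule instead for ranges over 6h
spec:
  groups:
    - name: my-app
      rules:
        - alert: MyAppHighErrorRate
          expr: rate(http_requests_total{job="my-app", code=~"5.."}[5m]) > 1
          for: 10m
          labels:
            severity: warning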
Traces aren't currently natively supported but it is planned to support at least the collection using the OpenTelemetry interface.
The module provides in-cluster Grafana as a visualisation service for metrics and logs; this can be configured via the core_services_config.grafana input variable (see Appendix F7). You can also add additional data sources via a ConfigMap with the label grafana_datasource: "1" and additional dashboards via a ConfigMap with the label grafana_dashboard: "1".
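A sketch of a dashboard ConfigMap; the name, namespace and dashboard JSON are placeholders, and Grafana discovers it via the grafana_dashboard label.

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-dashboard
  namespace: my-namespace # hypothetical namespace
  labels:
    grafana_dashboard: "1"
data:
  my-app-dashboard.json: |
    { "title": "My App", "panels": [] }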
Clusters have Cert Manager installed to support generating certificates from Certificate resources referencing either a ClusterIssuer or an Issuer; this can be configured via the core_services_config.cert_manager input variable (see Appendix F2). By default the following ClusterIssuers are provided: letsencrypt, letsencrypt-staging & zerossl; all of these use ACME DNS01, which is configured via the core_services_config.cert_manager.acme_dns_zones input variable. It is possible to add additional ClusterIssuer or Issuer resources either via the core_services_config.cert_manager.additional_issuers input variable or directly through the Kubernetes API.
If an Ingress resource is annotated with cert-manager.io/cluster-issuer or cert-manager.io/issuer and contains TLS configuration for the hosts, Cert Manager can automatically generate a certificate.
Clusters have External DNS installed to manage Azure DNS configuration external to the cluster; this can be configured via the core_services_config.external_dns input variable (see Appendix F4). There will always be an instance managing private DNS records, but if you've configured public DNS zones there will also be a public instance running to manage these. To manage resources for private DNS the lnrs.io/zone-type: private annotation should be set, for public DNS the lnrs.io/zone-type: public annotation should be set, and for split horizon (public & private) DNS the lnrs.io/zone-type: public-private annotation should be set.
By default DNS records are only generated for Ingress resources with the lnrs.io/zone-type annotation set, but additional Kubernetes resource types can be supported by adding them to the core_services_config.external_dns.additional_sources input variable.
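Putting the TLS and DNS pieces together, an Ingress sketch might look like the following; the host, backend service and ingress class name are illustrative and the annotations are those described above.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace # hypothetical namespace
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    lnrs.io/zone-type: private
spec:
  ingressClassName: my-ingress-class # hypothetical user ingress class
  tls:
    - hosts:
        - my-app.us-accurint-prod.azure.lnrsg.io
      secretName: my-app-tls
  rules:
    - host: my-app.us-accurint-prod.azure.lnrsg.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80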
The module includes support for the Azure Disks CSI driver (always on), Azure Files CSI driver (off by default), Azure Blob CSI driver (off by default) and Local Volume Static Provisioner (off by default). There is also support for creating a host path volume on the node from local disks (NVMe or the temp disk). The module storage configuration can be customised using the storage module input variable.
The following StorageClass resources are created for the Azure Disks CSI driver by default to support common Azure disk types with default characteristics. When using a default StorageClass you are recommended to use the Premium SSD v2 classes where possible due to the best price-performance characteristics. If you need support for specific characteristics (such as higher IOPS or throughput) you should create a custom StorageClass.
- azure-disk-standard-ssd-retain
- azure-disk-premium-ssd-retain
- azure-disk-premium-ssd-v2-retain
- azure-disk-standard-ssd-delete
- azure-disk-premium-ssd-delete
- azure-disk-premium-ssd-v2-delete
- azure-disk-standard-ssd-ephemeral
- azure-disk-premium-ssd-ephemeral
- azure-disk-premium-ssd-v2-ephemeral
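For example, a PersistentVolumeClaim using one of the default classes above; the claim name, namespace and size are illustrative.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
  namespace: my-namespace # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: azure-disk-premium-ssd-v2-delete
  resources:
    requests:
      storage: 64Gi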
If you wish to use the Azure Files CSI driver you will need to enable it by setting storage.file to true and add one or more custom StorageClass resources.
If you wish to use the Azure Blob CSI driver you will need to enable it by setting storage.blob to true and add one or more custom StorageClass resources.
If you wish to use the Local Volume Static Provisioner you will need to enable it by setting storage.nvme_pv to true and provision node groups with nvme_mode set to PV.
The current behaviour is to mount each NVMe drive on the node as a separate PersistentVolume, but it should be possible to combine all of the drives into a single RAID-0 volume and either expose it as a single PersistentVolume or partition it to support more PersistentVolumes per node than there are NVMe drives.
If you wish to support creating a host path volume on nodes with local disks you will need to enable it by setting storage.host_path to true and provision node groups with either temp_disk_mode or nvme_mode set to HOST_PATH. This will create a host volume at /mnt/scratch backed by either the NVMe drives (RAID-0 if there are more than one) or the temp disk. If a node has both NVMe drives and a temp disk and both are set to host path, only the NVMe drives will be used.
All traffic being routed into a cluster should be configured using Ingress resources backed by an ingress controller and should NOT be configured directly as a Service resource of LoadBalancer type (this is what the ingress controllers do behind the scenes). There are a number of different ingress controllers supported by Kubernetes, but it is strongly recommended to use an ingress controller backed by an official Terraform module to install it. All ingress traffic should enter the cluster onto nodes specifically provisioned for ingress without any other workload on them.
Out of the box the cluster supports automatically generating certificates with the Cert Manager default issuer; this can be overridden with the cert-manager.io/cluster-issuer or cert-manager.io/issuer Ingress annotations. DNS records will be created by External DNS from Ingress resources when the lnrs.io/zone-type annotation is set, see the DNS config for how this works.
The following official Terraform modules for ingress controllers are supported by the core engineering team and have been tested on AKS; these controllers require you to have ingress nodes registered in your cluster to work correctly.
- K8s Ingress NGINX Terraform Module
Warning With the release of Kubernetes v1.25, the behaviour of ingress communication has changed compared to v1.24 (Removed). If you are using pod-to-ingress communication when updating from Kubernetes v1.24 (Removed) to v1.25, you will encounter an SSL error when connecting cluster-hosted applications to the ingress due to a bug in how iptables rules were applied in the previous version. To pre-emptively address the issue of blocking node-to-pod traffic during a Kubernetes v1.25 upgrade, you have two options depending on your requirements:
- Option 1: Specify the cluster pod CIDR in core_services_config.ingress_internal_core.lb_source_cidrs.
- Option 2: If you're using the rsg-terraform-kubernetes-ingress-nginx module, add the pod CIDR to the lb_source_cidrs variable.
This will ensure that the correct iptables rules are applied, allowing traffic from the node to the pod via the ingress. The Kubernetes v1.25 upgrade includes changes to the code that implements iptables rules, fixing kubernetes/kubernetes#109826 and enforcing the correct behaviour of blocking node-to-pod traffic due to a lack of CIDRs in the service specification. Remember to perform the necessary steps before upgrading to Kubernetes v1.25 to avoid any issues with node-to-pod traffic.
By default the platform deploys an internal IngressClass, named core-internal, to expose services such as the Prometheus and Grafana UIs. This ingress shouldn't be used for user services but can be used for other internal dashboards; for user services instead deploy a dedicated ingress controller with its own IngressClass.
By default this ingress doesn't support pod-to-ingress network traffic, but you can override this by specifying core_services_config.ingress_internal_core.lb_source_cidrs (you will need to specify all the values). For better performance and network efficiency we recommend using internal communication for pod-to-pod interactions, rather than going outside and re-entering through an ingress; this utilizes the cluster's internal DNS service to access services inside the cluster using a <service>.<namespace>.svc.cluster.local domain name. However, if your use case requires pod-to-ingress communication, such as when ingress features like SSL termination, load balancing, or traffic routing rules are necessary, you will need to make sure you've configured the ingress correctly.
This is the only ingress controller in the cluster which doesn't require ingress nodes as it's required by all clusters and is not expected to carry a significant volume of traffic. If you do not configure ingress nodes this ingress controller will run on the system nodes.
Ingress nodes must have the lnrs.io/tier: ingress label and the ingress=true:NoSchedule taint to enable the ingress controller(s) to be scheduled and to isolate ingress traffic from other pods. You can also add additional labels and taints to keep specific ingress traffic isolated to its own nodes. As ingress traffic is stateless, a single node group can be used to span multiple zones by setting single_group = true.
An example of an ingress node group.
locals {
  ingress_node_group = {
    ingress = {
      node_os             = "ubuntu"
      node_type           = "gp"
      node_type_version   = "v1"
      node_size           = "large"
      single_group        = true
      min_capacity        = 3
      max_capacity        = 6
      placement_group_key = null

      labels = {
        "lnrs.io/tier" = "ingress"
      }

      taints = [{
        key    = "ingress"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]

      tags = {}
    }
  }
}
The module installs the Calico network policy engine on a Kubernetes cluster. Calico is a widely used networking solution for Kubernetes that allows users to define and enforce network policies for their pods. However, at this time this module does not expose Calico's functionality to operators. Instead, consumers can use native Kubernetes network policies to manage networking within their clusters.
Native Kubernetes network policies allow users to specify which pods can communicate with each other, as well as set up ingress and egress rules. This enables users to secure their clusters by controlling network traffic between pods and enforcing network segmentation. For more information on using network policies in Kubernetes, see the official documentation at: kubernetes.io/docs/concepts/services-networking/network-policies/
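As a sketch (namespace and labels are illustrative), a native NetworkPolicy that only allows ingress to a workload from pods labelled app: frontend in the same namespace would look like this; Calico enforces it.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: my-namespace # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend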
When utilizing custom tags with the module, it is essential to be aware of the potential limitations that may impact the removal of tags. Some tags may not be removed when attempting to remove them through the module, which can result in unexpected behaviour or errors in your pipeline. To avoid these issues, it is recommended to thoroughly review and test the behaviour of custom tags before implementing them in any environment. If necessary, persistent tags can be manually removed through the Azure portal, CLI or API to ensure that they are properly removed from the resource. For more information on tag limitations, you can refer to the Microsoft documentation here
AKS clusters created by this module use Azure AD authentication and don't create local accounts.
When running this module or using a Kubernetes based provider (kubernetes, helm or kubectl) the Terraform identity either needs to have the Azure Kubernetes Service RBAC Cluster Admin role scoped to the cluster or you need to pass the identity's AD group ID into the admin_group_object_ids module input variable.
Note If you're using TFE you need to use the admin_group_object_ids input variable unless specifically told otherwise.
From Terraform workspaces all Kubernetes based providers should be configured to use the exec plugin pattern; for AKS clusters this is Kubelogin, which should be configured as below. Note the constant --server-id of 6dae42f8-4368-4678-94ff-3960e28e3630 and the values which need to be defined in locals (or elsewhere). The exec block is the same as kubernetes for the helm and kubectl providers but is nested under the kubernetes block in them.
provider "kubernetes" {
host = module.aks.cluster_endpoint
cluster_ca_certificate = base64decode(module.aks.cluster_certificate_authority_data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "kubelogin"
args = ["get-token", "--login", "spn", "--server-id", "6dae42f8-4368-4678-94ff-3960e28e3630", "--environment", local.azure_environment, "--tenant-id", local.tenant_id]
env = { AAD_SERVICE_PRINCIPAL_CLIENT_ID = local.client_id, AAD_SERVICE_PRINCIPAL_CLIENT_SECRET = local.client_secret }
}
}
To connect to an AKS cluster after it's been created, your AD user will need to have been added to the cluster via the rbac_bindings input variable. You can run the following commands, assuming that you have the Azure CLI installed and are logged in to it. By default this will configure kubectl to require a device login, but this behaviour can be changed to use the Azure CLI by replacing the --login argument of devicelogin with azurecli in the ~/.kube/config file.
az aks install-cli
az account set --subscription "${SUBSCRIPTION}"
az aks get-credentials --resource-group "${RESOURCE_GROUP_NAME}" --name "${CLUSTER_NAME}"
Note Experimental features are not officially supported and do not follow SemVer like the rest of this module; use them at your own risk.
Experimental features allow end users to try out new functionality which isn't stable in the context of a stable module release; they are enabled by setting the required variables on the experimental module variable.
If your cluster isn't being destroyed cleanly due to stuck AAD Pod Identity resources you can increase the time we wait before uninstalling the chart by setting experimental = { aad_pod_identity_finalizer_wait = "300s" }.
This module supports enabling the OMS agent as this needs to be done when the cluster is created; however, the operation of the agent is not managed by the module and needs to be handled by the cluster operators separately. All core namespaces should be excluded by the cluster operator, especially the logging namespace, unless they are specifically wanted.
To enable OMS agent support you need to set experimental = { oms_agent = true, oms_log_analytics_workspace_id = "my-workspace-id" }.
By default the module will configure the OMS agent by creating the container-azm-ms-agentconfig ConfigMap; this specifically excludes core namespaces from log collection. You can append additional data keys to the ConfigMap via the config_map_v1_data Terraform resource. It is possible to disable this behaviour by setting the experimental.oms_agent_create_configmap input variable to false; by doing this you're taking full responsibility for managing your own OMS agent configuration and should make sure that the default configuration log exclusion is replicated.
You can override the default Log Analytics ContainerLog schema to ContainerLogV2 by setting the experimental.oms_agent_containerlog_schema_version input variable to v2.
To enable experimental support for OS custom configuration you can set experimental = { node_group_os_config = true } and then add an os_config block to applicable node_groups objects.
node_groups = {
  workers = {
    node_os           = "ubuntu"
    node_type         = "gp"
    node_type_version = "v1"
    node_size         = "large"
    single_group      = false
    min_capacity      = 3
    max_capacity      = 6

    os_config = {
      sysctl = {
        net_core_rmem_max           = 2500000
        net_core_wmem_max           = 2500000
        net_ipv4_tcp_keepalive_time = 120
      }
    }

    labels = {}
    taints = []
    tags   = {}
  }
}
Only a subset of Linux sysctl configuration is supported (see above or in code). Note that not all parameters are required; please raise an issue for additional tunables.
To enable the creation of ARM64 Ampere Altra nodes you can set the experimental flag experimental = { arm64 = true }. When this flag is set you can set node_arch to arm64 to get an ARM64 instance; if this flag isn't set, attempting to set node_arch will be ignored.
To enable the customisation of the maximum number of pods per node when using the Azure CNI you can set the experimental flag experimental = { azure_cni_max_pods = true }. When this flag is set you can set max_pods to a value between 12 & 110; if this flag isn't set, attempting to set max_pods will be ignored.
To enable in-cluster Loki you can set the experimental flag experimental = { loki = true }; this is planned to be released as an opt-in core service config option once it's been tested. We would like to hear feedback from operators using Loki before we make it GA.
The module now experimentally supports using Fluent Bit as the log aggregator instead of Fluentd; the Fluent Bit StatefulSet can have its memory, CPU & replicas set in addition to the configuration of filters & outputs.
The Fluent Bit Aggregator can be enabled by setting the experimental flag experimental = { fluent_bit_aggregator = true } and it supports the same outputs as Fluentd. Additional functionality can be configured with raw Fluent Bit configuration via the experimental.fluent_bit_aggregator_raw_filters & experimental.fluent_bit_aggregator_raw_outputs flags. You can also provide env variables via the experimental.fluent_bit_aggregator_extra_env flag, secret env variables via the experimental.fluent_bit_aggregator_secret_env flag, and custom scripts to be used by the Lua filter via the experimental.fluent_bit_aggregator_lua_scripts flag. The StatefulSet can be configured by the experimental.fluent_bit_aggregator_replicas_per_zone & experimental.fluent_bit_aggregator_resources_override flags.
Variable | Description | Type | Default |
---|---|---|---|
fluent_bit_aggregator_resources_override | Resource overrides for pod containers. Map key(s) can be default, thanos_sidecar, config_reloader | map(object) (see Appendix G) | {} |
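A minimal sketch enabling the Fluent Bit aggregator and scaling it to two replicas per zone; resource overrides would additionally go in experimental.fluent_bit_aggregator_resources_override using the object shape defined in Appendix G.

experimental = {
  fluent_bit_aggregator                   = true
  fluent_bit_aggregator_replicas_per_zone = 2
}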
You can add custom single line log parsing support at the Fluent Bit collector level by setting the experimental.fluent_bit_collector_parsers input variable. Enabling this functionality could cause performance issues, so a better solution where possible would be to fix the logs at the application level.
Once a parser has been defined, to use it for your application, add the fluentbit.io/parser annotation to the spec template so that pods receive the annotation when deployed. The value for the fluentbit.io/parser annotation is the name given to the parser in the Terraform object, which in this example is "custom-regex".
If pods contain multiple containers and you require parsing on a specific container in the pod, or if you need to limit parsing to a specific stream (stdout or stderr), you can add the stream and container name as suffixes to the annotation key: fluentbit.io/parser[_stream][-container]. This will cause parsing to happen only on specific containers and/or a specific stream.
The pattern object takes a regex to match into named capturing groups. If your regex uses \ characters then you will need to prepend each of them with another \ character, as this is an escape sequence character in Terraform.
The types object is optional and is a string that contains named capturing group names and types in the format <named_group>:<type>. Multiple can be specified, using space as a delimiter.
See the example below, where the log line contains key value pairs with:

- Pipe | separating key value pairs
- Space separating key and value
- Regex that contains the \ character
- Multiple types

A text|B 1|C 2.08
locals {
  fluent_bit_collector_parsers = {
    "custom-regex" = {
      pattern = "^A (?<a>[^|]*)\\|B (?<b>[^|]*)\\|C (?<c>[^|]*)$"
      types = {
        a = "string"
        b = "integer"
        c = "float"
      }
    }
  }
}
annotations:
  fluentbit.io/parser: custom-regex
You can add custom multiline log parsing support at the Fluent Bit collector level by setting the experimental.fluent_bit_collector_multiline_parsers input variable. Enabling this functionality could cause performance issues, so a better solution where possible would be to fix the logs at the application level.
For an example see below.
locals {
  fluent_bit_collector_multiline_parsers = {
    test_parser = {
      rules = [
        {
          name           = "start_state"
          pattern        = "/^\\[MY\\LOG\\].*/"
          next_rule_name = "cont"
        },
        {
          name           = "cont"
          pattern        = "/^[^\\[].*/"
          next_rule_name = "cont"
        }
      ]

      workloads = [{
        namespace  = "default"
        pod_prefix = "my-pod"
      }]
    }
  }
}
You can test the ability to create a new Linux-only AKS cluster with the Azure CNI in Overlay mode by setting the experimental = { azure_cni_overlay = true } input variable.
Some features in the AKS module are in a category of "use at your own risk". These features are unlikely to be fully supported in the foreseeable future; this includes disabling the logging stack and Windows support.
It is possible, but UNSUPPORTED, to entirely disable the logging stack. This should only be done by groups with explicit approval to do so. The aim of this flag is to enable groups to experiment with alternative approaches to external logging in development and non-production environments.
To disable the logging stack, you can use the following configuration.
unsupported = { logging_disabled = true }
It is possible to entirely disable the observability stack. This should only be done by groups with explicit approval to do so. The aim of this flag is to enable groups to experiment with alternative approaches to external observability in development and nonproduction environments. This use case is unsupported.
To disable the observability stack, you can use the following configuration:
unsupported = { observability_disabled = true }
Important Teams must seek approval from their business unit Architect and IOG Architecture before using Windows node pools.
Using Windows nodes in an AKS cluster is UNSUPPORTED and is currently significantly limited; Windows node pools do not include platform daemonsets such as the Prometheus metrics exporter, Fluent Bit log collection or Azure AD Pod Identity. In the interim it is expected that teams provide their own support for these features, e.g. use Azure Container Insights for log collection. Services provided by the AKS platform SHOULD work but have not been tested, including kube-proxy, CSI drivers and Calico network policy.
As of AKS v1.25 the default AKS Windows version will be Windows Server 2022, which hasn't had any testing due to the lack of available resources; please make sure that you've updated your node_os inputs to specify the version of Windows required before upgrading to AKS v1.25.
There may be other requirements or specific configuration required for Windows nodes, yet to be identified. We encourage teams to identify, report and contribute code and documentation to improve support going forward.
To enable Windows support, you can use the following configuration.
unsupported = { windows_support = true }
There are some potential cases where automatic cluster upgrades might cause problems, so you can enable this UNSUPPORTED functionality to take responsibility for manually upgrading your Kubernetes patch version. If you're using this functionality you are still required to meet our security baseline. Be aware that as AKS is a managed service, Azure reserves the right to upgrade components at any time if they feel it is necessary; an up to date cluster is less likely to fall into this category.
To disable automatic upgrades and require manual upgrades, you can use the following configuration.
unsupported = { manual_upgrades = true }
This module requires the following versions to be configured in the workspace terraform {} block.
Version |
---|
>= 1.4.6 |
Name | Version |
---|---|
hashicorp/azurerm | >= 3.63.0 |
hashicorp/helm | >= 2.11.0 |
gavinbunney/kubectl | >= 1.14.0 |
hashicorp/kubernetes | >= 2.23.0 |
hashicorp/random | >= 3.3.0 |
scottwinkler/shell | >= 1.7.10 |
hashicorp/time | >= 0.7.2 |
Variable | Description | Type | Default |
---|---|---|---|
location | Azure location to target. | string | |
resource_group_name | Name of the resource group to create resources in, some resources will be created in a separate AKS managed resource group. | string | |
cluster_name | Kubernetes Service managed cluster to create, also used as a prefix in names of related resources. This must be lowercase and contain the pattern aks-{ordinal} (e.g. app-aks-0 or app-aks-1). | string | |
cluster_version | Kubernetes version to use for the Azure Kubernetes Service managed cluster; versions 1.27, 1.26 and 1.25 are supported. | string | |
sku_tier | Pricing tier for the Azure Kubernetes Service managed cluster; "FREE" & "STANDARD" are supported. For production clusters or clusters with more than 10 nodes this should be set to STANDARD (see docs). | string | "FREE" |
cluster_endpoint_access_cidrs | List of CIDR blocks which can access the Azure Kubernetes Service managed cluster API server endpoint, an empty list will not error but will block public access to the cluster. | list(string) | |
virtual_network_resource_group_name | Name of the resource group containing the virtual network. | string | |
virtual_network_name | Name of the virtual network to use for the cluster. | string | |
subnet_name | Name of the AKS subnet in the virtual network. | string | |
route_table_name | Name of the AKS subnet route table. | string | |
dns_resource_group_lookup | Lookup from DNS zone to resource group name. | map(string) | |
podnet_cidr_block | CIDR range for pod IP addresses when using the kubenet network plugin, if you're running more than one cluster in a subnet (or sharing a route table) this value needs to be unique. | string | "100.65.0.0/16" |
nat_gateway_id | ID of a user-assigned NAT Gateway to use for cluster egress traffic, if not set a cluster managed load balancer will be used. Please note that this can only be enabled when creating a new cluster. | string | null |
managed_outbound_ip_count | Count of desired managed outbound IPs for the cluster managed load balancer, see the documentation. Ignored if NAT gateway is specified, must be between 1 and 100 inclusive. | number | 1 |
managed_outbound_ports_allocated | Number of desired SNAT port for each VM in the cluster managed load balancer, do not manually set this unless you've read the documentation carefully and fully understand the impact of the change. Ignored if NAT gateway is specified, must be between 0 & 64000 inclusive and divisible by 8. | number | 0 |
managed_outbound_idle_timeout | Desired outbound flow idle timeout in seconds for the cluster managed load balancer, see the documentation. Ignored if NAT gateway is specified, must be between 240 and 7200 inclusive. | number | 240 |
admin_group_object_ids | AD Object IDs to be added to the cluster admin group, this should only ever be used to make the Terraform identity an admin if it can't be done outside the module. | list(string) | [] |
rbac_bindings | User and groups to configure in Kubernetes ClusterRoleBindings; for Azure AD these are the IDs. | object (Appendix A) | {} |
system_nodes | System nodes to configure. | map(object) (Appendix B) | {} |
node_groups | Node groups to configure. | map(object) (Appendix C) | {} |
logging | Logging configuration. | map(object) (Appendix D) | {} |
storage | Storage configuration. | map(object) (Appendix E) | {} |
core_services_config | Core service configuration. | any (Appendix G) | |
maintenance | Maintenance configuration. | object (Appendix H) | {} |
tags | Tags to apply to all resources. | map(string) | {} |
fips | If true, the cluster will be created with FIPS 140-2 mode enabled; this can't be changed once the cluster has been created. | bool | false |
unsupported | Configure unsupported features. | any | {} |
experimental | Configure experimental features. | any | {} |
Specification for the rbac_bindings object.

Note User and group IDs can be found in Azure Active Directory.

Variable | Description | Type | Default |
---|---|---|---|
cluster_admin_users | Users to bind to the cluster-admin ClusterRole, identifier as the key and group ID as the value. | map(string) | {} |
cluster_view_users | Users to bind to the view ClusterRole, identifier as the key and group ID as the value. | map(string) | {} |
cluster_view_groups | Groups to bind to the view ClusterRole, list of group IDs. | list(string) | [] |
Specification for the system_nodes objects.

Variable | Description | Type | Default |
---|---|---|---|
node_arch | EXPERIMENTAL - Processor architecture to use for the system node group, amd64 & arm64 are supported. See docs. | string | amd64 |
node_type_version | The version of the node type to use. See node types for more information. | string | "v1" |
node_size | Size of the instance to create in the system node group. See node sizes for more information. | string | |
min_capacity | Minimum number of nodes in the system node group, this needs to be divisible by the number of subnets in use. | number | 3 |
Specification for the node_groups objects.

Variable | Description | Type | Default |
---|---|---|---|
node_arch | EXPERIMENTAL - Processor architecture to use for the node group(s), amd64 & arm64 are supported. See docs. | string | amd64 |
node_os | OS to use for the node group(s), ubuntu, windows2019 (**UNSUPPORTED**) & windows2022 (EXPERIMENTAL) are valid; Windows node support is not guaranteed but best-effort and needs manually enabling. | string | "ubuntu" |
node_type | Node type to use, one of gp, gpd, mem, memd, cpu or stor. See node types for more information. | string | "gp" |
node_type_variant | The variant of the node type to use. See node types for more information. | string | "default" |
node_type_version | The version of the node type to use. See node types for more information. | string | "v1" |
node_size | Size of the instance to create in the node group(s). See node sizes for more information. | string | |
ultra_ssd | If the node group can use Azure ultra disks. | bool | false |
os_disk_size | Size of the OS disk to create, this will be ignored if temp_disk_mode is KUBELET. | number | 128 |
temp_disk_mode | The temp disk mode for the node group, this is only valid for node types with a temp disk. The available values are NONE to do nothing, KUBELET (EXPERIMENTAL) to store the kubelet data (images, logs and empty dir volumes), & HOST_PATH (EXPERIMENTAL) to create a single volume at /mnt/scratch which can be used by a host mount volume. | string | NONE |
nvme_mode | The NVMe mode for the node group, this is only valid for stor node types. The available values are NONE to do nothing, PV to use the Local Volume Static Provisioner to create PersistentVolumes, & HOST_PATH (EXPERIMENTAL) to create a single volume (RAID-0 if more than 1 NVMe disk is present) at /mnt/scratch which can be used by a host mount volume. | string | NONE |
os_config | EXPERIMENTAL - Custom OS configuration. See docs. | object | |
placement_group_key | If specified the node group will be added to a proximity placement group created for the key in a zone, single_group must be false. The key must be lowercase, alphanumeric, maximum 11 characters, please refer to the documentation for warnings and considerations. | string | null |
single_group | If this template represents a single node group spanning multiple zones or a node group per cluster zone. | bool | false |
min_capacity | Minimum number of nodes in the node group(s), this needs to be divisible by the number of subnets in use. | number | 0 |
max_capacity | Maximum number of nodes in the node group(s), this needs to be divisible by the number of subnets in use. | number | |
max_pods | EXPERIMENTAL - Custom maximum number of pods when using the Azure CNI; by default this is 30 but can be set to -1 to use the default or explicitly between 20 & 110. For Kubenet there is always a maximum of 110 pods. See docs. | number | -1 |
max_surge | EXPERIMENTAL - Custom maximum number or percentage of nodes which will be added to the Node Pool size during an upgrade. | string | 10% |
labels | Additional labels for the node group(s). It is suggested to set the lnrs.io/tier label. | map(string) | {} |
taints | Taints for the node group(s). For ingress node groups the ingress taint should be set to NO_SCHEDULE. | list(object) (see below) | [] |
tags | User defined component of the node group name. | map(string) | {} |
Specification for the node_groups.taints object.

Variable | Description | Type | Default |
---|---|---|---|
key | The key of the taint. Maximum length of 63. | string | |
value | The value of the taint. Maximum length of 63. | string | |
effect | The effect of the taint. Valid values: NO_SCHEDULE, NO_EXECUTE, PREFER_NO_SCHEDULE. | string | |
Specification for the logging object.

Variable | Description | Type | Default |
---|---|---|---|
control_plane | Control plane logging configuration. | object (Appendix D1) | |
nodes | Nodes logging configuration. | object (Appendix D2) | {} |
workloads | Workloads logging configuration. | object (Appendix D3) | {} |
log_analytics_workspace_config | Default Azure Log Analytics workspace configuration. | object (Appendix D4) | {} |
storage_account_config | Default Azure storage configuration. | object (Appendix D5) | {} |
extra_records | Additional records to add to the logs; env variables can be referenced within the value in the form ${<ENV_VAR>} | map(string) | {} |
Specification for the logging.control_plane object.

Variable | Description | Type | Default |
---|---|---|---|
log_analytics | Control plane logging log analytics configuration. | object (Appendix D1a) | {} |
storage_account | Control plane logging storage account configuration. | object (Appendix D1b) | {} |
Specification for the `logging.control_plane.log_analytics` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If control plane logs should be sent to a Log Analytics workspace. | `bool` | `false` |
| `workspace_id` | The Azure Log Analytics workspace ID; if not specified the default will be used. | `string` | `null` |
| `profile` | The profile to use for the log category types. | `string` | `null` |
| `additional_log_category_types` | Additional log category types to collect. | `list(string)` | `[]` |

Specification for the `logging.control_plane.storage_account` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If control plane logs should be sent to a storage account. | `bool` | `false` |
| `id` | The Azure Storage Account ID; if not specified the default will be used. | `string` | `null` |
| `profile` | The profile to use for the log category types. | `string` | `null` |
| `additional_log_category_types` | Additional log category types to collect. | `list(string)` | `[]` |
| `retention_enabled` | If retention should be configured per log category collected. | `bool` | `true` |
| `retention_days` | Number of days to retain the logs if `retention_enabled` is `true`. | `number` | `30` |

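The following sketch shows how the control plane logging objects above might be combined. It assumes the default Log Analytics workspace and storage account are supplied via `logging.log_analytics_workspace_config` and `logging.storage_account_config` (documented below); the retention value is illustrative.

```hcl
logging = {
  control_plane = {
    log_analytics = {
      enabled = true
      # workspace_id omitted so the default workspace from
      # logging.log_analytics_workspace_config is used
    }

    storage_account = {
      enabled           = true
      retention_enabled = true
      retention_days    = 90
    }
  }
}
```
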
Specification for the `logging.nodes` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `storage_account` | Node logs storage account configuration. | `object` (Appendix D2a) | `{}` |
| `loki` | Node logs Loki configuration. | `object` (Appendix D3a) | `{}` |

Specification for the `logging.nodes.storage_account` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If node logs should be sent to a storage account. | `bool` | `false` |
| `id` | The Azure Storage Account ID; if not specified the default will be used. | `string` | `null` |
| `container` | The container to use for the log storage. | `string` | `"nodes"` |
| `path_prefix` | Blob prefix for the logs. | `string` | `null` |

Specification for the `logging.nodes.loki` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If node logs should be sent to Loki. | `bool` | `false` |

Specification for the `logging.workloads` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `core_service_log_level` | Log level for the core services; one of `ERROR`, `WARN`, `INFO` or `DEBUG`. | `string` | `"WARN"` |
| `storage_account` | Workload logs storage account configuration. | `object` (Appendix D3a) | `{}` |
| `loki` | Loki workload logs configuration. | `object` (Appendix D3a) | `{}` |

Specification for the `logging.workloads.storage_account` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If workload logs should be sent to a storage account. | `bool` | `false` |
| `id` | The Azure Storage Account ID; if not specified the default will be used. | `string` | `null` |
| `container` | The container to use for the log storage. | `string` | `"nodes"` |
| `path_prefix` | Blob prefix for the logs. | `string` | `null` |

Specification for the `logging.workloads.loki` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If workload logs should be sent to Loki. | `bool` | `false` |

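A minimal sketch combining the node and workload logging objects above; the container name shown for workload logs is illustrative, and the documented defaults apply to anything omitted.

```hcl
logging = {
  nodes = {
    storage_account = { enabled = true }
    loki            = { enabled = true }
  }

  workloads = {
    core_service_log_level = "INFO"

    storage_account = {
      enabled   = true
      container = "workloads" # illustrative container name
    }

    loki = { enabled = true }
  }
}
```
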
Specification for the `logging.log_analytics_workspace_config` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `id` | The Azure Log Analytics workspace ID to be used by default. | `string` | `null` |

Specification for the `logging.storage_account_config` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `id` | The Azure Storage Account ID to be used by default. | `string` | `null` |

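The default destinations can be set once and are then picked up by the per-scope logging objects above. In the sketch below the resource IDs are placeholders and the `CLUSTER` environment variable in `extra_records` is hypothetical; `$${...}` is the HCL escape that keeps the literal `${...}` form documented above.

```hcl
logging = {
  log_analytics_workspace_config = {
    id = "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
  }

  storage_account_config = {
    id = "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"
  }

  extra_records = {
    # $${...} keeps the literal ${CLUSTER} so it is resolved by the logging pipeline,
    # not by Terraform; CLUSTER is a hypothetical environment variable name.
    cluster = "$${CLUSTER}"
  }
}
```
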
Specification for the `storage` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `file` | Azure File CSI configuration. | `object` (Appendix E1) | `{}` |
| `blob` | Azure Blob CSI configuration. | `object` (Appendix E2) | `{}` |
| `nvme_pv` | NVMe Local Volume Static Provisioner configuration. | `object` (Appendix E3) | `{}` |
| `host_path` | NVMe & temp disk host path configuration (EXPERIMENTAL). | `object` (Appendix E4) | `{}` |

Specification for the `storage.file` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If the Azure File CSI should be enabled. | `bool` | `false` |

Specification for the `storage.blob` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If the Azure Blob CSI should be enabled. | `bool` | `false` |

Specification for the `storage.nvme_pv` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If the Local Volume Static Provisioner should be enabled to mount NVMe drives as PVs. | `bool` | `false` |

Specification for the `storage.host_path` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `enabled` | If the NVMe or temp disk host path support should be enabled. | `bool` | `false` |

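A minimal sketch of the `storage` input enabling the CSI drivers and NVMe provisioner described above; which options make sense depends on the node types in use.

```hcl
storage = {
  file      = { enabled = true }
  blob      = { enabled = true }
  nvme_pv   = { enabled = true }  # typically paired with node groups that set nvme_mode = "PV"
  host_path = { enabled = false } # EXPERIMENTAL
}
```
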
Specification for the `core_services_config` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `alertmanager` | Alertmanager configuration. | `object` (Appendix F1) | `{}` |
| `cert_manager` | Cert Manager configuration. | `object` (Appendix F2) | `{}` |
| `coredns` | CoreDNS configuration. | `object` (Appendix F3) | `{}` |
| `external_dns` | ExternalDNS configuration. | `object` (Appendix F4) | `{}` |
| `fluentd` | Fluentd configuration. | `object` (Appendix F5) | `{}` |
| `grafana` | Grafana configuration. | `object` (Appendix F7) | `{}` |
| `ingress_internal_core` | Ingress internal-core configuration. | `object` (Appendix F8) | |
| `kube_state_metrics` | Kube State Metrics configuration. | `object` (Appendix F9) | `{}` |
| `prometheus` | Prometheus configuration. | `object` (Appendix F10) | `{}` |
| `prometheus_node_exporter` | Prometheus Node Exporter configuration. | `object` (Appendix F11) | `{}` |
| `thanos` | Thanos configuration. | `object` (Appendix F12) | `{}` |
| `loki` | Loki configuration. | `object` (Appendix F13) | `{}` |

Specification for the `core_services_config.alertmanager` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `smtp_host` | SMTP host to send alert emails. | `string` | |
| `smtp_from` | SMTP from address for alert emails. | `string` | `null` |
| `receivers` | Receiver configuration. | `list(object)` | `[]` |
| `routes` | Route configuration. | `list(object)` | `[]` |
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `default`. | `map(object)` (see Appendix G) | `{}` |

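A sketch of the Alertmanager configuration; the `receivers` and `routes` entries are assumed to follow the upstream Alertmanager configuration schema (an assumption — check Appendix F1 and the Alertmanager documentation), and all names and addresses are placeholders.

```hcl
core_services_config = {
  alertmanager = {
    smtp_host = "smtp.example.com"
    smtp_from = "alerts@example.com"

    # Assumed to follow the upstream Alertmanager receiver/route schema.
    receivers = [{
      name = "platform-team"
      email_configs = [{
        to = "platform-team@example.com"
      }]
    }]

    routes = [{
      receiver = "platform-team"
      match = {
        severity = "critical"
      }
    }]
  }
}
```
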
Specification for the `core_services_config.cert_manager` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `acme_dns_zones` | DNS zones that ACME issuers can manage certificates for. | `list(string)` | `[]` |
| `additional_issuers` | Additional issuers to install into the cluster. | `map(object)` | `{}` |
| `default_issuer_kind` | Kind of the default issuer. | `string` | `"ClusterIssuer"` |
| `default_issuer_name` | Name of the default issuer; use `letsencrypt` for prod certs. | `string` | `"letsencrypt-staging"` |

Specification for the `core_services_config.coredns` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `forward_zones` | Map of DNS zones and DNS server IP addresses to forward DNS requests to. | `map(string)` | `{}` |

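For example, a cluster issuing production certificates for a single zone and forwarding a private corporate zone to an on-premises DNS server might configure Cert Manager and CoreDNS as in the sketch below; all domains and IP addresses are placeholders.

```hcl
core_services_config = {
  cert_manager = {
    acme_dns_zones      = ["example.com"]
    default_issuer_kind = "ClusterIssuer"
    default_issuer_name = "letsencrypt" # keep "letsencrypt-staging" while testing
  }

  coredns = {
    forward_zones = {
      "corp.example.com" = "10.0.0.10"
    }
  }
}
```
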
Specification for the `core_services_config.external_dns` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `additional_sources` | Additional Kubernetes objects to be watched. | `list(string)` | `[]` |
| `private_domain_filters` | Domains that can have DNS records created for them; these must be set up in the VPC as private hosted zones. | `list(string)` | `[]` |
| `public_domain_filters` | Domains that can have DNS records created for them; these must be set up in the account as public hosted zones. | `list(string)` | `[]` |

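A sketch of the ExternalDNS input; the domains are placeholders and `crd` is just one example of an additional source supported by upstream ExternalDNS.

```hcl
core_services_config = {
  external_dns = {
    additional_sources     = ["crd"]
    private_domain_filters = ["private.example.com"]
    public_domain_filters  = ["example.com"]
  }
}
```
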
Specification for the `core_services_config.fluentd` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `image_repository` | Custom image repository to use for the Fluentd image; `image_tag` must also be set. | `map(string)` | `null` |
| `image_tag` | Custom image tag to use for the Fluentd image; `image_repository` must also be set. | `map(string)` | `null` |
| `additional_env` | Additional environment variables. | `map(string)` | `{}` |
| `debug` | If `true` all logs will be sent to stdout. | `bool` | `true` |
| `filters` | Global Fluentd filter configuration which will be run before the route output. This can be multiple `<filter>` blocks as a single string value. | `string` | `null` |
| `route_config` | Fluentd route configuration; see the `route_config` object specification (Appendix F6). | `list(object)` (Appendix F6) | `[]` |
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `default`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `core_services_config.fluentd.route_config` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `match` | The log tag match to use for this route. | `string` | |
| `label` | The label to use for this route. | `string` | |
| `copy` | If the matched logs should be copied to later routes. | `bool` | `false` |
| `config` | The output configuration to use for the route. | `string` | |

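A sketch of a single Fluentd route; the tag pattern and label are illustrative (how logs are tagged depends on the cluster's Fluentd pipeline), and the output config here simply discards the matched logs using Fluentd's built-in `null` output.

```hcl
core_services_config = {
  fluentd = {
    route_config = [{
      match = "kube.**" # illustrative tag pattern
      label = "@NOISY_LOGS"
      copy  = false

      config = <<-EOT
        <match **>
          @type null
        </match>
      EOT
    }]
  }
}
```
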
Specification for the `core_services_config.grafana` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `admin_password` | Admin password. | `string` | `changeme` |
| `additional_data_sources` | Additional data sources. | `list(object)` | `[]` |
| `additional_plugins` | Additional plugins to install. | `list(string)` | `[]` |
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `default`, `sidecar`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `core_services_config.ingress_internal_core` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `domain` | Internal ingress domain. | `string` | |
| `subdomain_suffix` | Suffix to add to internal ingress subdomains; if not set the cluster name will be used. | `string` | `{CLUSTER_NAME}` |
| `lb_source_cidrs` | CIDR blocks of the IPs allowed to connect to the internal ingress endpoints. | `list(string)` | `["10.0.0.0/8", "100.65.0.0/16"]` |
| `lb_subnet_name` | Name of the subnet to create the load balancer in; if not set the subnet where the node groups reside will be auto selected. Should not be set unless specifically required. | `string` | |
| `public_dns` | If the internal ingress DNS should be public or private. | `bool` | `false` |

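A sketch of the internal ingress configuration; the domain and suffix are placeholders and the source CIDRs shown are the documented defaults.

```hcl
core_services_config = {
  ingress_internal_core = {
    domain           = "example.com"
    subdomain_suffix = "mycluster"
    lb_source_cidrs  = ["10.0.0.0/8", "100.65.0.0/16"]
    public_dns       = false
  }
}
```
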
Specification for the `core_services_config.kube_state_metrics` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `default`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `core_services_config.prometheus` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `remote_write` | Remote write endpoints for metrics. | `list(object)` | `[]` |
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `default`, `thanos_sidecar`, `config_reloader`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `core_services_config.prometheus_node_exporter` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `default`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `core_services_config.thanos` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `store_gateway_default`, `rule_default`, `query_frontend_default`, `query_default`, `compact_default`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `core_services_config.loki` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `resource_overrides` | Resource overrides for pod containers. Map key(s) can be `gateway_default`, `write_default`, `read_default` or `backend_default`. | `map(object)` (see Appendix G) | `{}` |

Specification for the `resource_overrides` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `cpu` | Value to set for the CPU request. | `number` | `null` |
| `cpu_limit` | Value to set for the CPU limit. If `cpu_limit` is specified and `cpu` is not, the CPU request will be set to the `cpu_limit` value rounded to the nearest whole CPU. | `number` | `null` |
| `memory` | Value to set for memory. | `number` | `null` |

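As an example of the resource override shape, the sketch below raises the resources for the default Prometheus container; the values are illustrative and the unit expected for `memory` is not defined in this table, so confirm it in Appendix G before copying.

```hcl
core_services_config = {
  prometheus = {
    resource_overrides = {
      default = {
        cpu       = 2
        cpu_limit = 4
        memory    = 8192 # unit assumption - confirm the expected memory unit in Appendix G
      }
    }
  }
}
```
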
Specification for the `maintenance` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `utc_offset` | Maintenance offset to UTC as a duration (e.g. `+00:00`); this will be used to specify local time. If this is not set a default will be calculated based on the cluster location. | `string` | `null` |
| `control_plane` | Planned maintenance window for the cluster control plane. | `object` (Appendix H1) | `[]` |
| `nodes` | Planned maintenance window for the cluster nodes. | `object` (Appendix H2) | `[]` |
| `not_allowed` | Absolute windows when all maintenance is not allowed. | `list(object)` (Appendix H3) | `[]` |

Specification for the `maintenance_window.control_plane` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `frequency` | Frequency of the maintenance window; one of `WEEKLY`, `FORTNIGHTLY` or `MONTHLY`. | `string` | `WEEKLY` |
| `day_of_month` | Day of the month for the maintenance window if the frequency is set to `MONTHLY`; between `1` & `28`. | `number` | `1` |
| `day_of_week` | Day of the week for the maintenance window if the frequency is set to `WEEKLY` or `FORTNIGHTLY`; one of `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, `SATURDAY` or `SUNDAY`. | `string` | `SUNDAY` |
| `start_time` | Start time for the maintenance window adjusted against UTC by the `utc_offset`; in the format `HH:mm`. | `string` | `00:00` |
| `duration` | Duration of the maintenance window in hours. | `number` | `4` |

Specification for the `maintenance_window.nodes` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `frequency` | Frequency of the maintenance window; one of `WEEKLY`, `FORTNIGHTLY`, `MONTHLY` or `DAILY`. | `string` | `WEEKLY` |
| `day_of_month` | Day of the month for the maintenance window if the frequency is set to `MONTHLY`; between `1` & `28`. | `number` | `1` |
| `day_of_week` | Day of the week for the maintenance window if the frequency is set to `WEEKLY` or `FORTNIGHTLY`; one of `MONDAY`, `TUESDAY`, `WEDNESDAY`, `THURSDAY`, `FRIDAY`, `SATURDAY` or `SUNDAY`. | `string` | `SUNDAY` |
| `start_time` | Start time for the maintenance window adjusted against UTC by the `utc_offset`; in the format `HH:mm`. | `string` | `00:00` |
| `duration` | Duration of the maintenance window in hours. | `number` | `4` |

Specification for the `maintenance_window.not_allowed` object.

| Variable | Description | Type | Default |
|---|---|---|---|
| `start` | Start time for a window when maintenance is not allowed; in RFC 3339 format. | `string` | |
| `end` | End time for a window when maintenance is not allowed; in RFC 3339 format. | `string` | |

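Putting the maintenance objects together, a weekly control plane window, a weekly node window and a change-freeze period might look like the sketch below; the days, times and freeze dates are placeholders.

```hcl
maintenance = {
  utc_offset = "+00:00"

  control_plane = {
    frequency   = "WEEKLY"
    day_of_week = "TUESDAY"
    start_time  = "02:00"
    duration    = 4
  }

  nodes = {
    frequency   = "WEEKLY"
    day_of_week = "WEDNESDAY"
    start_time  = "02:00"
    duration    = 4
  }

  not_allowed = [{
    start = "2024-12-20T00:00:00Z"
    end   = "2025-01-02T00:00:00Z"
  }]
}
```
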
The module provides the following outputs.

| Variable | Description | Type |
|---|---|---|
| `cluster_id` | ID of the Azure Kubernetes Service (AKS) managed cluster. | `string` |
| `cluster_name` | Name of the Azure Kubernetes Service (AKS) managed cluster. | `string` |
| `cluster_version` | Version of the Azure Kubernetes Service (AKS) managed cluster (`<major>.<minor>`). | `string` |
| `cluster_version_full` | Full version of the Azure Kubernetes Service (AKS) managed cluster (`<major>.<minor>.<patch>`). | `string` |
| `latest_version_full` | Latest full Kubernetes version the Azure Kubernetes Service (AKS) managed cluster could be on (`<major>.<minor>.<patch>`). | `string` |
| `cluster_fqdn` | FQDN of the Azure Kubernetes Service managed cluster. | `string` |
| `cluster_endpoint` | Endpoint for the Azure Kubernetes Service managed cluster API server. | `string` |
| `cluster_certificate_authority_data` | Base64 encoded certificate data for the Azure Kubernetes Service managed cluster API server. | `string` |
| `node_resource_group_name` | Auto-generated resource group which contains the resources for this managed Kubernetes cluster. | `string` |
| `effective_outbound_ips` | Outbound IPs from the Azure Kubernetes Service cluster managed load balancer (this will be an empty array if the cluster is using a user-assigned NAT Gateway). | `list(string)` |
| `cluster_identity` | User assigned identity used by the cluster. | `object` |
| `kubelet_identity` | Kubelet identity. | `object` |
| `cert_manager_identity` | Identity that Cert Manager uses. | `object` |
| `coredns_custom_config_map_name` | Name of the CoreDNS custom `ConfigMap`, if external config has been enabled. | `string` |
| `coredns_custom_config_map_namespace` | Namespace of the CoreDNS custom `ConfigMap`, if external config has been enabled. | `object` |
| `dashboards` | Dashboards exposed. | `object` |
| `external_dns_private_identity` | Identity that private ExternalDNS uses. | `object` |
| `external_dns_public_identity` | Identity that public ExternalDNS uses. | `object` |
| `fluent_bit_aggregator_identity` | Identity that Fluent Bit Aggregator uses. | `object` |
| `fluentd_identity` | Identity that Fluentd uses. | `object` |
| `grafana_identity` | Identity that Grafana uses. | `object` |
| `internal_lb_source_ranges` | All internal CIDRs. | `string` |
| `oms_agent_identity` | Identity that the OMS agent uses. | `object` |
| `windows_config` | Windows configuration. | `object` |

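The cluster outputs are commonly used to configure downstream providers. The sketch below assumes the module instance is named `aks` and leaves authentication out, since that is environment specific.

```hcl
provider "kubernetes" {
  host                   = module.aks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.aks.cluster_certificate_authority_data)
  # Authentication (e.g. exec via kubelogin or the Azure CLI) is environment specific
  # and intentionally omitted from this sketch.
}
```
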