Commit

force selection of AZs IDs for each node pool (#39)
* force selection of AZs IDs for each node pool

* handle AZs subnet creation by name for backwards compat

* require AZ IDs selection by full name

* style changes

* fix ENIConfig name as it must match topology label
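Under the variable shapes introduced below, each node pool must now list its availability zone IDs explicitly instead of relying on automatic zone discovery. A minimal tfvars sketch of the new contract (the `usw2-az*` IDs are illustrative placeholders; all other node-group attributes fall back to their declared defaults):

```hcl
# Hypothetical tfvars sketch — AZ IDs (e.g. "usw2-az1"), not zone
# names like "us-west-2a", are now required per node pool.
default_node_groups = {
  compute = {
    availability_zone_ids = ["usw2-az1", "usw2-az2"]
  }
  platform = {
    availability_zone_ids = ["usw2-az1", "usw2-az2"]
  }
  gpu = {
    availability_zone_ids = ["usw2-az1"]
  }
}
```

AZ IDs are stable across AWS accounts, whereas zone names are shuffled per account, which is presumably why the module pins on IDs.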
steved authored Feb 24, 2023
1 parent f2dd3ca commit b5c25d8
Showing 11 changed files with 170 additions and 167 deletions.
8 changes: 2 additions & 6 deletions README.md
@@ -99,11 +99,9 @@ aws s3 rb s3://"${AWS_TERRAFORM_REMOTE_STATE_BUCKET}" --force
| [aws_key_pair.domino](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/key_pair) | resource |
| [aws_kms_alias.domino](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_alias) | resource |
| [aws_kms_key.domino](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/kms_key) | resource |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_caller_identity.aws_account](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/caller_identity) | data source |
| [aws_default_tags.this](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/default_tags) | data source |
| [aws_ec2_instance_type.all](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ec2_instance_type) | data source |
| [aws_ec2_instance_type_offerings.nodes](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ec2_instance_type_offerings) | data source |
| [aws_iam_policy_document.kms_key_global](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_iam_policy_document.route53](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/iam_policy_document) | data source |
| [aws_kms_key.key](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/kms_key) | data source |
@@ -118,12 +116,11 @@ aws s3 rb s3://"${AWS_TERRAFORM_REMOTE_STATE_BUCKET}" --force

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_additional_node_groups"></a> [additional\_node\_groups](#input\_additional\_node\_groups) | Additional EKS managed node groups definition. | <pre>map(object({<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = list(string)<br> spot = optional(bool, false)<br> min_per_az = number<br> max_per_az = number<br> desired_per_az = number<br> labels = map(string)<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = object({<br> size = string<br> type = string<br> })<br> }))</pre> | `{}` | no |
| <a name="input_availability_zones"></a> [availability\_zones](#input\_availability\_zones) | List of Availibility zones to distribute the deployment, EKS needs at least 2,https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html.<br> Note that setting this variable bypasses validation of the status of the zones data 'aws\_availability\_zones' 'available'.<br> Caller is responsible for validating status of these zones. | `list(string)` | `[]` | no |
| <a name="input_additional_node_groups"></a> [additional\_node\_groups](#input\_additional\_node\_groups) | Additional EKS managed node groups definition. | <pre>map(object({<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = list(string)<br> spot = optional(bool, false)<br> min_per_az = number<br> max_per_az = number<br> desired_per_az = number<br> availability_zone_ids = list(string)<br> labels = map(string)<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = object({<br> size = string<br> type = string<br> })<br> }))</pre> | `{}` | no |
| <a name="input_bastion"></a> [bastion](#input\_bastion) | if specifed, a bastion is created with the specified details | <pre>object({<br> ami = optional(string, null) # default will use the latest 'amazon_linux_2' ami<br> instance_type = optional(string, "t2.micro")<br> authorized_ssh_ip_ranges = optional(list(string), ["0.0.0.0/0"])<br> })</pre> | `null` | no |
| <a name="input_cidr"></a> [cidr](#input\_cidr) | The IPv4 CIDR block for the VPC. | `string` | `"10.0.0.0/16"` | no |
| <a name="input_create_efs_backup_vault"></a> [create\_efs\_backup\_vault](#input\_create\_efs\_backup\_vault) | Create backup vault for EFS toggle. | `bool` | `true` | no |
| <a name="input_default_node_groups"></a> [default\_node\_groups](#input\_default\_node\_groups) | EKS managed node groups definition. | <pre>object(<br> {<br> compute = object(<br> {<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = optional(list(string), ["m5.2xlarge"])<br> spot = optional(bool, false)<br> min_per_az = optional(number, 0)<br> max_per_az = optional(number, 10)<br> desired_per_az = optional(number, 1)<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "default"<br> })<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = optional(object(<br> {<br> size = optional(number, 100)<br> type = optional(string, "gp3")<br> }),<br> {<br> size = 100<br> type = "gp3"<br> }<br> )<br> }),<br> platform = object(<br> {<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = optional(list(string), ["m5.4xlarge"])<br> spot = optional(bool, false)<br> min_per_az = optional(number, 1)<br> max_per_az = optional(number, 10)<br> desired_per_az = optional(number, 1)<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "platform"<br> })<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = optional(object(<br> {<br> size = optional(number, 100)<br> type = optional(string, "gp3")<br> }),<br> {<br> size = 100<br> type = "gp3"<br> }<br> )<br> }),<br> gpu = object(<br> {<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = optional(list(string), ["g4dn.xlarge"])<br> spot = optional(bool, false)<br> min_per_az = optional(number, 0)<br> max_per_az = optional(number, 10)<br> desired_per_az = optional(number, 0)<br> labels = 
optional(map(string), {<br> "dominodatalab.com/node-pool" = "default-gpu"<br> "nvidia.com/gpu" = true<br> })<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [<br> { key = "nvidia.com/gpu", value = "true", effect = "NO_SCHEDULE" }<br> ])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = optional(object(<br> {<br> size = optional(number, 100)<br> type = optional(string, "gp3")<br> }),<br> {<br> size = 100<br> type = "gp3"<br> }<br> )<br> })<br> })</pre> | <pre>{<br> "compute": {},<br> "gpu": {},<br> "platform": {}<br>}</pre> | no |
| <a name="input_default_node_groups"></a> [default\_node\_groups](#input\_default\_node\_groups) | EKS managed node groups definition. | <pre>object(<br> {<br> compute = object(<br> {<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = optional(list(string), ["m5.2xlarge"])<br> spot = optional(bool, false)<br> min_per_az = optional(number, 0)<br> max_per_az = optional(number, 10)<br> desired_per_az = optional(number, 1)<br> availability_zone_ids = list(string)<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "default"<br> })<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = optional(object(<br> {<br> size = optional(number, 100)<br> type = optional(string, "gp3")<br> }),<br> {<br> size = 100<br> type = "gp3"<br> }<br> )<br> }),<br> platform = object(<br> {<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = optional(list(string), ["m5.4xlarge"])<br> spot = optional(bool, false)<br> min_per_az = optional(number, 1)<br> max_per_az = optional(number, 10)<br> desired_per_az = optional(number, 1)<br> availability_zone_ids = list(string)<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "platform"<br> })<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = optional(object(<br> {<br> size = optional(number, 100)<br> type = optional(string, "gp3")<br> }),<br> {<br> size = 100<br> type = "gp3"<br> }<br> )<br> }),<br> gpu = object(<br> {<br> ami = optional(string, null)<br> bootstrap_extra_args = optional(string, "")<br> instance_types = optional(list(string), ["g4dn.xlarge"])<br> spot = optional(bool, false)<br> min_per_az = optional(number, 0)<br> max_per_az = 
optional(number, 10)<br> desired_per_az = optional(number, 0)<br> availability_zone_ids = list(string)<br> labels = optional(map(string), {<br> "dominodatalab.com/node-pool" = "default-gpu"<br> "nvidia.com/gpu" = true<br> })<br> taints = optional(list(object({ key = string, value = optional(string), effect = string })), [<br> { key = "nvidia.com/gpu", value = "true", effect = "NO_SCHEDULE" }<br> ])<br> tags = optional(map(string), {})<br> gpu = optional(bool, null)<br> volume = optional(object(<br> {<br> size = optional(number, 100)<br> type = optional(string, "gp3")<br> }),<br> {<br> size = 100<br> type = "gp3"<br> }<br> )<br> })<br> })</pre> | n/a | yes |
| <a name="input_deploy_id"></a> [deploy\_id](#input\_deploy\_id) | Domino Deployment ID. | `string` | `"domino-eks"` | no |
| <a name="input_ecr_force_destroy_on_deletion"></a> [ecr\_force\_destroy\_on\_deletion](#input\_ecr\_force\_destroy\_on\_deletion) | Toogle to allow recursive deletion of all objects in the ECR repositories. if 'false' terraform will NOT be able to delete non-empty repositories | `bool` | `false` | no |
| <a name="input_efs_access_point_path"></a> [efs\_access\_point\_path](#input\_efs\_access\_point\_path) | Filesystem path for efs. | `string` | `"/domino"` | no |
@@ -136,7 +133,6 @@ aws s3 rb s3://"${AWS_TERRAFORM_REMOTE_STATE_BUCKET}" --force
| <a name="input_k8s_version"></a> [k8s\_version](#input\_k8s\_version) | EKS cluster k8s version. | `string` | `"1.24"` | no |
| <a name="input_kms_key_id"></a> [kms\_key\_id](#input\_kms\_key\_id) | if use\_kms is set, use the specified KMS key | `string` | `null` | no |
| <a name="input_kubeconfig_path"></a> [kubeconfig\_path](#input\_kubeconfig\_path) | fully qualified path name to write the kubeconfig file | `string` | `""` | no |
| <a name="input_number_of_azs"></a> [number\_of\_azs](#input\_number\_of\_azs) | Number of AZ to distribute the deployment, EKS needs at least 2. | `number` | `3` | no |
| <a name="input_pod_cidr"></a> [pod\_cidr](#input\_pod\_cidr) | The IPv4 CIDR block for the VPC. | `string` | `"100.64.0.0/16"` | no |
| <a name="input_pod_cidr_network_bits"></a> [pod\_cidr\_network\_bits](#input\_pod\_cidr\_network\_bits) | Number of network bits to allocate to the private subnet. i.e /19 -> 8,192 IPs. | `number` | `19` | no |
| <a name="input_pod_subnets"></a> [pod\_subnets](#input\_pod\_subnets) | Optional list of pod subnet ids | `list(string)` | `null` | no |
69 changes: 16 additions & 53 deletions main.tf
@@ -1,34 +1,3 @@
# Validating zone offerings.

# Check the zones where the instance types are being offered
data "aws_ec2_instance_type_offerings" "nodes" {
for_each = { for name, ng in merge(var.default_node_groups, var.additional_node_groups) : name => ng.instance_types }

filter {
name = "instance-type"
values = each.value
}

location_type = "availability-zone"

lifecycle {
# Validating the number of zones is greater than 2. EKS needs at least 2.
postcondition {
condition = length(toset(self.locations)) >= 2
error_message = "Availability of the instance types does not satisfy the number of zones"
}
}
}

# Get "available" azs for the region
data "aws_availability_zones" "available" {
state = "available"
filter {
name = "region-name"
values = [var.region]
}
}

data "aws_subnet" "public" {
count = var.vpc_id != null ? length(var.public_subnets) : 0
id = var.public_subnets[count.index]
@@ -45,15 +14,9 @@ data "aws_subnet" "pod" {
}

locals {
# Get zones where ALL instance types are offered(intersection).
zone_intersection_instance_offerings = setintersection([for k, v in data.aws_ec2_instance_type_offerings.nodes : toset(v.locations)]...)
# Get the zones that are available and offered in the region for the instance types.
az_names = var.vpc_id != null ? distinct(data.aws_subnet.private[*].availability_zone) : length(var.availability_zones) > 0 ? var.availability_zones : data.aws_availability_zones.available.names
offered_azs = setintersection(local.zone_intersection_instance_offerings, toset(local.az_names))
num_of_azs = var.vpc_id != null ? length(local.az_names) : var.number_of_azs

# error -> "Availability of the instance types does not satisfy the desired number of zones, or the desired number of zones is higher than the available/offered zones"
azs_to_use = slice(tolist(local.offered_azs), 0, local.num_of_azs)
az_ids = var.vpc_id != null ? distinct(data.aws_subnet.private[*].availability_zone) : distinct(flatten([for name, ng in local.node_groups : ng.availability_zone_ids]))
num_of_azs = length(local.az_ids)

kms_key_arn = var.use_kms ? try(data.aws_kms_key.key[0].arn, resource.aws_kms_key.domino[0].arn) : null
}
@@ -117,24 +80,24 @@ locals {
module "network" {
count = var.vpc_id == null ? 1 : 0

source = "./submodules/network"
deploy_id = var.deploy_id
region = var.region
cidr = var.cidr
pod_cidr = var.pod_cidr
use_pod_cidr = var.use_pod_cidr
availability_zones = local.azs_to_use
public_cidrs = local.public_cidr_blocks
private_cidrs = local.private_cidr_blocks
pod_cidrs = local.pod_cidr_blocks
flow_log_bucket_arn = { arn = module.storage.s3_buckets["monitoring"].arn }
source = "./submodules/network"
deploy_id = var.deploy_id
region = var.region
cidr = var.cidr
pod_cidr = var.pod_cidr
use_pod_cidr = var.use_pod_cidr
availability_zone_ids = local.az_ids
public_cidrs = local.public_cidr_blocks
private_cidrs = local.private_cidr_blocks
pod_cidrs = local.pod_cidr_blocks
flow_log_bucket_arn = { arn = module.storage.s3_buckets["monitoring"].arn }
}

locals {
vpc_id = var.vpc_id != null ? var.vpc_id : module.network[0].vpc_id
public_subnets = var.vpc_id != null ? [for s in data.aws_subnet.public : { subnet_id = s.id, az = s.availability_zone }] : module.network[0].public_subnets
private_subnets = var.vpc_id != null ? [for s in data.aws_subnet.private : { subnet_id = s.id, az = s.availability_zone }] : module.network[0].private_subnets
pod_subnets = var.vpc_id != null ? [for s in data.aws_subnet.pod : { subnet_id = s.id, az = s.availability_zone }] : module.network[0].pod_subnets
public_subnets = var.vpc_id != null ? [for s in data.aws_subnet.public : { subnet_id = s.id, az = s.availability_zone, az_id = s.availability_zone_id }] : module.network[0].public_subnets
private_subnets = var.vpc_id != null ? [for s in data.aws_subnet.private : { subnet_id = s.id, az = s.availability_zone, az_id = s.availability_zone_id }] : module.network[0].private_subnets
pod_subnets = var.vpc_id != null ? [for s in data.aws_subnet.pod : { subnet_id = s.id, az = s.availability_zone, az_id = s.availability_zone_id }] : module.network[0].pod_subnets
}

module "bastion" {
