Merge pull request #1 from dominodatalab/miguelhar.PLAT-4955.initial-commit

miguelhar authored Sep 6, 2022
2 parents d172d57 + 83096cd commit b4af957
Showing 66 changed files with 8,941 additions and 2 deletions.
4 changes: 4 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,4 @@
# CODEOWNERS
# https://help.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-file-location

* @dominodatalab/platform
12 changes: 11 additions & 1 deletion .gitignore
@@ -1,10 +1,12 @@
# Local .terraform directories
**/.terraform/*

**/resources/*
# .tfstate files
*.tfstate
*.tfstate.*

**.terraform.lock.hcl*
**.terraform.lock.hcl
# Crash log files
crash.log

@@ -27,3 +29,11 @@ override.tf.json

# Include tfplan files to ignore the plan output of command: terraform plan -out=tfplan
# example: *tfplan*
aws-auth.yaml
domino.pem
domino.pem.pub
k8s-functions.sh
k8s-pre-setup.sh
kubeconfig
mallory.json
domino.yml
41 changes: 41 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,41 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: check-merge-conflict
      - id: end-of-file-fixer
      - id: no-commit-to-branch
      - id: check-case-conflict
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.74.1
    hooks:
      - id: terraform_docs
        args:
          - '--hook-config=--path-to-file=README.md'
          - '--hook-config=--add-to-existing-file=true'
          - '--hook-config=--create-file-if-not-exist=true'
          - '--hook-config=--recursive.enabled=true'
          - '--hook-config=--recursive.path=submodules'
      - id: terraform_fmt
      - id: terraform_tflint
        args:
          - '--args=--config=__GIT_WORKING_DIR__/.tflint.hcl'
          - '--args=--only=terraform_deprecated_interpolation'
          - '--args=--only=terraform_deprecated_index'
          - '--args=--only=terraform_unused_declarations'
          - '--args=--only=terraform_comment_syntax'
          - '--args=--only=terraform_documented_outputs'
          - '--args=--only=terraform_documented_variables'
          - '--args=--only=terraform_typed_variables'
          - '--args=--only=terraform_module_pinned_source'
          - '--args=--only=terraform_naming_convention'
          - '--args=--only=terraform_required_version'
          - '--args=--only=terraform_required_providers'
          - '--args=--only=terraform_standard_module_structure'
          - '--args=--only=terraform_workspace_remote'
      - id: terraform_validate
      # - id: terrascan # Skipped until terrascan updates its lifecycle handling; data resources do not have lifecycle settings, so a lifecycle block is not allowed.
      #   args:
      #     - '--args=--non-recursive'
      #     - '--args=--policy-type=aws'
      #     - '--args=--skip-rules=AC_AWS_0369' # Flow logs are enabled; terrascan does not follow the logical path of the resource.
6 changes: 6 additions & 0 deletions .tflint.hcl
@@ -0,0 +1,6 @@
plugin "aws" {
enabled = true
deep_check = true
version = "0.14.0"
source = "github.com/terraform-linters/tflint-ruleset-aws"
}
142 changes: 141 additions & 1 deletion README.md
@@ -1,2 +1,142 @@
# terraform-aws-eks
Terraform module for deploying Domino on EKS.

## Create SSH key pair
### Prerequisites
* Host with `ssh-keygen` installed

### Command
```bash
ssh-keygen -q -P '' -t rsa -b 4096 -m PEM -f domino.pem
```
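
Optionally, you can sanity-check the pair by re-deriving the public key from the private key (a standard `ssh-keygen` capability, shown here only as a convenience):

```bash
# Prints the public key derived from domino.pem; it should match domino.pem.pub.
ssh-keygen -y -f domino.pem
```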

## Create Terraform remote state bucket (optional)
* Authenticate with AWS and make sure the environment variables `AWS_REGION`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY` are set. If your account has MFA set up, you will also need `AWS_SESSION_TOKEN`.
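
For example, a session could be set up as follows (all values below are illustrative placeholders, not real credentials):

```bash
# Example only: substitute your own region and credentials.
export AWS_REGION="us-west-2"
export AWS_ACCESS_KEY_ID="<your-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<your-secret-access-key>"
# Only needed if your account has MFA set up:
# export AWS_SESSION_TOKEN="<your-session-token>"
```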

### Prerequisites
* [awscli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
* `jq` (optional; used to parse API responses)

#### 1. Create the bucket (if you already have a bucket, set `AWS_TERRAFORM_REMOTE_STATE_BUCKET` to its name and skip this step):
```bash
export AWS_ACCOUNT="$(aws sts get-caller-identity | jq -r .Account)"
export AWS_TERRAFORM_REMOTE_STATE_BUCKET="domino-terraform-rs-${AWS_ACCOUNT}-${AWS_REGION}"

aws s3api create-bucket \
  --bucket "${AWS_TERRAFORM_REMOTE_STATE_BUCKET}" \
  --region "${AWS_REGION}" \
  --create-bucket-configuration LocationConstraint="${AWS_REGION}" | jq .
```

#### Verify bucket exists

```bash
aws s3api head-bucket --bucket "${AWS_TERRAFORM_REMOTE_STATE_BUCKET}"
```
You should NOT see an error.

#### 2. Initialize the Terraform remote state
Create a file called `terraform.tf` (the name does not matter) with the following content:
```hcl
terraform {
  backend "s3" {}
}
```

```bash
# Set the deploy id. This will be used later as well.
export TF_VAR_deploy_id="domino-eks-1" # <-- Feel free to rename.
terraform init -migrate-state \
  -backend-config="bucket=${AWS_TERRAFORM_REMOTE_STATE_BUCKET}" \
  -backend-config="key=domino-eks/${TF_VAR_deploy_id}" \
  -backend-config="region=${AWS_REGION}"
```



## If you need to delete the bucket

```bash
aws s3 rb s3://"${AWS_TERRAFORM_REMOTE_STATE_BUCKET}" --force
```

# Terraform-docs

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.2.0 |
| <a name="requirement_aws"></a> [aws](#requirement\_aws) | >= 4.0 |
| <a name="requirement_local"></a> [local](#requirement\_local) | >= 2.2.0 |
| <a name="requirement_null"></a> [null](#requirement\_null) | >= 3.1.1 |
| <a name="requirement_tls"></a> [tls](#requirement\_tls) | >= 3.4.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_aws"></a> [aws](#provider\_aws) | 4.26.0 |
| <a name="provider_null"></a> [null](#provider\_null) | 3.1.1 |
| <a name="provider_tls"></a> [tls](#provider\_tls) | 4.0.1 |

## Modules

| Name | Source | Version |
|------|--------|---------|
| <a name="module_bastion"></a> [bastion](#module\_bastion) | ./submodules/bastion | n/a |
| <a name="module_eks"></a> [eks](#module\_eks) | ./submodules/eks | n/a |
| <a name="module_k8s_setup"></a> [k8s\_setup](#module\_k8s\_setup) | ./submodules/k8s | n/a |
| <a name="module_network"></a> [network](#module\_network) | ./submodules/network | n/a |
| <a name="module_storage"></a> [storage](#module\_storage) | ./submodules/storage | n/a |
| <a name="module_subnets_cidr"></a> [subnets\_cidr](#module\_subnets\_cidr) | ./submodules/subnets-cidr | n/a |

## Resources

| Name | Type |
|------|------|
| [aws_key_pair.domino](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/key_pair) | resource |
| [aws_availability_zones.available](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/availability_zones) | data source |
| [aws_ec2_instance_type_offerings.nodes](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/data-sources/ec2_instance_type_offerings) | data source |
| [null_data_source.validate_zones](https://registry.terraform.io/providers/hashicorp/null/latest/docs/data-sources/data_source) | data source |
| [tls_public_key.domino](https://registry.terraform.io/providers/hashicorp/tls/latest/docs/data-sources/public_key) | data source |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_additional_node_groups"></a> [additional\_node\_groups](#input\_additional\_node\_groups) | Additional EKS managed node groups definition. | <pre>map(object({<br> ami = optional(string)<br> name = string<br> instance_type = string<br> min_per_az = number<br> max_per_az = number<br> desired_per_az = number<br> label = string<br> volume = object({<br> size = string<br> type = string<br> })<br> }))</pre> | `{}` | no |
| <a name="input_availability_zones"></a> [availability\_zones](#input\_availability\_zones) | List of Availibility zones to distribute the deployment, EKS needs at least 2,https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html.<br> Note that setting this variable bypasses validation of the status of the zones data 'aws\_availability\_zones' 'available'.<br> Caller is responsible for validating status of these zones. | `list(string)` | `[]` | no |
| <a name="input_base_cidr_block"></a> [base\_cidr\_block](#input\_base\_cidr\_block) | CIDR block to serve the main private and public subnets. | `string` | `"10.0.0.0/16"` | no |
| <a name="input_bastion_ami_id"></a> [bastion\_ami\_id](#input\_bastion\_ami\_id) | AMI ID for the bastion EC2 instance, otherwise we will use the latest 'amazon\_linux\_2' ami | `string` | `""` | no |
| <a name="input_create_bastion"></a> [create\_bastion](#input\_create\_bastion) | Create bastion toggle. | `bool` | `false` | no |
| <a name="input_default_node_groups"></a> [default\_node\_groups](#input\_default\_node\_groups) | EKS managed node groups definition. | <pre>object({<br> compute = object({<br> name = string<br> ami = optional(string)<br> instance_type = string<br> min_per_az = number<br> max_per_az = number<br> desired_per_az = number<br> volume = object({<br> size = string<br> type = string<br> })<br> }),<br> platform = object({<br> name = string<br> ami = optional(string)<br> instance_type = string<br> min_per_az = number<br> max_per_az = number<br> desired_per_az = number<br> volume = object({<br> size = string<br> type = string<br> })<br> }),<br> gpu = object({<br> name = string<br> ami = optional(string)<br> instance_type = string<br> min_per_az = number<br> max_per_az = number<br> desired_per_az = number<br> volume = object({<br> size = string<br> type = string<br> })<br> })<br> })</pre> | <pre>{<br> "compute": {<br> "desired_per_az": 1,<br> "instance_type": "m5.2xlarge",<br> "max_per_az": 10,<br> "min_per_az": 0,<br> "name": "compute",<br> "volume": {<br> "size": "100",<br> "type": "gp3"<br> }<br> },<br> "gpu": {<br> "desired_per_az": 0,<br> "instance_type": "g4dn.xlarge",<br> "max_per_az": 10,<br> "min_per_az": 0,<br> "name": "gpu",<br> "volume": {<br> "size": "100",<br> "type": "gp3"<br> }<br> },<br> "platform": {<br> "desired_per_az": 1,<br> "instance_type": "m5.4xlarge",<br> "max_per_az": 10,<br> "min_per_az": 0,<br> "name": "platform",<br> "volume": {<br> "size": "100",<br> "type": "gp3"<br> }<br> }<br>}</pre> | no |
| <a name="input_deploy_id"></a> [deploy\_id](#input\_deploy\_id) | Domino Deployment ID. | `string` | `"domino-eks"` | no |
| <a name="input_efs_access_point_path"></a> [efs\_access\_point\_path](#input\_efs\_access\_point\_path) | Filesystem path for efs. | `string` | `"/domino"` | no |
| <a name="input_eks_master_role_names"></a> [eks\_master\_role\_names](#input\_eks\_master\_role\_names) | IAM role names to be added as masters in eks. | `list(string)` | `[]` | no |
| <a name="input_enable_vpc_endpoints_s3"></a> [enable\_vpc\_endpoints\_s3](#input\_enable\_vpc\_endpoints\_s3) | Enable VPC endpoints for S3 service. This is intented for mission critical, highly available deployments | `bool` | `false` | no |
| <a name="input_k8s_version"></a> [k8s\_version](#input\_k8s\_version) | EKS cluster k8s version. | `string` | `"1.23"` | no |
| <a name="input_number_of_azs"></a> [number\_of\_azs](#input\_number\_of\_azs) | Number of AZ to distribute the deployment, EKS needs at least 2. | `number` | `3` | no |
| <a name="input_private_cidr_network_bits"></a> [private\_cidr\_network\_bits](#input\_private\_cidr\_network\_bits) | Number of network bits to allocate to the private subnet. i.e /19 -> 8,192 IPs. | `number` | `19` | no |
| <a name="input_public_cidr_network_bits"></a> [public\_cidr\_network\_bits](#input\_public\_cidr\_network\_bits) | Number of network bits to allocate to the public subnet. i.e /27 -> 32 IPs. | `number` | `27` | no |
| <a name="input_region"></a> [region](#input\_region) | AWS region for the deployment | `string` | n/a | yes |
| <a name="input_route53_hosted_zone_name"></a> [route53\_hosted\_zone\_name](#input\_route53\_hosted\_zone\_name) | AWS Route53 Hosted zone. | `string` | n/a | yes |
| <a name="input_s3_force_destroy_on_deletion"></a> [s3\_force\_destroy\_on\_deletion](#input\_s3\_force\_destroy\_on\_deletion) | Toogle to allow recursive deletion of all objects in the s3 buckets. if 'false' terraform will NOT be able to delete non-empty buckets | `bool` | `false` | no |
| <a name="input_ssh_pvt_key_path"></a> [ssh\_pvt\_key\_path](#input\_ssh\_pvt\_key\_path) | SSH private key filepath. | `string` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | Deployment tags. | `map(string)` | `{}` | no |
| <a name="input_vpc_id"></a> [vpc\_id](#input\_vpc\_id) | VPC ID for bringing your own vpc, will bypass creation of such. | `string` | `""` | no |

## Outputs

| Name | Description |
|------|-------------|
| <a name="output_deploy_id"></a> [deploy\_id](#output\_deploy\_id) | Deployment ID. |
| <a name="output_efs_access_point_id"></a> [efs\_access\_point\_id](#output\_efs\_access\_point\_id) | EFS access\_point id |
| <a name="output_efs_file_system_id"></a> [efs\_file\_system\_id](#output\_efs\_file\_system\_id) | EFS filesystem id |
| <a name="output_efs_volume_handle"></a> [efs\_volume\_handle](#output\_efs\_volume\_handle) | EFS volume handle <filesystem id id>::<accesspoint id> |
| <a name="output_hostname"></a> [hostname](#output\_hostname) | Domino instance URL. |
| <a name="output_k8s_tunnel_command"></a> [k8s\_tunnel\_command](#output\_k8s\_tunnel\_command) | Command to run the k8s tunnel mallory. |
| <a name="output_region"></a> [region](#output\_region) | Deployment region. |
| <a name="output_ssh_bastion_command"></a> [ssh\_bastion\_command](#output\_ssh\_bastion\_command) | Command to ssh into the bastion host |
<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
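
For orientation, a minimal root configuration consuming this module might look like the following sketch. The input names come from the table above; the module source and all values are illustrative assumptions rather than a tested configuration:

```hcl
module "domino_eks" {
  # Illustrative source; pin to a released ref in practice.
  source = "github.com/dominodatalab/terraform-aws-eks"

  deploy_id                = "domino-eks-1"
  region                   = "us-west-2"   # required
  route53_hosted_zone_name = "example.com" # required
  ssh_pvt_key_path         = "domino.pem"  # required
  k8s_version              = "1.23"
  create_bastion           = true
  tags                     = { environment = "dev" }
}
```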
156 changes: 156 additions & 0 deletions main.tf
@@ -0,0 +1,156 @@
# Validating zone offerings.

# Check the zones where the instance types are being offered.
data "aws_ec2_instance_type_offerings" "nodes" {
  for_each = merge(var.default_node_groups, var.additional_node_groups)

  filter {
    name   = "instance-type"
    values = [each.value.instance_type]
  }

  location_type = "availability-zone"

  lifecycle {
    # Validate that the instance types are offered in at least 2 zones; EKS needs at least 2.
    postcondition {
      condition     = length(toset(self.locations)) >= 2
      error_message = "Availability of the instance types does not satisfy the number of zones"
    }
  }
}

# Get "available" azs for the region.
data "aws_availability_zones" "available" {
  state = "available"
  filter {
    name   = "region-name"
    values = [var.region]
  }
}

locals {
  # Get zones where ALL instance types are offered (intersection).
  zone_intersection_instance_offerings = setintersection([for k, v in data.aws_ec2_instance_type_offerings.nodes : toset(v.locations)]...)
  # Get the zones that are available and offered in the region for the instance types.
  az_names           = length(var.availability_zones) > 0 ? var.availability_zones : data.aws_availability_zones.available.names
  offered_azs        = setintersection(local.zone_intersection_instance_offerings, toset(local.az_names))
  available_azs_data = zipmap(data.aws_availability_zones.available.names, data.aws_availability_zones.available.zone_ids)

  bastion_user     = "ec2-user"
  working_dir      = path.cwd
  ssh_pvt_key_path = "${local.working_dir}/${var.ssh_pvt_key_path}"
  kubeconfig_path  = "${local.working_dir}/kubeconfig"
}

# Validate that the number of offered and available zones satisfies the number of required zones.
# https://github.com/hashicorp/terraform/issues/31122 may result in a more elegant validation and deprecation of the null_data_source.
data "null_data_source" "validate_zones" {
  inputs = {
    validated = true
  }
  lifecycle {
    precondition {
      condition     = length(local.offered_azs) >= var.number_of_azs
      error_message = "Availability of the instance types does not satisfy the desired number of zones, or the desired number of zones is higher than the available/offered zones"
    }
  }
}

locals {
  # Get the required azs' names and ids.
  availability_zones = { for name in slice(tolist(local.offered_azs), 0, var.number_of_azs) : name => local.available_azs_data[name] if data.null_data_source.validate_zones.outputs["validated"] }
}

## Importing SSH pvt key to access bastion and EKS nodes.

data "tls_public_key" "domino" {
  private_key_openssh = file(var.ssh_pvt_key_path)
}

resource "aws_key_pair" "domino" {
  key_name   = var.deploy_id
  public_key = trimspace(data.tls_public_key.domino.public_key_openssh)
}

module "subnets_cidr" {
  source                    = "./submodules/subnets-cidr"
  availability_zones        = local.availability_zones
  base_cidr_block           = var.base_cidr_block
  public_cidr_network_bits  = var.public_cidr_network_bits
  private_cidr_network_bits = var.private_cidr_network_bits
  subnet_name_prefix        = var.deploy_id
}

module "network" {
  source                   = "./submodules/network"
  region                   = var.region
  public_subnets           = module.subnets_cidr.public_subnets
  private_subnets          = module.subnets_cidr.private_subnets
  deploy_id                = var.deploy_id
  base_cidr_block          = var.base_cidr_block
  vpc_id                   = var.vpc_id
  enable_vpc_endpoints_s3  = var.enable_vpc_endpoints_s3
  monitoring_s3_bucket_arn = module.storage.monitoring_s3_bucket_arn
}

locals {
  public_subnets  = module.network.public_subnets
  private_subnets = module.network.private_subnets
}

module "storage" {
  source                       = "./submodules/storage"
  deploy_id                    = var.deploy_id
  efs_access_point_path        = var.efs_access_point_path
  s3_force_destroy_on_deletion = var.s3_force_destroy_on_deletion
  subnets = [for s in local.private_subnets : {
    name       = s.name
    id         = s.id
    cidr_block = s.cidr_block
  }]
  vpc_id = module.network.vpc_id
}

module "bastion" {
  count = var.create_bastion ? 1 : 0

  source                   = "./submodules/bastion"
  region                   = var.region
  vpc_id                   = module.network.vpc_id
  deploy_id                = var.deploy_id
  ssh_pvt_key_path         = aws_key_pair.domino.key_name
  bastion_public_subnet_id = local.public_subnets[0].id
  bastion_ami_id           = var.bastion_ami_id
}

module "eks" {
  source                    = "./submodules/eks"
  region                    = var.region
  k8s_version               = var.k8s_version
  vpc_id                    = module.network.vpc_id
  deploy_id                 = var.deploy_id
  private_subnets           = local.private_subnets
  ssh_pvt_key_path          = aws_key_pair.domino.key_name
  route53_hosted_zone_name  = var.route53_hosted_zone_name
  bastion_security_group_id = try(module.bastion[0].security_group_id, "")
  create_bastion_sg         = var.create_bastion
  kubeconfig_path           = local.kubeconfig_path
  default_node_groups       = var.default_node_groups
  additional_node_groups    = var.additional_node_groups
  s3_buckets                = module.storage.s3_buckets
}

module "k8s_setup" {
  source                  = "./submodules/k8s"
  ssh_pvt_key_path        = abspath(local.ssh_pvt_key_path)
  bastion_user            = local.bastion_user
  bastion_public_ip       = try(module.bastion[0].public_ip, "")
  k8s_cluster_endpoint    = module.eks.cluster_endpoint
  managed_nodes_role_arns = module.eks.managed_nodes_role_arns
  eks_master_role_names   = concat(var.eks_master_role_names, module.eks.eks_master_role_name)
  kubeconfig_path         = local.kubeconfig_path
  depends_on = [
    module.eks,
    module.bastion
  ]
}