EKS Managed Node Groups - Change Disk Size #3180
Comments
I have done some more research on this, and I updated the managed node group using the following directives:
Terraform apply (using a test cluster, of course) created a new managed node group, then drained and destroyed the old node group successfully. On checking the test cluster pods, though, I noticed that the aws_load_balancer_controller addon fails to start on the new nodes with the following error log:
Any clues why this may have happened? Cheers
I was able to rectify the error with the aws-load-balancer-controller deployment by adding the container arg --aws-vpc-id=XXXXXXXX and restarting the deployment. The question is why the managed node group replacement initiated by the Terraform changes above suddenly broke the nodes' ability to access the EC2 metadata service, thus requiring explicit declaration of the VPC ID. Potentially an updated AMI release? May need to test this...
More information on this. I managed to change the Terraform scripts to allow the disk_size setup without breaking the aws-load-balancer-controller deployment. First, add these directives to your managed node group setup, i.e.:
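(The exact directives aren't quoted above; a minimal sketch of what this likely entails, based on the module FAQ - the node group key and size are placeholders:)

eks_managed_node_groups = {
  default_node_group = {
    # disk_size only takes effect when the module's custom launch
    # template is disabled and the EKS default template is used
    use_custom_launch_template = false
    disk_size                  = 100
  }
}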
In your eks_addon Terraform script, make sure you add the necessary aws_load_balancer_controller Helm configuration option to set the vpcId:
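For example, assuming the controller is deployed via a helm_release resource (the resource and module names below are placeholders for your own setup):

resource "helm_release" "aws_load_balancer_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"

  set {
    name  = "clusterName"
    value = module.eks.cluster_name
  }

  # Setting vpcId explicitly removes the controller's need to query IMDS
  set {
    name  = "vpcId"
    value = module.vpc.vpc_id
  }
}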
This way the controller doesn't have to query the node's EC2 metadata (and fail due to hop limits) for the VPC ID. Roundabout, but it works in my tests. Previous questions still apply though :-) Cheers
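(An alternative that may avoid the workaround entirely, offered as an assumption rather than something tested in this thread: raise the IMDS hop limit on the node group so pods can reach the metadata service through the extra network hop.)

metadata_options = {
  http_endpoint               = "enabled"
  http_tokens                 = "required"
  # Pods sit one network hop away from IMDS; a hop limit of 1 blocks them
  http_put_response_hop_limit = 2
}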
You could try the docs: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/faq.md
I am wondering about changing this as well - 20GB is very small for e.g. AI workloads, which have big container images. Any insights?
this is not a module question - the module defaults to creating and using a custom launch template in order to give users the widest array of options for customization. This is just how EKS managed node groups work. See https://docs.aws.amazon.com/eks/latest/userguide/launch-templates.html
Hi @bryantbiggs, what is your recommendation for the following use case: "In my EKS nodes, I need 100GB storage to download and store my container images in order to run them."
block_device_mappings = {
  xvda = {
    device_name = "/dev/xvda"
    ebs = {
      volume_size           = 100
      volume_type           = "gp3"
      iops                  = 3000
      throughput            = 125
      encrypted             = true
      delete_on_termination = true
    }
  }
}
@tonydekeizer, in case you haven't done it already: add the section specified by @poussa inside of eks_managed_node_groups.
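For illustration, a sketch of that placement (the module version and node group key are placeholders):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  # ... other cluster settings ...

  eks_managed_node_groups = {
    default_node_group = {
      # @poussa's block_device_mappings section goes here
      block_device_mappings = {
        # ... as above ...
      }
    }
  }
}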
Hi @nelsonpipo, thank you. Yes, we have tested this and it works, having added the details as per @poussa's post.
I see you suggest using the sda device link mapping... is that for a particular reason? Cheers
if you describe the AMI, you will see the device names used with the AMI - for the Amazon Linux AMIs (AL2, AL2023) these are typically /dev/xvda.

AL2023:

aws ec2 describe-images --image-id $(aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/standard/recommended/image_id \
  --region us-west-2 --query "Parameter.Value" --output text) --region us-west-2

{
"Images": [
{
"PlatformDetails": "Linux/UNIX",
"UsageOperation": "RunInstances",
"BlockDeviceMappings": [
{
"Ebs": {
"DeleteOnTermination": true,
"Iops": 3000,
"SnapshotId": "snap-0be3ceb0e2f0255e7",
"VolumeSize": 20,
"VolumeType": "gp3",
"Throughput": 125,
"Encrypted": false
},
"DeviceName": "/dev/xvda"
}
],
"Description": "EKS-optimized Kubernetes node based on Amazon Linux 2023, (k8s: 1.31.0, containerd: 1.7.*)",
"EnaSupport": true,
"Hypervisor": "xen",
"ImageOwnerAlias": "amazon",
"Name": "amazon-eks-node-al2023-x86_64-standard-1.31-v20241024",
"RootDeviceName": "/dev/xvda",
"RootDeviceType": "ebs",
"SriovNetSupport": "simple",
"VirtualizationType": "hvm",
"BootMode": "uefi-preferred",
"DeprecationTime": "2026-10-24T07:07:53.000Z",
"ImdsSupport": "v2.0",
"ImageId": "ami-00369ea992801deb2",
"ImageLocation": "amazon/amazon-eks-node-al2023-x86_64-standard-1.31-v20241024",
"State": "available",
"OwnerId": "602401143452",
"CreationDate": "2024-10-24T07:07:53.000Z",
"Public": true,
"Architecture": "x86_64",
"ImageType": "machine"
}
]
}

Bottlerocket:

aws ec2 describe-images --image-id $(aws ssm get-parameter --name /aws/service/bottlerocket/aws-k8s-1.31/x86_64/latest/image_id \
  --region us-west-2 --query "Parameter.Value" --output text) --region us-west-2

{
"Images": [
{
"PlatformDetails": "Linux/UNIX",
"UsageOperation": "RunInstances",
"BlockDeviceMappings": [
{
"Ebs": {
"DeleteOnTermination": true,
"SnapshotId": "snap-0fa9c2e0271b950f3",
"VolumeSize": 2,
"VolumeType": "gp2",
"Encrypted": false
},
"DeviceName": "/dev/xvda"
},
{
"Ebs": {
"DeleteOnTermination": true,
"SnapshotId": "snap-0abff213fa53bbbef",
"VolumeSize": 20,
"VolumeType": "gp2",
"Encrypted": false
},
"DeviceName": "/dev/xvdb"
}
],
"Description": "bottlerocket-aws-k8s-1.31-x86_64-v1.26.1-943d9a41",
"EnaSupport": true,
"Hypervisor": "xen",
"ImageOwnerAlias": "amazon",
"Name": "bottlerocket-aws-k8s-1.31-x86_64-v1.26.1-943d9a41",
"RootDeviceName": "/dev/xvda",
"RootDeviceType": "ebs",
"SriovNetSupport": "simple",
"VirtualizationType": "hvm",
"BootMode": "uefi-preferred",
"DeprecationTime": "2026-10-24T21:49:18.000Z",
"ImageId": "ami-056fd8b527acedaca",
"ImageLocation": "amazon/bottlerocket-aws-k8s-1.31-x86_64-v1.26.1-943d9a41",
"State": "available",
"OwnerId": "651937483462",
"CreationDate": "2024-10-24T21:49:18.000Z",
"Public": true,
"Architecture": "x86_64",
"ImageType": "machine"
}
]
}
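Note the difference in the Bottlerocket output above: it ships two volumes, and container images live on the data volume at /dev/xvdb, so a size increase there would target the second mapping rather than the OS root volume (a sketch; sizes are placeholders):

block_device_mappings = {
  # /dev/xvda is the small OS root volume; container images and
  # storage live on the data volume at /dev/xvdb
  xvdb = {
    device_name = "/dev/xvdb"
    ebs = {
      volume_size = 100
      volume_type = "gp3"
    }
  }
}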
Thanks @bryantbiggs, makes sense. :-)
Description
We have an EKS Terraform script that generates an EKS cluster with an EKS managed node group.
Here is a code snippet of the node group section:
The cluster and node group were created correctly during the terraform apply and are in production operation.
We have subsequently noticed we need to add more storage to the nodes (logging/images, etc.) and had not realised the default launch template defaults to 20GB of storage per node.
We added the following directive to the above:
disk_size = 100
On running terraform plan it indicates no changes are required.
On doing some research, it was suggested to add two additional directives to the default_node_group settings to force the change.
On doing a terraform plan we get the following error:
We are unsure how to get terraform to destroy and build the new node group with the increased disk size. We do not want to do this manually and get our production terraform state out of sync with the actual setup in EKS.
Not sure if this is a bug or whether we are providing the incorrect directives in our Terraform EKS script?
Cheers
Tony
Versions
Module version [Required]:
20.14.0
Terraform version:
1.9.0
Provider version(s):
Reproduction Code [Required]
See above
Steps to reproduce the behavior:
See above
No
Yes
see above
Expected behavior
Terraform plan and apply will destroy the existing managed node group and add a new one using the modified launch template, and hence the increased disk size.
Actual behavior
Terraform plan/apply either does nothing or produces an error.
Terminal Output Screenshot(s)
see above
Additional context