Is this a BUG REPORT or FEATURE REQUEST?: Not sure
What happened:
We have been experimenting with the following IAM policy, which restricts write operations to the desired EKS workers:
But the upgrade-manager sometimes gets an HTTP 403 when performing TerminateInstanceInAutoScalingGroup:
time="2020-12-16T21:55:39Z" level=debug msg="cache hit => false, service => autoscaling.TerminateInstanceInAutoScalingGroup"
2020-12-16T21:55:39.873Z ERROR controllers.RollingUpgrade AccessDenied {"rollingupgrade": "REDACTED", "instanceID": "REDACTED", "error": "AccessDenied: User: arn:aws:sts::REDACTED:assumed-role/REDACTED is not authorized to perform: autoscaling:TerminateInstanceInAutoScalingGroup\n\tstatus code: 403, request id: REDACTED"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/keikoproj/upgrade-manager/controllers.(*RollingUpgradeReconciler).error
/workspace/controllers/rollingupgrade_controller.go:1181
github.com/keikoproj/upgrade-manager/controllers.(*RollingUpgradeReconciler).TerminateNode
/workspace/controllers/rollingupgrade_controller.go:398
github.com/keikoproj/upgrade-manager/controllers.(*RollingUpgradeReconciler).DrainTerminate
/workspace/controllers/rollingupgrade_controller.go:994
github.com/keikoproj/upgrade-manager/controllers.(*RollingUpgradeReconciler).UpdateInstanceEager
/workspace/controllers/rollingupgrade_controller.go:977
github.com/keikoproj/upgrade-manager/controllers.(*RollingUpgradeReconciler).UpdateInstance
/workspace/controllers/rollingupgrade_controller.go:1051
It seems to happen when the instance has already been terminated. The Auto Scaling endpoint can then no longer trace the instance ID back to its ASG, so no ASG tags can be matched against the IAM policy, and the call is rejected with a 403. The upgrade-manager retries a second time, gets the same error, and sets the RollingUpgrade status to "error".
What you expected to happen:
Maybe upgrade-manager should keep going on a 403 from TerminateInstanceInAutoScalingGroup, assuming the instance was already terminated?
How to reproduce it (as minimally and precisely as possible):
You could try:
Use the IAM policy shared here
Start rolling upgrade
Terminate a worker before upgrade-manager gets to it
Anything else we need to know?:
Environment:
Other debugging information (if applicable):