chore: Fix env vars used in e2e Karpenter install and diff test scripts #7006

Closed

bryantbiggs
Member

@bryantbiggs bryantbiggs commented Sep 12, 2024

Fixes #N/A

Notice that the region is missing from the endpoint that is rendered in the script

image

Description

  • Fix env vars used in e2e Karpenter install and diff test scripts
    • Since the ECR URI uses the ECR DKR endpoint format, there shouldn't need to be a difference between the endpoints for a public cluster and a private cluster (unless the images are located in different accounts). The endpoints should therefore be the same for public and private cluster usage, which means the same account ID and region. If my understanding of
      - name: install-karpenter
        shell: bash
        env:
          ECR_ACCOUNT_ID: ${{ inputs.ecr_account_id }}
          ECR_REGION: ${{ inputs.ecr_region }}
          ACCOUNT_ID: ${{ inputs.account_id }}
          CLUSTER_NAME: ${{ inputs.cluster_name }}
          PRIVATE_CLUSTER: ${{ inputs.private_cluster }}
        run: |
          ./test/hack/e2e_scripts/install_karpenter.sh
      - name: diff-karpenter
        shell: bash
        env:
          ECR_ACCOUNT_ID: ${{ inputs.ecr_account_id }}
          ECR_REGION: ${{ inputs.ecr_region }}
        run: |
          ./test/hack/e2e_scripts/diff_karpenter.sh
      is correct, ACCOUNT_ID and REGION are the account ID and region where the clusters are created, whereas ECR_ACCOUNT_ID and ECR_REGION are the account ID and region where the Helm chart and images are stored within ECR. The prior scripts switched between a mix of the cluster account ID and region and the ECR account ID and region. Not all of those values are provided to the scripts, which resulted in incorrect template rendering.
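
To illustrate the rendering problem described above, here is a minimal sketch of how an ECR registry host could be assembled from an account ID and a region. The `ecr_registry` function is a hypothetical helper written for this example and is not taken from the repository's scripts; the account ID and region are the ones that appear in the snapshot comment on this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Karpenter scripts): builds a private
# ECR registry host from an account ID and a region.
ecr_registry() {
  local account_id="$1" region="$2"
  printf '%s.dkr.ecr.%s.amazonaws.com' "${account_id}" "${region}"
}

# Both values provided: the host renders as intended.
ecr_registry "021119463062" "us-east-1"; echo

# Region not provided (empty): the template still renders, but the host is
# malformed because the region segment is blank.
ecr_registry "021119463062" ""; echo
```

The second call prints a host with two consecutive dots where the region should be, which matches the kind of rendering described above when a script reads a variable that the workflow never passes in.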

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@bryantbiggs bryantbiggs requested a review from a team as a code owner September 12, 2024 22:19
netlify bot commented Sep 12, 2024

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit f175f32
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/66e431cddef9f50008a0ebf9

@bryantbiggs
Member Author

cc @jigisha620

@coveralls

coveralls commented Sep 12, 2024

Pull Request Test Coverage Report for Build 10849227974

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 83.025%

Totals Coverage Status
Change from base Build 10842727395: 0.0%
Covered Lines: 5512
Relevant Lines: 6639

💛 - Coveralls

Contributor

@jonathan-innis jonathan-innis left a comment

/karpenter snapshot

Contributor

Snapshot successfully published to oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter:0-de4dda146397e7020060a40ea06067ba648f9070.
To install you must login to the ECR repo with an AWS account:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 021119463062.dkr.ecr.us-east-1.amazonaws.com

helm upgrade --install karpenter oci://021119463062.dkr.ecr.us-east-1.amazonaws.com/karpenter/snapshot/karpenter --version "0-de4dda146397e7020060a40ea06067ba648f9070" --namespace "kube-system" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

@bryantbiggs
Member Author

/karpenter snapshot

@jonathan-innis
Contributor

@bryantbiggs Is this still relevant since the code was updated with a fix that resolved the failure that we were running into?

@bryantbiggs
Member Author

yes, it's still relevant

there are a few parts:

  1. Because the endpoint in use follows the ECR private endpoint format, there shouldn't need to be a difference between the endpoint used for testing a private cluster and the one used for a non-private cluster. The only time you would need to decide which endpoint to use is if you were pulling images from a public registry such as Docker Hub, Public ECR, etc. So the chart URI should be the same regardless of whether the cluster is public or private.
  2. It doesn't look like tests are currently executed against private clusters, because not all of the required environment variables are passed down to the scripts. For example, this script needs the REGION environment variable, but the workflow does not pass it. Likewise, this script requires both ACCOUNT_ID and REGION when it's a private cluster, but the workflow does not pass those either.
  3. There seems to be a mixup between ACCOUNT_ID and ECR_ACCOUNT_ID, as well as between REGION and ECR_REGION. I can't say which is right or wrong without knowing the intent, but based on the two issues above, I think the ECR_* versions should be used whenever an OCI artifact is coming out of ECR.
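
As a rough sketch of how the missing-variable problem in points 2 and 3 could be caught early, a script might check its required environment variables before rendering any URI. The `require_env` function below is a hypothetical helper, not something that exists in the repository, and the values assigned to `ECR_ACCOUNT_ID` and `ECR_REGION` are placeholders taken from the snapshot comment on this PR.

```shell
#!/usr/bin/env bash
# Hypothetical guard (an assumption, not from the repo): returns non-zero
# if any of the named environment variables is unset or empty.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable
    # whose name is stored in $name, or empty if it is unset.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Placeholder values for illustration only.
ECR_ACCOUNT_ID="021119463062"
ECR_REGION="us-east-1"

# Only render the chart URI once both ECR_* values are known to be set.
if require_env ECR_ACCOUNT_ID ECR_REGION; then
  echo "oci://${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/karpenter/snapshot/karpenter"
fi
```

Failing fast like this would surface a workflow that forgets to pass a variable, instead of silently producing a malformed endpoint.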

Contributor

github-actions bot commented Oct 4, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
