Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address issue with AWS instance type schema #2787

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

viniciusdc
Copy link
Contributor

@viniciusdc viniciusdc commented Oct 22, 2024

Reference Issues or PRs

closes #2782

What does this implement/fix?

Put a x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features not to work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

  • Did you test the pull request locally?
  • Did you add new tests?

How to test this PR?

As the previous error happened only during terraform plan you will need to mock the deployment of nebari, I suggest mofiying the source code to only plan the code instead of deploying (this should work for the second stage, since there are no pre-variable depency AFAIK) -- stage 1 can be skipped by using the state local instead of remote.

  • Init an AWS deployment, e.g, nebari init aws --project testing-gpu
  • add a GPU node block in your config, I have an example in the linked issue;
  • with the changes mentioned above, trigger nebari deploy so that it only plans the stage,
  • You need to validate that the actual instance type is respected when the GPU block is passed, i.e, if the node group does not have gpu: enabled you should see AL2_X86_64 if gpu is passed, then its variant should show up ``AL2_X86_64_GPU`
  • As a final validation, make sure to check the node_template block and test if by passing an AMI the intance_type also changes to CUSTOM.

Any other comments?

I made a deployment on AWS from this branch and GPUs were working as expected

@viniciusdc viniciusdc added the needs: review 👀 This PR is complete and ready for reviewing label Oct 23, 2024
@viniciusdc viniciusdc added this to the 2024.9.2 milestone Oct 24, 2024
@dcmcand
Copy link
Contributor

dcmcand commented Oct 24, 2024

@viniciusdc can you fill out the how to test section here?

@viniciusdc
Copy link
Contributor Author

viniciusdc commented Oct 24, 2024

done. Thanks for catching it!

@viniciusdc
Copy link
Contributor Author

@dcmcand This will address the issue you saw. Could you have another look? (Let me know if you see any other pydantic errors; they might show up again.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs: review 👀 This PR is complete and ready for reviewing
Projects
Status: New 🚦
Development

Successfully merging this pull request may close these issues.

[BUG] - AWS instance type not properly respected when gpu are enabled
2 participants