fix: add tensorflow and pytorch CUDA version tests for GPU image build #452
Changes from all commits
2f93134
bfe92bf
3d8056b
cecd28a
b3128c1
ff89c2d
7263d42
f02ac49
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
ARG SAGEMAKER_DISTRIBUTION_IMAGE | ||
FROM $SAGEMAKER_DISTRIBUTION_IMAGE | ||
|
||
ARG MAMBA_DOCKERFILE_ACTIVATE=1 | ||
|
||
# Execute cuda validation script: | ||
# 1. Check if TensorFlow is installed with CUDA support for GPU image | ||
# 2. Check if Pytorch is installed with CUDA support for GPU image | ||
COPY --chown=$MAMBA_USER:$MAMBA_USER scripts/cuda_validation.py . | ||
RUN chmod +x cuda_validation.py | ||
RUN python3 cuda_validation.py |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
# Verify Tensorflow CUDA | ||
import tensorflow as tf | ||
|
||
cuda_available = tf.test.is_built_with_cuda() | ||
if not cuda_available: | ||
raise Exception("TensorFlow is installed without CUDA support for GPU image build.") | ||
print("TensorFlow is built with CUDA support.") | ||
|
||
|
||
# Verify Pytorch is installed with CUDA version | ||
import subprocess | ||
|
||
# Run the micromamba list command and capture the output | ||
result = subprocess.run(["micromamba", "list"], stdout=subprocess.PIPE, text=True) | ||
|
||
# Split the output into lines | ||
package_lines = result.stdout.strip().split("\n") | ||
|
||
# Find the PyTorch entry | ||
pytorch_entry = None | ||
for line in package_lines: | ||
dependency_info = line.strip().split() | ||
if dependency_info and dependency_info[0] == "pytorch": | ||
pytorch_entry = line.split() | ||
break | ||
|
||
# If PyTorch is installed, print its information | ||
if pytorch_entry: | ||
package_name = pytorch_entry[0] | ||
package_version = pytorch_entry[1] | ||
package_build = pytorch_entry[2] | ||
print(f"PyTorch: {package_name} {package_version} {package_build}") | ||
# Raise exception if CUDA is not detected | ||
if "cuda" not in package_build: | ||
raise Exception("Pytorch is installed without CUDA support for GPU image build.") | ||
|
||
Maybe also do a print here: "Pytorch is built with CUDA support"
Good point, updated exception message |
||
# Verify Pytorch has CUDA working properly | ||
# This check only works on a GPU instance, so it may fail in a local test | ||
# To test manually on a GPU instance, run: "docker run --gpus all <image id>" | ||
import torch | ||
|
||
if not torch.cuda.is_available(): | ||
raise Exception( | ||
"Pytorch is installed with CUDA support, but CUDA is not available in the current environment. " | ||
"Make sure to run this test case in a GPU environment." | ||
) |
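The build-string check in the script above can also be sketched as two small pure functions, which makes the parsing testable without running `micromamba`. This is only a sketch under stated assumptions: the helper names `find_package_entry` and `has_cuda_build` are hypothetical and not part of this PR, and the sample listing is made up for illustration, assuming the name, version, and build columns appear in that order as the script expects.

```python
from typing import Optional, Sequence


def find_package_entry(listing: str, name: str) -> Optional[Sequence[str]]:
    """Return the whitespace-split row for `name` from package-list text, or None.

    Hypothetical helper: mirrors the loop in the script, which matches on
    the first column of each line.
    """
    for line in listing.strip().splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return fields
    return None


def has_cuda_build(entry: Sequence[str]) -> bool:
    """True if the build-string column (third field) mentions CUDA."""
    return len(entry) > 2 and "cuda" in entry[2].lower()


# Made-up sample rows (name, version, build, channel), for illustration only.
SAMPLE = """
  Name     Version  Build                     Channel
  numpy    1.26.0   py310hb13e2d6_0           conda-forge
  pytorch  2.0.0    cuda118_py310h072bc4c_0   conda-forge
"""

entry = find_package_entry(SAMPLE, "pytorch")
print(entry is not None and has_cuda_build(entry))
```

One design note: with the lookup separated out, a caller can treat a `None` result as a failure, whereas the script as written skips the CUDA check when no `pytorch` row is found.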
What is this ARG for?
afaik, these 3 lines are used to activate test environment, we have it in other tests too: https://github.com/aws/sagemaker-distribution/blob/main/test/test_artifacts/v1/autogluon.test.Dockerfile#L1-L4