Rebuild cuda118, cuda1120, and protobuf #195

Merged
merged 22 commits into from
Nov 4, 2023

@RaulPPelaez
Contributor

RaulPPelaez commented Oct 25, 2023

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Continuing #191, I added the necessary CUDA arch flags to build_pytorch.sh.
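As a rough illustration of what "CUDA arch flags" means here: PyTorch's build reads the `TORCH_CUDA_ARCH_LIST` environment variable to decide which GPU architectures to compile for. The sketch below picks a value per CUDA version. The architecture lists and the helper names (`arch_list_for`, `build_env`) are assumptions made for illustration, not the values or code used in this feedstock's build_pytorch.sh.

```python
# Illustrative only: these architecture lists are assumptions, not the
# values used in the feedstock's build_pytorch.sh.
ILLUSTRATIVE_ARCH_LISTS = {
    "11.2": "3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX",
    "11.8": "3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6;8.9+PTX",
    "12.0": "5.0;6.0;6.1;7.0;7.5;8.0;8.6;8.9;9.0+PTX",
}


def arch_list_for(cuda_version: str) -> str:
    """Return a TORCH_CUDA_ARCH_LIST value for a CUDA major.minor version."""
    try:
        return ILLUSTRATIVE_ARCH_LISTS[cuda_version]
    except KeyError:
        raise ValueError(f"no arch list defined for CUDA {cuda_version}") from None


def build_env(cuda_version: str, base_env: dict) -> dict:
    """Copy base_env and add the arch list for the given CUDA version."""
    env = dict(base_env)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list_for(cuda_version)
    return env


if __name__ == "__main__":
    print(build_env("11.8", {})["TORCH_CUDA_ARCH_LIST"])
```

Every architecture in the list adds device code to the resulting fat binary, so longer lists generally mean longer compile times and larger packages.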

regro-cf-autotick-bot and others added 10 commits October 11, 2023 17:17
The transition to CUDA 12 SDK includes new packages for all CUDA libraries and
build tools. Notably, the cudatoolkit package no longer exists, and packages
should depend directly on the specific CUDA libraries (libcublas, libcusolver,
etc) as needed. For an in-depth overview of the changes and to report problems
[see this issue](conda-forge/conda-forge.github.io#1963).
Please feel free to raise any issues encountered there. Thank you! 🙏
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@RaulPPelaez
Contributor Author

@conda-forge-admin, please rerender

@RaulPPelaez
Contributor Author

These jobs were running, but they timed out:

  [5220/5707] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/ForeachUnaryOp.cu.o
  [5221/5707] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/IGammaKernel.cu.o
  [5222/5707] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Im2Col.cu.o
  [5223/5707] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cu.o
##[error]The operation was canceled.
Finishing: Run docker build

They just seem to take a really long time. Not sure what can be done to reduce it.
Any advice here @jakirkham?
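To get a feel for how far a canceled job got, one option is to read the last `[done/total]` progress marker from the log tail. The sketch below does that for the lines quoted above; the helper name `last_progress` is hypothetical, and the step count is only a rough proxy, since individual CUDA objects can take very different amounts of time to compile.

```python
import re

# Matches build progress markers such as "[5223/5707]".
PROGRESS_RE = re.compile(r"\[(\d+)/(\d+)\]")


def last_progress(log_text: str):
    """Return (done, total) from the last '[done/total]' marker, or None."""
    matches = PROGRESS_RE.findall(log_text)
    if not matches:
        return None
    done, total = matches[-1]
    return int(done), int(total)


log_tail = """\
[5222/5707] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/Im2Col.cu.o
[5223/5707] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/GridSampler.cu.o
##[error]The operation was canceled.
"""

if __name__ == "__main__":
    done, total = last_progress(log_tail)
    print(f"last marker: {done}/{total} ({done / total:.1%} of build steps)")
```

For the log above, the last marker is 5223 of 5707 steps, so the job was canceled with roughly 8% of the steps still outstanding.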

@hmaarrfk
Contributor

I've always been building CUDA locally. The fat binaries are just really slow to compile.

@hmaarrfk
Contributor

I can try to start building this locally and see where it goes. I am not overly enthusiastic to build out 3 cuda versions (11.2, 11.8, and 12.0).....

@RaulPPelaez
Contributor Author

> I've always been building CUDA locally. The fat binaries are just really slow to compile.

I did not know that was an option hehe. I know how to build locally, but I assume the results are not something I can somehow upload here myself?

> I can try to start building this locally and see where it goes. I am not overly enthusiastic to build out 3 cuda versions (11.2, 11.8, and 12.0).....

I understand it is a hassle, but torch is a really big package and I believe it is worth providing these options. It is true that minor CUDA versions are supposed to be compatible with each other, but even so I would assume CUDA 11.8 provides some kind of performance enhancement with respect to 11.2.
At least CUDA 12 should be an option to reduce the chances of dependency hell.

@hmaarrfk
Contributor

See https://github.com/conda-forge/cfep/blob/main/cfep-03.md

Is 11.8 useful if we provide 12.0?

It really is a LOT of package data (storage) and compilation time.

@hmaarrfk
Contributor

@RaulPPelaez were you able to get CUDA 12.0 to compile as well? If so, do you want to add it to this merge request so I can trigger a rebuild?

@jakirkham
Member

> the biggest barrier is me starting my script.

Yeah we need to figure out a way to automate builds

@RaulPPelaez
Contributor Author

> @RaulPPelaez were you able to get Cuda 12.0 to also compile? if so do you want to add it to this merge request and I can trigger a rebuild.

Those also timed out. The PR for CUDA 12 (#193) should also be compatible with CUDA 11.8.
I am not sure how the pipeline works in this case. I merged #193 here, but I am not sure whether that is enough.

@RaulPPelaez
Contributor Author

@conda-forge-admin, please rerender

@jakirkham
Member

Think what Mark is asking is if you have tried building locally (and if so whether those builds completed)

Guessing most build errors occur early (like those seen previously: #193 (comment) )

Though there are some errors that might not show up until the linking stage

Another question would be whether we see issues during testing (not sure what testing is done for the CUDA packages)

@hmaarrfk
Contributor

hmaarrfk commented Nov 1, 2023

The following build-done canaries have completed:

conda-forge-build-done-linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11numpy1.22python3.10.____cpython
conda-forge-build-done-linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11numpy1.22python3.8.____cpython
conda-forge-build-done-linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11numpy1.22python3.9.____cpython
conda-forge-build-done-linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11numpy1.23python3.11.____cpython
conda-forge-build-done-linux_64_blas_implmklc_compiler_version10cuda_compilernvcccuda_compiler_version11.2cxx_compiler_version10numpy1.22python3.10.____cpython
conda-forge-build-done-linux_64_blas_implmklc_compiler_version10cuda_compilernvcccuda_compiler_version11.2cxx_compiler_version10numpy1.23python3.11.____cpython
conda-forge-build-done-linux_64_blas_implmklc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12numpy1.22python3.8.____cpython

That said, I'm wondering whether, for the sake of conda-forge's storage quota, I should cancel these builds (now that I have one build each of 11.2, 11.8, and 12.0 complete) and work to integrate #197.
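The canary names listed above encode the build variant, so a short script can summarize which CUDA and Python combinations have finished. This is a minimal sketch that assumes the name layout seen in this comment (a `cuda_compiler_version<X.Y>` field and a `python<X.Y>` field); the helper name `summarize` is hypothetical and is not part of any conda-forge tooling.

```python
import re
from collections import defaultdict

# Field patterns assumed from the canary names quoted in this thread.
CUDA_RE = re.compile(r"cuda_compiler_version(\d+\.\d+)")
PYTHON_RE = re.compile(r"python(\d+\.\d+)")


def version_key(version: str):
    """Sort key so that '3.10' orders after '3.8'."""
    return tuple(int(part) for part in version.split("."))


def summarize(canaries):
    """Map each CUDA version to the sorted Python versions that completed."""
    done = defaultdict(list)
    for name in canaries:
        cuda = CUDA_RE.search(name)
        py = PYTHON_RE.search(name)
        if cuda and py:
            done[cuda.group(1)].append(py.group(1))
    return {cuda: sorted(pys, key=version_key) for cuda, pys in done.items()}


canaries = [
    "conda-forge-build-done-linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11numpy1.22python3.10.____cpython",
    "conda-forge-build-done-linux_64_blas_implgenericc_compiler_version11cuda_compilernvcccuda_compiler_version11.8cxx_compiler_version11numpy1.22python3.8.____cpython",
    "conda-forge-build-done-linux_64_blas_implmklc_compiler_version10cuda_compilernvcccuda_compiler_version11.2cxx_compiler_version10numpy1.23python3.11.____cpython",
    "conda-forge-build-done-linux_64_blas_implmklc_compiler_version12cuda_compilercuda-nvcccuda_compiler_version12.0cxx_compiler_version12numpy1.22python3.8.____cpython",
]

if __name__ == "__main__":
    for cuda, pythons in sorted(summarize(canaries).items(), key=lambda kv: version_key(kv[0])):
        print(f"CUDA {cuda}: Python {', '.join(pythons)}")
```

Only four of the seven canaries are included here to keep the example short; passing the full list works the same way.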

@jakirkham
Member

@h-vetinari do you have a sense of how #197 should factor in here?

@h-vetinari
Member

> @h-vetinari do you have a sense of how #197 should factor in here?

If it can be integrated, all the better! If not, it's not the end of the world, but should then go into the next build. With pytorch, the recent protobuf bumps seemed to go pretty smoothly though (from what I can tell, esp. compared to TF).

@hmaarrfk
Contributor

hmaarrfk commented Nov 3, 2023

They work. Builds take a full day. Restarting them now with the protobuf migration.

@hmaarrfk changed the title from "Rebuild cuda118 0 1 h58afc5" to "Rebuild cuda118, cuda1120, and protobuf" on Nov 3, 2023
@hmaarrfk
Contributor

hmaarrfk commented Nov 4, 2023

log files

log_files.zip

@hmaarrfk hmaarrfk merged commit 8659210 into conda-forge:main Nov 4, 2023
36 of 49 checks passed
@Tobias-Fischer
Contributor

Hi @hmaarrfk - have these builds actually been uploaded to the conda-forge channel? I might be blind, but I see only the CPU builds.

@Tobias-Fischer
Contributor

Background: I’m trying to figure out why conda-forge/pytorch_scatter-feedstock#55 can’t solve the environment.

@hmaarrfk
Contributor

hmaarrfk commented Nov 6, 2023

lol, maybe not.... I might just upload the next ones if that is ok?

@jakirkham
Member

Meaning with the changes from PR ( #199 )?

@hmaarrfk
Contributor

hmaarrfk commented Nov 6, 2023

yes.

I mean, i'm really not excited about them either, but they are complete....

@jakirkham
Member

Yeah that seems better from a user standpoint (recent version + Python 3.12)

@hmaarrfk
Contributor

hmaarrfk commented Nov 6, 2023

Ohhhh. i found the builds:

https://anaconda.org/mark.harfouche/pytorch/files
https://anaconda.org/mark.harfouche/pytorch-gpu/files

Just copied them over.

@jakirkham
Member

Thanks Mark! 🙏


6 participants