cuda112 & add pytorch to ml-notebook & add cupy #345

ngam · 2022-06-13T21:51:34Z

fixes #320
fixes #322

@rabernat @scottyhq this ~~minimal~~ PR takes tensorflow to cuda112, adds cupy, and adds pytorch in ml-notebook

Ideally, we should add jaxlib==*=*cuda* (~~and maybe even pytorch==*=*cuda* if you want to just have one "ML" notebook~~ added a proposal!) but that will have to wait a little longer

Helpful notes for review:

pytorch-gpu is just a meta package and so don't worry about its disappearance.
openblas need not be present in the case of pytorch (the conda-forge pytorch only uses openblas --- in a bad way actually, long story --- for the case of m1 macs)

github-actions · 2022-06-13T21:51:49Z

👈 Try on Mybinder.org!
👈 Try on Pangeo GCP Binder!
👈 Try on Pangeo AWS Binder!

pangeo-bot · 2022-06-13T21:51:50Z

/condalock
Automatically locking new conda environment, building, and testing images...

ngam · 2022-06-13T21:57:20Z

/condalock

ngam · 2022-06-13T22:03:56Z

/condalock

ngam · 2022-06-13T22:13:55Z

/condalock

ngam · 2022-06-13T22:20:40Z

/condalock

ngam · 2022-06-13T22:31:27Z

/condalock

ngam · 2022-06-13T22:35:24Z

/condalock

ngam · 2022-06-13T22:38:56Z

/condalock

ngam · 2022-06-13T23:10:23Z

/condalock

ngam · 2022-06-13T23:41:23Z

/condalock

ngam · 2022-06-13T23:45:23Z

/condalock

rabernat · 2022-06-13T23:48:23Z

Thank you so much for working on this! 🙏

Can you help me understand--if we are adding pytorch to ml-notebook, is there still any point to maintaining the separate pytorch-notebook?

ngam · 2022-06-14T00:10:41Z

if we are adding pytorch to ml-notebook, is there still any point to maintaining the separate pytorch-notebook?

No, there is no point. I simply didn't want to propose that big change here because I wasn't sure if it would be a welcome idea (but we can definitely do it!)

More generally, I don't see a need to worry about separating pytorch and tensorflow going forward. We will likely face some issues from time to time, but the tensorflow and pytorch feedstocks are currently maintained by the same set of volunteers, so the migrations happen around the same time. In fact, the trickiest package here is torchgeo for now (because it has dependencies on things like gdal).

In summary, if we can add a recent enough tensorflow on top of pangeo-notebook with all these geo packages, we should be able to add pytorch as well. As far as I am concerned, I would treat jax and tensorflow very similarly. I don't have that deep of an experience with the rapids-ai collection, but I can look into that next. My impression is that they're on the easier side.

I am still investigating what's blocking us from getting to tensorflow 2.8.1, but I think it is some deep conflict caused by icu and/or absl stuff. Once we get tensorflow 2.8.1, we will have higher performance than the NGC containers (in my testing) and we get the cuda version of jaxlib as well (they have similar migrations).

ngam · 2022-06-14T00:11:55Z

Could @weiji14 weigh on the pytorch discussion? Do you have any issues with merging the two notebooks?

ngam · 2022-06-14T00:55:00Z

Update: I would abandon this PR if people are happy to wait a few days for #346 --- that will bring everything to the very cutting edge. We are just waiting on rasterio now.

weiji14 · 2022-06-14T01:05:33Z

Could @weiji14 weigh on the pytorch discussion? Do you have any issues with merging the two notebooks?

Nope, as long as pytorch is available somewhere! I do worry though about the larger docker file size (especially if the RAPIDS AI stuff is meant to be added later), but other than that, ok with simplifying maintenance down to just one ml-notebook image.

In fact, the trickiest package here is torchgeo for now (because it has dependencies on things like gdal).

Yeah, for better or worse, the dev version of torchgeo is also pinning upper versions of lots of packages (see https://github.com/microsoft/torchgeo/pull/557/files) which might make things more difficult going forward 🙂

ngam · 2022-06-14T01:14:19Z

Image sizes:

pangeo/ml-notebook        PR          5b99cf0c4873   Less than a second ago   10.2GB
pangeo/pytorch-notebook   PR          71b06612a1d4   1 second ago             8.84GB

Will likely grow by a little bit in #346 (up to 10.5GB, I would guess)

update: build from other PR:

pangeo/ml-notebook        PR          dd12e3f61112   Less than a second ago   10.4GB

ngam · 2022-06-16T14:15:18Z

I am going to close this PR now. All issues are resolved with the merge in rasterio feedstock. A seamless update should be possible without tinkering (in a few hours). I am happy to help, but maybe having the bots do it makes more sense for continuity?

Let me know if you face any issues. I recommend:

taking off the pins on tensorflow, jaxlib, and pytorch, except for ==*=*cuda112*
adding tensorflow, jaxlib, jax, pytorch, torchvision, torchgeo, etc. all under ml-notebook (with ==*=*cuda112* where appropriate)

Happy to produce another PR if desired.

try updates

3ccc87f

another try

3e06c6a

no constraints

97332f6

pangeo-bot and others added 2 commits June 13, 2022 22:08

[condalock-command] autogenerated conda-lock files

5d2d34a

try cuda112

26c97bf

pangeo-bot and others added 2 commits June 13, 2022 22:18

[condalock-command] autogenerated conda-lock files

8055216

get rid of ngam channel

9d6524a

ngam changed the title ~~try updates~~ takes tensorflow to cuda112 Jun 13, 2022

pangeo-bot and others added 2 commits June 13, 2022 22:25

[condalock-command] autogenerated conda-lock files

83a8ba4

jaxlib cuda

7d8ee4b

no jaxlib cuda yet

7f64a43

ngam marked this pull request as ready for review June 13, 2022 22:37

try pytorch and torchgeo in ML

b831085

[condalock-command] autogenerated conda-lock files

523a8e4

ngam changed the title ~~takes tensorflow to cuda112~~ takes tensorflow to cuda112 & combine pytorch+tf in ML notebook Jun 13, 2022

ngam mentioned this pull request Jun 13, 2022

Use cudatoolkit=11 in both tensorflow and pytorch images #320

Closed

ngam changed the title ~~takes tensorflow to cuda112 & combine pytorch+tf in ML notebook~~ tensorflow to cuda112 & combine pytorch+tf in ML notebook Jun 13, 2022

bring jax back

57a0007

pangeo-bot and others added 2 commits June 13, 2022 23:14

[condalock-command] autogenerated conda-lock files

615c2d2

fix jax and better tests

8c7940c

ngam mentioned this pull request Jun 13, 2022

Add cupy to ml notebooks #322

Open

ngam changed the title ~~tensorflow to cuda112 & combine pytorch+tf in ML notebook~~ cuda112 & add pytorch to ml-notebook & add cupy Jun 13, 2022

add cupy

4ce2aa2

[condalock-command] autogenerated conda-lock files

1281952

ngam closed this Jun 16, 2022

ngam mentioned this pull request Jul 27, 2022

bump to condalock 1.1 and mamba 0.25 #361

Merged

weiji14 mentioned this pull request Jun 4, 2024

cuda-nvcc missing again #438

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cuda112 & add pytorch to ml-notebook & add cupy #345

cuda112 & add pytorch to ml-notebook & add cupy #345

ngam commented Jun 13, 2022 •

edited

Loading

github-actions bot commented Jun 13, 2022

pangeo-bot commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

rabernat commented Jun 13, 2022

ngam commented Jun 14, 2022 •

edited

Loading

ngam commented Jun 14, 2022

ngam commented Jun 14, 2022

weiji14 commented Jun 14, 2022

ngam commented Jun 14, 2022 •

edited

Loading

ngam commented Jun 16, 2022

cuda112 & add pytorch to ml-notebook & add cupy #345

cuda112 & add pytorch to ml-notebook & add cupy #345

Conversation

ngam commented Jun 13, 2022 • edited Loading

github-actions bot commented Jun 13, 2022

pangeo-bot commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

ngam commented Jun 13, 2022

rabernat commented Jun 13, 2022

ngam commented Jun 14, 2022 • edited Loading

ngam commented Jun 14, 2022

ngam commented Jun 14, 2022

weiji14 commented Jun 14, 2022

ngam commented Jun 14, 2022 • edited Loading

ngam commented Jun 16, 2022

ngam commented Jun 13, 2022 •

edited

Loading

ngam commented Jun 14, 2022 •

edited

Loading

ngam commented Jun 14, 2022 •

edited

Loading