Reducing the cuDNN Download and Installation sizes - cuDNN JIT #751
jhalabi-nv started this conversation in Ideas
Hello,
The cuDNN team is working on a new cuDNN package to reduce our download and installation sizes.
There are two key pieces: (1) a configuration mode that reduces the installation size, which is already available, and (2) a smaller package that reduces the download size, which is still a WIP. We're posting here because we understand that the download and installation size of cuDNN is a pain point for this project, and we want to keep you in the loop about our roadmap. Also, you can already try out (1) and emulate (2) by removing the unnecessary sub-libraries; this can serve as a proof of concept.

More specifically about (1): in cuDNN 9.2, we released the GRAPH_JIT_ONLY configuration mode. The cuDNN flash attention kernels are runtime compiled and are therefore included in the GRAPH_JIT_ONLY configuration. This means you can enable GRAPH_JIT_ONLY and remove the unnecessary cuDNN sub-libraries, which reduces the cuDNN installation size from 835MB to 14MB. The cuDNN Library Configuration section of the documentation explains how to do this: you'll need to set an environment variable and remove any non-required cuDNN sub-libraries.
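Below is a minimal Python sketch of the "remove non-required sub-libraries" step. It is a dry run that deletes nothing. It lists which cuDNN shared libraries in a directory look removable and builds an environment with the configuration variable set. Two assumptions are not confirmed by this post: the environment variable is named `CUDNN_LIB_CONFIG`, and the sub-libraries GRAPH_JIT_ONLY still needs are `libcudnn.so`, `libcudnn_graph.so` and `libcudnn_engines_runtime_compiled.so` (the cuDNN 9.x Linux names). Please check both against the cuDNN Library Configuration section for your version. The helper names are hypothetical.

```python
import os
from pathlib import Path

# ASSUMPTION: sub-libraries required in GRAPH_JIT_ONLY mode (cuDNN 9.x Linux names).
# Verify against the "cuDNN Library Configuration" docs for your cuDNN version.
JIT_ONLY_REQUIRED = (
    "libcudnn.so",
    "libcudnn_graph.so",
    "libcudnn_engines_runtime_compiled.so",
)


def is_required(filename: str) -> bool:
    """True if filename is a required library or one of its versioned names (e.g. .so.9)."""
    return any(filename == base or filename.startswith(base + ".")
               for base in JIT_ONLY_REQUIRED)


def removable_sublibraries(lib_dir) -> list[str]:
    """Dry run: return the cuDNN shared libraries in lib_dir that are not in the required set."""
    names = sorted(
        p.name for p in Path(lib_dir).iterdir()
        if p.name.startswith("libcudnn") and ".so" in p.name
    )
    return [name for name in names if not is_required(name)]


def jit_only_env() -> dict[str, str]:
    """Copy of the current environment with the JIT-only configuration enabled."""
    env = dict(os.environ)
    env["CUDNN_LIB_CONFIG"] = "GRAPH_JIT_ONLY"  # ASSUMPTION: variable name, check the docs
    return env
```

For example, `removable_sublibraries("/path/to/cudnn/lib")` returns the candidates to delete, and `jit_only_env()` can be passed as `env=` to `subprocess.run` when launching a training binary. Keeping the listing separate from any deletion makes it easy to review the list before removing anything.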
Once (2) is available, which should reduce the download size from 435MB to 6MB on Ubuntu, we'll create a PR to update the llm.c documentation. That's coming soon. In the meantime, we'd be excited to see whether anyone can reproduce our setup with GRAPH_JIT_ONLY, demonstrating the significantly reduced cuDNN footprint.