Better compiler caching strategy on Windows #462

Open
zzjjbb opened this issue Sep 18, 2024 · 2 comments

Comments

@zzjjbb
Contributor

zzjjbb commented Sep 18, 2024

Is your feature request related to a problem? Please describe.
This problem has annoyed me for years: pycuda runs extremely slowly on Windows but not on Linux. My program contains ~20 ElementwiseKernels and ReductionKernels. SourceModule is used to compile the code, and it saves the cubin files to the cache_dir. This works well on every Linux machine I have tested, with only ~1s of overhead to load the functions on later runs. However, running my code on Windows for the first time costs ~2min, and later runs still cost ~1min. This is because the code always needs to be preprocessed, since the source always contains #include <pycuda-complex.hpp>:

if "#include" in source:
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))

In my tests, on every Windows computer I tried, running nvcc --preprocess "empty_file.cu" --compiler-options -EP takes several seconds. In other words, just deciding whether the cache can be used takes a very long time.
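For anyone who wants to reproduce the measurement, here is a minimal timing sketch built around the command quoted above. The helper names `build_preprocess_command` and `time_nvcc_preprocess` are hypothetical and not part of pycuda. The `-EP` host-compiler flag is taken verbatim from the report; the assumption is that it targets the Windows host compiler and that a different flag would be needed elsewhere.

```python
import os
import shutil
import subprocess
import tempfile
import time


def build_preprocess_command(nvcc, source_path, host_flag="-EP"):
    """Build the command line quoted in the report (hypothetical helper)."""
    return [nvcc, "--preprocess", source_path, "--compiler-options", host_flag]


def time_nvcc_preprocess(nvcc="nvcc", host_flag="-EP"):
    """Return the seconds spent preprocessing an empty .cu file.

    Returns None when nvcc cannot be found on PATH.
    """
    nvcc_path = shutil.which(nvcc)
    if nvcc_path is None:
        return None

    with tempfile.TemporaryDirectory() as tmp_dir:
        source_path = os.path.join(tmp_dir, "empty_file.cu")
        # Create an empty source file, as in the report.
        with open(source_path, "w"):
            pass

        start = time.perf_counter()
        # The exit status is ignored: only the wall-clock time is of interest.
        subprocess.run(
            build_preprocess_command(nvcc_path, source_path, host_flag),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = time_nvcc_preprocess()
    print("nvcc not found" if elapsed is None else f"{elapsed:.2f} s")
```

Multiplying the measured time by the number of kernels in a program (~20 here) gives a rough estimate of the per-run overhead described above.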

Describe the solution you'd like
I tried monkey patching this to remove the preprocess call above, and it works well. I'd like to find a better way to do it. The easiest way I can think of is adding an option that forces the #include check to be skipped (though it should not be enabled by default, since the user must understand the potential risk).
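The exact patch is not shown in this thread. One way such a monkey patch might look is sketched below, assuming that `preprocess_source` is looked up as a module-level name in `pycuda.compiler` (as the quoted snippet suggests) and that it returns a string. The context manager `skip_include_preprocessing` is a hypothetical name, and it takes the module as a parameter so the sketch does not import pycuda itself.

```python
import contextlib


@contextlib.contextmanager
def skip_include_preprocessing(compiler_module):
    """Temporarily replace compiler_module.preprocess_source with a no-op.

    Assumption: the caching code calls preprocess_source(source, options, nvcc)
    through the module's global namespace, so swapping the attribute is enough.
    """
    original = compiler_module.preprocess_source
    # Return the raw source unchanged, so the cache key is computed from the
    # unexpanded text and nvcc is never launched just to build the key.
    compiler_module.preprocess_source = lambda source, options, nvcc: source
    try:
        yield
    finally:
        # Always restore the real function, even if compilation fails.
        compiler_module.preprocess_source = original
```

Usage would be along the lines of `import pycuda.compiler` followed by `with skip_include_preprocessing(pycuda.compiler): ...` around the kernel construction. The risk is the one noted above: a change to an included header no longer invalidates the cache, so a stale cubin could be loaded.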

Describe alternatives you've considered
Are there any nvcc options to speed up the preprocessing? I don't know.

Additional context
The link below is one of the examples I worked on, but I guess any simple GPUArray functionality that relies on SourceModule is affected by this.
https://github.com/bu-cisl/SSNP-IDT/blob/master/examples/forward_model.py

@inducer
Owner

inducer commented Sep 18, 2024

Thanks for the report, I had no idea. Continuing along those lines, I'm not sure I have a good idea for how to approach this. We could introduce a flag so you don't have to monkeypatch, but that sacrifices correctness to an extent.

@zzjjbb
Contributor Author

zzjjbb commented Sep 18, 2024

#463
Does this make sense: we check include_dirs, and ignore #include when it is empty? This should cover most of the simple cases, though it may still be incorrect if the user upgrades the CUDA/pycuda version and keeps the cache (it's possible to append these version numbers to the cache folder/file name to invalidate the cache in that case). Also, we could do this only for the poor Windows users to minimize the potential problems.
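A self-contained sketch of the idea, not the code in #463: the function `cache_checksum`, its parameters, and the `preprocess` callable are all hypothetical stand-ins. It uses SHA-256 as an arbitrary hash choice, and it mixes the version strings into the hash rather than into the folder/file name, which has the same invalidating effect.

```python
import hashlib


def cache_checksum(source, options, include_dirs, versions, preprocess):
    """Compute a cache key, preprocessing only when include_dirs is non-empty.

    source       -- kernel source text
    options      -- list of compiler option strings
    include_dirs -- user-supplied include directories (may be empty)
    versions     -- version strings (e.g. CUDA and pycuda) to mix into the key
    preprocess   -- callable(source) -> str, stand-in for the slow nvcc call
    """
    checksum = hashlib.sha256()

    if "#include" in source and include_dirs:
        # User headers may change between runs, so hash the expanded source.
        checksum.update(preprocess(source).encode("utf-8"))
    else:
        # Only built-in headers can be involved: hash the raw source and rely
        # on the version strings to invalidate the cache after an upgrade.
        checksum.update(source.encode("utf-8"))
        for version in versions:
            checksum.update(version.encode("utf-8"))

    for option in options:
        checksum.update(option.encode("utf-8"))

    return checksum.hexdigest()
```

With an empty `include_dirs`, `preprocess` is never called, so computing the key costs no nvcc launch. Two calls that differ only in `versions` produce different keys, which covers the upgrade case mentioned above.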
