TransformerEngine install fails with no clear cause #1249

Open
sytelus opened this issue Oct 14, 2024 · 1 comment
Labels: bug (Something isn't working), build (Build system)

sytelus commented Oct 14, 2024

This is happening in WSL with Ubuntu 22.04. CUDA 12.4.0 and cuDNN 9.5.0 are installed and, I think, found correctly. There doesn't seem to be any clear cause of the failure, except for the warning below:

      /home/shitals/miniconda3/envs/nemo/lib/python3.11/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 12.4
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')

My gcc version is 11.4.

The full log is in this attachment: te_fail_log.txt

timmoon10 added the bug and build labels on Oct 14, 2024
timmoon10 (Collaborator) commented

It seems that random build jobs are being killed. Could it be that the parallel build process is overwhelming your system resources? Try setting MAX_JOBS=1 in your environment and rebuilding.

Guidance for build problems: #355 (comment)
Guidance for disabling parallel build: #1077 (comment)
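
For anyone who wants to try the MAX_JOBS=1 suggestion from a script, here is a minimal sketch. It assumes the install is driven through pip and that the build honors the MAX_JOBS environment variable, as PyTorch's C++ extension builder does. The helper names (`serial_build_env`, `pip_install_command`, `rebuild`) and the `"."` package spec (a local source checkout) are hypothetical, so substitute whatever you were installing. By default it only prints the command and does not run the build.

```python
import os
import subprocess
import sys


def serial_build_env(max_jobs=1, base_env=None):
    """Return a copy of the environment with MAX_JOBS set to limit parallel compile jobs."""
    env = dict(os.environ if base_env is None else base_env)
    env["MAX_JOBS"] = str(max_jobs)
    return env


def pip_install_command(package_spec):
    """Build a verbose pip install command for the current interpreter."""
    # package_spec is an assumption: a local checkout path, a VCS URL, or a package name.
    return [sys.executable, "-m", "pip", "install", "-v", package_spec]


def rebuild(package_spec, max_jobs=1, dry_run=True):
    """Run (or, when dry_run is True, just print) the install with limited build parallelism."""
    cmd = pip_install_command(package_spec)
    env = serial_build_env(max_jobs)
    if dry_run:
        print(f"MAX_JOBS={env['MAX_JOBS']} " + " ".join(cmd))
        return 0
    # Returns pip's exit code; a non-zero value means the build still failed.
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__":
    # Hypothetical usage: "." assumes the current directory is the source checkout.
    rebuild(".", max_jobs=1, dry_run=True)
```

If the build succeeds with one job but fails with the default parallelism, that points toward resource exhaustion (for example, memory limits under WSL) rather than a compiler or CUDA version problem.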
