Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to train? #195

Open
AdvancedHe opened this issue Dec 26, 2023 · 1 comment
Open

How to train? #195

AdvancedHe opened this issue Dec 26, 2023 · 1 comment

Comments

@AdvancedHe
Copy link

Hello, when I run stage1.py, it appears:
[GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0), GpuDevice(id=2, process_index=0), GpuDevice(id=3, process_index=0), GpuDevice(id=4, process_index=0), GpuDevice(id=5, process_index=0), GpuDevice(id=6, process_index=0), GpuDevice(id=7, process_index=0)]
2023-12-26 10:10:17.684365: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2023-12-26 10:10:17.684429: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2023-12-26 10:10:17.684452: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] ./cuda_sdk_lib
2023-12-26 10:10:17.684468: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] /usr/local/cuda-11.1
2023-12-26 10:10:17.684483: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] .
2023-12-26 10:10:17.684501: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-12-26 10:10:17.704204: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-12-26 10:10:17.704247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas --version
2023-12-26 10:10:17.704902: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-12-26 10:10:17.704980: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:472] ptxas returned an error during compilation of ptx to sass: 'Internal: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)

How to solve this problem?

@AdvancedHe
Copy link
Author

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant