How to train? #195

AdvancedHe · 2023-12-26T02:13:08Z

Hello, when I run stage1.py, I get the following output and the process aborts:
[GpuDevice(id=0, process_index=0), GpuDevice(id=1, process_index=0), GpuDevice(id=2, process_index=0), GpuDevice(id=3, process_index=0), GpuDevice(id=4, process_index=0), GpuDevice(id=5, process_index=0), GpuDevice(id=6, process_index=0), GpuDevice(id=7, process_index=0)]
2023-12-26 10:10:17.684365: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:73] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
2023-12-26 10:10:17.684429: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:74] Searched for CUDA in the following directories:
2023-12-26 10:10:17.684452: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] ./cuda_sdk_lib
2023-12-26 10:10:17.684468: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] /usr/local/cuda-11.1
2023-12-26 10:10:17.684483: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:77] .
2023-12-26 10:10:17.684501: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2023-12-26 10:10:17.704204: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-12-26 10:10:17.704247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas --version
2023-12-26 10:10:17.704902: E external/org_tensorflow/tensorflow/core/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-12-26 10:10:17.704980: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:472] ptxas returned an error during compilation of ptx to sass: 'Internal: Failed to launch ptxas' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
Aborted (core dumped)

How can I solve this problem?
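
Going by the hints in the log itself (the missing `nvvm/libdevice` directory, the suggested `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda`, and the failure to launch `ptxas`), one thing worth checking is whether a full CUDA toolkit is installed and visible to the process. The following is only a minimal sketch under that assumption, not a confirmed fix for this repository: the helper names `find_cuda_dir` and `configure_xla_env` and the candidate list are hypothetical, and `/usr/local/cuda-11.1` is taken from the directories the log says were searched.

```python
import os
import shutil
from pathlib import Path


def find_cuda_dir(candidates):
    """Return the first candidate that contains nvvm/libdevice and bin/ptxas, else None."""
    for cand in candidates:
        root = Path(cand)
        has_libdevice = (root / "nvvm" / "libdevice").is_dir()
        has_ptxas = (root / "bin" / "ptxas").is_file()
        if has_libdevice and has_ptxas:
            return root
    return None


def configure_xla_env(cuda_dir, env=None):
    """Add the XLA flag named in the log and put <cuda_dir>/bin on PATH.

    Mutates and returns `env` (defaults to os.environ).
    """
    env = os.environ if env is None else env

    # Flag suggested by the XLA warning in the log.
    flag = f"--xla_gpu_cuda_data_dir={cuda_dir}"
    existing = env.get("XLA_FLAGS", "")
    if flag not in existing:
        env["XLA_FLAGS"] = f"{existing} {flag}".strip()

    # Make ptxas discoverable as a child process.
    bin_dir = str(Path(cuda_dir) / "bin")
    entries = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        env["PATH"] = os.pathsep.join([bin_dir] + entries)
    return env


if __name__ == "__main__":
    # Hypothetical candidate list; adjust to the local installation.
    candidates = [os.environ.get("CUDA_HOME", ""), "/usr/local/cuda-11.1", "/usr/local/cuda"]
    cuda_dir = find_cuda_dir([c for c in candidates if c])
    if cuda_dir is None:
        print("No toolkit with nvvm/libdevice and bin/ptxas found.")
        print("ptxas on PATH:", shutil.which("ptxas"))
    else:
        configure_xla_env(cuda_dir)
        print("XLA_FLAGS =", os.environ["XLA_FLAGS"])
```

If this approach applies, the environment would need to be set before `jax` is imported in stage1.py (or exported in the shell before launching the script), since the flags are most likely read when the backend initializes. If no candidate directory passes the check, the installed CUDA may be a runtime-only install without the compiler tools.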

AdvancedHe · 2023-12-26T02:13:40Z
