When I run DS_BUILD_AIO=1 CFLAGS="-I$CONDA_PREFIX/include/ -I/usr/include/" LDFLAGS="-L$CONDA_PREFIX/lib/ -L/usr/lib/x86_64-linux-gnu/" pip install -e . to install the async_io op, I get a misleading success message.
It does display Successfully installed deepspeed , but when I run ds_report I only get .
I then printed the stderr output and found that
To find out what causes this, I read the source code, such as "setup.py"...
I traced the problem to "setup.py line 182": for op_name, builder in ALL_OPS.items(): op_compatible = builder.is_compatible()
When op_name is "async_io", builder.is_compatible() returns False. In "DeepSpeed/deepspeed/ops/op_builder/async_io.py", "line 93" defines def is_compatible(self, verbose=False) . Its result depends on "line 99": aio_compatible = self.has_function('io_submit', ('aio', )) .
Following that call to def has_function() in "DeepSpeed/deepspeed/ops/op_builder/builder.py line 308", I confirmed that it raises a link error at line 362, in compiler.link_executable(objs, os.path.join(tempdir, 'a.out'), extra_preargs=self.strip_empty_entries(ldflags), libraries=libraries, library_dirs=library_dirs) , where the compiler is a "distutils.unixccompiler.UnixCCompiler".
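The link check described above can be reproduced outside the DeepSpeed build to see the raw linker message. This is a minimal sketch, not DeepSpeed's own code: try_link is a hypothetical helper, and it assumes a C compiler is reachable as cc (or through the CC environment variable) and that CFLAGS and LDFLAGS are set the same way as in the install command.

```python
import os
import shlex
import shutil
import subprocess
import tempfile


def try_link(func_name, libraries, cc=None):
    """Compile and link a stub that calls func_name; return (ok, message).

    Hypothetical helper for illustration only. It mimics the idea of a
    "does this symbol link?" probe, it is not DeepSpeed's has_function().
    """
    cc_cmd = shlex.split(cc or os.environ.get("CC", "cc"))
    if shutil.which(cc_cmd[0]) is None:
        return False, f"compiler {cc_cmd[0]!r} not found"

    # Declare the symbol ourselves so no header is needed; only linking matters.
    source = (
        f"void {func_name}(void);\n"
        f"int main(void) {{ {func_name}(); return 0; }}\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        c_path = os.path.join(tmp, "probe.c")
        with open(c_path, "w") as handle:
            handle.write(source)
        cmd = [
            *cc_cmd,
            *shlex.split(os.environ.get("CFLAGS", "")),
            c_path,
            "-o", os.path.join(tmp, "a.out"),
            *shlex.split(os.environ.get("LDFLAGS", "")),
            *[f"-l{lib}" for lib in libraries],
        ]
        proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr


if __name__ == "__main__":
    ok, message = try_link("io_submit", ("aio",))
    print("link ok:", ok)
    if not ok:
        print(message)
```

Running it in the same shell and conda environment as the pip install should print the linker's own complaint if libaio cannot be linked, which is the detail the build output hides.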
I don't know why this happens. To work around it, I had to change the class AsyncIOBuilder (in "DeepSpeed/deepspeed/ops/op_builder/async_io.py") like the following picture .
After that I installed it again and got the correct result.
I hope you can figure out what causes the link error. I also don't know whether my change will cause aio to be disabled when I use offload.
LZhengguo changed the title from "{{ env.GITHUB_WORKFLOW }} Cannot install async_io op even if it's compatible flag is displaying OK by ds_report cmd!" to "Cannot install async_io op even if it's compatible flag is displaying OK by ds_report cmd!" on Dec 31, 2024
[2025-01-10 10:56:34,249] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [YES] ...... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] NVIDIA Inference is only supported on Ampere and newer architectures
[WARNING] FP Quantizer is using an untested triton version (3.1.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.5
[WARNING] using untested triton version (3.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
Perhaps the async_io path isn't being found? It's odd that there are no errors in the build log. Could you share the output from the DS_BUILD_AIO pip install . command?
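One quick way to look into the "path isn't being found" idea is to ask Python where it sees libaio. This is a rough sketch under assumptions: locate_libaio is a hypothetical helper, the header is assumed to be named libaio.h, and the directories checked are the ones passed in the CFLAGS above. Note that ctypes.util.find_library looks for the runtime shared library, which can be present even when the development files needed for linking are missing.

```python
import ctypes.util
import os


def locate_libaio(include_dirs=("/usr/include",)):
    """Return (shared_library_name_or_None, list_of_header_paths_found).

    Hypothetical helper for illustration; it only inspects the filesystem
    and the dynamic loader's view, it does not compile or link anything.
    """
    shared = ctypes.util.find_library("aio")
    headers = []
    for directory in include_dirs:
        candidate = os.path.join(directory, "libaio.h")
        if os.path.isfile(candidate):
            headers.append(candidate)
    return shared, headers


if __name__ == "__main__":
    dirs = ["/usr/include"]
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if conda_prefix:
        # Same include directory that the CFLAGS in the install command use.
        dirs.insert(0, os.path.join(conda_prefix, "include"))
    shared, headers = locate_libaio(dirs)
    print("shared library:", shared)
    print("headers found:", headers)
```

If the shared library is reported but no header is found (or the reverse), that mismatch would be worth including alongside the build log.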