Spaces in the path to a library with static modules causes an assertion failure while constructing module #110

Open
MCJack123 opened this issue Dec 3, 2024 · 1 comment


I'm trying out some generative AI models on my 7900 GRE, and it's been generally okay, but I had some issues with the bitsandbytes module inside a venv. Compiling worked, but attempting to load the module in Python failed with this error:

python: /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1152: hip::FatBinaryInfo** hip::StatCO::addFatBinary(const void*, bool): Assertion `err == hipSuccess' failed.

Running in GDB gave me a backtrace like this (not the exact one from the crash; I captured this one while debugging):

#0  hip::PlatformState::GetUniqueFileHandle (this=0x5555559690a0, file_path=...)
    at /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_platform.cpp:955
#1  hip::FatBinaryInfo::ExtractFatBinaryUsingCOMGR (this=0x555564f3a660, devices=...)
    at /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_fatbin.cpp:148
#2  0x00007fff7fee3ce2 in hip::StatCO::digestFatBinary (this=0x5555559691a8, data=<optimized out>, 
    programs=@0x555564f383a0: 0x555564f3a660)
    at /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1142
#3  0x00007fff8005079d in hip::StatCO::addFatBinary (this=0x5555559691a8, data=<optimized out>, 
    initialized=<optimized out>)
    at /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_code_object.cpp:1151
#4  hip::PlatformState::addFatBinary (this=0x5555559690a0, data=<optimized out>)
    at /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_platform.cpp:879
#5  hip::__hipRegisterFatBinary (data=<optimized out>)
    at /usr/src/debug/hip-runtime/clr-rocm-6.2.4/hipamd/src/hip_platform.cpp:77
#6  0x00007ffc15a320e7 in __hip_module_ctor ()
   from /run/media/jack/Class 4 Storage/bitsandbytes/bitsandbytes/libbitsandbytes_rocm62.so
#7  0x00007ffff7fcb5b7 in call_init (l=<optimized out>, argc=1, argv=0x7fffffffb968, env=0x55555ab9e800)
    at dl-init.c:74
...

Digging through the crash in GDB was a pain due to the code being optimized out, but after stepping through the init code line by line, I noticed a crucial file path was cut off at the first space. This part of code appears to be what causes it:

clr/rocclr/os/os_posix.cpp

Lines 820 to 824 in f0063ba

tokens >> permissions
>> std::hex >> offset >> std::dec
>> device
>> inode
>> uri_file_path;

operator>> only extracts up to the next whitespace character, which is fine for the other fields of a maps line. However, it's not appropriate for the file path, which may contain spaces (and did in my case). I assume this should be replaced with a std::getline call that reads until the end of the line:

tokens >> permissions
       >> std::hex >> offset >> std::dec
       >> device
       >> inode;
// Skip the padding before the path, then read the rest of the line, spaces included.
std::getline(tokens >> std::ws, uri_file_path);
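
To illustrate the difference, here is a minimal standalone sketch, not the actual clr code. The helper name `maps_path` and the `use_getline` flag are hypothetical, and it assumes a line in the usual /proc/<pid>/maps layout (address range, permissions, offset, device, inode, path). One detail worth noting: std::getline keeps any whitespace left after the inode field, so the sketch skips it with std::ws first.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not part of clr): returns the path field of one
// /proc/<pid>/maps-style line. With use_getline == false it mimics the
// operator>> extraction quoted above; with true it reads to end of line.
std::string maps_path(const std::string& line, bool use_getline) {
  std::istringstream tokens(line);
  std::string address_range, permissions, device, path;
  std::uint64_t offset = 0, inode = 0;

  tokens >> address_range >> permissions
         >> std::hex >> offset >> std::dec
         >> device >> inode;

  if (use_getline) {
    // Skip the padding after the inode, then take everything up to the
    // end of the line, embedded spaces included.
    std::getline(tokens >> std::ws, path);
  } else {
    // operator>> stops at the first whitespace, truncating the path.
    tokens >> path;
  }
  return path;
}
```

For a line whose path is `/run/media/jack/Class 4 Storage/...`, the operator>> variant returns only `/run/media/jack/Class`, while the std::getline variant returns the whole path. Lines for anonymous mappings have no path field, and both variants leave the result empty for those.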

I'd make a PR, but I'm not entirely confident in this resolution, and there may be other places where the same issue is present.

@MCJack123 MCJack123 changed the title Spaces in the path to a library with static modules causes an assertion failure Spaces in the path to a library with static modules causes an assertion failure while constructing module Dec 3, 2024

tcgu-amd commented Dec 3, 2024

Hi @MCJack123, thanks for reaching out! What you are describing is a known issue and we are working on fixing it. However, it might take a while for the fix to be included in an official release. In the meantime, please try to avoid using spaces in the path to avoid the error. Thanks!
