HTR build failure in CI #1756

Closed
Tracked by #1032
elliottslaughter opened this issue Sep 10, 2024 · 10 comments

@elliottslaughter
Contributor

Reported by @mariodirenzo:

All my CIs based on the CMake build of Legion have been failing with errors like

/home/gitlab-runner/legion-debug-cmake/runtime/realm/gasnet1/gasnetmsg.cc:1665:
 undefined reference to `gasneti_thunk_tm'

The error appeared after https://gitlab.com/StanfordLegion/legion/-/merge_requests/1427 was merged.

Mario, can you provide any additional details on the specific CMake command line, and the system (OS, CUDA, CMake, etc. versions)?

@elliottslaughter
Contributor Author

@mariodirenzo Please retest this on the latest master; we've merged various fixes to the CMake build that I think are likely to resolve this.

@elliottslaughter
Contributor Author

I've been told that CI is passing now, so closing. (If this is not true, feel free to reopen.)

@mariodirenzo
mariodirenzo commented Sep 26, 2024

Unfortunately, I haven't been able to get a build of Legion using CMake after https://gitlab.com/StanfordLegion/legion/-/merge_requests/1427 was merged.
If the runtime is built properly, I get this error

[0 - 7fd0f5de7c40]    0.000000 {5}{gex}: Failed to load gex wrapper at librealm_gex_wrapper.so

when executing the code.
Other configurations fail when compiling either my code or the runtime itself.
I do not have any time to debug this until mid-October. I'll get in touch when I have further info

@seemamirch
Contributor

@mariodirenzo - if you can provide details to reproduce, I can debug further. I don't see any cmake builds in the CI (the link I have been using for it is https://lc.llnl.gov/gitlab/stanford-psaap/ci/-/pipelines).

@elliottslaughter removed this from the 24.09 milestone Sep 26, 2024

@seemamirch
Contributor

One config that fails with cmake

  1. Build failure with cmake + gasnet1 on lassen (different from what's reported above)
  • the same options work on sapling
  • the same options but without cmake work on lassen
  • the scripts and output from the builds (with/without cmake) are on sapling -> /scratch2/seemah/lassen_cmake_issue/
  • build.sh - build script for both, bad.txt is the build output with cmake, good.txt is without cmake
  • cmake version 3.23.1

@muraj
muraj commented Oct 23, 2024

@seemamirch I don't have access to the internal CI link you posted. Can you provide the logs of the issue? If you're seeing the following error, it is likely because you're building with the gasnetex wrapper enabled somehow, which requires either setting an environment variable so the wrapper library can be located, or making the library available in a library search path (e.g. LD_LIBRARY_PATH).

[0 - 7fd0f5de7c40] 0.000000 {5}{gex}: Failed to load gex wrapper at librealm_gex_wrapper.so
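
One quick way to check the search-path half of this is a small CMake script. This is only a sketch: the script name and the `GEX_WRAPPER_PATH` variable are made up for illustration. It looks only at the directories listed in LD_LIBRARY_PATH, so a miss here does not cover rpath entries or system default directories that the dynamic loader also consults.

```cmake
# check_gex_wrapper.cmake (hypothetical file name)
# Run in script mode: cmake -P check_gex_wrapper.cmake

# Search only the directories listed in the LD_LIBRARY_PATH environment
# variable for the file named in the runtime error message.
find_file(GEX_WRAPPER_PATH
  NAMES librealm_gex_wrapper.so
  PATHS ENV LD_LIBRARY_PATH
  NO_DEFAULT_PATH)

if(GEX_WRAPPER_PATH)
  message(STATUS "Found wrapper at ${GEX_WRAPPER_PATH}")
else()
  message(WARNING "librealm_gex_wrapper.so not found in LD_LIBRARY_PATH")
endif()
```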

If you do not want to use the gasnetex wrapper, then you need to disable its use (by default I believe it is not used, so I believe you would be enabling it yourself or through another part of the build system, maybe legate or something?)

If you can provide the full log including the cmake command line used to configure Realm, and the build command and build output, I might be able to reproduce the issue and help resolve it for you. Thanks!

@seemamirch
Contributor

I've moved the files to https://sapling2.stanford.edu/~seemah/lassen_cmake_issue/ so you can view them.
My issue is without gasnetex.
The CI link is not relevant for this issue.
@mariodirenzo may benefit from your comments above (with gasnetex).

@muraj
muraj commented Oct 24, 2024

Okay, looking at "bad.txt", I see the following line:

/usr/WS1/mirchandaney1/legion_rc/runtime/realm/realm_config.h:90:2: error: #error Shared memory not supported on GASNET1
#error Shared memory not supported on GASNET1

This has to do with the shared memory support added several releases ago, and the logic to enable this is here in the cmake build system:

option(REALM_USE_SHM "Enable shared memory usage" ON)

If gasnet1 is enabled, this option should be disabled. So either your configure command is enabling this feature and overriding the default, or something is wrong with the logic here.
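
For illustration, a dependency like this can be expressed with CMake's standard `CMakeDependentOption` module. This is a minimal sketch, not Realm's actual logic: `EXAMPLE_USE_GASNET1` is a hypothetical stand-in for however the build records that gasnet1 was selected.

```cmake
include(CMakeDependentOption)

# EXAMPLE_USE_GASNET1 is a hypothetical name, used here only to stand in
# for "the gasnet1 network backend was selected".
option(EXAMPLE_USE_GASNET1 "Hypothetical: build with the gasnet1 backend" OFF)

# REALM_USE_SHM is offered to the user (default ON) only when gasnet1 is
# not selected; otherwise it is hidden and forced to OFF.
cmake_dependent_option(REALM_USE_SHM "Enable shared memory usage" ON
                       "NOT EXAMPLE_USE_GASNET1" OFF)

message(STATUS "REALM_USE_SHM = ${REALM_USE_SHM}")
```

With this form, a configure command that passes -DREALM_USE_SHM=ON together with the gasnet1 selection still ends up with REALM_USE_SHM set to OFF, whereas a plain option() would let the override through.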

@mariodirenzo

As an update, I've been able to get a successful CI of HTR++ binding the latest master from scratch and setting GEX_BUILD_SHARED and Legion_USE_GASNETEX_WRAPPER to OFF. From my point of view, we can close the issue
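
For anyone wanting to try the same two settings, one option is a CMake initial-cache script. This is a sketch under assumptions: the file name is made up, and treating both variables as BOOL cache entries is a guess based on the OFF values reported above.

```cmake
# htr-legion-cache.cmake (hypothetical file name)
# Pre-load the two settings reported above into the CMake cache.
# The BOOL type is an assumption based on the OFF values.
set(GEX_BUILD_SHARED OFF CACHE BOOL "Set to OFF as reported in this thread")
set(Legion_USE_GASNETEX_WRAPPER OFF CACHE BOOL "Set to OFF as reported in this thread")
```

The script would be passed with `cmake -C htr-legion-cache.cmake` ahead of the usual source and build directory arguments, preferably in a fresh build directory. Passing the same two variables with `-D` on the command line should have the same effect.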

@seemamirch
Contributor

It's possible there was a config issue with my cmake build on lassen - I can't reproduce it. Since @mariodirenzo is also ok with the cmake build of HTR++ - closing this issue
