[Bug]: trilinos test failing #131

Closed
shahzebsiddiqui opened this issue Oct 3, 2022 · 2 comments
Labels
bug: Something isn't working
E4S-Testsuite: Issues related with E4S Testsuite (https://github.com/E4S-Project/testsuite)

Comments

@shahzebsiddiqui
Contributor

CDASH Build

https://my.cdash.org/test/63278708

Link to buildspec file

https://github.com/buildtesters/buildtest-nersc/blob/devel/buildspecs/e4s/E4S-Testsuite/perlmutter/22.05/trilinos.yml

Please describe the issue?

See issue E4S-Project/testsuite#39.

Relevant log output

+ cd -
/global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Running /global/cfs/cdirs/m3503/buildtest/runs/perlmutter_check/2022-09-28/perlmutter.slurm.regular/trilinos/trilinos_e4s_testsuite_22.05/75260858/stage/testsuite/validation_tests/trilinos
Skipping load: Environment already setup
+ cd ./build
+ export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
+ export OMP_NUM_THREADS=4
+ OMP_NUM_THREADS=4
+ srun -n 8 ./Zoltan
MPICH ERROR [Rank 0] [job id 3289011.0] [Wed Sep 28 19:56:45 2022] [nid003233] - Abort(-1) (rank 0 in comm 0): MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked
 (Other MPI error)

aborting job:
MPIDI_CRAY_init: GPU_SUPPORT_ENABLED is requested, but GTL library is not linked

srun: error: nid003233: tasks 0-7: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=3289011.0
Run failed
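
The abort message says the GTL library is not linked into the binary. One way to confirm that is to inspect the executable's shared-library dependencies. The sketch below is only an illustration: the helper name `has_gtl` is hypothetical, and it assumes the Cray MPICH GTL libraries follow the `libmpi_gtl_*` naming (for example `libmpi_gtl_cuda`).

```shell
# Reads `ldd` output on stdin and succeeds if a Cray MPICH GTL library is listed.
# Assumption: GTL libraries are named libmpi_gtl_* (e.g. libmpi_gtl_cuda).
has_gtl() {
    grep -q 'libmpi_gtl'
}

# Hypothetical usage from the test's build directory:
#   ldd ./Zoltan | has_gtl && echo "GTL linked" || echo "GTL not linked"
```

If the check reports that GTL is not linked while GPU support is requested at runtime, that would match the MPIDI_CRAY_init error in the log above.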
@shahzebsiddiqui shahzebsiddiqui added the bug Something isn't working label Oct 3, 2022
@wspear
Collaborator

wspear commented Oct 3, 2022

Setting MPICH_GPU_SUPPORT_ENABLED to 0 at runtime works around this error, but the test still segfaults after it prints a confirmation of success. The backtrace from the resulting core file looks like:

#0  __freeBlasMemPool (numa_mask=<optimized out>, tag=<optimized out>)
    at ./src/crayblas_util.c:353

I'm not sure turning off MPICH_GPU_SUPPORT_ENABLED is an acceptable fix.
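
For reference, the runtime workaround described above could be applied along these lines. This is a minimal sketch, not a recommended fix: the helper name `run_without_gpu_mpi` is hypothetical, and the `srun -n 8 ./Zoltan` invocation is taken from the log in the issue description.

```shell
# Run a command with MPICH_GPU_SUPPORT_ENABLED=0 for that command only,
# leaving the calling shell's environment unchanged.
# The helper name is hypothetical; `env` sets the variable for the child process.
run_without_gpu_mpi() {
    env MPICH_GPU_SUPPORT_ENABLED=0 "$@"
}

# Usage on the system from the log (requires Slurm and the built test):
#   run_without_gpu_mpi srun -n 8 ./Zoltan
```

Scoping the variable to a single command keeps the override out of the rest of the test script, which makes it easier to drop once the underlying link problem is resolved.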

@shahzebsiddiqui shahzebsiddiqui added the E4S-Testsuite Issues related with E4S Testsuite (https://github.com/E4S-Project/testsuite) label Nov 29, 2022
@shahzebsiddiqui
Contributor Author

This is a duplicate of issue #145, so I am closing this one.
