add nvtx equivalent for rocm #940

Open: wants to merge 5 commits into base: develop

@simonpintarelli (Collaborator) commented Dec 15, 2023

Adds rocTX support:
https://github.com/ROCm/roctracer/tree/amd-master?tab=readme-ov-file#roctx-api
Its API mirrors the NVTX API currently in use.

  • roctracer does not ship a CMake config file, so this adds a FindRocTX.cmake to cmake/modules
  • note: the +nvtx variant isn't covered by the CI
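
For readers unfamiliar with the two APIs, here is a minimal sketch of how one scoped-range helper could target either rocTX or NVTX. This is not the code in this PR. The macros `EXAMPLE_USE_ROCTX` and `EXAMPLE_USE_NVTX`, the `scoped_range` class, and the recording fallback are hypothetical names used only for illustration. `roctxRangePushA`/`roctxRangePop` and `nvtxRangePushA`/`nvtxRangePop` are the push/pop entry points of the respective libraries. The header path `roctracer/roctx.h` is an assumption and may differ between ROCm releases.

```cpp
#include <string>
#include <vector>

#if defined(EXAMPLE_USE_ROCTX)
// Assumed header location; some ROCm installs may expose <roctx.h> instead.
#include <roctracer/roctx.h>
inline void range_push(char const* label) { roctxRangePushA(label); }
inline void range_pop() { roctxRangePop(); }
#elif defined(EXAMPLE_USE_NVTX)
#include <nvToolsExt.h>
inline void range_push(char const* label) { nvtxRangePushA(label); }
inline void range_pop() { nvtxRangePop(); }
#else
// Hypothetical fallback: record events in memory so the sketch can be
// compiled and inspected without a GPU toolchain.
inline std::vector<std::string>& recorded_events()
{
    static std::vector<std::string> events;
    return events;
}
inline void range_push(char const* label)
{
    recorded_events().push_back(std::string("push:") + label);
}
inline void range_pop() { recorded_events().push_back("pop"); }
#endif

// RAII helper: opens a named range on construction and closes it on
// destruction, so every push is matched by exactly one pop.
class scoped_range
{
  public:
    explicit scoped_range(char const* label) { range_push(label); }
    ~scoped_range() { range_pop(); }
    scoped_range(scoped_range const&)            = delete;
    scoped_range& operator=(scoped_range const&) = delete;
};

// Two nested regions, as a timer hierarchy might produce them.
inline void example_nested_regions()
{
    scoped_range outer("outer_region");
    {
        scoped_range inner("inner_region");
    }
}
```

With the fallback backend, calling `example_nested_regions()` records `push:outer_region`, `push:inner_region`, `pop`, `pop`, which is the nesting a tracer would be expected to see.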

@simonpintarelli (Collaborator, Author) commented

Build tested locally with CUDA 12 (+cuda+nvtx) and ROCm 5.7.1 (+rocm+nvtx).

@gsavva (Collaborator) commented Sep 26, 2024

@simonpintarelli would it be possible to bring this up to date and merge it?
Or would you suggest keeping the roctracer feature in its own separate branch?

@simonpintarelli simonpintarelli marked this pull request as ready for review September 26, 2024 17:05
@simonpintarelli simonpintarelli changed the title WIP: add nvtx equivalent for rocm add nvtx equivalent for rocm Sep 26, 2024
@simonpintarelli (Collaborator, Author) commented

Thanks for the reminder @gsavva. Is it correct that it worked for you on lumi?

@gsavva (Collaborator) commented Sep 27, 2024

> Thanks for the reminder @gsavva. Is it correct that it worked for you on lumi?

Yes, I was using it on LUMI-G, and it worked, with the one caveat described in issue #961: I had to comment out a few timers for the rocprofiler post-processing script to function properly.

@toxa81 (Collaborator) left a comment

Looks good to merge.

@toxa81 (Collaborator) commented Oct 4, 2024

ping @gsavva

@gsavva (Collaborator) commented Oct 4, 2024

@toxa81 I'll need to check the status of #961 with rocprof and my extra GPU timers and give my feedback.

- add dependency on `roctracer-dev` when(+rocm+nvtx)
- conflict +nvtx when neither rocm nor cuda is enabled

@gsavva (Collaborator) commented Oct 14, 2024

test_lr_solver fails when run with --roctx-trace:
srun -u -n1 rocprof --roctx-trace test_lr_solver --device=gpu --N=2 --num_bands=10

test_lr_solver : Failed
exception occured:
SpFFT: GPU FFT error

On the other hand, test_lr_solver runs and completes fine when:

  • it is run on its own (no rocprof),
  • it is run using only rocprof (no --roctx-trace, which is meant for tracing and visualizing specific regions in the code; SIRIUS timers are taken as regions)

I would suggest merging this PR and investigating this issue separately (it might be related to the recent update of ROCm).

@simonpintarelli (Collaborator, Author) commented

It is indeed crashing; here is the backtrace:

#0  0x000014c081453d2b in raise () from /lib64/libc.so.6
#1  0x000014c0814553e5 in abort () from /lib64/libc.so.6
#2  0x000014c081aa55c9 in __gnu_cxx::__verbose_terminate_handler ()
    at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x000014c081ab0bfa in __cxxabiv1::__terminate (handler=<optimized out>)
    at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x000014c081ab0c65 in std::terminate () at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x000014c081ab0eb7 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x14c118464f58 <typeinfo for spfft::GPUFFTError>, 
    dest=0x14c11843fee0 <spfft::GPUFFTError::~GPUFFTError()>) at ../../../../cpe-gcc-12.2.0-202304182231.7dfee50f41751/libstdc++-v3/libsupc++/eh_throw.cc:98
#6  0x000014c11844003f in spfft::gpu::fft::check_result(hipfftResult_t) [clone .part.0] ()
   from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#7  0x000014c118442a5b in spfft::TransformReal2DGPU<double>::TransformReal2DGPU(spfft::GPUArrayView3D<double>, spfft::GPUArrayView3D<HIP_vector_type<double, 2u> >, spfft::GPUStreamHandle, std::shared_ptr<spfft::GPUArray<char> >) ()
   from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#8  0x000014c1184463ed in spfft::ExecutionGPU<double>::ExecutionGPU(int, std::shared_ptr<spfft::Parameters>, spfft::HostArray<std::complex<double> >&, spfft::HostArray<std::complex<double> >&, spfft::GPUArray<HIP_vector_type<double, 2u> >&, spfft::GPUArray<HIP_vector_type<double, 2u> >&, std::shared_ptr<spfft::GPUArray<char> > const&) () from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#9  0x000014c11843acd4 in spfft::TransformInternal<double>::TransformInternal(SpfftProcessingUnitType, std::shared_ptr<spfft::GridInternal<double> >, std::shared_ptr<spfft::Parameters>) () from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#10 0x000014c11843653a in spfft::Transform::Transform(std::shared_ptr<spfft::GridInternal<double> > const&, SpfftProcessingUnitType, SpfftTransformType, int, int, int, int, int, SpfftIndexFormatType, int const*) ()
   from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#11 0x000014c11843ca35 in spfft::Grid::create_transform(SpfftProcessingUnitType, SpfftTransformType, int, int, int, int, int, SpfftIndexFormatType, int const*) const () from /scratch/project_465000416/sipintar/spack-install/24.09-ext-rocm/spfft-1.1.0-3feeom4/lib64/libspfft.so.1
#12 0x000014c1203e31c9 in sirius::Simulation_context::update (this=this@entry=0x1ac7180)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/src/context/simulation_context.cpp:873
#13 0x000014c1203e935a in sirius::Simulation_context::initialize (this=<optimized out>)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/src/context/simulation_context.cpp:486
#14 0x000000000047f457 in sirius::create_simulation_context (conf__=..., L__=..., num_atoms__=<optimized out>, coord__=..., add_vloc__=add_vloc__@entry=true, 
    add_dion__=add_dion__@entry=true) at /opt/cray/pe/gcc/12.2.0/snos/include/g++/bits/unique_ptr.h:191
#15 0x000000000043257a in test_lr_solver (args__=...)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/apps/tests/test_lr_solver.cpp:304
#16 0x0000000000440159 in sirius::call_test<int (&)(sirius::cmd_args const&), sirius::cmd_args&> (label__=..., 
    f__=@0x4317c0: {int (const sirius::cmd_args &)} 0x4317c0 <test_lr_solver(sirius::cmd_args const&)>)
    at /opt/cray/pe/gcc/12.2.0/snos/include/g++/bits/char_traits.h:354
#17 0x0000000000428a9b in main (argn=4, argv=0x7ffc24ba23c8)
    at /tmp/sipintar/spack-stage/spack-stage-sirius-git.feat_roctracer_develop-hj3ab676gx2i5l7mlpd4bwvuljcjr4qi/spack-src/apps/tests/test_lr_solver.cpp:344
