Realm: Optimize gather copies in Moya/FleCSI #1733
Backtraces for ranks 0 through 15 (posted one per comment)
Just FYI, when you have a bunch of files like this, you can attach them directly by clicking and dragging them into the comment text box. (Technically the file extension needs to be
Backtrace for rank 16
I'm just cutting and pasting from my terminal.
Backtrace for rank 17
Not that it matters to me, since I'm not the one debugging it, but if I were doing this, I'd open a text editor like Emacs or Vim, paste the backtraces into it, and save the files out as
Backtrace for rank 18
Well, the job has gone down now. I hope there is enough information in the backtraces from the first 19 of 32 ranks.
Are these backtraces from a run that is hanging or running slowly? |
My impression is that these backtraces come from the "hang", but @jpietarilagraham would be best placed to confirm. I think what Elliott had suggested perhaps just making it a single
We have been working with the LANL team on Moya, an unstructured multi-material Lagrangian hydrodynamics application. The application performs a number of gather operations of various types (to be determined), most of which gather data into framebuffer memory. We have observed that the performance of these operations is sub-optimal.
I am filing this issue to track progress on this work. The application is run on 1, 2, 4, and 8 nodes, with 4 GPUs per node. In addition, a hang is observed when running the application on 8 nodes.
We expect that it should be possible to use the faster cuda-dma gather path in Realm to improve these timings.
@jpietarilagraham, please fill in more details. @lightsighter, for visibility. Thanks!