Segfault when running on Legion master #11

Open
manopapad opened this issue Dec 9, 2020 · 0 comments

Building pagerank against current Legion master and running it on hollywood.lux results in a segfault at this location:

Thread 12 (Thread 0x7f09487f1ac0 (LWP 59394)):
#0  0x00007f0962443722 in __GI___waitpid (pid=59395, stat_loc=stat_loc@entry=0x7f09487ec988, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f09623ae107 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x0000000002e60108 in gasneti_bt_gdb ()
#3  0x0000000002e63a6f in gasneti_print_backtrace ()
#4  0x00000000014b025f in gasneti_defaultSignalHandler ()
#5  <signal handler called>
#6  0x00000000014cab3e in pull_init_task_impl (task=0x7f010c209f10, regions=..., ctx=0x7f010c050280, runtime=0xa2fd9e0) at pagerank_gpu.cu:231
#7  0x00000000014bbcf0 in Legion::LegionTaskWrapper::legion_task_wrapper<GraphPiece, &(pull_init_task_impl(Legion::Task const*, std::vector<Legion::PhysicalRegion, std::allocator<Legion::PhysicalRegion> > const&, Legion::Internal::TaskContext*, Legion::Runtime*))> (args=0x7f010c02e6d8, arglen=8, userdata=0x0, userlen=0, p=...) at ../legion/runtime/legion/legion.inl:20435
#8  0x0000000002834a4d in Realm::LocalTaskProcessor::execute_task (this=0xa6ac590, func_id=102, task_args=...) at ../legion/runtime/realm/proc_impl.cc:1090
#9  0x0000000002e2a556 in Realm::Task::execute_on_processor (this=0x7f010c02e560, p=...) at ../legion/runtime/realm/tasks.cc:306
#10 0x0000000002e2e22e in Realm::KernelThreadTaskScheduler::execute_task (this=0xa6a79b0, task=0x7f010c02e560) at ../legion/runtime/realm/tasks.cc:1380
#11 0x000000000292f956 in Realm::Cuda::GPUTaskScheduler<Realm::KernelThreadTaskScheduler>::execute_task (this=0xa6a79b0, task=0x7f010c02e560) at ../legion/runtime/realm/cuda/cuda_module.cc:1657
#12 0x0000000002e2d246 in Realm::ThreadedTaskScheduler::scheduler_loop (this=0xa6a79b0) at ../legion/runtime/realm/tasks.cc:1127
#13 0x0000000002e2d700 in Realm::ThreadedTaskScheduler::scheduler_loop_wlock (this=0xa6a79b0) at ../legion/runtime/realm/tasks.cc:1231
#14 0x0000000002e33c9c in Realm::Thread::thread_entry_wrapper<Realm::ThreadedTaskScheduler, &Realm::ThreadedTaskScheduler::scheduler_loop_wlock> (obj=0xa6a79b0) at ../legion/runtime/realm/threads.inl:97
#15 0x00000000026cfbcb in Realm::KernelThread::pthread_entry (data=0x9f7ea10) at ../legion/runtime/realm/threads.cc:774
#16 0x00007f0964a826db in start_thread (arg=0x7f09487f1ac0) at pthread_create.c:463
#17 0x00007f0962480a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It looks like the failing statement, https://github.com/LuxGraph/Lux/blob/master/pagerank/pagerank_gpu.cu#L231, dereferences GPU memory directly in the body of a GPU task variant (i.e. in code that runs on the host). The same result could probably be achieved with a cudaMemcpy.
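
To illustrate the kind of change meant here, below is a minimal sketch of reading and writing a single element of a device allocation from host code through cudaMemcpy instead of dereferencing the device pointer. This is not the actual Lux code: the helper names (check, read_device_element, write_device_element) are hypothetical, and the sketch assumes the pointer refers to ordinary device memory (e.g. from cudaMalloc or a framebuffer-backed region).

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    std::exit(EXIT_FAILURE);
  }
}

// Read element `idx` of a device-resident array from host code.
// Evaluating d_ptr[idx] on the host would dereference a device address
// and can segfault; only pointer arithmetic is done on the host here.
template <typename T>
T read_device_element(const T *d_ptr, std::size_t idx) {
  T h_val;
  check(cudaMemcpy(&h_val, d_ptr + idx, sizeof(T), cudaMemcpyDeviceToHost),
        "cudaMemcpy (device to host)");
  return h_val;
}

// Write `value` into element `idx` of a device-resident array from host code.
template <typename T>
void write_device_element(T *d_ptr, std::size_t idx, const T &value) {
  check(cudaMemcpy(d_ptr + idx, &value, sizeof(T), cudaMemcpyHostToDevice),
        "cudaMemcpy (host to device)");
}
```

With helpers like these, a host-side statement of the form `x = d_ptr[i];` would become `x = read_device_element(d_ptr, i);`. For more than a handful of elements, a single bulk cudaMemcpy into a host buffer is likely the better choice.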

Note that this crash does not happen on the Legion stable branch.

Also note that alloc_bytes needs to be changed to alloc_bytes_local for Lux to compile properly against Legion master (at https://github.com/LuxGraph/Lux/blob/master/pagerank/pagerank_gpu.cu#L272 and https://github.com/LuxGraph/Lux/blob/master/pagerank/pagerank_gpu.cu#L275).
