Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation fault in the quadruplets workflow on CPU #564

Closed
fwyzard opened this issue Oct 24, 2020 · 31 comments
Closed

Segmentation fault in the quadruplets workflow on CPU #564

fwyzard opened this issue Oct 24, 2020 · 31 comments
Labels
bug fixed Pixels Pixels-related developments

Comments

@fwyzard
Copy link

fwyzard commented Oct 24, 2020

In recent Patatrack releases (both CMSSW_11_2_0_pre7 with the current CMSSW_11_2_X_Patatrack branch, and CMSSW_11_2_0_pre8 with the current master branch) I see frequent crashes in the 11634.501 workflow, that is, Patatrack pixel quadruplets running on the CPU.

This may actually have been there for a while, and have been revealed by the update to the workflow (before we were actually testing the legacy pixel tracks with the new fits).

I've observed this using the relvals and global tags from CMSSW_11_2_0_pre3 and CMSSW_11_2_0_pre7, so it's likely not dependent on the input data.

Begin processing the 1st record. Run 1, Event 6, LumiSection 1 on stream 6 at 24-Oct-2020 16:29:37.139 CEST
Begin processing the 2nd record. Run 1, Event 4, LumiSection 1 on stream 3 at 24-Oct-2020 16:29:37.152 CEST
Begin processing the 3rd record. Run 1, Event 8, LumiSection 1 on stream 0 at 24-Oct-2020 16:29:37.153 CEST
Begin processing the 4th record. Run 1, Event 7, LumiSection 1 on stream 1 at 24-Oct-2020 16:29:37.155 CEST
Begin processing the 5th record. Run 1, Event 5, LumiSection 1 on stream 5 at 24-Oct-2020 16:29:37.156 CEST
Begin processing the 6th record. Run 1, Event 9, LumiSection 1 on stream 4 at 24-Oct-2020 16:29:37.157 CEST
Begin processing the 7th record. Run 1, Event 3, LumiSection 1 on stream 2 at 24-Oct-2020 16:29:37.159 CEST
Begin processing the 8th record. Run 1, Event 2, LumiSection 1 on stream 7 at 24-Oct-2020 16:29:37.160 CEST


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

Sat Oct 24 16:29:38 CEST 2020
Thread 9 (Thread 0x7f8a475ff700 (LWP 98433)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f8ab6d6c53d in __cxxabiv1::__dynamic_cast (src_ptr=src_ptr@entry=0x7f8a1524d9f0, src_type=src_type@entry=0x7f8aaaf3d8c0 <typeinfo for TrackingRecHit>, dst_type=0x7f8a8cdd7368 <typeinfo for SiStripRecHit2D>, dst_type@entry=0x7f8a3d8441f0 <typeinfo for SiStripRecHit2D>, src2dst=src2dst@entry=0) at ../../../../libstdc++-v3/libsupc++/dyncast.cc:73
#5  0x00007f8a3d8293d9 in TrackerHitAssociator::associateHitId (this=this@entry=0x7f8a475f7990, thit=..., simtkid=std::vector of length 0, capacity 0, simhitCFPos=simhitCFPos@entry=0x7f8a475f7870) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackerHitAssociation/src/TrackerHitAssociator.cc:347
#6  0x00007f8a3d829c69 in TrackerHitAssociator::associateHit (this=this@entry=0x7f8a475f7990, thit=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackerHitAssociation/src/TrackerHitAssociator.cc:232
#7  0x00007f8a3cfb5c1e in SiPixelPhase1RecHitsV::analyze (this=0x7f8a522bf200, iEvent=..., iSetup=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/Validation/SiPixelPhase1RecHitsV/src/SiPixelPhase1RecHitsV.cc:38
#8  0x00007f8ab904c774 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#9  0x00007f8ab9026d1e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#10 0x00007f8ab8f8cf25 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#11 0x00007f8ab8f8d0dd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#12 0x00007f8ab8f8d3e6 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#13 0x00007f8ab8f8eaea in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#14 0x00007f8ab7787bfd in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f8a71793e00, context_guard=..., t=t@entry=0x7f8a4af7da40, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:393
#15 0x00007f8ab7787ef5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f8a71793e00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#16 0x00007f8ab77819ff in tbb::internal::arena::process (this=0x7f8ab4187480, s=...) at ../../src/tbb/arena.cpp:196
#17 0x00007f8ab77803d3 in tbb::internal::market::process (this=0x7f8ab4183580, j=...) at ../../src/tbb/market.cpp:667
#18 0x00007f8ab777c7dc in tbb::internal::rml::private_worker::run (this=0x7f8aac906e00) at ../../src/tbb/private_server.cpp:266
#19 0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#20 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f8a487fc700 (LWP 98432)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f8ab64cd1c9 in syscall () from /lib64/libc.so.6
#5  0x00007f8ab777c9b5 in tbb::internal::futex_wait (comparand=2, futex=0x7f8aac906f2c) at ../../include/tbb/machine/linux_common.h:81
#6  tbb::internal::binary_semaphore::P (this=0x7f8aac906f2c) at ../../src/tbb/semaphore.h:205
#7  rml::internal::thread_monitor::commit_wait (c=..., this=0x7f8aac906f20) at ../../src/tbb/../rml/server/thread_monitor.h:255
#8  tbb::internal::rml::private_worker::run (this=0x7f8aac906f00) at ../../src/tbb/private_server.cpp:273
#9  0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#10 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f8a491fd700 (LWP 98431)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f8a3d8f2736 in std::vector<HepMC::GenVertex const*, std::allocator<HepMC::GenVertex const*> >::push_back (__x=@0x7f8a491f5688: 0x7f8a1a115880, this=0x7f8a42e14f08) at /data/cmssw/slc7_amd64_gcc820/external/gcc/8.2.0-bcolbf/include/c++/8.4.0/new:169
#5  HistoryBase::traceGenHistory (this=this@entry=0x7f8a42e14f08, genVertex=0x7f8a1a115880) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackHistory/src/HistoryBase.cc:49
#6  0x00007f8a3d8f2b7a in HistoryBase::traceSimHistory (this=<optimized out>, trackingVertex=..., depth=<optimized out>) at /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/src/DataFormats/Common/interface/refcore_implementation.h:69
#7  0x00007f8a3d8f37cf in HistoryBase::traceSimHistory (this=0x7f8a42e14f08, trackingParticle=..., depth=-3) at /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/src/SimDataFormats/TrackingAnalysis/interface/TrackingParticle.h:90
#8  0x00007f8a3d8f33b9 in HistoryBase::traceSimHistory (this=<optimized out>, trackingVertex=..., depth=-3) at /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/src/DataFormats/Common/interface/RefCore.h:46
#9  0x00007f8a3d8f37cf in HistoryBase::traceSimHistory (this=this@entry=0x7f8a42e14f08, trackingParticle=..., depth=-2) at /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/src/SimDataFormats/TrackingAnalysis/interface/TrackingParticle.h:90
#10 0x00007f8a3c7d46cb in HistoryBase::evaluate (tpr=..., this=0x7f8a42e14f08) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackHistory/interface/HistoryBase.h:98
#11 TrackingParticleBHadronRefSelector::produce (this=0x7f8a42e14c00, iEvent=..., iSetup=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackHistory/plugins/TrackingParticleBHadronRefSelector.cc:52
#12 0x00007f8ab904c774 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#13 0x00007f8ab9026d1e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#14 0x00007f8ab8f8cf25 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#15 0x00007f8ab8f8d0dd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#16 0x00007f8ab8f8d3e6 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#17 0x00007f8ab8f8eaea in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#18 0x00007f8ab7787bfd in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f8ab3ff3e00, context_guard=..., t=t@entry=0x7f8a4af7c740, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:393
#19 0x00007f8ab7787ef5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f8ab3ff3e00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#20 0x00007f8ab77819ff in tbb::internal::arena::process (this=0x7f8ab4187480, s=...) at ../../src/tbb/arena.cpp:196
#21 0x00007f8ab77803d3 in tbb::internal::market::process (this=0x7f8ab4183580, j=...) at ../../src/tbb/market.cpp:667
#22 0x00007f8ab777c7dc in tbb::internal::rml::private_worker::run (this=0x7f8aac907000) at ../../src/tbb/private_server.cpp:266
#23 0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#24 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f8a49bfe700 (LWP 98430)):
#0  0x00007f8ab64c820d in poll () from /lib64/libc.so.6
#1  0x00007f8aa9bf943f in full_read.constprop () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x00007f8aa9bf9b7c in edm::service::InitRootHandlers::stacktraceFromThread() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  0x00007f8aa9bfaa59 in sig_dostack_then_abort () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#4  <signal handler called>
#5  GPUCACell::init (outerHitId=4020, innerHitId=<optimized out>, doubletId=0, layerPairId=0, hh=..., cellTracks=..., cellNeighbors=..., this=0x7f8a6b60e840) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h:107
#6  gpuPixelDoublets::doubletsFromHisto (maxNumOfDoublets=524288, doPtCut=true, doZ0Cut=true, doClusterCut=true, ideal_cond=true, maxr=0x7f8a3db30cc0 <gpuPixelDoublets::maxr>, maxz=0x7f8a3db30d20 <gpuPixelDoublets::maxz>, minz=0x7f8a3db30d80 <gpuPixelDoublets::minz>, phicuts=0x7f8a3db30de0 <gpuPixelDoublets::phicuts>, isOuterHitOfCell=0x7f8a6820a7c0, hh=..., cellTracks=0x7f8a466550c0, cellNeighbors=0x7f8a466550b0, nCells=0x7f8a13f3adf0, cells=0x7f8a6b60e840, nPairs=<optimized out>, layerPairs=0x7f8a3db30e20 <gpuPixelDoublets::layerPairs> "") at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/gpuPixelDoubletsAlgos.h:226
#7  gpuPixelDoublets::getDoubletsFromHisto (maxNumOfDoublets=524288, doPtCut=true, doZ0Cut=true, doClusterCut=true, ideal_cond=true, nActualPairs=<optimized out>, isOuterHitOfCell=0x7f8a6820a7c0, hhp=0x7f8a323f9120, cellTracks=0x7f8a466550c0, cellNeighbors=0x7f8a466550b0, nCells=0x7f8a13f3adf0, cells=0x7f8a6b60e840) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/gpuPixelDoublets.h:109
#8  CAHitNtupletGeneratorKernels<cms::cudacompat::CPUTraits>::buildDoublets (this=this@entry=0x7f8a49bf6900, hh=..., stream=stream@entry=0x0) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc:56
#9  0x00007f8a3dad7752 in CAHitNtupletGeneratorOnGPU::makeTuples (this=this@entry=0x7f8aa8c62820, hits_d=..., bfield=0.0114256972) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorOnGPU.cc:209
#10 0x00007f8a3dac4c8e in CAHitNtupletCUDA::produce (this=0x7f8aa8c62800, streamID=..., iEvent=..., es=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletCUDA.cc:81
#11 0x00007f8ab90325ef in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#12 0x00007f8ab902685e in edm::WorkerT<edm::global::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#13 0x00007f8ab8f8cf25 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#14 0x00007f8ab8f8d0dd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#15 0x00007f8ab8f8d3e6 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#16 0x00007f8ab8f8eaea in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#17 0x00007f8ab7787bfd in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f8ab3ffbe00, context_guard=..., t=t@entry=0x7f8a4af81940, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:393
#18 0x00007f8ab7787ef5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f8ab3ffbe00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#19 0x00007f8ab77819ff in tbb::internal::arena::process (this=0x7f8ab4187480, s=...) at ../../src/tbb/arena.cpp:196
#20 0x00007f8ab77803d3 in tbb::internal::market::process (this=0x7f8ab4183580, j=...) at ../../src/tbb/market.cpp:667
#21 0x00007f8ab777c7dc in tbb::internal::rml::private_worker::run (this=0x7f8aac906e80) at ../../src/tbb/private_server.cpp:266
#22 0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#23 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#24 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f8a4a7ff700 (LWP 98429)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f8ab64cd1c9 in syscall () from /lib64/libc.so.6
#5  0x00007f8ab777c9b5 in tbb::internal::futex_wait (comparand=2, futex=0x7f8aac906fac) at ../../include/tbb/machine/linux_common.h:81
#6  tbb::internal::binary_semaphore::P (this=0x7f8aac906fac) at ../../src/tbb/semaphore.h:205
#7  rml::internal::thread_monitor::commit_wait (c=..., this=0x7f8aac906fa0) at ../../src/tbb/../rml/server/thread_monitor.h:255
#8  tbb::internal::rml::private_worker::run (this=0x7f8aac906f80) at ../../src/tbb/private_server.cpp:273
#9  0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#10 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f8a4b9fe700 (LWP 98428)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f8a3dad4100 in cms::cuda::VecArray<unsigned int, 128>::reset (this=0x7f8a69d2ef14) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/HeterogeneousCore/CUDAUtilities/interface/VecArray.h:90
#5  gpuPixelDoublets::initDoublets (cellTracksContainer=0x7f8a6af4d780, cellTracks=0x7f8a3e162740, cellNeighborsContainer=<optimized out>, cellNeighbors=0x7f8a3e162730, nHits=19161, isOuterHitOfCell=<optimized out>) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/gpuPixelDoublets.h:75
#6  CAHitNtupletGeneratorKernels<cms::cudacompat::CPUTraits>::buildDoublets (this=this@entry=0x7f8a4b9f6900, hh=..., stream=stream@entry=0x0) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc:35
#7  0x00007f8a3dad7752 in CAHitNtupletGeneratorOnGPU::makeTuples (this=this@entry=0x7f8aa8c62820, hits_d=..., bfield=0.0114256972) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorOnGPU.cc:209
#8  0x00007f8a3dac4c8e in CAHitNtupletCUDA::produce (this=0x7f8aa8c62800, streamID=..., iEvent=..., es=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletCUDA.cc:81
#9  0x00007f8ab90325ef in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#10 0x00007f8ab902685e in edm::WorkerT<edm::global::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#11 0x00007f8ab8f8cf25 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#12 0x00007f8ab8f8d0dd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#13 0x00007f8ab8f8d3e6 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#14 0x00007f8ab8f8eaea in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#15 0x00007f8ab7787bfd in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f8ab3fe3e00, context_guard=..., t=t@entry=0x7f8a4af92b40, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:393
#16 0x00007f8ab7787ef5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f8ab3fe3e00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#17 0x00007f8ab77819ff in tbb::internal::arena::process (this=0x7f8ab4187480, s=...) at ../../src/tbb/arena.cpp:196
#18 0x00007f8ab77803d3 in tbb::internal::market::process (this=0x7f8ab4183580, j=...) at ../../src/tbb/market.cpp:667
#19 0x00007f8ab777c7dc in tbb::internal::rml::private_worker::run (this=0x7f8aac907100) at ../../src/tbb/private_server.cpp:266
#20 0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#21 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f8a4c3ff700 (LWP 98427)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007f8ab771c19e in ?? () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libz.so.1
#5  0x00007f8ab771d369 in inflate () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libz.so.1
#6  0x00007f8ab7f59d09 in R__unzip () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libCore.so
#7  0x00007f8ab8a4d886 in TBasket::ReadBasketBuffers(long long, int, TFile*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libTree.so
#8  0x00007f8ab8a57792 in TBranch::GetBasketImpl(int, TBuffer*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libTree.so
#9  0x00007f8ab8a57df9 in TBranch::GetBasketAndFirst(TBasket*&, long long&, TBuffer*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libTree.so
#10 0x00007f8ab8a584c4 in TBranch::GetEntry(long long, int) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libTree.so
#11 0x00007f8ab8a689f9 in TBranchElement::GetEntry(long long, int) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libTree.so
#12 0x00007f8ab8a688c6 in TBranchElement::GetEntry(long long, int) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/external/slc7_amd64_gcc820/lib/libTree.so
#13 0x00007f8a7eb158e6 in edm::RootTree::getEntry(TBranch*, long long) const () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#14 0x00007f8a7eaeee22 in edm::RootDelayedReader::getProduct_(edm::BranchID const&, edm::EDProductGetter const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginIOPoolInput.so
#15 0x00007f8ab8ef04cf in edm::DelayedReader::getProduct(edm::BranchID const&, edm::EDProductGetter const*, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#16 0x00007f8ab8faa09d in edm::InputProductResolver::prefetchAsync_(edm::WaitingTask*, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}::operator()() const () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#17 0x00007f8ab8faa1b4 in edm::SerialTaskQueue::QueuedTask<edm::SerialTaskQueueChain::push<edm::InputProductResolver::prefetchAsync_(edm::WaitingTask*, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}&>(edm::InputProductResolver::prefetchAsync_(edm::WaitingTask*, edm::Principal const&, bool, edm::ServiceToken const&, edm::SharedResourcesAcquirer*, edm::ModuleCallingContext const*) const::{lambda()#1}&)::{lambda()#1}>::execute() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#18 0x00007f8ab7787bfd in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f8ab3febe00, context_guard=..., t=0x7f8a4af7d240, t@entry=0x7f8ab3fed440, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:393
#19 0x00007f8ab7787ef5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f8ab3febe00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#20 0x00007f8ab77819ff in tbb::internal::arena::process (this=0x7f8ab4187480, s=...) at ../../src/tbb/arena.cpp:196
#21 0x00007f8ab77803d3 in tbb::internal::market::process (this=0x7f8ab4183580, j=...) at ../../src/tbb/market.cpp:667
#22 0x00007f8ab777c7dc in tbb::internal::rml::private_worker::run (this=0x7f8aac907080) at ../../src/tbb/private_server.cpp:266
#23 0x00007f8ab777c9e9 in tbb::internal::rml::private_worker::thread_routine (arg=<optimized out>) at ../../src/tbb/private_server.cpp:219
#24 0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f8a8d9bd700 (LWP 98416)):
#0  0x00007f8ab67b1179 in waitpid () from /lib64/libpthread.so.0
#1  0x00007f8aa9bf8fd7 in edm::service::cmssw_stacktrace_fork() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#2  0x00007f8aa9bf9a9a in edm::service::InitRootHandlers::stacktraceHelperThread() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  0x00007f8ab6d96d2f in std::execute_native_thread_routine (__p=0x7f8aab4e51d0) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#4  0x00007f8ab67a9dd5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f8ab64d2ead in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f8ab4ad4540 (LWP 98393)):
#0  0x00007f8ab6499e2d in nanosleep () from /lib64/libc.so.6
#1  0x00007f8ab6499cc4 in sleep () from /lib64/libc.so.6
#2  0x00007f8aa9bf8e20 in sig_pause_for_stacktrace () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  OmniClusterRef::isPixel (this=0x7f8a2379f220) at /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/src/DataFormats/Common/interface/RefCoreWithIndex.h:78
#5  OmniClusterRef::cluster_pixel (this=0x7f8a2379f220) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/DataFormats/TrackerRecHit2D/interface/OmniClusterRef.h:41
#6  TrackerSingleRecHit::cluster_pixel (this=0x7f8a2379f1e0) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/DataFormats/TrackerRecHit2D/interface/TrackerSingleRecHit.h:47
#7  SiPixelRecHit::cluster (this=0x7f8a2379f1e0) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/DataFormats/TrackerRecHit2D/interface/SiPixelRecHit.h:47
#8  TrackerHitAssociator::associatePixelRecHit (this=0x7ffdc31ce790, pixelrechit=0x7f8a2379f1e0, simtrackid=std::vector of length 0, capacity 0, simhitCFPos=0x7ffdc31ce670) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackerHitAssociation/src/TrackerHitAssociator.cc:578
#9  0x00007f8a3d829b6e in TrackerHitAssociator::associateHitId (this=this@entry=0x7ffdc31ce790, thit=..., simtkid=std::vector of length 0, capacity 0, simhitCFPos=simhitCFPos@entry=0x7ffdc31ce670) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackerHitAssociation/src/TrackerHitAssociator.cc:368
#10 0x00007f8a3d829c69 in TrackerHitAssociator::associateHit (this=this@entry=0x7ffdc31ce790, thit=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/SimTracker/TrackerHitAssociation/src/TrackerHitAssociator.cc:232
#11 0x00007f8a3cfb5c1e in SiPixelPhase1RecHitsV::analyze (this=0x7f8a522c0400, iEvent=..., iSetup=...) at /data/user/fwyzard/patatrack/validation/run_562.fq8sph5yrS/testing/src/Validation/SiPixelPhase1RecHitsV/src/SiPixelPhase1RecHitsV.cc:38
#12 0x00007f8ab904c774 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#13 0x00007f8ab9026d1e in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#14 0x00007f8ab8f8cf25 in decltype ({parm#1}()) edm::convertException::wrap<edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}>(edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*)::{lambda()#1}) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#15 0x00007f8ab8f8d0dd in bool edm::Worker::runModule<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#16 0x00007f8ab8f8d3e6 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr const*, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#17 0x00007f8ab8f8eaea in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#18 0x00007f8ab7787bfd in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::process_bypass_loop (this=this@entry=0x7f8ab4178e00, context_guard=..., t=t@entry=0x7f8a4afbbe40, isolation=isolation@entry=0) at ../../src/tbb/custom_scheduler.h:393
#19 0x00007f8ab7787ef5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f8ab4178e00, parent=..., child=<optimized out>) at ../../include/tbb/task.h:1003
#20 0x00007f8ab8f0e355 in edm::EventProcessor::processLumis(std::shared_ptr<void> const&) () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#21 0x00007f8ab8f1649e in edm::EventProcessor::runToCompletion() () from /data/cmssw/slc7_amd64_gcc820/cms/cmssw/CMSSW_11_2_0_pre8/lib/slc7_amd64_gcc820/libFWCoreFramework.so
#22 0x000000000040f59d in tbb::interface7::internal::delegated_function<main::{lambda()#1}::operator()() const::{lambda()#1} const, void>::operator()() const ()
#23 0x00007f8ab7782bc1 in tbb::interface7::internal::task_arena_base::internal_execute (this=0x7ffdc31cf700, d=warning: RTTI symbol not found for class 'tbb::interface7::internal::delegated_function<main::{lambda()#1}::operator()() const::{lambda()#1} const, void>'
#24 0x0000000000410561 in main::{lambda()#1}::operator()() const ()
#25 0x000000000040eff5 in main ()

Current Modules:

Module: CAHitNtupletCUDA:pixelTrackSoA (crashed)
Module: TrackingParticleBHadronRefSelector:trackingParticlesBHadron
Module: none
Module: none
Module: SiPixelPhase1RecHitsV:pixelOnlyRecHitsAnalyzerV
Module: SiPixelPhase1RecHitsV:pixelOnlyRecHitsAnalyzerV
Module: none
Module: CAHitNtupletCUDA:pixelTrackSoA

A fatal system signal has occurred: segmentation violation

To reproduce it with pre7:

cmsrel CMSSW_11_2_0_pre7_Patatrack
cd CMSSW_11_2_0_pre7_Patatrack/src
cmsenv
git cms-init -x cms-patatrack
git checkout -t cms-patatrack/CMSSW_11_2_X_Patatrack
git diff $CMSSW_VERSION --name-only --no-renames | cut -d/ -f-2 | uniq | xargs -r git cms-addpkg
git cms-checkdeps -a
cmsCudaRebuild.sh

cmsDriver.py step3 \
    -s RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM \
    --era Run3 \
    --geometry DB:Extended \
    --conditions 112X_mcRun3_2021_realistic_v8 \
    --filein /store/relval/CMSSW_11_2_0_pre7/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_112X_mcRun3_2021_realistic_v8-v1/20000/8BD20F29-96F9-7C44-9078-E641186F0B19.root \
    --fileout file:step3.root \
    --eventcontent RECOSIM,DQM \
    --datatier GEN-SIM-RECO,DQMIO \
    --customise RecoPixelVertexing/Configuration/customizePixelTracksSoAonCPU.customizePixelTracksSoAonCPU \
    -n 100 \
    --nThreads 8 \
    --nStreams 8 \
    --nConcurrentLumis 1 \
    --python_filename step3.py \
    --no_exec

cmsRun step3.py

To reproduce it with pre8 (only the Patatrack branch and global tag are different):

cmsrel CMSSW_11_2_0_pre8
cd CMSSW_11_2_0_pre8/src
cmsenv
git cms-init -x cms-patatrack
git checkout -t cms-patatrack/master
git diff $CMSSW_VERSION --name-only --no-renames | cut -d/ -f-2 | uniq | xargs -r git cms-addpkg
git cms-checkdeps -a
cmsCudaRebuild.sh

cmsDriver.py step3 \
    -s RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM \
    --era Run3 \
    --geometry DB:Extended \
    --conditions 112X_mcRun3_2021_realistic_v10 \
    --filein /store/relval/CMSSW_11_2_0_pre7/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_112X_mcRun3_2021_realistic_v8-v1/20000/8BD20F29-96F9-7C44-9078-E641186F0B19.root \
    --fileout file:step3.root \
    --eventcontent RECOSIM,DQM \
    --datatier GEN-SIM-RECO,DQMIO \
    --customise RecoPixelVertexing/Configuration/customizePixelTracksSoAonCPU.customizePixelTracksSoAonCPU \
    -n 100 \
    --nThreads 8 \
    --nStreams 8 \
    --nConcurrentLumis 1 \
    --python_filename step3.py \
    --no_exec

cmsRun step3.py

The input file is available under /gpu_data/store/... on the online machines, under /data/store/... on vocms006, and otherwiase over xrootd.

@fwyzard
Copy link
Author

fwyzard commented Oct 24, 2020

@VinInn could you have a look ?

@VinInn
Copy link

VinInn commented Oct 25, 2020

with pre7 cannot reproduce on patatrack02
btw: the job is fully sequential: threads are sitting on futex wait all the time
got this time to time

[2020-10-25 12:08:17.011886 +0100][Error  ][PostMaster        ] [cmsxrootd-site1.fnal.gov:1093 #0] Forcing error on disconnect: [ERROR] Operation interrupted.

and takes forever

@VinInn
Copy link

VinInn commented Oct 25, 2020

with pre8 doesn't crash either...

@VinInn
Copy link

VinInn commented Oct 25, 2020

do not understand: the file is served from Italy
root://xrootd-cms.infn.it//store/relval/CMSSW_11_2_0_pre7/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_112X_mcRun3_2021_realistic_v8-v1/20000/8BD20F29-96F9-7C44-9078-E641186F0B19.root
why a fully reproducible error from Fermi?
[2020-10-25 15:34:58.032207 +0100][Error ][PostMaster ] [cmsxrootd-site2.fnal.gov:1093 #0] Forcing error on disconnect: [ERROR] Operation interrupted.

@fwyzard
Copy link
Author

fwyzard commented Nov 28, 2020

This seems frequently reproducible with pre10:

cmsrel CMSSW_11_2_0_pre10_Patatrack
cd CMSSW_11_2_0_pre10_Patatrack
cmsenv

xrdcp root://cmsxrootd.fnal.gov//store/relval/CMSSW_11_2_0_pre9/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_112X_mcRun3_2021_realistic_v11-v1/00000/f6888c6e-1fe6-413e-b2b5-e54ff3a4fe2b.root .

cmsDriver.py step3 \
  --geometry DB:Extended \
  --era Run3 \
  --conditions auto:phase1_2021_realistic \
  -s RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM \
  -n 100 \
  --filein file:f6888c6e-1fe6-413e-b2b5-e54ff3a4fe2b.root \
  --eventcontent RECOSIM,DQM \
  --datatier GEN-SIM-RECO,DQMIO \
  --customise RecoPixelVertexing/Configuration/customizePixelTracksSoAonCPU.customizePixelTracksSoAonCPU,RecoPixelVertexing/Configuration/customizePixelTracksSoAonCPU.customizePixelTracksForTriplets \
  --nThreads 8 \
  --no_exec
  
cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py

results in

%MSG-i ThreadStreamSetup:  (NoModuleName) 28-Nov-2020 17:40:25 CET pre-events
setting # threads 8
setting # streams 8
%MSG
28-Nov-2020 17:40:34 CET  Initiating request to open file file:f6888c6e-1fe6-413e-b2b5-e54ff3a4fe2b.root
28-Nov-2020 17:40:40 CET  Successfully opened file file:f6888c6e-1fe6-413e-b2b5-e54ff3a4fe2b.root

...

Begin processing the 1st record. Run 1, Event 7405, LumiSection 75 on stream 4 at 28-Nov-2020 17:41:01.137 CET
Begin processing the 2nd record. Run 1, Event 7402, LumiSection 75 on stream 3 at 28-Nov-2020 17:41:01.160 CET
Begin processing the 3rd record. Run 1, Event 7404, LumiSection 75 on stream 2 at 28-Nov-2020 17:41:01.162 CET
Begin processing the 4th record. Run 1, Event 7407, LumiSection 75 on stream 5 at 28-Nov-2020 17:41:01.164 CET
Begin processing the 5th record. Run 1, Event 7403, LumiSection 75 on stream 7 at 28-Nov-2020 17:41:01.167 CET
Begin processing the 6th record. Run 1, Event 7408, LumiSection 75 on stream 1 at 28-Nov-2020 17:41:01.169 CET
Begin processing the 7th record. Run 1, Event 7401, LumiSection 75 on stream 0 at 28-Nov-2020 17:41:01.172 CET
Begin processing the 8th record. Run 1, Event 7406, LumiSection 75 on stream 6 at 28-Nov-2020 17:41:01.174 CET


A fatal system signal has occurred: segmentation violation
The following is the call stack containing the origin of the signal.

The full report is attached: crash.log.

@fwyzard fwyzard added bug Pixels Pixels-related developments labels Nov 28, 2020
@VinInn
Copy link

VinInn commented Nov 29, 2020

this is suspicious (TLS)

#4  0x00007f9b5301d947 in _dl_update_slotinfo () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f9b5300c098 in update_get_addr () from /lib64/ld-linux-x86-64.so.2
#6  0x00007f9b530229f8 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2

need to verify if shows up consistently...

@VinInn
Copy link

VinInn commented Nov 29, 2020

confirmed (only if running from local file, from xrootd does not)
in my case no TLS

@VinInn
Copy link

VinInn commented Nov 29, 2020

minimal recompiled with -g
run 5 times under gdb no crash
managed to to crash it only once w/o gdb...

@VinInn
Copy link

VinInn commented Nov 29, 2020

eventually crashed

hread 3 "cmsRun" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff879ff700 (LWP 219860)]
GPUCACell::init (outerHitId=4014, innerHitId=32, doubletId=0, layerPairId=0, hh=..., cellTracks=..., cellNeighbors=..., this=0x7fff35b02780)
    at /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h:107
107	  __device__ __forceinline__ CellNeighbors& outerNeighbors() { return *theOuterNeighbors; }
(gdb) where
#0  GPUCACell::init (outerHitId=4014, innerHitId=32, doubletId=0, layerPairId=0, hh=..., cellTracks=..., cellNeighbors=..., this=0x7fff35b02780)
    at /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h:107
#1  gpuPixelDoublets::doubletsFromHisto (maxNumOfDoublets=524288, doPtCut=true, doZ0Cut=true, doClusterCut=true, ideal_cond=true, maxr=0x7fff7921fce0 <gpuPixelDoublets::maxr>,
    maxz=0x7fff7921fd40 <gpuPixelDoublets::maxz>, minz=0x7fff7921fda0 <gpuPixelDoublets::minz>, phicuts=0x7fff7921fe00 <gpuPixelDoublets::phicuts>, isOuterHitOfCell=0x7fff9e603880, hh=...,
    cellTracks=0x7fff85f49880, cellNeighbors=0x7fff85f49870, nCells=0x7fff85ff9950, cells=0x7fff35b02780, nPairs=<optimized out>, layerPairs=0x7fff7921fe40 <gpuPixelDoublets::layerPairs> "")
    at /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/gpuPixelDoubletsAlgos.h:226
#2  gpuPixelDoublets::getDoubletsFromHisto (maxNumOfDoublets=524288, doPtCut=true, doZ0Cut=true, doClusterCut=true, ideal_cond=true, nActualPairs=<optimized out>, isOuterHitOfCell=0x7fff9e603880,
    hhp=0x7fff5d09d360, cellTracks=0x7fff85f49880, cellNeighbors=0x7fff85f49870, nCells=0x7fff85ff9950, cells=0x7fff35b02780)
    at /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/gpuPixelDoublets.h:109
#3  CAHitNtupletGeneratorKernels<cms::cudacompat::CPUTraits>::buildDoublets (this=this@entry=0x7fff879f7900, hh=..., stream=stream@entry=0x0)
    at /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc:56
#4  0x00007fff791c6682 in CAHitNtupletGeneratorOnGPU::makeTuples (this=this@entry=0x7fffb9b9b020, hits_d=..., bfield=0.0114256972)
    at /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorOnGPU.cc:209
#5  0x00007fff791b3c7e in CAHitNtupletCUDA::produce (this=0x7fffb9b9b000, streamID=..., iEvent=..., es=...)
``

will add assert 

@VinInn
Copy link

VinInn commented Nov 29, 2020

cmsRun: /home/innocent/ORI/CMSSW_11_2_0_pre10_Patatrack/src/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h:61: void GPUCACell::init(GPUCACell::CellNeighborsVector&, GPUCACell::CellTracksVector&, const Hits&, int, int, GPUCACell::hindex_type, GPUCACell::hindex_type): Assertion `theOuterNeighbors' failed.

the arrays are not initialized on CPU?

@VinInn
Copy link

VinInn commented Nov 29, 2020

cudaCompat issue
Assertion `0==blockIdx.x*blockDim.x + threadIdx.x' failed.

@VinInn
Copy link

VinInn commented Nov 29, 2020

fixed
in principle one needs to "resetGrid" before each kernel.
in practice only where needed (if one knows what is doing and in which order the kernel are called)
the latter is more difficult to control....

this is the patch with all assert in place

[innocent@patatrack02 src]$ git diff
diff --git a/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc b/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc
index 1646cb503ff..7a55e73ecd1 100644
--- a/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc
+++ b/RecoPixelVertexing/PixelTriplets/plugins/CAHitNtupletGeneratorKernels.cc
@@ -12,6 +12,8 @@ void CAHitNtupletGeneratorKernelsCPU::fillHitDetIndices(HitsView const *hv, TkSo

 template <>
 void CAHitNtupletGeneratorKernelsCPU::buildDoublets(HitsOnCPU const &hh, cudaStream_t stream) {
+
+  resetGrid();
   auto nhits = hh.nHits();

 #ifdef NTUPLE_DEBUG
@@ -31,7 +33,7 @@ void CAHitNtupletGeneratorKernelsCPU::buildDoublets(HitsOnCPU const &hh, cudaStr
   device_theCellTracksContainer_ =
       (GPUCACell::CellTracks *)(cellStorage_.get() +
                                 CAConstants::maxNumOfActiveDoublets() * sizeof(GPUCACell::CellNeighbors));
-
+  assert(0==blockIdx.x*blockDim.x + threadIdx.x);
   gpuPixelDoublets::initDoublets(device_isOuterHitOfCell_.get(),
                                  nhits,
                                  device_theCellNeighbors_.get(),
@@ -39,6 +41,7 @@ void CAHitNtupletGeneratorKernelsCPU::buildDoublets(HitsOnCPU const &hh, cudaStr
                                  device_theCellTracks_.get(),
                                  device_theCellTracksContainer_);

+  assert(!(*device_theCellNeighbors_).empty());
   // device_theCells_ = Traits:: template make_unique<GPUCACell[]>(cs, m_params.maxNumberOfDoublets_, stream);
   device_theCells_.reset((GPUCACell *)malloc(sizeof(GPUCACell) * m_params.maxNumberOfDoublets_));
   if (0 == nhits)
diff --git a/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h b/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h
index e913b77fe09..f3b33bf47d6 100644
--- a/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h
+++ b/RecoPixelVertexing/PixelTriplets/plugins/GPUCACell.h
@@ -58,7 +58,9 @@ public:

     // link to default empty
     theOuterNeighbors = &cellNeighbors[0];
+    assert(theOuterNeighbors);
     theTracks = &cellTracks[0];
+    assert(theTracks);
     assert(outerNeighbors().empty());
     assert(tracks().empty());
   }
@@ -76,6 +78,7 @@ public:
                   (ptrAsInt)(&cellNeighbors[i]));  // if fails we cannot give "i" back...
 #else
         theOuterNeighbors = &cellNeighbors[i];
+       assert(theOuterNeighbors);
 #endif
       } else
         return -1;
@@ -94,6 +97,7 @@ public:
         atomicCAS((ptrAsInt*)(&theTracks), zero, (ptrAsInt)(&cellTracks[i]));  // if fails we cannot give "i" back...
 #else
         theTracks = &cellTracks[i];
+       assert(theTracks);
 #endif
       } else
         return -1;
@@ -102,10 +106,10 @@ public:
     return tracks().push_back(t);
   }

-  __device__ __forceinline__ CellTracks& tracks() { return *theTracks; }
-  __device__ __forceinline__ CellTracks const& tracks() const { return *theTracks; }
-  __device__ __forceinline__ CellNeighbors& outerNeighbors() { return *theOuterNeighbors; }
-  __device__ __forceinline__ CellNeighbors const& outerNeighbors() const { return *theOuterNeighbors; }
+  __device__ __forceinline__ CellTracks& tracks() {       assert(theTracks); return *theTracks; }
+  __device__ __forceinline__ CellTracks const& tracks() const {       assert(theTracks); return *theTracks; }
+  __device__ __forceinline__ CellNeighbors& outerNeighbors() {        assert(theOuterNeighbors); return *theOuterNeighbors; }
+  __device__ __forceinline__ CellNeighbors const& outerNeighbors() const { assert(theOuterNeighbors); return *theOuterNeighbors; }
   __device__ __forceinline__ float get_inner_x(Hits const& hh) const { return hh.xGlobal(theInnerHitId); }
   __device__ __forceinline__ float get_outer_x(Hits const& hh) const { return hh.xGlobal(theOuterHitId); }
   __device__ __forceinline__ float get_inner_y(Hits const& hh) const { return hh.yGlobal(theInnerHitId); }

@fwyzard
Copy link
Author

fwyzard commented Nov 29, 2020

OK, I admit I'm confused: do we ever use a gridDim different from {1, 1, 1} in compatibility mode on the CPU ?

@VinInn
Copy link

VinInn commented Nov 29, 2020

I think yes (clustering?). When the blockId has a specific meaning (such as detectors)
And in principle is reset afterward.
Of course is possible to change the code and make sure the "loop" on detectors in inside the kernel.
I think I realize that at some point and was waiting integration was over to change those kernel to be fully "sequential" compatible as well (so not to depend to the grid size)

Why is crashing now: no clue. Maybe streams and threads are not one-to-one.

@VinInn
Copy link

VinInn commented Nov 29, 2020

here you see

RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc:    cms::cudacompat::resetGrid();

you are right. on CPU we DO NOT run the patatrack-clusterizer....
I am confused as well. Also because it is called in RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitSoAFromLegacy.cc

@fwyzard
Copy link
Author

fwyzard commented Nov 29, 2020

Why is crashing now: no clue. Maybe streams and threads are not one-to-one.

They are not guaranteed to be, no.

We also had rare cases where TBB after a while "retires" a worker thread, and spawns a new one; if the thread_local variable are initialised only at the beginning of the job, they would end up being uninitialised in this case.

@VinInn
Copy link

VinInn commented Nov 29, 2020

ok, most probably we need to resetGrid() in each produce. adding to vertex as well.
We should be fully covered for the time being.

@fwyzard
Copy link
Author

fwyzard commented Nov 29, 2020

With these changes

diff --git a/HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h b/HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h
index f9b4b2f8a4c1..e8aa4cdc1b06 100644
--- a/HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h
+++ b/HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h
@@ -21,11 +21,11 @@ namespace cms {
       uint32_t x, y, z;
     };
 #endif
+
     const dim3 threadIdx = {0, 0, 0};
+    const dim3 blockIdx = {0, 0, 0};
     const dim3 blockDim = {1, 1, 1};
-
-    extern thread_local dim3 blockIdx;
-    extern thread_local dim3 gridDim;
+    const dim3 gridDim = {1, 1, 1};
 
     template <typename T1, typename T2>
     T1 atomicCAS(T1* address, T1 compare, T2 val) {
@@ -78,10 +78,7 @@ namespace cms {
       return *x;
     }
 
-    inline void resetGrid() {
-      blockIdx = {0, 0, 0};
-      gridDim = {1, 1, 1};
-    }
+    inline void resetGrid() {}
 
   }  // namespace cudacompat
 }  // namespace cms
diff --git a/HeterogeneousCore/CUDAUtilities/src/cudaCompat.cc b/HeterogeneousCore/CUDAUtilities/src/cudaCompat.cc
index 7b8efda8e381..0b94c8f1d4b8 100644
--- a/HeterogeneousCore/CUDAUtilities/src/cudaCompat.cc
+++ b/HeterogeneousCore/CUDAUtilities/src/cudaCompat.cc
@@ -1,12 +1,5 @@
 #include "HeterogeneousCore/CUDAUtilities/interface/cudaCompat.h"
 
-namespace cms {
-  namespace cudacompat {
-    thread_local dim3 blockIdx;
-    thread_local dim3 gridDim;
-  }  // namespace cudacompat
-}  // namespace cms
-
 namespace {
   struct InitGrid {
     InitGrid() { cms::cudacompat::resetGrid(); }
diff --git a/RecoPixelVertexing/PixelVertexFinding/plugins/gpuVertexFinderImpl.h b/RecoPixelVertexing/PixelVertexFinding/plugins/gpuVertexFinderImpl.h
index 0da24cef219e..987b0af91dbd 100644
--- a/RecoPixelVertexing/PixelVertexFinding/plugins/gpuVertexFinderImpl.h
+++ b/RecoPixelVertexing/PixelVertexFinding/plugins/gpuVertexFinderImpl.h
@@ -157,8 +157,8 @@ namespace gpuVertexFinder {
     // std::cout << "found " << (*ws_d).nvIntermediate << " vertices " << std::endl;
     fitVertices(soa, ws_d.get(), 50.);
     // one block per vertex!
-    blockIdx.x = 0;
-    gridDim.x = 1;
+    assert(blockIdx.x == 0);
+    assert(gridDim.x == 1);
     splitVertices(soa, ws_d.get(), 9.f);
     resetGrid();
     fitVertices(soa, ws_d.get(), 5000.);

all src and plugins build fine.
There is one test that actually uses a grid of different size, RecoLocalTracker/SiPixelClusterizer/test/gpuClustering_t.h, but it also says

gridDim.x = MaxNumModules;  //not needed in the kernel for this specific case;

So... would it be OK to simple make the gridDim constant, equal to {1, 1, 1} (and adjust the test accordingly) ?

Othwerise, since

ok, most probably we need to resetGrid() in each produce. adding to vertex as well.

would it make sense to wrap all CPU "kernel" calls in something like cms::cudacompat::launch(...) that would take care of setting the grids and blocks properly ?

@VinInn
Copy link

VinInn commented Nov 29, 2020

Once I did wrap every kernel in a modified version of "your" launch and modified all drivers (cu and cc) and many of them were identical at that point.
We did not agreed that was the time and the way to do it

@VinInn
Copy link

VinInn commented Nov 29, 2020

For what concern the Clusterizer, as I said, the kernel must be modified to be independent from the grid size.
Once done there is no need to play with the blockId even there and indeed WE (well the pixel code) can just run with the cudaCompat you propose.

@fwyzard
Copy link
Author

fwyzard commented Nov 29, 2020

Once I did wrap every kernel in a modified version of "your" launch and modified all drivers (cu and cc) and many of them were identical at that point.

Yes; I that that was #428 ?

We did not agreed that was the time and the way to do it

About the time: I'd still rather do it after the integration upstream, now that it's finally getting close to happening.
About the way: I'd prefer to keep cms::cuda::launch() CUDA-only, and implement a separate cms::cudautils::launch() that calls cms::cuda::launch() for CUDA, or the CPU variant for a CPU-only case.
The reason being to make it easier to transition to something else later (be it Alpaka, Kokkos, SYCL, etc.).

For what concern the Clusterizer, as I said, the kernel must be modified to be independent from the grid size.

I take your word for it - I just don't find where the grid size is passed to the cpu kernel(s) ?

@VinInn
Copy link

VinInn commented Nov 29, 2020

this code

   if (blockIdx.x >= moduleStart[0])
      return;

    auto firstPixel = moduleStart[1 + blockIdx.x];

depends on the grid size

instead it should loop as we loop for the threadIdx

so on cpu (in the test) we are forced to loop in the driver....

    gridDim.x = MaxNumModules;  // no needed in the kernel for in this specific case
    assert(blockIdx.x == 0);
    for (; blockIdx.x < gridDim.x; ++blockIdx.x)
      clusterChargeCut(
          h_id.get(), h_adc.get(), h_moduleStart.get(), h_clusInModule.get(), h_moduleId.get(), h_clus.get(), n);
    resetGrid();

@fwyzard
Copy link
Author

fwyzard commented Nov 29, 2020

OK, I see.

So

  • the kernel(s) in RecoLocalTracker/SiPixelClusterizer/plugins/gpuClusterChargeCut.h require setting the grid size depending on the number of modules, and cannot run with the simple {1,1,1} grid size
  • they are exercised by tests in RecoLocalTracker/SiPixelClusterizer/test/gpuClustering_t.h, which then rely on being able to set the grid size and loop over the blocks when running on the CPU
  • in CMSSW, they are only used on the GPU, from RecoLocalTracker/SiPixelClusterizer/plugins/SiPixelRawToClusterGPUKernel.cu
  • in CMSSW, they are not used on the CPU

?

Do you think we can change the kernels to run with the {1, 1, 1) grid, or that would bring any downsides (performance, etc.) ?

If the kernels can be changes to work with the {1, 1, 1} grid, my preferences would be

  • update the kernels in the Patatrack branch
  • change the cudacompat code to use a constant, fixed grid size of {1, 1, 1}
  • integrate in CMSSW
  • after that, re-introduce the variable grid size, and extend cudacompat with a user friendly launch function that takes care of setting the grid size, etc.

Otherwise (if the kernels cannot be made to use a {1, 1, 1} grid size without loss of performance or other downsides), my preference would be:

@VinInn, which of these two, or what other option, would you prefer ?

@makortel since this touches also on the more "core" part of the compatibility layer, do you have any opinions ?

Second question:

  • should we change the CPU workflow to make use of them?
  • if so, is it something we should do before or after the integration?

@tsusa @mmusich do you have any opinions about this last point ?

@VinInn
Copy link

VinInn commented Nov 30, 2020

I think the best option is to adopt the modification you (@fwyzard) propose.
I was planning to change the clusterizer kernels anyway to make it independent on grid setting (as all other kernel in pixel code).

One possibility is to integrate your changes and "inhibit" the test ( /* code */) with a clear comment that need to be fixed.
Then Fix it after integration.

The clusterizer was never integrated in CPU workflows as it requires the Raw2dDgi to be ported first (and that was supposed to happen after integration to benefit of a coherent integration of Legacy and SoA code)

@VinInn
Copy link

VinInn commented Nov 30, 2020

If decision is to change clusterizer kernel now I can work on that this week (provided all other changes to LocalReco had been already integrated: I do not want to run in merging issues)

@fwyzard
Copy link
Author

fwyzard commented Nov 30, 2020

Looking at the open PRs (https://github.com/cms-patatrack/cmssw/pulls/) I don't think there should be conflicts, but I'm not 100% sure.

So I, if I understood correctly your comments, I think we could

  1. make a PR that sets the cudacompat grid be a {1, 1, 1} constant size and comments out the test
  2. make a separate PR that changes the kernels to run with any grid size and re-enabled the test

The first should happen before the integration (I can do it tomorrow or Wednesday).
The secondcan happen before or after, depending on the timeline.

@VinInn
Copy link

VinInn commented Nov 30, 2020

@fwyzard
+1

@fwyzard
Copy link
Author

fwyzard commented Nov 30, 2020

Thanks to long DAQ meeting, the first part should be done by #586 .

@VinInn
Copy link

VinInn commented Nov 30, 2020

clusterizer fixed in #588 .
the 4 line modification is completely swamped by the code-format re-indentation.

@makortel
Copy link

since this touches also on the more "core" part of the compatibility layer, do you have any opinions ?

#586 looks good to me

@fwyzard
Copy link
Author

fwyzard commented Dec 3, 2020

Fixed by #586 and #588 .

@fwyzard fwyzard closed this as completed Dec 3, 2020
@fwyzard fwyzard added the fixed label Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug fixed Pixels Pixels-related developments
Projects
None yet
Development

No branches or pull requests

3 participants