Pixel Local GPU reco crashes on missing detector input (no Pixel FED) #34496
Comments
A new Issue was created by @czangela . @Dr15Jones, @perrotta, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
assign reconstruction, heterogeneous |
FYI @cms-sw/trk-dpg-l2 @VinInn |
Interesting as it WAS protected for |
with this
diff --git a/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitFromCUDA.cc b/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitFromCUDA.cc
index f2f1497b4ba..5861a0be734 100644
--- a/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitFromCUDA.cc
+++ b/RecoLocalTracker/SiPixelRecHits/plugins/SiPixelRecHitFromCUDA.cc
@@ -74,7 +74,7 @@ void SiPixelRecHitFromCUDA::acquire(edm::Event const& iEvent,
nHits_ = inputData.nHits();
- LogDebug("SiPixelRecHitFromCUDA") << "converting " << nHits_ << " Hits";
+ LogDebug("SiPixelRecHitFromCUDA") << "converting " << nHits_ << " Hits" << std::endl;
if (0 == nHits_)
return;
@@ -83,18 +83,21 @@ void SiPixelRecHitFromCUDA::acquire(edm::Event const& iEvent,
}
void SiPixelRecHitFromCUDA::produce(edm::Event& iEvent, edm::EventSetup const& es) {
+
// allocate a buffer for the indices of the clusters
auto hmsp = std::make_unique<uint32_t[]>(gpuClustering::maxNumModules + 1);
- std::copy(hitsModuleStart_.get(), hitsModuleStart_.get() + gpuClustering::maxNumModules + 1, hmsp.get());
- // wrap the buffer in a HostProduct, and move it to the Event, without reallocating the buffer or affecting hitsModuleStart
- iEvent.emplace(hostPutToken_, std::move(hmsp));
SiPixelRecHitCollection output;
+ output.reserve(gpuClustering::maxNumModules, nHits_);
if (0 == nHits_) {
iEvent.emplace(rechitsPutToken_, std::move(output));
+ iEvent.emplace(hostPutToken_, std::move(hmsp));
return;
}
- output.reserve(gpuClustering::maxNumModules, nHits_);
+
+ std::copy(hitsModuleStart_.get(), hitsModuleStart_.get() + gpuClustering::maxNumModules + 1, hmsp.get());
+ // wrap the buffer in a HostProduct, and move it to the Event, without reallocating the buffer or affecting hitsModuleStart
+ iEvent.emplace(hostPutToken_, std::move(hmsp));
auto xl = store32_.get();
auto yl = xl + nHits_;
it runs for me. |
One can leave the reserve after the return (very, very minor). Still, this was tested (long, long ago) with nHits_ == 0, and the history has been erased by the file renaming, so there is no way to tell when and why it was changed. |
OK, found a version in CMSSW_11_1_0_pre8_Patatrack, when the file was named SiPixelRecHitFromSOA.cc, and the code is the same. |
maybe easier to just protect the copy
|
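For readers following the thread, here is a minimal standalone sketch of what "protecting the copy" could look like. It is not the merged fix. It assumes the source buffer may be unset when no hits were acquired, and `kMaxNumModules` and `makeModuleStart` are hypothetical names standing in for `gpuClustering::maxNumModules` and the body of `produce()`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for gpuClustering::maxNumModules (the real value lives in CMSSW).
constexpr std::size_t kMaxNumModules = 2000;

// Build the module-start buffer that would be put into the event.
// Assumption: `src` can be null when nHits == 0, because the early
// return in acquire() never fills the source buffer in that case.
std::unique_ptr<std::uint32_t[]> makeModuleStart(const std::uint32_t* src, std::uint32_t nHits) {
  // make_unique<T[]>(n) value-initializes, so every entry starts at zero.
  auto out = std::make_unique<std::uint32_t[]>(kMaxNumModules + 1);
  // Guard the copy: only read from the source when there is something to copy.
  if (nHits != 0 && src != nullptr) {
    std::copy(src, src + kMaxNumModules + 1, out.get());
  }
  return out;
}
```

With this guard the empty-event case still produces a valid, zero-filled buffer, so downstream consumers of the product would not need a special case.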
+heterogeneous
PRs to master and 11_3_X have been merged |
This issue is fully signed and ready to be closed. |
1. Description
Similar to #34197.
The idea here was to remove all FED channels above 1199 [*], and run the reconstruction on this skimmed raw data.
This was run on the release CMSSW_12_0_0_pre3, and machine cmg-gpu1080.cern.ch.
[*] where 1200 is the minimum FED number for the silicon pixel detector.
2. Crash
The reconstruction crashes with a segmentation fault:
Full log: crash.log
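As an illustration of the skimming idea from the Description, the selection can be sketched in plain C++ as below. This is only a sketch under assumptions: `FedPayloads` and `skimFeds` are hypothetical names, the map is a stand-in for the raw data collection, and none of this is the CMSSW API or the module used in the actual test.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical container: FED id -> raw payload bytes (not the CMSSW raw data type).
using FedPayloads = std::map<int, std::vector<std::uint8_t>>;

// Highest FED id to keep; per the issue text, 1200 is the first pixel FED.
constexpr int kMaxKeptFedId = 1199;

// Return a copy of the input with every FED above maxKeptId dropped.
FedPayloads skimFeds(const FedPayloads& input, int maxKeptId = kMaxKeptFedId) {
  FedPayloads kept;
  for (const auto& entry : input) {
    if (entry.first <= maxKeptId) {
      kept.emplace(entry.first, entry.second);
    }
  }
  return kept;
}
```

Downstream code then sees a collection with no pixel FED at all, which is the missing-input condition that triggers the crash reported here.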
3. Reproduce - Short version
From https://aczirkos.web.cern.ch/aczirkos/pixel_crash_test/ run on the provided dataset:
4. Reproduce - Long version
0. SSH to GPU equipped machine
Don't forget to be nice and
Where P, Q, etc. are the numbers of the visible GPUs, which you can view with nvidia-smi.
Init release area CMSSW_12_0_0_pre3.
1. generate configs and run
Use pixelTrackingOnly workflow:
136.885502_RunHLTPhy2018D+RunHLTPhy2018D+HLTDR2_2018+RECODR2_2018reHLT_Patatrack_PixelOnlyGPU+HARVEST2018_pixelTrackingOnly
2. modify step3_RAW2DIGI_RECO_DQM.py
Add a new module named rawDataCollector to the beginning of the Schedule, so that the modules after it will see and use this collection.
Path, sequence, task definition:
Add to schedule
3. sed and replace rawDataCollector InputTags
run again