Use unified memory for conditions #157

makortel · 2018-09-03T10:50:38Z

This PR experiments with using unified memory for conditions. It adds a helper class CUDAESManaged to simplify calling cudaMemAdvise(..., cudaMemAdviseSetReadMostly, 0) and cudaMemPrefetchAsync(...) on all allocated buffers.

For the CPE and the cabling map it also experiments with passing a struct of GPU pointers to the kernel instead of a GPU pointer to a struct of GPU pointers.

It also adds CUDAManagedAllocator and CUDAManagedVector<T>, because I first thought I would use them, but in the end I did not.

I have not done a detailed performance evaluation wrt. the current state.

cmsbot · 2018-09-03T10:50:54Z

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

CalibTracker/SiPixelESProducers
HeterogeneousCore/CUDACore
HeterogeneousCore/CUDAUtilities
RecoLocalTracker/SiPixelClusterizer
RecoLocalTracker/SiPixelRecHits

The following packages do not have a category, yet:

HeterogeneousCore/CUDACore
HeterogeneousCore/CUDAUtilities
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

cmsbot · 2018-09-11T15:36:25Z

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

cmsbot · 2018-09-13T14:50:00Z

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

cmsbot · 2018-09-13T15:40:50Z

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

cmsbot · 2018-09-13T15:42:05Z

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

cmsbot · 2018-09-13T15:43:40Z

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

fwyzard · 2018-09-13T15:45:04Z

@makortel @VinInn can you double check if the merge looks good?

@makortel, feel free to squash away my "merge" commits, or even just rebase on top of the current HEAD.

fwyzard · 2018-09-13T15:52:15Z

Reference

Throughput over 1000 events:

mean: 629 ± 42 ev/s
best: 644 ± 31 ev/s

Top 10 contribution to GPU usage:

  Time(%)      Time     Calls       Avg       Min       Max  Name
   51.96%  855.55ms      1200  712.96us  200.22us  4.5703ms  gpuClustering::findClus(...)
   21.42%  352.71ms      1200  293.92us  100.48us  720.02us  gpuPixelDoublets::getDoubletsFromHisto(...)
    8.07%  132.86ms      1200  110.72us  27.552us  385.02us  kernel_connect(...)
    7.78%  128.09ms      1200  106.74us  19.744us  248.41us  kernel_find_ntuplets(...)
    3.65%   60.16ms      1200  50.131us  39.583us  64.383us  gpuPixelRecHits::getHits(...)
    2.63%   43.26ms      6028  7.1760us  1.2160us  519.58us  [CUDA memcpy HtoD]
    1.29%   21.26ms      1200  17.712us  5.4720us  56.575us  kernel_checkOverflows(...)
    0.68%   11.14ms      1200  9.2860us  5.9200us  34.079us  pixelgpudetails::RawToDigi_kernel(...)
    0.56%    9.18ms      6014  1.5250us  1.1200us  18.079us  [CUDA memset]
    0.41%    6.71ms      1200  5.5910us  3.4880us  8.8960us  void cub::DeviceScanKernel<cub::DispatchScan<unsigned int*, unsigned int*, cub::Sum, cub::NullType, int>::PtxAgentScanPolic...

Pull request #157

Throughput over 1000 events:

mean: 613 ± 29 ev/s
best: 614 ± 28 ev/s

Top 10 contribution to GPU usage:

  Time(%)      Time     Calls       Avg       Min       Max  Name
   52.51%  851.08ms      1200  709.24us  217.28us  4.5735ms  gpuClustering::findClus(...)
   21.78%  352.96ms      1200  294.13us  97.758us  709.34us  gpuPixelDoublets::getDoubletsFromHisto(...)
    8.18%  132.61ms      1200  110.50us  26.432us  372.38us  kernel_connect(...)
    7.62%  123.46ms      1200  102.89us  23.455us  236.16us  kernel_find_ntuplets(...)
    2.93%   47.57ms      1200  39.644us  29.951us  59.328us  gpuPixelRecHits::getHits(...)
    2.54%   41.21ms      6014  6.8510us  1.2470us  45.248us  [CUDA memcpy HtoD]
    1.30%   21.11ms      1200  17.590us  6.5920us  55.231us  kernel_checkOverflows(...)
    0.58%    9.37ms      1200  7.8040us  4.1600us  26.175us  pixelgpudetails::RawToDigi_kernel(...)
    0.56%    9.05ms      6014  1.5040us  1.1200us  11.872us  [CUDA memset]
    0.40%    6.56ms      1200  5.4680us  3.4560us  8.7680us  void cub::DeviceScanKernel<cub::DispatchScan<unsigned int*, unsigned int*, cub::Sum, cub::NullType, int>::PtxAgentScanPolic...

fwyzard · 2018-09-13T15:53:05Z

While the contribution of the individual kernels does not seem to change, there seems to be an overall degradation of performance.

I think it should be addressed - or at least understood - before merging.

cmsbot · 2018-09-14T14:35:12Z

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

makortel · 2018-12-14T22:54:47Z

Rebased on top of CMSSW_10_4_0_pre4_Patatrack. Note that so far I have only compiled it and have not tested running it.

…-sw#216) Port and optimise the full workflow from pixel raw data to pixel tracks and vertices to GPUs. Clean the pixel n-tuplets with the "fishbone" algorithm (only on GPUs).

Other changes:
- recover the Riemann fit updates lost during the merge with CMSSW 10.4.x;
- speed up clustering and track fitting;
- minor bug fix to avoid trivial regression with the optimized fit.

makortel · 2019-01-08T19:04:37Z

Rebased on top of CMSSW_10_4_0_pre4_Patatrack to fix conflicts. Note that so far I have only compiled it and have not tested running it.

cmsbot added comparison-pending labels Sep 3, 2018

makortel mentioned this pull request Sep 3, 2018

Investigate the use of CUDA managed memory #85

Open

fwyzard removed alca-pending labels Sep 4, 2018

cmsbot added alca-pending labels Sep 11, 2018

makortel force-pushed the eventsetupUnifiedMemory branch from 81cff35 to 438fbe3 Compare September 14, 2018 14:34

fwyzard modified the milestones: CMSSW_10_4_0_pre2_Patatrack, CMSSW_10_4_X_Patatrack Nov 15, 2018

fwyzard changed the base branch from CMSSW_10_2_X_Patatrack to CMSSW_10_4_X_Patatrack November 15, 2018 08:30

fwyzard modified the milestone: CMSSW_10_4_X_Patatrack Nov 15, 2018

fwyzard mentioned this pull request Nov 27, 2018

Prepare the Patatrack branch for merging into CMSSW #200

Closed

makortel mentioned this pull request Dec 4, 2018

Fix modulesToUnpack in raw2digi #208

Merged

makortel force-pushed the eventsetupUnifiedMemory branch from 1f15f2f to cb3e122 Compare December 14, 2018 22:53

VinInn and others added 5 commits January 8, 2019 18:33

Move cuda_bad_alloc to its own header

3f30343

Add CUDAManagedAllocator

fd0f069

Add CUDAManagedVector

eb8daf4

Add CUDAESManaged

afd89f2

fwyzard modified the milestones: CMSSW_10_4_X_Patatrack, CMSSW_10_5_X_Patatrack Jan 8, 2019

makortel added 7 commits January 8, 2019 19:47

Migrate PixelCPEFast to unified memory

aba7012

Migrate SiPixelFedCablingMapGPUWrapper to unified memory

bc928df

Migrate SiPixelGainCalibrationForHLTGPU to unified memory

9a1fd14

Remove CUDAESProduct as obsolete

6b2937a

Reduce calls to cudaMemPrefetchAsync

5a160b2

Boolean flag per device

6d62b3d

Back to GPU struct of pointers

cbeb333

makortel force-pushed the eventsetupUnifiedMemory branch from cb3e122 to cbeb333 Compare January 8, 2019 19:04

fwyzard force-pushed the CMSSW_10_4_X_Patatrack branch from 59fe318 to db3e6f8 Compare January 9, 2019 14:14

fwyzard modified the milestones: CMSSW_10_5_X_Patatrack, CMSSW_10_6_X_Patatrack Mar 26, 2019

makortel mentioned this pull request May 29, 2020

[cudauvm] Move conditions to use managed memory cms-patatrack/pixeltrack-standalone#53

Merged
