Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use unified memory for conditions #157

Open
wants to merge 12 commits into
base: CMSSW_10_4_X_Patatrack
Choose a base branch
from

Conversation

makortel
Copy link

@makortel makortel commented Sep 3, 2018

This PR experiments using unified memory for conditions. It adds a helper class CUDAESManaged to simplify calling the cudaMemAdvise(..., cudaMemAdviseSetReadMostly, 0) and cudaMemPrefetchAsync(...) to all allocated buffers.

For the CPE and the cabling map it also experiments passing a struct of GPU pointers to the kernel instead of a GPU pointer to a struct of GPU pointers.

It also adds CUDAManagedAllocator and CUDAManagedVector<T> because I thought first that I'd use them, but in the end didn't.

I have not done a detailed performance evaluation wrt. the current state.

@cmsbot
Copy link

cmsbot commented Sep 3, 2018

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

CalibTracker/SiPixelESProducers
HeterogeneousCore/CUDACore
HeterogeneousCore/CUDAUtilities
RecoLocalTracker/SiPixelClusterizer
RecoLocalTracker/SiPixelRecHits

The following packages do not have a category, yet:

HeterogeneousCore/CUDACore
HeterogeneousCore/CUDAUtilities
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

@cmsbot
Copy link

cmsbot commented Sep 11, 2018

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

4 similar comments
@cmsbot
Copy link

cmsbot commented Sep 13, 2018

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

@cmsbot
Copy link

cmsbot commented Sep 13, 2018

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

@cmsbot
Copy link

cmsbot commented Sep 13, 2018

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

@cmsbot
Copy link

cmsbot commented Sep 13, 2018

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

@fwyzard
Copy link

fwyzard commented Sep 13, 2018

@makortel @VinInn can you double check if the merge looks good ?

@makortel, feel free to squash away my "merge" commits, of even just rebase on top ofthe current HEAD.

@fwyzard
Copy link

fwyzard commented Sep 13, 2018

Reference

Throughput over 1000 events:

  • mean: 629 ± 42 ev/s
  • best: 644 ± 31 ev/s

Top 10 contribution to GPU usage:

  Time(%)      Time     Calls       Avg       Min       Max  Name
   51.96%  855.55ms      1200  712.96us  200.22us  4.5703ms  gpuClustering::findClus(...)
   21.42%  352.71ms      1200  293.92us  100.48us  720.02us  gpuPixelDoublets::getDoubletsFromHisto(...)
    8.07%  132.86ms      1200  110.72us  27.552us  385.02us  kernel_connect(...)
    7.78%  128.09ms      1200  106.74us  19.744us  248.41us  kernel_find_ntuplets(...)
    3.65%   60.16ms      1200  50.131us  39.583us  64.383us  gpuPixelRecHits::getHits(...)
    2.63%   43.26ms      6028  7.1760us  1.2160us  519.58us  [CUDA memcpy HtoD]
    1.29%   21.26ms      1200  17.712us  5.4720us  56.575us  kernel_checkOverflows(...)
    0.68%   11.14ms      1200  9.2860us  5.9200us  34.079us  pixelgpudetails::RawToDigi_kernel(...)
    0.56%    9.18ms      6014  1.5250us  1.1200us  18.079us  [CUDA memset]
    0.41%    6.71ms      1200  5.5910us  3.4880us  8.8960us  void cub::DeviceScanKernel<cub::DispatchScan<unsigned int*, unsigned int*, cub::Sum, cub::NullType, int>::PtxAgentScanPolic...

Pull request #157,

Throughput over 1000 events:

  • mean: 613 ± 29 ev/s
  • best: 614 ± 28 ev/s
Top 10 contribution to GPU usage:
  Time(%)      Time     Calls       Avg       Min       Max  Name
   52.51%  851.08ms      1200  709.24us  217.28us  4.5735ms  gpuClustering::findClus(...)
   21.78%  352.96ms      1200  294.13us  97.758us  709.34us  gpuPixelDoublets::getDoubletsFromHisto(...)
    8.18%  132.61ms      1200  110.50us  26.432us  372.38us  kernel_connect(...)
    7.62%  123.46ms      1200  102.89us  23.455us  236.16us  kernel_find_ntuplets(...)
    2.93%   47.57ms      1200  39.644us  29.951us  59.328us  gpuPixelRecHits::getHits(...)
    2.54%   41.21ms      6014  6.8510us  1.2470us  45.248us  [CUDA memcpy HtoD]
    1.30%   21.11ms      1200  17.590us  6.5920us  55.231us  kernel_checkOverflows(...)
    0.58%    9.37ms      1200  7.8040us  4.1600us  26.175us  pixelgpudetails::RawToDigi_kernel(...)
    0.56%    9.05ms      6014  1.5040us  1.1200us  11.872us  [CUDA memset]
    0.40%    6.56ms      1200  5.4680us  3.4560us  8.7680us  void cub::DeviceScanKernel<cub::DispatchScan<unsigned int*, unsigned int*, cub::Sum, cub::NullType, int>::PtxAgentScanPolic...

@fwyzard
Copy link

fwyzard commented Sep 13, 2018

While the contribution of the individual kernels does not seem to change, there seems to be an overall degradation of performance.

I think it should be addressed - or at least understood - before merging.

@cmsbot
Copy link

cmsbot commented Sep 14, 2018

Pull request #157 was updated. @cmsbot, @fwyzard can you please check and sign again.

@fwyzard fwyzard changed the base branch from CMSSW_10_2_X_Patatrack to CMSSW_10_4_X_Patatrack November 15, 2018 08:30
@fwyzard fwyzard modified the milestone: CMSSW_10_4_X_Patatrack Nov 15, 2018
@makortel
Copy link
Author

Rebased on top of CMSSW_10_4_0_pre4_Patatrack. Note that so far I've only compiled it, and didn't test running.

VinInn and others added 5 commits January 8, 2019 18:33
…-sw#216)

Port and optimise the full workflow from pixel raw data to pixel tracks and vertices to GPUs.
Clean the pixel n-tuplets with the "fishbone" algorithm (only on GPUs).

Other changes:
  - recover the Riemann fit updates lost during the merge with CMSSW 10.4.x;
  - speed up clustering and track fitting;
  - minor bug fix to avoid trivial regression with the optimized fit.
@makortel
Copy link
Author

makortel commented Jan 8, 2019

Rebased on top of CMSSW_10_4_0_pre4_Patatrack to fix conflicts. Note that so far I've only compiled it, and didn't test running.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants