Update to CUDA 11.4.4 #7641

@fwyzard
Contributor

fwyzard commented Feb 22, 2022

Update to CUDA 11.4.4:

  • CUDA runtime version 11.4.148
  • NVIDIA drivers version 470.82.01

Fixes various Apache Log4J vulnerabilities.

See https://docs.nvidia.com/cuda/archive/11.4.4/cuda-toolkit-release-notes/index.html for the full CUDA 11.4.x release notes and change log.

@fwyzard
Contributor Author

fwyzard commented Feb 22, 2022

enable gpu

@fwyzard
Contributor Author

fwyzard commented Feb 22, 2022

please test

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_12_3_X/master.

@smuzaffar, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

@fwyzard
Contributor Author

fwyzard commented Feb 22, 2022

@smuzaffar, I don't know where Log4J is used in the CUDA tools, but should we backport this to 12.2.x as well?

@fwyzard
Contributor Author

fwyzard commented Feb 22, 2022

type bugfix

@smuzaffar
Contributor

smuzaffar commented Feb 22, 2022

@fwyzard Looking at https://nvidia.custhelp.com/app/answers/detail/a_id/5294, the only products affected by this are NETQ and "VGPU SOFTWARE LICENSE SERVER". I think we are not using either of these, so there is no need to backport.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b874d9/22578/summary.html
COMMIT: 9a8bc1e
CMSSW: CMSSW_12_3_X_2022-02-22-1100/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7641/22578/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 0 differences found in the comparisons
  • Reco comparison had 3 failed jobs
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19811
  • DQMHistoTests: Total failures: 10
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 19801
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 3 / 3 workflows

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3965137
  • DQMHistoTests: Total failures: 10979
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3954135
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 10144.568 KiB( 48 files compared)
  • DQMHistoSizes: changed ( 11634.0,... ): 1690.762 KiB HLT/Muon
  • DQMHistoSizes: changed ( 312.0 ): -0.004 KiB MessageLogger/Warnings
  • Checked 204 log files, 45 edm output root files, 49 DQM output files
  • TriggerResults: found differences in 7 / 48 workflows

@smuzaffar
Contributor

please test for slc7_aarch64_gcc11

@smuzaffar
Contributor

please test for slc7_ppc64le_gcc11

@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b874d9/22600/summary.html
COMMIT: 9a8bc1e
CMSSW: CMSSW_12_3_X_2022-02-21-2300/slc7_ppc64le_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7641/22600/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test testFWCoreUtilities had ERRORS
---> test DRNTest had ERRORS
---> test materialBudgetTrackerPlots had ERRORS

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b874d9/22601/summary.html
COMMIT: 9a8bc1e
CMSSW: CMSSW_12_3_X_2022-02-21-2300/slc7_aarch64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7641/22601/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found compilation warnings when building: see details on the summary page.

@cmsbuild
Contributor

-1

Failed Tests: UnitTests RelVals AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b874d9/22624/summary.html
COMMIT: 9a8bc1e
CMSSW: CMSSW_12_3_X_2022-02-22-2300/slc7_aarch64_gcc11
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/7641/22624/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b874d9/22624/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b874d9/22624/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test testFWCoreUtilities had ERRORS
---> test TestFWCoreServicesDriver had ERRORS
---> test test_edmPickEvents had ERRORS
---> test TestDQMOnlineClient-visualization_secondInstance had ERRORS
and more ...

RelVals

----- Begin Fatal Exception 24-Feb-2022 00:35:07 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 194533 lumi: 329 event: 462355458 stream: 0
   [1] Running path 'dqmofflineOnPAT_1_step'
   [2] Prefetching for module SingleTopTChannelLeptonDQM_miniAOD/'singleTopElectronMediumDQM_miniAOD'
   [3] Prefetching for module PATMuonSlimmer/'slimmedMuons'
   [4] Prefetching for module PATMuonSelector/'selectedPatMuons'
   [5] Prefetching for module PATMuonProducer/'patMuons'
   [6] Prefetching for module MuonProducer/'muons'
   [7] Prefetching for module PFProducer/'particleFlowTmp'
   [8] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [9] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [10] Prefetching for module PFConversionProducer/'pfConversions'
   [11] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 24-Feb-2022 00:48:36 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 326479 lumi: 7 event: 1579493 stream: 0
   [1] Running path 'dqmoffline_8_step'
   [2] Prefetching for module SMPDQM/'SMPDQM'
   [3] Prefetching for module MuonProducer/'muons'
   [4] Prefetching for module PFProducer/'particleFlowTmp'
   [5] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [6] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [7] Prefetching for module PFConversionProducer/'pfConversions'
   [8] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 24-Feb-2022 01:04:07 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 319450 lumi: 76 event: 106007323 stream: 0
   [1] Running path 'dqmoffline_10_step'
   [2] Prefetching for module SMPDQM/'SMPDQM'
   [3] Prefetching for module MuonProducer/'muons'
   [4] Prefetching for module PFProducer/'particleFlowTmp'
   [5] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [6] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [7] Prefetching for module PFConversionProducer/'pfConversions'
   [8] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------

AddOn Tests

----- Begin Fatal Exception 23-Feb-2022 23:12:09 CET-----------------------
An exception of category 'Vertex' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 65 stream: 0
   [1] Running path 'prevalidation_step'
   [2] Prefetching for module MultiTrackValidator/'trackValidator'
   [3] Prefetching for module JetTracksAssociationToTrackRefs/'cutsRecoTracksAK4PFJets'
   [4] Prefetching for module JetTracksAssociatorExplicit/'ak4JetTracksAssociatorExplicitAll'
   [5] Prefetching for module FastjetJetProducer/'ak4PFJets'
   [6] Prefetching for module PFLinker/'particleFlow'
   [7] Prefetching for module PFProducer/'particleFlowTmp'
   [8] Prefetching for module PFBlockProducer/'particleFlowBlock'
   [9] Prefetching for module PFElecTkProducer/'pfTrackElec'
   [10] Prefetching for module PFConversionProducer/'pfConversions'
   [11] Calling method for module ConversionProducer/'allConversions'
Exception Message:
Refitted track not found in list
----- End Fatal Exception -------------------------------------------------

@smuzaffar
Contributor

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_12_3_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)

fwyzard deleted the IB/CMSSW_12_3_X/master_cuda_11.4.4 branch on April 1, 2022, 11:58