CMSSW Integration of LST #45117

Open: wants to merge 132 commits into base: master


VourMa
Contributor

@VourMa VourMa commented Jun 1, 2024

This PR integrates the LST algorithm in CMSSW. A summary of the algorithm and its scope can be found in the recent LST presentation at the Phase 2 Software days (April 2024).

The PR includes the following additions/modifications:

  • New package w/ the LST algorithm code (RecoTracker/LSTCore):
    • interface/alpaka:
      The interface exposed to CMSSW.
    • src/alpaka:
      The actual LST code.
    • standalone:
      Scripts to be used for compiling, using & testing LST outside of the full CMSSW framework.
      Not relevant for CMSSW review.
    • Minimal/at most header-only dependency of LSTCore on other CMSSW packages
      ⇒ Preserve ability to run in standalone.
  • New package w/ CMSSW modules related to LST (RecoTracker/LST):
    • interface:
      The input & output data formats for LST.
    • plugins:
      The producers:
      • Converting to/from the LST data formats (ED).
      • Loading the LST custom geometry files (ES).
      • Running LST to produce CMSSW collections (ED).
    • python:
      The configuration files needed for running LST.
    • src:
      Class definitions and ES producer supporting files.
    • test:
      Scripts for local testing
      → Dropped in favor of a proper workflow.
  • New process modifiers to test LST (changes in multiple existing packages):
    • trackingIters01:
      Runs only the first two iterations of tracking (initialStep & highPtTripletStep).
      Useful for comparisons, as LST (for now) replaces only those two tracking iterations.
    • trackingLST:
      Runs the LST algorithm instead of KalmanFilter for track building/seeding.
      The presence of the gpu process modifier determines the hardware the algorithm runs on (CPU or GPU).
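The way the three process modifiers above combine can be illustrated with a small toy model. This is a minimal Python sketch under stated assumptions, not the CMSSW configuration API: `tracking_plan`, `LST_ITERATIONS` and the placeholder iteration name `otherStep` are hypothetical, and the default (non-LST) path is only labeled with its KalmanFilter builder, without modeling its hardware.

```python
# Toy model (hypothetical; not the CMSSW configuration API) of how the
# trackingIters01, trackingLST and gpu process modifiers combine.

# Iterations that LST replaces, for now.
LST_ITERATIONS = ("initialStep", "highPtTripletStep")


def tracking_plan(all_iterations, modifiers):
    """Map each tracking iteration that runs to (builder, backend).

    all_iterations: ordered iteration names of the default tracking sequence.
    modifiers: collection of active process-modifier names.
    The backend is only modeled for LST; it is None for the default path.
    """
    iterations = list(all_iterations)
    if "trackingIters01" in modifiers:
        # Keep only initialStep and highPtTripletStep, for comparisons with LST.
        iterations = [it for it in iterations if it in LST_ITERATIONS]

    plan = {}
    for it in iterations:
        if "trackingLST" in modifiers and it in LST_ITERATIONS:
            # LST replaces KalmanFilter building/seeding; gpu selects the hardware.
            plan[it] = ("LST", "GPU" if "gpu" in modifiers else "CPU")
        else:
            plan[it] = ("KalmanFilter", None)
    return plan


if __name__ == "__main__":
    default = ["initialStep", "highPtTripletStep", "otherStep"]  # "otherStep" is a placeholder
    print(tracking_plan(default, {"trackingIters01", "trackingLST", "gpu"}))
    # {'initialStep': ('LST', 'GPU'), 'highPtTripletStep': ('LST', 'GPU')}
```

Keeping the hardware choice as a separate switch mirrors the description above: trackingLST decides *what* runs, while gpu only decides *where* LST runs.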

There is a single change not strictly related to the above categories; a dedicated comment will be made on it.

In general, we prefer to have minimal or at most header-only dependency of LSTCore on other CMSSW packages to preserve the ability to run with standalone scripts.

This is a large PR, so we start it as an RFC with the main batch of files. In the coming days, the following updates are expected before the PR can be merged:

  • Removal of test scripts and introduction of workflow.
  • Extraction of the LST data files from the proper directories (bot tests will probably not work until then).
  • Modifications to the standalone scripts → Not to be reviewed.

Goes together with cms-data/RecoTracker-LSTCore#1 (now merged).

@slava77 @ariostas


List of unresolved comments (to be updated in batches - last update: 2024/08/19):
SegmentLinking#75

@cmsbuild
Contributor

cmsbuild commented Jun 1, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/40456

  • This PR adds an extra 788KB to repository

@cmsbuild
Contributor

cmsbuild commented Jun 1, 2024

A new Pull Request was created by @VourMa for master.

It involves the following packages:

  • Configuration/ProcessModifiers (operations)
  • RecoTracker/ConversionSeedGenerators (reconstruction)
  • RecoTracker/FinalTrackSelectors (reconstruction)
  • RecoTracker/IterativeTracking (reconstruction)
  • RecoTracker/LST (****)
  • RecoTracker/LSTCore (reconstruction)

The following packages do not have a category, yet:

RecoTracker/LST
Please create a PR for https://github.com/cms-sw/cms-bot/blob/master/categories_map.py to assign category

@cmsbuild, @rappoccio, @jfernan2, @davidlange6, @mandrenguyen, @fabiocos, @antoniovilela can you please review it and eventually sign? Thanks.
@VourMa, @missirol, @gpetruc, @rovere, @GiacomoSguazzoni, @VinInn, @Martin-Grunewald, @mmusich, @mtosi, @dgulhan, @JanFSchulte, @fabiocos, @felicepantaleo, @makortel this is something you requested to watch as well.
@rappoccio, @sextonkennedy, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@slava77
Contributor

slava77 commented Jun 1, 2024

test parameters:

@slava77
Contributor

slava77 commented Jun 1, 2024

@cmsbuild please test

@cmsbuild
Contributor

cmsbuild commented Jun 1, 2024

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/summary.html
COMMIT: 891eb11
CMSSW: CMSSW_14_1_X_2024-06-01-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45117/39661/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 2 errors in the following unit tests:

---> test TestDQMOnlineClient-hlt_dqm_sourceclient had ERRORS
---> test testTrackingResolution had ERRORS

Comparison Summary

Summary:

  • You potentially added 16 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3445370
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3445344
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 206 log files, 170 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@slava77
Contributor

slava77 commented Jun 2, 2024

I found 2 errors in the following unit tests:

both are apparently related to LST
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html

@slava77
Contributor

slava77 commented Jun 3, 2024

I found 2 errors in the following unit tests:

both are apparently related to LST https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/39661/unitTests/failed.html

An exception of category 'PluginNotFound' occurred while
   [0] Constructing the EventProcessor
Exception Message:
Unable to find plugin 'LSTModulesDevESProducer@alpaka' in category 'CMS EDM Framework ESModule'. Please check spelling of name.

it's not obvious how this dependency comes about from looking at https://github.com/cms-sw/cmssw/blob/master/DQM/TrackingMonitorSource/test/testTrackResolution_cfg.py (a Run3 test)

@makortel, do you see a clear way in which the LST ES dependency makes it through here?

@mmusich
Contributor

mmusich commented Jun 3, 2024

assign heterogeneous

@cmsbuild
Contributor

cmsbuild commented Jun 3, 2024

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@makortel
Contributor

-code-checks

did the rules change?

LLVM was updated from 17 to 18, which resulted in some changes in the formatting.

The code was developed in CMSSW_14_1_0_pre5. What would be the first pre-release with the new code-check/format rules implemented?

CMSSW_14_2_0_pre1

@fwyzard
Contributor

fwyzard commented Sep 13, 2024

did the rules change?

Yes, see #45870.

@fwyzard
Contributor

fwyzard commented Sep 13, 2024

this seems like a significant regression in readability of the code

If you can figure out how to get back the old behaviour with LLVM 18, maybe we could revert the change?

@cmsbuild
Contributor

+1

Size: This PR adds an extra 476KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/41508/summary.html
COMMIT: 5f9c2f6
CMSSW: CMSSW_14_2_X_2024-09-13-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45117/41508/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 45
  • DQMHistoTests: Total histograms compared: 3437830
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3437807
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 44 files compared)
  • Checked 197 log files, 168 edm output root files, 45 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53003
  • DQMHistoTests: Total failures: 887
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52116
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found
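The bot summaries can be cross-checked with simple arithmetic: in each of the three comparisons posted so far in this thread, failures + nulls + successes + skipped equals the number of histograms compared. A short Python sketch of that check, assuming each histogram lands in exactly one outcome category (inferred from the numbers, not from cms-bot documentation); the dictionary keys are labels chosen here.

```python
# Counts copied from the comparison summaries posted in this thread.
SUMMARIES = {
    "Jun 1, CPU": {"compared": 3445370, "failures": 6, "nulls": 0, "successes": 3445344, "skipped": 20},
    "Sep 13, CPU": {"compared": 3437830, "failures": 3, "nulls": 0, "successes": 3437807, "skipped": 20},
    "Sep 13, GPU": {"compared": 53003, "failures": 887, "nulls": 0, "successes": 52116, "skipped": 0},
}


def is_consistent(s):
    """True if the outcome categories (assumed mutually exclusive) add up to the total."""
    return s["compared"] == s["failures"] + s["nulls"] + s["successes"] + s["skipped"]


def failure_percent(s):
    """Share of compared histograms flagged as failures, in percent."""
    return 100.0 * s["failures"] / s["compared"]


for name, s in SUMMARIES.items():
    print(f"{name}: consistent={is_consistent(s)}, failures={failure_percent(s):.4f}%")
```

For the GPU comparison this gives about 1.7% of histograms with differences (887 of 53003), versus well below 0.001% for either CPU comparison.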

@slava77
Contributor

slava77 commented Sep 13, 2024

batch 6 has now been merged, with the following updates:

9713499

bugfixes to incomplete alpaka::wait cleanup:

  • need to wait to get counters on host; otherwise values returned by getNumberOf methods are occasionally wrong
  • need to write to the CPU buffer asynchronously, or directly only after a sync
    • event->getTrackCandidates()->data()->nTrackCandidates could be correct right after it was set in the ::getTrackCandidates method, but would later get overwritten by the initialization from the constructor call, which executes asynchronously (new TrackCandidatesBuffer comes a few lines above the spot where data()->nTrackCandidates was set directly on the CPU side)
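Both fixes follow the usual rule for work submitted to an asynchronous queue: the host may read results, or write host memory that queued work also touches, only after synchronizing (or by enqueueing the write itself so it is ordered after the earlier work). A minimal Python sketch of that ordering, assuming an in-order queue modeled with a single-worker `ThreadPoolExecutor` as a stand-in for an alpaka queue; `InOrderQueue`, `read_counter`, `set_count` and the values 7 and 42 are hypothetical, and the `threading.Event` gate only exists to make the race deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InOrderQueue:
    """Hypothetical stand-in for an alpaka queue: tasks run asynchronously, in submission order."""

    def __init__(self):
        self._ex = ThreadPoolExecutor(max_workers=1)  # a single worker gives FIFO execution
        self._last = None

    def enqueue(self, fn):
        self._last = self._ex.submit(fn)

    def wait(self):
        # Analogue of alpaka::wait(queue): the last task finishing implies all earlier ones did.
        if self._last is not None:
            self._last.result()

    def close(self):
        self._ex.shutdown(wait=True)


def read_counter(wait_first):
    """A counter is copied to the host asynchronously; reading it before a wait sees a stale value."""
    q, gate, host = InOrderQueue(), threading.Event(), {"n": None}
    q.enqueue(gate.wait)                 # earlier device work, still in flight
    q.enqueue(lambda: host.update(n=7))  # async copy of the counter to the host
    if wait_first:
        gate.set()                       # let the simulated device work finish
        q.wait()                         # sync before reading
    value = host["n"]
    gate.set()
    q.close()
    return value


def set_count(mode):
    """The buffer constructor enqueues an async init; a direct host write must not race with it."""
    q, gate, host = InOrderQueue(), threading.Event(), {"n": None}
    q.enqueue(gate.wait)
    q.enqueue(lambda: host.update(n=0))       # async initialization from the constructor
    if mode == "direct":                      # bug: direct write before the init has run
        host["n"] = 42
    elif mode == "enqueue":                   # fix 1: asynchronous write, ordered after the init
        q.enqueue(lambda: host.update(n=42))
    gate.set()
    q.wait()
    if mode == "after_sync":                  # fix 2: direct write, but only after a sync
        host["n"] = 42
    q.close()
    return host["n"]
```

Here `set_count("direct")` returns 0 because the queued initialization runs after the host write, the same pattern as the nTrackCandidates overwrite described above, while both fixed variants return 42.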

f245be0

More comments from the PR review; it resolves:

d2b9e89

@VourMa will resolve the addressed comments sometime soon (only a PR submitter can, apparently)

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45117/41807

@slava77
Contributor

slava77 commented Sep 16, 2024

@cmsbuild please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-252c14/41541/summary.html
COMMIT: 83897c8
CMSSW: CMSSW_14_2_X_2024-09-15-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45117/41541/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

GPU Comparison Summary

Summary:
