Feature epic metrics #2536

BruceKropp-Raytheon · 2024-12-12T17:10:13Z

Commit Queue Requirements:

Fill out all sections of this template.
All sub component pull requests have been reviewed by their code managers.
Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
Commit 'test_changes.list' from previous step

Description:

Adding CI/CD scripts to support collection of build and test stage metrics during both Nightly builds and PR builds.
Includes a new Jenkinsfile that can be tried as a replacement for ./tests/ci/Jenkinsfile.combined.

Potentially resolves issue: #2527

Commit Message:

CI/CD Automation tools to support UFS WM Infrastructure Metrics Dashboard

* UFSWM - 
  * use RT labels to trigger Jenkins builds

Priority:

Normal

Git Tracking

UFSWM:

None

Sub component Pull Requests:

None

UFSWM Blocking Dependencies:

None

Changes

Regression Test Changes (Please commit test_changes.list):

No Baseline Changes.

Input data Changes:

None.

Library Changes/Upgrades:

No Updates

Testing Log:

BruceKropp-Raytheon · 2024-12-12T17:19:51Z

I have tested this on Hera, Hercules, and Orion.
Others TBD: Jet, Gaea, Derecho.
Stretch goal: PW hosts.

BruceKropp-Raytheon · 2024-12-13T16:18:05Z

This is an old issue for Orion, unrelated to this PR:

Running regression tests on Orion
+ cd ..
+ module load git/2.28.0
+ '[' -z '' ']'
+ case "$-" in
+ __lmod_sh_dbg=x
+ '[' -n x ']'
+ set +x
Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for Lmod's output
Lmod has detected the following error: The following module(s) are unknown:
"git/2.28.0"

Lines 73 and 222 of ./tests/ci/Jenkinsfile.combined can probably be removed, so that the latest git on the system is used instead:
module load git/2.28.0

DusanJovic-NOAA · 2025-01-08T21:24:14Z

This PR adds almost 2500 lines of non-trivial scripts. I find it impossible to review by just looking at the code (diff). So, I'm approving this PR on the assumption that it works for EPIC and that it does whatever it is supposed to do. The PR description does not explain what the expected results are, or how it can be used, if at all, on supported platforms without the CI/CD Jenkins scripts.

DeniseWorthen · 2025-01-08T22:06:30Z

@BruceKropp-Raytheon Please explain exactly what this PR is doing, how, and why. I don't really see how static PNGs (on the dashboard) address the original issue at all, which was a) tracking performance over time and b) scraping existing information from commit logs. Does this PR scrape information, or produce its own (i.e., somehow run its own RT)?

DeniseWorthen · 2025-01-09T14:48:19Z

So it looks like this PR is a combination of features: one that updates the Jenkins CI and one that supposedly addresses the metric tracking issue. If it is a combination, it should be split into the relevant parts, as basic good CM practice.

For the metrics, Dusan was able to spend about an hour implementing something which actually addresses #2527.

$ cat rt_logs_grep_time.sh
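#!/bin/bash
# Print the Hera run time of one regression test for each commit on develop.
set -eu

TEST="$1"        # test name as it appears in the log, e.g. cpld_control_p8_intel
BRANCH=develop

# Walk the commits since 2024-06-01, newest first.
for COMMIT in $(git log --since="2024-06-01" --format="%H" "${BRANCH}"); do
  # Take the sixth field of the matching log line and keep the part before "]".
  TIME=$(git show "${COMMIT}:tests/logs/RegressionTests_hera.log" | grep " '${TEST}' " | awk '{print $6}' | awk -F']' '{print $1}')
  echo "${COMMIT} ${TIME}"
done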

Which produces the following

$ ./rt_logs_grep_time.sh cpld_control_p8_intel
5324d642e2257ee659ab75c0fc404d3127b2d9f7 15:46
76471dc6b7bfc3342416d1a3402f360724f7c0fa 15:45
241dd8e3b9feae29f1925806bdb05816ae49f427 15:40
295008915d1ad09fb5d4e24624d0c19627273af4 15:58
e1193704767800bfaece56eb2a4b058bd4d0afbc 15:24
6ec6b458b6dc09af48658146d3908502b18272cf 15:28
409bc85b64b2ced642b0024cef2cd9c78ce46fd9 15:26
63ace62a36a263f03b914a92fc5536509e862dbc 13:58
a3c3bb587cdb6905a3d3635a4ef502547ff60598 13:58
144ccb03e6c82edae73cf12a496ebf060fed65f7 14:10
33b3c18774a994b3b05da4489b08f34115adbf48 14:02
c0367fdf0885493af6a5446b38eb77405a6230e1 13:59
6b0f516557811eac82c17b852efb82a35892b022 14:35
29c2703c715ebdb47bbd4bcc811db340eae530e5 13:00
058f07361b7f53a76e4cbb057aaebbbefffd34e5 13:00
f9c91d3df80a8536cf2a226fac5d826889e55c17 13:02
547be6d379f5b213b47eb3eacc9c5211fb95b6ab 13:29
be4544ee28f8fad7bc2cdb207dc62f89c4aa2bb2 07:30
db1781a05dce1125cfe17f8324650674640f0a9e 07:59
f3ce1698b00bc1039f73f662e9e107f9c424201f 07:51
e3750c28119deec4b133cacca81d49ba62b2670c 07:28
bad50ef5023860c992b75cb72722cba9bb428ceb 07:38
2ccc549348da37aac51ab44482174dff2bb2912d 07:40
38a29a62461cb1f9bf530420d5bc2f73a4650724 07:48
25ee7f6ca087ee19991e684a3c83e451921d5770 07:35
706219146401bec7a29e7384eb1a642392ca47fe 07:44
6a4e09e94773ffa39ce7ab6a54a885efada91f21 06:15
9ae4f54282e00df8c8ec68c883905f49b8d5d826 06:02
1c4fcf1ca75fa24326bd2af857dafa2f51347506 05:57
94a3cd7f6afa1091bad6b8f57cdc5b7712849dfb 06:07
fcc9f8461db5eafbfd1f080da61ea79156ca0145 05:56

This generates actual useful information, although no pretty pictures. Commit hash 547be6d clearly doubled the run time. If we had seen that at the time, we would have held the PR to determine the cause.
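A jump like this could also be flagged automatically. The following is only a minimal sketch, assuming the output format of the script above ("<commit> <MM:SS>" per line, newest commit first). The function name flag_slowdowns and its default ratio of 1.5 are hypothetical choices for illustration, not anything from the repository.

```shell
#!/bin/bash
# Sketch: flag commits whose run time jumped relative to the next-older commit.
# Input on stdin: "<commit> <MM:SS>" lines, newest first (git log order).
# The default ratio of 1.5 is an arbitrary assumption, not a project setting.
flag_slowdowns() {
  local ratio=${1:-1.5}
  awk -v ratio="${ratio}" '
    # Convert "MM:SS" to seconds; p is a local scratch array.
    function secs(t,    p) { split(t, p, ":"); return p[1] * 60 + p[2] }
    # Lines without a time (test missing from that log) are skipped.
    NF == 2 {
      cur = secs($2)
      # The previous line is the newer commit; compare it with this older one.
      if (have_prev && cur > 0 && prev_secs > ratio * cur)
        printf "%s %s -> %s\n", prev_commit, $2, prev_time
      prev_commit = $1; prev_time = $2; prev_secs = cur; have_prev = 1
    }'
}
```

Piping the listing above through it, for example ./rt_logs_grep_time.sh cpld_control_p8_intel | flag_slowdowns, would print only the 547be6d line, with the older and newer times (07:30 -> 13:29). A ratio between adjacent commits is a crude test; it would miss a slowdown that accumulates gradually over several commits.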

BruceKropp-Raytheon · 2025-01-10T18:21:46Z

@DeniseWorthen
Thank you for the clarification. I suspect the metrics requested from #2527 differ from the metrics EPIC is trying to report, so maybe this PR doesn't add value for #2527.
Simply put, this PR provides a new Jenkinsfile that collects metrics from Regression Tests (RT) and saves them as JSON-formatted data, so that they can be post-processed and presented in a chart on the EPIC dashboard.

Relevant JSON will look something like:

{
  "name": "Test",
  "type": "CI",
  "run": {
    "dateTime": "1734512640548",
    "builtOn": "all",
    "platform": "hercules",
    "compiler": "intel",
    "branch": "develop",
    "steps": [
      {
        "id": "512",
        "name": "scripts/wm_test.sh",
        "type": "STEP",
        "startTime": "2024-12-18T09:14:41.524+0000",
        "result": "SUCCESS",
        "durationInMillis": 1107792,
        "displayName": "Shell Script",
        "displayDescription": null,
        "tests": [
          {
            "ExperimentName": "control_p8 intel",
            "Status": "COMPLETE",
            "tasks": [
              {
                "type": "COMPILE",
                "task": "atm_dyn32_intel",
                "WallTime": "11:11",
                "Duration": "09:17",
                "mbytes": 0,
                "Status": "PASS",
                "Reason": ""
              },
              {
                "type": "TEST",
                "task": "control_p8_intel",
                "WallTime": "05:53",
                "Duration": "03:16",
                "mbytes": 1889,
                "Status": "PASS",
                "Reason": ""
              }
            ]
          }
        ]
      }
    ],
    "result": "SUCCESS"
  }
}

This PR is not expected to produce any images itself; those would be derived from the JSON as part of the EPIC web dashboard effort.
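To make the shape of a single "tasks" entry concrete, here is a rough sketch of how one could be emitted from shell. task_json is a hypothetical helper written for this illustration; it is not taken from the scripts in this PR, and it assumes the string arguments contain no double quotes or backslashes.

```shell
#!/bin/bash
# Sketch: emit one entry of the "tasks" array in the JSON shown above.
# task_json is a hypothetical helper, not a function from this PR.
# Assumes string arguments contain no double quotes or backslashes,
# and that mbytes is a plain integer.
task_json() {
  local type=$1 task=$2 walltime=$3 duration=$4 mbytes=$5 status=$6 reason=${7:-}
  printf '{"type": "%s", "task": "%s", "WallTime": "%s", "Duration": "%s", "mbytes": %d, "Status": "%s", "Reason": "%s"}\n' \
    "${type}" "${task}" "${walltime}" "${duration}" "${mbytes}" "${status}" "${reason}"
}
```

For example, task_json TEST control_p8_intel 05:53 03:16 1889 PASS reproduces the second task entry above on a single line.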

BruceKropp-Raytheon and others added 2 commits December 11, 2024 19:04

add CICD support to collect performace metrics during Build and Test …

e696e9b

…stages Signed-off-by: Bruce Kropp <[email protected]>

Merge branch 'ufs-community:develop' into feature/epic_metrics

f1e8984

BruceKropp-Raytheon requested review from kbooker79, MichaelLueken and zach1221 December 12, 2024 18:40

BruceKropp-Raytheon added the hera-RT Run Hera regression testing label Dec 12, 2024

[AutoRT] Hera Job Completed.

df31e0b

on-behalf-of @ufs-community <[email protected]>

epic-cicd-jenkins removed the hera-RT Run Hera regression testing label Dec 12, 2024

BruceKropp-Raytheon added the hercules-RT Run Hera regression testing label Dec 13, 2024

[AutoRT] Hercules Job Completed.

1aabf1d

on-behalf-of @ufs-community <[email protected]>

epic-cicd-jenkins removed the hercules-RT Run Hera regression testing label Dec 13, 2024

BruceKropp-Raytheon added the orion-RT label Dec 13, 2024

epic-cicd-jenkins removed the orion-RT label Dec 13, 2024

BruceKropp-Raytheon added the derecho-RT Run regression tests on Derecho label Dec 13, 2024

[AutoRT] Derecho Job Completed.

6ab5520

on-behalf-of @ufs-community <[email protected]>

epic-cicd-jenkins removed the derecho-RT Run regression tests on Derecho label Dec 13, 2024

BruceKropp-Raytheon added 2 commits December 13, 2024 12:33

set logic to remove PR labels on success or failure

b9f4611

Signed-off-by: Bruce Kropp <[email protected]>

Merge branch 'feature/epic_metrics' of github.com:BruceKropp-Raytheon…

dedd6b8

…/ufs-weather-model into feature/epic_metrics sync with PR build results

BruceKropp-Raytheon added the hera-RT Run Hera regression testing label Dec 13, 2024

[AutoRT] Hera Job Completed.

77567ae

on-behalf-of @ufs-community <[email protected]>

epic-cicd-jenkins removed the hera-RT Run Hera regression testing label Dec 13, 2024

Merge branch 'develop' into feature/epic_metrics

cad931b

kbooker79 previously approved these changes Dec 18, 2024

Merge branch 'develop' into feature/epic_metrics

4917a3e

BruceKropp-Raytheon dismissed kbooker79’s stale review via 4917a3e December 19, 2024 20:58

make sure Jet uses NAGAPE

2a5f88c

Signed-off-by: Bruce Kropp <[email protected]>

BruceKropp-Raytheon requested a review from kbooker79 December 19, 2024 21:18

BruceKropp-Raytheon requested a review from jkbk2004 January 8, 2025 19:41

BruceKropp-Raytheon requested review from FernandoAndrade-NOAA and DeniseWorthen January 8, 2025 19:41

DeniseWorthen requested review from DusanJovic-NOAA and BrianCurtis-NOAA January 8, 2025 19:56

DusanJovic-NOAA approved these changes Jan 8, 2025
