Developing large, parallel, scalable applications is arguably the most demanding effort that end-users of HPC systems like Frontier face. However, once an application is ready for production runs, a strong understanding of and familiarity with the user environment can be just as critical for a team to be productive.
The user environment includes interfaces to the batch scheduler and the parallel job launcher, the structure of the available file systems, and any user-configurable parts of the system. This challenge relies on interaction with Frontier's batch scheduler, SchedMD’s Slurm Workload Manager.
As you may have already seen in Basic_Workflow, Slurm provides the fundamental mechanisms to submit batch jobs and to manage submitted jobs after they've been enqueued.
We won't be submitting any new jobs here, but rather looking at others that have already been run and gathering information about them. To do this, we'll primarily use the `sacct` command. (See the `sacct` manual page by running `man sacct` for a full list of command options.)
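As a starting point, here is a minimal sketch of what a `sacct` query over a time window might look like. The wrapper name `list_jobs` is hypothetical (it is not part of Slurm), the field list is just one reasonable choice, and the sketch assumes that "completed" means the `COMPLETED` job state and that your account is allowed to see other users' jobs.

```shell
# list_jobs START END
# Hypothetical helper: print one pipe-delimited line per job allocation
# that was in the COMPLETED state within the given time window.
list_jobs() {
    # --allusers     include jobs from every user, not only your own
    # --allocations  one record per job (no per-step records)
    # --noheader     omit the column header line
    # --parsable2    pipe-delimited output without a trailing delimiter
    sacct --allusers --allocations --noheader --parsable2 \
          --state=COMPLETED \
          --starttime "$1" --endtime "$2" \
          --format=JobID,User,ElapsedRaw,Submit,Start,End
}
```

It could be called as `list_jobs 2023-06-01T00:00:00 2023-06-15T23:59:59`. Check `man sacct` to confirm how state filtering interacts with the start and end times before relying on the result.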
- How many jobs were completed on Frontier between 00:00 (midnight) on June 1, 2023 and 23:59 on June 15, 2023?
- How many unique users did the jobs from question 1 belong to?
- Of the jobs found in question 1, what's the job ID of the longest-running job?
- How long was it pending (pre-execution), and how long did it run (actual execution time)?
- When was it submitted?
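One possible way to approach these questions is to post-process pipe-delimited `sacct` output with standard text tools. The sketch below assumes the output has no header, one line per job, and the column order `JobID|User|ElapsedRaw|Submit|Start|End` (for example from `--noheader --parsable2 --allocations` with a matching `--format` list). The helper names `count_jobs`, `count_users`, and `longest_job` are hypothetical, and each reads the job records from standard input.

```shell
# Count job records: one input line per job.
count_jobs() {
    wc -l | tr -d ' '
}

# Count distinct values in the User column (field 2 in the assumed layout).
count_users() {
    cut -d'|' -f2 | sort -u | wc -l | tr -d ' '
}

# Print the record with the largest ElapsedRaw value
# (field 3 in the assumed layout, measured in seconds).
longest_job() {
    sort -t'|' -k3,3nr | head -n 1
}
```

Saving the query output to a file (say, `jobs.txt`, a hypothetical name) lets you run `count_jobs < jobs.txt`, `count_users < jobs.txt`, and `longest_job < jobs.txt` without querying the accounting database again. The record printed by `longest_job` carries the Submit and Start timestamps, so the pending time is the difference between those two, and the elapsed seconds give the execution time.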