Skip to content

mvsjober/gpu-energy-amd

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GPU energy usage counter for AMD/ROCm

Reads the current GPU energy counters for AMD GPU cards using the ROCm SMI library.

Installation

To compile just run make.

NOTE: on LUMI it is pre-installed in /appl/local/csc/soft/ai/bin/gpu-energy.

Usage

Print current counter values for all visible devices:

gpu-energy

Save counters to a temporary file for later use:

gpu-energy --save [filename]

if no filename is given, it will try to figure out a good name based on the Slurm environment.

Print energy usage difference since last save:

gpu-energy --diff [filename]

Typical usage in a Slurm script:

gpu-energy --save

# run job here

gpu-energy --diff

Multi node job:

srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 gpu-energy --save

# run job here

srun --ntasks=$SLURM_NNODES --ntasks-per-node=1 gpu-energy --diff

If you're using a module (like CSC's pytorch) that sets the SLURM_MPI_TYPE environment variable, you need to run it like this (otherwise it will not detect MPI and will not calculate the energy sum over nodes).

srun --mpi=cray_shasta --ntasks=$SLURM_NNODES --ntasks-per-node=1 gpu-energy --save

# run job here

srun --mpi=cray_shasta --ntasks=$SLURM_NNODES --ntasks-per-node=1 gpu-energy --diff

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published