
Vlasiator on Vorna

Logging in from outside the university network

The login host, turso.cs.helsinki.fi, is only available inside the university network. To access it from the outside, you need to go via a jump host. To make matters easy, you can add a block like this to your ~/.ssh/config (create the file if it does not exist yet):

Host vorna
   ProxyJump login.physics.helsinki.fi
   Hostname turso.cs.helsinki.fi
   User <universityUsername>

Remember to use ssh-copy-id <user>@<host> to copy your ssh key to a target machine, enabling passwordless logins. ProxyJump is the easiest way to pass through one or more jump hosts, but for a fully passwordless login the ssh key needs to be copied to each proxy host as well.
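
With the config block above, key distribution might look like this (a minimal sketch; replace <universityUsername> with your own username):

 # Copy your public key to the jump host first...
 ssh-copy-id <universityUsername>@login.physics.helsinki.fi
 # ...then to the login host itself; the ProxyJump in ~/.ssh/config is applied automatically.
 ssh-copy-id vorna

After this, a plain ssh vorna should log you in without password prompts.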

Building Vlasiator

As of September 2021, following the new setup deployed on turso03, the following applies.

Building should be done in an interactive session. That is obtained using

/usr/bin/srun -M vorna --interactive -n 16 --mem=4G -p short -t 0:10:0 --pty bash

and you can modify the core count, memory, partition and wall time to suit your needs.

You should use the MAKE/Makefile.vorna_gcc Makefile and load the modules suggested in it, i.e.

export VLASIATOR_ARCH=vorna_gcc
module purge
module load GCC/10.2.0
module load OpenMPI/4.0.5-GCC-10.2.0

It is suggested to place these commands in your ~/.bashrc file (if you want) and in your job scripts (definitely!). The Intel Makefile and others that may be in the repository are not recommended at the moment and can cause all sorts of problems.
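
As a sketch, a full build inside the interactive session might then look like this (assuming the Vlasiator repository has been cloned under $PROJ; adjust the path and the core count to your allocation):

 cd $PROJ/vlasiator
 export VLASIATOR_ARCH=vorna_gcc
 module purge
 module load GCC/10.2.0
 module load OpenMPI/4.0.5-GCC-10.2.0
 make -j 16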

NOTE: Do not load other modules such as Python on top of these, or your loaded module set can change due to dependencies!

If you get messages of the module command not being supported, add the following to your ~/.bashrc:

if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

Running Vlasiator jobs on Vorna

IMPORTANT: Place your executables/binaries in $PROJ (/proj/username/) or a subdirectory of it. Some applications segfault if the binary is striped over multiple OSTs. Alternatively, you can create a directory with stripe count 1:

 mkdir $WRKDIR/bindir
 lfs setstripe -c 1 $WRKDIR/bindir

However, $PROJ is a better place for the binary. Run your jobs in $WRKDIR (/wrk/user/username/), not in $PROJ. When the job has started, check the Slurm output. If you see an error message about not being able to allocate disk space (0B available), cancel the job (scancel JOBID) and re-submit. A wait command was added to the job script to attempt to mitigate this Lustre quota reporting error.
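
As a sketch, that check-and-resubmit cycle might look like this (the job ID is a placeholder, and job.sh stands for the script shown below):

 tail slurm-<JOBID>.out    # check the Slurm output of the running job
 scancel <JOBID>           # cancel if the 0B-available error shows up
 sbatch job.sh             # re-submit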

Here is an example job script

 #!/bin/bash                                                                                                                           
 #SBATCH --time=0-00:10:00                                                                                                             
 #SBATCH --job-name=Vlas_jobname                                                                                                     
 #SBATCH --partition=test                                                                                                              
 ##partitions: test (20min), short (1d), medium (3d), long (7d)
 #SBATCH --exclusive                                                                                                                   
 #SBATCH --nodes=1                                                                                                                     
 #SBATCH -c 8                  # CPU cores per task                                                                                    
 #SBATCH -n 4                  # number of tasks (4xnodes)
 #SBATCH --ntasks-per-node=4
 #SBATCH --mem-per-cpu=1200M             # memory per core
 
 #Vorna has 2 x 8 cores                                                                                                                
 cores_per_node=16
 
 module purge
 module load GCCcore/10.2.0
 module load OpenMPI/4.0.5-GCC-10.2.0
 executable="/proj/username/vlasiator"
 configfile="./Magnetosphere_small.cfg"
 umask 007
 cd $SLURM_SUBMIT_DIR
 wait
 
 #--------------------------------------------------------------------                                                                 
 #---------------------DO NOT TOUCH-----------------------------------                                                                 
 nodes=$SLURM_NNODES
 ##threads per task (equal to -c == SLURM_CPUS_PER_TASK)
 t=$SLURM_CPUS_PER_TASK
 #Hyperthreading                                                                                                                      
 ht=2
 #Change the SLURM parameters above + the ones here
 total_units=$(echo $nodes $cores_per_node $ht | gawk '{print $1*$2*$3}')
 units_per_node=$(echo $cores_per_node $ht | gawk '{print $1*$2}')
 tasks=$(echo $total_units $t  | gawk '{print $1/$2}')
 tasks_per_node=$(echo $units_per_node $t  | gawk '{print $1/$2}')
 export OMP_NUM_THREADS=$t

 srun --mpi=pmix_v2 -n 1 -N 1 $executable --version
 srun --mpi=pmix_v2 -n $tasks -N $nodes $executable --run_config=$configfile
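
Assuming the script above is saved as job.sh (a placeholder name), it can be submitted from the login node and monitored roughly as follows:

 sbatch -M vorna job.sh          # -M selects the Vorna cluster from the turso login node
 squeue -M vorna -u $USER        # queue status of your jobs
 sacct -M vorna -j <JOBID>       # accounting summary once the job has run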

Checking in on running jobs

It is also possible to debug by logging into the nodes where Vlasiator is running. Some of these commands may prove helpful:

 srun -M vorna --overlap --pty --jobid=$SLURM_JOBID bash
 srun --jobid=$SLURM_JOBID --nodelist=node_name -N1 --pty /bin/bash

You can pass -w to select multiple nodes, as in

 srun --overlap --pty --jobid=$SLURM_JOBID -w $( squeue --jobs $SLURM_JOBID -o "%N" | tail -n 1 ) bash
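
Once you have a shell on a compute node, standard tools can be used to inspect the running ranks, for example (a generic sketch, nothing Vlasiator-specific):

 top -u $USER                        # live CPU and memory usage of your processes
 ps -u $USER -o pid,psr,pcpu,comm    # which cores the MPI ranks are placed on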

Background on building libraries on Vorna

Zoltan and Boost need to be built for GCC. For the other required libraries, follow the standard Vlasiator install instructions. This building has already been done, and users should be able to use the libraries found in /proj/group/spacephysics directly. These instructions are kept in case they need to be rebuilt for some reason.
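
To see what is already available, you can list the shared library directory (the exact subdirectory layout here is an assumption based on the install prefixes used below):

 ls /proj/group/spacephysics/libraries/gcc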

First download and unpack the source code

 wget http://cs.sandia.gov/Zoltan/Zoltan_Distributions/zoltan_distrib_v3.83.tar.gz
 wget http://freefr.dl.sourceforge.net/project/boost/boost/1.69.0/boost_1_69_0.tar.bz2
 tar xvf zoltan_distrib_v3.83.tar.gz 2> /dev/null
 tar xvf boost_1_69_0.tar.bz2

Zoltan

Make and install

 mkdir zoltan-build
 cd zoltan-build
 ../Zoltan_v3.83/configure --prefix=/proj/group/spacephysics/libraries/gcc/8.3.0/zoltan/ --enable-mpi --with-mpi-compilers --with-gnumake --with-id-type=ullong
 make -j 8
 make install

Clean up

 cd ..
 rm -rf zoltan-build Zoltan_v3.83

Boost

Make and install

 cd boost_1_69_0
 ./bootstrap.sh
 echo "using gcc : 8.3.0 : mpicxx ;" >> ./tools/build/src/user-config.jam
 echo "using mpi : mpicxx ;" >> ./tools/build/src/user-config.jam
 ./b2 -j 8
 ./b2 --prefix=/proj/group/spacephysics/libraries/gcc/8.3.0/boost/ install

Clean up

 cd ..
 rm -rf boost_1_69_0

GPU Acceleration

To request a GPU from a gpu node, run e.g.

/usr/bin/srun --interactive -n1 -c8 --mem=4G -t00:15:00 -Mukko --constraint=v100 -pgpu --pty bash

This requests one task with 8 CPU cores, 4 GB of memory, and a V100 GPU from the GPU partition on Ukko for 15 minutes. You can alternatively request A100 and P100 GPUs on this partition; see the cluster documentation for detailed partition information.
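
Once the interactive session starts, you can confirm which GPU was allocated:

 nvidia-smi    # lists the visible GPU(s) together with driver and CUDA versions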

CUDA

For basic compilation and running of CUDA applications you need to load a CUDA module, for instance with the following commands:

module purge
module load GCC/10.2.0
module load CUDA/11.1.1-GCC-10.2.0

This loads GCC version 10.2.0 and CUDA version 11.1.1 for GCC 10.2.0.
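
A minimal compile might then look like this (the source file name is a placeholder; -arch=sm_70 matches the V100):

 nvcc --version                                     # confirm the CUDA toolkit is available
 nvcc -O3 -arch=sm_70 -o my_kernel my_kernel.cu     # my_kernel.cu is a placeholder source file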

HIP

Two versions of HIP have been built for Ukko, v4.5 and v5.2. These can be used by running

module use /proj/group/spacephysics/modules/hip
module load hip-[4.5|5.2]

This module will load all prerequisite modules needed by HIP and set the correct environment variables for compilation. To compile, run

hipcc $HIP_IFLAGS <rest of the compiler options and filenames>

The HIP_IFLAGS environment variable contains the flag -I and the path to the HIP header files.
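
For example (a sketch with a placeholder source file; since the Ukko GPUs are NVIDIA cards, hipcc compiles through the CUDA backend here):

 module use /proj/group/spacephysics/modules/hip
 module load hip-5.2
 hipcc $HIP_IFLAGS -O3 -o my_hip_app my_hip_app.cpp    # my_hip_app.cpp is a placeholder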