
Brainstorm meeting (June 9th 2020)

Kenneth Hoste edited this page Jun 9, 2020 · 4 revisions
  • attending: Alan, Caspar, Kenneth

EasyBuild configuration

  • installation dir names == EasyBuildMNS
    • already the default in EasyBuild 4.x
    • uniqueness
    • allows different module trees for same installations
      • does require making --module-only more robust
        • requires different implementation in framework
      • allow end users to pick module naming scheme (easier to hop between systems)
  • RPATH
    • or RUNPATH?
    • set LD_LIBRARY_PATH/LD_PRELOAD in modules?
      • filter-env-vars=LD_LIBRARY_PATH/LD_PRELOAD
      • to make it easier for users to compile their own software
      • does leave the door open to break OS binaries (see Caspar's experience at SURF)
      • only for stuff not provided by OS (Prefix)
        • to avoid breaking stuff in OS, like Slurm's sbatch
      • does ComputeCanada make the GCC look in Prefix locations?
  • what is provided by Prefix layer?
    • libreadline (since it's required by tools like vim) => goes into filter-deps
  • --with-sysroot needs to be set when compiling GCCcore to ensure it picks up Prefix stuff
  • keep in mind that users may want to build additional stuff on top of the central installations
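
One possible outcome of the options above, sketched as a small Python helper that assembles an `eb` command line. It assumes RPATH linking is chosen, that LD_LIBRARY_PATH and LD_PRELOAD end up being filtered, and that libreadline is the only dependency left to the Prefix layer; none of these were settled in the meeting. The helper name `eb_command` is hypothetical.

```python
import shlex


def eb_command(easyconfig,
               filter_deps=("libreadline",),
               filter_env_vars=("LD_LIBRARY_PATH", "LD_PRELOAD")):
    """Assemble an `eb` command line reflecting the options discussed above.

    Assumptions (not decided in the notes): RPATH is enabled, the listed
    environment variables are filtered from generated modules, and the
    listed dependencies are provided by the Prefix layer instead.
    """
    args = [
        "eb",
        easyconfig,
        "--rpath",                                  # link with RPATH
        "--module-naming-scheme=EasyBuildMNS",      # default in EasyBuild 4.x
        "--filter-env-vars=" + ",".join(filter_env_vars),
        "--filter-deps=" + ",".join(filter_deps),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(eb_command("GCCcore-9.3.0.eb"))
```

The same settings could equally live in an EasyBuild configuration file or in EASYBUILD_* environment variables; a command line is used here only because it is the easiest form to read.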

Filesystem layout

  • versioned gentoo dirs?
    • fixed package versions vs security?
  • what goes where?
    • in-place updates => Gentoo?
    • anything security related should be in Gentoo layer too
  • archspec labels/code names
    • pick best match out of available
    • 'avx512' is not enough, see venn diagram with AVX512 flavors
  • is the long prefix a concern?
    • cf. problems with Python shebangs (fixed), Trilinos (fix in easyblock), FSL (not fixed yet)
    • workaround: mount as /cvmfs/ when installing + using the stack
    • does that imply the datestamp needs to move into e.g. intel/cascadelake?
/cvmfs/pilot.eessi-hpc.org/gentoo/2020/  # fixed versions? update in-place?
------------------------------------------/usr/{bin,lib}
/cvmfs/pilot.eessi-hpc.org/easybuild/
------------------------------------------x86_64/
------------------------------------------------|intel/sse3
------------------------------------------------|intel/haswell/2020.06
------------------------------------------------------------------|software
                                                                  |modules
------------------------------------------------|intel/skylake
------------------------------------------------|intel/cascadelake
------------------------------------------------|intel/cascadelake-nvidia   # only GPU capable software (multiple compute capabilities)
------------------------------------------------|intel/cascadelake-pascal
------------------------------------------------|intel/cascadelake-volta
------------------------------------------------|amd/rome
------------------------------------------------|amd/rome-ampere
------------------------------------------aarch64/{a64fx,thunderx2}
------------------------------------------power/power9
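
The "pick best match out of available" idea can be sketched as follows. The compatibility chains in `COMPATIBLE` are hand-written and purely illustrative (in practice they would come from archspec), and the function name `best_match` is hypothetical. The point is only that the host falls back to the most specific directory that actually exists in the layout above.

```python
# Illustrative compatibility chains, most specific first.
# Assumption: hand-written for this sketch; archspec would normally
# supply this ancestry information.
COMPATIBLE = {
    "cascadelake": ["cascadelake", "skylake", "haswell", "sse3"],
    "skylake": ["skylake", "haswell", "sse3"],
    "haswell": ["haswell", "sse3"],
}


def best_match(host, available, vendor="intel", family="x86_64"):
    """Return the most specific installed subdirectory usable on `host`.

    `available` is the set of microarchitecture directory names that exist
    under <family>/<vendor>/. Returns None when nothing compatible exists.
    """
    for candidate in COMPATIBLE.get(host, [host]):
        if candidate in available:
            return f"{family}/{vendor}/{candidate}"
    return None


if __name__ == "__main__":
    # A Cascade Lake host when only haswell and sse3 builds are available.
    print(best_match("cascadelake", {"haswell", "sse3"}))
```

A single feature label such as 'avx512' would not be enough to drive this choice, which is why the sketch keys on named microarchitectures instead.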

Easyconfigs

  • maintain own stack of easyconfigs vs reuse as much as possible provided by EasyBuild
  • pros of own stack:
    • robustness
  • cons:
    • maintainability
  • customizations through hooks as much as possible
  • new software vs software updates
    • add to own repo first, issue PR, cleanup once included in EasyBuild release?
    • needs some scripting to follow up on easyconfigs
  • start off with relying on EasyBuild as much as possible, try to actively clean up with every EasyBuild release, see how it goes

Definition of software stacks

  • Which easyconfigs are installed for which architectures?
  • symlinks?
easybuild-layer/easyconfigs/  # actual files
cvmfs/x86_64/intel/haswell.yaml  # list of easyconfig filenames to install for this architecture
# contents of x86_64/intel/haswell.yaml
- GROMACS-2020-foss-2020a.eb
- TensorFlow-2.2.0-foss-2019b-Python-3.7.4.eb:
    env:
      ENV_VAR1: foo
  • how to add missing extensions?
- R-4.0.0-foss-2020a.eb:
    eb_args:
      skip: 1   # to install missing extensions that may have been added to installations
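
A rough sketch of how such a per-architecture list could be turned into `eb` invocations. To stay self-contained it mirrors the example entries as plain Python data instead of parsing YAML; the dictionary keys (`easyconfig`, `env`, `eb_args`) and the function name `plan_installations` are assumptions made for this illustration, not an agreed format.

```python
# The example stack entries, written as plain Python data.
# Assumption: the key names below are illustrative only.
STACK = [
    {"easyconfig": "GROMACS-2020-foss-2020a.eb"},
    {"easyconfig": "TensorFlow-2.2.0-foss-2019b-Python-3.7.4.eb",
     "env": {"ENV_VAR1": "foo"}},
    {"easyconfig": "R-4.0.0-foss-2020a.eb",
     "eb_args": {"skip": 1}},
]


def plan_installations(stack):
    """Turn stack entries into (extra_env, argv) pairs, one per easyconfig."""
    commands = []
    for entry in stack:
        argv = ["eb", entry["easyconfig"]]
        for option, value in entry.get("eb_args", {}).items():
            if value in (True, 1):
                argv.append(f"--{option}")           # bare flag, e.g. --skip
            else:
                argv.append(f"--{option}={value}")   # option with a value
        commands.append((dict(entry.get("env", {})), argv))
    return commands


if __name__ == "__main__":
    for extra_env, argv in plan_installations(STACK):
        print(extra_env, " ".join(argv))
```

Each pair could then be handed to something like `subprocess.run(argv, env={**os.environ, **extra_env})` by whatever automation ends up performing the installations.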

Testing

  • collection of tests to ensure robustness of software stack w.r.t. changes, updates, etc.
  • ReFrame as driver
  • tests should be easy to define: simple shell script that sets up environment, runs test and produces proper exit code
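
To illustrate the "simple shell script with a proper exit code" contract, here is a minimal stand-in runner in plain Python. It is not ReFrame and does not use ReFrame's API; it only shows that a test passes exactly when its script exits with status 0. The function name `run_tests` and the inline example scripts are hypothetical.

```python
import subprocess


def run_tests(scripts):
    """Run each shell snippet with `sh -c`; a test passes on exit code 0.

    `scripts` maps a test name to the shell code that sets up the
    environment and runs the check.
    """
    results = {}
    for name, script in scripts.items():
        proc = subprocess.run(["sh", "-c", script],
                              capture_output=True, text=True)
        results[name] = proc.returncode == 0
    return results


if __name__ == "__main__":
    outcome = run_tests({
        "always_passes": "true",
        "always_fails": "echo 'something broke' >&2; exit 3",
    })
    for name, passed in outcome.items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Keeping the contract this small means the same scripts can later be wrapped by ReFrame without changes to the tests themselves.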

Workflow

  • for now, stick to pilot repo
  • everything via PRs, never merge your own PR
    • two-pairs-of-eyes rule
    • enforced by GitHub configuration in repo
  • testing CVMFS repo vs production CVMFS repo
    • merged PRs to the test branch of the easybuild-layer GitHub repo trigger installation in the test CVMFS repo
    • 2nd party verifies installation, and then opens PR to production branch
      • testing preferably using provided test scripts
      • also performance?
  • installations into CVMFS repos should be triggered automatically, no humans involved
  • policy w.r.t. making changes in existing software stack:
    • adding extensions (should be OK)
    • replacing broken installations (should be OK)
    • reinstalling software
      • only when there's a very good reason
      • careful with common deps (Python, GCCcore) since this may affect lots of other installations
      • may warrant starting a new software stack "release" dir

Sources

  • separate CVMFS repo for sources?
    • only relevant when we have test + production CVMFS repos

Licensed software

  • special care is needed here
  • we are not allowed to redistribute Intel compilers, etc.
  • distributing runtime libraries required to run software installed with Intel compilers is fine (cfr. ComputeCanada)
  • motivation to make it easy to integrate local software stacks with EESSI stack

Starting point

  • pilot CVMFS repo
  • link Gentoo Prefix layer (use local install for now?)
    • $EESSI_PREFIX/gentoo/2020
    • define $EESSI_PREFIX to the location where you want to play around
    • all scripts we collect honor this prefix
    • easy to change later to /cvmfs/pilot.eessi-hpc.org/
  • target software stack:
    • OpenFOAM (MPI)
      • included examples
    • Python
    • TensorFlow CPU/GPU
      • PRACE benchmarks
    • bioinformatics pipeline
      • something COVID related?
      • phylogenetic trees
  • EasyBuild config (RPATH, etc.)
  • installation prefixes via archspec
  • init script (Python)
    • set up environment (module use)
    • automatic vs provide some control
    • /etc/profile.d/050_eessi-init.sh
    • /etc/profile.d/049_eessi-my-init.sh # customisations
  • limited set of CPU architectures + GPUs
    • ivybridge (SURF)
    • haswell (UGent, SURF)
    • skylake_avx512 (Xeon Gold; SURF, UGent)
    • ivybridge-kepler (SURF, VUB)
    • cascadelake-volta (UGent)
    • ivybridge-nvidia (fat GPU builds)
  • action points:
    • Kenneth: init script using archspec
    • Caspar: local Gentoo Prefix + some EasyBuild installations on top
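
A possible shape for the Python init script, as a sketch only. It assumes the architecture subdirectory has already been determined (for example from archspec's `archspec.cpu.host().name`), that modules live under `modules/all` as in EasyBuild's default layout, and that the fallback prefix is /cvmfs/pilot.eessi-hpc.org. The function name `init_lines` is hypothetical.

```python
import os


def init_lines(arch_path, environ=None):
    """Return shell lines that point the module tool at the right stack.

    `arch_path` is the architecture-specific subdirectory, for example
    "x86_64/intel/haswell/2020.06". $EESSI_PREFIX is honored when set;
    otherwise the pilot CVMFS repository is assumed.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("EESSI_PREFIX", "/cvmfs/pilot.eessi-hpc.org").rstrip("/")
    # Assumption: modules are installed in <prefix>/easybuild/<arch>/modules/all
    modules = f"{prefix}/easybuild/{arch_path}/modules/all"
    return [
        f"export EESSI_PREFIX={prefix}",
        f"module use {modules}",
    ]


if __name__ == "__main__":
    # Output is meant to be evaluated by the login shell, e.g. from a
    # script in /etc/profile.d/.
    print("\n".join(init_lines("x86_64/intel/haswell/2020.06")))
```

Because every path is derived from $EESSI_PREFIX, switching from a local playground to the CVMFS repository would only require changing that one variable.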