Jean-Noël Grad edited this page Dec 17, 2024 · 13 revisions

Proceedings of the 2024 ESPResSo meetings

2024-12-17

What to run where, and how long it takes

Hardware at the ICP:

  • 1152 cores and 36 GPUs on the Ant cluster
  • 500 cores and 40 GPUs on the HTCondor infrastructure

Benchmarks:

  • single-core performance: HTCondor cores are about 30% faster than Ant cores
  • multi-core performance: on HTCondor, performance stops improving beyond 4 cores for simulations with 10k particles and beyond 8 cores for 100k particles; for GPU simulations, 1 core is usually sufficient

Parallel performance update

  • use shared-memory parallelism at the node level (rather than distributed memory) to reduce the footprint of ghost-particle calculations
  • use struct-of-arrays for particle lists
  • LJ simulation prototype implemented using Kokkos+Cabana

Improve build system user experience

  • feature name change: CUDA -> ESPRESSO_CUDA, etc. (#4974)
  • reduce number of CMake configuration options, e.g. with a unified -D ESPRESSO_BUILD_WITH_ARCHS="scalar,avx2,cuda" option
  • unify Python classes for CPU, AVX2 and GPU kernels with an extra argument arch:
    • arch="cpu:auto": default value, maximal portability; selects the fastest kernel available on the current hardware
    • arch="cpu:avx2": selects the AVX2 kernels if supported by the hardware (Intel, AMD), otherwise raises an error
    • arch="cpu:neon": selects the Neon kernels if supported by the hardware (ARM), otherwise raises an error
    • arch="cpu:scalar": selects the scalar kernels (no vectorization, slowest)
    • arch="gpu:auto": selects the GPU kernels against which ESPResSo was built if a matching GPU is available, otherwise raises an error
    • arch="gpu:cuda": selects the CUDA GPU kernels if an Nvidia GPU is available, otherwise raises an error
    • arch="gpu:rocm": selects the ROCm GPU kernels if an AMD GPU is available, otherwise raises an error
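The proposed selection logic can be sketched as follows; the kernel registry, function name, and fallback order are hypothetical illustrations, only the arch option strings come from the proposal above.

```python
# sketch of arch-based kernel dispatch; pretend AVX2 hardware, no GPU
AVAILABLE_KERNELS = {"cpu:scalar", "cpu:avx2"}

def select_kernel(arch="cpu:auto"):
    if arch == "cpu:auto":
        # pick the fastest CPU kernel supported by the current hardware
        for candidate in ("cpu:avx2", "cpu:neon", "cpu:scalar"):
            if candidate in AVAILABLE_KERNELS:
                return candidate
    if arch in AVAILABLE_KERNELS:
        return arch
    # explicitly requested kernels fail loudly when unsupported
    raise RuntimeError(f"kernel '{arch}' is not available on this hardware")

print(select_kernel())              # fastest available kernel
print(select_kernel("cpu:scalar"))  # explicit request
```

An explicit request like select_kernel("gpu:cuda") raises an error on this mock hardware, mirroring the "otherwise raises an error" behavior listed above.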

2024-11-05

Multi-GPU LB

  • for a column system, where each MPI rank contains a cubic slice of the column, optimal performance is achieved by orienting the column main axis along the z-direction
  • for a cuboid system, slicing along the z-direction is also the optimal communication pattern, since slicing along 2 or 3 directions introduces communication with extra partners
  • the default MPI Cartesian topology in ESPResSo orders ranks in descending order; for multi-GPU LB, the user must manually set it to ascending order
  • due to padding of GPU fields, the memory footprint is minimized when the size of the rank-local LB domain in agrid units along the x-direction is an integer multiple of 64 (single-precision) or 32 (double-precision)
    • the formula is straightforward to derive and involves a step function
    • within each step, increasing the size along the x-direction is essentially free, since the new data replaces existing padding, until the next step in the curve is reached
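A rough sketch of that step function; the per-node field count and the assumption that the populations dominate the footprint are illustrative (e.g. 19 populations for a D3Q19 lattice), only the padding granularities (64 for single precision, 32 for double precision) come from the notes above.

```python
import math

def padded_lb_footprint(nx, ny, nz, single_precision=True, fields=19):
    """Estimate the rank-local LB memory footprint in bytes with x-padding.

    Hypothetical model: each field's x-extent is padded to a multiple of
    64 (single precision) or 32 (double precision) lattice sites.
    """
    granularity = 64 if single_precision else 32
    bytes_per_value = 4 if single_precision else 8
    padded_nx = math.ceil(nx / granularity) * granularity
    return padded_nx * ny * nz * fields * bytes_per_value

# within one step of the curve, growing nx from 65 to 128 is free:
assert padded_lb_footprint(65, 32, 32) == padded_lb_footprint(128, 32, 32)
```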

2024-09-24

Summer school tutorials report

  • Alex: validated, one missing link to the user guide
  • JN: validated, one missing link to the book chapter
  • Julian: validated
  • Sam: still a work in progress
  • not present: Keerthi, David

LB performance improvements

  • on CPU, the development branch of ESPResSo outperforms the 4.2.2 release (#4921)
  • on GPU, GDRcopy is needed to remove a performance bottleneck in multi-GPU simulations

other topics

2024-08-13

Coding day

  • progress was made during and after the coding day on:
    • improving the script interface
    • introducing the ZnDraw visualizer in tutorials
    • fixing a corner case of the Lees-Edwards collision operator in LB
    • fixing regressions in the Python implementation of Monte Carlo
  • there is a recurring issue with the difficulty level of C++ tasks
  • the core team needs to improve onboarding of C++ developers

MetaTensor

  • MetaTensor integration in ESPResSo is challenging due to dependencies
  • need to find test cases based on current ML research done with ESPResSo

2024-07-23

Global variables progress report

  • now encapsulated: non-bonded and bonded interactions, collision detection, particle list, cluster structure analysis, OIF, IBM, auto-update accumulators, constraints, MPI-IO, MMM1D (#4950)
  • new API: several features now take an ESPResSo system as argument: Cluster Structure, MPI-IO
  • in the future, more features will take a system or particle slice as argument, e.g. Observables (#4954)

New propagation API

  • system.thermostat and system.integrator will be removed in favor of system.propagation
  • possible API: JSON-like data structure
    • easy to read from a parameter file
    • conveys the hierarchical nature of mixed propagation modes
    • avoids the ambiguity of similarly named parameters, e.g. "gamma" for both Langevin and Brownian, but "gamma0" and "gammaV" for NpT
    system.propagation.set(kT=1.,
                           translation={"Langevin": {"gamma": 1.}, "LB": {"gamma": 2.}},
                           rotation={"Euler": {}})
  • more details in #4953 and in an upcoming announcement on the mailing list

Coding day

  • Tuesday, August 6, 2024

2024-07-03

Migration to C++20 and CUDA 12

  • see mailing list for more details
  • version requirements of many dependencies were updated

Porting to ARM A64FX

2024-06-04

Multi-GPU support

  • experimental support for multi-GPU LB is underway
    • requires a suitable CUDA-aware MPI library
    • for now, use one GPU device per MPI rank
    • long-term plan: use one GPU device and multiple OpenMP threads per MPI rank
  • planned removal/replacement of the GPU implementations of long-range solvers

ZnVis new features

  • vector field visualization (LB velocities)
  • bacterial growth simulation (non-constant number of particles)
  • raytracing of porous media
  • red blood cell transport in capillary blood vessels

2024-05-14

GPU LB performance improvements

  • LB GPU now works in parallel
  • CUDA-aware MPI is still a work in progress
  • work on multi-GPU support has just started
  • long-term plan: multi-GPU support with 1 GPU per MPI rank and multiple shared memory threads per MPI rank via OpenMP

Multi-system ESPResSo simulations

  • multi-system simulations are now possible for almost all features
    • caveats: two systems cannot contain particles with the same particle ids; Monte Carlo is not yet supported
  • can be enabled with a one-liner change to system.py (see last commit in jngrad/multiverse)

2024-04-23

Planned work with Cabana

  • convert particle cells from AoS to SoA (#4754), i.e. one array per particle property
  • improves cache locality and enables CPU optimizations such as vectorization
  • use Cabana to hide optimizations
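The AoS-to-SoA conversion can be pictured with NumPy structured arrays; the property names and counts here are illustrative, not the actual ESPResSo particle layout.

```python
import numpy as np

# AoS: one record per particle; all properties of a particle are adjacent
# in memory, so a kernel reading only positions also loads forces.
aos = np.zeros(4, dtype=[("pos", "f8", 3), ("force", "f8", 3)])

# SoA: one contiguous array per property; a kernel touching a single
# property streams through contiguous memory (better cache locality,
# easier vectorization).
soa = {"pos": np.zeros((4, 3)), "force": np.zeros((4, 3))}

# both layouts hold the same data
aos["pos"][2] = [1.0, 2.0, 3.0]
soa["pos"][2] = [1.0, 2.0, 3.0]
assert np.array_equal(aos["pos"], soa["pos"])
```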

New ESPResSo requirements

  • bump all version requirements (#4905)
  • on ICP workstations, only the formatting and linter tools need to be updated: pip3 install -r requirements.txt autopep8 pycodestyle pylint pre-commit

2024-04-02

Implementing pressure waves in LB

  • ModEMUS project: nanoparticle diffusion in hydrogel network, by Pablo Blanco (NTNU) (see Ma et al. 2018)
  • currently implemented with Langevin, plan is to use LB instead to improve accuracy
  • ultrasound streaming could be modeled with a gradient pressure via pressure boundary conditions

GPU LB

  • GPU LB with particle coupling implemented in #4734
  • requires CUDA>=12.0 to make double-precision atomicAdd() available
  • performance degrades when more than 1 MPI rank communicates with the same GPU; need to look into CUDA-aware MPI

EESSI

2024-02-21

Bee 2.0 cluster

  • Thursday: HPC team meeting to discuss software stack
  • Monday: set up the software stack
  • Tuesday: online meeting with the company

MultiXscale review highlights

  • main objectives of the CoE MultiXscale:
    • EESSI: "app store" for scientific software
    • multiscale simulations with 3 pilot cases: helicopter blades turbulent flow, ultrasound imaging of living tissues, energy storage devices
    • make software pre-exascale ready
    • training on using this software
  • ongoing projects for the ICP:
    • improve scaling efficiency of ESPResSo: LJ simulations have only 50% efficiency at 1024 cores
    • teaching people how to use this software: CECAM Flagship Schools 1229 and 1324

Interactive generative modeling

LB boundaries bug report

  • LB boundaries in the waLBerla version of ESPResSo are broken when using 2 or more MPI ranks
  • the LB ghost layer doesn't contain any information about boundaries, so fluid can flow out into the neighboring LB domain, where it gets trapped in the boundary
  • more details can be found in the bug report #4859
  • the solution will require a ghost update after every call to a node/slice/shape boundary velocity setter function

2024-01-30

Propagation refactoring

  • it is now possible to choose the exact equations of motion solved for each particle
  • more combinations of integrators and thermostats are allowed
  • one can mix different types of virtual sites in the same simulation
  • a new Python interface needs to be designed (target for next meeting)

Software and hardware changes

  • CUDA 12.0 is now the default everywhere at the ICP
  • GTX 980 GPUs are being removed from ICP workstations (GTX 1000 and higher are needed for double-precision atomicAdd)
  • Python 3.12 changes the way we use the unittest module (#4852)
  • Python virtual environments become mandatory for pip install in Ubuntu 24.04
    • user guide needs to reflect that change

2024-01-09

New live visualizer

  • ZnDraw bridge is currently being developed for ESPResSo
  • supports live visualization in the web browser
  • the plan is to re-implement all features available in the ESPResSo OpenGL visualizer

Dropping CUDA 11 support

  • CUDA 12 will be the default in Ubuntu 24.04, due April 2024
  • CUDA 12 is required for C++20, which makes contributing to ESPResSo significantly easier (#4846)
  • compute clusters and supercomputers where ESPResSo is currently used already provide compiler toolchains compatible with CUDA 12