Building LAMMPS-2Aug2023_update2-foss-2023a-kokkos.eb may fail for --optarch=GENERIC #545

Open
trz42 opened this issue Apr 16, 2024 · 1 comment
Labels
2023.06-software.eessi.io 2023.06 version of software.eessi.io aarch64 related to Arm 64-bit targets (aarch64) bug Something isn't working

Comments

@trz42
Collaborator

trz42 commented Apr 16, 2024

In NorESSI#323 building failed for aarch64/generic. The build job was run on a compute node with ThunderX2 CPU. kokkos_arch was not explicitly set.

If kokkos_arch is not explicitly set, the LAMMPS easyblock tries to determine the CPU architecture of the host. See https://github.com/easybuilders/easybuild-easyblocks/blob/develop/easybuild/easyblocks/l/lammps.py#L577-L595

    if kokkos_arch:
        if kokkos_arch not in KOKKOS_CPU_ARCH_LIST:
            warning_msg = "Specified CPU ARCH (%s) " % kokkos_arch
            warning_msg += "was not found in listed options [%s]." % KOKKOS_CPU_ARCH_LIST
            warning_msg += "Still might work though."
            print_warning(warning_msg)
        processor_arch = kokkos_arch

    else:
        warning_msg = "kokkos_arch not set. Trying to auto-detect CPU arch."
        print_warning(warning_msg)

        processor_arch = kokkos_cpu_mapping.get(get_cpu_arch())

        if not processor_arch:
            error_msg = "Couldn't determine CPU architecture, you need to set 'kokkos_arch' manually."
            raise EasyBuildError(error_msg)

        print_msg("Determined cpu arch: %s" % processor_arch)

It calls get_cpu_arch(), which queries archspec by running

    python -c 'from archspec.cpu import host; print(host())'

archspec returned thunderx2 in the PR for NESSI. Running the same command for EESSI currently returns neoverse_n1, because the compute node used to build for aarch64/generic on AWS has a neoverse_n1 CPU. The CPU architecture is then mapped via

    processor_arch = kokkos_cpu_mapping.get(get_cpu_arch())

to an architecture identifier used in Kokkos. This works for EESSI, because https://github.com/easybuilders/easybuild-easyblocks/pull/3036/files#diff-bdb538abf869738e5431974debc2503a1b160370b86938bfc02729de69d5689b dynamically adds a mapping for neoverse_n1. For thunderx2 such a mapping is missing.
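To make the failure mode concrete, here is a minimal sketch of the lookup, written so it runs without EasyBuild. The mapping table, its single entry and value, the exception class, and the function name `resolve_kokkos_arch` are all assumptions for illustration; the real table and logic live in lammps.py.

```python
# Illustrative stand-in for the easyblock's kokkos_cpu_mapping table.
# The single entry and its value are assumptions for this sketch only.
ILLUSTRATIVE_KOKKOS_CPU_MAPPING = {
    "neoverse_n1": "ARMV81",
}


class KokkosArchError(Exception):
    """Stand-in for EasyBuildError so the sketch runs without EasyBuild."""


def resolve_kokkos_arch(detected_cpu, kokkos_arch=None, mapping=None):
    """Return the Kokkos arch: an explicit setting wins, else map the detected CPU."""
    if mapping is None:
        mapping = ILLUSTRATIVE_KOKKOS_CPU_MAPPING

    # An explicitly set kokkos_arch bypasses host detection entirely.
    if kokkos_arch:
        return kokkos_arch

    # Otherwise the detected CPU name must be a key in the mapping.
    processor_arch = mapping.get(detected_cpu)
    if not processor_arch:
        raise KokkosArchError(
            "Couldn't determine CPU architecture for %r, "
            "you need to set 'kokkos_arch' manually." % detected_cpu
        )
    return processor_arch
```

With this stand-in table, a host detected as neoverse_n1 resolves, a host detected as thunderx2 raises, and passing kokkos_arch explicitly avoids the lookup in both cases.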

However, even if thunderx2 were mapped to the correct value (probably ARMV81), the resulting build would target the build host rather than the generic baseline, so the software may not function correctly on other aarch64/generic CPUs, for example a Raspberry Pi 3/4.

In NorESSI#323, we therefore opted to extend an existing parse_hook to set kokkos_arch to ARMV80 when we build for aarch64 and the build option optarch is set to GENERIC.
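A rough sketch of the decision that hook makes, kept free of EasyBuild imports so it can be run on its own. The function names, the plain dict standing in for the parsed easyconfig, and passing the CPU family and optarch as arguments are assumptions for illustration; the actual hook in NorESSI#323 reads these from EasyBuild itself.

```python
def pick_generic_kokkos_arch(software_name, cpu_family, optarch):
    """Return a Kokkos arch override for generic builds, or None to keep auto-detection."""
    if software_name != "LAMMPS":
        return None
    if optarch != "GENERIC":
        return None
    if cpu_family == "aarch64":
        # Baseline Armv8.0 target, as chosen in the issue for aarch64/generic.
        return "ARMV80"
    # x86_64/generic is deliberately left unset: the issue does not name a value.
    return None


def apply_kokkos_override(ec, cpu_family, optarch):
    """Set ec['kokkos_arch'] in place when a generic override applies.

    `ec` is a plain dict standing in for the parsed easyconfig (hypothetical).
    """
    override = pick_generic_kokkos_arch(ec["name"], cpu_family, optarch)
    if override is not None:
        ec["kokkos_arch"] = override
    return ec


# Example: a LAMMPS easyconfig parsed on an aarch64 host with --optarch=GENERIC
ec = apply_kokkos_override({"name": "LAMMPS"}, "aarch64", "GENERIC")
print(ec["kokkos_arch"])
```

Keeping the decision in a small pure function makes it easy to add further cases later without touching the hook body.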

The same explicit kokkos_arch setting may also be needed when building for x86_64/generic.

@ocaisa
Member

ocaisa commented May 17, 2024

Given that we have shifted to archdetect within EESSI, I agree: the implication here is that we should probably always set kokkos_arch for the generic cases (or indeed any case where we are not doing a native build).
