
ARM Tech Preview (1.2)

Reese Baird edited this page Aug 23, 2017 · 2 revisions

Introduction

This page collects information and tips about the ARM Tech Preview included in the OpenHPC 1.2 release (November 2016). An updated version of this document for OpenHPC 1.3.1 is available here: https://github.com/openhpc/ohpc/wiki/ARM-Tech-Preview-(1.3.1)

The provided packages target 64-bit ARM server platforms. However, they are initially released as a Tech Preview because there are known issues with provisioning and with a subset of development libraries when exercised on the targeted OS distribution versions and tested hardware platforms.

The information on this page is intended to supplement the aarch64 OpenHPC Installation Guides for SLES-12-SP1 and CentOS-7.2. In particular, the Warewulf provisioning steps (most of sections 4.3 through 4.9) are not directly usable without additional modifications to the PXE boot configuration.

Known Package Issues

  • GSL: a small subset of the tests included with the GSL library failed precision-related checks. This is currently attributed to the GSL tests being tuned for x86, which supports 80-bit extended precision.
  • PAPI: hardware counters may not be available, depending on the underlying ARM platform.
  • MPI: the hardware available for this Tech Preview release was Ethernet only, and the provided MPI stacks reflect this test environment.
  • mpiP: has trouble collecting certain information in some scenarios, which causes it to fail integration tests.
  • Nagios and Ganglia: do not work on SLES-12-SP1 due to missing PHP5 dependencies.
  • Warewulf: the ARM Server Base Boot Requirements and Server Base System Architecture require specific UEFI support during the boot process, which does not appear to be compatible with the way Warewulf currently provisions worker nodes. There is a workaround, but it requires some manual intervention during installation and deployment of the nodes.

Distro Notes

SLES 12 SP1

aarch64 support in SLES 12 SP1 was a beta release, which has since been superseded by the commercial release of 64-bit ARM support in SLES 12 SP2. However, to match the x86_64 versions of the packages, we built and tested against this beta for the OpenHPC 1.2 release. If you were not already part of the SLES 12 SP1 beta, you may have trouble getting access to the necessary base OS installation media. We are in the process of validating whether these packages can be used on SLES 12 SP2 and hope to move to the commercially available release in the next version of OpenHPC.

CentOS 7.2

64-bit ARM support is available as an altarch variant of the CentOS 7.2 release. There is some information here, and ISO images are available here. You will also need access to certain EL7 packages to complete a full installation of OpenHPC; these can be found here.
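Before starting an installation, it may help to confirm that a node is actually running aarch64 with one of the two base distributions above. The following is a minimal sketch that assumes both distributions provide /etc/os-release with ID values of sles and centos (CentOS 7 reports only the major version, 7, in VERSION_ID, so the version match is deliberately loose). The function name check_base_os is hypothetical and not part of OpenHPC.

```sh
#!/bin/sh
# Hypothetical preflight check: is this an aarch64 host running SLES 12 or CentOS 7?
# Optional arguments (useful for testing): path to an os-release file, machine string.
check_base_os() {
    osrel=${1:-/etc/os-release}
    machine=${2:-$(uname -m)}

    if [ "$machine" != "aarch64" ]; then
        echo "unsupported architecture: $machine"
        return 1
    fi

    # os-release is a shell-compatible KEY=value file; source it inside the
    # command substitution (a subshell) so its variables do not leak out.
    id=$( . "$osrel" && printf '%s' "$ID" )
    ver=$( . "$osrel" && printf '%s' "$VERSION_ID" )

    case "$id:$ver" in
        sles:12*)  echo "ok: $id $ver" ;;
        centos:7*) echo "ok: $id $ver" ;;
        *)         echo "unsupported distribution: $id $ver"; return 1 ;;
    esac
}
```

Running `check_base_os` with no arguments checks the local machine; it prints a one-line verdict and returns non-zero if the host does not look like one of the tested configurations.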

Package Notes

PAPI

Across the three tested system configurations, we found differences in the availability of performance counters. Since all 64-bit ARM hardware has performance counters, and PAPI supports ARM64 performance counter hardware, this is likely a problem with either the kernel or the device tree passed to the kernel by the firmware. You can determine whether you have counter access on your platform's configuration by running papi_avail(1):

# module load papi
# papi_avail
  ...
Number Hardware Counters : 0  (Xgene Mustang)
  -- or --
Number Hardware Counters : 6  (Softiron Seattle)
  -- or --
Number Hardware Counters : 0  (Cavium ThunderX)
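If you need to run this check across many nodes, the "Number Hardware Counters" line can be extracted from the papi_avail output. This is a minimal sketch that assumes the line format shown above; the function names papi_counter_count and papi_has_counters are hypothetical. Both read papi_avail output on standard input, so they can be tried without PAPI installed.

```sh
#!/bin/sh
# Hypothetical helper: print N from a "Number Hardware Counters : N" line.
papi_counter_count() {
    sed -n 's/^[[:space:]]*Number Hardware Counters[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeeds only if papi_avail reported at least one hardware counter.
papi_has_counters() {
    n=$(papi_counter_count)
    [ -n "$n" ] && [ "$n" -gt 0 ]
}
```

For example: `module load papi; papi_avail | papi_has_counters && echo "counters available"`.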

Note also that while the ARM architecture specifies a core set of performance counters, many more may be available depending on the microarchitecture. We are working with the various silicon partners to make sure PAPI supports these. As we discover workarounds that enable additional counter support on various platforms, we will include them here.

Lustre

Lustre client support has been available for some time on both 32-bit and 64-bit ARM platforms. However, since different ARM platforms require different kernels than the standard ones found in the SLES-12-SP1 and CentOS-7.2 distributions, we could not easily build a Lustre client that would work across the various platform configurations. As ARM support in commercial distributions improves (as with the support now in SLES-12-SP2), this will become easier. For now, you will need to build your own kernel if you want Lustre support.

Hypre & Superlu_dist

While both of these libraries build, we discovered anomalies during testing that we have not yet been able to resolve. Once we have a workaround we will include it here.

MVAPICH2

The MVAPICH2 packages compile, but our testbeds had no InfiniBand hardware at the time of this release with which to validate the packages or any instructions relating to them. We are working with platform vendors to acquire sufficient hardware to test this in the future. If you have working InfiniBand support on your ARM platform, you may be able to get the existing libraries to work on your own.
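A quick first check for whether the kernel sees any InfiniBand (or other RDMA) devices is to look under /sys/class/infiniband, which the kernel's RDMA subsystem populates when a supported adapter driver is loaded. The sketch below is a hypothetical helper (ib_devices is not an OpenHPC command); note that RoCE-capable Ethernet adapters also appear in this directory, so a non-empty result does not by itself guarantee an InfiniBand fabric.

```sh
#!/bin/sh
# Hypothetical check: list RDMA devices exposed by the kernel.
# The sysfs directory can be overridden (e.g. for testing).
ib_devices() {
    ibdir=${1:-/sys/class/infiniband}
    [ -d "$ibdir" ] || return 1
    found=1
    for dev in "$ibdir"/*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        basename "$dev"
        found=0
    done
    return $found
}
```

If devices are listed, ibv_devinfo (from the libibverbs utilities) can then show port state in more detail before trying the MVAPICH2 packages.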

Nagios & Ganglia

SLES 12 SP1 did not contain the PHP5 packages required by Nagios and Ganglia; CentOS-7.2 works fine. You may be able to work around this by building PHP5 yourself, or by finding a compatible PHP5 package that can be installed on SLES 12 SP1.

Warewulf Provisioning

Network booting works differently on ARM platforms. ARM servers must all use UEFI firmware, so to network boot them you currently have to netboot a GRUB2 EFI image, which then fetches the kernel and RAM filesystem from the server over TFTP. It is also important to remember that current ARM servers may use a different kernel than the one provided by the distribution. The best chance of success is to take the kernel and modules already installed on the server and use them for network booting with a Warewulf-created ramdisk. Linaro provides basic PXE boot instructions, but they are specific to running on ARM emulators (see https://wiki.linaro.org/LEG/Engineering/Kernel/UEFI/UEFI_Network_Booting). Instructions specific to the OpenHPC test platforms follow.

Kernel and Drivers

Copy the kernel and modules from your current platform into the /warewulf/bootstrap directory and /lib/modules, respectively. You will need an Image kernel rather than a vmlinuz kernel; if your current platform does not have one, you will need to build it from source.
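To tell whether a given kernel file is a raw arm64 Image, you can check its header. This sketch assumes the arm64 Image header layout described in the Linux kernel's arm64 booting documentation, where the magic bytes "ARM\x64" (ASCII "ARMd") sit at offset 0x38 (56). It also reports gzip-compressed files, which start with the bytes 1f 8b and can be decompressed with zcat. The function name kernel_image_type is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: classify a kernel file as a raw arm64 Image,
# a gzip-compressed file, or unknown.
kernel_image_type() {
    kfile=$1
    # arm64 Image header: magic "ARM\x64" (ASCII "ARMd") at byte offset 56.
    magic=$(dd if="$kfile" bs=1 skip=56 count=4 2>/dev/null)
    if [ "$magic" = "ARMd" ]; then
        echo "Image"
        return 0
    fi
    # gzip streams begin with the two bytes 1f 8b.
    gz=$(od -A n -t x1 -N 2 "$kfile" | tr -d ' \n')
    if [ "$gz" = "1f8b" ]; then
        echo "gzip"
        return 0
    fi
    echo "unknown"
    return 1
}
```

For example, `kernel_image_type /boot/vmlinuz-$(uname -r)`. If it prints gzip, `zcat` on that file may yield an Image; run the check again on the result before copying it into the bootstrap directory.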

Obtain and configure a netbootable GRUB2 instance

Obtain a bootnetaa64.efi GRUB2 image from your distribution, or build one yourself, and place it in the TFTP directory (/srv/tftpboot/aarch64/ in the steps below). To build your own:

  1. Install grub2 packages:

    % rpm -ihv http://build.openhpc.community/home:/eric/SLE_12_SP1/aarch64/grub2-2.02~beta3-1.1.aarch64.rpm http://build.openhpc.community/home:/eric/SLE_12_SP1/aarch64/grub2-arm64-efi-2.02~beta3-1.1.aarch64.rpm

  2. Create a working grub2 EFI binary and copy it into /srv/tftpboot/aarch64/grub.efi:

    % grub2-mkimage -O arm64-efi -o grub.efi -p /aarch64/boot/grub2 $(ls /usr/lib/grub2/arm64-efi/*.mod | cut -d . -f 1)
    % cp grub.efi /srv/tftpboot/aarch64/grub.efi

  3. Edit /srv/tftpboot/aarch64/boot/grub2/grub.cfg and adjust the bootstrap path (and the netmask and network device, if they differ at your site) to match your Warewulf-generated setup:

    echo Now booting ${net_efinet0_hostname} with Warewulf bootstrap
    echo Loading kernel...
    linux (tftp)/warewulf/bootstrap/6/Image ro wwhostname=$net_efinet0_hostname quiet wwmaster=$net_default_server wwipaddr=$net_efinet0_ip wwnetmask=255.255.0.0 wwnetdev=eth0
    echo Done!
    echo Loading initrd...
    initrd (tftp)/warewulf/bootstrap/6/initfs.gz
    echo Done!
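Because the bootstrap number (6 above), netmask, and network device are site-specific, it can be convenient to generate grub.cfg from a few parameters instead of editing it by hand. The following hypothetical make_grub_cfg function is a sketch that reproduces the configuration above. It assumes the same (tftp)/warewulf/bootstrap/<N>/ layout and file names (Image, initfs.gz), and it escapes GRUB's net_* variables so they reach grub.cfg unexpanded.

```sh
#!/bin/sh
# Hypothetical generator for the GRUB2 netboot configuration above.
#   $1 = Warewulf bootstrap directory number (site-specific, required)
#   $2 = netmask passed to the Warewulf initramfs (default 255.255.0.0)
#   $3 = network device on the compute node (default eth0)
make_grub_cfg() {
    bootstrap=$1
    netmask=${2:-255.255.0.0}
    netdev=${3:-eth0}
    [ -n "$bootstrap" ] || { echo "usage: make_grub_cfg BOOTSTRAP [NETMASK] [NETDEV]" >&2; return 1; }
    # '\$' keeps GRUB's own variables literal; only our parameters expand.
    cat <<EOF
echo Now booting \${net_efinet0_hostname} with Warewulf bootstrap
echo Loading kernel...
linux (tftp)/warewulf/bootstrap/$bootstrap/Image ro wwhostname=\$net_efinet0_hostname quiet wwmaster=\$net_default_server wwipaddr=\$net_efinet0_ip wwnetmask=$netmask wwnetdev=$netdev
echo Done!
echo Loading initrd...
initrd (tftp)/warewulf/bootstrap/$bootstrap/initfs.gz
echo Done!
EOF
}
```

For example: `make_grub_cfg 6 > /srv/tftpboot/aarch64/boot/grub2/grub.cfg`.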

Point ARM clients at the GRUB2 binary on the TFTP server by adding the following stanza to dhcpd.conf:

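# Override for ARM servers: PXE clients whose vendor class reports
# architecture type 00011 (ARM 64-bit UEFI) are sent the GRUB2 EFI binary
if substring (option vendor-class-identifier, 15, 5) = "00011" {
    filename "aarch64/grub.efi";
}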
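The offset in this test works because PXE clients send a vendor class identifier such as PXEClient:Arch:00011:UNDI:003000, in which the prefix "PXEClient:Arch:" is exactly 15 characters long. The next five digits are the client architecture type, and 00011 (11) is ARM 64-bit UEFI in the IANA processor architecture type registry. The hypothetical helpers below mirror the dhcpd expression so that a captured vendor class string can be checked by hand. dhcpd's substring offset is zero-based while cut -c is one-based, hence characters 16 to 20.

```sh
#!/bin/sh
# Hypothetical helpers mirroring the dhcpd.conf test
#   substring(option vendor-class-identifier, 15, 5) = "00011"
# dhcpd offsets are zero-based; cut -c is one-based, so use columns 16-20.
pxe_arch_field() {
    printf '%s\n' "$1" | cut -c16-20
}

# Succeeds if the vendor class identifies an ARM 64-bit UEFI PXE client.
is_arm64_uefi_pxe() {
    [ "$(pxe_arch_field "$1")" = "00011" ]
}
```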

Configure your firmware to PXE boot

Currently, IPMI PXE boot commands do not appear to affect the UEFI boot configuration settings, so you will have to interrupt the boot on the serial console and configure PXE boot manually on each worker node. This is also a good time to record each node's hardware MAC address for DHCP and Warewulf if you do not already know it.
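The UEFI console may show a MAC address as 12 bare hex digits (for example inside a device path such as MAC(0019B9AABBCC,0x1)), whereas DHCP host entries and Warewulf node definitions normally use the colon-separated form. The hypothetical normalize_mac helper below is a sketch that converts either form, and it rejects anything that is not exactly 12 hex digits.

```sh
#!/bin/sh
# Hypothetical helper: normalize a MAC address copied from the UEFI console
# (e.g. 0019B9AABBCC or 00-19-B9-AA-BB-CC) to lower-case colon form.
normalize_mac() {
    hex=$(printf '%s' "$1" | tr -d ':.-' | tr 'ABCDEF' 'abcdef')
    # Reject non-hex characters and anything other than 12 digits.
    case "$hex" in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ ${#hex} -eq 12 ] || return 1
    # Insert a colon after every pair of digits, then drop the trailing one.
    printf '%s\n' "$hex" | sed 's/\(..\)/\1:/g; s/:$//'
}
```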

Platform Notes

  • Cavium ThunderX uArchitecture, armv8
      • ThunderX (version a1)
      • 2-socket, 48-core, 128 GB of memory
      • Linux version 4.4.21-64-default
      • Tested against SLES-12-SP1 install
      • EFI v2.40 by Cavium Thunder cn88xx EFI ThunderX-Firmware-Release-1.22.9-15-gcc66a09 Aug 4 2016 16:55:45
  • APM X-Gene uArchitecture, armv8
      • APM X-Gene-1
      • 1-socket, 8-core, 16 GB of memory
      • Linux version 4.4.11-reference.135.aarch64
      • Tested against CentOS-7.2 install
      • EFI v2.40 by X-Gene Mustang Board EFI Nov 24 2015 13:22:41
  • ARM Cortex-A57 uArchitecture, armv8
      • AMD Seattle Processor (Rev.B0)
      • 1-socket, 8-core, 16 GB of memory
      • Linux version 4.4.21-64-default
      • Tested against SLES-12-SP1 install
      • EFI v2.40 by American Megatrends
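When reporting results from another platform, it may help to collect the same kinds of details as listed above. The hypothetical platform_summary function below is a sketch that reads only standard Linux locations (uname, /proc/cpuinfo, /proc/meminfo, /sys/firmware/efi, /etc/os-release). The memory figure is rounded from the kernel's MemTotal, so it will read slightly below the installed capacity.

```sh
#!/bin/sh
# Hypothetical helper: print platform details comparable to the notes above.
platform_summary() {
    echo "Architecture: $(uname -m)"
    echo "Kernel: $(uname -r)"
    echo "Processors: $(grep -c '^processor' /proc/cpuinfo 2>/dev/null)"
    if [ -r /proc/meminfo ]; then
        # /proc/meminfo reports MemTotal in kB; convert to GB for readability.
        echo "Memory: $(awk '/^MemTotal:/ { printf "%.0f GB", $2 / 1024 / 1024 }' /proc/meminfo)"
    fi
    if [ -d /sys/firmware/efi ]; then
        echo "Firmware: booted via UEFI"
    else
        echo "Firmware: no /sys/firmware/efi (not booted via UEFI?)"
    fi
    if [ -r /etc/os-release ]; then
        echo "OS: $(. /etc/os-release && printf '%s' "$PRETTY_NAME")"
    fi
}
```

The firmware string listed for each platform above ("EFI v2.40 by ...") is normally printed in the kernel log at boot, so `dmesg | grep 'EFI v'` may recover it; reading the kernel log can require root.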

Frequently Asked Questions

Please feel free to email any questions related to this Tech Preview to the OpenHPC users mailing list (https://groups.io/g/openhpc-users). We will do our best to answer them and include the responses here so that others can benefit.
