Releases: ROCm/aomp
AOMP Release 14.0-0
These are the release notes for AOMP 14.0-0. This release uses modifications to the LLVM development trunk called the "amd-stg-open" branch, which is found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges the upstream development trunk with its internal open development efforts. Some AMD modifications are experimental and/or under review for the LLVM mono-repo. The AOMP release is a snapshot of amd-stg-open and of the supporting repositories used to build the various components.
For AOMP 14.0-0, the last trunk commit is a0633f5ccb04e4b1613eeb23af10ad729dace2b5 on Nov 8. The last AMD-only commit is 8a48924725f0c53217d108b1d4b95f6ba0038031 on Nov 8. This forms a frozen branch now called "aomp-14.0-0". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-14.0-0 . The difference from the upstream LLVM trunk is found in the patch below. It is 35563 lines in 345 files, not including test directories.
Changes from aomp 13.0-6:
- AOMP is now based on amd-stg-open branch
- Most components are built from ROCm release 4.5 sources
- Components are now cloned using a manifest file. The script clone_aomp.sh is still used to clone and update repos.
- New hip build method
- Support for unified shared memory on gfx90a
- Support for atomic hint clause to enable fast floating point atomics
- Support for LLVM IR code generation with updated device RTL (deviceRTLs)
- Support for target ID with XNACK settings
- Support for cross-platform offload device identification LLVM library and tool (offload-arch).
- Fixed many problems with reductions and nested parallelism
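A target ID names a processor followed by optional colon-separated feature settings, for example gfx90a:xnack+ (XNACK on) or gfx90a:xnack- (XNACK off); a feature that is not mentioned means "any". The sketch below shows one way to split such a string. The type and function names are hypothetical, only the xnack and sramecc features are modeled, and this is an illustration of the format rather than the parser used by clang or AOMP.

```c
#include <string.h>

/* Hypothetical representation of one target-ID feature setting. */
typedef enum { FEATURE_ANY, FEATURE_ON, FEATURE_OFF } feature_setting;

typedef struct {
    char processor[32];      /* e.g. "gfx90a" */
    feature_setting xnack;   /* from ":xnack+" or ":xnack-" */
    feature_setting sramecc; /* from ":sramecc+" or ":sramecc-" */
} target_id;

/* Parse strings such as "gfx90a:xnack+" or "gfx908:sramecc-:xnack-".
   Returns 0 on success, -1 on malformed input or an unknown feature. */
static int parse_target_id(const char *s, target_id *out) {
    const char *colon = strchr(s, ':');
    size_t len = colon ? (size_t)(colon - s) : strlen(s);
    if (len == 0 || len >= sizeof out->processor)
        return -1;
    memcpy(out->processor, s, len);
    out->processor[len] = '\0';
    out->xnack = FEATURE_ANY;   /* unmentioned features mean "any" */
    out->sramecc = FEATURE_ANY;

    while (colon) {
        const char *feat = colon + 1;
        colon = strchr(feat, ':');
        size_t flen = colon ? (size_t)(colon - feat) : strlen(feat);
        if (flen < 2)
            return -1;
        char sign = feat[flen - 1];  /* trailing '+' or '-' */
        if (sign != '+' && sign != '-')
            return -1;
        feature_setting v = (sign == '+') ? FEATURE_ON : FEATURE_OFF;
        size_t nlen = flen - 1;      /* length of the feature name */
        if (nlen == 5 && strncmp(feat, "xnack", 5) == 0)
            out->xnack = v;
        else if (nlen == 7 && strncmp(feat, "sramecc", 7) == 0)
            out->sramecc = v;
        else
            return -1;
    }
    return 0;
}
```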
Known Issues:
- Slow CPU device-to-host data transfer speeds
- Miniqmc, Kokkos, Raja fail to build
- Non-deterministic failures in qmcpack deterministic tests
- Possible incorrect linking of libclang-cpp.so in the build of libomptarget.so
Check later for more updates...
AOMP Release 13.0-6
These are the release notes for AOMP_13.0-6. The source code base for this release is the upstream LLVM 13 monorepo main branch as of April 1, 2021 with hash value 0889181625bb570e463362ab8f53f9a14c886b2e. Updates from the amd-stg-open repo were added as one commit per changed file as of April 6, 2021.
Update to ROCm 4.3 sources.
Flang cmake race condition fixed.
AOMP Release 13.0-5
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
These are the release notes for AOMP_13.0-5. The source code base for this release is the upstream LLVM 13 monorepo main branch as of April 1, 2021 with hash value 0889181625bb570e463362ab8f53f9a14c886b2e. Updates from the amd-stg-open repo were added as one commit per changed file as of April 6, 2021.
This release includes a demo of a new LLVM library called libLLVMOffloadArch. The clang tool offload-arch is now built with this library. The libomptarget runtime no longer calls the binary "offload-arch -c" and captures its stdout. Instead, a library call is made to libLLVMOffloadArch to determine current capabilities. The tool offload-arch is still created with the llvm build; its sources are in llvm-project/llvm/lib/OffloadArch/offload-arch. Updates were made so that offload-arch returns the first visible GPU; for AMD GPUs, which GPU that is can depend on the setting of ROCM_VISIBLE_DEVICES.
This release starts to deprecate the use of mygpu in favor of offload-arch. A new version of mygpu calls offload-arch, and the tables used to drive mygpu have been deleted. All PCI ID tables used for offload device identification are now in the LLVM library OffloadArch.
Added a new command-line option, -offload-usm, which turns on the OpenMP directive requires unified_shared_memory and sets toolchain flags appropriately. This saves having to change every source file to turn on unified shared memory.
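A minimal sketch of what the directive enables, assuming a GPU and runtime that support unified shared memory: a target region that works on a plain malloc'ed array without any map clause. The function names are hypothetical. According to the note above, compiling with -offload-usm should make the requires line unnecessary; without -fopenmp the pragmas are ignored and the loop runs on the host.

```c
#include <stdlib.h>

/* Request unified shared memory for this translation unit. */
#pragma omp requires unified_shared_memory

/* Allocate an ordinary host array holding 0, 1, ..., n-1 (NULL on failure). */
static double *make_ramp(int n) {
    double *a = malloc((size_t)n * sizeof *a);
    if (a != NULL)
        for (int i = 0; i < n; i++)
            a[i] = (double)i;
    return a;
}

/* Scale the array inside a target region. There is deliberately no map()
   clause: with unified shared memory the device dereferences the host
   pointer directly. */
static void scale_in_place(double *a, int n, double factor) {
    #pragma omp target teams distribute parallel for
    for (int i = 0; i < n; i++)
        a[i] *= factor;
}
```

Releasing the array afterwards is ordinary host code (free(a)); no target exit data directive is involved.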
Build changes:
- Update list of gfx names to include gfx1030 and gfx1031
Known Issues:
- 9 Clang lit test failures
- Long build times when large numbers of archive libraries are needed, because the toolchain must unbundle the archive for device linking.
AOMP Release 13.0-4
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
These are the release notes for AOMP_13.0-4. The source code base for this release is the upstream LLVM 13 monorepo main branch as of April 1, 2021 with hash value 0889181625bb570e463362ab8f53f9a14c886b2e. Updates from the amd-stg-open repo were added as one commit per changed file as of April 6, 2021. This release is primarily a fix for the regressions in 13.0-3, which had significant driver changes to support multiple images that caused a number of regressions. We strongly recommend deleting 13.0-3 and using this release.
Features:
- Support larger CU masks, up to 128 bits (that is, up to 128 CUs).
- Provide a warning, when HIP code tries to use OpenMP offloading, that this is not supported and target constructs will be ignored.
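To make the 128-bit limit concrete: a CU mask with one bit per compute unit no longer fits in a single 64-bit word once a GPU has more than 64 CUs. The sketch below stores the mask as two 64-bit words. The type and helpers are hypothetical and do not reproduce the ROCm or AOMP interface for setting CU masks.

```c
#include <stdint.h>

/* Hypothetical 128-bit CU mask: bit i selects CU i, so two 64-bit words
   cover CUs 0..127. */
typedef struct {
    uint64_t word[2];
} cu_mask128;

/* Enable CU `cu` (0..127); out-of-range values are ignored. */
static void cu_mask_set(cu_mask128 *m, unsigned cu) {
    if (cu < 128)
        m->word[cu / 64] |= (uint64_t)1 << (cu % 64);
}

/* Return nonzero if CU `cu` is enabled. */
static int cu_mask_test(const cu_mask128 *m, unsigned cu) {
    return cu < 128 && ((m->word[cu / 64] >> (cu % 64)) & 1u);
}

/* Count enabled CUs; x &= x - 1 clears the lowest set bit each step. */
static unsigned cu_mask_count(const cu_mask128 *m) {
    unsigned n = 0;
    for (int w = 0; w < 2; w++)
        for (uint64_t x = m->word[w]; x != 0; x &= x - 1)
            n++;
    return n;
}
```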
Fixes:
- Fixed examples/cloc/vector_copy_hip and vector_copy_hip_omp to use HIP_PLATFORM_AMD instead of the deprecated HIP_PLATFORM_HCC.
- Fixed examples/hip/device_lib to unbundle the OpenMP library, since the toolchain is looking for the HIP library.
- Fixed examples/hip/writeIndex.
- Fixed test/hip-openmp/aomp_hip_launch_test: a bug in driver.cpp and a new name for the a.out file.
- Fixed test/hip-openmp/hip_host_register.
- Fixed hipcc when using noroot installs.
- Fixed host compilation picking up the wrong libomp.so on some systems.
- Fixed RPATH to include lib64.
- Fixed RAJA build issue.
- Fixed an issue where targetID was not handled properly with -march/-fopenmp-targets.
Build changes:
- Update list of gfx names to include gfx90a
- Update list of NVPTX GPU names to: 30, 35, 37, 50, 52, 53, 60, 61, 62
Known Issues:
- 9 Clang lit test failures
- Long build times when large numbers of archive libraries are needed, because the toolchain must unbundle the archive for device linking.
AOMP Release 13.0-3
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
These are the release notes for AOMP_13.0-3. The source code base for this release is the upstream LLVM 13 monorepo main development branch as of April 1, 2021 with hash value 0889181625bb570e463362ab8f53f9a14c886b2e. Updates from the AMD public repository "amd-stg-open" were added to this base with one commit per file difference as of April 6, 2021. Log messages from amd-stg-open were harvested and merged into the per-file commits. Then AOMP development commits from previous releases of AOMP that were not yet upstreamed or merged with amd-stg-open were cherry-picked to start the development of AOMP 13.0-3. During development, some upstream (trunk) commits were merged to resolve issues. These are visible in the llvm-project commit log for AOMP 13.0-3.
This release of AOMP has significant enhancements for multi-image support and target-id. Much of this work is not yet committed upstream, and some of it is not yet tested and released through the ROCm compiler.
The AOMP compiler is a standalone, complete source build of the compiler and supporting components. The sources for the supporting components were taken from the ROCm 4.2 public sources.
Features:
- Support for the --offload-arch option. Both the cuda and hip offload kinds already support the --offload-arch flag. One can think of it as an alias for -fopenmp-targets=TRIPLE -Xopenmp-target=TRIPLE -march=GPUNAME, which greatly simplifies command-line specification for OpenMP offloading: one only needs to specify --offload-arch=GPUNAME.
- A new cross-architecture clang tool called 'offload-arch' determines the GPUNAME of an active system with a supported GPU card. It has a number of options; with no options, it prints the GPUNAME to stdout. So, assuming $AOMP/bin is in your $PATH environment variable, this command could be used to compile foo.c for OpenMP offloading: clang -fopenmp --offload-arch=`offload-arch` foo.c
- New runtime checking now looks at the available images and selects an image whose requirements are satisfied by the runtime capabilities. Requirements include the GPUNAME and special compilation features that create an image that only works when a system is operating with those special capabilities. These special requirements form what is known as target-id.
- Archive files on the command line are now checked to see whether they contain offloading code for the specified offload target; matching archives are then used to link the GPU image.
- Support for Ubuntu 16.04 was dropped
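The image selection described above can be pictured as a compatibility test between each embedded image and the running system. The sketch below assumes a simplified model with hypothetical names: an image records its GPUNAME and an XNACK requirement (on, off, or any), and the first image that matches the system is chosen. The real runtime may weigh candidates differently.

```c
#include <string.h>

/* Hypothetical model: an image may require XNACK on, off, or not care. */
typedef enum { XNACK_ANY, XNACK_ON, XNACK_OFF } xnack_req;

typedef struct {
    const char *processor; /* GPUNAME the image was compiled for */
    xnack_req xnack;       /* requirement recorded in the image's target ID */
} image_desc;

/* An image is usable when its GPUNAME matches and every feature it pins
   (here only XNACK) agrees with the running system. */
static int image_is_compatible(const image_desc *img, const char *sys_processor,
                               int sys_xnack_enabled) {
    if (strcmp(img->processor, sys_processor) != 0)
        return 0;
    if (img->xnack == XNACK_ANY)
        return 1;
    return (img->xnack == XNACK_ON) == (sys_xnack_enabled != 0);
}

/* Return the index of the first compatible image, or -1 if none qualifies. */
static int select_image(const image_desc *imgs, int n, const char *sys_processor,
                        int sys_xnack_enabled) {
    for (int i = 0; i < n; i++)
        if (image_is_compatible(&imgs[i], sys_processor, sys_xnack_enabled))
            return i;
    return -1;
}
```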
Fixes:
- [flang] Resolved: cannot compile modules with private allocatable structs.
- Removed invalid header definitions for log2 affecting Kokkos.
- QMCPACK unit tests now at 100% pass rate.
- Move hostcall required test to rtl, fix minor race.
- CentOS 7 system headers declare the math functions isnan/isinf with various return types, which caused conflicts with the clang headers. The clang headers now use OpenMP variants to resolve the issue.
System Build
- Moved the construction of the offloading libm from aomp-extras to openmp/libomptarget/deviceRTLs/libm. Currently the libm device library is only needed for FORTRAN, because the clang system headers have architecture-specific definitions of math functions for C and C++. The libm build produces bc files for GPU linking in $AOMP/lib/libdevice/libm--.bc . These are built from the clang system headers by setting BUILD_MATH_BUILTINS_LIB, which changes each static definition into an external definition.
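The static-to-external switch can be sketched with a linkage macro. Assuming a header whose definitions are normally static inline (a private copy in every including file), defining BUILD_MATH_BUILTINS_LIB turns them into external definitions that can be compiled once into a linkable library. The macro name MATH_DEF and the function example_cube are hypothetical; the real clang headers use their own macros.

```c
/* Default: static inline, private to each translation unit.
   With -DBUILD_MATH_BUILTINS_LIB: external linkage, so compiling this
   source once yields exported definitions for a library. */
#ifdef BUILD_MATH_BUILTINS_LIB
#define MATH_DEF /* external definition */
#else
#define MATH_DEF static inline
#endif

MATH_DEF double example_cube(double x) {
    return x * x * x;
}
```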
Known issues to be resolved in 13.0-4
- aomp/examples failures:
  - cloc/vector_copy_hip
  - cloc/vector_copy_hip_omp
  - hip/device_lib
  - hip/writeIndex
- hip-openmp failures:
  - aomp_hip_launch_test
  - hip_host_register
- 36 Clang lit tests failing
- raja build failure
- The -march option used with -fopenmp-targets fails to handle targetID in some cases (--offload-arch should work fine)
- flang driver hardwires to amd triple if the --offload-arch= option is used.
- flang driver needs to handle target-id flags such as xnack.
AOMP Release 13.0-2
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
This release requires at minimum the rock-dkms from ROCm 4.1.
- Rebase all changes onto the current LLVM development trunk, which is the development of LLVM 13.
- Set the default number of teams to 4 times the number of compute units (CUs) to improve GPU occupancy
- Support for adjusting the number of threads in a team based on the VGPR usage of a kernel
- Enhanced the kernel trace (enabled when LIBOMPTARGET_KERNEL_TRACE is set) with register usage information. The trace already includes the requested and actual numbers of teams and threads used for a kernel.
- Updated ROCm components to 4.1.x branches.
- Code object v4 is now the default
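The default team count is simple arithmetic: a GPU with 120 CUs would get 4 × 120 = 480 teams. The sketch below uses hypothetical helper names; treating an explicit positive team request as taking precedence over the default is an assumption about usage, not a description of the runtime's code.

```c
/* Default described above: four teams per compute unit (CU).
   The fallback of 1 for a nonpositive CU count is only a guard. */
static int default_num_teams(int num_cus) {
    return num_cus > 0 ? 4 * num_cus : 1;
}

/* Assumed precedence: a positive requested count wins over the default. */
static int effective_num_teams(int requested, int num_cus) {
    return requested > 0 ? requested : default_num_teams(num_cus);
}
```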
Known Issues:
- Regression with QMCPACK deterministic tests
- pow(double, int) is returning -inf
AOMP Release 11.12-0
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
These are the release notes for AOMP_11.12-0.
- Add new tests from openlibm testing. These test the libm functions used for offloading in either an OpenMP target region or a HIP kernel, checking accuracy and exception situations.
- Integration of roctracer and rocprofiler.
- Integration of changes to support HPC Toolkit.
- Fix for Devito. This handles lvalue references, which are typically pointer-to-pointer kernel args.
- Support for fprintf.
- Fix a latent race between host runtime and devicertl.
- Initial support for GPU malloc and free. The device RTL needs malloc and free internally for nested parallelism, which is used in two of our applications, possibly by accident. The initial implementation of malloc and free was broken. This change uses a malloc and free implemented with hostrpc, which is very slow. Eventually we will develop the malloc and free hostrpc stubs to use a smart heap allocation scheme and only go to the host when the heap is consumed. Until a smarter malloc and free are implemented, this change will slow down the affected applications (snap). This change just removes the broken malloc and free and improves the device memory footprint.
- Increase detail of debug printing controlled by LIBOMPTARGET_KERNEL_TRACE environment variable.
- Support for Ubuntu 20.04.
- Move to ROCm 3.10 sources.
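The planned smart heap can be sketched as a bump allocator: requests are carved from a preallocated arena, and only when the arena is exhausted does allocation fall back to the slow path that goes to the host. In this sketch, plain host malloc stands in for the hostrpc call, the arena size and all names are hypothetical, and the code is single-threaded, unlike a real device allocator.

```c
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BYTES 1024 /* hypothetical heap size */
#define ARENA_ALIGN 16   /* every block is rounded up to this many bytes */

static _Alignas(ARENA_ALIGN) unsigned char arena[ARENA_BYTES];
static size_t arena_used;        /* bytes already handed out from the arena */
static unsigned slow_path_calls; /* how often we had to "go to the host" */

/* Stand-in for the slow hostrpc-based allocation on the host. */
static void *slow_host_alloc(size_t n) {
    slow_path_calls++;
    return malloc(n);
}

/* Fast path: bump-allocate from the arena. Slow path: taken only when the
   arena cannot satisfy the request. */
static void *device_malloc(size_t n) {
    size_t rounded = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (rounded >= n && rounded <= ARENA_BYTES - arena_used) {
        void *p = &arena[arena_used];
        arena_used += rounded;
        return p;
    }
    return slow_host_alloc(n);
}

/* Arena blocks are not reclaimed individually (a bump allocator can only be
   reset as a whole); blocks from the slow path are released normally. */
static void device_free(void *p) {
    uintptr_t u = (uintptr_t)p, lo = (uintptr_t)arena;
    if (u >= lo && u < lo + ARENA_BYTES)
        return;
    free(p);
}
```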
AOMP Release 11.11-2
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
- Move to ROCm 3.9 sources.
- Removal of hsa install directory. This required minor changes to build scripts of components that referenced hsa/include and hsa/lib.
- Minor build script updates including fix for lit checking on non-production builds.
- Support multiple devices with malloc and hostrpc
- Tested A+A (AOMP11 and AOCC2.3) with the smoke test lto_teams. This test case shows how to plug the AOCC linker, which provides proprietary CPU link-time optimizations only available in AOCC, into the AOMP compiler. It relies on a clang driver option to specify an alternative linker. The Makefile in this test case was updated to check for an LLVM 12 compiler and use the correct clang option for the alternative linker. In LLVM 11 and earlier, the alternative linker option was -fuse-ld=. In LLVM 12 and later it is --ld-path=.
- Improve hostrpc version check
- Added max reduction offload feature to flang
- Developer documentation updates including some cleanup of the source-build prerequisites. We now require a specific version of cmake built from source. We also require the mesa-common-dev package needed by rocclr (vdi).
- This release has packages for ppc64le. These packages have not been tested. Some fixes were applied to flang to support building on ppc64le. The rocclr (vdi) component is very x86_64-specific, so it would not build on ppc64le. Since HIP and OpenCL depend on rocclr, these components were not built for ppc64le. Only OpenMP should work on ppc64le.
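The version check in the lto_teams Makefile amounts to choosing one of two clang driver options. The C sketch below mirrors that choice; the helper names are hypothetical, and the real test case makes the decision in its Makefile rather than in code.

```c
#include <string.h>

/* LLVM 11 and earlier: -fuse-ld=   LLVM 12 and later: --ld-path= */
static const char *alt_linker_option(int llvm_major) {
    return llvm_major >= 12 ? "--ld-path=" : "-fuse-ld=";
}

/* Build "<option><linker_path>" in buf. Returns 0 on success, -1 if the
   result (including the terminating NUL) would not fit in size bytes. */
static int format_linker_arg(char *buf, size_t size, int llvm_major,
                             const char *linker_path) {
    const char *opt = alt_linker_option(llvm_major);
    size_t opt_len = strlen(opt), path_len = strlen(linker_path);
    if (opt_len + path_len + 1 > size)
        return -1;
    memcpy(buf, opt, opt_len);
    memcpy(buf + opt_len, linker_path, path_len + 1); /* copies the NUL too */
    return 0;
}
```

For example, with a hypothetical linker path, format_linker_arg(buf, sizeof buf, 13, "/opt/aocc/bin/ld.lld") produces "--ld-path=/opt/aocc/bin/ld.lld".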
AOMP Release 11.11-1
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
This is a minor bug fix to 11.11-0. These are the updates:
- Fix problems with device math functions being ambiguous, especially the pow function.
- Added smoke test called math_pow for checking all variants of pow and doing a sqrt(integer).
- New Makefile for KOKKOS example to properly clean up KOKKOS install
- Fix the slices.cpp and math_modf.cpp Makefiles so they call clang++. Using clang on a .cpp file causes the C++ personality not to be defined, which results in an unresolved reference at link time.
- Update shared variable linkage which further reduces differences between nvptx and amdgcn LLVM IR codegen.
AOMP Release 11.11-0
THIS IS AN OLD RELEASE. DO NOT DOWNLOAD. PLEASE DOWNLOAD THE LATEST RELEASE.
These are the release notes for AOMP_11.11-0. The source code base for this release is the upstream LLVM 11 monorepo release/11.x sources as of October 6, 2020 with hash value 176249bd6732a8044d457092ed932768724a6f06.
Major fixes to internal clang math headers:
- This set of changes applies to clang internal headers to support OpenMP C, C++, and FORTRAN, and HIP C. It establishes consistency between NVPTX and AMDGCN offloading and between OpenMP, HIP, and CUDA. OpenMP uses function variants and header overlays to define device versions of functions. This causes clang LLVM IR codegen to mangle the names of variants in both the definitions and the call sites of functions defined in the internal clang headers. These changes apply to headers found in the installation subdirectory lib/clang/11.0.0/include.
- These changes temporarily eliminate the use of the libm bitcode libraries for C and C++. Although math functions are now defined in internal clang headers, a bitcode library of the C functions defined in those headers is still built for FORTRAN toolchain linking, because FORTRAN cannot use C math headers. This bitcode library is installed in lib/libdevice/libm-.bc. The source build of this bitcode library is done with the aomp-extras repository and the component build script build_extras.sh. In the future, we will introduce across-the-board changes to eliminate massive header files for math libraries and replace them with linking to bitcode libraries.
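The OpenMP function-variant mechanism mentioned above looks like this at the source level: a declare variant directive (OpenMP 5.0) tells the compiler to call a specialized function instead of the base function when compiling for a matching context. The names below are hypothetical and this is not an excerpt from the clang headers; on a host build, the base version is the one called.

```c
/* Variant selected only when compiling for an AMDGCN device; a real header
   might call a GPU-specific builtin here. */
static double halve_amdgcn(double x) {
    return x * 0.5;
}

/* Base version: used on the host and on devices that do not match.
   The directive must name a variant that is already declared. */
#pragma omp declare variant(halve_amdgcn) match(device = {arch(amdgcn)})
static double halve(double x) {
    return x / 2.0;
}
```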
Usability updates:
- Add support for -gpubnames in Flang Driver
Performance updates:
- Runtime performance improvements for synchronous memory copy between host and device.
- Added a performant "integer to integer" pow function for c++ that does not convert to float.
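An integer-to-integer pow can avoid floating point entirely with exponentiation by squaring, which needs O(log e) multiplications for exponent e. The sketch below illustrates the technique in C; it is not the AOMP implementation, and the name ipow is hypothetical.

```c
/* Exponentiation by squaring: multiply in the current power of the base for
   every set bit of the exponent. No floating-point conversion is involved.
   Work is done in unsigned long long so overflow wraps instead of being
   undefined; the final conversion back to long long assumes a
   two's-complement target. */
static long long ipow(long long base, unsigned int exp) {
    unsigned long long result = 1;
    unsigned long long b = (unsigned long long)base;
    while (exp != 0) {
        if (exp & 1u)
            result *= b; /* this bit contributes base^(2^k) */
        b *= b;          /* advance to base^(2^(k+1)) */
        exp >>= 1;
    }
    return (long long)result;
}
```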
Bug fixes:
- Fixed hostrpc cmake race condition in the build of openmp
- Add a fatal error if missing -Xopenmp-target or -march options when -fopenmp-targets is specified. However, we do forgive this requirement for offloading to host when there is only a single target and that target is the host.
- Fix a bug in the InstructionSimplify pass where a comparison of two constants of different sizes was encountered during optimization. This fixes issue #182, which was causing a kokkos build failure.
- Fix OpenMP error message output for no_rocm_device_lib, which was asserting.
- Changed linkage on constant per-kernel symbols from external to weaklinkageonly to prevent duplicate symbols when building kokkos.
New Feature:
- Added an example category for Kokkos. The Kokkos example makefile detects if Kokkos is installed and builds Kokkos from the web if not. See the script kokkos_build.sh in the bin directory for how we build Kokkos. Kokkos now builds cleanly with the OpenMP backend. However, only simple test cases are working.
Development changes:
- Switch branch naming scheme to aomp11. It was amd-stg-openmp. This change applies to the repos amd-llvm-project, aomp-extras, and flang.
- Add environment variables ROCM_LLD_ARGS ROCM_LINK_ARGS ROCM_SELECT_ARGS to test driver options without compiler rebuild. This is not an upstream change.
Code reorganization in preparation for upstream:
- Rename libomptarget plugin from hsa to amdgpu following upstream.
- Merge hostrpc host and device library with libomptarget.
- Remove some memory management overhead in libomptarget plugin.
- Eliminate need for -lhostrpc during application link by merging with libomptarget.
- Remove multiple inessential differences from upstream across the toolchain.
Document changes:
- Moved prerequisites to independent file.
- Changed cmake instructions to recommend version 3.13.4.