Releases: ROCm/aomp
rocm-5.4.2
ROCm release v5.4.2
rocm-5.4.1
ROCm release v5.4.1
AOMP Release 16.0-3
These are the release notes for AOMP 16.0-3. This release uses modifications to the LLVM development trunk called the "amd-stg-open" branch. This is found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. Some AMD modifications are experimental and/or under review for the LLVM upstream mono-repo. The AOMP release is a snapshot of amd-stg-open and supporting repositories to build various components.
For AOMP 16.0-3, the last trunk commit is 11e86868c1a1ee67a1d88ef84b68193d06dc996 on Nov 14, 2022. This is the 4th AOMP release for LLVM 16 development. The last amd-only commit is b642bb5cf84bbbdcc3e8748c5ceeb72c7bb07144 on Dec 2, 2022. This forms a frozen branch now called "aomp-16.0-3". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-16.0-3.
AOMP is a "standalone" build of all necessary ROCm components with the exception of the kernel module and libdrm. The non-llvm-project components for this release were built with ROCm 5.4.0 sources.
The changes from 16.0-2 to 16.0-3 include:
- Build includes gfx90c, gfx1035, and gfx1036.
- Fix to rocm_agent_enumerator to correctly identify gfx90c.
- Fix issue #435 "abs undefined within device block".
- More enhancements to xteam reductions.
- Ignore map clause option with USM.
- Additional support for OMPT functions "get_device_time" and "get_record_type".
- NUM_QUEUES_PER_DEVICE defaults to 1.
- Fixed clang-build-select-link to honor -fdisable-host-devmem.
- Fixed openmp lib-debug build overwriting release libraries/plugins.
- Updated cmake version to 3.22.1.
- Added Ubuntu 22.04 package.
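To illustrate the kind of code affected by the issue #435 fix, here is a minimal sketch that calls `std::abs` inside a target region. It is not the reproducer from the issue; the function name `sum_abs` and the data layout are hypothetical and chosen only for illustration. Without OpenMP enabled the pragma is ignored and the loop runs on the host, which gives the same result.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not part of AOMP): sums |v[i]| inside a target region.
long sum_abs(const std::vector<int>& v) {
  assert(!v.empty());
  const int* data = v.data();
  const int n = static_cast<int>(v.size());
  long total = 0;

  // std::abs is called in device code here, which is the situation
  // described by the issue title "abs undefined within device block".
  #pragma omp target teams distribute parallel for \
      reduction(+ : total) map(to : data[0:n]) map(tofrom : total)
  for (int i = 0; i < n; ++i) {
    total += std::abs(data[i]);
  }
  return total;
}
```

Calling `sum_abs({-3, 4, -5})` should yield 12 whether the loop is offloaded or executed on the host.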
Errata:
(potential regressions from 16.0-2):
- Smoke test failures:
clang-337336: Performance decrease, may cause the test to time out after 1 min. 16.0-2 showed 30-40 secs.
(potential regressions from 16.0-1):
- Smoke test failures (issue at -O0):
clang-ifaces: core dump (gfx908)
clang-337336: core dump (gfx908)
clang-325070: core dump (gfx908)
(potential regressions from 16.0-0):
- Performance decrease with lulesh
- Performance decrease with Nekbone (performance improved in 16.0-3, but still not at 16.0-0 levels.)
- Smoke test failures:
flang-315870: (resolved by building this test case with cov5)
managed_memory: segfault when 2+ devices are present
- Hip example failure:
device-lib
rocm-5.4.0
ROCm release v5.4.0
rocm-5.3.3
ROCm release v5.3.3
AOMP Release 16.0-2
These are the release notes for AOMP 16.0-2. This release uses modifications to the LLVM development trunk called the "amd-stg-open" branch. This is found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. Some AMD modifications are experimental and/or under review for the LLVM upstream mono-repo. The AOMP release is a snapshot of amd-stg-open and supporting repositories to build various components.
For AOMP 16.0-2, the last trunk commit is 0ccff030f3b4145bd658e362a63db9aae2942bee on Oct 15, 2022. This is the 3rd AOMP release for LLVM 16 development. The last amd-only commit is 863a830a66c8bdb5371e56030961449df24d5c48 on Nov 2, 2022. This forms a frozen branch now called "aomp-16.0-2". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-16.0-2. Currently, the amd-only content differs from the trunk by 62,480 lines in 459 files.
AOMP is a "standalone" build of all necessary ROCm components with the exception of the kernel module and libdrm. The non-llvm-project components for this release were built with ROCm 5.3.x sources.
The changes from 16.0-1 to 16.0-2 include:
- Dropped support for Ubuntu 18.04.
- Fix for early USM failure; OpenFOAM no longer hangs.
- Enhance xteam reductions (with codegen). Support for integer data types.
- Fix for double _Complex scalars in target region; test/smoke-fails/double_complex_scalar now works. This test will move to smoke in the next release.
- Support for target teams loop directive.
- Forced synchronous execution of regions can be controlled via OMPX_FORCE_SYNC_REGIONS.
- New environment variable GPU_MAX_HW_QUEUES controls number of HSA queues created, default is 4. This change actually occurred in 16.0-1 but was not listed in release notes.
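As a rough illustration of the `target teams loop` directive combined with an integer reduction, the following sketch computes a dot product. The function name `dot` and its signature are hypothetical. Whether a given reduction is lowered through the xteam reduction path is a compiler decision that this example does not control or verify. Without OpenMP enabled the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>

// Hypothetical helper (not part of AOMP): integer dot product written with
// the combined "target teams loop" construct and a sum reduction.
long long dot(const int* a, const int* b, int n) {
  assert(n >= 0);
  long long sum = 0;

  // The loop construct leaves the choice of how iterations are mapped to
  // teams and threads to the implementation.
  #pragma omp target teams loop reduction(+ : sum) \
      map(to : a[0:n], b[0:n]) map(tofrom : sum)
  for (int i = 0; i < n; ++i) {
    sum += static_cast<long long>(a[i]) * b[i];
  }
  return sum;
}
```

For `a = {1, 2, 3}` and `b = {4, 5, 6}`, `dot(a, b, 3)` should return 32.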
Errata:
(potential regressions from 16.0-1):
- Smoke test failures:
clang-ifaces: core dump (gfx908)
clang-337336: core dump (gfx908)
clang-325070: core dump (gfx908)
(potential regressions from 16.0-0):
- Performance decrease with lulesh
- Performance decrease with Nekbone
- Smoke test failures:
flang-315870: (resolved by building this test case with cov5)
managed_memory: segfault when 2+ devices are present
- Hip example failure:
device-lib
rocm-5.3.2
ROCm release v5.3.2
rocm-5.3.1
ROCm release v5.3.1
AOMP Release 16.0-1
These are the release notes for AOMP 16.0-1. This release uses modifications to the LLVM development trunk called the "amd-stg-open" branch. This is found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. Some AMD modifications are experimental and/or under review for the LLVM upstream mono-repo. The AOMP release is a snapshot of amd-stg-open and supporting repositories to build various components.
For AOMP 16.0-1, the last trunk commit is aa89f08afad7ee0581c39638abd8ee0df9ba1c65 on Oct 17, 2022. This is the 2nd AOMP release for LLVM 16 development. The last amd-only commit is 16791f61b04f07a7968a67c18ed41388279018d5 on Oct 13, 2022. This forms a frozen branch now called "aomp-16.0-1". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-16.0-1. Currently, the amd-only content differs from the trunk by 63,997 lines in 467 files.
AOMP is a "standalone" build of all necessary ROCm components with the exception of the kernel module and libdrm. The non-llvm-project components for this release were built with ROCm 5.3.x sources.
The changes from 16.0-0 to 16.0-1 include:
- Enhanced xteam reductions, no codegen.
- Optimized wait for signals (perf gain).
- Fix aompcc and mark for deprecation.
- Added switch for code object version 5. Version 4 is still default.
- Support for gfx1100 - gfx1103.
- Support for order(concurrent).
- Build OpenMP warnings cleaned up.
- Support atomic min/max on MI200.
- Support for device new and delete.
- Bumped cmake version to 3.18.5.
- Switch to ROCm 5.3 sources.
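Two of the items above, `order(concurrent)` and device `new`/`delete`, can be sketched as follows. Both function names (`scale_add`, `sum_of_squares_on_device`) are hypothetical and exist only for this illustration; the code is a minimal sketch, not a test case from the AOMP repository. Without OpenMP enabled the pragmas are ignored and both functions run on the host with the same results.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: y[i] += a * x[i], with order(concurrent) telling the
// compiler that loop iterations may execute in any order.
void scale_add(std::vector<float>& y, const std::vector<float>& x, float a) {
  assert(y.size() == x.size());
  float* yp = y.data();
  const float* xp = x.data();
  const int n = static_cast<int>(y.size());

  #pragma omp target teams distribute parallel for order(concurrent) \
      map(tofrom : yp[0:n]) map(to : xp[0:n])
  for (int i = 0; i < n; ++i) {
    yp[i] += a * xp[i];
  }
}

// Hypothetical helper: allocates scratch storage with new[] inside the
// target region, uses it, and frees it with delete[] before leaving.
int sum_of_squares_on_device(int n) {
  assert(n > 0);
  int result = 0;

  #pragma omp target map(tofrom : result)
  {
    int* scratch = new int[n];
    for (int i = 0; i < n; ++i) scratch[i] = i * i;
    for (int i = 0; i < n; ++i) result += scratch[i];
    delete[] scratch;
  }
  return result;
}
```

With `x = {1, 2, 3}`, `y = {10, 20, 30}`, and `a = 2`, `scale_add` should leave `y` as `{12, 24, 36}`, and `sum_of_squares_on_device(4)` should return 14.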
Errata: (potential regressions from 16.0-0)
- Performance decrease with lulesh
- Performance decrease with Nekbone
- Smoke test failure: flang-315870
- Hip example failure: device-lib
rocm-5.3.0
ROCm release v5.3.0