From 9a7c6c5181d64f13925088a21daf7e95822e7ebe Mon Sep 17 00:00:00 2001
From: Chris Austen
Date: Tue, 9 Jul 2024 19:05:32 -0400
Subject: [PATCH 1/2] Updates to the Changelog for 6.2

---
 CHANGELOG.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 60127e0209b..de9de0006c2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,82 @@
 Full documentation for MIGraphX is available at
 [https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/](https://rocmdocs.amd.com/projects/AMDMIGraphX/en/latest/).
 
+## MIGraphX 2.10 for ROCm 6.2.0
+
+### Additions
+
+* Added support for ONNX Runtime MIGraphX EP on Windows
+* Added FP8 Python API
+* Added examples for SD 2.1 and SDXL
+* Improved Dynamic Batch to support BERT
+* Added a `--test` flag in migraphx-driver to validate the installation
+* Added support for ONNX Operator: Einsum
+* Added uint8 support in ONNX Operators
+* Introduced Split-K as a performance option
+* Added fusion for group convolutions
+* Added rocMLIR conv3d support
+* Added rocgdb to the Dockerfile
+
+
+### Optimizations
+
+* Improved ONNX Model Zoo coverage
+* Reorganized memcpys with ONNX Runtime to improve performance
+* Replaced scalar multibroadcast + unsqueeze with just a multibroadcast
+* Improved MLIR kernel selection for multibroadcasted GEMMs
+* Improved details of the perf report
+* Enabled MLIR by default for GEMMs with small K
+* Allowed specifying dot or convolution fusion for MLIR with an environment variable
+* Improved performance on small reductions by doing multiple reductions per wavefront
+* Added additional algebraic simplifications for mul-add-dot sequences of operations involving constants
+* Used MLIR attention kernels in more cases
+* Enabled MIOpen and CK fusions for MI300 gfx architectures
+* Added support for QDQ quantization patterns from Brevitas that have explicit cast/convert nodes before and after QDQ pairs
+* Added fusion of "contiguous + pointwise" and "layout + pointwise" operations, which may result in performance gains in certain cases
+* Added fusion for "pointwise + layout" and "pointwise + contiguous" operations, which may result in performance gains when using NHWC layout
+* Added fusion for "pointwise + concat" operations, which may help performance in certain cases
+* Fixed a bug in "concat + pointwise" fusion where the output shape memory layout wasn't maintained
+* Simplified the "slice + concat" pattern in SDXL UNet
+* Eliminated ZeroPoint/Shift in QuantizeLinear or DeQuantizeLinear ops if the zero point values are zero
+* Improved inference performance by fusing Reduce to Broadcast
+* Added additional information when printing the perf report
+* Improved scalar fusions when not all strides are 0
+* Added support for multiple outputs in pointwise ops
+* Improved reduction fusion with reshape operators
+* Used the quantized output when an operator is used again
+* Enabled Split-k GEMM perf configs for rocMLIR-based GEMM kernels for better performance on all hardware
+
+
+### Fixes
+
+* Fixed Super Resolution model verification failure with FP16
+* Suppressed confusing messages when compiling the model
+* Fixed Mod operator failing to compile with int8 and int32 inputs
+* Prevented spawning too many threads for constant propagation when parallel STL is not enabled
+* Fixed a bug when running migraphx-driver with the `--run 1` option
+* Fixed Layernorm accuracy by performing calculations in FP32
+* Updated the Docker generator script to ROCm 6.1 to point at Jammy
+* Fixed a floating-point exception for dim (-1) in the reshape operator
+* Fixed an issue with int8 accuracy and models that were failing because they required a fourth bias input
+* Fixed missing inputs not previously handled for quantized bias for the weights, and data values of the input matrix
+* Fixed the order of operations for int8 quantization, which was causing inaccuracies and slowdowns
+* Removed the list initializer of prefix_scan_sum, which was causing issues during compilation and resulting in the incorrect constructor being used at compile time
+* Fixed the MIGRAPHX_GPU_COMPILE_PARALLEL flag to enable users to control the number of threads used for parallel compilation
+
+
+
+### Changes
+
+* Changed default location of libraries with release-specific ABI changes
+* Reorganized documentation in GitHub
+
+
+### Removals
+
+* Removed the `--model` flag from migraphx-driver
+
+
+
 ## MIGraphX 2.9 for ROCm 6.1.0
 
 ### Additions

From 12fd6ed02a79752aba6ff8187c576adf0a32ddff Mon Sep 17 00:00:00 2001
From: Chris Austen
Date: Wed, 10 Jul 2024 09:45:05 -0400
Subject: [PATCH 2/2] Update CHANGELOG.md

---
 CHANGELOG.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index de9de0006c2..91843ae6216 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,7 +14,7 @@ Full documentation for MIGraphX is available at
 * Added a `--test` flag in migraphx-driver to validate the installation
 * Added support for ONNX Operator: Einsum
 * Added uint8 support in ONNX Operators
-* Introduced Split-K as a performance option
+* Enabled Split-k kernel configurations for performance improvements
 * Added fusion for group convolutions
 * Added rocMLIR conv3d support
 * Added rocgdb to the Dockerfile