Releases: ROCm/AMDMIGraphX
rocm-5.4.1
ROCm release v5.4.1
rocm-5.4.0
ROCm release v5.4.0
788ce62 Use minimum block size of 64 threads (#1427)
4d471bd Add JIT pad (#1411) (#1441)
360b180 Updates for RC1 (#1425)
83784c5 memset fix (#1414)
01d0ecf Fix rank 2 batch norm (#1412)
32f6388 Refactor dynamic padding mode (#1387)
be309bf Rewrite TF batch norm; remove batch_norm_inference (#1371)
4f3cc41 Simplify unit algebraic ops (#1281)
f7d987b Stream sync Changeset (#1358)
a9a4740 Fast softmax (#1290)
c9ffb38 Add output_alias and runs_on_offload_target flags for the custom ops (#1309)
e19f78a Use find_2.0 API for the convolution (#1346)
c2842c1 Fix invalid program in debug mode from find_splits (#1390)
70e6396 Add compute_fp32 flag for quant_gemm tests (#1360)
4011819 Add onnx mod operator gpu cpu (#1306)
c00f820 Rewrite ONNX parse batch norm (#1362)
492c4a6 Use larger vector size instead of preloading for broadcasted inputs (#1389)
66bbff1 Upgrade cppcheck to 2.9 (#1400)
94bc41d check concurrency on PR level with one running and one pending performance tests (#1401)
1b575b5 update codecov version (#1402)
8ea8473 Remove unused device functions (#1394)
d9578ba Parameterize epsilon for layernorm kernel (#1367)
9a70050 Multibroadcast find_mul_conv (#1384)
97a1ed2 Improve layernorm and reductions performance (#1348)
34c08db Disabled concurrency, queue added to perf-test.yml (#1386)
10f37f4 Fix typo for add_sigmoid (#1385)
255fb11 Update deprecated Pybind constructor (#1382)
e1e36cd [mlir] Replaced find_library with find_package to locate MLIR static library (#1373)
333860c Reduce problem size of unbatched_gemm tests (#1383)
4b76dd0 Fix split_reshape for slice len of 1 (#1379)
7662d9c Implement concat using jit compilation (#1356)
827baee expose underlying migraphx::argument data pointer in pybind (#1376)
a10a8ef Use rocblas_gemm_ex for batched gemms with broadcasted B (#1354)
d78bcdf Bump version to 2.4 (#1375)
ed2c73a Remove unused headers (#1363)
f266705 Fix TF literal parsing for relu6 (#1370)
60aa0e4 Fix accuracy bug when vectorizing slices (#1364)
d37a4df Enable cppcheck rule for 'not', 'or' keywords (#1361)
794a433 Add pass to rewrite gelu as fast gelu (#1299)
ed7973d Insert contiguous for reshape as necessary (#1351)
349635c Show kernel time when using gpu-driver (#1289)
8752875 Improvements to handling and add constant passed to dot operator (#1280)
af7f22d Fix test suite compile in Ubuntu 22.04 (#1353)
1704bb0 fix bug size_t -> std::size_t (#1350)
fa3c21f Dynamic ref NMS (#1288)
79e15ca Update is_supported (#1334)
b691abd Enable tidy for fpga backend (#1347)
3c133f8 Remove print (#1345)
ac507c6 Fix json strings in driver models (#1341)
8045f7c pybind updates for torch_migraphx library (#1323)
7c8f269 run performance benchmarks on types (#1343)
1784584 Add jit layernorm fusion (#1301)
18e4a2c Improve horizontal fusion of contiguous (#1292)
0e17a72 Fix softmax accuracy issues (#1342)
cb53687 formatting (#1339)
bab9502 Remove prints (#1338)
55cb7d3 Enable switching to bare pointer ABI for MLIR (#1333)
7ecb2de onnxruntime renamed master to main (#1336)
rocm-5.3.3
ROCm release v5.3.3
rocm-5.3.2
ROCm release v5.3.2
rocm-5.3.1
ROCm release v5.3.1
rocm-5.3.0
Improvements include...
Accuracy update (#1374)
Final performance improvements for release (#1369)
53merge v2 (#1357)
Allow license_stamper.py to be run from any directory (#1332)
Explicitly set rocblas_pointer_mode in examples (#1331)
Imply type of literal returned based on input protobuf for zero elem… (#1326)
Dynamic ref convolution op (#1224)
Update README.md (#1327)
Improve help and error reporting in driver (#1258)
Add support for tuning db access in mlir kernel (#1307)
Add accuracy checker tool (#1315)
Avoid registering host buffer ptr multiple times during hip copies (#1245)
Add node name to debug output of PARSE_IF (#1318)
Fix literal type in the instance_norm parsing (#1317)
Add onnx mod operator (#1302)
Add fpga target (#1304)
Add performance testing yamls (#1313)
Improve error reporting in the API (#1274)
Change ownership to company email (#1310)
Dynamic check_shapes (#1295)
Fix TF parsing for creating literals and Fix name lookups for input params (#1298)
Dynamic dimension input onnx parser (#1249)
Fix op includes (#1308)
Fix test case for min & max operators (#1305)
Reduce header inclusion in op headers (#1271)
Add tests for C API (#1266)
create the dev package (#1293)
change to a cached github repo for blaze prereq (#1291)
Use current device when constructing context (#1294)
Add restrict to jit kernel params (#1300)
Improve kernel code generation (#1285)
Update perf report to show the number of operators and per operator avg time in summary (#1287)
Add env var to enable debug symbols for gpu kernels (#1284)
Add is_supported and get_target_assignments (#1269)
Dyn shape update (#1199)
Add a step to unsqueeze axis (#1242)
Verify load and save (#1265)
Add jit softmax (#1243)
Horizontally fuse contiguous operators (#1232)
Add mlir fusion (#1251)
Add method to insert multiple instructions (#1178)
Invalid parameter for yolov4 example (#1275)
NMS refactor, enable nonstandard shape (#1257)
Update driver models to use json strings (#1244)
Custom Op example using MIOpen calls (#1208)
Custom Op example using rocBLAS calls (#1211)
Custom Op example using HIP kernel (#1200)
Get parent module in the pass manager (#1181)
bug fix: register the miopen_fusion op. (#1267)
Use jit for contiguous operator (#1217)
Adding in check_stamped.py to tools/ (#1255)
Add compute_method for the experimental custom op (#1194)
remove eliminate_workspace pass (#1254)
Fix code block issue with .ipynb files. (#1263)
Update license files (#1248)
Fixing misspelled macro to enable MIOpen hidden find mode API (#1250)
Update lowering of Dot operator (#1247)
Update tf_parser to have add_common_op() for parse_relu6 (#1241)
Create allocate op and replace_allocate pass (#1183)
Instruction distance check fix (#1237)
Use env var for creds
Add vectorized reduce (#1202)
Prioritizing int8 over int8x4 when it is applicable (#1218)
Group code objects by kernel name in perf report summary (#1234)
Fix compilation on Debian bookworm/sid (#1229)
Fix dangling reference with gemm add fusion (#1233)
Update protobuf version (#1228)
Bump tensorflow from 2.6.4 to 2.7.2 in /examples/nlp/python_bert_squad (#1227)
Improve eliminate contiguous pass (#1223)
renamed to main from master (#1226)
Parallelize evaluations in propagate_constant (#1220)
Upgrade to cppcheck 2.8 and fix new issues found (#1225)
Used wrong path to download the bertsquad-10.onnx model (#1221)
Bump tensorflow from 2.5.3 to 2.6.4 in /examples/nlp/python_bert_squad (#1219)
Improve applicable batched gemms (#1214)
Remove std references in runtime compilation (#1186)
Fuse gemm add with pointwise fusions (#1213)
Fix onnx mean parsing for integral inputs (#1209)
Rename pointwise ops (#1145)
Improve matching with has_value when there are convert operators (#1212)
renamed variables for module from p to m (#1204)
Update install_prereqs.sh for individual use (#1197)
Prefuse layernorm for gpu (#1190)
Updated a path to the bert-squad onnx file after upstream changed path (#1201)
Expose add_literal in C and Python API (#1173)
Refactor vectorization and preloading for pointwise fusions (#1184)
upgrade docker images to ROCm 5.0.2 (#1133)
Add compile tests for gpu math functions (#1182)
Cppcheck fixes (#1195)
Extend lifetimes in C++ API (#1139)
Bumping version to support next ROCm release (#1192)
rocm-5.2.3
No changes from rocm-5.2.1
rocm-5.2.1
Enabled the devel migraphx package
Resolved a bug where migraphx could not run its own binary output files
rocm-5.2.0
Improvements include...
Add GatherND operator (#1089)
Add lane reduction (#1180)
Expose get_queue method for context in API (#1161)
ReverseSequence op (#1177)
Refactor Pooling and implement ONNX LpPool and GlobalLpPool (#1152)
Reduce with runtime compilation (#1150)
Half2 overloads (#1157)
Fix file download for resnet50 example (#1164)
Fix problem with incomplete types with older clang versions (#1174)
Fix out-of-bounds access when generate uses nonpacked tensors (#1160)
parallelize the ref implementation of the gemm operator (#1142)
scatter operator refactoring to include reduction (#1124)
fix a bug in create tensor_view with vec data type (#1155)
Fix comparisons in migraphx::value class (#1146)
Python Binding for the Manual Graph Building (#1143)
rocm-5.1.3
Identical to 5.1.1