Add support for Scan operator #2936

music-dino · 2024-04-01T14:32:10Z

Implement ONNX parsing support for the Scan operator
Resolves migraphx-benchmark/AMDMIGraphX#116 (Scan operator is unsupported)

migraphx-bot · 2024-04-01T16:54:23Z

Test	Batch	Rate new 9017c1	Rate old b4c29f	Diff	Compare
torchvision-resnet50	64	1,750.14	1,750.01	0.01%	✅
torchvision-resnet50_fp16	64	4,178.45	4,178.72	-0.01%	✅
torchvision-densenet121	32	1,465.30	1,466.12	-0.06%	✅
torchvision-densenet121_fp16	32	2,551.06	2,554.71	-0.14%	✅
torchvision-inceptionv3	32	889.00	888.83	0.02%	✅
torchvision-inceptionv3_fp16	32	1,493.41	1,494.20	-0.05%	✅
cadene-inceptionv4	16	412.16	412.24	-0.02%	✅
cadene-resnext64x4	16	419.52	419.57	-0.01%	✅
slim-mobilenet	64	4,015.61	4,016.09	-0.01%	✅
slim-nasnetalarge	64	101.02	101.00	0.01%	✅
slim-resnet50v2	64	1,679.93	1,680.51	-0.03%	✅
bert-mrpc-onnx	8	616.30	616.41	-0.02%	✅
bert-mrpc-tf	1	279.06	279.98	-0.33%	✅
pytorch-examples-wlang-gru	1	333.33	321.71	3.61%	🔆
pytorch-examples-wlang-lstm	1	292.93	293.21	-0.10%	✅
torchvision-resnet50_1	1	469.75	471.24	-0.32%	✅
cadene-dpn92_1	1	246.48	246.78	-0.12%	✅
cadene-resnext101_1	1	203.92	204.01	-0.05%	✅
onnx-taau-downsample	1	206.17	206.06	0.05%	✅
dlrm-criteoterabyte	1	22.89	22.90	-0.05%	✅
dlrm-criteoterabyte_fp16	1	43.86	43.85	0.01%	✅
agentmodel	1	6,140.73	6,285.51	-2.30%	✅
unet_fp16	2	34.28	34.32	-0.10%	✅
resnet50v1_fp16	1	586.72	601.67	-2.48%	✅
resnet50v1_int8	1	567.91	567.26	0.12%	✅
bert_base_cased_fp16	64	646.54	646.81	-0.04%	✅
bert_large_uncased_fp16	32	198.87	198.83	0.02%	✅
bert_large_fp16	1	116.81	116.93	-0.10%	✅
distilgpt2_fp16	16	1,211.27	1,210.32	0.08%	✅
yolov5s	1	301.63	301.46	0.05%	✅
tinyllama	1	23.31	23.32	-0.02%	✅
vicuna-fastchat	1	133.46	134.89	-1.06%	✅
whisper-tiny-encoder	1	244.11	244.10	0.01%	✅
whisper-tiny-decoder	1	256.06	256.24	-0.07%	✅

Check results before merge 🔆

migraphx-bot · 2024-04-01T16:54:25Z

✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

✅ agentmodel: PASSED: MIGraphX meets tolerance

✅ unet: PASSED: MIGraphX meets tolerance

✅ resnet50v1: PASSED: MIGraphX meets tolerance

✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

✅ bert_large: PASSED: MIGraphX meets tolerance

✅ yolov5s: PASSED: MIGraphX meets tolerance

✅ tinyllama: PASSED: MIGraphX meets tolerance

✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance

codecov · 2024-04-02T06:52:18Z

Codecov Report

Attention: Patch coverage is 96.47887% with 5 lines in your changes missing coverage. Please review.

Project coverage is 92.27%. Comparing base (ee8f12e) to head (cb59a5e).
Report is 159 commits behind head on develop.

Files with missing lines	Patch %	Lines
src/onnx/parse_scan.cpp	96.29%	4 Missing ⚠️
src/include/migraphx/run_loop.hpp	83.33%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2936      +/-   ##
===========================================
+ Coverage    92.24%   92.27%   +0.03%     
===========================================
  Files          495      497       +2     
  Lines        19849    19985     +136     
===========================================
+ Hits         18309    18441     +132     
- Misses        1540     1544       +4


src/onnx/parse_scan.cpp

umangyadav · 2024-06-12T12:40:52Z

src/onnx/parse_scan.cpp

+    }
+
+    std::vector<int64_t>
+    parse_dirs(onnx_parser::node_info& info, const std::string& name, long expected_size) const


Suggested change

-    parse_dirs(onnx_parser::node_info& info, const std::string& name, long expected_size) const
+    parse_dirs(onnx_parser::node_info& info, const std::string& name, size_t expected_size) const

Use size_t for consistency.

test/ref/scan_slice.cpp

umangyadav · 2024-06-12T12:43:21Z

docs/dev/onnx_operators.rst

+| Scan                     | ✅        | UINT8, UINT16,  | ``identity``,                |
+|                          |           | UINT32, UINT64, | ``sequence``                 |


Why are identity and sequence mentioned here?

They're mentioned in the Loop op entry as well. Since Scan relies on Loop, I thought I'd carry over the info.

Okay, but I don't know what that means. @attila-dusnoki-htec, can you comment on what this limitation is about?

src/include/migraphx/op/scan_slice.hpp

umangyadav · 2024-06-12T12:49:16Z

src/onnx/parse_scan.cpp

+        for(auto i = 0; i < n; ++i)
+            new_params.push_back(
+                mod->add_parameter("state_var" + std::to_string(i), params[i]->get_shape()));


You can use std::transform here.

The loop body also uses the loop index, in the std::to_string call, so std::transform isn't a good fit here.
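The point about the index can be illustrated with a small standalone sketch. It is not the PR's code: `make_state_var_names` is a hypothetical helper that only reproduces the naming scheme from the diff ("state_var" plus the index), without any MIGraphX types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the loop in the diff: the element's position is
// part of the generated name, so an index-based loop is the natural form.
std::vector<std::string> make_state_var_names(std::size_t n)
{
    std::vector<std::string> names;
    names.reserve(n);
    for(std::size_t i = 0; i < n; ++i)
        names.push_back("state_var" + std::to_string(i));
    return names;
}
```

std::transform passes only the element to its callback, so an equivalent version would need a separately maintained counter captured by the lambda, which is arguably less readable than the plain loop.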

src/targets/gpu/lowering.cpp

src/include/migraphx/run_loop.hpp

src/onnx/parse_scan.cpp

umangyadav · 2024-06-12T16:13:59Z

src/onnx/parse_scan.cpp

+    {
+        std::vector<int64_t> perm(rank);
+        std::iota(perm.begin(), perm.end(), 0);
+        std::copy(perm.begin() + 1, perm.begin() + 1 + axis, perm.begin());


I wonder if begin() + 1 + axis would go out of bounds when axis is the last axis.

For rank r, the last axis is r-1, so the range is [begin + 1, begin + r). Since the perm vector has r elements, begin + r is one past the last element, which makes it the valid end iterator.
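To make the bounds argument concrete, here is a minimal standalone sketch of the permutation construction. Only the `std::iota` and `std::copy` statements come from the diff above; the final `perm[axis] = 0` assignment and the precondition assertion are assumptions about the part of the function that is not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Sketch: build a permutation that moves dimension 0 to position `axis`.
std::vector<std::int64_t> make_perm_for_scan_out(std::int64_t rank, std::int64_t axis)
{
    assert(rank > 0 && axis >= 0 && axis < rank); // assumed precondition
    std::vector<std::int64_t> perm(rank);
    std::iota(perm.begin(), perm.end(), 0);
    // Shift elements [1, 1 + axis) one slot to the left. For axis == rank - 1
    // the source range ends at perm.begin() + rank, which equals perm.end().
    std::copy(perm.begin() + 1, perm.begin() + 1 + axis, perm.begin());
    perm[axis] = 0; // assumption: this step is not visible in the reviewed hunk
    return perm;
}
```

Under these assumptions, rank 3 with axis 2 (the last axis) yields {1, 2, 0}, and axis 0 copies an empty range and yields the identity permutation {0, 1, 2}.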

umangyadav · 2024-06-12T18:59:58Z

src/onnx/parse_scan.cpp

+            // Loop scan_outputs are concatenated along axis 0, so it must be transposed to the
+            // index specified by the corresponding scan_output_axis
+            auto perm = make_perm_for_scan_out(o->get_shape().ndim(), scan_output_axes[i]);
+            ret.push_back(info.add_instruction(make_op("transpose", {{"permutation", perm}}), o));


I am missing something here. Can you explain why the transpose is necessary? A transpose will probably not produce the correct element order.

I've expanded the comment, hopefully it's more helpful now.
There are a couple of test cases that cover this scenario, and the correct result is produced.
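The element-order question can be checked on a tiny case. The following is a sketch independent of MIGraphX: `transpose_row_major` is a hypothetical helper that follows the convention that output dimension d takes input dimension perm[d], and the data is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not a MIGraphX API: transposes a row-major buffer so
// that output dimension d corresponds to input dimension perm[d].
std::vector<int> transpose_row_major(const std::vector<int>& data,
                                     const std::vector<std::size_t>& dims,
                                     const std::vector<std::size_t>& perm)
{
    const std::size_t rank = dims.size();
    assert(perm.size() == rank);

    // Row-major strides of the input buffer.
    std::vector<std::size_t> strides(rank, 1);
    for(std::size_t d = rank; d-- > 1;)
        strides[d - 1] = strides[d] * dims[d];

    std::vector<std::size_t> out_dims(rank);
    for(std::size_t d = 0; d < rank; ++d)
        out_dims[d] = dims[perm[d]];

    std::vector<int> out(data.size());
    std::vector<std::size_t> idx(rank, 0); // multi-index into the output
    for(std::size_t o = 0; o < out.size(); ++o)
    {
        std::size_t in_offset = 0;
        for(std::size_t d = 0; d < rank; ++d)
            in_offset += idx[d] * strides[perm[d]];
        out[o] = data[in_offset];
        // Advance the output multi-index in row-major order.
        for(std::size_t d = rank; d-- > 0;)
        {
            if(++idx[d] < out_dims[d])
                break;
            idx[d] = 0;
        }
    }
    return out;
}
```

Suppose three iterations each emit a two-element scan output: {0, 1}, {10, 11}, {20, 21}. Stacking along axis 0 gives shape {3, 2} with data {0, 1, 10, 11, 20, 21}. For a scan output axis of 1, the expected shape is {2, 3} with the iteration index running along the last dimension; calling the helper with perm {1, 0} produces {0, 10, 20, 1, 11, 21}, which is exactly that layout.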

umangyadav · 2024-06-12T19:00:30Z

src/onnx/parse_scan.cpp

+        mod->replace_return(returns);
+    }
+
+    std::vector<int64_t> make_perm_for_scan_out(int64_t rank, int64_t axis) const


I don't understand this. Can you add a docstring explaining what this function is trying to do?

I've added a comment.

umangyadav · 2024-06-12T19:02:12Z

src/include/migraphx/op/loop.hpp

+                    int64_t iter,
+                    int64_t iter_num) const


Better to name them curr_iter and num_iters.

test/py/onnx_backend_test.py

causten

This is hitting a failure. Build this and run "make check".

[2024-07-17T00:46:28.905Z] 358/358 Test #357: test_py_3.10_backend ......................................................***Failed 965.59 sec

[2024-07-17T00:46:28.905Z] .s.s.s.s.sssssssss.s.s.sssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssss.sss.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.s.sssssssss.sssssssssssssssssss.s.sssssssssssssssssssssssssssssssss.s.s.s.s.s.s.s.sssssssssssssssssssss.s.s.s.sssssssssssssssssssssssss.s.s.s.sssssssssssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssss.s.s.s.sssssss.sss.s.s.s.s.s.s.s.s.s.sssssssssssssssssss.s.s.s.sssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.sssss.s.s.sssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssss.sssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssss.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.sssss.s.s.s.s.s.s.s.s.sssss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sss.sssssssss.sssssssssss.s.s.sssssssssssssssssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssss.s.sssssssssssss.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.sssssssssssssssssssssssssssss.s.sssssssssssss.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.s.s.s.sEsss.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.s.s.s.s.s.ssssssssssssssssssssssssss
sss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssssssssssssssssssssss.s.sssssssssssssssssssss.s.s.s.s.s.s.s.s.s.s.sssssssssssssss.s.s.s.s.s.sssssssssssssssssssssss.s.s.s.s.s.s.s.sss.sssssss.sssss.sssssssssss.sssssssssssssssssssssssssssssss.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sssss.s.s.s.s.s.s.s.s.s.s.s.sss.sss.s.s.s.sss.sss.sss.s.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.sssssssssss.s.s.s.s.s.sss.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.s.sss.s.s.s.s.s.sssssssssssssssssssssssssssss.s.s.sssssssssssss

[2024-07-17T00:46:28.905Z] ======================================================================

[2024-07-17T00:46:28.905Z] ERROR: test_scan9_sum_cpu (__main__.OnnxBackendNodeModelTest)

[2024-07-17T00:46:28.905Z] ----------------------------------------------------------------------

[2024-07-17T00:46:28.905Z] Traceback (most recent call last):

[2024-07-17T00:46:28.905Z] File "/usr/local/lib/python3.10/dist-packages/onnx/backend/test/runner/__init__.py", line 290, in device_test_func

[2024-07-17T00:46:28.905Z] return test_func(*args, device=device, **kwargs)

[2024-07-17T00:46:28.905Z] File "/usr/local/lib/python3.10/dist-packages/onnx/backend/test/runner/__init__.py", line 382, in run

[2024-07-17T00:46:28.905Z] prepared_model = self.backend.prepare(model, device)

[2024-07-17T00:46:28.905Z] File "/home/jenkins/workspace/AMDMIGraphX_PR-2936/build/lib/onnx_migraphx/backend.py", line 125, in prepare

[2024-07-17T00:46:28.905Z] return cls.prepare(bin, device, **kwargs)

[2024-07-17T00:46:28.905Z] File "/home/jenkins/workspace/AMDMIGraphX_PR-2936/build/lib/onnx_migraphx/backend.py", line 112, in prepare

[2024-07-17T00:46:28.905Z] inf = migraphx.parse_onnx_buffer(model)

[2024-07-17T00:46:28.905Z] RuntimeError: /home/jenkins/workspace/AMDMIGraphX_PR-2936/src/onnx/parse_scan.cpp:124: parse: Slice: Sliced scan input 0 shape {float_type, {3}, {1}} does not match corresponding body input shape {float_type, {2}, {1}}

[2024-07-17T00:46:28.905Z]

[2024-07-17T00:46:28.905Z] ----------------------------------------------------------------------

[2024-07-17T00:46:28.905Z] Ran 2634 tests in 963.388s

[2024-07-17T00:46:28.905Z]

[2024-07-17T00:46:28.905Z] FAILED (errors=1, skipped=1896)

[2024-07-17T00:46:28.905Z] Default GPU device is used ....

music-dino · 2024-07-22T09:10:05Z

The issue is resolved.

music-dino added 12 commits March 12, 2024 15:04

Implementation start

74333b0

Implement scan base case

a3011f3

Implement scan output direction and axes attribute support

72d3716

Implement scan_input_axes attribute support

f7a7e9f

Implement ScanSlice operator

7793cb0

Implement via adapting subgraph to a what a loop expects and using the loop operator

3fcbb68

Add support for scan_output_axes and code refactoring

81e0804

Refactoring

18aabad

Add support for scan_output_directions, modify tests to have two scan inputs

ed1382a

Implement additional tests, add comments, fix cppcheck and tidy issues

3e57489

Implement negative tests

4824f77

Merge remote-tracking branch 'upstream/develop' into scan_op_support

a4c2e28

music-dino requested a review from causten as a code owner April 1, 2024 14:32

Update onnx_operators.rst

495965f

music-dino requested a review from a team as a code owner April 1, 2024 14:47

music-dino added 2 commits April 2, 2024 06:40

Fix format issues and test failures

5a106dd

Merge remote-tracking branch 'upstream/develop' into scan_op_support

45a04c5

Add deduction guides to vectors

51bf14f

causten requested review from umangyadav and pfultz2 April 8, 2024 12:05

TedThemistokleous self-requested a review April 17, 2024 13:07