Addition of 4 pytorch onnx-export models. #278
base: main
Conversation
Thanks! Generally looks great. A few ideas for improvement, but this could proceed without all of them addressed.
"iree_compile_flags" : [ | ||
"--iree-hal-target-backends=llvm-cpu", | ||
"--iree-llvmcpu-target-cpu-features=host" | ||
"--iree-llvmcpu-target-cpu-features=host", | ||
"--iree-input-demote-i64-to-i32" | ||
], |
Also added the --iree-input-demote-i64-to-i32 flag to the CPU models config file so that we are in line with what we run in e2eshark for any CPU code generation.
I'm not sure I agree with this. I'm worried about the number of flags we have developers using, and I'd like to have more visibility into which models require which flags and why. One way we can do that is by having users of the test suite test with default flags.

In this case, the --iree-input- flags are sort of special in that they are generic across backends so I do see a case for using some of them... maybe in a _compatibility_flags test suite config, then we could diff the XFAILs between that and the _default_flags config.
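For illustration, a minimal sketch of how such a diff could be computed. The two config file names and the "expected_compile_failures" key are assumptions; only "expected_run_failures" appears in this PR's config.

    # Hypothetical sketch: compare XFAIL lists between a default-flags config and
    # a compatibility-flags config. File names and "expected_compile_failures"
    # are assumptions; "expected_run_failures" is the key used in this PR.
    import json

    def xfails(path):
        with open(path) as f:
            config = json.load(f)
        return {
            key: set(config.get(key, []))
            for key in ("expected_compile_failures", "expected_run_failures")
        }

    default = xfails("config_cpu_default_flags.json")        # hypothetical name
    compat = xfails("config_cpu_compatibility_flags.json")   # hypothetical name

    for key in default:
        # Tests that only pass because of the extra compatibility flags.
        print(key, "fixed by compatibility flags:", sorted(default[key] - compat[key]))

A diff like that would make it easy to see exactly which models depend on --iree-input-demote-i64-to-i32 and similar flags.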
These models are run through the onnx mode and pytorch framework, where the pytorch model is exported to onnx, imported to torch IR using the onnx importer, and then fed to IREE to lower to linalg and run the module.
That seems rather roundabout... I guess ONNX there is used as a serialization format?
IMO if we're adding tests for these models we should add both the direct and onnx-export versions.
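To make the described path concrete, here is a minimal sketch using a toy torch module in place of the actual models; the CLI steps in the comments follow the usual IREE ONNX flow (iree-import-onnx, iree-compile, iree-run-module) and are assumptions about what e2eshark does, not commands taken from this PR.

    # Toy sketch of the onnx-export path: pytorch -> ONNX -> torch IR -> IREE.
    # A small module stands in for mit-b0 / mobilebert-uncased / t5-base / t5-large.
    import torch

    class Toy(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x @ x)

    # 1. Export the pytorch model to ONNX (ONNX acting as the serialization format).
    torch.onnx.export(Toy(), (torch.randn(4, 4),), "model.onnx")

    # 2. Import the ONNX file to torch IR with the onnx importer, e.g.:
    #      iree-import-onnx model.onnx -o model.mlir
    # 3. Compile with IREE, lowering through linalg to the llvm-cpu backend, e.g.:
    #      iree-compile model.mlir --iree-hal-target-backends=llvm-cpu -o model.vmfb
    # 4. Run the module and compare against the recorded inference_output.*.bin files:
    #      iree-run-module --module=model.vmfb --input=4x4xf32=...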
"expected_run_failures": [ | ||
"pytorch/models/onnx-export/mobilebert-uncased", | ||
] |
This commit adds testing for 4 models that we have noticed to be unstable and want to keep passing through CI.
Can we add models that we think are stable too? :) I'd like to prefetch all models from e2eshark and get them running here if possible, rather than cherrypick a few after we notice issues.
    // error: 'builtin.module' op failed to run transform dialect passes
    // (might need to drop the iree-codegen-transform-dialect-library flag)
Can drop this comment since the attention_and_matmul_spec is removed.

I also saw "error: a handle passed as operand #0 and consumed by this operation points to a payload entity more than once" and updated the comment in my draft PR: #277

How about we just leave the reason comments off? Also on my PR there, if we run with -rA (instead of -rpfE, see https://docs.pytest.org/en/stable/how-to/output.html), we'll get full error output for even XFAIL'd tests in CI logs, so having a possibly outdated comment in the source isn't as useful.
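For example, a local run along these lines should surface the error details for xfailed tests as well; the test path here is a guess at this suite's layout.

    # -rA asks pytest for the extra summary section on all outcomes, including
    # xfail, so the failure/xfail details land in the logs; -rpfE limits the
    # summary to passed, failed, and error outcomes. The path is an assumption.
    import pytest

    pytest.main(["-rA", "iree_tests/pytorch/models"])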
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-base/inference_output.46.bin", | ||
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-base/inference_output.47.bin", | ||
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-base/inference_output.48.bin", | ||
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-base/inference_output.49.bin", |
o_o that's a lot of outputs. Fine for now, but let's file an issue to use archive files (.zip or .tar.gz). If we use something like Hugging Face repositories, we could also lean on their download API to fetch an entire repo.
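For that issue, something along these lines could replace the long URL lists, assuming the artifacts were mirrored to a (hypothetical) Hugging Face dataset repo:

    # Fetch an entire test-artifact repo in one call via the Hugging Face Hub API
    # instead of listing every inference_output.N.bin URL individually.
    # The repo id below is hypothetical.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(
        repo_id="shark-test-suite/t5-base-onnx-export",  # hypothetical repo
        repo_type="dataset",
    )
    print("Downloaded test inputs/outputs to", local_dir)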
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-large/inference_output.95.bin", | ||
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-large/inference_output.96.bin", | ||
"https://sharkpublic.blob.core.windows.net/sharkpublic/shark-test-suite/iree-tests/pytorch/models/onnx-export/t5-large/inference_output.97.bin", |
Oh, that's even more outputs :P
This commit adds testing for 4 models that we have noticed to be unstable and want to keep passing through CI:
mit-b0
mobilebert-uncased
t5-base
t5-large
These models are run through the onnx mode and pytorch framework, where the pytorch model is exported to onnx, imported to torch IR using the onnx importer, and then fed to IREE to lower to linalg and run the module.
We currently don't support the use of external parameters through the onnx flow, hence the lack of parameter files and splat testing.
Also added the --iree-input-demote-i64-to-i32 flag to the CPU models config file so that we are in line with what we run in e2eshark for any CPU code generation.