Add in-tree special_models test suite using reworked iree-tooling. #17883
Conversation
def pytest_addoption(parser):
    parser.addoption(
        "--goldentime-rocm-e2e-ms",
        action="store",
        type=float,
        help="Golden time to test benchmark",
    )
(This can be deferred to a follow-up refactoring, to keep this PR more incremental)
We might not need flags for these options now that the files are living in the same repo.
I'd prefer for the workflow files to be relatively frozen, with expected results stored in source files in the tests (or very close to the tests). We could pull from a source file like this: https://github.com/iree-org/iree/blob/main/build_tools/benchmarks/common/benchmark_thresholds.py
This puts default values in a source file, but then none of the defaults actually matter, since CI jobs always override them with CLI flags. When a developer runs the tests, they shouldn't also need to exactly match the flags used in the workflow file.
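As a rough sketch of that direction (the file name benchmark_expectations.py, the GOLDEN_TIMES_MS table, and the fixture below are all hypothetical, not names from this PR): expected values live in a small source file next to the tests, and the CLI flag becomes an optional override, so local runs need no flags while CI can still pin values explicitly.

# benchmark_expectations.py -- lives next to the tests and is reviewed like any source file.
GOLDEN_TIMES_MS = {
    ("sdxl", "e2e", "rocm"): 1500.0,  # placeholder value, not a real threshold
}

# conftest.py -- the flag is optional; when absent, the value from the source file is used.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--goldentime-rocm-e2e-ms",
        action="store",
        type=float,
        default=None,
        help="Optional override for the golden time stored in benchmark_expectations.py",
    )

@pytest.fixture
def goldentime_rocm_e2e_ms(request):
    override = request.config.getoption("--goldentime-rocm-e2e-ms")
    return override if override is not None else GOLDEN_TIMES_MS[("sdxl", "e2e", "rocm")]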
class VmfbManager:
    sdxl_clip_cpu_vmfb = None
    sdxl_vae_cpu_vmfb = None
    sdxl_unet_cpu_vmfb = None
Why does this file exist? Can't these just be inlined into the files that use them?
We need this file to share data between tests; there is no cleaner way to produce a result in one test and consume it in another.
Which tests share data? I couldn't tell from a high-level glance through. A few cases that I looked at seemed to only use the vmfb from within the same file, so wouldn't a global in those files serve the same purpose as this separate file?
I want to avoid having a file like this needing to know about every configuration of every test.
Yeah, I tried the global variable approach, but it only lets the independent tests consume a value; we can't set the vmfb in one test and then have the run_module test consume the vmfb that was generated. This was the cleanest way I could find.
(By tests sharing data I mean that every iree_run_module test uses the vmfb generated by the corresponding iree_compile test.)
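A minimal sketch of that pattern, assuming the iree_compile / iree_run_module helpers added in this PR (their exact signatures, the ROCM_COMPILE_FLAGS name, and the fixtures used below are illustrative):

import pytest

class VmfbManager:
    # Shared holder: written by the compile test, read by the run test.
    sd3_clip_rocm_vmfb = None

def test_compile_clip_rocm(sd3_clip_mlir):
    VmfbManager.sd3_clip_rocm_vmfb = iree_compile(sd3_clip_mlir, "rocm", ROCM_COMPILE_FLAGS)

@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
    # Only runs if the compile test above passed; consumes the vmfb it produced.
    iree_run_module(
        VmfbManager.sd3_clip_rocm_vmfb,
        device="rocm",
        args=SD3_CLIP_COMMON_RUN_FLAGS,
    )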
sd3_clip_inference_input_0 = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.0.bin",
    group="sd3_clip",
)

sd3_clip_inference_input_1 = fetch_source_fixture(
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.1.bin",
    group="sd3_clip",
)
Fine for now, but we should zip these (nod-ai/SHARK-TestSuite#285) or use an array somehow here. This is a lot of boilerplate to download and use individual loose files.
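For illustration, the "array" idea could look roughly like this; fetch_source_fixture and the group name come from the code above, the list and loop are hypothetical, and whether pytest still discovers the results as named fixtures when built this way would need checking:

SD3_CLIP_INPUT_URLS = [
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.0.bin",
    "https://sharkpublic.blob.core.windows.net/sharkpublic/sai/sd3-prompt-encoder/inference_input.1.bin",
]

# One entry per input file, built from the list instead of repeated blocks.
sd3_clip_inference_inputs = [
    fetch_source_fixture(url, group="sd3_clip") for url in SD3_CLIP_INPUT_URLS
]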
@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
Should add pytest marks for cpu, rocm, etc.
Currently, we are using the -k command line arg (similar to SHARK-TestSuite/iree_tests) to match on the backend pattern, but yeah, we should add marks here too.
Yeah, fine to do what is most similar to the current setup in this PR, but marks will be more flexible. We couldn't as easily use marks before since the test cases were generated.
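A hypothetical sketch of backend marks on the test above (the cpu/rocm marker names would need to be registered in the pytest configuration):

import pytest

@pytest.mark.rocm  # registered marker; selects this test for ROCm runs
@pytest.mark.depends(on=["test_compile_clip_rocm"])
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
    ...

With that in place, pytest -m rocm selects the ROCm tests explicitly rather than relying on -k name matching.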
Nice, this test style is a big improvement over the cross-repo style with json files.
def test_compile_clip_cpu(sd3_clip_mlir):
    VmfbManager.sd3_clip_cpu_vmfb = iree_compile(
        sd3_clip_mlir, "cpu", CPU_COMPILE_FLAGS
    )
How does this differ from the fixture approach taken here?
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py
Lines 97 to 107 in 6df0372
@pytest.fixture
def llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb(llama2_7b_f16qi4_stripped_source):
    return iree_compile(
        llama2_7b_f16qi4_stripped_source,
        "rdna3_rocm",
        flags=COMMON_FLAGS
        + [
            "--iree-hal-target-backends=rocm",
            "--iree-rocm-target-chip=gfx1100",
        ],
    )
iree/experimental/regression_suite/tests/pregenerated/test_llama2.py
Lines 220 to 240 in 6df0372
@pytest.mark.presubmit
@pytest.mark.unstable_linalg
@pytest.mark.plat_rdna3_rocm
def test_step_rdna3_rocm_stripped(llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb):
    iree_benchmark_module(
        llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb,
        device="rocm",
        function="first_vicuna_forward",
        args=[
            "--input=1x1xi64",
        ],
    )
    iree_benchmark_module(
        llama2_7b_f16qi4_stripped_rdna3_rocm_vmfb,
        device="rocm",
        function="second_vicuna_forward",
        args=[
            "--input=1x1xi64",
        ]
        + (["--input=1x32x1x128xf16"] * 64),
    )
In the previous approach, only the iree_run_module step was a test. Now the iree_compile step is a test too. If we made compilation a fixture, it wouldn't show up as a test in the log, and having it visible there is pretty valuable for anyone running these tests.
I think we should aim to land this as an incremental move and then continue to iterate on the design after getting past the cross-repository work hurdles. Minimally, we should:
Both of those seem like small but nontrivial tasks, so I think we should still revert the test suite PR and get the other updates (including the torch-mlir bump) running in parallel. Beyond that, I think focusing on the development experience with these tests will guide answers to the other review comments. The tests should be runnable locally using filters appropriate for whatever system they are run on. If a test fails, it should be obvious how to mark it as xfail in that configuration. If a benchmark result is outside of the expected values, it should be obvious how to update the values for that configuration (I'm not sure yet whether CLI flags or source files are best for that).
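For example, marking one configuration as an expected failure could look roughly like this (the reason string is a placeholder; a real annotation should link a tracking issue):

import pytest

@pytest.mark.xfail(
    reason="placeholder: describe the failing configuration and link the tracking issue",
    strict=True,
)
def test_run_clip_rocm(SD3_CLIP_COMMON_RUN_FLAGS, sd3_clip_real_weights):
    ...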
Okay, good enough to land with a few nits. A bit rough around the edges, so starting in experimental/ is reasonable. Let's aim to iterate on the code and get it out of experimental/ in O(1-2 weeks).
if return_code == 0:
    return 0, proc.stdout
logging.getLogger().info(f"Command failed with error: {proc.stderr}")
return 1, proc.stdout
This has the same bug from nod-ai/SHARK-TestSuite#286. Once landed we can iterate on a fix in this repo.
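Without restating the exact bug from that issue, a common hardening for a helper like this is to raise on failure with stderr attached, so the command, exit code, and error text show up directly in the pytest report rather than only in an info-level log. A rough sketch, not the implementation in this PR:

import subprocess

def run_command(args: list[str], cwd: str | None = None) -> str:
    proc = subprocess.run(args, capture_output=True, text=True, cwd=cwd)
    if proc.returncode != 0:
        # Raising surfaces the failure in the test report instead of returning
        # an error code that callers may ignore.
        raise RuntimeError(
            f"Command {' '.join(args)} failed with exit code {proc.returncode}:\n"
            f"{proc.stderr}"
        )
    return proc.stdout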
With this, we move away from using all the specialized json config files and complex workflows.
Instead, we use python scripts which allow us to use custom flags, tolerances, and configurations based on the backend/model.
Related PR in TestSuite: nod-ai/SHARK-TestSuite#271
This PR also removes all dependencies on SHARK-TestSuite tooling. The tools here were reworked so that downloading, caching, testing, and benchmarking for iree_special_models all work with tools from this repo alone. Whenever we add test files here, the goal is for an IREE user to be able to clone the repo and run the tests knowing nothing about the SHARK-TestSuite.
Also, I hadn't realized that ireers already has a stamping process to check whether a file has already been produced. I think we have to remove it, because it will skip even when a newer version of the file is available, and it adds little when downloading to a cache: once a file is there it is never removed, so the stamp isn't a valuable signal.
(Third time's the charm. I had to close the last two versions of this PR because I couldn't get past a pre-commit check, which led me to rebase and pull in a bunch of commits that weren't mine 🤦)
ci-exactly: build_all, test_amd_mi300, build_packages, regression_test