
[Tracker] All the issues related to the e2e shark test suite #812

Open
pdhirajkumarprasad opened this issue Aug 27, 2024 · 4 comments

pdhirajkumarprasad commented Aug 27, 2024

Full ONNX FE tracker is at: #564

Running a model

In the alt_e2e test suite, set the cache directory where models will be downloaded:

```shell
export CACHE_DIR="some path where models will be downloaded"
```

If building torch-mlir and IREE from source:

```shell
source /path/to/iree-build/.env && export PYTHONPATH
export PYTHONPATH=/path/to/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir:/path/to/torch-mlir/test/python/fx_importer:$PYTHONPATH
export PATH=/path/to/iree-build/tools/:/path/to/torch-mlir/build/bin/:$PATH
```

Then run:

```shell
python ./run.py --mode=cl-onnx-iree -v --torchtolinalg -t ModelName
```

For onnx/models/

Critical issues

CPU

| # | Device | Issue type | Issue no | #Models impacted | Model list | Assignee | Status |
|---|--------|------------|----------|------------------|------------|----------|--------|
| 1 | CPU | 'stream.async.dispatch' op has invalid Read access range [0 to 7375872 for 7375872] of resource %15 with size 150528; length > resource size | 380 | 67 | modelList | @jinchen62 | file issue with smaller reproducer |
| 2 | CPU | "onnx.Resize" failed to legalize operation 'torch.operator' that was explicitly marked illegal | 599 | 11 | modelList | @zjgarvey | |
| 3 | CPU | failed to legalize operation 'torch.aten.or.bool' | 873 | 6 | modelList | @zjgarvey | |
| 4 | CPU | 'linalg.generic' op inferred input/output operand #1 has shape's dimension #2 to be 3137, but found 313 | 825 | 5 | modelList | @zjgarvey | |
| 5 | CPU | failed to legalize unresolved materialization from ('i64') to 'index' that remained live after conversion | 18899 | 4 | modelList | @zjgarvey | |
| 6 | CPU | 'torch.prim.If' op along control flow edge from Region #0 to parent results: source type #0 | 696 | 3 | modelList | @AmosLewis | |
| 7 | CPU | crash | 3841 | 2 | modelList | @zjgarvey | |
| 8 | CPU | 'func.func' op exceeded stack allocation limit of 32768 bytes for function. Got 1048576 bytes | 875 | 2 | modelList | | |
| 9 | CPU | 'tensor.reshape' op source and destination tensor should have the same number of elements | | 1 | modelList | | |
| 10 | CPU | onnx.LSTM | | 1 | modelList | | |
| 11 | CPU | onnx.Conv | | 1 | modelList | | |
| 12 | CPU | One or more operations with large vector sizes (8192 bytes) were found | | 1 | modelList | | |
| 13 | CPU | 'tensor.dim' op unexpected during shape cleanup; dynamic dimensions must have been resolved prior to leaving the flow dialect | 876 | 1 | modelList | | |
| 14 | CPU | failed to legalize operation onnx.NonZero | 820 | 1 | modelList | @renxida | |
| 15 | CPU | type of return operand 0 ('!torch.vtensor<[?,384],f32>') doesn't match function result type ('!torch.vtensor<[1,384],f32>') | 877 | 1 | modelList | @Shukla-Gaurav | |
| 16 | CPU | boolean indexing ops: AtenNonzeroOp, AtenIndexTensorOp, AtenMaskedSelectOp | 3293 | | | @renxida | |
| 17 | CPU | Add TorchToLinalg lowering for MaxUnpool operation | 718 | | | @jinchen62 | |
| 18 | CPU | Fix Onnx.DFT Torch->Linalg lowering | 800 | | | @PhaneeshB | |

Import and setup failures

| # | Device | Issue type | Issue no | #Models impacted | Model list | Assignee | Status |
|---|--------|------------|----------|------------------|------------|----------|--------|
| 3 | N/A | OOM during ORT | #862 | 3 | model list | | |
| 4 | N/A | OOM import, missing dim_params, ORT PASS | #860 #861 | 21 | model list | | |
| 5 | N/A | Unable to update opset ver due to BatchNormalization, ORT PASS | #859 | 5 | model list | | |
| 6 | N/A | Unable to update opset ver due to BN, OOM import, ORT PASS | #859 #861 | 1 | model list | | |
| 7 | N/A | duplicate metadata_prop keys, ORT PASS | #863 | 1 | model list | | |
| 8 | N/A | OOM import, ORT PASS | #861 | 25 | model list | | |

iree-compile

IREE project tracker: https://github.com/orgs/iree-org/projects/8/views/3

| # | Device | Issue type | Issue no | #Models impacted | Model list | Assignee | Status |
|---|--------|------------|----------|------------------|------------|----------|--------|
| 3 | GPU | 'func.func' op uses 401920 bytes of shared memory; exceeded the limit of 65536 bytes | 18603 | 100+ | | | |

iree runtime

| # | Device | Issue type | Issue no | #Models impacted | Model list | Assignee | Status |
|---|--------|------------|----------|------------------|------------|----------|--------|
| 1 | CPU | Abort | 18741 | 515+ | modelList | | |

numerics

| # | Device | Issue type | Issue no | #Models impacted | Model list | Assignee |
|---|--------|------------|----------|------------------|------------|----------|
| 1 | CPU | numeric, need_to_analyze | | 101 | modelList | |
| 2 | | [numerics]: element at index 0 (0.332534) does not match the expected (0.308342); for LSTM ops | 18441 | 2 | | |
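Mismatches like the one above are typically flagged by an element-wise tolerance comparison. A minimal pure-Python sketch of such a check (the function name and tolerance values are illustrative assumptions, not the suite's actual defaults):

```python
import math

def elements_match(actual, expected, rel_tol=1e-4, abs_tol=1e-6):
    """Return (index, actual, expected) for every element outside tolerance."""
    bad = []
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol):
            bad.append((i, a, e))
    return bad

# The failing LSTM case above: index 0 differs well beyond tolerance.
mismatches = elements_match([0.332534], [0.308342])
print(mismatches)  # -> [(0, 0.332534, 0.308342)]
```

A nonempty result marks the model as a numerics failure; an empty list means all elements agree within tolerance.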

IREE EP only issues

iree-compile fails with `ElementsAttr does not provide iteration facilities for type 'mlir::Attribute'` on int8 models at the QuantizeLinear op.
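For context, ONNX QuantizeLinear maps a float to int8 as y = saturate(round(x / scale) + zero_point), with round-half-to-even. A pure-Python sketch of that semantics (the helper name is mine; this illustrates what the op encodes, not the ElementsAttr crash itself):

```python
def quantize_linear_int8(x, scale, zero_point=0):
    """ONNX QuantizeLinear semantics for int8: round half to even, then saturate."""
    q = round(x / scale) + zero_point  # Python round() is half-to-even, matching ONNX
    return max(-128, min(127, q))      # saturate to the int8 range

print(quantize_linear_int8(1.0, 0.5))    # -> 2
print(quantize_linear_int8(100.0, 0.5))  # -> 127 (saturated)
```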

low priority

Issue no 828: Turbine Camp
Issue no 797: Ops not in model

@zjgarvey (Collaborator)

Can you update the model list links?

@jinchen62 (Contributor)

Could you also attach the issue links you referred to, so we know whether all model paths are covered? Also, it seems this does not include #801, right?

@pdhirajkumarprasad (Author)

@zjgarvey the model list contains the updated links only.

@jinchen62 Yes, so far the report is based on the ONNX models of the e2e shark test suite.

jinchen62 commented Aug 29, 2024

@pdhirajkumarprasad I think it would be helpful to attach more details of the error message.

I think the onnx.Transpose failure in the ONNX-to-Torch path is the shape inference issue I was dealing with. I fixed it by setting the opset version to 21 with a locally built torch-mlir in the SHARK test suite (llvm/torch-mlir#3593). @zjgarvey I realized this doesn't seem to work for the CI job, right? Any ideas?
