
yolov3-10 and yolov3-12 models from the modelzoo run into a segmentation fault when used for inference #2973

Closed
Sunny-Anand opened this issue Oct 11, 2024 · 5 comments


@Sunny-Anand
Collaborator

Sunny-Anand commented Oct 11, 2024

Models that hit a segmentation fault:

yolov3-10 and yolov3-12
Tests.zip

Steps to reproduce on the community Docker image:

  1. Pull the latest community onnx-mlir-dev image: `docker pull onnxmlir/onnx-mlir-dev:390x`
  2. Start a container with the podman command below. It mounts the folder that contains the models listed above and their .tests files, and it also mounts the DLC modelzoo clients:
    podman run --rm --entrypoint bash -v /devfield/sunny/models:/data:Z -v /devfield/sunny/zdlc-main/dlc-automation/:/code/:Z -it --name onnx-mlir-sunny-y3-10 onnx-mlir-dev:sunny-yolov3
  3. Compile the model. The resulting .so file (here yolov3-10.so) is generated in the /data folder, which also holds the model and .tests files. Make sure the full model file has been downloaded, not just the git LFS pointer file.
root@0b9851e27fe3:/workdir# ./onnx-mlir/build/Debug/bin/onnx-mlir --O3 --EmitLib --mtriple=s390x-ibm-loz --mcpu=z16 --maccel=NNPA  /data/yolov3-10.onnx --onnx-op-stats TXT
[1/6] Fri Oct 11 16:46:51 2024 (0s) Importing ONNX Model to MLIR Module from "yolov3-10.onnx"
[2/6] Fri Oct 11 16:46:53 2024 (2s) Compiling and Optimizing MLIR Module
Operations encountered:
-----------------------
   func.func               , 1
   func.return             , 1
   onnx.Add                , 18
   onnx.Cast               , 51
   onnx.Ceil               , 7
   onnx.Concat             , 26
   onnx.Constant           , 179
   onnx.Conv               , 75
   onnx.Dim                , 9
   onnx.Div                , 9
   onnx.Exp                , 3
   onnx.LeakyRelu          , 72
   onnx.Loop               , 6
   onnx.Mul                , 17
   onnx.NoValue            , 1
   onnx.NonMaxSuppression  , 1
   onnx.ReduceMinV13       , 1
   onnx.Reshape            , 15
   onnx.Resize             , 2
   onnx.Sigmoid            , 9
   onnx.Slice              , 46
   onnx.Squeeze            , 7
   onnx.Sub                , 6
   onnx.Tile               , 6
   onnx.Transpose          , 4
   onnx.Unsqueeze          , 2
   onnx.Yield              , 6
  zhigh.Add                , 22
  zhigh.Div                , 4
  zhigh.Stick              , 29
  zhigh.StickifiedConstant , 1
  zhigh.Sub                , 1
  zhigh.Unstick            , 21
[3/6] Fri Oct 11 16:47:16 2024 (25s) Translating MLIR Module to LLVM and Generating LLVM Optimized Bitcode
[4/6] Fri Oct 11 16:47:44 2024 (53s) Generating Object from LLVM Bitcode
[5/6] Fri Oct 11 16:47:51 2024 (60s) Linking and Generating the Output Shared Library
[6/6] Fri Oct 11 16:47:52 2024 (61s) Compilation completed
-----------------------------

  4. Run inference with the command below, where `/code/client/bin/modelzoo` is a DLC client:
root@0b9851e27fe3:/workdir# /code/client/bin/modelzoo --iterations 1 --validate --msg-level INFO --file /data/yolov3-10.tests --lib /data/yolov3-10.so --fc-parms 0.01,10.830992,5,10
Iteration 0 dataset 0: Running
Segmentation fault (core dumped)
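Step 3 warns that the full model file must be present rather than a git LFS pointer. A rough pre-compilation check is sketched below, assuming the standard git LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`. The function name `is_lfs_pointer` is hypothetical and is not part of onnx-mlir or the DLC tooling.

```shell
# Hypothetical helper (not part of onnx-mlir or the DLC tools).
# Returns 0 if the given file looks like a git LFS pointer file instead of
# the real model, and nonzero otherwise (including when the file is missing).
# Assumes the standard LFS pointer format, whose first line is:
#   version https://git-lfs.github.com/spec/v1
is_lfs_pointer() {
  # The first 64 bytes are enough to see the pointer header; a real .onnx
  # file starts with binary protobuf data and will not match.
  head -c 64 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}
```

For example, if `is_lfs_pointer /data/yolov3-10.onnx` succeeds, the real model still has to be fetched (for instance with `git lfs pull` in the cloned model repository) before running onnx-mlir on it.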
@AlexandreEichenberger
Collaborator

I assume that this is when using -maccel=NNPA.

@Sunny-Anand
Collaborator Author

@AlexandreEichenberger Yes, with -maccel=NNPA and -mcpu=z16.

@AlexandreEichenberger
Collaborator

It currently fails when dynamic shapes are combined with enable-compiler-stick-unstick. Looking for the root cause now.

@AlexandreEichenberger
Collaborator

@Sunny-Anand Can you confirm whether the PRs pushed for this issue fixed the problem? Thanks.

@Sunny-Anand
Collaborator Author

@AlexandreEichenberger The issue was fixed by PRs #2980 and #2981. Thank you.
