
Sorted all invocations of alloca #2981

Merged
merged 7 commits into onnx:main on Oct 18, 2024

Conversation

AlexandreEichenberger
Collaborator

alloca was shown to be an issue with large benchmarks, where too much stack allocation ended up crashing the process. alloca is typically used for temporaries needed within the computation of an ONNX operation.

I went through all instances of alloca and left the scalars alone (allocating a few bytes per operation is fine). For the larger data, I migrated it to alloc.

Because alloc results in a more expensive call to allocate and free, when not using parallelism I moved such calls outside the loops. But when using parallelism, the call must be inside the parallel region, as each parallel thread must have its own version of the temporaries; in such cases the alloc stayed inside the parallel loop.

I ensured that buffer hoisting did not move the alloc outside of the scope_alloca region, which appears to also catch alloc operations. I ran individual checks on several key operations, as at this time the verification of parallel operations in our CIs is still very sketchy.

  CheckONNXModel.py -m gemm.mlir -r="-O3 -march=arm64" -a="-parallel"
  CheckONNXModel.py -m reduceMin1.mlir -r="-O3 -march=arm64 -shapeInformation=0:1024x1024x1024" -a="-parallel"
  CheckONNXModel.py -m layernorm4d.mlir -r="-O3 -march=arm64 " -a="-parallel"
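The sequential trade-off described above (hoisting the heap-allocated temporary out of the loop so the allocate/free pair runs once rather than per iteration) can be sketched in plain C. This is a hypothetical illustration, not onnx-mlir code; `scale_rows` and the stand-in computation are invented for the example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: in the sequential case, the heap-allocated
   temporary is hoisted out of the loop so malloc/free run once, not once
   per row. In a parallel version, each thread would instead allocate its
   own temporary inside the parallel region. */
void scale_rows(const double *in, double *out, size_t rows, size_t cols) {
  double *tmp = malloc(cols * sizeof(double)); /* hoisted: one allocation */
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c)
      tmp[c] = in[r * cols + c] * 2.0;         /* stand-in computation */
    memcpy(out + r * cols, tmp, cols * sizeof(double));
  }
  free(tmp);                                    /* freed once, after the loop */
}
```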

Signed-off-by: Alexandre Eichenberger <[email protected]>
Collaborator

@tungld tungld left a comment


LGTM.

I remember we encountered segfaults because of alloca in the past and added the hoisting pass to move it outside loops. It's a pity to hear that that pass does not work with scope_alloca.

I see there is a pass in MLIR to promote small allocs to alloca; not sure whether it works or not.

@@ -215,6 +217,7 @@ ZTensor ZTensorHelper::getZTensor(Value bufferPtr, zdnn_data_types dataType,
Value transformedDescPtr =
getTransformedDescPtr(preTransformedDescPtr, isConcat, concatInfo);
// Create the input zTensor.
// TODO: evaluate if a heap alloc would not be better.
Collaborator


I often see MLIR use alloca to create an LLVM struct.

Collaborator Author


Makes sense; it's still small. Want me to remove the comment? @tungld

Collaborator


Yes, it's ok to remove it. Thanks!

Collaborator Author

@AlexandreEichenberger AlexandreEichenberger Oct 18, 2024


Changed it, with an explanation of why it is good to have alloca here:

// eventually all the refs to alloca to be register/spill access, not memory
// load/stores.
Value TmpProd = create.mem.alignedAlloca(CTmpType, BUFFER_ALIGN);
Value TmpProd = create.mem.alignedAlloc(CTmpType, BUFFER_ALIGN);
Collaborator


If it is good practice to use alignedAlloca for scalar types, should we add a verification in create.mem.alignedAlloca that checks that the type is scalar? That way, we can avoid the situation where users call create.mem.alignedAlloca for a non-scalar type.

Collaborator Author


Left for another PR.

@AlexandreEichenberger
Collaborator Author

I remember we encountered segfaults because of alloca in the past and added the hoisting pass to move it outside loops. It's a pity to hear that that pass does not work with scope_alloca.

scope_alloca stops the migration of alloc and alloca out of that scope. This is important for parallel regions, as having the allocation stay in the parallel region means that each thread gets its own copy. So that works fine.

Now I also manually moved the non-parallel case out of the loops, just to make sure that is fine. Would you like me to check if it's really needed? That way we could always have the call inside: if it is parallel, it is blocked there, and otherwise it is not.

In one case, I really needed to move it outside manually, because it was inside an inner-loop if-then-else. By placing it outside of the if-then-else (allocating in all cases), it is able to migrate to the outside.
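The if-then-else point can be sketched in plain C. This is a hypothetical illustration, not onnx-mlir code; `sum_with_hoisted_tmp`, its inputs, and the stand-in computation are invented for the example:

```c
#include <stdlib.h>

/* Hypothetical sketch: if a temporary is only allocated inside one branch
   of an if-then-else that sits in an inner loop, the allocation cannot be
   hoisted. Allocating it unconditionally before the branch (paying for it
   even on the path that does not need it) lets the allocation move
   outside the loop. */
double sum_with_hoisted_tmp(const double *in, size_t n, size_t chunk) {
  /* Allocated once, before the loop and the branch, in all cases. */
  double *tmp = malloc(chunk * sizeof(double));
  double total = 0.0;
  for (size_t i = 0; i + chunk <= n; i += chunk) {
    if (in[i] >= 0.0) { /* only this path actually uses the temporary */
      for (size_t j = 0; j < chunk; ++j)
        tmp[j] = in[i + j] * in[i + j];
      for (size_t j = 0; j < chunk; ++j)
        total += tmp[j];
    } else {
      total += in[i]; /* branch that never touches tmp */
    }
  }
  free(tmp);
  return total;
}
```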

@AlexandreEichenberger
Collaborator Author

Responded to comments, and removed the placement of most "alloc" before/inside the main loop depending on parallel. Placing it inside works in both cases: if it's parallel, it remains inside because of the inserted scope_alloca; if it's sequential, then there is no scope and it naturally migrates outside the main loop.

@tungld
Collaborator

tungld commented Oct 18, 2024

If it's parallel, it remains inside because of the inserted scope_alloca; if it's sequential, then there is no scope and it naturally migrates outside the main loop.

Great, thanks! That was what I expected.

@AlexandreEichenberger AlexandreEichenberger merged commit 1435011 into onnx:main Oct 18, 2024
6 of 7 checks passed
@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #15879 [push] Sorted all invocations o... started at 22:37

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #15882 [push] Sorted all invocations o... started at 23:37

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #14909 [push] Sorted all invocations o... started at 23:50

@jenkins-droid
Collaborator

Jenkins Linux amd64 Build #15879 [push] Sorted all invocations o... passed after 1 hr 14 min

@jenkins-droid
Collaborator

Jenkins Linux s390x Build #15882 [push] Sorted all invocations o... passed after 1 hr 34 min

@jenkins-droid
Collaborator

Jenkins Linux ppc64le Build #14909 [push] Sorted all invocations o... passed after 2 hr 21 min
