Running a set of multiple operations in parallel ("operator-level parallelization"), as shown in Figure 1, has the potential to improve inference time. Using draft PR #2756, we confirmed that this parallelization accelerates inference for some real models. We will split that PR into smaller PRs for step-by-step review. This issue describes the overall plan and status of those PRs.
We introduced two new operations, ONNXParallelOp and ONNXForkOp (PR #2810). These operations are lowered to KrnlParallelOp, KrnlIterateOp, and SCFIfOp. A subsequent PR will add the lowering pass for ONNXParallelOp and ONNXForkOp. This approach lets us reuse onnx-mlir's existing OpenMP implementation and meet the requirement, described in issue #2497, to use a common framework for threading.