Update documentation for Arm Compute Library #22232

Open. Wants to merge 1 commit into base: gh-pages
77 changes: 13 additions & 64 deletions docs/build/eps.md
@@ -396,75 +396,24 @@ The DirectML execution provider supports building for both x64 and x86 architect

---

## ARM Compute Library
## Arm Compute Library
See more information on the ACL Execution Provider [here](../execution-providers/community-maintained/ACL-ExecutionProvider.md).

### Prerequisites
{: .no_toc }

* Supported backend: i.MX8QM Armv8 CPUs
* Supported BSP: i.MX8QM BSP
* Install i.MX8QM BSP: `source fsl-imx-xwayland-glibc-x86_64-fsl-image-qt5-aarch64-toolchain-4*.sh`
* Set up the build environment
```
source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
```
* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices

### Build Instructions
{: .no_toc }

1. Configure ONNX Runtime with ACL support:
```
cmake ../onnxruntime-arm-upstream/cmake -DONNX_CUSTOM_PROTOC_EXECUTABLE=/usr/bin/protoc -Donnxruntime_RUN_ONNX_TESTS=OFF -Donnxruntime_GENERATE_TEST_REPORTS=ON -Donnxruntime_DEV_MODE=ON -DPYTHON_EXECUTABLE=/usr/bin/python3 -Donnxruntime_USE_CUDA=OFF -Donnxruntime_USE_NSYNC=OFF -Donnxruntime_CUDNN_HOME= -Donnxruntime_USE_JEMALLOC=OFF -Donnxruntime_ENABLE_PYTHON=OFF -Donnxruntime_BUILD_CSHARP=OFF -Donnxruntime_BUILD_SHARED_LIB=ON -Donnxruntime_USE_EIGEN_FOR_BLAS=ON -Donnxruntime_USE_OPENBLAS=OFF -Donnxruntime_USE_ACL=ON -Donnxruntime_USE_DNNL=OFF -Donnxruntime_USE_MKLML=OFF -Donnxruntime_USE_OPENMP=ON -Donnxruntime_USE_TVM=OFF -Donnxruntime_USE_LLVM=OFF -Donnxruntime_ENABLE_MICROSOFT_INTERNAL=OFF -Donnxruntime_USE_BRAINSLICE=OFF -Donnxruntime_USE_EIGEN_THREADPOOL=OFF -Donnxruntime_BUILD_UNIT_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
```
The ```-Donnxruntime_USE_ACL=ON``` option will use, by default, the 19.05 version of the Arm Compute Library. To set the right version you can use:
```-Donnxruntime_USE_ACL_1902=ON```, ```-Donnxruntime_USE_ACL_1905=ON```, ```-Donnxruntime_USE_ACL_1908=ON``` or ```-Donnxruntime_USE_ACL_2002=ON```;

To use a library outside the normal environment you can set a custom path by using the ```-Donnxruntime_ACL_HOME``` and ```-Donnxruntime_ACL_LIBS``` tags, which define the path to the ComputeLibrary directory and the build directory respectively.
You must first build Arm Compute Library 24.07 for your platform as described in the [documentation](https://github.com/ARM-software/ComputeLibrary).
See [here](inferencing.md#arm) for information on building for Arm®-based devices.

```-Donnxruntime_ACL_HOME=/path/to/ComputeLibrary```, ```-Donnxruntime_ACL_LIBS=/path/to/build```
Add the following options to `build.sh` to enable the ACL Execution Provider:
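A full invocation might look like the following sketch (the paths are illustrative and should point at your own Compute Library checkout and its build output):

```bash
# Sketch only: adjust the paths to your Compute Library checkout and build directory
./build.sh --use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build
```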


2. Build ONNX Runtime library, test and performance application:
```
make -j 6
```

3. Deploy ONNX runtime on the i.MX 8QM board
```
libonnxruntime.so.0.5.0
onnxruntime_perf_test
onnxruntime_test_all
--use_acl --acl_home=/path/to/ComputeLibrary --acl_libs=/path/to/ComputeLibrary/build
```

### Native Build Instructions
{: .no_toc }

*Validated on Jetson Nano and Jetson Xavier*

1. Build ACL Library (skip if already built)

```bash
cd ~
git clone -b v20.02 https://github.com/Arm-software/ComputeLibrary.git
cd ComputeLibrary
sudo apt-get install -y scons g++-arm-linux-gnueabihf
scons -j8 arch=arm64-v8a Werror=1 debug=0 asserts=0 neon=1 opencl=1 examples=1 build=native
```

1. CMake is needed to build ONNX Runtime. Because the minimum required version is 3.13, it is necessary to build CMake from source. Download the Unix/Linux sources from https://cmake.org/download/ and follow https://cmake.org/install/ to build from source. Versions 3.17.5 and 3.18.4 have been tested on Jetson; a sketch is shown below.
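A from-source build might look roughly like the following sketch (the version number is illustrative; any release at or above the minimum should work):

```bash
# Illustrative sketch: build and install CMake 3.18.4 from source
wget https://github.com/Kitware/CMake/releases/download/v3.18.4/cmake-3.18.4.tar.gz
tar xzf cmake-3.18.4.tar.gz
cd cmake-3.18.4
./bootstrap && make -j"$(nproc)" && sudo make install
```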

1. Build onnxruntime with the --use_acl flag, using one of the supported ACL version flags (ACL_1902 | ACL_1905 | ACL_1908 | ACL_2002), for example as sketched below.
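For example (sketch only; the extra options are illustrative and the exact flag syntax can vary between ONNX Runtime releases, so check `./build.sh --help`):

```bash
./build.sh --use_acl ACL_2002 --config Release --build_shared_lib --parallel
```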

---

## ArmNN
## Arm NN

See more information on the ArmNN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md).
See more information on the Arm NN Execution Provider [here](../execution-providers/community-maintained/ArmNN-ExecutionProvider.md).

### Prerequisites
{: .no_toc }
@@ -480,7 +429,7 @@ source /opt/fsl-imx-xwayland/4.*/environment-setup-aarch64-poky-linux
alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/share/cmake/OEToolchainConfig.cmake"
```

* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
* See [here](inferencing.md#arm) for information on building for Arm-based devices

### Build Instructions
{: .no_toc }
@@ -490,20 +439,20 @@ alias cmake="/usr/bin/cmake -DCMAKE_TOOLCHAIN_FILE=$OECORE_NATIVE_SYSROOT/usr/sh
./build.sh --use_armnn
```

The Relu operator is set by default to use the CPU execution provider for better performance. To use the ArmNN implementation build with --armnn_relu flag
The Relu operator is set by default to use the CPU execution provider for better performance. To use the Arm NN implementation, build with the --armnn_relu flag:

```bash
./build.sh --use_armnn --armnn_relu
```

The Batch Normalization operator is set by default to use the CPU execution provider. To use the ArmNN implementation build with --armnn_bn flag
The Batch Normalization operator is set by default to use the CPU execution provider. To use the Arm NN implementation, build with the --armnn_bn flag:

```bash
./build.sh --use_armnn --armnn_bn
```

To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the ArmNN home directory and build directory respectively.
The ARM Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively.
To use a library outside the normal environment you can set a custom path by providing the --armnn_home and --armnn_libs parameters to define the path to the Arm NN home directory and build directory respectively.
The Arm Compute Library home directory and build directory must also be available, and can be specified if needed using --acl_home and --acl_libs respectively.

```bash
./build.sh --use_armnn --armnn_home /path/to/armnn --armnn_libs /path/to/armnn/build --acl_home /path/to/ComputeLibrary --acl_libs /path/to/acl/build
@@ -519,7 +468,7 @@ See more information on the RKNPU Execution Provider [here](../execution-provide


* Supported platform: RK1808 Linux
* See [Build ARM](inferencing.md#arm) below for information on building for ARM devices
* See [here](inferencing.md#arm) for information on building for Arm-based devices
* Use gcc-linaro-6.3.1-2017.05-x86_64_aarch64-linux-gnu instead of gcc-linaro-6.3.1-2017.05-x86_64_arm-linux-gnueabihf, and modify CMAKE_CXX_COMPILER & CMAKE_C_COMPILER in tool.cmake:

```
23 changes: 12 additions & 11 deletions docs/build/inferencing.md
@@ -88,7 +88,8 @@ If you would like to use [Xcode](https://developer.apple.com/xcode/) to build th

Without this flag, the cmake build generator defaults to Unix Makefiles.

Today, Mac computers are either Intel-Based or Apple silicon(aka. ARM) based. By default, ONNX Runtime's build script only generate bits for the CPU ARCH that the build machine has. If you want to do cross-compiling: generate ARM binaries on a Intel-Based Mac computer, or generate x86 binaries on a Mac ARM computer, you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example:
Today, Mac computers are either Intel-based or Apple silicon-based. By default, ONNX Runtime's build script only generates binaries for the CPU architecture of the build machine. If you want to cross-compile, that is, generate arm64 binaries on an Intel-based Mac or x86 binaries on a Mac with Apple silicon, you can set the "CMAKE_OSX_ARCHITECTURES" cmake variable. For example:

Build for Intel CPUs:
```bash
@@ -311,21 +312,21 @@ ORT_DEBUG_NODE_IO_DUMP_DATA_TO_FILES=1
```


### ARM
### Arm

There are a few options for building ONNX Runtime for ARM.
There are a few options for building ONNX Runtime for Arm®-based devices.

First, you may do it on a real ARM device, or on a x86_64 device with an emulator(like qemu), or on a x86_64 device with a docker container with an emulator(you can run an ARM container on a x86_64 PC). Then the build instructions are essentially the same as the instructions for Linux x86_64. However, it wouldn't work if your the CPU you are targeting is not 64-bit since the build process needs more than 2GB memory.
First, you may do it on a real Arm-based device, on an x86_64 device with an emulator (like qemu), or on an x86_64 device with a docker container running an emulator (you can run an Arm-based container on an x86_64 PC). In those cases the build instructions are essentially the same as the instructions for Linux x86_64. However, it won't work if the CPU you are targeting is not 64-bit, since the build process needs more than 2GB of memory.

* [Cross compiling for ARM with simulation (Linux/Windows)](#cross-compiling-for-arm-with-simulation-linuxwindows) - **Recommended**; Easy, slow, ARM64 only(no support for ARM32)
* [Cross compiling for Arm-based devices with simulation (Linux/Windows)](#cross-compiling-for-arm-with-simulation-linuxwindows) - **Recommended**; easy, slow, ARM64 only (no support for ARM32)
* [Cross compiling on Linux](#cross-compiling-on-linux) - Difficult, fast
* [Cross compiling on Windows](#cross-compiling-on-windows)

#### Cross compiling for ARM with simulation (Linux/Windows)
#### Cross compiling for Arm-based devices with simulation (Linux/Windows)

*EASY, SLOW, RECOMMENDED*

This method relies on qemu user mode emulation. It allows you to compile using a desktop or cloud VM through instruction level simulation. You'll run the build on x86 CPU and translate every ARM instruction to x86. This is much faster than compiling natively on a low-end ARM device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an ARM device where it can be invoked in Python 3 scripts. The build process can take hours, and may run of memory if the target CPU is 32-bit.
This method relies on qemu user-mode emulation. It allows you to compile using a desktop or cloud VM through instruction-level simulation. You'll run the build on an x86 CPU and translate every Arm architecture instruction to x86. This is potentially much faster than compiling natively on a low-end device. The resulting ONNX Runtime Python wheel (.whl) file is then deployed to an Arm-based device where it can be invoked in Python 3 scripts. The build process can take hours, and may run out of memory if the target CPU is 32-bit.
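One way to set this up is Docker combined with qemu user-mode emulation. The sketch below is illustrative only: the binfmt helper image, the build image name, and the build.sh flags are assumptions rather than part of the official instructions.

```bash
# One-time setup on the x86_64 host: register qemu user-mode emulators for foreign architectures
docker run --privileged --rm tonistiigi/binfmt --install arm64

# Run the build inside an arm64 container; every Arm instruction is emulated on the x86 host.
# "my-arm64-build-image" is a hypothetical image that already contains the build
# prerequisites (compilers, CMake, Python).
docker run --rm --platform linux/arm64 -v "$PWD":/onnxruntime -w /onnxruntime my-arm64-build-image \
  bash -c "./build.sh --config Release --build_wheel --parallel --skip_tests"
```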

#### Cross compiling on Linux

@@ -364,12 +365,12 @@ This option is very fast and allows the package to be built in minutes, but is c

You must also know what kind of flags your target hardware needs, which can differ greatly. For example, if you take a generic ARMv7 compiler and use it for Raspberry Pi V1 directly, it won't work because the Raspberry Pi V1 only supports ARMv6. Generally every hardware vendor will provide a toolchain; check how that one was built.

A target env is identifed by:
A target env is identified by:

* Arch: x86_32, x86_64, armv6, armv7, armv7l, aarch64, ...
* OS: bare-metal or linux.
* Libc: gnu libc/ulibc/musl/...
* ABI: ARM has mutilple ABIs like eabi, eabihf...
* ABI: Arm has multiple ABIs like eabi, eabihf...

You can get all of this information from the previous output; please make sure it is all correct.
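For instance, a cross-compiler's target triple usually encodes most of this information (the toolchain names below are illustrative):

```bash
$ aarch64-linux-gnu-gcc -dumpmachine
aarch64-linux-gnu
$ arm-linux-gnueabihf-gcc -dumpmachine
arm-linux-gnueabihf
```

Here the second triple indicates a 32-bit Arm target on Linux with glibc and the hard-float EABI.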

@@ -528,8 +529,8 @@ This option is very fast and allows the package to be built in minutes, but is c

**Using Visual C++ compilers**

1. Download and install Visual C++ compilers and libraries for ARM(64).
If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding ARM(64) compilers and libraries.
1. Download and install Visual C++ compilers and libraries for Arm(64).
If you have Visual Studio installed, please use the Visual Studio Installer (look under the section `Individual components` after choosing to `modify` Visual Studio) to download and install the corresponding Arm(64) compilers and libraries.

2. Use `.\build.bat` and specify `--arm` or `--arm64` as the build option to start building, for example as shown below. Preferably use `Developer Command Prompt for VS`, or make sure all the installed cross-compilers can be found from the command prompt being used to build, via the PATH environment variable.
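For example, a cross-compile targeting Arm64 might be started as follows (sketch; the remaining flags depend on your configuration):

```
.\build.bat --arm64 --config Release --parallel
```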

6 changes: 3 additions & 3 deletions docs/execution-providers/Vitis-AI-ExecutionProvider.md
@@ -27,9 +27,9 @@ The following table lists AMD targets that are supported by the Vitis AI ONNX Ru
| **Architecture** | **Family** | **Supported Targets** | **Supported OS** |
|---------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------|------------------------------------------------------------|
| AMD64 | Ryzen AI | AMD Ryzen 7040U, 7040HS | Windows |
| ARM64 Cortex-A53 | Zynq UltraScale+ MPSoC | ZCU102, ZCU104, KV260 | Linux |
| ARM64 Cortex-A72 | Versal AI Core / Premium | VCK190 | Linux |
| ARM64 Cortex-A72 | Versal AI Edge | VEK280 | Linux |
| Arm® Cortex®-A53 | Zynq UltraScale+ MPSoC | ZCU102, ZCU104, KV260 | Linux |
| Arm® Cortex®-A72 | Versal AI Core / Premium | VCK190 | Linux |
| Arm® Cortex®-A72 | Versal AI Edge | VEK280 | Linux |


AMD Adaptable SoC developers can also leverage the Vitis AI ONNX Runtime Execution Provider to support custom (chip-down) designs.
2 changes: 1 addition & 1 deletion docs/execution-providers/Xnnpack-ExecutionProvider.md
@@ -8,7 +8,7 @@ nav_order: 9

# XNNPACK Execution Provider

Accelerate ONNX models on Android/iOS devices and WebAssembly with ONNX Runtime and the XNNPACK execution provider. [XNNPACK](https://github.com/google/XNNPACK) is a highly optimized library of floating-point neural network inference operators for ARM, WebAssembly, and x86 platforms.
Accelerate ONNX models on Android/iOS devices and WebAssembly with ONNX Runtime and the XNNPACK execution provider. [XNNPACK](https://github.com/google/XNNPACK) is a highly optimized library of floating-point neural network inference operators for Arm®-based, WebAssembly, and x86 platforms.

## Contents
{: .no_toc }
docs/execution-providers/community-maintained/ACL-ExecutionProvider.md
@@ -10,14 +10,7 @@ redirect_from: /docs/reference/execution-providers/ACL-ExecutionProvider
# ACL Execution Provider
{: .no_toc }

The integration of ACL as an execution provider (EP) into ONNX Runtime accelerates performance of ONNX model workloads across Armv8 cores. [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary){:target="_blank"} is an open source inference engine maintained by Arm and Linaro companies.


## Contents
{: .no_toc }

* TOC placeholder
{:toc}
The ACL Execution Provider enables accelerated performance on Arm®-based CPUs through [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary){:target="_blank"}.


## Build
@@ -30,10 +23,44 @@ For build instructions, please see the [build page](../../build/eps.md#arm-compu
```
Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions sf;
bool enable_cpu_mem_arena = true;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ACL(sf, enable_cpu_mem_arena));
bool enable_fast_math = true;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ACL(sf, enable_fast_math));
```
The C API details are [here](../../get-started/with-c.html).

### Python
{: .no_toc }

```
import onnxruntime

providers = [("ACLExecutionProvider", {"enable_fast_math": "true"})]
sess = onnxruntime.InferenceSession("model.onnx", providers=providers)
```

## Performance Tuning
When/if using [onnxruntime_perf_test](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/perftest){:target="_blank"}, use the flag -e acl
Arm Compute Library has a fast math mode that can increase performance with some potential decrease in accuracy for MatMul and Conv operators. It is disabled by default.

When using [onnxruntime_perf_test](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/perftest){:target="_blank"}, use the flag `-e acl` to enable the ACL Execution Provider. You can additionally use `-i 'enable_fast_math|true'` to enable fast math.

Arm Compute Library uses the ONNX Runtime intra-operator thread pool when running via the execution provider. You can control the size of this thread pool using the `-x` option.
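Putting these together, an invocation might look like the following sketch (the model path and thread count are illustrative):

```bash
# Sketch: ACL EP with fast math enabled and 4 intra-op threads
./onnxruntime_perf_test -e acl -i 'enable_fast_math|true' -x 4 /path/to/model.onnx
```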

## Supported Operators

|Operator|Supported types|
|---|---|
|AveragePool|float|
|BatchNormalization|float|
|Concat|float|
|Conv|float, float16|
|FusedConv|float|
|FusedMatMul|float, float16|
|Gemm|float|
|GlobalAveragePool|float|
|GlobalMaxPool|float|
|MatMul|float, float16|
|MatMulIntegerToFloat|uint8, int8, uint8+int8|
|MaxPool|float|
|NhwcConv|float|
|Relu|float|
|QLinearConv|uint8, int8, uint8+int8|
docs/execution-providers/community-maintained/ArmNN-ExecutionProvider.md
@@ -7,7 +7,7 @@ nav_order: 2
redirect_from: /docs/reference/execution-providers/ArmNN-ExecutionProvider
---

# ArmNN Execution Provider
# Arm NN Execution Provider
{: .no_toc}

## Contents
@@ -16,14 +16,14 @@ redirect_from: /docs/reference/execution-providers/ArmNN-ExecutionProvider
* TOC placeholder
{:toc}

Accelerate performance of ONNX model workloads across Armv8 cores with the ArmNN execution provider. [ArmNN](https://github.com/ARM-software/armnn) is an open source inference engine maintained by Arm and Linaro companies.
Accelerate performance of ONNX model workloads across Arm®-based devices with the Arm NN execution provider. [Arm NN](https://github.com/ARM-software/armnn) is an open source inference engine maintained by Arm and Linaro.

## Build
For build instructions, please see the [BUILD page](../../build/eps.md#armnn).

## Usage
### C/C++
To use ArmNN as execution provider for inferencing, please register it as below.
To use Arm NN as an execution provider for inferencing, register it as shown below.
```
Ort::Env env = Ort::Env{ORT_LOGGING_LEVEL_ERROR, "Default"};
Ort::SessionOptions so;