PaddlePaddle Custom Device Implementation for Cambricon MLU

Follow the steps below to compile, install, and verify the custom device implementation for Cambricon MLU.

Neuware Version

Module	Version
cntoolkit	3.10.2-1
cnnl	1.25.1-1
cnnlextra	1.8.1-1
cncl	1.16.0-1
mluops	1.1.1-1
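The table can also be kept as data for a quick compatibility check. The sketch below parses the `name: value` report printed by `paddle_custom_device.mlu.version()` in the Verification step (captured, for example, by redirecting its output to a file) and lists the components that differ from this table. The function names are hypothetical, and the `-1` package release suffixes are dropped because the sample report omits them.

```python
# Expected component versions from the "Neuware Version" table,
# with the "-1" package release suffix dropped (the version report omits it).
EXPECTED_NEUWARE = {
    "cntoolkit": "3.10.2",
    "cnnl": "1.25.1",
    "cnnlextra": "1.8.1",
    "cncl": "1.16.0",
    "mluops": "1.1.1",
}


def parse_version_report(text):
    """Parse "name: value" lines (e.g. the output of
    paddle_custom_device.mlu.version()) into a dict."""
    report = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            report[name.strip()] = value.strip()
    return report


def version_mismatches(report, expected=EXPECTED_NEUWARE):
    """Map each component whose version differs from, or is missing in, the
    report to an (expected, found) pair; found is None when missing."""
    return {
        name: (want, report.get(name))
        for name, want in expected.items()
        if report.get(name) != want
    }
```

On a matching setup, `version_mismatches(parse_version_report(open("version.txt").read()))` should return an empty dict, assuming the report was saved to `version.txt`.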

Prepare environment and source code

# 1. Pull the PaddlePaddle Cambricon MLU development Docker image for your host architecture
#    (the Dockerfiles for these images are in the tools/dockerfile directory)
# x86_64 host
docker pull registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310
# aarch64 host
docker pull registry.baidubce.com/device/paddle-mlu:ctr2.15.0-kylinv10-aarch64-gcc82-py310

# 2. Start a Docker container, for example with the command below (on aarch64 hosts, use the aarch64 image tag instead)
docker run -it --name paddle-mlu-dev -v $(pwd):/work \
  -w=/work --shm-size=128G --network=host --privileged  \
  --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
  -v /usr/bin/cnmon:/usr/bin/cnmon \
  registry.baidubce.com/device/paddle-mlu:ctr2.15.0-ubuntu20-x86_64-gcc84-py310 /bin/bash

# 3. Clone the source code
git clone https://github.com/PaddlePaddle/PaddleCustomDevice
cd PaddleCustomDevice

PaddlePaddle Installation and Verification

Install Wheel Package

Download the nightly-built PaddlePaddle wheel packages for your architecture and install them as follows:

# Wheel packages for x86_64 (cp310 builds, i.e. Python 3.10)
wget https://paddle-device.bj.bcebos.com/0.0.0/mlu/paddlepaddle-0.0.0-cp310-cp310-linux_x86_64.whl
wget https://paddle-device.bj.bcebos.com/0.0.0/mlu/paddle_custom_mlu-0.0.0-cp310-cp310-linux_x86_64.whl

# Wheel packages for aarch64 (cp310 builds, i.e. Python 3.10)
wget https://paddle-device.bj.bcebos.com/0.0.0/mlu/paddlepaddle-0.0.0-cp310-cp310-linux_aarch64.whl
wget https://paddle-device.bj.bcebos.com/0.0.0/mlu/paddle_custom_mlu-0.0.0-cp310-cp310-linux_aarch64.whl

# Install both wheel packages after downloading
pip install paddlepaddle*.whl paddle_custom_mlu*.whl
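The download URLs differ only in the architecture suffix. For scripted setups, a small helper can pick the pair matching the current host. This is a sketch that assumes only the two architectures and the cp310 (Python 3.10) builds listed above; `mlu_wheel_urls` is a hypothetical name.

```python
import platform

# Base URL and package names copied from the wheel list (cp310 nightly builds only).
BASE_URL = "https://paddle-device.bj.bcebos.com/0.0.0/mlu"
PACKAGES = ("paddlepaddle", "paddle_custom_mlu")

# platform.machine() spellings mapped to the wheel architecture suffix.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "aarch64": "aarch64",
    "arm64": "aarch64",
}


def mlu_wheel_urls(machine=None):
    """Return the two nightly wheel URLs for `machine` (default: this host).

    Hypothetical helper; raises ValueError for architectures not listed above.
    """
    machine = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise ValueError(f"no MLU wheel listed for architecture {machine!r}")
    return [f"{BASE_URL}/{pkg}-0.0.0-cp310-cp310-linux_{arch}.whl" for pkg in PACKAGES]
```

For example, `print(*mlu_wheel_urls(), sep="\n")` prints the two URLs to pass to `wget` on the current machine.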

Source Code Compilation

# 1. Navigate to the Cambricon MLU backend directory
cd backends/mlu

# 2. Before compiling, make sure the CPU version of PaddlePaddle is installed, e.g. with the following command
pip install paddlepaddle -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/

# 3. Compile option: WITH_TESTING controls whether unit tests are built (default: ON); set it to OFF to skip them
export WITH_TESTING=OFF

# 4. Run the compile script; submodules are synced on demand during compilation
bash tools/compile.sh

# 5. Install the generated wheel package from the build/dist directory
pip install build/dist/paddle_custom_mlu*.whl

Verification

# 1. List the available custom device types
python -c "import paddle; print(paddle.device.get_all_custom_device_type())"
# Expected output:
['mlu']

# 2. Check the installed custom MLU plugin version
python -c "import paddle_custom_device; paddle_custom_device.mlu.version()"
# Expected output:
version: 0.0.0
commit: 83dfe3de33f0a915fb189161568fc3804b5f9c1b
cntoolkit: 3.10.2
cnnl: 1.25.1
cnnlextra: 1.8.1
cncl: 1.16.0
mluops: 1.1.1

# 3. Run the health check
python -c "import paddle; paddle.utils.run_check()"
# Expected output (the number of MLUs reported depends on your machine):
Running verify PaddlePaddle program ...
PaddlePaddle works well on 1 mlu.
PaddlePaddle works well on 16 mlus.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
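Scripts that should fall back gracefully on machines without the MLU plugin can reuse the check from step 1. A minimal sketch: `pick_device` is a hypothetical helper, `paddle.device.get_all_custom_device_type()` is the call used above, and `paddle.set_device` is assumed to accept `mlu:<id>` strings as it does for other custom devices.

```python
def pick_device(custom_types, preferred="mlu", device_id=0):
    """Return a device string for paddle.set_device().

    Uses "<preferred>:<device_id>" when the preferred custom device type is
    registered, otherwise falls back to "cpu".
    """
    if preferred in custom_types:
        return f"{preferred}:{device_id}"
    return "cpu"


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("paddle is not installed; nothing to select")
    else:
        device = pick_device(paddle.device.get_all_custom_device_type())
        paddle.set_device(device)
        print("running on", device)
```

Keeping the decision in a pure function makes it testable without MLU hardware; only the `__main__` part touches PaddlePaddle.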

Train and Inference Demo

# Run the demo for training, evaluation and inference
python tests/test_LeNet_MNIST.py

# Training output:
Epoch [1/2], Iter [01/14], reader_cost: 2.73611 s, batch_cost: 2.78069 s, ips: 1473.01483 samples/s, eta: 0:01:17
Epoch [1/2], Iter [02/14], reader_cost: 1.37505 s, batch_cost: 1.41733 s, ips: 2889.94454 samples/s, eta: 0:00:38
... ...
Epoch [2/2], Iter [14/14], reader_cost: 0.19809 s, batch_cost: 0.23765 s, ips: 17235.35966 samples/s, eta: 0:00:00
Epoch ID: 2, Epoch time: 3.46918 s, reader_cost: 2.77321 s, batch_cost: 3.32711 s, avg ips: 16529.56425 samples/s
Eval - Epoch ID: 2, Top1 accurary:: 0.86230, Top5 accurary:: 0.98950

# Inference output:
I0521 20:27:26.487897  2030 program_interpreter.cc:221] New Executor is Running.
I0521 20:27:26.499172  2030 analysis_predictor.cc:1850] CustomDevice is enabled
... ...
I0521 20:27:26.500521  2030 ir_params_sync_among_devices_pass.cc:142] Sync params from CPU to mlu:0
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [save_optimized_model_pass]
--- Running analysis [ir_graph_to_program_pass]
I0521 20:27:26.504653  2030 analysis_predictor.cc:2032] ======= ir optimization completed =======
I0521 20:27:26.504727  2030 naive_executor.cc:200] ---  skip [feed], feed -> inputs
I0521 20:27:26.504953  2030 naive_executor.cc:200] ---  skip [save_infer_model/scale_0.tmp_0], fetch -> fetch
Output data size is 10
Output data shape is (1, 10)
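When comparing runs (for example with different environment variables), the per-iteration training lines can be parsed into numbers. This sketch assumes the log format shown in the sample training output, which may change between versions; the function names are hypothetical, and skipping the first iteration as warm-up is a choice, not something the demo prescribes.

```python
import re

# Matches per-iteration lines like the sample training output; epoch summary
# lines ("Epoch ID: ...") and eval lines are not matched.
ITER_RE = re.compile(
    r"Epoch \[(?P<epoch>\d+)/\d+\], Iter \[(?P<iter>\d+)/\d+\], "
    r"reader_cost: (?P<reader_cost>[\d.]+) s, batch_cost: (?P<batch_cost>[\d.]+) s, "
    r"ips: (?P<ips>[\d.]+) samples/s"
)


def parse_iterations(log_text):
    """Return one dict per training iteration found in log_text."""
    return [
        {
            "epoch": int(m["epoch"]),
            "iter": int(m["iter"]),
            "reader_cost": float(m["reader_cost"]),
            "batch_cost": float(m["batch_cost"]),
            "ips": float(m["ips"]),
        }
        for m in ITER_RE.finditer(log_text)
    ]


def mean_ips(records, skip_first=1):
    """Average samples/s, skipping the first `skip_first` (warm-up) iterations."""
    kept = [r["ips"] for r in records[skip_first:]]
    return sum(kept) / len(kept) if kept else 0.0
```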

Environment variables

Name	Type	Description	Default
PADDLE_MLU_ALLOW_TF32	Bool	Whether to enable TF32 computation	True
CNCL_MEM_POOL_MULTI_CLIQUE_ENABLE	Int	Whether to enlarge the CNCL memory pool	1
CUSTOM_DEVICE_BLACK_LIST	String	Op blacklist: listed ops are forced to run on the CPU	""
FLAGS_allocator_strategy	Enum	Memory allocator strategy (see the PaddlePaddle documentation for FLAGS_allocator_strategy)	auto_growth
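These variables are generally read when PaddlePaddle and the MLU plugin initialize, so setting them before `paddle` is imported, for example in a child process's environment, is the safe choice. A minimal sketch follows. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that `CUSTOM_DEVICE_BLACK_LIST` takes comma-separated op names, and that `true`/`false` are accepted spellings for the Bool variable.

```python
import os
import subprocess
import sys


def mlu_env(allow_tf32=True, cpu_ops=(), allocator_strategy="auto_growth", base=None):
    """Return a copy of `base` (default: os.environ) with the MLU variables set."""
    env = dict(os.environ if base is None else base)
    # Assumption: the Bool variable accepts "true"/"false"; the table's default is True.
    env["PADDLE_MLU_ALLOW_TF32"] = "true" if allow_tf32 else "false"
    if cpu_ops:
        # Assumption: comma-separated op names, forced to run on the CPU.
        env["CUSTOM_DEVICE_BLACK_LIST"] = ",".join(cpu_ops)
    env["FLAGS_allocator_strategy"] = allocator_strategy
    return env


def run_demo(script="tests/test_LeNet_MNIST.py", **env_options):
    """Run `script` in a child process so the variables exist before paddle is imported."""
    return subprocess.run([sys.executable, script], env=mlu_env(**env_options), check=True)
```

For example, running `run_demo(allow_tf32=False)` from `backends/mlu` would rerun the LeNet demo with TF32 disabled. Launching a child process avoids depending on whether a variable is read at import time or later.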
