Replies: 9 comments
-
One practical approach is to launch multiple MXNet instances, each using a few cores (say 4, 8, or 16) for inference, which can greatly improve overall throughput. BTW, the MKL-DNN backend is much faster now. You need to specify the thread count and bind threads to physical cores explicitly: https://github.com/apache/incubator-mxnet/blob/master/MKLDNN_README.md Some out-of-the-box performance data is in the link below; you can see this method works very well.
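The multi-instance advice above could be sketched as a dry-run shell loop. The core counts, the `OMP_NUM_THREADS`/`KMP_AFFINITY` settings, and the `./my_inference` binary name are illustrative assumptions, not something from the thread; the loop only echoes the launch commands so the per-instance core bindings can be inspected before anything is actually started.

```shell
# Sketch: partition a hypothetical 16-core machine into 4 MXNet
# instances of 4 cores each. Dry run: echo the commands instead of
# executing them, so the bindings can be reviewed first.
CORES_PER_INSTANCE=4
TOTAL_CORES=16
for ((i = 0; i < TOTAL_CORES / CORES_PER_INSTANCE; i++)); do
  START=$((i * CORES_PER_INSTANCE))        # first core of this instance
  END=$((START + CORES_PER_INSTANCE - 1))  # last core of this instance
  echo "OMP_NUM_THREADS=$CORES_PER_INSTANCE KMP_AFFINITY=granularity=fine,compact taskset -c $START-$END ./my_inference"
done
```

Dropping the `echo` (and appending `&` plus a final `wait`) would actually launch the instances in parallel.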
-
@mxnet-label-bot : [C++, Question]
-
Hi pengzhao.
-
@qw42, you can use the master branch or any release version >= 1.2.0. A simple experiment can be run with the following commands.
-
@qw42 did you have a chance to try this, and did the approach help in your case?
-
Hi pengzhao. P.S. I am using MKL-DNN, but it is not related to the scaling issue.
-
Thanks for the feedback. It needs framework-level support, or you can write your own code for your targets.
-
What is the best way to achieve maximum throughput (forward passes only) on a multi-CPU/many-core server?
Description
I am using the CPP package. My problem is "embarrassingly parallel": each forward pass can be run independently. It looks to me that MXNet doesn't scale well with the number of CPUs.
My goal is to maximize overall throughput, not to minimize each forward pass's computation time.
I can achieve this by running multiple processes (one per core). Maybe I missed something, but I couldn't find a way to do it with multiple threads.