Using Tensorflow as a back end
As of 2017-09-12, Nimble has experimental support for Tensorflow as a back end.
Tensorflow is a compute framework for medium-sized tensor operations. Tensorflow is usually used for deep neural networks, but Nimble uses it only to accelerate tensor computations. In this sense, Nimble uses Tensorflow roughly as a GPU-compatible replacement for Eigen.
To use the Tensorflow back end for Nimble, you'll need to install RStudio's R wrapper for Tensorflow, and then use that wrapper to install Tensorflow itself:
install.packages('tensorflow') # Installs RStudio's R wrapper library.
library('tensorflow')
tensorflow::install_tensorflow() # Installs Google's Tensorflow library.
The above lines install the latest stable CPU-only version of Tensorflow in a Python virtualenv named r-tensorflow. You can replace this version with a GPU-compatible version by manually installing the Python package:
$ workon r-tensorflow # You may need to install virtualenv before doing this.
$ pip install --ignore-installed tensorflow-gpu
To run on GPUs, you'll also need to install various CUDA libraries on your system. See the Installing Tensorflow docs for details.
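To confirm that the GPU build is actually visible from R, you can run a quick check through the wrapper. This is a hedged sketch: it assumes the tf module object exported by RStudio's wrapper, and the exact helper may differ between Tensorflow versions (newer releases expose device listing under tf$config instead).

    library(tensorflow)
    # Should return TRUE if Tensorflow can see at least one CUDA device.
    tf$test$is_gpu_available()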
For best performance, you may want to compile Tensorflow from source. This arduous process is well documented; we recommend it only if you're waiting on a very slow computation.
Nimble can use Tensorflow to accelerate nimbleFunctions.
Each nimbleFunction can currently use either Tensorflow or Eigen for vectorizable math, and this decision is made on a per-function basis.
To enable Tensorflow, set the experimentalUseTensorflow option when compiling that function:
nimbleOptions(experimentalUseTensorflow = TRUE)
tf_fun <- compileNimble(fun)
It is generally safer to set this global option temporarily using withNimbleOptions:
tf_fun <- withNimbleOptions(list(experimentalUseTensorflow = TRUE),
compileNimble(fun))
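As a minimal end-to-end sketch (the function body and array sizes here are illustrative, not taken from the Nimble test suite):

    library(nimble)

    # A deliberately simple, vectorizable function: the Tensorflow back end
    # currently targets exactly this kind of elementwise math.
    fun <- nimbleFunction(
        run = function(x = double(2), y = double(2)) {
            returnType(double(2))
            return(exp(x) + x * y)
        })

    tf_fun <- withNimbleOptions(list(experimentalUseTensorflow = TRUE),
                                compileNimble(fun))

    # Tensorflow only pays off on large arrays (see the benchmarks below).
    x <- matrix(rnorm(1e6), 1000, 1000)
    y <- matrix(rnorm(1e6), 1000, 1000)
    out <- tf_fun(x, y)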
- Only very simple functions can currently be compiled using Tensorflow.
- Tensorflow graphs are currently created for each statement (line of code).
- Tensorflow is slower on small arrays due to copy overhead.
- Tensorflow is only faster on very large arrays (like 1000 x 1000).
- To see benefit on CPUs, you'll probably need to compile Tensorflow from source on your machine.
- To get GPU support, you'll need to install a version of Tensorflow with GPU support.
- Tensorflow does not support GPUs on OS X (but does parallelize over cores).
- Nimble can currently only use Tensorflow on Linux:
- OSX support is waiting on an upstream fix;
- Windows support has not been tested and certainly will need fixes.
- Despite testing, some functions may give incorrect results. For known failures, search tests/mathTestLists.R for 'tensorflow'.
- The current implementation supports only double-precision floating point arithmetic.
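Because of the possibility of incorrect results, it may be worth compiling the same function with both back ends and comparing outputs. A hedged sketch, assuming fun is a nimbleFunction as in the compileNimble example above (recompiling the same function twice may emit warnings):

    # Compile once with Eigen (the default) and once with Tensorflow.
    cpp_fun <- withNimbleOptions(list(experimentalUseTensorflow = FALSE),
                                 compileNimble(fun))
    tf_fun  <- withNimbleOptions(list(experimentalUseTensorflow = TRUE),
                                 compileNimble(fun))

    x <- matrix(rnorm(1e4), 100, 100)
    stopifnot(isTRUE(all.equal(cpp_fun(x), tf_fun(x))))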
To run benchmarks, use the make benchmarks command:
$ cd nimble/packages
$ make benchmarks
mkdir -p profile
NIMBLE_BENCHMARK_DIR=/home/fritz/nimble-dev/nimble/packages/profile \
TF_CPP_MIN_LOG_LEVEL=3 \
R --quiet --slave --vanilla --file=benchmark.R
Benchmarking Nimble code:
--------------------------------------------
Benchmarking Matrix Arithmetic
M N DSL ops/sec C++ ops/sec TF ops/sec
1 1 1.1e+06 7.2e+07 1.2e+03
10 10 5.3e+05 6e+06 1.2e+03
100 100 8.5e+03 6.5e+04 9.4e+02
1000 1000 16 3.4e+02 74
--------------------------------------------
-------------------------------------------------
Benchmarking Matrix Multiplication
K M N DSL ops/sec C++ ops/sec TF ops/sec
1 1 1 4.2e+06 1.6e+07 3e+03
10 10 10 8e+05 1.3e+06 3e+03
100 100 100 1.5e+03 3.8e+03 1.9e+03
1000 1000 1000 2.3 5.9 22
-------------------------------------------------
-----------------------------------------
Benchmarking Special Functions
N DSL ops/sec C++ ops/sec TF ops/sec
1 1e+06 7e+06 2e+03
10 3e+05 5e+05 2e+03
100 4e+04 5e+04 2e+03
1000 4e+03 5e+03 2e+03
10000 4e+02 4e+02 2e+03
100000 3e+01 3e+01 5e+02
-----------------------------------------
To compare CPU and GPU versions, we recommend simply installing the two versions of the Tensorflow library in turn:
$ workon r-tensorflow
$ pip install -I tensorflow # CPU version
$ make benchmark
$ pip install -I tensorflow-gpu
$ make benchmark
This is fairly quick.
To manually pin computations to the CPU, you can try using the tf$device context manager:
nimbleOptions(experimentalUseTensorflow = TRUE)
tf_fun <- with(tf$device('/cpu:0'), compileNimble(fun))