Using Tensorflow as a back end
As of 2017-09-12, Nimble has experimental support for Tensorflow as a back end.
Tensorflow is a compute framework for medium-sized tensor operations. Tensorflow is usually used for deep neural networks, but Nimble uses it only to accelerate tensor computations. In this sense, Nimble uses Tensorflow roughly as a GPU-compatible replacement for Eigen.
To use the Tensorflow back end for Nimble, you'll need to install RStudio's R wrapper for Tensorflow, and then use that wrapper to install Tensorflow itself:
install.packages('tensorflow') # Installs RStudio's R wrapper library.
library('tensorflow')
tensorflow::install_tensorflow() # Installs Google's Tensorflow library.
The above lines install the latest stable CPU-only version of Tensorflow in a Python virtualenv named r-tensorflow. You can replace this version with a GPU-compatible version by manually installing the Python package:
$ workon r-tensorflow # You may need to install virtualenv before doing this.
$ pip install --ignore-installed tensorflow-gpu
To run on GPUs, you'll also need to install various CUDA libraries on your system. See the Installing Tensorflow docs for details.
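To confirm that the GPU build is actually visible from R, you can run a quick check through the wrapper. This is a hedged sketch: it assumes the tf module object exported by RStudio's wrapper, and the exact helper may differ between Tensorflow versions (newer releases expose device listing under tf$config instead).

    library(tensorflow)
    # Should return TRUE if Tensorflow can see at least one CUDA device.
    tf$test$is_gpu_available()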
For best performance, you may want to compile Tensorflow from source. This arduous process is well documented; we recommend it only if you're waiting on a very slow computation.
Nimble can use Tensorflow to accelerate nimbleFunctions.
Each nimbleFunction can currently use either Tensorflow or Eigen for vectorizable math, and this decision is made on a per-function basis.
To enable Tensorflow, set the experimentalUseTensorflow option when compiling that function:
nimbleOptions(experimentalUseTensorflow = TRUE)
tf_fun <- compileNimble(fun)
It is generally safer to set this global option temporarily using withNimbleOptions:
tf_fun <- withNimbleOptions(list(experimentalUseTensorflow = TRUE),
compileNimble(fun))
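As a minimal end-to-end sketch (the function body and array sizes here are illustrative, not taken from the Nimble test suite):

    library(nimble)

    # A deliberately simple, vectorizable function: the Tensorflow back end
    # currently targets exactly this kind of elementwise math.
    fun <- nimbleFunction(
        run = function(x = double(2), y = double(2)) {
            returnType(double(2))
            return(exp(x) + x * y)
        })

    tf_fun <- withNimbleOptions(list(experimentalUseTensorflow = TRUE),
                                compileNimble(fun))

    # Tensorflow only pays off on large arrays (see the benchmarks below).
    x <- matrix(rnorm(1e6), 1000, 1000)
    y <- matrix(rnorm(1e6), 1000, 1000)
    out <- tf_fun(x, y)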
- Only very simple functions can currently be compiled using Tensorflow.
- Tensorflow graphs are currently created for each statement (line of code).
- Tensorflow is slower on small arrays due to copy overhead.
- Tensorflow is only faster on very large arrays (like 1000 x 1000).
- To see benefit on CPUs, you'll probably need to compile Tensorflow from source on your machine.
- To get GPU support, you'll need to install a version of Tensorflow with GPU support.
- Tensorflow does not support GPUs on OS X (but does parallelize over cores).
- Nimble can currently only use Tensorflow on Linux:
- OSX support is waiting on an upstream fix;
- Windows support has not been tested and certainly will need fixes.
- Despite testing, some functions may give incorrect results. For known failures, search tests/mathTestLists.R for 'tensorflow'.
- The current implementation supports only double-precision floating point arithmetic.
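Because of the possibility of incorrect results, it may be worth compiling the same function with both back ends and comparing outputs. A hedged sketch, assuming fun is a nimbleFunction as in the compileNimble example above (recompiling the same function twice may emit warnings):

    # Compile once with Eigen (the default) and once with Tensorflow.
    cpp_fun <- withNimbleOptions(list(experimentalUseTensorflow = FALSE),
                                 compileNimble(fun))
    tf_fun  <- withNimbleOptions(list(experimentalUseTensorflow = TRUE),
                                 compileNimble(fun))

    x <- matrix(rnorm(1e4), 100, 100)
    stopifnot(isTRUE(all.equal(cpp_fun(x), tf_fun(x))))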
To run benchmarks, use the make benchmarks command:
$ cd nimble/packages
$ make benchmarks
mkdir -p profile
NIMBLE_BENCHMARK_DIR=/home/fritz/nimble-dev/nimble/packages/profile \
TF_CPP_MIN_LOG_LEVEL=3 \
R --quiet --slave --vanilla --file=benchmark.R
Benchmarking Nimble code:
--------------------------------------------
Benchmarking Matrix Arithmetic
M N DSL ops/sec C++ ops/sec TF ops/sec
1 1 1.1e+06 7.2e+07 1.2e+03
10 10 5.3e+05 6e+06 1.2e+03
100 100 8.5e+03 6.5e+04 9.4e+02
1000 1000 16 3.4e+02 74
--------------------------------------------
-------------------------------------------------
Benchmarking Matrix Multiplication
K M N DSL ops/sec C++ ops/sec TF ops/sec
1 1 1 4.2e+06 1.6e+07 3e+03
10 10 10 8e+05 1.3e+06 3e+03
100 100 100 1.5e+03 3.8e+03 1.9e+03
1000 1000 1000 2.3 5.9 22
-------------------------------------------------
-----------------------------------------
Benchmarking Special Functions
N DSL ops/sec C++ ops/sec TF ops/sec
1 1e+06 7e+06 2e+03
10 3e+05 5e+05 2e+03
100 4e+04 5e+04 2e+03
1000 4e+03 5e+03 2e+03
10000 4e+02 4e+02 2e+03
100000 3e+01 3e+01 5e+02
-----------------------------------------
To compare CPU and GPU versions, we recommend simply installing the two versions of the Tensorflow library in turn:
$ workon r-tensorflow
$ pip install -I tensorflow # CPU version
$ make benchmark
$ pip install -I tensorflow-gpu
$ make benchmark
This is fairly quick.
To manually pin computations to the CPU, you can try using the tf$device context manager:
nimbleOptions(experimentalUseTensorflow = TRUE)
tf_fun <- with(tf$device('/cpu:0'), compileNimble(fun))