Install TensorFlow @ LN41

Li Jiang edited this page Apr 12, 2017 · 11 revisions

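Usage:

Users can set up the environment either by loading a module or by running source xxxx/activate. Since this is the GPU partition, only the GPU-version modules are configured.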

module load TensorFlow/1.0.1-gpu-py2

Quick Example

[nscc_ts_gpu@ln41%tianhe2-G ~]$ module avail TensorFlow

------------------------ /BIGDATA/app/modulefiles ---------------------------------------
TensorFlow/1.0.1-gpu-py2   TensorFlow/1.0.1-gpu-py3.6

[nscc_ts_gpu@ln41%tianhe2-G ~]$ module load TensorFlow/1.0.1-gpu-py3.6
cmdPath.c(159):ERROR:11: Usage is 'prepend-path path-variable directory'
TensorFlow/1.0.1-gpu-py3.6(30):ERROR:102: Tcl command execution failed: prepend-path LD_LIBRARY_PATH/BIGDATA/app/Python/3.6.1/lib

[nscc_ts_gpu@ln41%tianhe2-G ~]$ vi /BIGDATA/app/modulefiles/TensorFlow/1.0.1-gpu-py3.6
[nscc_ts_gpu@ln41%tianhe2-G ~]$ exit
logout
Connection to ln41-ib0 closed.
[nscc-gz_yingzhong@ln2%tianhe2-C cudnn]$ ssh -i ~/keys/nscc_ts_gpu.id nscc_ts_gpu@ln41-ib0
Warning: Permanently added 'ln41-ib0' (RSA) to the list of known hosts.
Last login: Wed Apr 12 09:43:01 2017 from ln2-ib0
[nscc_ts_gpu@ln41%tianhe2-G ~]$ module load TensorFlow/1.0.1-gpu-py3.6
[nscc_ts_gpu@ln41%tianhe2-G ~]$ yhrun -n 1 python /BIGDATA/app/TensorFlow/testtf.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x25d7160
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x25dafa0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x25dede0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
Version of TensorFlow : 1.0.1
b'Hello, TensorFlow!'
[nscc_ts_gpu@ln41%tianhe2-G ~]$
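
The module load failure near the top of this transcript points at line 30 of the modulefile, where prepend-path received one fused argument (LD_LIBRARY_PATH/BIGDATA/app/Python/3.6.1/lib) instead of a variable name and a directory. The transcript shows the load succeeding after the file was edited and the session restarted. Below is a minimal sketch of a checker for this kind of slip. It assumes a simple modulefile in which every prepend-path call takes exactly two arguments and no options; the function name find_bad_prepend_paths and the sample text are hypothetical.

```python
def find_bad_prepend_paths(modulefile_text):
    """Return (line_number, line) pairs whose prepend-path call lacks two arguments."""
    bad = []
    for number, raw in enumerate(modulefile_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and Tcl comments.
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "prepend-path" and len(words) != 3:
            bad.append((number, line))
    return bad


# Hypothetical modulefile content; only the last line is taken from the error above.
sample = """#%Module1.0
prepend-path PATH /BIGDATA/app/Python/3.6.1/bin
prepend-path LD_LIBRARY_PATH/BIGDATA/app/Python/3.6.1/lib
"""

for number, line in find_bad_prepend_paths(sample):
    print("line %d: %s" % (number, line))
```

Running such a check on a modulefile before publishing it would flag the missing space without needing a login session to test the module.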

Basic version

Python 2.7.9, CPU only, TensorFlow 1.0.1

Install Python

  • Unpack the Python source package, configure and install it, then set the environment variables
$ ./configure --prefix=/BIGDATA/app/Python/2.7.9  --with-ensurepip   --with-threads --enable-shared --enable-unicode=ucs4
$ make -j 12
$ make install
$ export PATH=/BIGDATA/app/Python/2.7.9/bin:$PATH
$ export LD_LIBRARY_PATH=/BIGDATA/app/Python/2.7.9/lib:$LD_LIBRARY_PATH

Install TensorFlow

  • Install virtualenv, then create the environment and activate it
$ pip install virtualenv
$ virtualenv --system-site-packages /BIGDATA/app/TensorFlow/python-venv/py2.9
$ source /BIGDATA/app/TensorFlow/python-venv/py2.9/bin/activate
  • Install TensorFlow and test it
(py2.9) $ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl

(py2.9) $ python  /BIGDATA/app/TensorFlow/testtf.py
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
Hello, TensorFlow!

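The output shows that TensorFlow works. The W (warning) lines also point out that building the TensorFlow library for the CPU's instruction sets (SSE3, SSE4.1, SSE4.2, AVX) could speed up computation; this is left for later investigation.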
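
The warnings name instruction sets that the prebuilt wheel does not use. As a rough way to see which of them a node's CPU reports, the sketch below parses the flags line of /proc/cpuinfo on Linux (Python 3). The helper names are hypothetical, and the table that maps TensorFlow's names to kernel flag names (for example SSE3 showing up as pni) is part of this sketch rather than something the original page provides.

```python
# Kernel flag names in /proc/cpuinfo for the instruction sets named in the warnings.
FLAG_NAMES = {
    "SSE3": "pni",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "AVX2": "avx2",
    "FMA": "fma",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_instruction_sets(cpuinfo_text):
    """Map each instruction-set name to True/False according to the CPU flags."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FLAG_NAMES.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    for name, present in sorted(supported_instruction_sets(text).items()):
        print("%-7s %s" % (name, "yes" if present else "no"))
```

On a cluster, the login node and the compute nodes may have different CPUs, so a check like this is more informative when run on a compute node.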

For the wheel URLs and similar details, see the installation procedure in the official TensorFlow guide (URL).

Other

GPU version

Python 2.7.9, GPU, TensorFlow 1.0.1

Installation

$ virtualenv --system-site-packages /BIGDATA/app/TensorFlow/python-venv/py2.9-gpu
$ source /BIGDATA/app/TensorFlow/python-venv/py2.9-gpu/bin/activate
(py2.9-gpu) $ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp27-none-linux_x86_64.whl
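
The wheel URLs used on this page differ only in the package name and in the tags at the end of the filename, and pip rejects a wheel whose tags do not match the running interpreter. The sketch below splits a wheel filename into its fields following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The function name parse_wheel_filename is hypothetical; the three URLs are the ones that appear on this page.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version build python abi platform")


def parse_wheel_filename(filename):
    """Split 'name-version[-build]-python-abi-platform.whl' into its fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelTags(name, version, build, python, abi, platform)


for url in (
    "https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl",
    "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp27-none-linux_x86_64.whl",
    "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp36-cp36m-linux_x86_64.whl",
):
    tags = parse_wheel_filename(url.rsplit("/", 1)[-1])
    print(tags.name, tags.python, tags.abi, tags.platform)
```

The Python tag cp27 marks a CPython 2.7 build and cp36 a CPython 3.6 build, which is why each virtualenv needs the wheel that matches its interpreter.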

Test

(py2.9-gpu) $ yhrun -n 1 python /BIGDATA/app/TensorFlow/testtf.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /BIGDATA/app/CUDA/8.0/lib64/stubs:/BIGDATA/app/CUDA/8.0/libnvvp:/BIGDATA/app/CUDA/8.0/libnsight:/BIGDATA/app/CUDA/8.0/lib64:/BIGDATA/app/CUDA/8.0/lib:/BIGDATA/app/Python/2.7.9/lib:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1f489d0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1f4c350
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1f4fcd0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
('Version of TensorFlow :', '1.0.1')
Hello, TensorFlow!

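The test output shows that the GPU version works. Besides building for the CPU instruction sets, performance can also be improved by using the cuDNN DSO, which failed to load in this run because libcudnn.so.5 was not found on LD_LIBRARY_PATH. cudnn/5.1-CUDA8.0 has since been added to the module, so cuDNN can now be used normally.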
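
The failed cuDNN load above reports the LD_LIBRARY_PATH that was searched. A quick way to check whether a library such as libcudnn.so.5 is present in any of those directories before launching a job is sketched below. It only scans the directories listed in LD_LIBRARY_PATH, whereas the real dynamic loader also consults other locations, so a miss is a hint rather than proof. The function name find_on_library_path is hypothetical.

```python
import os


def find_on_library_path(library_name, search_path=None):
    """Return the directories in search_path that contain library_name."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in search_path.split(":"):
        # Empty entries (e.g. from a trailing ':') are skipped in this sketch.
        if directory and os.path.isfile(os.path.join(directory, library_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_on_library_path("libcudnn.so.5")
    if found:
        print("libcudnn.so.5 found in: " + ", ".join(found))
    else:
        print("libcudnn.so.5 not found on LD_LIBRARY_PATH")
```

Running it once before and once after loading the module gives a simple confirmation that the module really adds the cuDNN directory.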

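Other notes

  • If you need a Python package added to the environment, contact me.
  • If you need many Python packages, especially uncommon ones, you can create a virtualenv environment under your own account directory.
  • If your own Python environment needs to install packages with pip, contact me about using the proxy.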
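
For users who set up their own environment, it can also be created from Python itself. The sketch below uses the standard-library venv module (Python 3.3 or later) instead of the virtualenv tool used elsewhere on this page, so it is an alternative and not the procedure this page followed. The helper name create_user_env and the example path in the comment are placeholders, and the bin/activate location assumes a POSIX system. Passing system_site_packages=True mirrors the --system-site-packages flag used above, so packages installed in the base interpreter stay importable.

```python
import os
import venv


def create_user_env(env_dir):
    """Create a virtual environment at env_dir that can see system site-packages."""
    # with_pip=False keeps the sketch independent of ensurepip and network access;
    # pass with_pip=True if pip should be bootstrapped into the environment.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    return os.path.join(env_dir, "bin", "activate")


# Example call (placeholder path under the user's home directory):
#   create_user_env(os.path.expanduser("~/python-venv/my-env"))
# The returned path is the script to pass to "source".
```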

Python 3 version

The installation follows the same steps as before, this time for the GPU version. Here is the shell history:

$ wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tgz
$ tar xvf Python-3.6.1.tgz
$ cd Python-3.6.1/
$ ./configure --prefix=/BIGDATA/app/Python/3.6.1  --with-ensurepip   --with-threads --enable-shared --enable-unicode=ucs4
$ make -j 12
$ make install
$ export PATH=/BIGDATA/app/Python/3.6.1/bin:$PATH
$ export LD_LIBRARY_PATH=/BIGDATA/app/Python/3.6.1/lib:$LD_LIBRARY_PATH
$ which python3
/BIGDATA/app/Python/3.6.1/bin/python3
$ which pip3
/BIGDATA/app/Python/3.6.1/bin/pip3
$ pip3 install pandas scikit-learn virtualenv
$ which virtualenv
/BIGDATA/app/Python/3.6.1/bin/virtualenv
$ mkdir -p /BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu
$ virtualenv --system-site-packages /BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu
$ source /BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu/bin/activate
(py3.6.1-gpu) $ which python
/BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu/bin/python
(py3.6.1-gpu) $ python --version
Python 3.6.1
(py3.6.1-gpu) $ which pip
/BIGDATA/app/TensorFlow/python-venv/py3.6.1-gpu/bin/pip
(py3.6.1-gpu) $ pip install  --upgrade  https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.1-cp36-cp36m-linux_x86_64.whl

Test

(py3.6.1-gpu) $ yhrun -n 1 python /BIGDATA/app/TensorFlow/testtf.py
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:04:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1b1e010
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:05:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1b21e50
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:84:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x1b25c90
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: Tesla K80
major: 3 minor: 7 memoryClockRate (GHz) 0.8235
pciBusID 0000:85:00.0
Total memory: 11.17GiB
Free memory: 11.11GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1:   Y Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3:   N N Y Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K80, pci bus id: 0000:05:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla K80, pci bus id: 0000:84:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla K80, pci bus id: 0000:85:00.0)
Version of TensorFlow : 1.0.1
b'Hello, TensorFlow!'
(py3.6.1-gpu) $