Tensorflow 1.8 with GPU on MacOS High Sierra 10.13.6

Time: 2019-10-30

Lao Xu

Thursday, 26 July 2018

The TensorFlow team announced that, starting with release 1.2, it would no longer support the GPU version of TensorFlow on Mac.

Therefore, there is no way to install it directly. You can only compile it from source.

Running TensorFlow on the CPU doesn’t feel fast enough, so I wanted to try GPU acceleration. Conveniently, I have a video card that supports CUDA.

Versions

This is important enough to say three times: the drivers, compilers, and build tools must all be at matching versions, or the compilation will not succeed!

  • TensorFlow r1.8 source code; the latest 1.9 seems to have some problems.
  • macOS 10.13.6; the exact version should not matter.
  • Video card driver 387.10.10.10.40.105, which supports CUDA 9.1
  • CUDA 9.2. This refers to the CUDA driver, whose version can be higher than the CUDA version supported by the video card driver above.
  • cuDNN 7.2, matching the CUDA version above; simply install the latest version
  • Xcode 8.2.1. This is a key point: please downgrade to this version, otherwise there will be compilation errors or a Segmentation Fault at run time
  • bazel 0.14.0. This is a key point: please downgrade to this version
  • Python 3.6. This is a key point: do not use the latest Python 3.7, which so far causes problems when compiling
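The pinned versions above can be kept in one place and compared against whatever is actually installed. The following is only a minimal sketch: the dictionary keys and the function name `version_mismatches` are hypothetical labels chosen for illustration, and collecting the installed versions (for example from `bazel version` or `python3 --version`) is left to the reader.

```python
# Versions pinned in the list above. The keys are hypothetical labels
# chosen for this sketch, not names used by any tool.
PINNED = {
    "tensorflow": "1.8",
    "cuda": "9.2",
    "cudnn": "7.2",
    "xcode": "8.2.1",
    "bazel": "0.14.0",
    "python": "3.6",
}


def version_mismatches(found):
    """Return {tool: (expected, actual)} for every tool that misses its pin.

    Versions are compared component by component, so "3.6.5" satisfies the
    pin "3.6" while "3.7.0" does not. A tool absent from `found` is reported
    with an actual value of None.
    """
    problems = {}
    for tool, expected in PINNED.items():
        actual = found.get(tool)
        wanted = expected.split(".")
        if actual is None or actual.split(".")[: len(wanted)] != wanted:
            problems[tool] = (expected, actual)
    return problems
```

Calling `version_mismatches({"python": "3.7.0", ...})` would report the Python entry, which is the mismatch this article warns about most often.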

Preparation

You need to download the following. Some of the files are large, so start the downloads before you continue reading to save time:

  • Xcode 8.2.1

    https://developer.apple.com/d…

    Xcode_8.2.1.xip

  • bazel-0.14.0

    https://github.com/bazelbuild…

  • CUDA Toolkit 9.2

    https://developer.nvidia.com/…

  • cuDNN v7.2.1

    https://developer.nvidia.com/…

  • TensorFlow source code, 333 MB

    $ git clone https://github.com/tensorflow/tensorflow -b r1.8

Python 3.6.5_1

If the currently installed version is 3.7, please downgrade:

$ brew unlink python
$ brew install https://raw.githubusercontent.com/Homebrew/homebrew-core/f2a764ef944b1080be64bd88dca9a1d80130c558/Formula/python.rb
$ pip3 install --upgrade pip setuptools wheel
# $ brew switch python 3.6.5_1

Don’t use Python 3.7.0, or there will be compilation problems.

You can switch back after compiling:

$ brew switch python 3.7.0

Xcode 8.2.1

Xcode needs to be downgraded to 8.2.1.

Download the package from the Apple Developer website, https://developer.apple.com/d…

Extract it, copy it to /Applications/Xcode.app, and then point xcode-select to it:

$ sudo xcode-select -s /Applications/Xcode.app

Confirm that the correct version is installed:

$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

For the command-line tools, cc is clang.

This is very important: with the wrong compiler the build may still succeed, but more complex projects will then crash with a Segmentation Fault.
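The compiler check above can also be scripted. This is a rough sketch that only parses text in the format shown in the `cc -v` output above; the function name `apple_llvm_version` is made up for illustration. If you capture the output from a subprocess, note that clang normally writes its `-v` banner to stderr rather than stdout.

```python
import re


def apple_llvm_version(cc_output):
    """Return the X.Y.Z part of an 'Apple LLVM version X.Y.Z' line, or None."""
    match = re.search(r"Apple LLVM version (\d+(?:\.\d+)*)", cc_output)
    return match.group(1) if match else None


# Sample text copied from the `cc -v` output shown above.
sample = (
    "Apple LLVM version 8.0.0 (clang-800.0.42.1)\n"
    "Target: x86_64-apple-darwin17.7.0\n"
)
if apple_llvm_version(sample) != "8.0.0":
    print("Unexpected compiler; check that Xcode 8.2.1 is selected")
```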

Environment variables

Because the CUDA libraries are not in a system directory, you need to set environment variables that point to them.

LD_LIBRARY_PATH has no effect on Mac; DYLD_LIBRARY_PATH is used instead.

To configure the environment variables, edit ~/.bash_profile or ~/.zshrc:

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=$CUDA_HOME/lib:$CUDA_HOME/extras/CUPTI/lib
export PATH=$CUDA_HOME/bin:$PATH
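If you start Python tools from a launcher script rather than from a login shell, the same three variables can be built in Python and passed to a child process. This is a sketch under that assumption; the helper name `cuda_environment` is hypothetical, and the directory layout is simply the one used in the export lines above.

```python
import posixpath


def cuda_environment(cuda_home="/usr/local/cuda", current_path=""):
    """Build the variables from the export lines above as a dictionary."""
    lib_dirs = [
        posixpath.join(cuda_home, "lib"),
        posixpath.join(cuda_home, "extras", "CUPTI", "lib"),
    ]
    path = posixpath.join(cuda_home, "bin")
    if current_path:
        # Prepend the CUDA bin directory, as `$CUDA_HOME/bin:$PATH` does.
        path = path + ":" + current_path
    return {
        "CUDA_HOME": cuda_home,
        "DYLD_LIBRARY_PATH": ":".join(lib_dirs),
        "PATH": path,
    }
```

The result could be merged into a copy of `os.environ` and handed to `subprocess.run(..., env=...)`. Whether the child process actually receives `DYLD_LIBRARY_PATH` still depends on the SIP behaviour discussed in the problems section.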

Install CUDA

CUDA is a parallel computing framework introduced by NVIDIA for its own GPUs, which means CUDA can only run on NVIDIA GPUs. CUDA only pays off when the problem to be solved involves a large amount of parallel computation.

Step 1: confirm whether the graphics card supports GPU computing

Find your video card model here to see whether it is supported:

https://developer.nvidia.com/…

My video card is an NVIDIA GeForce GTX 750 Ti:

GPU Compute Capability
GeForce GTX 750 Ti 5.0

Step 2: install CUDA

If you have already installed another version of CUDA, you need to uninstall it first:

$ sudo /usr/local/bin/uninstall_cuda_drv.pl
$ sudo /usr/local/cuda/bin/uninstall_cuda_9.1.pl
$ sudo rm -rf /Developer/NVIDIA/CUDA-9.1/
$ sudo rm -rf /Library/Frameworks/CUDA.framework
$ sudo rm -rf /usr/local/cuda/

To be safe, it’s better to restart afterwards.

First of all, note that the CUDA driver version and the GPU driver version must match in order for CUDA to find the graphics card.

  • GPU driver is the video card driver

    • http://www.macvidcards.com/dr…
    • My macOS is 10.13.6, and the latest driver version is installed: 387.10.10.10.40.105

      https://www.nvidia.com/downlo…

      Version:    387.10.10.10.40.105
      Release Date:    2018.7.10
      Operating System:    macOS High Sierra 10.13.6
      CUDA Toolkit:    9.1
  • CUDA Driver

    • http://www.nvidia.com/object/…
    • Install the CUDA driver separately. You can choose the latest version; check which video card driver it supports
    • cudadriver_396.148_macos.dmg

      New Release 396.148
      CUDA driver update to support CUDA Toolkit 9.2, macOS 10.13.6 and NVIDIA display driver 387.10.10.10.40.105
      Recommended CUDA version(s): CUDA 9.2
      Supported macOS 10.13
  • CUDA Toolkit

    • https://developer.nvidia.com/…
    • You can choose the latest version; here that is 9.2
    • cuda_9.2.148_mac.dmg, cuda_9.2.148.1_mac.dmg

Check after installation:

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Confirm whether the driver is loaded

$ kextstat | grep -i cuda.
  149    0 0xffffff7f838d3000 0x2000     0x2000     com.nvidia.CUDA (1.1.0) E13478CB-B251-3C0A-86E9-A6B56F528FE8 <4 1>

Test whether CUDA can operate normally:

$ cd /usr/local/cuda/samples
$ sudo make -C 1_Utilities/deviceQuery
$ ./bin/x86_64/darwin/release/deviceQuery
./bin/x86_64/darwin/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 750 Ti"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 2048 MBytes (2147155968 bytes)
  ( 5) Multiprocessors, (128) CUDA Cores/MP:     640 CUDA Cores
  GPU Max Clock rate:                            1254 MHz (1.25 GHz)
  Memory Clock rate:                             2700 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS

CUDA works normally if Result = PASS is displayed at the end.
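The relevant fields of the deviceQuery report can be pulled out with a few regular expressions. This is only a sketch that assumes the output format shown above; `summarize_device_query` is a name invented for this example.

```python
import re


def summarize_device_query(output):
    """Extract the pass/fail result and version fields from deviceQuery output."""
    passed = re.search(r"^Result = PASS\s*$", output, re.MULTILINE) is not None
    capability = re.search(
        r"CUDA Capability Major/Minor version number:\s*(\d+\.\d+)", output
    )
    versions = re.search(
        r"CUDA Driver Version / Runtime Version\s+(\d+\.\d+) / (\d+\.\d+)", output
    )
    return {
        "passed": passed,
        "compute_capability": capability.group(1) if capability else None,
        "driver_version": versions.group(1) if versions else None,
        "runtime_version": versions.group(2) if versions else None,
    }
```

The `compute_capability` value is the number you will need later when the TensorFlow configure script asks for the compute capabilities to build for.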

If the following error occurs:

The version ('9.1') of the host compiler ('Apple clang') is not supported

it means the Xcode version is too new, and Xcode must be downgraded.

Step 3: install cuDNN

cuDNN (CUDA Deep Neural Network library) is a GPU acceleration library for deep neural networks built by NVIDIA. In general, cuDNN is not strictly required for training models on a GPU, but it is almost always used, and the TensorFlow configure step later in this article asks for its version and location.

cuDNN

  • https://developer.nvidia.com/…
  • Download the latest version, cuDNN v7.2.1 for CUDA 9.2
  • cudnn-9.2-osx-x64-v7.2.1.38.tgz

After downloading, extract the archive and merge its contents directly into the CUDA directory /usr/local/cuda/.

$ tar -xzvf cudnn-9.2-osx-x64-v7.2.1.38.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
$ rm -rf cuda
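To confirm which cuDNN version ended up in /usr/local/cuda/include, the version macros in cudnn.h can be read back. The sketch below assumes the header defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integer macros, which is my understanding of the cuDNN 7 header; treat that as an assumption and check your own file. The function name `cudnn_version` is hypothetical.

```python
import re


def cudnn_version(header_text):
    """Return 'major.minor.patch' from the text of cudnn.h, or None."""
    parts = []
    # Assumed macro names; verify them against your copy of cudnn.h.
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#define\s+" + name + r"\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)
```

A typical use would be `cudnn_version(open("/usr/local/cuda/include/cudnn.h").read())`, with the result compared against the 7.2 entered during configuration.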

Step 4: install CUDA-Z

CUDA-Z is used to view CUDA status information.

$ brew cask install cuda-z

Then you can run CUDA-Z from the Applications folder to check that CUDA is working.

Compile

If you already have a compiled wheel, you can skip this chapter and go directly to the installation section.

Next, compile the TensorFlow GPU version from source.

CUDA preparation

See the previous section.

Compiling environment preparation

Python

$ python3 --version
Python 3.6.5

Don’t use Python 3.7.0, or there will be compilation problems

Python dependencies

$ pip3 install six numpy wheel

coreutils, llvm, OpenMP

$ brew install coreutils llvm cliutils/apple/libomp

Bazel

Note that this must be version 0.14.0. A newer or older version can cause the compilation to fail. Download version 0.14.0 from the bazel release page:

$ curl -LO https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-installer-darwin-x86_64.sh
$ chmod +x bazel-0.14.0-installer-darwin-x86_64.sh
$ ./bazel-0.14.0-installer-darwin-x86_64.sh
$ bazel version
Build label: 0.14.0

With a version that is too old, the environment variables may not be passed through to the build, which leads to "Library not loaded" errors.

Check NVIDIA development environment

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

Check clang version

$ cc -v
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Source code preparation

Pull the release 1.8 branch of the TensorFlow source and patch it to make it compatible with macOS.

You can download the already patched source code directly:

$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/tensorflow-macos-gpu-r1.8-src.tar.gz

Or apply the patch manually:

$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ cd tensorflow
$ curl -O https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/tensorflow-macos-gpu-r1.8.patch
$ git apply tensorflow-macos-gpu-r1.8.patch
$ curl -o third_party/nccl/nccl.h https://raw.githubusercontent.com/SixQuant/tensorflow-macos-gpu/master/patch/nccl.h

Build

Configure

$ which python3
/usr/local/bin/python3
$ ./configure
Please specify the location of python. [Default is /usr/local/opt/python@2/bin/python2.7]: /usr/local/bin/python3

Found possible Python library paths:
  /usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/Cellar/python/3.6.5_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.2

Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.2

Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:

Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.0,3.5,5.0,5.2,6.0,6.1

Do you want to use clang as CUDA compiler? [y/N]:n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:

Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:

Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl             # Build with MKL support.
    --config=monolithic      # Config for mostly static monolithic build.
Configuration finished

Be sure to enter the correct values:

  • /usr/local/bin/python3
  • CUDA 9.2
  • cuDNN 7.2
  • Compute capability 3.0, 3.5, 5.0, 5.2, 6.0, 6.1. You must check which version your graphics card supports. You can enter multiple versions.

What the steps above actually do is generate the build configuration file .tf_configure.bazelrc.
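Before starting a long build it can be worth reading the generated file back to confirm the answers were recorded. The sketch below assumes the file contains lines of the form `build --action_env NAME="value"`, and that the CUDA answers are stored under names such as `TF_CUDA_VERSION`; both are my recollection of how the TensorFlow 1.x configure script writes this file, so treat them as assumptions and inspect your own .tf_configure.bazelrc. The function name `read_action_env` is hypothetical.

```python
import re


def read_action_env(bazelrc_text):
    """Collect NAME -> value pairs from 'build --action_env NAME="value"' lines."""
    settings = {}
    for line in bazelrc_text.splitlines():
        # Accept both `--action_env NAME=...` and `--action_env=NAME=...`,
        # with or without double quotes around the value.
        match = re.match(
            r'\s*build\s+--action_env[ =](\w+)="?([^"]*)"?\s*$', line
        )
        if match:
            settings[match.group(1)] = match.group(2)
    return settings
```

For example, after reading the file you could check that the assumed `TF_CUDA_VERSION` entry equals `9.2` and rerun `./configure` if it does not.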

Start compilation

$ bazel clean --expunge
$ bazel build --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

Downloads may fail during compilation because of network problems. If that happens, simply retry a few times.

If the bazel version is not correct, DYLD_LIBRARY_PATH may not be passed through, so the libraries are not loaded.

Notes on the build options

--config=opt should be equivalent to:

build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true

-march=native means that the build uses the optimized instructions supported by the current CPU.

View the instruction set supported by the current CPU

$ sysctl machdep.cpu.features
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
$ gcc -march=native -dM -E -x c++ /dev/null | egrep "AVX|SSE"

#define __AVX2__ 1
#define __AVX__ 1
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE3__ 1
#define __SSE4_1__ 1
#define __SSE4_2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __SSSE3__ 1
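The sysctl line can be checked for specific instruction sets programmatically. This is a small sketch that only splits a line in the format shown above; `missing_cpu_features` is a name made up for illustration. As far as I know, macOS reports AVX2 under a separate key (`machdep.cpu.leaf7_features`) rather than in `machdep.cpu.features`, which would explain why it does not appear in the list above even though the compiler defines `__AVX2__`.

```python
def missing_cpu_features(sysctl_line, wanted):
    """Return the names in `wanted` that do not appear in a sysctl features line."""
    # Everything after the first colon is a space-separated feature list.
    _, _, values = sysctl_line.partition(":")
    available = set(values.split())
    return [name for name in wanted if name not in available]
```

Called with a shortened version of the output above, `missing_cpu_features("machdep.cpu.features: FPU SSE4.2 FMA AVX1.0", ["FMA", "AVX1.0"])` returns an empty list.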

Compilation error: dyld: Library not loaded: @rpath/libcudart.9.2.dylib

ERROR: /Users/c/Downloads/tensorflow-macos-gpu-r1.8/src/tensorflow/python/BUILD:1590:1: Executing genrule //tensorflow/python:string_ops_pygenrule failed (Aborted): bash failed: error executing command /bin/bash bazel-out/host/genfiles/tensorflow/python/string_ops_pygenrule.genrule_script.sh
dyld: Library not loaded: @rpath/libcudart.9.2.dylib
  Referenced from: /private/var/tmp/_bazel_c/ea0f1e868907c49391ddb6d2fb9d5630/execroot/org_tensorflow/bazel-out/host/bin/tensorflow/python/gen_string_ops_py_wrappers_cc
  Reason: image not found

The environment variable DYLD_LIBRARY_PATH was not passed through because of a bug in bazel.

Solution: install the correct version of bazel.

Compilation error: PyString_AsStringAndSize

external/protobuf_archive/python/google/protobuf/pyext/descriptor_pool.cc:169:7: error: assigning to 'char *' from incompatible type 'const char *'
  if (PyString_AsStringAndSize(arg, &name, &name_size) < 0) {

This is caused by a bug in protobuf_python when it is built against Python 3.7. Switch to Python 3.6 and recompile.

https://github.com/google/pro…

Compilation can take up to 1.5 hours, so please be patient.

Generate the pip installation package

Recompile and replace _nccl_ops.so:

$ gcc -march=native -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so bazel-out/darwin-py3-opt/bin/tensorflow/contrib/nccl/python/ops
$ rm _nccl_ops.o

Package

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/Downloads/

Clean up

$ bazel clean --expunge

Install

$ pip3 uninstall tensorflow
$ pip3 install ~/Downloads/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

You can also install the prebuilt wheel directly over HTTP:

$ pip3 install https://github.com/SixQuant/tensorflow-macos-gpu/releases/download/v1.8.0/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl

If you install the prebuilt wheel directly, make sure that the installed versions of the following are the same as, or newer than, the ones used for the build:

  • cudadriver_396.148_macos.dmg
  • cuda_9.2.148_mac.dmg
  • cuda_9.2.148.1_mac.dmg
  • cudnn-9.2-osx-x64-v7.2.1.38.tgz
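The wheel file name used in the install commands above encodes the Python version, ABI and platform it was built for, following the standard `name-version-python-abi-platform.whl` layout. The sketch below splits such a name and compares the Python tag with an interpreter version; it assumes CPython-style `cpXY` tags, and both function names are hypothetical.

```python
import sys


def wheel_tags(filename):
    """Split a wheel file name into its name, version and compatibility tags."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # The last three fields are always python tag, ABI tag and platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_interpreter(filename, version_info=sys.version_info):
    """True if the wheel's python tag is cpXY for the given (X, Y, ...) version."""
    expected = "cp%d%d" % (version_info[0], version_info[1])
    return wheel_tags(filename)["python"] == expected
```

A `cp36` wheel will be rejected by pip under Python 3.7, which is another reason to keep Python 3.6 selected until the installation is finished.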

Verification

Confirm that TensorFlow with GPU works normally.

Confirm environment variables

Verify that Python code can read the environment variable DYLD_LIBRARY_PATH correctly:

$ nano tensorflow-gpu-01-env.py
#!/usr/bin/env python

import os

print(os.environ["DYLD_LIBRARY_PATH"])
$ python3 tensorflow-gpu-01-env.py
/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib

Confirm if GPU is enabled

If a TensorFlow operation has both a CPU and a GPU implementation, the GPU device is given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels; on a system with the devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul. To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

$ nano tensorflow-gpu-02-hello.py
#!/usr/bin/env python

import tensorflow as tf

config = tf.ConfigProto()
config.log_device_placement = True

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
with tf.Session(config=config) as sess:
    # Runs the op.
    print(sess.run(c))
$ python3 tensorflow-gpu-02-hello.py
2018-08-26 14:13:45.987276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 706.66MiB
2018-08-26 14:13:45.987303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:13:46.245132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 426 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
2018-08-26 14:13:46.253938: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0

MatMul: (MatMul): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254406: I tensorflow/core/common_runtime/placer.cc:886] MatMul: (MatMul)/job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254415: I tensorflow/core/common_runtime/placer.cc:886] b: (Const)/job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0
2018-08-26 14:13:46.254421: I tensorflow/core/common_runtime/placer.cc:886] a: (Const)/job:localhost/replica:0/task:0/device:GPU:0
[[22. 28.]
 [49. 64.]]

Some of the log output is harmless but looks worrying, so I commented it out directly in the source code, for example:

OS X does not support NUMA – returning NUMA node zero

Not found: TF GPU device with id 0 was not registered

A slightly more complex example:

$ nano tensorflow-gpu-04-cnn-gpu.py
#!/usr/bin/env python

from __future__ import absolute_import, division, print_function
import os
import time
import numpy as np
import tflearn
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '0'

from tensorflow.python.client import device_lib
def print_gpu_info():
    for device in device_lib.list_local_devices():
        print(device.name, 'memory_limit', str(round(device.memory_limit/1024/1024))+'M', 
            device.physical_device_desc)
    print('=======================')

print_gpu_info()


DATA_PATH = "/Volumes/Cloud/DataSet"

mnist = tflearn.datasets.mnist.read_data_sets(DATA_PATH+"/mnist", one_hot=True)

config = tf.ConfigProto()
config.log_device_placement = True
config.allow_soft_placement = True

config.gpu_options.allocator_type = 'BFC'
config.gpu_options.allow_growth = True
#config.gpu_options.per_process_gpu_memory_fraction = 0.3

# Building convolutional network
net = tflearn.input_data(shape=[None, 28, 28, 1], name='input') 
net = tflearn.conv_2d(net, 32, 5, weights_init='variance_scaling', activation='relu', regularizer="L2") 
net = tflearn.conv_2d(net, 64, 5, weights_init='variance_scaling', activation='relu', regularizer="L2") 
net = tflearn.fully_connected(net, 10, activation='softmax') 
net = tflearn.regression(net,
                         optimizer='adam',                  
                         learning_rate=0.01,
                         loss='categorical_crossentropy', 
                         name='target')

# Training
model = tflearn.DNN(net, tensorboard_verbose=3)

start_time = time.time()
model.fit(mnist.train.images.reshape([-1, 28, 28, 1]),
          mnist.train.labels.astype(np.int32),
          validation_set=(
              mnist.test.images.reshape([-1, 28, 28, 1]),
              mnist.test.labels.astype(np.int32)
          ),
          n_epoch=1,
          batch_size=128,
          shuffle=True,
          show_metric=True,
          run_id='cnn_mnist_tflearn')

duration = time.time() - start_time
print('Training Duration %.3f sec' % (duration))
$ python3 tensorflow-gpu-04-cnn-gpu.py
2018-08-26 14:11:00.463212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.2545
pciBusID: 0000:01:00.0
totalMemory: 2.00GiB freeMemory: 258.06MiB
2018-08-26 14:11:00.463235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:00.717963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
/device:CPU:0 memory_limit 256M
/device:GPU:0 memory_limit 204M device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0
=======================
Extracting /Volumes/Cloud/DataSet/mnist/train-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/train-labels-idx1-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-images-idx3-ubyte.gz
Extracting /Volumes/Cloud/DataSet/mnist/t10k-labels-idx1-ubyte.gz
2018-08-26 14:11:01.158727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.158843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-08-26 14:11:01.487530: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-08-26 14:11:01.487630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 203 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
---------------------------------
Run id: cnn_mnist_tflearn
Log directory: /tmp/tflearn_logs/
---------------------------------
Training samples: 55000
Validation samples: 10000
--
Training Step: 430  | total loss: 0.16522 | time: 45.764s
| Adam | epoch: 001 | loss: 0.16522 - acc: 0.9660 | val_loss: 0.06837 - val_acc: 0.9780 -- iter: 55000/55000
--
Training Duration 45.898 sec

The speed increased significantly:

CPU build without AVX2/FMA, time: 168.151s

CPU build with AVX2/FMA, time: 147.697s

GPU build with AVX2/FMA, time: 45.898s
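To put the three timings above in proportion, the speedup of the GPU run over each CPU run is simply the ratio of the durations. The snippet below is plain arithmetic on the numbers reported above; the dictionary keys are labels invented for this example, and the figures come from a single one-epoch run each, so they are indicative only.

```python
# Durations in seconds, copied from the three measurements above.
timings = {
    "cpu_without_avx2_fma": 168.151,
    "cpu_with_avx2_fma": 147.697,
    "gpu_with_avx2_fma": 45.898,
}


def speedups(timings, fastest_key):
    """Return how many times slower each other run is than `fastest_key`."""
    base = timings[fastest_key]
    return {
        name: seconds / base
        for name, seconds in timings.items()
        if name != fastest_key
    }


for name, factor in speedups(timings, "gpu_with_avx2_fma").items():
    print("%s / gpu: %.2fx" % (name, factor))
```

On these numbers the GPU build comes out roughly 3.7 times faster than the plain CPU build and about 3.2 times faster than the CPU build with AVX2/FMA.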

cuda-smi

cuda-smi is used as a replacement for nvidia-smi on Mac.

nvidia-smi is the tool normally used to check GPU memory usage.

Download cuda-smi and put it in the directory /usr/local/bin/:

$ sudo scp cuda-smi /usr/local/bin/
$ sudo chmod 755 /usr/local/bin/cuda-smi
$ cuda-smi
Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free
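If you want to log GPU memory while a training job runs, the single-line report can be parsed. This sketch assumes the line format shown in the output above and nothing else about cuda-smi; `parse_cuda_smi` is a hypothetical helper name.

```python
import re

# Pattern for lines such as:
# Device 0 [PCIe 0:1:0.0]: GeForce GTX 750 Ti (CC 5.0): 5.0234 of 2047.7 MB (i.e. 0.245%) Free
_LINE = re.compile(
    r"Device (\d+) \[[^\]]*\]: (.+?) \(CC ([\d.]+)\): ([\d.]+) of ([\d.]+) MB"
)


def parse_cuda_smi(line):
    """Return device index, name, compute capability and memory figures, or None."""
    match = _LINE.search(line)
    if match is None:
        return None
    free_mb = float(match.group(4))
    total_mb = float(match.group(5))
    return {
        "device": int(match.group(1)),
        "name": match.group(2),
        "compute_capability": match.group(3),
        "free_mb": free_mb,
        "total_mb": total_mb,
        "used_mb": total_mb - free_mb,
    }
```

In the sample line above almost none of the 2047.7 MB is free, which is the kind of situation the "GPU memory leaks" entry in the problems section refers to.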

Problems

Error: _ncclAllReduce

Recompile a copy of _nccl_ops.so:

$ gcc -c -fPIC tensorflow/contrib/nccl/kernels/nccl_ops.cc -o _nccl_ops.o
$ gcc _nccl_ops.o -shared -o _nccl_ops.so
$ mv _nccl_ops.so /usr/local/lib/python3.6/site-packages/tensorflow/contrib/nccl/python/ops/
$ rm _nccl_ops.o

Library not loaded: @rpath/libcublas.9.2.dylib

This is because the DYLD_LIBRARY_PATH environment variable is missing inside Jupyter.

Alternatively, newer versions of macOS do not allow potentially unsafe variables such as DYLD_LIBRARY_PATH to be changed at will, unless you turn off SIP (System Integrity Protection).

To reproduce:

import os
os.environ['DYLD_LIBRARY_PATH']

The code above fails in Jupyter because, with SIP enabled, the environment variable DYLD_LIBRARY_PATH cannot be set for that process.

Solution: refer to the earlier “Environment variables” section.
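Inside a notebook it is more convenient to get a readable diagnosis than a KeyError. The following sketch checks a mapping such as `os.environ` for the two CUDA library directories configured earlier; the function name `dyld_diagnosis` and its messages are invented for this example.

```python
import os

REQUIRED_DIRS = ("/usr/local/cuda/lib", "/usr/local/cuda/extras/CUPTI/lib")


def dyld_diagnosis(environ, required=REQUIRED_DIRS):
    """Describe whether DYLD_LIBRARY_PATH in `environ` covers the CUDA directories."""
    value = environ.get("DYLD_LIBRARY_PATH")
    if value is None:
        return "DYLD_LIBRARY_PATH is not set in this process"
    present = value.split(":")
    missing = [directory for directory in required if directory not in present]
    if missing:
        return "DYLD_LIBRARY_PATH is missing: " + ", ".join(missing)
    return "ok"


# Run in a notebook cell or a script to inspect the current process.
print(dyld_diagnosis(os.environ))
```

If this prints that the variable is not set even though your shell profile exports it, the variable was dropped somewhere between the shell and the Python process, which matches the SIP explanation above.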

Segmentation Fault

A segmentation fault means that the program accessed memory outside the address space the system allotted to it.

Solution: confirm once more that the correct versions and compilation parameters are used, especially for Xcode.

Not found: TF GPU device with id 0 was not registered

This warning can simply be ignored.

GPU memory leaks???

I don’t know how to solve this yet :(
