Compiling TensorFlow to use all available CPU instructions

Grant Stephens · Ravelin Tech Blog · Mar 3, 2020


So now that you’re running your models in your language of choice (more info on how to do that here), you think life is pretty good, but there is one thing bugging you. Every time you start running your model you get a log line that looks something like this:

tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA

This blog post is about getting rid of that log line and in the process speeding up your model predictions.

I won’t go into detail about these instructions in this post, but this Stack Overflow answer gives a good background, with links to read more about them.

Let’s get started. First things first: we’re going to do this in Docker so that it is easily reproducible. It also means you can get your favorite cloud builder to do the heavy lifting for you, so you don’t render your laptop useless for a couple of hours while it compiles TensorFlow.

We’re also going to be doing this for TensorFlow version 2.0.1. Currently there is no pre-compiled libtensorflow for version 2 and up (see this issue for more). These instructions work for other versions, but some adjustments, to the Bazel version amongst other things, will be needed.

The Dockerfile I’m going to go through is available here, so I will just discuss the important lines in this blog post.

Everything starts off with installing the required dependencies, which can take a while. The important ones here are gcc and g++. By chance, the versions installed by default on Ubuntu are the right ones, but different versions of TensorFlow may require different versions of the compilers.
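As a rough sketch rather than a copy of the real Dockerfile, the start of it looks something like this (the exact base image and package list are illustrative):

FROM ubuntu:18.04

# build-essential pulls in gcc and g++; curl and git are needed for the
# steps that follow.
RUN apt-get update && apt-get install -y \
    build-essential curl git \
    python3 python3-dev python3-pip \
    && rm -rf /var/lib/apt/lists/*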

Next we set a rather large number of environment variables. This is because we do not want to use the interactive configure script, and setting the environment variables up front is how to avoid it. Two important ones are TF_VERSION, which sets the version of TensorFlow, and USE_BAZEL_VERSION, which determines the version of Bazel that Bazelisk will install. Once again, this version is tightly coupled to the version of TensorFlow you are trying to build, but luckily the error message is helpful and will tell you which version to use if you choose the wrong one.
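The block looks roughly like the following. The variable names are the ones TensorFlow’s configure script reads, but the values shown here, in particular the Bazel version, are illustrative and must match your chosen TensorFlow release:

# Each variable answers one of the questions configure would otherwise
# ask interactively. Values shown are examples only.
ENV TF_VERSION=2.0.1 \
    USE_BAZEL_VERSION=0.26.1 \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    TF_NEED_CUDA=0 \
    TF_NEED_ROCM=0 \
    TF_NEED_OPENCL_SYCL=0 \
    TF_ENABLE_XLA=0 \
    TF_DOWNLOAD_CLANG=0 \
    TF_SET_ANDROID_WORKSPACE=0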

Next we install some Python dependencies. Why we need these is actually a mystery to me, as we’re compiling C++ code, but nonetheless, there they are. I think this will also allow you to compile the Python pip package later if you need to.
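Something along these lines, which matches what the official build-from-source docs asked for around this TensorFlow version:

# The configure/build scripts expect these to be importable, even for a
# C++-only build; the keras packages are installed without their deps.
RUN pip3 install -U pip six numpy wheel setuptools mock && \
    pip3 install -U keras_applications keras_preprocessing --no-deps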

Next we fetch the TensorFlow source code and install Bazelisk. Bazelisk makes using Bazel bearable, as it will automagically install the right version for you based on the USE_BAZEL_VERSION you set above. Note that we rename bazelisk to bazel to make it easier to use, even though this is arguably bad practice.
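A sketch of those two steps; the Bazelisk release pinned here is illustrative, so check for the latest one:

# Fetch the TensorFlow source at the pinned release tag.
RUN git clone --depth 1 --branch v${TF_VERSION} \
    https://github.com/tensorflow/tensorflow.git /tensorflow

# Install Bazelisk, renamed to bazel for convenience.
RUN curl -fsSLo /usr/local/bin/bazel \
    https://github.com/bazelbuild/bazelisk/releases/download/v1.3.0/bazelisk-linux-amd64 \
    && chmod +x /usr/local/bin/bazel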

Thanks to all the environment variables we set above, the configure step is very straightforward. If you want to run it interactively instead, you could leave the variables unset, build the Docker image up to this point, bash into it, and run configure by hand to see all the options and some more detail about them.
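With every question answered via the environment, the step reduces to:

WORKDIR /tensorflow
# Runs non-interactively because the ENV block above answers everything.
RUN ./configure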

Finally it is time for the compile (the full build command is sketched below the list). A couple of things to note here:

  • We hide warnings using --copt=-w to make the output less verbose, as it is very chatty otherwise.
  • The all-important CPU options are as follows: --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both. By setting these we make sure that our TensorFlow library uses all the available CPU instructions.
  • The jobs flag is also set, in this case to 26. You’ll probably want to change this if you’re running on your laptop; essentially it is the number of concurrent jobs the compile will run. Our cloud build job uses a 32-core machine, and using all of the cores is not a good idea, so that is where the magic 26 comes from.
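Putting those flags together, the build step looks roughly like this; the libtensorflow target is the one that produces the tarball we extract later:

# Build the C library package with warnings suppressed, the CPU feature
# flags enabled, and 26 parallel jobs.
RUN bazel build -c opt \
    --copt=-w \
    --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both \
    --jobs=26 \
    //tensorflow/tools/lib_package:libtensorflow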

OK, so let’s build this image. The magic incantation you need is:

docker build -t tf .

Now, I found this took about 4 hours on my 4-core machine. If it all succeeded, you should be able to run:

docker run -it tf /bin/bash

This will drop you into a bash shell inside your newly built image.

OK, cool, now what? You have a docker image that successfully built TensorFlow from source, how do you use it?

With a container running from the image, you can copy out the tarball containing libtensorflow like so:

docker cp yourcontainerid:/tensorflow/bazel-bin/tensorflow/tools/lib_package/libtensorflow.tar.gz tf.tar.gz

That should give you a tf.tar.gz in your current working directory. Now you can copy this to your production images, or if you want to use it locally you can extract and “install” it by running:

sudo tar -C /usr/local -xzf tf.tar.gz && sudo ldconfig

In the next post we’ll check to see if these extra CPU instructions make any difference to prediction times.
