First Machine Learning Steps with TensorFlow and FloydHub

TL;DR: Setting up a GPU-enabled machine learning environment on an operating system other than Windows seems harder than just running the code in the cloud.

Recently Opitz Consulting hosted a machine learning introduction with a hackathon; the problem posed was the classic task of learning to tell pictures of cats from pictures of dogs.

When dealing with big datasets, running the code in a CPU-only environment is far too slow; using a GPU can make execution around 10 times faster.

Interestingly, the most difficult part of the hackathon was getting the code to run in a GPU-enabled environment with TensorFlow (more or less the de facto machine learning framework these days): it is apparently easy on a Windows machine with an NVIDIA GPU, more or less impossible on a Mac, and might work on Linux if you carefully install the exact versions of the NVIDIA drivers and libraries (CUDA, cuDNN).
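A quick way to find out whether a given setup actually sees the GPU is to ask TensorFlow itself. The following is only a minimal sketch: the helper name gpu_available is made up for this post, and it assumes that tf.test.is_gpu_available() (TensorFlow 1.x) or tf.config.list_physical_devices() (newer releases) is present in your installed version.

```python
def gpu_available():
    """Return True/False if TensorFlow can be imported, None if it is not installed.

    Hypothetical helper for illustration; not part of the hackathon code.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Newer TensorFlow releases expose the device list via tf.config.
    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        return len(config.list_physical_devices("GPU")) > 0

    # Older releases (such as the 1.x line used in this post) offer this check instead.
    return tf.test.is_gpu_available()


if __name__ == "__main__":
    result = gpu_available()
    if result is None:
        print("TensorFlow is not installed")
    else:
        print("GPU visible to TensorFlow:", result)
```

If this reports False on a machine that does have an NVIDIA card, the driver, CUDA and cuDNN versions most likely do not match what the TensorFlow build expects.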

A better solution than fiddling for hours with GPU support for TensorFlow is to run it in the cloud:

Google Colab offers free GPU usage for running Jupyter Notebooks, an environment for sharing code, documentation etc. It is good for trying out code, but not really suitable for iteratively editing code locally in your favorite IDE and then running it.

A better solution for that use case is FloydHub: you edit your code locally, then upload it and run it on a system with a powerful GPU (Tesla K80 or better). Also, you can upload your datasets separately, so you don't need to download them in the script you execute. In our case, the dataset consists of images of cats and dogs.

The downside: it is not free ;) 10 hours of GPU usage cost $10, which seems fair; the danger is that a faulty script runs longer than expected or simply never finishes. For the setup here each run took around 2 minutes, so 10 hours get you quite far. Also, the first 2 hours of GPU usage are free.
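To put those numbers into perspective, here is a small back-of-the-envelope calculation. The prices and run times are the ones quoted above; the function names are made up for this sketch, and actual FloydHub billing may round differently.

```python
# Figures from the text: $10 buys 10 GPU hours, one run takes about 2 minutes.
PRICE_USD = 10.0
GPU_HOURS = 10.0
MINUTES_PER_RUN = 2.0


def runs_per_package(gpu_hours, minutes_per_run):
    """Number of complete runs that fit into the purchased GPU time."""
    return int(gpu_hours * 60 // minutes_per_run)


def cost_per_run(price_usd, gpu_hours, minutes_per_run):
    """Price of a single run, assuming billing proportional to run time."""
    price_per_minute = price_usd / (gpu_hours * 60)
    return price_per_minute * minutes_per_run


def cost_of_runaway_job(price_usd, gpu_hours, hours_running):
    """What a script that never finishes costs after the given number of hours."""
    return price_usd / gpu_hours * hours_running


if __name__ == "__main__":
    print("runs per package:", runs_per_package(GPU_HOURS, MINUTES_PER_RUN))
    print("cost per run in USD: %.3f" % cost_per_run(PRICE_USD, GPU_HOURS, MINUTES_PER_RUN))
    print("job left running for 8 hours, USD: %.2f" % cost_of_runaway_job(PRICE_USD, GPU_HOURS, 8))
```

So a 2-minute run costs a few cents, while a job that hangs overnight eats most of the package.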

The code to tell cats and dogs apart is hosted on GitHub and is a fork of Philipp Fehrmann's code; almost all of it was written by him for the hackathon mentioned earlier.

To run your code on FloydHub, create a new project there, download the command line tools and then initialize a project with the same name in your local clone of the GitHub repo:

git clone https://github.com/peter-ha/ML-Example-Steps.git

cd ML-Example-Steps

# create a repository on floydhub.com first and then init with the same name here:

floyd init peterpeterha/ml-example-steps

Then initialize the dataset, which can be downloaded from Microsoft for free; the GitHub repository already contains a script to separate the images into training and test data:

cd ..

wget https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip

unzip kagglecatsanddogs_3367a.zip

# This will create a directory "data" with subdirectories "train" and "test":

python ML-Example-Steps/util.py

cd data

floyd data init kaggle-cats-and-dogs

floyd data upload
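For readers wondering what the split step above amounts to, here is a rough sketch of such a train/test split. It is not the actual util.py from the repository; the function name, the 20% test fraction and the assumption that the unzipped archive contains one subfolder per class are all mine.

```python
import os
import random
import shutil


def split_dataset(source_dir, target_dir, test_fraction=0.2, seed=42):
    """Copy images from source_dir/<class> into target_dir/train/<class>
    and target_dir/test/<class>. Returns the number of files per subset and class.

    Hypothetical helper; the real util.py may work differently.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_name in sorted(os.listdir(source_dir)):
        class_dir = os.path.join(source_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        # Only consider JPEG files; anything else in the folder is skipped.
        files = sorted(f for f in os.listdir(class_dir) if f.lower().endswith(".jpg"))
        rng.shuffle(files)
        n_test = int(len(files) * test_fraction)
        subsets = (("test", files[:n_test]), ("train", files[n_test:]))
        for subset, names in subsets:
            out_dir = os.path.join(target_dir, subset, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy(os.path.join(class_dir, name), os.path.join(out_dir, name))
            counts[(subset, class_name)] = len(names)
    return counts
```

Calling split_dataset with the unzipped image folder as source and "data" as target would produce the "train" and "test" subdirectories that are uploaded here.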

Finally, switch back to the project directory and run the code on FloydHub:

cd ../ML-Example-Steps

floyd run --gpu --env tensorflow-1.8 --data peterpeterha/datasets/kaggle-cats-and-dogs/2:/data "python main.py"

# output logs in console e.g. for job 15:

floyd logs -t 15
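The ":/data" suffix in the run command suggests that the uploaded dataset shows up under /data inside the job. A script that should run both on FloydHub and locally could pick its data directory along these lines. This is a sketch with made-up helper names, not what main.py in the repository necessarily does.

```python
import os


def resolve_data_dir(mounted="/data", local="data"):
    """Prefer the FloydHub mount point if it exists, else fall back to a local folder.

    Both default paths are assumptions based on the commands in this post.
    """
    return mounted if os.path.isdir(mounted) else local


def subset_dirs(data_dir):
    """Return the train and test directories below the data directory."""
    return os.path.join(data_dir, "train"), os.path.join(data_dir, "test")


if __name__ == "__main__":
    train_dir, test_dir = subset_dirs(resolve_data_dir())
    print("training images:", train_dir)
    print("test images:", test_dir)
```

That way the same script can be tried on a small local sample before spending GPU time on it.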

Check out the logs of a successful run, which yields an accuracy of 66% and runs in 2 minutes. The accuracy is not that great yet, but at least we now have a setup with which we can iterate quickly.

Summary: Running machine learning code in the cloud (here: FloydHub) seems more convenient than running it locally in a non-Windows environment; it can even be faster, because the GPU used there is better than a typical consumer graphics card.

--

Comments / questions? Feel free to let me know on Twitter: @peha23