When dealing with big datasets, running the code in a CPU-only environment is far to slow, using a GPU can make execution around 10 times faster.
Interestingly, the most difficult part of the hackathon was to get the code to run in a GPU enabled environment with TensorFlow (sort of the de-facto Machine Learning framework these days): It apparently is easy on a Windows machine with NVIDIA GPU, more or less impossible on MAC, and might work on Linux if you carefully download the exact versions of NVIDIA drivers (CUDA, cuDNN).
A better solution than fiddling for hours with setting up GPU support for Tensorflow is running it in the cloud:
Google Colab offers free GPU usage for running Jupyter Notebooks, which is an environment to share code, documentation etc. So it is good for trying out code, but not really suitable for iteratively editing code locally in your favorite IDE and running it.
A better solution for that use case is FloydHub: You edit your code locally, then upload it and run it on a system with a powerful GPU (Tesla K80 or better). Also, you can upload your datasets separately, so you don't need to download it in the script you execute. In our case, the dataset consists of images of cats and dogs.
The downside: It is not for free ;) 10 hours of GPU usage cost 10$ which seems fair; the danger here is that your script is malformed somehow and runs longer than expected or just never finishes. For the setup here each run was around 2 minutes so with 10 hours you can get quite far. Also, the first 2 hours of GPU usage are for free.
# This will create a directory "data" with subdirectories "train" and "test":
floyd data init kaggle-cats-and-dogs floyd data upload
Finally run the code on FloydHub:
floyd run --gpu --env tensorflow-1.8 --data peterpeterha/datasets/kaggle-cats-and-dogs/2:/data "python main.py"
# output logs in console e.g. for job 15:
floyd logs -t 15
Check out the logs of a successful run, which yields an accuracy of 66% and runs in 2 minutes. The accuracy is not that great yet, but at least now we have a set up over which we can iterate quickly.
Summary: Running Machine Learning code in the clode (here: FloydHub) seems more convenient than running it locally on a non-Windows environment; it can even be faster because the used GPU is better than a typical end-consumer graphics card.