I made a mistake without realising it today. I started training a Donkey Car neural network using my CPU.
I had previously set up Donkey Car to train using my GPU automatically, but I recently updated CUDA and my graphics card driver and it must have broken my paths.
Either way, as I was watching my CPU slowly clock over Epoch after Epoch at 100% utilisation, I decided to turn it into a little experimental project.
Two questions came to mind:
1. How different is CPU vs GPU training?
2. Is any GPU better than no GPU (Jetson Nano vs CPU)?
Environment
CPU: i7-4790k
GPU: GTX 1080 Ti
Software: TensorFlow 1.15.2 with CUDA 10.2
Data Set: 10707 images.
All of the below assumes you have TensorFlow, CUDA and other parts installed.
CPU Results
I thought ~40 seconds per Epoch was a good time and didn't take any notice of the warnings TensorFlow printed when I launched it from the Python script. I have not done this in a while... I only suspected something was wrong when it was still taking that long after the first Epoch.
I let it run its course anyway.
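In hindsight, a quick sanity check like the one below (using the TensorFlow 1.x API from this setup) would have shown straight away that only the CPU was visible - if the GPU is missing from the device list, the paths or CUDA install need fixing before training.

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# List every compute device TensorFlow can actually see (TF 1.x API).
for device in device_lib.list_local_devices():
    print(device.device_type, device.name)

# True only if a CUDA-capable GPU is visible and usable by TensorFlow.
print("GPU available:", tf.test.is_gpu_available())
```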
Statistics
Average Epoch time: 51 seconds
Total Time: 29 minutes 40 seconds.
GPU Results
After fixing up all my path variables for Python (this tutorial is really good for that), I was ready to be amazed!
BAM! That was fast.
Statistics
Average Epoch time: 3 seconds
Total Time: 2 minutes 5 seconds.
Total Improvement: 27+ minutes saved (a ~93% reduction in training time)
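If you want to confirm the work really is landing on the GPU after fixing your paths, TensorFlow 1.x can log where every operation is placed. This is just a standalone check, not part of the Donkey Car scripts:

```python
import tensorflow as tf

# TF 1.x: print which device (CPU or GPU) every op is placed on.
config = tf.ConfigProto(log_device_placement=True)
# Let TensorFlow grow GPU memory use instead of grabbing it all up front.
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    a = tf.random.uniform([1000, 1000])
    b = tf.random.uniform([1000, 1000])
    # The placement log should show this matmul on /device:GPU:0.
    print(sess.run(tf.reduce_sum(tf.matmul(a, b))))
```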
Jetson Nano Results
Look, I have successfully trained a Donkey Car model on the Jetson Nano before, but it is very 'touch and go'. Your best bet in this situation is to use something else for the training task. To get the Jetson Nano to work you have to set up a swap file (give it 8+ GB extra), turn off the GUI, check that the specialised Jetson build of TensorFlow is installed, and then drop the batch size to something small enough that it doesn't crash (which can take a bit of trial and error).
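For reference, the batch size change is just a config edit. In my Donkey Car setup it lives in myconfig.py; the exact parameter name and default can vary between versions, so treat the values below as illustrative:

```python
# myconfig.py (Donkey Car user config) - illustrative values only.
# Smaller batches lower peak memory use at the cost of slower Epochs.
BATCH_SIZE = 64   # the usual default is 128; halve it if the Nano runs out of RAM

# If training still crashes, halve it again and retry:
# BATCH_SIZE = 32
```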
Attempt 1 - Crash, not enough RAM
As you can see, my Jetson Nano froze on this training data without completing a single Epoch. I ran it again with a smaller batch size, but it did the same thing. I didn't want to waste too much time on this - the Nano can do it, but in most cases it is not a good idea.
Attempt 2 - 12 GB Swap File, 128 Batch Size
When I did manage to get it working last time, it was on a smaller dataset and it still took longer than my CPU - 45 minutes. So this dataset had the potential to take a very, very long time. If you watch the video above you can see it was averaging about 142 to 245 seconds (2 min 22 sec to 4+ minutes) per Epoch. Going by how many Epochs the CPU and GPU runs took (roughly 35), that works out to about 1 hour 25 minutes or longer (35 Epochs at ~142 s each is already about 83 minutes). UPDATE: It took 2 h 19 min.
The real bottleneck is the RAM. There's not enough of it.
This was taken in the middle of the 10th Epoch. You can see that the internal 4 GB RAM is FULL. Additionally the Nano is consuming 4.1 GB of Swap Memory. The swap memory is on the SD Card and this really slows down the process, as it has to continually copy data between the SD Card and RAM.
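If you want numbers rather than a screenshot, RAM and swap use can be polled from a second terminal while training runs. A small sketch using psutil (not something the Donkey Car scripts do for you):

```python
import time
import psutil

# Poll RAM and swap every 5 seconds while training runs in another process.
while True:
    ram = psutil.virtual_memory()
    swap = psutil.swap_memory()
    print("RAM used: {:.1f}/{:.1f} GB | Swap used: {:.1f} GB".format(
        ram.used / 1e9, ram.total / 1e9, swap.used / 1e9))
    time.sleep(5)
```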
I suspect if the Jetson Nano had 8 GB of RAM (instead of 4), there would be a significant improvement on these results. 16 GB would be even better again.
Summary & Results
Using a GPU gave a huge improvement in training time compared with the CPU. This was to be expected. Graphics cards these days are designed to crunch through all that data in no time at all. Even older-generation graphics cards are better than most top-tier CPUs when it comes to this task.
I find it unfortunate that the small specialised GPU in the Jetson Nano is starved of RAM and raw performance, making it not great for training TensorFlow neural networks. It only has 128 CUDA cores versus 3584 in the 1080 Ti, and it was never designed for this anyway. It had the potential to be an all-in-one package, but it just doesn't quite reach it yet.
Circling back to the questions at the start:
1. How different is CPU vs GPU training?
The GPU is MUCH faster - over 90% less training time in this test.
2. Is any GPU better than no GPU (Jetson Nano vs CPU)?
Yes and no. It depends on what you are trying to use and how it is set up. In most cases, using a GPU is the absolute best option. However, if your computer is starved of RAM or CPU power (like the Jetson Nano), it is probably best to find another computer to use, or you could be waiting an eternity.
So, in practice, if you have a GPU, you should always set up TensorFlow to use it (no matter how fiddly the setup is). Don't train on your CPU again - save yourself a heap of time, even for the smaller tasks.