With the next Data Science Bowl just around the corner, I set out to prepare myself for the competition. The truth: I’m not a coder.
I have an interest in data science. I appreciate the process—and the results. I’m open to advice from the best of the best. With that in mind, I set out to find the top tips and tricks buried within last year’s competition forums. What I found is a treasure trove for anyone who is going to participate, from beginner to seasoned pro.
First, let’s talk about software.
- Most competitors used Python due to its numerous processing and learning packages.
- R was also mentioned, though competitors generally found it underpowered for large datasets.
- A few users suggested Caffe (https://caffe.berkeleyvision.org/), a framework designed specifically for deep learning.
- For those using Linux, ImageMagick (https://www.imagemagick.org/script/index.php) was preferred because it handled data augmentation faster than Python.
Some thoughts on pre-processing and training:
- Specifically for image learning, most competitors found that rotating, scaling, and flipping images helped to improve the performance of their algorithm.
- Some competitors felt that pre-processing was more valuable than self-training.
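The rotate/scale/flip advice above can be sketched in a few lines. This is a minimal illustration using a tiny 2D list as a stand-in for an image; the helper names are my own, and a real pipeline would use a library such as Pillow or NumPy.

```python
# Toy image-augmentation sketch: an "image" is a nested list of pixels.
# These transforms mirror the rotate/flip tricks competitors described.

def flip_horizontal(img):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Return the original image plus flipped and rotated variants."""
    return [img, flip_horizontal(img), rotate_90(img)]

image = [[1, 2],
         [3, 4]]
for variant in augment(image):
    print(variant)
```

Each variant is fed to the model as an extra training example, which is why augmentation effectively multiplies the size of a small dataset.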
A note about testing:
- Beware of overfitting. In deep convolutional networks, layers with many parameters tend to overfit more than layers with fewer parameters. However, the lowest layers of a convolutional network are nearly immune to overfitting, as they see the most data.
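A quick back-of-the-envelope parameter count shows why the upper layers are the overfitting risk. The layer sizes below are invented for illustration, not taken from any competition entry.

```python
# Counting trainable parameters per layer: early conv layers see lots of
# data but have few parameters; late fully connected layers are the opposite.

def conv_params(in_channels, out_channels, kernel_size):
    """Weights + biases for a square convolutional kernel."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def dense_params(in_features, out_features):
    """Weights + biases for a fully connected layer."""
    return in_features * out_features + out_features

# A first conv layer (RGB input, 32 filters, 3x3 kernels)...
early = conv_params(3, 32, 3)
# ...versus a fully connected layer on an 8x8x64 feature map.
late = dense_params(8 * 8 * 64, 512)
print(early, late)  # → 896 2097664
```

With over 2,000 times the parameters of the first conv layer, the dense layer has far more freedom to memorize the training set, which is exactly where regularization effort pays off.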
There are really incredible, free, online classes available to add to your knowledge base. Jump on and get smart.
- Coursera Class: Neural Nets
- Deep Learning Documentation: Convolutional Neural Nets
The next Data Science Bowl challenge will be announced in December, but I’m putting my skills to the test with a few challenges before then. Armed with advice from last year’s competitors, I’m confident I can be a data scientist.
—Written by Julia Longinotti