Booz Allen Hamilton’s first annual Data Science Bowl attracted more than a thousand teams worldwide, all competing to develop the best computer-based visual recognition system. The competition was hosted on Kaggle and ran for three months.
The goal: automatically classify images containing various plankton species into 121 predefined categories.
Figure 1 – Representative examples of plankton species included in the training dataset.
Plankton are key to Earth’s massively intricate ecosystems. These organisms take in 25% of the carbon dioxide released by burning fossil fuels every year, and they form the foundation of marine and terrestrial food chains. A large, thriving plankton population is therefore crucial for ocean health, among other things. Many studies of plankton populations rely on the ability to capture and classify plankton species in imagery, so this competition contributes to a fundamental task: rapidly assessing and predicting ocean health, one plankton at a time.
It is all about convolutional neural networks (CNNs) these days. In fact, the top ten teams in the competition all used some variant of a deep CNN architecture. Why convolutional neural networks? Because they sound cool, and the words just roll off the tongue. Actually, CNNs are a class of statistical/machine learning algorithms that automatically learn expressive features from example data and associate those features with objects of interest. That’s right: once a model is defined and its parameters are configured, it can learn intricate concepts without hand-crafted feature engineering or other human input. A CNN model has three key components:
- Convolution layer – a set of learnable processing units, or filters, each coupled with a specialized, biologically inspired “activation” function, that extract features from the input image.
- Pooling layer – reduces the spatial size of the input representation by keeping only the “strongest” features in each local region. Studies have shown that including a pooling layer helps control overfitting and improves the generalizability of the CNN model.
- Classification/regression layer – learns a decision function that associates input features with a target concept of interest, such as the plankton species shown in an image.
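To make the three components concrete, here is a minimal NumPy sketch of a single forward pass: one convolution layer with a ReLU activation, 2x2 max pooling, and a softmax classification layer over 121 classes. It assumes a single-channel 32x32 input and random, untrained weights; the image size, filter count, filter size, and ReLU choice are hypothetical illustration values, not details from any competition entry. Real CNNs stack many such layers and learn the weights from labeled data.

```python
import numpy as np

def conv_layer(image, filters, biases):
    """Valid 2-D cross-correlation (what most CNN libraries call convolution)
    of a single-channel image with a bank of filters, followed by ReLU."""
    n_filters, k, _ = filters.shape
    h, w = image.shape
    out = np.empty((n_filters, h - k + 1, w - k + 1))
    for f in range(n_filters):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patch = image[i:i + k, j:j + k]
                out[f, i, j] = np.sum(patch * filters[f]) + biases[f]
    return np.maximum(out, 0.0)  # ReLU activation

def max_pool(feature_maps, size=2):
    """Non-overlapping max pooling: keep the strongest response in each size x size window."""
    c, h, w = feature_maps.shape
    h2, w2 = h // size, w // size
    trimmed = feature_maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def softmax_classifier(features, weights, bias):
    """Linear decision function + softmax: maps a feature vector to class probabilities."""
    logits = weights @ features + bias
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

rng = np.random.default_rng(0)
image = rng.random((32, 32))                     # hypothetical 32x32 grayscale image
filters = rng.normal(scale=0.1, size=(8, 5, 5))  # 8 hypothetical 5x5 filters
maps = conv_layer(image, filters, np.zeros(8))   # -> shape (8, 28, 28)
pooled = max_pool(maps)                          # -> shape (8, 14, 14)
flat = pooled.ravel()
n_classes = 121
W = rng.normal(scale=0.01, size=(n_classes, flat.size))
probs = softmax_classifier(flat, W, np.zeros(n_classes))
print(maps.shape, pooled.shape, probs.shape)
```

Note how pooling halves each spatial dimension, so the classifier sees 8 x 14 x 14 features instead of 8 x 28 x 28.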
However, remember the adage: if something sounds like black magic, it probably is. CNNs are not black magic; they were inspired by biology and have sound theoretical foundations in statistical learning and optimization. But they do require some experience and intuition, and perhaps a lot of patience, to extract the maximum performance out of them.
Tweaking CNN parameters can be downright frustrating, but their state-of-the-art accuracy is well worth the effort. These days, publishable novelty in CNN research tends to lie in:
- Careful crafting of the architecture
- Developing new activation functions for the convolution filters
- Introducing new ways of pooling or selecting relevant features
- Minimizing the adverse effects of overfitting, etc.
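Two of the items above can be illustrated in a few lines. The NumPy sketch below shows the standard ReLU activation next to the leaky ReLU, one widely used variant, and inverted dropout, one common way to reduce overfitting. These are generic examples chosen for illustration, not necessarily techniques any competition team used; the leak slope `alpha=0.01` and the drop probability `p=0.5` are assumed values.

```python
import numpy as np

def relu(x):
    """Standard rectified linear unit: max(0, x)."""
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: passes a small, scaled signal for negative inputs instead of zero."""
    return np.where(x > 0, x, alpha * x)

def dropout(x, p, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability p and
    rescale the survivors by 1/(1-p) so the expected activation is unchanged.
    At test time the input passes through untouched."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))
print(leaky_relu(x))

rng = np.random.default_rng(0)
activations = np.ones(10)
print(dropout(activations, p=0.5, rng=rng))  # each entry is 0.0 or 2.0
```

The rescaling in `dropout` is the design choice that lets the same network be used at test time without any adjustment to its weights.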
The first Data Science Bowl winner, team Deep Sea, used a complex stack of alternating convolution and pooling layers to achieve competitive accuracy. They also introduced a new pooling layer called cyclic pooling to edge out the other competitors. The third-place team used another new pooling technique, fractional max-pooling, to take a prize. Cyclic pooling and team Deep Sea’s many other experiments are beyond the scope of this blog post; for a detailed write-up of their winning submission, check out their blog at https://benanne.github.io/2015/03/17/plankton.html.
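For readers curious about the core idea, here is a rough NumPy illustration of cyclic pooling as we understand it from team Deep Sea’s write-up: the same feature extractor is applied to the 0, 90, 180, and 270 degree rotations of an input image, and the resulting feature vectors are pooled across rotations. The single-filter `feature_extractor` is a hypothetical stand-in for a full convolutional stack, and mean pooling is an assumed default; see their blog for the pooling variant they actually used.

```python
import numpy as np

def feature_extractor(image, kernel):
    """Hypothetical stand-in for a shared convolutional stack:
    one valid 2-D cross-correlation followed by ReLU, flattened to a vector."""
    k = kernel.shape[0]
    h, w = image.shape
    out = np.array([[np.sum(image[i:i + k, j:j + k] * kernel)
                     for j in range(w - k + 1)]
                    for i in range(h - k + 1)])
    return np.maximum(out, 0.0).ravel()

def cyclic_pool(image, kernel, pool=np.mean):
    """Apply the same extractor to the four 90-degree rotations of a square image
    and pool the four feature vectors element-wise (mean by default, an assumption).
    The image must be square so every rotation yields a vector of the same length."""
    feats = np.stack([feature_extractor(np.rot90(image, r), kernel)
                      for r in range(4)])
    return pool(feats, axis=0)

rng = np.random.default_rng(1)
img = rng.random((12, 12))          # hypothetical square input
kern = rng.normal(size=(3, 3))      # hypothetical 3x3 filter
a = cyclic_pool(img, kern)
b = cyclic_pool(np.rot90(img), kern)
print(np.allclose(a, b))
```

Because the four rotations of a rotated image are the same four arrays in a different order, element-wise pooling over them gives the same result (up to floating-point rounding for the mean). That property is useful for plankton, which can appear in an image at any orientation.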
CNNs now appear in almost every visual recognition task and application. Because they are so widely used in practice, it is relatively easy to prototype deep learning research and applications with open-source, off-the-shelf packages such as the following:
- C++ + Python + Caffe
- C++ + Python + cuda-convnet2
- Python + Theano + Lasagne + scikit-learn + Pylearn2
- Lua + Torch
- And many others
So pick a language you are comfortable with and dive head first into convolutional neural networks. Who knows, maybe you could win a competition or two with them.
—Written by Vu Tran