Data science is often regarded as an elite field. It draws experts with advanced degrees in math and science, fluency in multiple programming languages, and a firm grasp of statistics and probability theory. Reaching the top of a Kaggle challenge like the Data Science Bowl is a feat of technical wizardry.
At the same time, there is a movement within both corporate and academic spaces to bring data science capabilities to the masses. Tools are now available that lower the barrier to entry for business users. An upcoming product from Booz Allen Hamilton will allow anyone to ask questions of Big Data sources in real time using human-readable language. The traditional analytics workflow is being automated and crowd-sourced, and programmers and analysts are being displaced as gatekeepers of information.
The irony for data scientists is that the more robust and accurate we make these tools, the less need there is for our intervention in the analysis. A recent provocative article in the Harvard Business Review puts it this way: “You need an algorithm, not a data scientist.” Are we just working to make ourselves obsolete?
In reality, there’s much more to data science than plug-and-play statistics and off-the-shelf machine learning algorithms. Anyone can ask a model to spit out an array of numbers with some correlation to a training set; a good data scientist will know more or less how the model arrived at those numbers, and what can safely be inferred from them. This is the final and most difficult step in any machine learning or analysis task: interpreting the results for the person who asked the question. And it is not so easily automated.
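To make the point about "some correlation to a training set" concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from a real project: the data are pure random noise, the "model" is a hand-rolled nearest-neighbor lookup, and the helper names (`make_noise`, `nearest_label`, `accuracy`) are hypothetical. The model matches its training set perfectly, yet its predictions on fresh data are no better than a coin flip.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def make_noise(n, dims=5):
    """Return n random points with random 0/1 labels (no real signal)."""
    xs = [[rng.random() for _ in range(dims)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


def nearest_label(point, train_x, train_y):
    """Predict the label of the closest training point (1-nearest-neighbor)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, train_x[i])),
    )
    return train_y[best]


def accuracy(xs, ys, train_x, train_y):
    """Fraction of points whose predicted label equals the true label."""
    hits = sum(nearest_label(x, train_x, train_y) == y for x, y in zip(xs, ys))
    return hits / len(xs)


train_x, train_y = make_noise(200)
test_x, test_y = make_noise(200)

# Each training point is its own nearest neighbor, so the model "memorizes".
train_acc = accuracy(train_x, train_y, train_x, train_y)
# On unseen noise there is nothing to learn, so accuracy hovers near chance.
test_acc = accuracy(test_x, test_y, train_x, train_y)

print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

A self-service tool that reported only the first number would look impressive. Knowing to ask for the second, and what it implies, is the interpretive step described above.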
When we suggest that machines alone can do the job of a data scientist, we risk overhyping the analytics and reaching bad conclusions. Some now warn of a Big Data “bubble,” in which too many failed predictions and unrealistic expectations dampen interest in data science investment. To be fair, not all data scientists are equally rigorous in qualifying the output of their models and incorporating uncertainties; it can be a difficult business. But as these new “self-service” tools proliferate, so too will the opportunities for bad predictions. This makes it all the more important that data scientists remain involved not just in developing these tools, but also in applying them.
The democratization of data analytics will eventually bring significant change to the data science profession. Some of us will be freed to work on more interesting and advanced technologies; others will need to take a stronger role as educators and guides, helping our clients and coworkers use these new tools effectively. Either way, data scientists will remain a vital investment for organizations looking to stay competitive. We have a long way to go before machines can learn to fill that role.
—Written by David Norris