Graphic by Marc Smith, Director, Social Media Research Foundation
The Data Science Bowl is dedicated to using the power of data to solve the world’s most difficult challenges. But we know that no one can uncover breakthroughs alone.
In that spirit, on Thursday, January 26, the Data Science Bowl hosted a Twitter chat for the community of problem-solvers dedicated to thinking bigger, asking the right questions, and, this year, ending lung cancer as we know it.
Kirk Borne (@KirkDBorne), Booz Allen’s principal data scientist (and a top Twitter big data influencer), moderated the chat, which also featured top data scientists and machine intelligence experts from Kaggle and Booz Allen.
Kirk kicked off the chat by asking participants to share impactful examples of the data science community’s passion for social good. Participants cited the current and past Data Science Bowls as prime examples, including previous years’ challenges on ocean and heart health. Other examples included using data science to track and fight the Zika virus, improving outcomes at animal shelters (a Kaggle competition), and Bloomberg’s Data for Good Exchange, among others.
Participants agreed that prizes, like this year’s $1 million purse provided by the Laura and John Arnold Foundation, can play a major role in focusing data scientists’ efforts on unique problems. Kaggle Head of Competitions Will Cukierski (@wcukierski) also noted that while there’s no shortage of data in the world, singular challenges are “a rarer beast,” and finding the right Data Science Bowl problem takes months.
This year’s challenge hit especially close to home for a number of participants, some of whom described their own cancer scares and the loss of loved ones. Because cancer is so prevalent, it touches everyone, if not directly. Moreover, a breakthrough doesn’t necessarily require domain expertise: Booz Allen Vice President Angela Zutavern (@AngelaZutavern) pointed out that the winners of last year’s competition were hedge fund traders, not traditional data scientists.
Kirk noted that ensemble models were once the norm in data science competitions, and he asked whether the rise of deep learning was moving the community away from diverse approaches. Booz Allen Senior Vice President Mark Jacobsohn (@markjacobsohn) acknowledged that while new techniques like deep learning and machine intelligence can be enablers, it takes diverse thinking and approaches to make the most impact.
Chat participants were quick to offer approaches, tools, and tutorials that could help with gathering and preparing data and with incorporating domain knowledge. Will advocated for reproducible data-gathering pipelines and gave a shout-out to the most upvoted kernel in Kaggle history. Meanwhile, Angela said data science shouldn’t be treated as a specialty area into which domain expertise is simply inserted; instead, it should be “a fundamental fabric” across the organization.
Kirk also asked participants to share “ah-ha” moments from data science projects, as well as machine learning tips. Participants suggested following the competition forum on Kaggle.com to stay up to date and reading posts from competition winners on No Free Hunch, the Kaggle blog. Will quipped that even as a seasoned data scientist, he regularly has “AH-HA, I have another bug in my code!” moments, and he suggested trying regularization and cross-validation when working with small sample sizes in computer-aided diagnosis.
Kirk closed the chat by reminding participants that data science is “a team sport. Collaborate with others. Don’t be afraid to say, ‘I don’t know about that.’ Please work w/ me.”
Search for #DataSciChat on Twitter to read the full discussion, and use the hashtag #DataSciBowl to continue the conversation!
—Written by Kirk Borne