The Challenge

In 2016–2017, competitors used anonymized, high-resolution lung scans from hundreds of patients, provided by the National Cancer Institute (NCI), to create algorithms that can improve lung cancer screening technology. The participants created algorithms that can accurately determine when lesions in the lungs are cancerous, thereby dramatically decreasing the false positive rate of current low-dose CT technology.

Cancer Moonshot

In the U.S., cancer will strike two in every five people in their lifetimes. But it affects all of us. That’s why, in 2016, the office of the Vice President announced the Cancer Moonshot. It’s an audacious effort to make a decade’s worth of progress in cancer prevention, diagnosis, and treatment in just five years.

The 2017 Data Science Bowl® will pursue one of the Cancer Moonshot’s key goals: unleashing the power of data against this deadly disease. Presented by Booz Allen and Kaggle, the competition will convene the data science and medical communities to develop cancer detection algorithms, and help end the disease as we know it.

The Lung Cancer Detection Challenge

Lung cancer is one of the most common types of cancer, with nearly 225,000 new cases of the disease expected in the U.S. in 2016.

Early detection is critical, as it opens a range of treatment options not available when cancer is detected at later, more advanced stages. Low-dose computed tomography (CT) is a potential breakthrough technology for early detection, with the ability to reduce deaths by 20% [1]. Often, suspicious lesions identified in screening are initially assessed as high risk for cancer, but after additional follow-up tests they turn out to be non-cancerous (false positives from the initial screening) [2]. Can machine learning reduce the number of radiology exams flagged for potentially unnecessary follow-up and avoid patient anxiety?

[1] Aberle DR, Adams AM, Berg CD, et al. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 365 (5): 395-409, 2011.
[2] Low-dose CT has historically resulted in high false positive rates of around 25% (Aberle et al., N Engl J Med 365 (5): 395-409, 2011).

The Prize

The Data Science Bowl® awarded a total prize purse of $1 million, provided by the Laura and John Arnold Foundation, to those who observed the right patterns, asked the right questions, and in turn, created unprecedented impact around this high-priority issue.

In addition, $5,000 was awarded to each of the top three most highly voted Kernels (a total of $15,000), and $10,000 in prizes was awarded to three random drawing winners for sharing their Data Science Bowl journey on social media.

2017 Partners

National Cancer Institute

Data Science Bowl® Sponsor

Keyvan Farahani, Program Director
Paul F. Pinsky, Chief of Early Detection Research

Radboud University Medical Center

Diagnostic Image Analysis Group

Dr. Bram van Ginneken, Lead Scientist

2017 Sponsors

A special thank you to the National Cancer Institute for its support with this year’s data challenge.

2017 Supporting Organizations


Breathe In the Future.

Breathe Out the Past.


Submit Your Ideas

Got a big idea for us?
Ready to re-invent the future?
See potential ahead?

In our first contest, we dove deep with a microscopic lens to improve ocean health. In our last, we went on a life-saving mission to spot nuclei to diagnose killer diseases. In each case, we did what couldn’t be done before: we brought the awesome creativity and capability of data scientists to bear, opening the doors to new approaches.

What should we tackle next? If you have ideas, let us hear from you!

We’re in the hunt for the next big problem to solve—a problem with the potential to change the world. If selected, the power of the entire data science community will be harnessed against it.

Contact us to submit your ideas or email us. Include an overview of the problem, your contact information, a brief description of the data, and where it can be obtained.

Contact Us