Tencia Lee, a math graduate and hedge fund trader, partnered with Qi Liu, a physics PhD who also has a hedge fund background, to devise the winning algorithm in this year’s Data Science Bowl. They each spent more than 100 hours of evenings and weekends building and testing algorithms. Working in parallel, Lee and Liu built and trialled hundreds of algorithms to read the heart scans. Their efforts paid off: out of 993 data scientists competing in the Data Science Bowl, they took the competition's largest prize. In this blog, Tencia Lee reveals the work behind the win.
In November of 2015, I decided I would enter the next Kaggle competition involving computer vision, no matter what the topic. I had been studying the applications of neural networks in image analysis for months and I was excited to try out everything I had learned. I had no idea at that time that my participation in the Data Science Bowl would be so engaging – and hold so much potential to save lives.
When I learned the topic for the challenge, using MRI scans to measure the heart’s squeezing ability and so diagnose heart disease, I was intrigued. The idea of working on a math problem that could help save lives was profoundly meaningful to me.
The NIH provided an incredible dataset of over 100,000 MRI images from 700 patients, along with the time-consuming diagnoses made by their cardiologists. The images included hearts from healthy people and from people suffering from various degrees of heart disease. As I dug in, I was struck by the fact that these images came from real men, women, children and elderly people. It drove home the importance of building an algorithm that could produce accurate results for all of them.
I worked with Qi Liu on a solution that used many different neural networks to process these images. We encountered several data-related challenges along the way, but by far the biggest was the very small number of accurate cross-sectional area contours we had to work with: just 800 contours, available from Sunnybrook Hospital. From these few images, we did our best to extrapolate what the contours should look like on the 100,000 images in the NIH-provided dataset.
It can take a skilled cardiologist up to 20 minutes to measure and read a single set of MRI images. That’s precious time that such a skilled professional could be spending with patients. An open-source algorithm can make diagnosis much faster by accurately measuring cardiac function and by providing an objective reference model that is free of measurement bias.
I hope the algorithm we have built will one day lead to a truly effective, dependable data processing tool that can perform this important calculation 100 times as fast as a skilled cardiologist, and just as accurately. As an algorithmic trader, I never dreamed that I would have such an impact in the field of medicine. Our solution is just the start, and I’m eager to see what other advances data science can bring to human health.
Because heart disease is the most common cause of death, Lee and Liu’s work has life-saving potential in transforming how it is diagnosed. Booz Allen’s Steve Mills, who oversees a team of 600 data scientists at the consultancy, says this kind of automated innovation will certainly spread to other areas of healthcare. “These analytics that enable doctors to perform logistical or diagnostic tasks in a much more streamlined way are a pretty powerful thing.”
—Written by Tencia Lee