The 2015/2016 Data Science Bowl is scored using a relatively little-known statistic, the Continuous Ranked Probability Score (CRPS). A detailed mathematical explanation of CRPS is available here and on the Data Science Bowl Kaggle evaluation page. It’s difficult to get an intuitive feel for what a specific CRPS value means, especially since lower is better and scores near zero can all look equally “low.” Still, the score has meaningful implications for the utility of your algorithm in a clinical setting.
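For a concrete feel, here is a minimal sketch of the score in Python. It assumes the competition's discretized form of CRPS: each prediction is a cumulative distribution over 600 volume thresholds (0 to 599 ml), compared against a step function that jumps from 0 to 1 at the true volume. The function name `crps` and its arguments are illustrative, and the Kaggle evaluation page remains the authoritative definition.

```python
import numpy as np

def crps(pred_cdfs, true_volumes, n_bins=600):
    """Mean squared gap between predicted CDFs and ideal step CDFs.

    Assumed layout (illustrative, not taken from the competition code):
    pred_cdfs[m, n] is the predicted P(volume <= n ml) for case m,
    and true_volumes[m] is the measured volume in ml.
    """
    pred_cdfs = np.asarray(pred_cdfs, dtype=float)
    true_volumes = np.asarray(true_volumes, dtype=float)
    thresholds = np.arange(n_bins)
    # Ideal CDF: 0 below the true volume, 1 at and above it.
    step = (thresholds[None, :] >= true_volumes[:, None]).astype(float)
    return float(np.mean((pred_cdfs - step) ** 2))

# Hypothetical example: the true volume is 100 ml.
thresholds = np.arange(600)
perfect = (thresholds >= 100).astype(float)[None, :]
off_by_10 = (thresholds >= 110).astype(float)[None, :]

print(crps(perfect, [100]))    # exact, fully confident prediction
print(crps(off_by_10, [100]))  # fully confident, but 10 ml too high
```

Under these assumptions the exact prediction scores 0, and the confident prediction that misses by 10 ml scores 10/600, or roughly 0.017. A volume error of that size therefore shows up only in the second decimal place of the score.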
To understand these implications, consider how the predicted volume distributions are used: they supply the inputs to the ejection fraction calculation. Volumes taken from the distributions serve as the end-systolic and end-diastolic volumes in the formula for ejection fraction, which is used to diagnose heart disease. As shown in the infographic, altering these volumes by 10 ml changes the diagnosis, which is unacceptable in a medical setting. Because volume distributions and their interpretations vary drastically, there is no simple, exact relationship between CRPS and how well an algorithm predicts volume. However, in a clinical setting, a change in CRPS of one hundredth proves to be significant. Because this “small” change in score affects the utility of an algorithm, competitors must strive to improve their work even when the difference in submission scores seems minuscule. This post on the Data Science Bowl Kaggle forums has further discussion of CRPS and its medical implications.
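The sensitivity to a 10 ml shift can be checked with the standard ejection fraction formula, EF = (EDV − ESV) / EDV, expressed as a percentage. The sketch below is illustrative only: the function name and the patient volumes are hypothetical and are not taken from the infographic or the competition data.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volumes (ml)."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml

# Hypothetical patient: EDV of 120 ml, ESV of 55 ml.
baseline = ejection_fraction(120.0, 55.0)
# Same patient, with the end-systolic volume overestimated by 10 ml.
shifted = ejection_fraction(120.0, 65.0)

print(f"baseline EF: {baseline:.1f}%")
print(f"shifted EF:  {shifted:.1f}%")
print(f"difference:  {baseline - shifted:.1f} percentage points")
```

With these made-up volumes, a 10 ml error in a single measurement moves the ejection fraction by about 8 percentage points, which can be enough to carry a patient across a diagnostic cut-off.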
—Written by Jason Farbman