The dataset for the 2016 Data Science Bowl presents several challenges for automated exploitation. As the images were collected in a real-world setting, with several types of sensors, there is a great deal of variation from patient to patient with respect to image orientation, pixel spacing, and intensity scaling. All of these factors should be dealt with as part of any competitive solution; while handling them may not be required for a good solution, a winning design requires every last bit of information to be squeezed out of the data.
Even though our main approach uses deep convolutional neural networks, which can perform feature extraction and classification better than nearly any other machine learning model, we still need to present as consistent an input to the network as possible, so it can spend its time learning the parameters of the underlying heart volume model rather than the peculiarities of the dataset.
First among the challenges is ensuring the physical geometry of the images is consistent across the dataset. While most of the study folders have images oriented vertically (i.e. taller than they are wide), some are oriented horizontally (i.e. wider than they are tall). This is easily detectable, and the horizontal images can be rotated counterclockwise by 90°. While there will still be some small rotation differences between studies, this removes the majority of the rotational distortion. Following the rotation, we crop the images to be square. While not required for our downstream algorithms, this does reduce the number of pixels we have to process, leading to better (and faster!) learning.
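As a rough illustration of the orientation step, here is a minimal NumPy sketch. The function name `orient_and_crop` is hypothetical, and the choice to crop around the image center is our assumption for this example; the crop could just as well be anchored elsewhere.

```python
import numpy as np


def orient_and_crop(image: np.ndarray) -> np.ndarray:
    """Rotate a landscape image 90 degrees counterclockwise, then crop it square.

    Assumes a single 2D slice. The center crop is an illustrative choice.
    """
    rows, cols = image.shape
    if cols > rows:
        # np.rot90 rotates counterclockwise by default (k=1).
        image = np.rot90(image)
        rows, cols = image.shape

    # Keep the largest centered square that fits inside the image.
    side = min(rows, cols)
    top = (rows - side) // 2
    left = (cols - side) // 2
    return image[top:top + side, left:left + side]
```

Calling `orient_and_crop` on a slice that is already square and vertical returns it unchanged, so the function can be applied to every image without checking its orientation first.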
The next challenge is that not all of the studies have the same resolution. It is important with this dataset to understand the relationship between the image size in pixels (which varies by more than a factor of two) and the true image size in real-world coordinates (millimeters in our case). Simply resizing the images to have the same dimensions will leave distortions in the data that may cause unnecessarily slow learning or an insufficient ability to generalize to new datasets. Fortunately, we know the pixel spacing of each image from the metadata (it varies from ~0.6 mm to nearly 2 mm), which tells us the true physical size of each image. Therefore, prior to any image resizing, we can scale all images to span the same physical extent by specifying a desired pixel spacing, adding or cropping a few pixels at the borders to ensure consistent dimensions, then resizing to our desired size for processing.
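The first two parts of that procedure might be sketched as follows. This is a simplified example, not our production code: the function names, the 1.0 mm default target spacing, nearest-neighbor interpolation, and zero padding are all assumptions made to keep the sketch dependency-free. A real pipeline would likely use a smoother interpolation, and the final resize to the network input size is an ordinary image resize that is omitted here.

```python
import numpy as np


def resample_to_spacing(image, spacing_mm, target_spacing_mm=1.0):
    """Nearest-neighbor resample so every output pixel spans target_spacing_mm.

    spacing_mm is the (row, column) pixel spacing read from the image metadata.
    """
    row_spacing, col_spacing = spacing_mm
    rows, cols = image.shape
    new_rows = max(1, int(round(rows * row_spacing / target_spacing_mm)))
    new_cols = max(1, int(round(cols * col_spacing / target_spacing_mm)))

    # Map each output pixel center back to the source pixel that contains it.
    row_idx = ((np.arange(new_rows) + 0.5) * rows / new_rows).astype(int)
    col_idx = ((np.arange(new_cols) + 0.5) * cols / new_cols).astype(int)
    row_idx = np.minimum(row_idx, rows - 1)
    col_idx = np.minimum(col_idx, cols - 1)
    return image[np.ix_(row_idx, col_idx)]


def pad_or_crop(image, size):
    """Center-crop or zero-pad a 2D image to exactly size x size pixels."""
    out = np.zeros((size, size), dtype=image.dtype)
    rows, cols = image.shape
    copy_rows, copy_cols = min(rows, size), min(cols, size)

    # Offsets that center the copied region in both source and destination.
    src_r, src_c = (rows - copy_rows) // 2, (cols - copy_cols) // 2
    dst_r, dst_c = (size - copy_rows) // 2, (size - copy_cols) // 2
    out[dst_r:dst_r + copy_rows, dst_c:dst_c + copy_cols] = image[
        src_r:src_r + copy_rows, src_c:src_c + copy_cols
    ]
    return out
```

After both steps, every image covers the same physical extent (`size * target_spacing_mm` millimeters per side), so a structure of a given size in millimeters occupies the same number of pixels regardless of which scanner produced the study.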
The final step prior to processing involves adjusting the contrast of the images to be consistent across the dataset. While the introduction of Rectified Linear Unit (ReLU) activation functions has reduced the need to scale or otherwise whiten the inputs when compared with sigmoid or hyperbolic tangent activation functions, it’s still good practice to present to the network a consistent dynamic range across the inputs. While we would like to keep our exact technique under wraps for a while, suffice it to say that we are non-linearly scaling the image intensities, as the raw images have some very high pixel values while most of the image is at a lower level. Our scaling has the effect of raising the contrast in the lower intensity regions, and considerably decreasing it in the high intensity regions.
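To make the idea concrete without giving away our technique, here is one generic curve with the same qualitative effect: clip at a high percentile, normalize to [0, 1], then apply a power law with an exponent below 1. To be clear, this is not the scaling we actually use; the function name, the 99th-percentile clip, and the exponent of 0.5 are arbitrary choices for illustration only.

```python
import numpy as np


def compress_intensity(image, clip_percentile=99.0, gamma=0.5):
    """Illustrative non-linear intensity scaling (NOT the authors' method).

    Clips rare very bright pixels, rescales to [0, 1], then applies x ** gamma.
    With gamma < 1 the curve is steep near 0 and flat near 1, which stretches
    contrast in dark regions and compresses it in bright regions.
    """
    img = image.astype(np.float64)
    low = img.min()
    high = np.percentile(img, clip_percentile)
    if high <= low:
        # Constant (or nearly constant) image: nothing to scale.
        return np.zeros_like(img)

    scaled = np.clip((img - low) / (high - low), 0.0, 1.0)
    return scaled ** gamma
```

Any monotonic curve that is steep at low intensities and flat at high intensities (a logarithm, for example) would serve the same illustrative purpose.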
The examples below show the results of our pre-processing, and how it has contributed to ensuring the inputs to our solution are consistent. We are also investigating other image processing techniques, including various methods of de-noising, time-based feature extraction, segmentation, and even advanced 3D Fourier feature extraction. But we’ll talk about those in a future post! As a preview, we’ve included some animated examples of our automatically generated segmentation mask along with the raw and pre-processed imagery.
Study 9, Slice 10. From left to right: the raw imagery, the scaled imagery, and finally the segmentation.
Study 10, Slice 10. From left to right: the raw imagery, the scaled imagery, and finally the segmentation.
Study 125, Slice 17. From left to right: the raw imagery, the scaled imagery, and finally the segmentation.
—Written by Peter VanMaasdam