In our last blog post we described an end-to-end deep learning solution to this challenge. By “end-to-end” we mean that the raw pixels constituting a SAX study for an individual patient were fed into a convolutional neural network (ConvNet) and predicted left ventricle (LV) systolic and diastolic CDFs came out the other end; the only other processing was zero mean unit variance (ZMUV) pre-processing of the images. While this approach is elegant in its simplicity, it asks the network to learn a very challenging function: there is no explicit training signal for the area of the left ventricle that should be measured from each image, only the whole volume for the SAX study.
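For concreteness, here is a minimal sketch of the ZMUV pre-processing step described above: each image is shifted to zero mean and scaled to unit variance. The function name and the small epsilon guard are our own additions, not from the original pipeline.

```python
import numpy as np

def zmuv(image):
    """Zero mean, unit variance (ZMUV) normalization of a single image.

    The 1e-8 epsilon (an assumption, not from the original pipeline)
    guards against division by zero for constant images.
    """
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + 1e-8)

# Example: normalize a synthetic 8-bit image
img = np.arange(16, dtype=np.uint8).reshape(4, 4)
norm = zmuv(img)
```

After normalization the image has (approximately) zero mean and unit standard deviation, regardless of the original intensity range.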
Due to the challenges of taking an end-to-end deep learning approach, we have also been exploring segmentation-based approaches to the challenge. In other words, we want to automatically divide each SAX image into contiguous regions representing muscle, other tissue, or blood, and then use these segments to identify precisely those pixels that correspond to the interior of the LV. Counting the pixels in the segment gives us the area of the interior of the LV directly; we can then integrate multiple area estimates over the length of the heart to estimate the LV volume. In our first blog post we showed some of our results where the interior of the LV was accurately segmented. The main difficulty in using these segmented images to estimate the LV volume is automating the identification of which segment in the image represents the LV. We have been exploring a couple of different strategies for doing this.
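The pixel-counting and integration steps above can be sketched as follows. This is an illustrative sketch, not our exact implementation: the function names are hypothetical, we assume pixel and slice spacings are available (e.g. from the DICOM metadata), and we use simple trapezoidal integration along the long axis of the heart.

```python
import numpy as np

def area_mm2(lv_mask, pixel_spacing_mm):
    """Area of a binary LV-interior segment: pixel count times pixel area.

    lv_mask: 2D boolean/0-1 array marking LV interior pixels.
    pixel_spacing_mm: (row_spacing, col_spacing) in millimetres.
    """
    return lv_mask.sum() * pixel_spacing_mm[0] * pixel_spacing_mm[1]

def lv_volume_ml(areas_mm2, slice_spacing_mm):
    """Estimate LV volume by trapezoidal integration of per-slice
    cross-sectional areas, ordered base to apex.

    Returns volume in millilitres (1 mL = 1000 mm^3).
    """
    volume_mm3 = np.trapz(areas_mm2, dx=slice_spacing_mm)
    return volume_mm3 / 1000.0

# Example: five slices, each with 1000 mm^2 of LV area, 10 mm apart
areas = [1000.0] * 5
vol = lv_volume_ml(areas, slice_spacing_mm=10.0)
```

Systolic and diastolic volumes come from applying this at the frames of minimum and maximum area over the cardiac cycle.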
Firstly, we have trained a ConvNet to predict the center of the LV in an image from the raw pixels. An example output of this network is shown below.
Figure 1: Predicted left ventricle location. The white dot shows the predicted location of the center of the LV, the middle image shows the segmented image, and the right image shows the convex hull of the segment predicted to be the LV.
This network can accurately predict the LV location for slices that fall well within the bounds of the heart on the vertical axis. Cases where the slice is close to the top or bottom of the heart are much more difficult, and the location predictions are much less reliable. Segmentation for these slices is also much more difficult, so overall LV segmentation results are variable. Here is an example of one of these more difficult cases:
Figure 2: Poor left ventricle location prediction
In order to train the ConvNet we used a simple architecture consisting of three convolutional layers feeding a final fully-connected layer with two neurons. We treat training the network as a regression problem, computing the L2 loss, i.e. the average Euclidean distance, between the (x, y) coordinate of the true center of the left ventricle and the two output neurons. We carried out this training using NVIDIA’s DIGITS 3 deep learning interface to the GPU-accelerated Caffe framework; support for training regression networks was added in version 3. Training the network takes just a few minutes on a Titan X GPU.
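The regression objective can be made concrete with a small sketch of the loss itself. This implements the loss exactly as described in the text, the mean Euclidean distance between predicted and true (x, y) centers, as an illustration; the loss layer Caffe actually uses internally may differ in scaling.

```python
import numpy as np

def l2_center_loss(pred_xy, true_xy):
    """Mean Euclidean distance between predicted and true LV centers.

    pred_xy, true_xy: arrays of shape (batch, 2) holding (x, y)
    coordinates, e.g. the two output neurons vs. the hand-labeled center.
    """
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    true_xy = np.asarray(true_xy, dtype=np.float64)
    return np.sqrt(((pred_xy - true_xy) ** 2).sum(axis=1)).mean()

# Example: a prediction 5 pixels from the true center
loss = l2_center_loss([[3.0, 4.0]], [[0.0, 0.0]])
```

Because the loss is a distance in pixels, it is directly interpretable as average localization error on the training set.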
Obviously, the competition-provided training data does not give the center of the left ventricle, so we manually labelled approximately 6,000 images using the Python labeling tool Sloth. This took about three-quarters of the Super Bowl to do. We believed we would need at least this many images to train a reliable localization network, which is why we chose to identify just the center of the ventricle rather than a bounding box or contour/outline.
In addition to the ConvNet localizer, we also examined an approach using features computed from each candidate LV blob in the segmentation mask. We use the blob area, filled area, convex hull area, distance from the image center, eccentricity, Hu moments, and other features to train a Support Vector Machine to assign a likelihood that a blob represents the LV. To build the training set for this, we manually labeled several hundred images using Sloth. This approach gives results comparable to the ConvNet.
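To illustrate the kind of per-blob features involved, here is a minimal numpy-only sketch computing three of them: blob area, distance of the blob centroid from the image center, and eccentricity derived from the eigenvalues of the pixel-coordinate covariance matrix. This is a simplified stand-in for what a library such as scikit-image's `regionprops` provides (filled area, convex hull area, and Hu moments are omitted here), and the function name is our own.

```python
import numpy as np

def blob_features(mask):
    """Compute [area, centroid distance from image center, eccentricity]
    for a single binary blob mask.

    Eccentricity is taken from the eigenvalues of the covariance of the
    blob's pixel coordinates: 0 for a circular blob, approaching 1 for an
    elongated one.
    """
    ys, xs = np.nonzero(mask)
    area = float(len(ys))
    # Distance of blob centroid from the geometric image center
    img_cy = (mask.shape[0] - 1) / 2.0
    img_cx = (mask.shape[1] - 1) / 2.0
    dist = np.hypot(ys.mean() - img_cy, xs.mean() - img_cx)
    # Eccentricity from the coordinate covariance eigenvalues
    evals = np.sort(np.linalg.eigvalsh(np.cov(np.stack([ys, xs]))))
    ecc = np.sqrt(1.0 - evals[0] / evals[1]) if evals[1] > 0 else 0.0
    return np.array([area, dist, ecc])

# Example: a centered square blob in an 11x11 image
mask = np.zeros((11, 11), dtype=bool)
mask[3:8, 3:8] = True
feats = blob_features(mask)
```

Feature vectors like this, one per candidate blob, form the inputs on which the SVM is trained to score blobs; the LV tends to be roughly circular and near the image center, so these features carry real signal.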
In this last week of the first phase of the competition we are going to bring together all of our individual approaches to the challenge and generate a single ensemble submission. As always, we’re happy to discuss our approaches on the forums.
—Written by Jon Barker