
Image Display, Enhancement, and Analysis

Deep Learning in
Medical Image Analysis

Dinggang Shen

Department of Radiology and BRIC, UNC-Chapel Hill


Deep Learning



Supervised Learning

For imaging-based diagnosis, manual labeling is expensive, and hence ground-truth data are limited.



Unsupervised Learning

Methods and their limitations:
- PCA: linear; not optimal for non-Gaussian data
- Gaussian Mixture Models, k-Means: require knowledge of the number of clusters; challenging when applied to high-dimensional data
- ICA: linear model
- Sparse Coding, Non-Linear Embedding: shallow models (e.g., single-layer representations)



All these methods involve just one step of mapping.

The mapping is shallow, not deep, and thus cannot represent complex mappings!
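
To make the "one step of mapping" point concrete, here is a minimal Python sketch (with illustrative random data, not from the slides) showing that a classical method such as PCA amounts to a single linear projection:

```python
# A shallow (one-step) mapping: PCA is a single linear projection of the data.
# Minimal sketch with scikit-learn; the data here are random stand-ins.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(1000, 64)        # e.g., 1000 flattened 8x8 image patches
pca = PCA(n_components=10).fit(X)

# The entire "representation" is one affine map: z = W^T (x - mean)
Z = pca.transform(X)
Z_manual = (X - pca.mean_) @ pca.components_.T
assert np.allclose(Z, Z_manual)
```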

Deep Learning: Why hot?

- Deep mapping and representation
- Won 1st place in many competitions
- Speech recognition: Android voice recognition (25% error reduction) - http://www.wired.com/2013/02/android-neural-network/
- Industrial applications (Google, IBM, Microsoft, Baidu, Facebook, Samsung, Yahoo, Intel, Apple, Nuance, BBN, ...)



Deep Learning: Why hot?

The following 7 slides are edited from Dr. Yoshua Bengio's tutorial.






Deep Learning: Why hot?

- Each level transforms the data into a representation that can be more easily modeled. Unfolding it further maps the original data to a factorized (uniform-like) distribution.



Deep Learning: Why hot?

Performance increases with the number of layers.


Neural Network: Why not working?

Issues with previous neural networks (NN) trained with Back-Propagation (BP):
- Gradient-based method: errors are propagated from the last layer back to the previous layers.
- The last layer represents a highly nonlinear function (i.e., a jump function in binary classification): its gradient is unstable and large within a small range, but zero almost everywhere, which makes it difficult to propagate errors to the previous layers.
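
As a toy illustration of this point (assumed small random weights and sigmoid units; not an example from the slides), the error signal back-propagated through a stack of saturating layers shrinks rapidly:

```python
import numpy as np

# Toy illustration (not from the slides): with saturating sigmoid units and
# small random weights (assumed initialization), the error signal that
# back-propagation sends toward earlier layers shrinks rapidly with depth.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
depth, width = 10, 32
Ws = [0.1 * rng.standard_normal((width, width)) for _ in range(depth)]

# Forward pass, storing each layer's activation
h, acts = rng.standard_normal(width), []
for W in Ws:
    h = sigmoid(W @ h)
    acts.append(h)

# Backward pass: propagate an error signal of ones from the last layer
delta = np.ones(width)
for layer in range(depth - 1, -1, -1):
    delta = Ws[layer].T @ (delta * acts[layer] * (1 - acts[layer]))
    print(f"error reaching layer {layer:2d}: mean |delta| = {np.abs(delta).mean():.2e}")
```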



Neural Network: Why not working?

Effect of initial conditions in deep nets: random initialization vs. unsupervised pre-training
- Random initialization: no two training trajectories end up in the same place, because there is a huge number of effective local minima.
- Unsupervised pre-training: transfers knowledge from previous learning (representations and explanatory factors) to cases with few examples, through the underlying explanatory factors shared between P(X) and P(Y|X).



Deep Learning: Why working now?

Three main reasons:
1) A new layer-wise training algorithm (Science, 2006): each layer is trained on a simple task at a time.
2) Big data, compared with 20 years ago.
3) Powerful computers: previous algorithms may have worked in theory, but in practice they did not converge to good local minima on the less powerful computers of the time.



Deep Learning

Restricted Boltzmann Machine (RBM)
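
Here is a minimal sketch (illustrative sizes and random binary data; not the authors' code) of how a single binary RBM can be trained with one-step contrastive divergence (CD-1), the building block behind the greedy training slides that follow:

```python
import numpy as np

# Minimal sketch (illustrative sizes, random binary data; not the authors' code):
# training one binary RBM with one-step contrastive divergence (CD-1).
rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 64, 32, 0.05

W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)   # visible / hidden biases

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

X = (rng.random((1000, n_visible)) > 0.5).astype(float)   # stand-in binary patches

for epoch in range(10):
    for start in range(0, len(X), 100):                    # mini-batches of 100
        v0 = X[start:start + 100]
        ph0 = sigmoid(v0 @ W + b_h)                        # P(h = 1 | v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
        pv1 = sigmoid(h0 @ W.T + b_v)                      # reconstruction P(v = 1 | h0)
        ph1 = sigmoid(pv1 @ W + b_h)                       # P(h = 1 | v1)
        # CD-1 update: positive-phase minus negative-phase statistics
        W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        b_v += lr * (v0 - pv1).mean(axis=0)
        b_h += lr * (ph0 - ph1).mean(axis=0)
```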



Deep Learning: Greedy Training





Stacked Auto-Encoder
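
As a minimal sketch of the greedy, layer-wise idea behind a stacked auto-encoder (illustrative layer sizes and random data; not the code used in this work), each auto-encoder is trained in isolation and its codes become the next layer's input:

```python
import torch
import torch.nn as nn

# Minimal sketch (not the authors' code): greedy layer-wise pre-training of a
# stacked auto-encoder. Layer sizes and data are illustrative assumptions.
torch.manual_seed(0)
X = torch.rand(2000, 256)                 # stand-in for vectorized image patches
layer_sizes = [256, 128, 64, 32]

encoders = []
inputs = X
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(d_out, d_in), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for step in range(500):               # train this auto-encoder in isolation
        opt.zero_grad()
        recon = dec(enc(inputs))
        loss = nn.functional.mse_loss(recon, inputs)
        loss.backward()
        opt.step()
    encoders.append(enc)
    inputs = enc(inputs).detach()         # its codes become the next layer's input

# Stack the trained encoders; a supervised layer could now be added and the
# whole network fine-tuned with back-propagation.
stacked = nn.Sequential(*encoders)
features = stacked(X)                     # deep feature representation of X
print(features.shape)                     # torch.Size([2000, 32])
```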





Application 1:
Hippocampus Segmentation Using
7T MR Images

- M. Kim, G. Wu, D. Shen, "Unsupervised Deep Learning for Hippocampus Segmentation in 7.0 Tesla MR Images," MICCAI Workshop on Machine Learning in Medical Imaging (MLMI), 2013.
- M. Kim, G. Wu, W. Li, L. Wang, Y.-D. Son, Z.-H. Cho, D. Shen, "Automatic Hippocampus Segmentation of 7.0 Tesla MR Images by Combining Multiple Atlases and Auto-Context Models," NeuroImage, 83:335-345, 2013.



Hippocampus Segmentation
- Importance
  - The volume of the hippocampus is an important trait for early diagnosis of neurological diseases (e.g., Alzheimer's disease).
- Challenges
  - The hippocampus is small (35157 mm3).
  - The hippocampus is surrounded by complex structures.
  - Low imaging resolution (1 x 1 x 1 mm3) of 1.5T or 3T MRI scanners.



Challenges in 7T images

- The characteristics of 7T MR images
  - Much richer structural information
  - Severe intensity inhomogeneity problem
  - Less partial volume effect

[Figure: comparison between 1.5T (1 x 1 x 1 mm3) and 7.0T (0.35 x 0.35 x 0.35 mm3) MR images]



Hand-Crafted Features

- Limited discriminative power

[Figure: image patches a-c extracted from a 7T MR image, and the responses of Haar filters for these patches]



Deep-Learning Features
Proposed method:
- Multi-atlas segmentation
- Classification using high-level shape information via auto-context models (ACM)
- Hierarchical feature representation via unsupervised deep learning

[Figure: basis filters generated by unsupervised deep learning, and context features]



Hierarchical Feature Extraction

Stacked two-layer convolutional ISA (Independent Subspace Analysis)

[Figure: image patches X enter the 1st ISA layer (basis filters W, activations P); the 1st-layer activations are dimension-reduced by PCA and fed into the 2nd ISA layer (basis filters W, activations P). The basis filters learned by the 1st ISA layer are shown.]



Multi-Atlas Segmentation

[Flowchart - Training stage: for each of the N atlas spaces, image patches from the aligned training images are used to learn 2-layer ISA features and train a classifier sequence. Testing stage: the N 2-layer ISA classification maps are combined by adaptively weighted fusion into a probability map in the subject image space, and level-set segmentation produces the final result.]



Qualitative Evaluation

[Figure: segmentation results - ground truth vs. Haar + texture features vs. hierarchical features]



Quantitative Evaluation
Overlap metrics: Precision (P), Recall (R), Relative Overlap (RO), Similarity Index (SI)

Comparison results using 20 leave-one-out cases:

Method                                  P       R       RO      SI
Hand-Crafted Haar + Texture Features    0.843   0.847   0.772   0.865
Hierarchical Patch Representations      0.883   0.881   0.819   0.894
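
For reference, these four overlap metrics are usually defined as follows (standard definitions that the slide does not spell out), where A is the automatic segmentation and G is the ground truth:

```latex
% Assumed standard definitions; A = automatic segmentation, G = ground truth.
P  = \frac{|A \cap G|}{|A|}, \qquad
R  = \frac{|A \cap G|}{|G|}, \qquad
RO = \frac{|A \cap G|}{|A \cup G|}, \qquad
SI = \frac{2\,|A \cap G|}{|A| + |G|}
```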



Infant Brain Segmentation
[Figure: T1, T2, and FA images of an infant brain, with the corresponding CSF, GM, and WM segmentation maps]

- Input modalities consist of T1, T2, and fractional anisotropy (FA) images of infant brains.
- Each pixel is segmented into cerebrospinal fluid (CSF), gray matter (GM), or white matter (WM).
- Isointense-stage (~6-8 months of age) brains are very difficult to segment.
- The number of trainable parameters is ~5.3 million.

W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, D. Shen, "Deep Convolutional Neural Networks for Multi-Modality Isointense Infant Brain Image Segmentation," NeuroImage, 2015.
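
A minimal sketch of this kind of multi-modality, patch-based CNN (assumed patch size, channel counts, and layer widths; not the paper's exact architecture):

```python
import torch
import torch.nn as nn

# Minimal sketch (assumptions, not the paper's exact network): a small CNN that
# takes a multi-modality 2D patch (T1, T2, FA as three input channels) and
# predicts the tissue class (CSF, GM, WM) of the patch's center pixel.
class PatchCNN(nn.Module):
    def __init__(self, patch_size=13, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(),   # 3 modalities in
            nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(),
        )
        side = patch_size - 4 - 4                          # two 5x5 "valid" convs
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * side * side, 128), nn.ReLU(),
            nn.Linear(128, n_classes),                     # CSF / GM / WM logits
        )

    def forward(self, x):                                  # x: (batch, 3, 13, 13)
        return self.classifier(self.features(x))

model = PatchCNN()
patches = torch.rand(8, 3, 13, 13)                         # stand-in training patches
logits = model(patches)
loss = nn.functional.cross_entropy(logits, torch.randint(0, 3, (8,)))
loss.backward()                                            # trained end-to-end with SGD/Adam
```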



Application 2:
Registration of Brain MR Images

- G. Wu, M. Kim, Q. Wang, B.C. Munsell, D. Shen, "Scalable High Performance Image Registration Framework by Unsupervised Deep Feature Representations Learning," revised for IEEE TBME, 2015.
- G. Wu, M. Kim, Q. Wang, S. Liao, Y. Gao, D. Shen, "Unsupervised Deep Feature Learning for Deformable Image Registration of MR Brains," MICCAI 2013.



Image Registration

Determine accurate correspondences between images.

[Figure: individual image vs. model (template) image]



Feature Extraction

- Hand-crafted features are NOT reusable.
- A new learning-based framework is needed for determining intrinsic feature representations.



Learned Features



Deep Learning

[Figure: a deep network maps input image patches to morphological signatures for image registration]



Deep Learning

Difficulty #1: How to determine the number of hidden nodes and the number of layers?

Solution: Use affinity propagation to roughly estimate the number of hidden units.
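
A minimal sketch of that estimation step (illustrative patch sizes, with scikit-learn's affinity propagation standing in for the clustering; the damping value is an assumption):

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Minimal sketch of the idea (illustrative sizes and parameters): cluster a
# sample of training patches with affinity propagation, which selects the
# number of exemplars automatically, and use that count as a rough estimate
# of how many hidden units the layer needs.
rng = np.random.default_rng(0)
patches = rng.standard_normal((500, 125))        # e.g., 500 vectorized 5x5x5 patches

ap = AffinityPropagation(damping=0.9, random_state=0).fit(patches)
n_hidden = len(ap.cluster_centers_indices_)
print("rough estimate of hidden units:", n_hidden)
```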



Deep Learning

Difficulty #2: How to deal with high-dimensional training data?

Solution: Use a convolutional RBM in each layer.

[Figure: a visible layer of size Nv convolved with filters of size Nw gives hidden feature maps h1 of size Nv - Nw + 1, which keeps the number of hidden nodes manageable]
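
A quick numerical check of the Nv - Nw + 1 feature-map size that weight sharing gives (illustrative sizes):

```python
import numpy as np
from scipy.signal import convolve2d

# Quick check of the feature-map size used above (illustrative sizes): a
# "valid" convolution of an Nv x Nv input with an Nw x Nw filter produces an
# (Nv - Nw + 1) x (Nv - Nw + 1) hidden map, so weights are shared across the
# image instead of being fully connected to every voxel.
Nv, Nw = 32, 5
image = np.random.randn(Nv, Nv)
filt = np.random.randn(Nw, Nw)
hidden = convolve2d(image, filt, mode="valid")
print(hidden.shape)        # (28, 28) == (Nv - Nw + 1, Nv - Nw + 1)
```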



Deep Learning

Difficulty #3: How to deal with a large number of training samples?

Solution: Select key points across the training images to focus on distinctive image regions.



Symmetric Image Registration

S-HAMMER (Symmetric HAMMER)



ADNI Dataset

- Training Images: 10
- Testing Images: 24
- Patch Size: 21 x 21 x 21
- Number of Layers: 7
- Number of Features: 150

Dice ratio (%):
Methods    Ventricle   Gray Matter   White Matter   Hippocampus
Demons     90.2        76.0          85.7           72.2
M+PCA      90.5        76.6          85.5           72.3
M+DP       90.9        76.5          85.8           72.5
HAMMER     91.5        75.5          85.4           75.5
H+PCA      91.7        76.9          86.5           75.6
H+DP       95.0        78.6          88.1           76.8



Results: LONI LPBA40 Dataset

Methods Average
Demons 68.9
M+PCA 68.9
M+DP 69.2
HAMMER 70.2
H+PCA 70.6
H+DP 72.7



Results: IXI Dataset

Abbreviations: STG: superior temporal gyrus; PCG: precentral gyrus; SFG: superior frontal gyrus; M&ITG: middle and inferior temporal gyrus; AOG: anterior orbital gyrus; POG: posterior orbital gyrus; MFG: middle frontal gyrus; IFG: inferior frontal gyrus; LG: lingual gyrus



Results
Development of registration algorithms for 7T MR images

[Figure: 1.5T (1 x 1 x 1 mm3) vs. 7.0T (0.35 x 0.35 x 0.35 mm3) images, and features learned in the first layer]

Hippocampus overlap ratio: 78.4% (our method) vs. 68.5% (Demons)



Application 3:
Disease Diagnosis

- H.-I. Suk, S.-W. Lee, D. Shen, "Latent Feature Representation with Stacked Auto-Encoder for AD/MCI Diagnosis," Brain Structure and Function, 2014.
- H.-I. Suk, D. Shen, "Deep Learning-Based Feature Representation for AD/MCI Classification," MICCAI 2013.
- H.-I. Suk, S.-W. Lee, D. Shen, "Hierarchical Feature Representation and Multimodal Fusion with Deep Learning for AD/MCI Diagnosis," NeuroImage, 101:569-582, 2014.



Alzheimer's Disease

- The most common form of dementia
- 6th leading cause of death in the US
- Mild Cognitive Impairment (MCI): prodromal stage of AD
- Treatments
  - Provide only small symptomatic benefits for mild-to-moderate AD
  - Cannot delay or halt the progression of AD

[Figure: forecast of AD prevalence in the US]



Computer-Aided Diagnosis

- Neuroimaging modalities for diagnosis
  - Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), fMRI, etc.
- Simple low-level features [previous work]
  - MRI: gray matter tissue volumes
  - PET: mean signal intensities
  - Cerebrospinal Fluid (CSF): biomarker measures
  These low-level features are vulnerable to noise and/or artifacts.



Motivation

- Hidden or latent high-level information
  - A deep architecture can be efficiently used to discover latent or hidden representations in self-taught learning.
  - Encoding the data in a hierarchical feature space overcomes its vulnerability to noise/artifacts.
- Unsupervised greedy training
  - Allows us to benefit from target-unrelated samples to discover general latent feature representations.
  - Leverages them to enhance diagnostic accuracy.



Latent Feature Representation

- A deep or hierarchical architecture can find the highly non-linear and complex patterns in data [Bengio, 2009].
- Stacked Auto-Encoder (SAE): discovers latent information such as relations among features.

[Figure: a single auto-encoder (input, hidden, and output layers, with encoding and decoding paths) and a stacked auto-encoder that maps simple inputs to increasingly complex representations and yields an augmented feature vector]
Proposed Framework
[Flowchart: features are extracted from MRI and PET in a template space and combined with CSF measures; a stacked auto-encoder (pre-training followed by fine-tuning) produces the latent feature representation; feature selection is performed with multi-task learning that jointly regresses clinical scores (MMSE, ADAS-Cog) and predicts the label; the selected MRI, PET, and CSF features are then fused with multi-kernel SVM learning for AD/MCI diagnosis]
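
A minimal sketch of the final multi-kernel fusion step (random stand-in features and hand-picked kernel weights; in practice the weights would be tuned, e.g., by cross-validation):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Minimal sketch (assumed data and weights, not the paper's implementation):
# multi-modality fusion via a weighted combination of per-modality kernels,
# fed to an SVM with a precomputed kernel.
rng = np.random.default_rng(0)
n = 100
X_mri = rng.standard_normal((n, 90))
X_pet = rng.standard_normal((n, 90))
X_csf = rng.standard_normal((n, 3))
y = rng.integers(0, 2, n)                       # stand-in AD vs. HC labels

betas = {"mri": 0.5, "pet": 0.3, "csf": 0.2}    # modality weights (assumed)
K = (betas["mri"] * rbf_kernel(X_mri) +
     betas["pet"] * rbf_kernel(X_pet) +
     betas["csf"] * rbf_kernel(X_csf))          # combined kernel, shape (n, n)

clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
print("training accuracy:", clf.score(K, y))
```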



Experimental Results

- ADNI dataset: baseline MRI, PET, and CSF data
- 51 AD, 52 HC, 43 MCI-C, 56 MCI-NC
- LLF: Low-Level Features, SAEF: SAE Features

[Figure: bar charts of accuracy, sensitivity, and specificity for LLF, SAEF, and LLF+SAEF in four classification tasks - AD vs. HC, MCI vs. HC, AD vs. MCI, and MCI-C vs. MCI-NC]



Multi-Modal Fusion

- Fusing complementary information from multiple modalities helps enhance diagnostic accuracy.
- Previous approaches
  - Simple concatenation into a long vector [Kohannim et al., 2010]
  - Kernel methods [Hinrichs et al., 2011; Zhang et al., 2011; Suk and Shen, 2013]
  These use independent steps of feature extraction and modality fusion.

H.-I. Suk, S.-W. Lee, D. Shen, "Hierarchical Feature Representation and Multimodal Fusion with Deep Learning for AD/MCI Diagnosis," NeuroImage, 101:569-582, 2014.



Motivation

- High-level feature representation via deep learning
  - Successfully applied in medical image analysis [Shin et al., 2013; Liao et al., 2013; Ciresan et al., 2013; Suk and Shen, 2013]
  - Deep Boltzmann Machine (DBM) [Salakhutdinov and Hinton, 2012]
    - An undirected graphical model
    - Structured by stacking multiple restricted Boltzmann machines in a hierarchical manner
- Inherent relations between the MRI and PET modalities
  - Shared feature representation
  - Multi-Modal DBM (MM-DBM) [Srivastava and Salakhutdinov, 2012]



Proposed Framework

[Flowchart: multi-modal input images (MRI and PET) -> patch extraction (K spatially distributed patches of size w per modality) -> patch-level feature learning with a multi-modal DBM -> patch-level SVM learning -> mega-patch construction and image-level classifier learning with a weighted ensemble of SVMs]

I: image size, w: patch size, K: number of selected patches, m: modality index,
FG: number of hidden units in the Gaussian restricted Boltzmann machine,
FS: number of hidden units in the top layer of the multi-modal Deep Boltzmann Machine (DBM)



Learned Weights
[Figure: weights learned in hidden layers 1 and 2 for the MRI and PET pathways]



Experimental Results
Comparison with state-of-the-art methods:

Methods                  Modalities     Dataset (AD/MCI/NC)   AD vs. NC (%)   MCI vs. NC (%)
Kohannim et al., 2010    MRI+PET+CSF    40/83/43              90.7            75.8
Walhovd et al., 2010     MRI+CSF        38/73/42              88.8            79.1
Hinrichs et al., 2011    MRI+PET        48/119/66             92.4            n/a
Westman et al., 2012     MRI+CSF        96/162/111            91.8            77.6
Zhang and Shen, 2012     MRI+PET+CSF    51/99/52              93.3            83.2
Proposed method          MRI+PET        93/204/101            93.5            85.19



Deep Learning for Data Completion
[Figure: structural MRI is available for all subjects, while functional PET is missing for many; examples shown for normal control, Alzheimer's disease, and mild cognitive impairment subjects]

- Alzheimer's disease diagnosis based on neuroimaging data
- Data from the Alzheimer's Disease Neuroimaging Initiative (ADNI)
- More than 50% of the subjects do not have PET data

R. Li, W. Zhang, H. Suk, L. Wang, J. Li, D. Shen, S. Ji, "Deep Learning Based Imaging Data Completion for Improved Brain Disease Diagnosis," MICCAI 2014.



Deep Learning for Data Completion

[Figure: example MRI volumes and the corresponding PET volumes]

- Each volume contains 64 x 64 x 64 voxels; ~50,000 patches were extracted from each volume, leading to a total of 19.9 million training patches.
- The network contains 37,761 parameters.

R. Li, W. Zhang, H. Suk, L. Wang, J. Li, D. Shen, S. Ji, "Deep Learning Based Imaging Data Completion for Improved Brain Disease Diagnosis," MICCAI 2014.
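
A minimal sketch of a 3D convolutional network that maps an MRI patch to the corresponding PET patch (assumed patch sizes and layer widths; not the paper's exact 37,761-parameter network):

```python
import torch
import torch.nn as nn

# Minimal sketch (assumed patch sizes and layer widths, not the paper's exact
# network): a small 3D CNN that maps an MRI patch to the corresponding PET
# patch, trained with a voxel-wise reconstruction loss.
class MRItoPET(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 10, kernel_size=5), nn.ReLU(),   # 15^3 -> 11^3
            nn.Conv3d(10, 10, kernel_size=5), nn.ReLU(),  # 11^3 -> 7^3
            nn.Conv3d(10, 1, kernel_size=5),              # 7^3  -> 3^3 predicted PET patch
        )

    def forward(self, x):                                 # x: (batch, 1, 15, 15, 15)
        return self.net(x)

model = MRItoPET()
mri_patches = torch.rand(16, 1, 15, 15, 15)               # stand-in MRI patches
pet_patches = torch.rand(16, 1, 3, 3, 3)                  # stand-in PET targets
loss = nn.functional.mse_loss(model(mri_patches), pet_patches)
loss.backward()
```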



Application 4:
Prostate Labeling

S. Liao, Y. Gao, and D. Shen, "Representation Learning: A Unified Deep Learning Framework for
Automatic Prostate MR Segmentation," MICCAI 2013.



MRI Prostate Segmentation

- Due to the good soft-tissue contrast of MR prostate images, there is increasing interest in
  - MR-guided transperineal prostate core biopsy
  - MR-guided radiotherapy planning
  - Quantitative analysis using MR images (e.g., volume measurement)
- Accurate segmentation of the prostate in MR images is a prerequisite for all these steps.



Challenges

- Large inter-subject anatomical appearance variability
- Intensity inhomogeneity
- Large inter-subject shape variability



Multi-Atlas Label Propagation
How can we find a good feature representation \vec{f}_{\tilde{x}} for sparse label propagation?
- Hand-crafted features (e.g., Haar, HOG)
- Data-driven features (e.g., deep learning)

[Figure: the target image voxel \tilde{x} is compared against atlases 1..T with weights w_t(\tilde{x}, \tilde{y})]

Label propagation:

L_{new}(\tilde{x}) = \frac{\sum_{t=1}^{T} \sum_{\tilde{y} \in N_t(\tilde{x})} w_t(\tilde{x}, \tilde{y}) \, L_t(\tilde{y})}{\sum_{t=1}^{T} \sum_{\tilde{y} \in N_t(\tilde{x})} w_t(\tilde{x}, \tilde{y})}

Sparse representation:

E(\vec{\alpha}_{\tilde{x}}) = \frac{1}{2} \left\| \vec{f}_{\tilde{x}} - A \vec{\alpha}_{\tilde{x}} \right\|_2^2 + \lambda \left\| \vec{\alpha}_{\tilde{x}} \right\|_1, \quad \vec{\alpha}_{\tilde{x}} \ge 0

The sparse code \vec{\alpha}_{\tilde{x}} is used as the weight, \vec{f}_{\tilde{x}} is the feature vector of voxel \tilde{x}, and A is a dictionary containing \vec{f}_{\tilde{y}} for all \tilde{y} \in N_t(\tilde{x}), t = 1, ..., T.
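
A minimal sketch of this sparse-coding-based label propagation (illustrative sizes; scikit-learn's non-negative Lasso stands in for the sparse coding step):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Minimal sketch (assumed sizes; Lasso stands in for the non-negative sparse
# coding step): the sparse code of the target voxel's feature vector over the
# dictionary of atlas patch features is used as the propagation weights.
rng = np.random.default_rng(0)
d, M = 100, 400                       # feature dimension, number of atlas candidates
A = rng.standard_normal((d, M))       # dictionary: feature vectors f_y of atlas voxels
labels = rng.integers(0, 2, M)        # L_t(y): prostate / background labels of those voxels
f_x = A[:, rng.choice(M, 5)].mean(axis=1)   # feature vector of the target voxel x

coder = Lasso(alpha=0.05, positive=True, max_iter=5000).fit(A, f_x)
w = coder.coef_                        # sparse, non-negative weights alpha_x
L_new = (w * labels).sum() / (w.sum() + 1e-12)
print("propagated label (probability of prostate):", round(L_new, 3))
```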
Feature Learning via ISA Network
One Layer ISA (Independent Subspace Analysis)

Simple units correspond to image filters, and pooling units group similar image filters together to increase the robustness of the learned features. The goal of ISA is to learn a feature representation that is 1) sparse and 2) diverse. Thus, the objective function is defined as:

\arg\min_{W,V} \sum_{i=1}^{N} \sum_{j=1}^{m} R_j(x_i; W, V), \quad \text{subject to } W W^T = I,

where

R_j(x_i; W, V) = \sqrt{ \sum_{l=1}^{k} V_{jl} \left( \sum_{p=1}^{d} W_{lp} \, x_{i,p} \right)^{2} }.

Here d, k, and m denote the dimension of each input x_i, the number of simple units in the first layer, and the number of pooling units in the second layer, respectively.
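
A minimal NumPy sketch of this one-layer ISA objective (illustrative sizes, a fixed group-pooling matrix V, and plain projected gradient descent with re-orthonormalization; not the implementation used in the paper):

```python
import numpy as np

# Minimal ISA sketch (assumptions: whitened patches, fixed group pooling V,
# projected gradient descent with re-orthonormalization; hypothetical sizes).
rng = np.random.default_rng(0)
N, d = 5000, 64          # number of patches, patch dimension (e.g., 8x8)
k, m, group = 32, 16, 2  # simple units, pooling units, subspace size (k = m*group)

X = rng.standard_normal((N, d))              # stand-in for whitened image patches
V = np.kron(np.eye(m), np.ones((1, group)))  # fixed pooling: each pool unit groups `group` filters

def orthonormalize(W):
    # Symmetric orthonormalization so that W W^T = I
    U, _, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ Vt

W = orthonormalize(rng.standard_normal((k, d)))
eps, lr = 1e-8, 0.01
for it in range(200):
    S = X @ W.T                          # simple-unit responses, shape (N, k)
    P = np.sqrt(S**2 @ V.T + eps)        # pooled responses R_j(x_i), shape (N, m)
    loss = P.sum() / N
    # Gradient of sum_j sqrt(sum_l V_jl (w_l . x)^2) with respect to W
    G = ((1.0 / P) @ V) * S              # (N, k): derivative through the square root
    grad = G.T @ X / N                   # (k, d)
    W = orthonormalize(W - lr * grad)

print("final sparsity objective:", loss)
```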



Feature Learning via ISA Network

Hierarchical ISA network built by stacking one-layer ISA.

Building the hierarchical ISA network:
1) Divide each large patch into small overlapping patches
2) Learn the first-layer ISA on the small patches
3) Take the learned features of the small patches, after PCA whitening, as input to the second ISA layer
4) Learn the second-layer, high-level features



Visualization of Filters
Gabor-like image filters learned by the first-layer ISA.

Feature difference maps: the green-cross voxel is compared with all other image voxels using different features. Blue indicates low difference, and red indicates high difference.

[Figure panels: Original, Haar, HOG, Low-level ISA, Stacked ISA]



Quantitative Comparison
- Comparisons between different multi-atlas based segmentation methods:
  (1) Klein's Method, (2) Coupé's Method, (3) Liao's Method
- Comparisons between different features:
  Haar: Haar features, HOG: Histogram of Oriented Gradients, LBP: Local Binary Patterns,
  Single ISA: single-layer ISA, Stacked ISA: two-layer stacked ISA, SL: Sparse Label Propagation

Method                   Mean Dice ± SD (%)   Mean Hausdorff ± SD (mm)   Mean ASD ± SD (mm)
Klein's Method           83.4 ± 3.1           10.2 ± 2.6                 2.5 ± 1.4
Coupé's Method           81.7 ± 2.7           12.4 ± 2.8                 3.6 ± 1.6
Liao's Method            84.4 ± 1.6           11.5 ± 2.2                 2.3 ± 1.5
Haar + SL                84.2 ± 3.4           9.7 ± 3.2                  2.3 ± 1.7
HOG + SL                 80.5 ± 2.7           12.2 ± 4.5                 3.8 ± 2.2
LBP + SL                 82.6 ± 3.2           10.4 ± 2.2                 3.0 ± 1.9
Haar + HOG + LBP + SL    84.9 ± 3.6           9.8 ± 3.3                  2.5 ± 1.8
Single ISA + SL          84.8 ± 2.5           9.5 ± 2.4                  2.2 ± 1.8
Stacked ISA + SL         86.7 ± 2.2           8.2 ± 2.5                  1.9 ± 1.6



Thank You!
For more details, please visit:
http://bric.unc.edu/ideagroup
Or google: idea unc
