Deep Learning in
Medical Image Analysis
Dinggang Shen
Methods and Limitations
O PCA: linear; not optimal for non-Gaussian data
O Gaussian Mixture Models / k-Means: require knowledge of the number of clusters; challenging when applied to high-dimensional data
O ICA: linear model
O Sparse Coding / Non-Linear Embedding: shallow models (e.g., single-layer representations)
Performance increases with more layers.
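The linearity limitation of PCA can be made concrete. The sketch below (illustrative, not from the slides) implements PCA as a linear projection via the SVD: it recovers data lying near a line perfectly, precisely because that structure is linear; curved (non-Gaussian) structure would not be captured this way.

```python
import numpy as np

# Illustrative sketch: PCA as a linear projection (computed via SVD).
# PCA finds orthogonal directions of maximal variance, so it can only
# model linear structure -- one of the limitations listed above.
def pca_fit_transform(X, n_components):
    """Project X onto its top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy SVD: rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

rng = np.random.default_rng(0)
# Data lying near a 1-D line in 3-D: one component suffices.
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(200, 3))
Z, components, mean = pca_fit_transform(X, 1)
X_rec = Z @ components + mean  # reconstruction from 1 component
print("reconstruction error:", np.abs(X - X_rec).max())
```

The reconstruction error stays at the noise level here only because the data truly lie on a linear subspace.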
Department of Radiology and BRIC, UNC-Chapel Hill
Deep Learning: Why Hot?
n 3 main reasons:
1) New layer-wise training algorithm (Science, 2006)
n Each layer is trained on a simple task
2) Big data, compared to 20 years ago
3) Powerful computers
n Previous algorithms may have been theoretically sound, but in practice they did not converge to good local minima on the less powerful computers of the past.
Restricted Boltzmann Machine
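The RBM is the building block of the layer-wise training algorithm mentioned above. Below is a minimal sketch of a binary RBM trained with one-step contrastive divergence (CD-1); the sizes, learning rate, and toy data are illustrative, not from the slides.

```python
import numpy as np

# Minimal sketch of a binary Restricted Boltzmann Machine trained with
# one-step contrastive divergence (CD-1). All hyperparameters and the
# toy data below are illustrative assumptions.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 6, 3, 0.1
W = 0.01 * rng.normal(size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

# Toy binary data: two repeated patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 20, dtype=float)

for epoch in range(200):
    v0 = data
    # Positive phase: hidden probabilities given the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step (reconstruction).
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # CD-1 gradient approximation.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

recon = sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v)
print("mean reconstruction error:", np.abs(data - recon).mean())
```

In the 2006 recipe, several such RBMs are trained greedily, each on the hidden activities of the previous one, before fine-tuning the whole stack.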
O M. Kim, G. Wu, D. Shen, "Unsupervised Deep Learning for Hippocampus Segmentation in 7.0 Tesla MR Images," MICCAI Workshop on Machine Learning in Medical Imaging (MLMI), 2013.
O M. Kim, G. Wu, W. Li, L. Wang, Y.-D. Son, Z.-H. Cho, D. Shen, "Automatic Hippocampus Segmentation of 7.0 Tesla MR Images by Combining Multiple Atlases and Auto-Context Models," NeuroImage, 83:335-345, 2013.
n Challenges: Hippocampus
n The hippocampus is small (roughly 35 x 15 x 7 mm³)
n The hippocampus is surrounded by complex structures
n Low imaging resolution (1 x 1 x 1 mm³) of 1.5T or 3T MRI scanners
[Figure: (a) image patches X sampled at voxels in the image; (b) Haar filters; (c) basis filters W in the 1st layer, learned by the first ISA layer.]
[Figure: Training stage — aligned training images in each atlas space (1..N) provide image patches; learned 2-layer ISA features train one classifier per ISA sequence. Testing stage — subject image patches pass through each ISA sequence to produce N 2-layer ISA classification maps, which are combined by adaptively weighted fusion into a probability map; a level set then gives the segmentation result in the subject image space.]
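The fusion step of this pipeline combines per-atlas probability maps into one. The sketch below uses a fixed weighted average for illustration; the paper's adaptive weighting scheme is not reproduced here.

```python
import numpy as np

# Sketch of the fusion step: classification maps from N atlas spaces
# are combined by a weighted average into a single probability map.
# The weights below are illustrative, not the paper's adaptive weights.
def weighted_fusion(maps, weights):
    """maps: (N, H, W) per-atlas probability maps; weights: (N,)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()           # normalize to sum to 1
    return np.tensordot(weights, maps, axes=1)  # (H, W) fused map

maps = np.stack([np.full((4, 4), 0.9),   # atlas 1: confident foreground
                 np.full((4, 4), 0.6),
                 np.full((4, 4), 0.3)])
fused = weighted_fusion(maps, [0.5, 0.3, 0.2])
print(fused[0, 0])  # 0.5*0.9 + 0.3*0.6 + 0.2*0.3 ≈ 0.69
```

The fused map would then be thresholded or refined (here, by a level set) to obtain the final binary segmentation.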
[Figure: segmentation performance measured by recall, relative overlap, and similarity index.]
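The three evaluation measures named above can be computed for binary segmentation masks as follows. The slides give no formulas, so these definitions (recall; relative overlap as the Jaccard index; similarity index as the Dice coefficient) are a common reading, stated here as an assumption.

```python
import numpy as np

# Assumed definitions for the three measures (B = ground truth):
#   recall           = |A ∩ B| / |B|
#   relative overlap = |A ∩ B| / |A ∪ B|     (Jaccard index)
#   similarity index = 2|A ∩ B| / (|A|+|B|)  (Dice coefficient)
def overlap_metrics(seg, truth):
    seg, truth = seg.astype(bool), truth.astype(bool)
    inter = np.logical_and(seg, truth).sum()
    union = np.logical_or(seg, truth).sum()
    recall = inter / truth.sum()
    jaccard = inter / union
    dice = 2 * inter / (seg.sum() + truth.sum())
    return recall, jaccard, dice

truth = np.array([1, 1, 1, 1, 0, 0, 0, 0])
seg   = np.array([1, 1, 1, 0, 1, 0, 0, 0])
print(overlap_metrics(seg, truth))  # recall 0.75, Jaccard 0.6, Dice 0.75
```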
[Figure: input T1, T2, and FA images of an infant brain, with the corresponding CSF, GM, and WM segmentation maps.]
Input modalities consist of T1, T2, and fractional anisotropy (FA) images of infant brains
Each pixel is segmented into cerebrospinal fluid (CSF), gray matter, and white matter
Isointense stage (~6-8 months of age) brains are very difficult to segment
# of trainable parameters is ~5.3 million
W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, D. Shen, "Deep Convolutional Neural Networks for Multi-Modality Isointense Infant Brain Image Segmentation," NeuroImage, 2015.
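Parameter counts like the ~5.3 million quoted above are tallied layer by layer. The sketch below shows the bookkeeping on a small hypothetical multi-modality CNN (the architecture is illustrative and is not the network of Zhang et al., so its total differs from 5.3 million).

```python
# Sketch of counting a CNN's trainable parameters:
#   conv layer:            (kh * kw * in_ch + 1) * out_ch
#   fully connected layer: (n_in + 1) * n_out
# The "+1" terms are biases. The architecture below is hypothetical.
def conv_params(kh, kw, in_ch, out_ch):
    return (kh * kw * in_ch + 1) * out_ch

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

# Three input channels: T1, T2, and FA patches.
layers = [
    conv_params(5, 5, 3, 64),     # conv1 over the 3 modalities
    conv_params(5, 5, 64, 128),   # conv2
    fc_params(128 * 5 * 5, 256),  # fc1 on flattened 5x5 feature maps
    fc_params(256, 3),            # softmax over CSF / GM / WM
]
print("total trainable parameters:", sum(layers))
```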
O G. Wu, M. Kim, Q. Wang, B.C. Munsell, D. Shen, Scalable High Performance Image Registration
Framework by Unsupervised Deep Feature Representations Learning, Revised for IEEE TBME, 2015.
O G. Wu, M. Kim, Q. Wang, S. Liao, Y. Gao, D. Shen, "Unsupervised Deep Feature Learning for
Deformable Image Registration of MR Brains," MICCAI 2013.
[Figure: individual model — morphological signatures for image registration.]
Dice ratio (%)
Methods   Ventricle   Gray Matter   White Matter   Hippocampus
Demons      90.2        76.0          85.7           72.2
M+PCA       90.5        76.6          85.5           72.3
M+DP        90.9        76.5          85.8           72.5
HAMMER      91.5        75.5          85.4           75.5
H+PCA       91.7        76.9          86.5           75.6
H+DP        95.0        78.6          88.1           76.8

Methods   Average
Demons      68.9
M+PCA       68.9
M+DP        69.2
HAMMER      70.2
H+PCA       70.6
H+DP        72.7
STG: Superior temporal gyrus; PCG: Precentral gyrus; SFG: Superior frontal gyrus; M&ITG: Middle and inferior temporal gyrus; AOG: Anterior orbital gyrus; POG: Posterior gyrus; MFG: Middle frontal gyrus; IFG: Inferior frontal gyrus; LG: Lingual gyrus
H.-I. Suk, S.-W. Lee, and D. Shen, "Latent Feature Representation with Stacked Auto-Encoder for AD/MCI Diagnosis," Brain Structure and Function, 2014.
H.-I. Suk and D. Shen, "Deep Learning-based Feature Representation for AD/MCI Classification," MICCAI 2013.
H.-I. Suk, S.-W. Lee, D. Shen, "Hierarchical Feature Representation and Multimodal Fusion with Deep Learning for AD/MCI Diagnosis," NeuroImage, 101:569-582, 2014.
n Treatments
n Small symptomatic benefits for mild-to-moderate AD
n Cannot delay or halt the progression of AD
[Figure: a simple auto-encoder (input, encoding) and a stacked auto-encoder.]
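The stacking idea can be sketched compactly: each layer is an auto-encoder trained to reconstruct its input, and its codes become the next layer's input (greedy layer-wise pre-training). For brevity the sketch uses *linear* auto-encoders, whose optimal tied weights are given in closed form by the SVD; a real stacked auto-encoder uses nonlinear layers trained by backpropagation. All sizes are illustrative.

```python
import numpy as np

# Greedy layer-wise sketch of a stacked auto-encoder. Each layer here
# is a linear auto-encoder solved in closed form (its PCA subspace)
# instead of being trained by backpropagation -- a simplification.
def fit_linear_autoencoder(X, n_hidden):
    """Return encoder weights W minimizing ||X W W^T - X||_F^2."""
    _, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
    return Vt[:n_hidden].T                 # shape (n_in, n_hidden)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))  # rank-3 data in 8-D

# Layer 1: 8 -> 4 codes; layer 2: 4 -> 2 codes, trained on layer-1 codes.
W1 = fit_linear_autoencoder(X, 4)
H1 = (X - X.mean(0)) @ W1
W2 = fit_linear_autoencoder(H1, 2)
H2 = (H1 - H1.mean(0)) @ W2
print("stacked code shape:", H2.shape)  # (200, 2)
```

After such pre-training, the whole stack is fine-tuned end-to-end, as in the framework below.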
Proposed Framework
[Figure: MRI, PET, and CSF features are extracted in a template space; a stacked auto-encoder (pre-training followed by fine-tuning) learns a latent feature representation; feature selection and multi-task learning then predict the clinical label and scores (MMSE, ADAS-Cog) for AD/MCI diagnosis.]
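The multi-task element of this framework means one set of learned features predicts several targets at once (diagnostic label, MMSE, ADAS-Cog). As a toy stand-in, the sketch below fits a single ridge-regression weight matrix with one output column per task; the actual framework uses sparse multi-task feature selection, which is not reproduced here.

```python
import numpy as np

# Toy multi-task regression: one weight matrix, several target columns
# (e.g., label, MMSE, ADAS-Cog), solved jointly with a ridge penalty.
# This stands in for -- but is not -- the paper's sparse multi-task model.
def multi_task_ridge(X, Y, lam=1.0):
    """Solve min_W ||X W - Y||^2 + lam ||W||^2 for all tasks jointly."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # latent features per subject
W_true = rng.normal(size=(10, 3))       # 3 tasks share the same features
Y = X @ W_true + 0.01 * rng.normal(size=(100, 3))
W = multi_task_ridge(X, Y, lam=0.1)
print("max weight error:", np.abs(W - W_true).max())
```

Coupling the tasks through shared features (and, in the paper, a shared sparsity pattern) is what lets the clinical scores act as extra supervision for the diagnosis task.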
[Figure: bar charts of accuracy, sensitivity, and specificity for AD vs. MCI and MCI-converter (MCIC) vs. MCI-non-converter (MCINC) classification.]
H.-I. Suk, S.-W. Lee, D. Shen, Hierarchical Feature Representation and Multimodal Fusion with Deep
Learning for AD/MCI Diagnosis, NeuroImage, 101:569-582, 2014.
[Figure: spatially distributed patches {v_k^m} (m = MRI, PET; k = 1..K) are extracted from the preprocessed MRI and PET volumes and assembled into mega-patches; a multi-modal DBM learns features {f_k}, which feed weighted-ensemble SVM classifier learning.]
Each volume contains 64 x 64 x 64 voxels; ~50,000 patches were extracted from each volume, leading to a total of 19.9 million training patches. The network contains 37,761 parameters.
R. Li, W. Zhang, H. Suk, L. Wang, J. Li, D. Shen, and S. Ji, "Deep Learning Based Imaging Data Completion for Improved Brain Disease Diagnosis," MICCAI, 2014.
S. Liao, Y. Gao, and D. Shen, "Representation Learning: A Unified Deep Learning Framework for
Automatic Prostate MR Segmentation," MICCAI 2013.
n Inhomogeneity
n Large inter-subject shape variability
Simple units correspond to image filters, and pool units group similar image filters together
to increase the robustness of learned features. The goal of ISA is to learn a feature
representation that is: 1) sparse, and 2) diverse. Thus, the objective function is defined as:
\arg\min_{W,V} \sum_{i=1}^{N} \sum_{j=1}^{m} R_j(x_i; W, V) \quad \text{s.t. } W W^T = I,
\qquad \text{where } R_j(x_i; W, V) = \sqrt{\sum_{l=1}^{k} V_{jl} \Big( \sum_{p=1}^{d} W_{lp}\, x_{pi} \Big)^2}
d, k, and m denote the dimension of each input x_i, the number of simple units in the first layer, and the number of pooling units in the second layer, respectively.
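The two-layer ISA response defined above is straightforward to compute: simple units take linear filter responses W x, and each pooling unit takes the square root of a V-weighted sum of the squared simple-unit responses. The sketch below (with illustrative dimensions) evaluates R_j for a batch of patches, using a QR factorization to satisfy the W Wᵀ = I constraint.

```python
import numpy as np

# Sketch of the two-layer ISA response defined above:
#   R_j(x_i) = sqrt( sum_l V_jl * ( (W x_i)_l )^2 )
# Dimensions (d, k, m, N) below are illustrative.
def isa_responses(X, W, V):
    """X: (N, d) patches; W: (k, d) filters; V: (m, k) pooling weights.
    Returns the (N, m) matrix of pooled responses R_j(x_i)."""
    S = (X @ W.T) ** 2        # squared simple-unit responses, (N, k)
    return np.sqrt(S @ V.T)   # pooled (second-layer) responses, (N, m)

rng = np.random.default_rng(0)
d, k, m, N = 16, 8, 4, 5
# Orthonormal rows for W (the W W^T = I constraint), via QR.
W = np.linalg.qr(rng.normal(size=(d, k)))[0].T
V = (rng.random(size=(m, k)) > 0.5).astype(float)  # binary pooling groups
X = rng.normal(size=(N, d))
R = isa_responses(X, W, V)
print("pooled response shape:", R.shape)  # (5, 4)
```

The square-root pooling over groups of squared filter responses is what makes pooled units invariant to sign and phase changes within a group, yielding the robustness mentioned above.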
Feature difference maps: comparing the green-cross voxel with all other image voxels using different features. Blue indicates a low difference, and red indicates a high difference.