Chapter 12
Outline
• Flexible Discriminant Analysis (FDA)
• Penalized Discriminant Analysis
• Mixture Discriminant Analysis (MDA)
Linear Discriminant Analysis
• According to the Bayes-optimal classification rule of Chapter 2, we need the posterior probabilities $\Pr(G \mid X)$.
• Assume:
  – $f_k(x)$: the class-conditional density of $X$ in class $G = k$
  – $\pi_k$: the prior probability of class $k$, with $\sum_{k=1}^{K} \pi_k = 1$
• Bayes' theorem then gives
  $\Pr(G = k \mid X = x) = \dfrac{f_k(x)\, \pi_k}{\sum_{l=1}^{K} f_l(x)\, \pi_l}$
2018/10/25 Linear Classifiers
Linear Discriminant Analysis
• Multivariate Gaussian density:
  $f_k(x) = \dfrac{1}{(2\pi)^{p/2}\, |\Sigma_k|^{1/2}}\; e^{-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)}$
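The density and the Bayes rule above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the slides; the function names `gaussian_density` and `posterior` are chosen here for clarity.

```python
import numpy as np

def gaussian_density(x, mu, Sigma):
    """Multivariate Gaussian density f_k(x) with mean mu and covariance Sigma."""
    p = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def posterior(x, mus, Sigmas, priors):
    """Bayes posterior Pr(G = k | X = x) from class densities and priors."""
    f = np.array([gaussian_density(x, m, S) for m, S in zip(mus, Sigmas)])
    joint = f * priors
    return joint / joint.sum()
```

When all classes share one covariance $\Sigma_k = \Sigma$, the log-posterior differences become linear in $x$, which is exactly what makes LDA boundaries linear.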
Virtues and Failings of LDA
• LDA may fail in a number of situations:
  – Often linear boundaries fail to separate the classes
  – With large $N$, we may be able to estimate a quadratic decision boundary instead
  – We may want to model even more irregular (non-linear) boundaries
• A single prototype per class may be insufficient
Generalization of LDA
• Flexible Discriminant Analysis (FDA)
– LDA in an enlarged space of predictors via basis expansions
Flexible Discriminant Analysis
• Suppose $\theta : \mathcal{G} \rightarrow \mathbb{R}^1$ is a function that assigns scores to the classes, such that the transformed class labels are optimally predicted by linear regression on $X$.
• Training data: $(x_i, g_i),\ i = 1, 2, \ldots, N$
• Objective function:
  $\min_{\beta,\,\theta} \sum_{i=1}^{N} \left( \theta(g_i) - x_i^T \beta \right)^2$
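For a fixed scoring $\theta$, the inner minimization over $\beta$ is ordinary least squares. A minimal NumPy sketch (the function name is ours, not from the slides; an intercept column is added, a common convention):

```python
import numpy as np

def scored_regression_sse(X, g, theta):
    """Regress the transformed labels theta(g_i) on x_i (with intercept)
    and return the minimized sum of squares
    min_beta sum_i (theta(g_i) - x_i^T beta)^2."""
    y = theta[g]                                  # transformed class labels
    Xb = np.column_stack([np.ones(len(X)), X])    # add intercept column
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ beta
    return resid @ resid
```

FDA then also optimizes over the scores $\theta$ themselves, subject to a normalization that rules out the trivial constant scoring.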
Flexible Discriminant Analysis
• More generally, we define $L$ independent scorings for the class labels, $\theta_1, \theta_2, \ldots, \theta_L$, and corresponding linear maps $\eta_l(X) = X^T \beta_l,\ l = 1, 2, \ldots, L$.
• Objective function (average squared residual):
  $\mathrm{ASR} = \dfrac{1}{N} \sum_{l=1}^{L} \sum_{i=1}^{N} \left( \theta_l(g_i) - x_i^T \beta_l \right)^2$
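The ASR criterion is straightforward to evaluate for given scorings and coefficients. A sketch, assuming `Theta` is a $K \times L$ array holding $\theta_l(k)$ in column $l$ and `B` is $p \times L$ (names chosen here):

```python
import numpy as np

def asr(X, g, Theta, B):
    """Average squared residual over L score/coefficient pairs:
    (1/N) * sum_l sum_i (theta_l(g_i) - x_i^T beta_l)^2."""
    N = len(X)
    total = 0.0
    for l in range(Theta.shape[1]):
        resid = Theta[g, l] - X @ B[:, l]   # residuals for the l-th scoring
        total += resid @ resid
    return total / N
```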
Flexible Discriminant Analysis
Flexible Discriminant Analysis
• We can replace the linear regression fits $\eta_l(x) = x^T \beta_l$ by more flexible, nonparametric fits.
• (Figure: Mahalanobis distance of a test point $x$ to the $k$-th class centroid.)
Computation of FDA
1. Multivariate nonparametric regression
2. Optimal scores
3. Update the model from step 1 using the optimal
scores
Computing the FDA Estimates
• For the classes $g_i$ we define an $N \times K$ indicator response matrix $Y$, such that $y_{ik} = 1$ if $g_i = k$, and $y_{ik} = 0$ otherwise.
• Procedure:
  1. Multivariate nonparametric regression. Fit a multiresponse, adaptive nonparametric regression of $Y$ on $X$, giving fitted values $\hat{Y}$. Let $S$ be the linear operator that fits the final chosen model, and $\eta(x)$ be the vector of fitted regression functions.
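Steps 1 and 2 can be sketched in NumPy, with one loud simplification: a plain least-squares fit stands in for the adaptive nonparametric regression, so the operator $S$ here is just the hat matrix. Function names are ours.

```python
import numpy as np

def indicator_matrix(g, K):
    """N x K indicator response matrix: Y[i, k] = 1 iff g[i] == k."""
    Y = np.zeros((len(g), K))
    Y[np.arange(len(g)), g] = 1.0
    return Y

def fit_multiresponse(X, Y):
    """Step 1 (simplified): multiresponse least squares, Yhat = S Y."""
    Xb = np.column_stack([np.ones(len(X)), X])
    B, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return Xb @ B

def optimal_scores(Y, Yhat):
    """Step 2 (sketch): eigen-decompose Y^T Yhat / N, normalized by the
    class proportions D_pi, to obtain score vectors theta."""
    N = len(Y)
    Dpi = Y.T @ Y / N                              # diag of class proportions
    M = np.linalg.inv(Dpi) @ (Y.T @ Yhat / N)
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)                 # largest eigenvalue first
    return vals.real[order], vecs.real[:, order]
```

The leading eigenvalue is always 1, belonging to the trivial constant scoring; the useful discriminant scores are the subsequent eigenvectors.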
LDA vs. FDA/BRUTO
Penalized Discriminant Analysis
PDA is a regularized discriminant analysis on an enlarged set of predictors obtained via a basis expansion $h(x)$.
• Penalized objective:
  $\mathrm{ASR}\left( \{\theta_l, \beta_l\}_{l=1}^{L} \right) = \dfrac{1}{N} \sum_{l=1}^{L} \left[ \sum_{i=1}^{N} \left( \theta_l(g_i) - h^T(x_i)\, \beta_l \right)^2 + \lambda\, \beta_l^T \Omega\, \beta_l \right]$
• Penalized Mahalanobis distance:
  $D(x, \mu) = \left( h(x) - h(\mu) \right)^T (\Sigma_W + \lambda \Omega)^{-1} \left( h(x) - h(\mu) \right)$
Penalized Discriminant Analysis
• The penalized discriminant directions solve
  $\max_{\beta}\ \beta^T \Sigma_{\mathrm{Bet}}\, \beta \quad \text{subject to}\quad \beta^T (\Sigma_W + \lambda \Omega)\, \beta = 1$
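This constrained maximization is a generalized eigenproblem; one standard way to solve it reduces it to a symmetric eigenproblem via a Cholesky factor of the penalized within-class covariance. A sketch (function name ours):

```python
import numpy as np

def pda_directions(Sigma_bet, Sigma_w, Omega, lam):
    """Maximize b^T Sigma_bet b subject to b^T (Sigma_w + lam*Omega) b = 1.
    Substituting b = L^{-T} u with P = L L^T turns the constraint into
    u^T u = 1, so the optimal u are eigenvectors of L^{-1} Sigma_bet L^{-T}."""
    P = Sigma_w + lam * Omega
    L = np.linalg.cholesky(P)
    Linv = np.linalg.inv(L)
    M = Linv @ Sigma_bet @ Linv.T          # symmetric
    vals, vecs = np.linalg.eigh(M)         # eigenvalues ascending
    B = Linv.T @ vecs[:, ::-1]             # back-transform, best first
    return vals[::-1], B
```

Each column of `B` satisfies the unit penalized-variance constraint, and the first column attains the maximum between-class variance.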
USPS Digit Recognition
Digit Recognition: LDA vs. PDA
PDA Canonical Variates
Mixture Discriminant Analysis
• The class-conditional densities are modeled as mixtures of Gaussians
  – Possibly a different number of components in each class
  – Estimate the centroids and mixing proportions of each subclass by maximizing the joint likelihood $P(G, X)$
  – EM algorithm for the MLE
Mixture Discriminant Analysis
• A Gaussian mixture model for the $k$-th class:
  $P(X \mid G = k) = \sum_{j=1}^{R_k} \pi_{kj}\, \phi(X; \mu_{kj}, \Sigma)$
• The posterior (where $\Pi_k$ is the class prior and the mixing proportions satisfy $\sum_j \pi_{kj} = 1$):
  $P(G = k \mid X = x) = \dfrac{\Pi_k \sum_{j=1}^{R_k} \pi_{kj}\, \phi(x; \mu_{kj}, \Sigma)}{\sum_{l=1}^{K} \Pi_l \sum_{j=1}^{R_l} \pi_{lj}\, \phi(x; \mu_{lj}, \Sigma)}$
Mixture Discriminant Analysis
• Maximum likelihood: estimate the parameters by maximizing the log-likelihood
  $\sum_{k=1}^{K} \sum_{g_i = k} \log \left[ \Pi_k \sum_{j=1}^{R_k} \pi_{kj}\, \phi(x_i; \mu_{kj}, \Sigma) \right]$
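The EM iteration for one class's mixture can be sketched as follows. One loud simplification: here $\Sigma$ is pooled only over the subclasses of the one class being fitted, whereas full MDA shares $\Sigma$ across all classes and subclasses; all names are ours.

```python
import numpy as np

def gauss(X, mu, Sigma):
    """Gaussian density phi(x; mu, Sigma) evaluated at each row of X."""
    p = X.shape[1]
    diff = X - mu
    quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))

def em_mixture(X, R, n_iter=50, seed=0):
    """EM for a Gaussian mixture with R subclasses and a pooled covariance."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    mus = X[rng.choice(N, size=R, replace=False)].astype(float)
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / N + 1e-6 * np.eye(p)
    pis = np.full(R, 1.0 / R)
    for _ in range(n_iter):
        # E-step: responsibility of each subclass for each point
        dens = np.stack([pis[j] * gauss(X, mus[j], Sigma) for j in range(R)],
                        axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, centroids, pooled covariance
        Nj = resp.sum(axis=0)
        pis = Nj / N
        mus = (resp.T @ X) / Nj[:, None]
        Sigma = sum((resp[:, j, None] * (X - mus[j])).T @ (X - mus[j])
                    for j in range(R)) / N + 1e-6 * np.eye(p)
    return pis, mus, Sigma
```

With $R = 1$ this collapses to the ordinary Gaussian MLE (sample mean and ML covariance), which is a useful sanity check.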
FDA and MDA
Waveform Signal with Additive Gaussian Noise
• $h_1(j) = \max(6 - |j - 11|,\ 0)$
• $h_2(j) = h_1(j - 4)$
• $h_3(j) = h_1(j + 4)$
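These three shifted triangles generate the classic waveform benchmark: each observation is a random convex combination of two of them plus standard Gaussian noise on features $j = 1, \ldots, 21$. The class pairings below (class 1 mixes $h_1, h_2$; class 2 mixes $h_1, h_3$; class 3 mixes $h_2, h_3$) follow the usual construction of this benchmark, not something stated on the slide:

```python
import numpy as np

def h1(j):
    return np.maximum(6 - np.abs(j - 11), 0)

def h2(j):
    return h1(j - 4)

def h3(j):
    return h1(j + 4)

def waveform_sample(cls, rng):
    """One 21-dimensional waveform observation for class cls in {1, 2, 3}:
    u*h_a(j) + (1-u)*h_b(j) + N(0,1) noise, with u ~ Uniform(0, 1)."""
    j = np.arange(1, 22)
    u = rng.uniform()
    pair = {1: (h1, h2), 2: (h1, h3), 3: (h2, h3)}[cls]
    return u * pair[0](j) + (1 - u) * pair[1](j) + rng.normal(size=21)
```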
Waveform Data Results
The End