Toolkits of PCA
Agenda
Consider a 2-dimension space
Principal component analysis (PCA) involves a mathematical procedure that transforms a number of possibly correlated variables into a smaller number of uncorrelated variables called “principal components”.
For example :
Consider a D-dimension space
◦ Given N points : {x1, x2, …, xN}
◦ Each xi is a D-dim vector
How to
◦ 1. Find a point that minimizes the squared error
◦ 2. Find a line that minimizes the squared error
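How to ? - Point
Minimize $J_0(x_0) = \sum_{k=1}^{N} \|x_0 - x_k\|^2$
∴ $x_0 = m = \frac{1}{N} \sum_{k=1}^{N} x_k$ (the sample mean)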
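One way to see why the sample mean is the minimizer is the standard argument sketched below. Here $J_0$ denotes the sum of squared distances from a candidate point $x_0$ to the N points, and m is the sample mean; both symbols are notation chosen for this sketch.

```latex
\begin{align*}
J_0(x_0) &= \sum_{k=1}^{N} \| x_0 - x_k \|^2
          = \sum_{k=1}^{N} \| (x_0 - m) - (x_k - m) \|^2 \\
         &= N \| x_0 - m \|^2
            - 2 (x_0 - m)^t \sum_{k=1}^{N} (x_k - m)
            + \sum_{k=1}^{N} \| x_k - m \|^2 \\
         &= N \| x_0 - m \|^2 + \sum_{k=1}^{N} \| x_k - m \|^2
\end{align*}
```

The cross term vanishes because $\sum_k (x_k - m) = 0$, and the last sum does not depend on $x_0$, so $J_0$ is smallest when $x_0 = m$.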
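◦ 1. Find a point that minimizes the squared error
◦ 2. Find a line that minimizes the squared error
L : $x_k' - x_0 = a_k e$
$x_k' = x_0 + a_k e = m + a_k e$
where e is a unit vector along the line and $a_k$ is the signed distance of $x_k'$ from m.
Find $a_1, \dots, a_N$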
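How to ? – Line
$J_1(a_1, \dots, a_N, e) = \sum_{k=1}^{N} \|(m + a_k e) - x_k\|^2 = \sum_{k} a_k^2 \|e\|^2 - 2 \sum_{k} a_k e^t (x_k - m) + \sum_{k} \|x_k - m\|^2$
Differentiating each term with respect to $a_k$ (with $\|e\| = 1$) gives $2a_k - 2e^t(x_k - m)$; setting this to zero yields $a_k = e^t(x_k - m)$.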
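Then, how about e ?
Substituting $a_k = e^t(x_k - m)$ into $J_1$:
$J_1(e) = -\sum_{k=1}^{N} [e^t(x_k - m)]^2 + \sum_{k=1}^{N} \|x_k - m\|^2$
The second sum is independent of e.
Let $S = \sum_{k=1}^{N} (x_k - m)(x_k - m)^t$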
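Minimizing $J_1(e)$ therefore means minimizing $J_1'(e) = -e^t S e$, i.e. maximizing $e^t S e$.
Use a Lagrange multiplier : maximizing f(x,y) subject to g(x,y) = 0 becomes an unconstrained problem in $f - \lambda g$. Here f is $e^t S e$ and the constraint is $e^t e = 1$.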
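A sketch of the Lagrange-multiplier step, assuming S is symmetric and e is constrained to unit length. The symbol u for the Lagrangian is chosen here for illustration.

```latex
\begin{align*}
u &= e^t S e - \lambda (e^t e - 1) \\
\frac{\partial u}{\partial e} &= 2 S e - 2 \lambda e = 0
   \quad\Longrightarrow\quad S e = \lambda e \\
e^t S e &= \lambda \, e^t e = \lambda
\end{align*}
```

So e must be an eigenvector of S, and $e^t S e$ is largest for the eigenvector with the largest eigenvalue.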
◦ What is S ?
The covariance matrix (up to a constant factor, which does not change the eigenvectors)
◦ For D-dim data, S is a D × D matrix
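The optimal e satisfies $Se = \lambda e$, and we know S.
Then, what is e ? An eigenvector of S. Since $e^t S e = \lambda$ for a unit eigenvector, pick the one with the largest eigenvalue.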
How to ? – conclusion
Summary :
◦ Find a line : $x_k' = m + a_k e$
◦ $a_k = e^t(x_k - m)$
◦ $Se = \lambda e$ ; e is an eigenvector of the covariance matrix.
◦ In a D-dim space we can find D eigenvectors.
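The summary can be sketched in code for the 2-dim case, where the leading eigenvector of a symmetric 2 × 2 matrix has a closed form. This is a minimal illustration, not a general implementation: the function name `fit_line` and the sample points are hypothetical, and S is taken here as the unnormalized scatter matrix.

```python
import math

def fit_line(points):
    """Fit the line x = m + a*e that minimizes squared error for 2-D points.

    Returns (m, e, a): the mean, the unit direction, and one coefficient per point.
    """
    n = len(points)
    # m is the sample mean
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the symmetric 2x2 scatter matrix S = sum (x_k - m)(x_k - m)^t
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Angle of the eigenvector with the largest eigenvalue (closed form in 2-D)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    e = (math.cos(theta), math.sin(theta))
    # a_k = e^t (x_k - m)
    a = [e[0] * (p[0] - mx) + e[1] * (p[1] - my) for p in points]
    return (mx, my), e, a

# Hypothetical sample: three points lying exactly on the line y = x
m, e, a = fit_line([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
print(m, e, a)
```

Each projected point can then be recovered as m + a_k e; for points that lie exactly on a line, this reproduces the original points.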
Theory :
◦ 1. Scenario
◦ 2. What is PCA?
◦ 3. How to minimize Squared-Error ?
◦ 4. Dimensionality Reduction
Toolkit :
◦ A list of PCA toolkits
◦ Demo
Agenda
Dimensionality Reduction
Consider a 2-dim space …
Original coordinates :
X1 = (a,b)
X2 = (c,d)
Coordinates along the principal axes :
X1 = (a’,b’)
X2 = (c’,d’)
We are going to do …
Keep only the first coordinate :
X1 = (a’)
X2 = (c’)
We want to prove :
◦ The axes of the projected data are uncorrelated.
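$E = [e_1\ e_2\ \dots\ e_m]$ (the eigenvectors of S as columns)
$SE = ED$, where $D = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$ is the diagonal matrix of eigenvalues (not the dimension D used earlier)
$S = E D E^{-1}$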
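We want to know the new covariance matrix of the projected vectors.
$Y = E^T X$ (X holds the mean-centered data vectors as columns)
$S_Y = E^T S E = E^T (E D E^{-1}) E$
Because S is symmetric, its eigenvectors can be chosen orthonormal, so $E^T E = I$ and
$S_Y = D$
$S_Y$ is diagonal, so the projected coordinates are uncorrelated.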
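The claim that the covariance matrix of the projected data is diagonal can be checked numerically. The sketch below does this for 2-dim data only, using the closed-form eigenvector angle of a symmetric 2 × 2 matrix; the function names and the sample data are hypothetical.

```python
import math

def covariance_2d(pts):
    """Return the mean and the covariance entries (sxx, sxy, syy), normalized by N."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / n
    syy = sum((p[1] - my) ** 2 for p in pts) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
    return (mx, my), (sxx, sxy, syy)

def project_onto_eigenvectors(pts):
    """Compute Y = E^T (X - mean) for 2-D points, one (y1, y2) pair per point."""
    (mx, my), (sxx, sxy, syy) = covariance_2d(pts)
    # Angle of the leading eigenvector of the symmetric 2x2 covariance matrix
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # The columns of E are e1 = (c, s) and e2 = (-s, c)
    return [(c * (x - mx) + s * (y - my), -s * (x - mx) + c * (y - my))
            for x, y in pts]

# Hypothetical correlated sample data
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0)]
Y = project_onto_eigenvectors(data)
_, (s11, s12, s22) = covariance_2d(Y)
print(s11, s12, s22)  # s12 should be zero up to rounding error
```

The diagonal entries s11 and s22 are the two eigenvalues of the original covariance matrix, and their sum equals the total variance of the original data.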
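Conclusion :
If we want to reduce the dimension from D to M (M << D) :
1. Find the covariance matrix S
2. Compute the eigenvalues and eigenvectors of S
3. Select the top M eigenvectors (those with the largest eigenvalues)
4. Project the data onto them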
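The four steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the method used by any of the toolkits: eigenvectors are found by power iteration with deflation (a technique swapped in here to avoid external libraries), which can converge slowly when eigenvalues are close and can fail if the start vector happens to be orthogonal to the leading eigenvector. All function names and the sample data are hypothetical.

```python
import math

def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def covariance_matrix(X):
    """Step 1: return the D x D covariance matrix S (normalized by N) and the mean m."""
    n, d = len(X), len(X[0])
    m = [sum(x[j] for x in X) / n for j in range(d)]
    S = [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in X) / n
          for j in range(d)] for i in range(d)]
    return S, m

def top_eigenvectors(S, M, iters=1000):
    """Steps 2-3: top-M (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    d = len(S)
    S = [row[:] for row in S]  # work on a copy
    pairs = []
    for _ in range(M):
        # Arbitrary non-uniform start vector, scaled to unit length
        v = [1.0 / (i + 1) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(S, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # the remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(S, v)))
        pairs.append((lam, v))
        # Deflation: subtract the found component so the next pass finds the next one
        for i in range(d):
            for j in range(d):
                S[i][j] -= lam * v[i] * v[j]
    return pairs

def pca_reduce(X, M):
    """Step 4: project each mean-centered row of X onto the top-M eigenvectors."""
    S, m = covariance_matrix(X)
    pairs = top_eigenvectors(S, M)
    return [[sum(e[j] * (x[j] - m[j]) for j in range(len(m))) for _, e in pairs]
            for x in X]

# Hypothetical 3-dim data lying along the direction (1, 2, 0), reduced to 1 dimension
X = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
print(pca_reduce(X, 1))
```

In practice a library eigensolver would normally replace `top_eigenvectors`; the power iteration is used here only to keep the sketch self-contained.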
Toolkits
C & Java
◦ Fionn Murtagh's Multivariate Data Analysis Software and Resources
◦ http://astro.u-strasbg.fr/~fmurtagh/mda-sw/
Perl
◦ PDL::PCA
Matlab
◦ Statistics Toolbox™ : princomp
Weka
◦ weka.attributeSelection.PrincipalComponents
(http://www.laps.ufpa.br/aldebaro/weka/feature_selection.html)
C:
Download: pca.c
Compile: cc pca.c -lm -o pcac
Run: ./pcac spectr.dat 36 8 R > pcaout.c.txt
Java :
Download: JAMA, PCAcorr.java
Compile: javac -classpath Jama-1.0.2.jar PCAcorr.java
Run: java PCAcorr iris.dat > pcaout.java.txt