
S. Sakthivel et al. / International Journal of Engineering Science and Technology Vol. 2(6), 2010, 2288-2295

ENHANCING FACE RECOGNITION USING IMPROVED DIMENSIONALITY REDUCTION AND FEATURE EXTRACTION ALGORITHMS: AN EVALUATION WITH THE ORL DATABASE
S.SAKTHIVEL
Assistant Professor, Information Technology, Sona College of Technology, Salem, Tamilnadu - 636005, India. sakthits@rediffmail.com

Dr. R. LAKSHMIPATHI
Professor, Electrical and Electronic Engineering, St. Peters Engineering College, Chennai, Tamilnadu - 600054, India. drrlakshmipathi@yahoo.com

ABSTRACT
Face recognition based on facial attributes is an easy task for a human to perform; it is nearly automatic and requires little mental effort. A human can recognize faces even when the matching image is distorted, such as a person wearing glasses, and can perform the task fairly easily. A computer, on the other hand, has no innate ability to recognize a face or a facial feature, and must be programmed with an algorithm to do so. In this work, different dimensionality reduction techniques, namely principal component analysis (PCA), kernel principal component analysis (kernel PCA), linear discriminant analysis (LDA), Locality Preserving Projections (LPP), and Neighborhood Preserving Embedding (NPE), are selected and applied in order to reduce the loss of classification performance due to changes in facial appearance. The experiments are designed specifically to investigate the gain in robustness against illumination and facial expression changes. The underlying idea in the use of these dimensionality reduction techniques is, first, to obtain significant feature vectors of the face, and then to search for those components that are less sensitive to intrinsic deformations due to expression or to extrinsic factors such as illumination. A Support Vector Machine (SVM) is selected as the classifying function for training and testing. One distinctive advantage of this type of classifier over traditional neural networks is that SVMs can achieve better generalization performance. The proposed algorithms are tested on face images from the ORL database that differ in expression or in illumination. Significant comparative results are reported.
Keywords: Dimensionality Reduction, PCA, Kernel PCA, LDA, LPP, NPE, Support Vector Machine.

1. INTRODUCTION
Face recognition based on facial attributes is an easy task for a human to perform; it is nearly automatic and requires little mental effort. Humans can recognize faces even when the matching image is distorted, such as a person wearing glasses, and can perform the task fairly easily. A computer, on the other hand, has no innate ability to recognize a face or facial features, and must be programmed with an algorithm to do so [10][2][15]. Understanding how humans decipher and match faces is an important research topic for medical and neural scientists. In this paper, we compare some of the dimensionality reduction and feature extraction algorithms and assess their suitability for better face recognition systems. Face recognition is an important part of today's emerging biometrics and video surveillance markets. It can benefit areas such as law enforcement, airport security, access control, driver's licenses, passports, homeland defense, customs and immigration, and scene analysis. Face recognition has been a research area for almost 30 years, with significantly increased research activity since 1990 [16]. This has resulted in the development of successful algorithms and the introduction of commercial products. However, research and achievements in face recognition are still at an early stage of development. Although face recognition is still in the research and development phase, several commercial systems are currently available and research

ISSN: 0975-5462

2288

organizations are working on the development of more accurate and reliable systems. With present technology, it is impossible to completely model the human recognition system and reach its performance and accuracy. However, the human brain also has its shortcomings in some respects. The benefit of a computer system is its capacity to handle large amounts of data and its ability to do a job in a predefined, repeatable manner. The observations and findings about the human face recognition system are a good starting point for automatic face attribute analysis. Recently there has been considerable interest in geometrically motivated approaches to data analysis in high-dimensional spaces. We consider the case where data is drawn by sampling a probability distribution that has support on or near a submanifold of Euclidean space. Suppose we have a collection of data points given by n-dimensional real vectors drawn from an unknown probability distribution. In increasingly many cases of interest in machine learning and data mining, one is confronted with situations where the dimensionality is very large. However, there might be reason to suspect that the "intrinsic dimensionality" of the data is much lower. This leads one to consider methods of dimensionality reduction that allow one to represent the data in a lower-dimensional space [1]. A great number of dimensionality reduction techniques exist in the literature. In practical situations, one is often forced to use linear or even sublinear techniques. Consequently, projective maps have been the subject of considerable investigation. Three classical yet popular linear techniques are principal component analysis (PCA) [3][13], multidimensional scaling (MDS) [24][25], and linear discriminant analysis (LDA). Each of these is an eigenvector method designed to model linear variability in high-dimensional data.
1.1. Early works on Face Recognition
Sir Francis Galton, an English scientist, described how French prisoners were identified using four primary measures: head length, head breadth, foot length, and the length of the middle digit of the foot and hand [27]. Each measure could take one of three possible values (large, medium, or small), giving a total of 81 possible primary classes. He felt it would be advantageous to have an automatic method of classification, and for this purpose he devised an apparatus, which he called a mechanical selector, that could be used to compare measurements of face profiles. Galton reported that most of the measures he had tried were fairly efficient. Early face recognition methods were mostly feature based. Galton's proposed method, and much of the work that followed, focused on detecting important facial features such as eye corners, mouth corners, and the nose tip. By measuring the relative distances between features, a feature vector can be constructed to describe each face. By comparing the feature vector of an unknown face to the feature vectors of known faces from a database, the closest match can be determined. One of the earliest works is reported by Bledsoe [26]. In this system, a human operator located the feature points on the face and entered their positions into the computer. Given a set of feature point distances of an unknown person, nearest-neighbor or other classification rules were used for identifying the test face. Since feature extraction was done manually, this system could accommodate wide variations in head rotation, tilt, image quality, and contrast. In Kanade's work, a series of fiducial points is detected using relatively simple image processing tools such as edge maps and signatures, and the Euclidean distances between them are then used as a feature vector to perform recognition [7]. The face feature points are located in two stages.
The coarse-grain stage simplifies the succeeding differential operation and feature finding algorithms. Once the eyes, nose, and mouth are approximately located, more accurate information is extracted by confining the processing to four smaller groups, scanning at higher resolution, and using the 'best beam intensity' for each region. The four regions are the left eye, right eye, nose, and mouth. The beam intensity is based on the local area histogram obtained in the coarse-grain stage. A set of 16 facial parameters (distances, areas, and angles, normalized to compensate for the varying size of the pictures) is extracted. To eliminate scale and dimension differences, the components of the resulting vector are normalized. A simple distance measure is used to check the similarity between two face images. The eigenfaces method presented by Turk and Pentland finds the principal components (Karhunen-Loeve expansion) of the face image distribution, or the eigenvectors of the covariance matrix of the set of face images [16]. These eigenvectors can be thought of as a set of features which together characterize the variation between face images. Pentland et al. discussed the use of facial features for face recognition. This can be either a modular or a layered representation of the face, where a coarse (low-resolution) description of the whole head is augmented by additional (high-resolution) details in terms of salient facial features. The eigenface technique was extended to detect facial features. For each of the facial features, a feature space is built by selecting the most significant eigenfeatures, obtained from the eigenvectors corresponding to the largest eigenvalues of the features' correlation matrix. Before the publication of Pentland et al., much of the work on automated face recognition had ignored the issue of what aspects of the face stimulus are important for identification, assuming that predefined measurements were relevant and sufficient. In the early 1990s, M. Turk and A. Pentland realized that an information theory approach of coding and decoding face images may give insight into the information content of face images, emphasizing the significant local and global "features". Such


features may or may not be directly related to our intuitive notion of face features such as the eyes, nose, lips, and hair.

2. MATERIALS and METHODS
In statistics, dimension reduction is the process of reducing the number of random variables under consideration; it can be divided into feature selection and feature extraction. Feature selection approaches try to find a subset of the original variables, also called features or attributes. Two strategies are the filter (e.g., information gain) and wrapper (e.g., genetic algorithm) approaches. It sometimes happens that data analysis tasks such as regression or classification can be carried out more accurately in the reduced space than in the original space. Feature extraction applies a mapping of the multidimensional space into a space of fewer dimensions; the original feature space is transformed by applying a linear transformation.

2.1. Principal Component Analysis (PCA)
Principal Component Analysis (PCA) [13] constructs a low-dimensional representation of the data that describes as much of the variance in the data as possible. This is done by finding a linear basis of reduced dimensionality for the data, in which the amount of variance in the data is maximal. In mathematical terms, PCA attempts to find a linear transformation T that maximizes T^T cov_X T, where cov_X is the covariance matrix of the zero-mean data X. It can be shown that this linear mapping is formed by the d principal eigenvectors (i.e., principal components) of the covariance matrix of the zero-mean data. Hence, PCA solves the eigenproblem

cov_X v = λ v    (1)

The eigenproblem is solved for the d principal eigenvalues. The corresponding eigenvectors form the columns of the linear transformation matrix T. The low-dimensional representations y_i of the datapoints x_i are computed by mapping them linearly onto the basis T, i.e., Y = (X - X̄)T, where X̄ denotes the data mean. Given an s-dimensional vector representation of each face in a training set of images, PCA tends to find a t-dimensional subspace whose basis vectors correspond to the directions of maximum variance in the original image space. This new subspace is normally of lower dimensionality (t << s). If the image elements are considered as random variables, the PCA basis vectors are defined as the eigenvectors of the scatter matrix. PCA has been successfully applied in a large number of domains such as face recognition [16], coin classification [14], and seismic series analysis [17]. The main drawback of PCA is that the size of the covariance matrix is proportional to the dimensionality of the datapoints. As a result, the computation of the eigenvectors might be infeasible for very high-dimensional data. Approximation methods, such as Simple PCA [18], deal with this problem by applying an iterative Hebbian approach in order to estimate the principal eigenvectors of the covariance matrix. Alternatively, PCA can be rewritten in a probabilistic framework, allowing PCA to be performed by means of an EM algorithm [19]. The reader should note that probabilistic PCA is closely related to factor analysis [20].

2.2. Kernel principal component analysis (Kernel PCA)
Kernel PCA (KPCA) is the reformulation of traditional linear PCA in a high-dimensional space that is constructed using a kernel function [22]. In recent years, the reformulation of linear techniques using the kernel trick has led to the proposal of successful techniques such as kernel ridge regression and Support Vector Machines [23].
Kernel PCA computes the principal eigenvectors of the kernel matrix, rather than those of the covariance matrix. The reformulation of traditional PCA in kernel space is straightforward, since a kernel matrix is similar to the inner product of the datapoints in the high-dimensional space that is constructed using the kernel function. The application of PCA in kernel space provides Kernel PCA with the property of constructing nonlinear mappings. Kernel PCA first computes the kernel matrix K of the datapoints x_i. The entries in the kernel matrix are defined by

K_ij = K(x_i, x_j)    (2)

where K is a kernel function [23]. Subsequently, the kernel matrix K is centered by replacing each entry with K_ij - (1/n) Σ_l K_il - (1/n) Σ_l K_lj + (1/n^2) Σ_lm K_lm, the standard double-centering operation, which corresponds to subtracting the mean of the features in the high-dimensional space.
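To make the two procedures concrete, the following is a minimal NumPy sketch (not the authors' implementation; variable names and the toy data are our own) of the linear PCA projection of Section 2.1 and the kernel PCA embedding of Section 2.2 with an RBF kernel:

```python
import numpy as np

def pca(X, d):
    """Linear PCA: project rows of X onto the d principal eigenvectors of cov(X)."""
    Xc = X - X.mean(axis=0)                     # zero-mean data
    cov = Xc.T @ Xc / (len(X) - 1)              # covariance matrix cov_X
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigenproblem, ascending order
    T = eigvecs[:, ::-1][:, :d]                 # d principal eigenvectors as columns of T
    return Xc @ T                               # Y = (X - mean) T

def kernel_pca(X, d, gamma=1.0):
    """Kernel PCA: eigenvectors of the centered RBF kernel matrix K_ij = k(x_i, x_j)."""
    sq = (X ** 2).sum(axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # Eq. (2), RBF kernel
    n = len(X)
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J                              # double-centering of the kernel matrix
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:d]         # d principal eigenvalues/eigenvectors
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = pca(X, 2)                # linear embedding, shape (100, 2)
Z = kernel_pca(X, 2, 0.5)    # nonlinear embedding, shape (100, 2)
```

Note that kernel PCA works with the n x n kernel matrix rather than the D x D covariance matrix, which is precisely what allows it to construct nonlinear mappings.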


2.3. Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis finds the vectors in the underlying space that best discriminate among classes. For all samples of all classes, the between-class scatter matrix SB and the within-class scatter matrix SW are defined [3]. The goal is to maximize SB while minimizing SW; in other words, to maximize the ratio det|SB|/det|SW|. This ratio is maximized when the column vectors of the projection matrix are the eigenvectors of SW^-1 SB.

2.4. Locality Preserving Projections (LPP)
In contrast to traditional linear techniques such as PCA, local nonlinear techniques for dimensionality reduction are capable of successfully identifying complex data manifolds such as the Swiss roll. This capability is due to the fact that local nonlinear dimensionality reduction techniques minimize cost functions that aim at preserving local properties of the data manifold. However, in many learning settings the use of a linear technique for dimensionality reduction is desired, e.g., when an accurate and fast out-of-sample extension is necessary, when data has to be transformed back into its original space, or when one wants to visualize the transformation that is constructed by the dimensionality reduction technique. Locality Preserving Projections (LPP) is a technique that aims at combining the benefits of linear techniques and local nonlinear techniques for dimensionality reduction by finding a linear mapping that minimizes the cost function of Laplacian Eigenmaps [15]. Similar to Laplacian Eigenmaps, LPP starts with the construction of a nearest neighbor graph in which every datapoint x_i is connected to its k nearest neighbors x_ij. The weights of the edges in the graph are computed. Subsequently, LPP solves the generalized eigenproblem

(X - X̄)^T L (X - X̄) v = λ (X - X̄)^T M (X - X̄) v    (3)

In which L is the graph Laplacian, and M is the degree matrix of the graph. It can be shown that the eigenvectors v_i corresponding to the d smallest nonzero eigenvalues form the columns of the linear mapping T that minimizes the Laplacian Eigenmap cost function. The low-dimensional data representation Y is thus given by Y = (X - X̄)T, where X̄ is the data mean.

2.5. Neighborhood Preserving Embedding (NPE)
Similar to LPP, Neighborhood Preserving Embedding (NPE) [2] minimizes the cost function of a local nonlinear technique for dimensionality reduction under the constraint that the mapping from the high-dimensional to the low-dimensional data representation is linear. NPE is the linear approximation to LLE. NPE [3] defines a neighborhood graph on the dataset X and subsequently computes the reconstruction weights W_i as in LLE. The cost function of LLE is optimized by solving the following generalized eigenproblem for the d smallest nonzero eigenvalues

(X - X̄)^T (I - W)^T (I - W)(X - X̄) v = λ (X - X̄)^T (X - X̄) v    (4)

where I represents the n x n identity matrix. The low-dimensional data representation is computed by mapping X onto the obtained mapping T, i.e., by computing Y = (X - X̄)T.

2.6. Neural Networks and Learning paradigms
In principle, the popular neural network can be trained to recognize face images directly. However, such a network can be very complex and difficult to train [7]. There are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning, and reinforcement learning. Usually, any given type of network architecture can be employed in any of these tasks.

2.7. Learning algorithms
Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. Most of the algorithms


used in training artificial neural networks employ some form of gradient descent. This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction. Evolutionary methods, simulated annealing, expectation-maximization, and non-parametric methods are among the other commonly used methods for training neural networks.

2.8. Support Vector Machines (SVMs)
Support vector machines [4] are a set of related supervised learning methods used for classification and regression. Viewing the input data as two sets of vectors in an n-dimensional space, an SVM constructs a separating hyperplane in that space which maximizes the margin between the two data sets. To calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, which are "pushed up against" the two data sets. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes, since in general the larger the margin, the lower the generalization error of the classifier. The support vector machine [5] is a pattern classification algorithm developed by Vapnik. It is a binary classification method that finds the optimal linear decision surface based on the concept of structural risk minimization. As shown by Vapnik, this maximal-margin decision boundary can achieve optimal worst-case generalization performance. Note that SVMs were originally designed to solve problems where the data can be separated by a linear decision boundary. By using kernel functions, SVMs can be used effectively to deal with problems that are not linearly separable in the original space.
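The maximal-margin idea can be illustrated with a small sketch. The code below trains a soft-margin linear SVM by Pegasos-style stochastic subgradient descent on the hinge loss; this is a simplified stand-in for the quadratic-programming solvers used in practice, the bias term is omitted for brevity, and the toy data is our own:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent on the regularized hinge loss.
    X: (n, d) feature matrix, y: labels in {-1, +1}. No bias term."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            if y[i] * (X[i] @ w) < 1:             # point violates the margin
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                                 # only shrink (regularization)
                w = (1 - eta * lam) * w
    return w

# two linearly separable point clouds on either side of the origin
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w = train_linear_svm(X, y)
```

Replacing the inner product X[i] @ w with a kernel evaluation gives the nonlinear variants discussed next.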
Some of the commonly used kernels include Gaussian Radial Basis Functions (RBFs), polynomial functions, and sigmoid functions, whose decision surfaces are known to have good approximation properties. Since the training data set may not be linearly separable, a Gaussian Radial Basis Function (RBF) kernel is selected in this paper. The RBF kernel usually performs better because it has a better boundary response and allows for extrapolation.

3. THE EVALUATION MODEL
Given a set of feature vectors belonging to n classes, a Support Vector Machine (SVM) finds the hyperplane that separates the largest possible fraction of features of the same class on the same side, while maximizing the distance from either class to the hyperplane. Generally, a suitable transformation is first used to extract features of the face images, and discrimination functions between each class of images are then learned by SVMs. In the following model, the dimensionality reduction section of the system is replaced with different algorithms to evaluate their performance against a standard face data set.

3.1. Steps Involved in Training
1. Load a set of n ORL face images for training.
2. Resize the images to 48 x 48 pixels to reduce the memory requirements of the overall application.
3. Reshape each image to 1 x 2304 and prepare an n x 2304 feature matrix representing the training data set.
4. Apply a feature extraction / dimensionality reduction technique and find the eigenvector matrix.
5. Reduce the dimension of the input images by projecting them using the eigenvector matrix.
6. The result is an n x d matrix for training, where d is the number of dimensions used for the recognition task.
7. Create an SVM network with d inputs and one output.
8. Train the SVM using the reduced-dimension feature vectors.

3.2. Steps Involved in Testing
1. Repeat the first three steps of the above procedure with the test image set of the ORL database to obtain a feature matrix representing the testing data set.
2. Project the matrix using the previous eigenvector matrix to reduce the dimension of the testing images.
3. Classify the reduced-dimension feature vectors using the previously trained SVM network.
4. Calculate the accuracy of classification.
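The training and testing steps above can be sketched end-to-end. Since the ORL images themselves cannot be bundled here, the example below substitutes synthetic 48 x 48 "faces" (a random template per subject plus noise), and a 1-nearest-neighbour classifier stands in for the SVM so that the sketch stays self-contained; only the resize/reshape/PCA-projection steps mirror the pipeline literally:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for ORL: 5 subjects, 10 "images" each, flattened to 1 x 2304
n_subj, per_subj, dim = 5, 10, 48 * 48
templates = rng.normal(size=(n_subj, dim))
X = np.repeat(templates, per_subj, axis=0) + 0.3 * rng.normal(size=(n_subj * per_subj, dim))
y = np.repeat(np.arange(n_subj), per_subj)

# 7 training / 3 testing images per subject
train = np.arange(len(y)) % per_subj < 7
X_tr, y_tr, X_te, y_te = X[train], y[train], X[~train], y[~train]

# Steps 4-5: eigenvector matrix from the training set (PCA via thin SVD), project both sets
d = 20
mean = X_tr.mean(axis=0)
U, S, Vt = np.linalg.svd(X_tr - mean, full_matrices=False)
T = Vt[:d].T                                    # d principal eigenvectors as columns
Z_tr = (X_tr - mean) @ T                        # reduced-dimension training features
Z_te = (X_te - mean) @ T                        # reduced-dimension testing features

# Classification step (1-nearest neighbour stands in for the trained SVM)
dists = ((Z_te[:, None, :] - Z_tr[None, :, :]) ** 2).sum(axis=-1)
pred = y_tr[dists.argmin(axis=1)]
accuracy = (pred == y_te).mean()
```

With the real ORL data, steps 7-8 of Section 3.1 would train an SVM on Z_tr and classify Z_te in place of the nearest-neighbour step.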


4. RESULTS and DISCUSSION
Obviously, in order to test the system, some faces are required. There are many standard face databases for testing and rating face detection algorithms. A standard database of face imagery is essential to supply standard imagery to algorithm developers and to supply a sufficient number of images to allow testing of these algorithms. Without such databases and standards, there is no way to accurately evaluate or compare facial recognition algorithms. All the experiments described here have been executed mainly on the faces provided by the ORL face database, and the performance of the classification algorithms under evaluation is tested with it. The ORL Database of Faces contains a set of face images used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department. There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). We have evaluated the accuracy of recognition after dimensionality reduction by five different techniques. Table 1 shows the overall results of these dimensionality reduction techniques with different numbers of input face images.
Table 1: Accuracy of Recognition

No. of Faces used for        Accuracy of Recognition (%)
Training and Testing     PCA      KPCA     LDA      LPP      NPE
10                       90       100      90       100      100
20                       80       93       80       80       80
30                       83       89       83       83       77
40                       80       85       80       75       70
Average                  83.25    91.75    83.25    84.5     81.75

From Table 1 and the graph given in Figure 1, it is clear that recognition performance when using Kernel PCA for dimensionality reduction is better than the others, even though only a small number of face images is used in these tests. Recognition performance when using PCA and when using LDA for dimensionality reduction appears equal in terms of accuracy. It is observed, however, that LDA requires a very long processing time as the number of face images grows, even for small databases. In the case of the LPP and NPE methods, the recognition rate drops considerably as the number of face images increases, compared to the PCA and KPCA methods.


[Figure 1: Bar chart of average recognition performance (percentage) per dimensionality reduction technique: PCA 83.25, KPCA 91.75, LDA 83.25, LPP 84.5, NPE 81.75]

Figure 1: The bar chart showing accuracy of recognition

5. CONCLUSIONS
It can be concluded that natural expressions such as smiling, blinking, and talking do not cause severe performance reduction in face recognition when suitable dimensionality reduction or feature extraction techniques are used. The correct recognition rates attained with PCA and KPCA are relatively high; these dimensionality reduction methods therefore provide significant improvements in performance. On the other hand, it is observed that recognition of faces subject to illumination changes is a more sensitive task. Utilizing the proposed methods provides considerable improvement in the case of illumination variations. Finally, it is concluded that PCA and Kernel PCA are the best performers; hence, for the face recognition system, these two methods are preferred over the other three techniques. A strong and coordinated effort between the computer vision, signal processing, psychophysics, and neuroscience research communities is needed. Face recognition can benefit areas such as law enforcement, airport security, access control, driver's licenses, passports, homeland defense, customs and immigration, and scene analysis.

REFERENCES
[1] Xiaofei He and Partha Niyogi, "Locality Preserving Projections (LPP)", in Advances in Neural Information Processing Systems, Volume 16, Cambridge, MA, USA, 2004. The MIT Press.
[2] Xiaofei He, Deng Cai, Shuicheng Yan, and Hong-Jiang Zhang, "Neighborhood preserving embedding", Tenth IEEE International Conference on Computer Vision, Volume 2, 17-21 Oct. 2005.
[3] Shuicheng Yan, Dong Xu, Benyu Zhang, and Hong-Jiang Zhang, "Graph embedding: a general framework for dimensionality reduction", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Volume 2, June 2005.
[4] E. Osuna, R. Freund, and F. Girosi, "Training support vector machines: an application to face detection", in Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 1997.
[5] S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy, "SVM and kernel methods Matlab toolbox", Perception Systemes et Information, INSA de Rouen, Rouen, France, 2005. [Online]. Available: http://asi.insa-rouen.fr/~arakotom/toolbox/index.html
[6] Rein-Lien Hsu, M. Abdel-Mottaleb, and Anil K. Jain, "Face detection in color images", ICIP, vol. 1, pp. 1046-1049, Greece, Oct. 2001.
[7] H.A. Rowley and T. Kanade, "Neural network-based face detection", IEEE Trans. on PAMI, vol. 20, no. 1, pp. 23-38, Jan. 1998.
[8] S. Sirohey, A. Rosenfeld, and Z. Duric, "A method of detection and tracking iris and eyelids in video", Pattern Recognition, vol. 35, pp. 1389-1401, 2002.
[9] A. Haro, M. Flickner, and I. Essa, "Detecting and tracking eyes by using their physiological properties, dynamics, and appearance", in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 163-168, 2000.
[10] T. D'Orazio, M. Leo, G. Cicirelli, and A. Distante, "An algorithm for real time eye detection in face images", in Proc. 17th Int. Conf. on Pattern Recognition, vol. 3, pp. 278-281, 2004.
[11] K.K. Sung and T. Poggio, "Example-based learning for view-based human face detection", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, 1998.
[12] L. Wiskott, J.-M. Fellous, N. Krueger, and C. von der Malsburg, "Face Recognition by Elastic Bunch Graph Matching", Chapter 11 in Intelligent Biometric Techniques in Fingerprint and Face Recognition, eds. L.C. Jain et al., CRC Press, 1999, pp. 355-396.
[13] H. Hotelling, "Analysis of a complex of statistical variables into principal components", Journal of Educational Psychology, 24:417-441, 1933.

[15] R. Huber, H. Ramoser, K. Mayer, H. Penz, and M. Rubik, "Classification of coins using an eigenspace approach", Pattern Recognition Letters, 26(1):61-75, 2005.
[16] M.A. Turk and A.P. Pentland, "Face recognition using eigenfaces", in Proceedings of Computer Vision and Pattern Recognition 1991, pages 586-591, 1991.
[17] A.M. Posadas, F. Vidal, F. de Miguel, G. Alguacil, J. Pena, J.M. Ibanez, and J. Morales, "Spatial-temporal analysis of a seismic series using the principal components method", Journal of Geophysical Research, 98(B2):1923-1932, 1993.
[18] M. Partridge and R. Calvo, "Fast dimensionality reduction and Simple PCA", Intelligent Data Analysis, 2(3):292-298, 1997.
[19] M.E. Tipping and C.M. Bishop, "Probabilistic principal component analysis", Technical Report NCRG/97/010, Neural Computing Research Group, Aston University, 1997.
[20] T.W. Anderson, "Asymptotic theory for principal component analysis", Annals of Mathematical Statistics, 34:122-148, 1963.
[21] M. Brand, "Charting a manifold", in Advances in Neural Information Processing Systems, volume 15, pages 985-992, Cambridge, MA, USA, 2002. The MIT Press.
[22] B. Scholkopf, A.J. Smola, and K.-R. Muller, "Nonlinear component analysis as a kernel eigenvalue problem", Neural Computation, 10(5):1299-1319, 1998.
[23] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge, UK, 2004.
[24] T. Cox and M. Cox, Multidimensional Scaling, Chapman & Hall, London, UK, 1994.
[25] J.B. Kruskal, "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29:1-27, 1964.
[26] W. Bledsoe, "The model method in facial recognition", Panoramic Research Inc., Tech. Rep. PRI:15, Palo Alto, CA, 1964.

