CHAPTER 1
INTRODUCTION
This project aims to create a system that automatically estimates whether each student is
present or absent and marks his/her attendance accordingly. Using Concentration Analysis, it
can also determine whether students are awake or asleep, and whether they are interested or
bored during a lecture.
For simplicity, we test the system with a single person per trial using a simple webcam.
MATLAB is used to test the system's behavior under various external conditions such as
noise and illumination. The overall process of marking a student's attendance and analyzing
concentration is divided into separate modules, namely: Registration using Facial Detection
(using the Viola-Jones algorithm), Facial Recognition (using Principal Component Analysis)
and Concentration Analysis (using thresholding).
This project registers images of the student from a video feed. These images form the
training database required for facial recognition. The images then undergo intensity
normalization and noise removal for image enhancement. After registration, the registering
user adds a name corresponding to his/her images. To mark attendance, the project accepts a
live video feed of a single student as input. The facial recognition algorithm runs in the
background and the name of the student is displayed on the screen. The presence or absence
of the student is marked against his/her name in the database, which can be displayed. For
concentration analysis, an eye pair is first detected for the student whose concentration is to
be measured. The project then counts the number of blinks per set of frames. These counts
form the basis for measuring the concentration percentage of the person.
Given a real-time video of an ongoing class, the system should be able to detect
and recognize students to record their attendance automatically.
It should utilize minimum resources in terms of hardware and cost.
It should be able to save time which is otherwise wasted in taking the attendance
manually.
It should be able to measure the increase or decrease in student’s concentration
in class at subsequent time intervals.
CHAPTER 2
LITERATURE SURVEY
This paper describes a face detection framework that is capable of processing images
extremely rapidly while achieving high detection rates. There are three key contributions.
The first is the introduction of a new image representation called the “Integral Image” which
allows the features used by our detector to be computed very quickly. The second is a simple
and efficient classifier which is built using the AdaBoost learning algorithm (Freund and
Schapire, 1995) to select a small number of critical visual features from a very large set of
potential features. The third contribution is a method for combining classifiers in a “cascade”
which allows background regions of the image to be quickly discarded while spending more
computation on promising face-like regions. A set of experiments in the domain of face
detection is presented. The system yields face detection performance comparable to the best
previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade,
2000; Roth et al., 2000). Implemented on a conventional desktop, the system performs face
detection at 15 frames per second.
This paper brings together new algorithms and insights to construct a framework for robust
and extremely rapid visual detection. In other face detection systems, auxiliary information,
such as image differences in video sequences or pixel color in color images, has been used
to achieve high frame rates. This system achieves high frame rates working only with the
information present in a single grey scale image. These alternative sources of information
can also be integrated with our system to achieve even higher frame rates. There are three
main contributions of our face detection framework.
The first contribution of this paper is a new image representation called an integral image
that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou
et al. (1998) our detection system does not work directly with image intensities. Like these
authors we use a set of features which are reminiscent of Haar Basis functions (though we
will also use related filters which are more complex than Haar filters). In order to compute
these features very rapidly at many scales we introduce the integral image representation for
images (the integral image is very similar to the summed area table used in computer
graphics (Crow, 1984) for texture mapping). The integral image can be computed from an
image using a few operations per pixel. Once computed, any one of these Haar-like features
can be computed at any scale or location in constant time.
The second contribution of this paper is a simple and efficient classifier that is built by
selecting a small number of important features from a huge library of potential features using
AdaBoost (Freund and Schapire, 1995). Within any image sub-window the total number of
Haar-like features is very large, far larger than the number of pixels. In order to ensure fast
classification, the learning process must exclude a large majority of the available features,
and focus on a small set of critical features. Motivated by the work of Tieu and Viola (2000)
feature selection is achieved using the AdaBoost learning algorithm by constraining each
weak classifier to depend on only a single feature. As a result each stage of the boosting
process, which selects a new weak classifier, can be viewed as a feature selection process.
AdaBoost provides an effective learning algorithm and strong bounds on generalization
performance (Schapire et al., 1998).
The third major contribution of this paper is a method for combining successively more
complex classifiers in a cascade structure which dramatically increases the speed of the
detector by focusing attention on promising regions of the image. The notion behind focus
of attention approaches is that it is often possible to rapidly determine where in an image a
face might occur. More complex processing is reserved only for these promising regions.
The key measure of such an approach is the “false negative” rate of the attention process. It
must be the case that all, or almost all, face instances are selected by the attention filter. We
will describe a process for training an extremely simple and efficient classifier which can be
used as a “supervised” focus of attention operator. A face detection attention operator can
be learned which will filter out over 50% of the image while preserving 99% of the faces (as
evaluated over a large dataset). This filter is exceedingly efficient; it can be evaluated in 20
simple operations per sub-window.
1216110074, 1216110080, 1216110091, 1216110094, 1216110109, 1216110124 Page | 7
AUTOMATED ATTENDANCE AND CONCENTRATION ANALYSIS SYSTEM
Viola-Jones Face Detection: The Viola-Jones method for face detection combines three
techniques: integral-image feature extraction, AdaBoost learning, and the classifier cascade.
Integral image for feature extraction: the Haar-like features are rectangular and
are computed efficiently via the integral image.
Figure 2.1 An Integral Image whose value will be calculated at point (x, y)
As shown in Figure 2.1, the value of the integral image at point (x, y) is the sum of all the pixels
above and to the left.
Face recognition systems have been attracting high attention from the commercial market
as well as from the pattern recognition field, and the topic also stands high in the research
community. Face recognition has been a fast-growing, challenging and interesting area in
real-time applications, and a large number of face recognition algorithms have been
developed over the decades. The present paper reviews different face recognition approaches
and primarily focuses on principal component analysis; the analysis and implementation are
done in the free software Scilab. This face recognition system detects the faces in a picture
taken by a web-cam or a digital camera, and these face images are then checked against a
training image dataset based on descriptive features, which are used to characterize images.
MATLAB's IMAQ toolbox is used for performing image analysis.
Face recognition has received substantial attention from researchers in the biometrics,
pattern recognition and computer vision communities. A face recognition system extracts
the features of a face and compares them with an existing database. The faces considered
here for comparison are still faces. Machine
recognition of faces from still and video images is emerging as an active research area. The
present paper is formulated based on still or video images captured either by a digital camera
or by a web cam. The face recognition system detects only the faces from the image scene
and extracts the descriptive features. It then compares them with the database of faces,
which is a collection of faces in different poses.
This paper mainly addresses the building of face recognition system by using Principal
Component Analysis (PCA). PCA is a statistical approach used for reducing the number of
variables in face recognition. In PCA, every image in the training set is represented as a
linear combination of weighted eigenvectors called Eigen faces. These eigenvectors are
obtained from the covariance matrix of a training image set. The weights are found after
selecting a set of the most relevant Eigen faces. Recognition is performed by projecting a test
image onto the subspace spanned by the Eigen faces and then classification is done by
measuring minimum Euclidean distance. A number of experiments were done to evaluate
the performance of the face recognition system.
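A minimal sketch of this pipeline can be written in Python/NumPy; the project itself uses MATLAB, so this is only an illustrative equivalent, and all function and variable names here are our own, not the project's code:

```python
import numpy as np

def train_eigenfaces(images, k):
    """images: (n, h*w) array with one flattened face per row.
    Returns the mean face, the top-k Eigen faces and the
    projection weights of the training set."""
    mean = images.mean(axis=0)
    A = images - mean                       # centred data
    # Turk & Pentland trick: eigenvectors of the small n x n matrix
    # A A^T yield those of the large covariance matrix A^T A.
    vals, vecs = np.linalg.eigh(A @ A.T)    # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]        # keep the k largest
    eigenfaces = (A.T @ vecs[:, top]).T     # back into image space
    eigenfaces /= np.linalg.norm(eigenfaces, axis=1, keepdims=True)
    weights = A @ eigenfaces.T              # training-set weights
    return mean, eigenfaces, weights

def classify(face, mean, eigenfaces, weights, labels):
    """Project a test face and pick the training face with the
    minimum Euclidean distance in the Eigen face subspace."""
    w = (face - mean) @ eigenfaces.T
    distances = np.linalg.norm(weights - w, axis=1)
    return labels[int(np.argmin(distances))]
```

The `eigh`-on-the-small-matrix step is the standard trick for making the eigen-decomposition tractable when the number of training images is far smaller than the number of pixels.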
Over the last ten years or so, face recognition has become a popular area of research in
computer vision and one of the most successful applications of image analysis and
understanding. Because of the nature of the problem, not only computer science researchers
are interested in it, but neuroscientists and psychologists as well. It is the general opinion
that advances in computer vision research will provide useful insights to neuroscientists and
psychologists into how the human brain works, and vice versa. The goal is to implement a
system (model) for a particular face and distinguish it from a large number of stored faces,
with some real-time variations as well. It gives us an efficient way to find a lower-dimensional
space. Further, this algorithm can be extended to recognize the gender of a person or to
interpret the facial expression of a person. Recognition is attempted under widely varying
conditions (frontal view, a 45° view, scaled frontal view, subjects with spectacles, etc.) while
the training data set covers only limited views. The algorithm could also model real-time
varying lighting conditions, but this is out of the scope of the current implementation. The
aim of this research paper is to study and develop an efficient MATLAB program for face
recognition using principal component analysis and to perform tests for program
optimization and accuracy. This approach is preferred due to its simplicity, speed and
learning capability.
Eigen faces are a set of eigenvectors used in the computer vision problem of human face
recognition; they have a somewhat ghostly appearance. They refer to an appearance-based
approach to face recognition that seeks to capture the variation in a collection of face images
and to use this information to encode and compare images of individual faces in a holistic
manner. Specifically, the Eigen faces are the principal components of a distribution of faces,
or equivalently, the eigenvectors of the covariance matrix of the set of face images, where
an image with N×N pixels is considered a point (or vector) in an N²-dimensional space.
The idea of using principal components to represent human faces was developed by Sirovich
and Kirby and used by Turk and Pentland for face detection and recognition. The Eigen face
approach is considered by many to be the first working facial recognition technology, and it
served as the basis for one of the top commercial face recognition technology products. Since
its initial development and publication, there have been many extensions to the original
method and many new developments in automatic face recognition systems. Eigen faces are
still considered the baseline comparison method for demonstrating the minimum expected
performance of such a system. Eigen faces are mostly used to:
Extract the relevant facial information, which may or may not be directly related
to human intuition of face features such as the eyes, nose, and lips. One way to
do so is to capture the statistical variation between face images.
Represent face images efficiently. To reduce the computation and space
complexity, each face image can be represented using a small number of
dimensions.
The Eigen faces may be considered as a set of features which characterize the global
variation among face images. Then each face image is approximated using a subset of the
Eigen faces, those associated with the largest Eigen values. These features account for the
most variance in the training set. In the language of information theory, we want to extract
the relevant information in face image, encode it as efficiently as possible, and compare one
face with a database of models encoded similarly. A simple approach to extracting the
information contained in an image is to somehow capture the variations in a collection of
face images, independently encode and compare individual face images. Mathematically, it
is simply finding the principal components of the distribution of faces, or the eigenvectors
of the covariance matrix of the set of face images, treating an image as a point or a vector in
a very high dimensional space. The eigenvectors are ordered, each one accounting for a
different amount of the variations among the face images. These eigenvectors can be
imagined as a set of features that together characterize the variation between face images.
Each image location contributes more or less to each eigenvector, so that we can display the
eigenvector as a sort of “ghostly” face, which we call an Eigen face. The face images that are
studied are shown in the Figure 2.3, and their respective Eigen faces are shown in Figure
2.4.
Each of the individual faces can be represented exactly as a linear combination of the Eigen
faces. Each face can also be approximated using only the “best” Eigen faces, those with the
largest Eigen values, together with the set of face images. The best M Eigen faces span an
M-dimensional space called the “Face Space” of all the images. The basic idea of using Eigen
faces was proposed by Sirovich and Kirby, as mentioned earlier, who were successful in
representing faces using principal component analysis. In their analysis, starting with an
ensemble of original face images, they calculated a best coordinate system for image
compression, where each coordinate is actually an image that they termed an Eigen picture.
They argued that, at least in principle, any collection of face images can be approximately
reconstructed by storing a small collection of weights for each face and a small set of
standard pictures (the Eigen pictures). The weights that describe a face can be calculated by
projecting each image onto the Eigen pictures. Also, according to Turk and Pentland [1],
face images can be reconstructed by weighted sums of a small collection of characteristic
features or Eigen pictures, and an efficient way to learn and recognize faces could be to build
up the characteristic features by experience and to compare the feature weights needed to
(approximately) reconstruct a face with the weights associated with known individuals. Each
individual would therefore be characterized by the small set of feature or Eigen picture
weights needed to describe and reconstruct his/her face.
Eigen Face Approach: One of the simplest and most effective PCA approaches used in face
recognition systems is the so-called Eigen face approach. This approach transforms faces
into a small set of essential characteristics, Eigen faces, which are the main components of
the initial set of learning images (training set).
Recognition is done by projecting a new image in the Eigen face subspace, after which the
person is classified by comparing its position in Eigen face space with the position of known
individuals. The advantage of this approach over other face recognition systems is in its
simplicity, speed and insensitivity to small or gradual changes on the face.
The approach is limited in the kinds of images it can use to recognize a face: the images
must be vertical frontal views of human faces. The whole recognition process involves two
steps:
Initialization process
Recognition process
These operations can be performed from time to time whenever there is free excess
operational capacity. The resulting data can be cached and used in later steps, eliminating
the overhead of re-initialization, decreasing execution time and thereby increasing the
performance of the entire system [4]. Once the system is initialized, the recognition process
involves the following steps:
Calculate a set of weights based on the input image and the M Eigen faces by
projecting the input image onto each of the Eigen faces.
Determine whether the image is a face at all (known or unknown) by checking
whether it is sufficiently close to the “face space”.
If it is a face, then classify the weight pattern as either a known person or as
unknown.
Update the Eigen faces or weights as known or unknown; if the same unknown
face is seen several times, calculate its characteristic weight pattern and
incorporate it into the set of known faces.
The last step is not usually a requirement of every system and hence the steps are left optional
and can be implemented when there is a requirement.
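The decision steps above can be sketched as follows (Python/NumPy; the function and threshold names are our own illustration, and the two threshold values would have to be tuned on real data):

```python
import numpy as np

def recognise(face, mean, eigenfaces, known_weights, labels,
              face_threshold, known_threshold):
    """Returns 'not a face', 'unknown', or a known label.
    face_threshold / known_threshold are tuning parameters."""
    phi = face - mean
    w = phi @ eigenfaces.T                 # weights in face space
    reconstruction = w @ eigenfaces        # projection onto face space
    # Far from the face space means the input is not a face at all.
    if np.linalg.norm(phi - reconstruction) > face_threshold:
        return "not a face"
    # Otherwise: nearest known weight pattern, if it is close enough.
    distances = np.linalg.norm(known_weights - w, axis=1)
    nearest = int(np.argmin(distances))
    if distances[nearest] <= known_threshold:
        return labels[nearest]
    return "unknown"
</```

The distance-from-face-space test is what separates "no face present" from "a face we have not seen before", which are otherwise easy to confuse.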
CHAPTER 3
PROPOSED METHODOLOGY
The system design for the proposed model has been broken down into three key steps:
Registration, Recognition and Concentration Analysis.
3.1.1 Registration:
In this module we take a video feed as input. To register the images we use facial detection.
Noise removal, averaging and resizing of images to a proper resolution are performed here.
These images form the training database.
3.1.2 Recognition:
In this module we take a video feed as input, with one student at a time. The face is
recognized with the help of PCA facial recognition and the name of the recognized student
is displayed as an annotation on the video input.
In this module the attendance is marked automatically and results are displayed. This tells
us about the attendance of the student.
3.1.3 Concentration Analysis:
In this module, the number of blinks is calculated per set of frames, and it is determined
whether concentration is increasing or decreasing.
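The blink-per-frame-set logic can be sketched as below. The eye-pair detection itself is abstracted into a list of per-frame booleans, and both the `normal_rate` constant and the scoring formula are purely illustrative assumptions, not the project's calibration:

```python
def count_blinks(eyes_open):
    """eyes_open: per-frame booleans from the eye-pair detector.
    A blink is counted on each open -> closed transition."""
    return sum(1 for prev, cur in zip(eyes_open, eyes_open[1:])
               if prev and not cur)

def concentration_score(eyes_open, normal_rate=0.02):
    """Rough concentration percentage in [0, 100]: blink rates far
    above an assumed 'normal' blinks-per-frame rate lower the score."""
    if not eyes_open:
        return 0.0
    rate = count_blinks(eyes_open) / len(eyes_open)
    excess = max(0.0, rate - normal_rate)
    return max(0.0, 100.0 * (1.0 - excess / (1.0 - normal_rate)))
```

Comparing the score across consecutive frame sets then shows whether concentration is rising or falling over time, which is what the module reports.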
1. Face Detection.
2. Face Recognition.
3. Concentration Analysis.
A face detector has to tell whether an image of arbitrary size contains a human face and if
so, where it is. One natural framework for considering this problem is that of binary
classification, in which a classifier is constructed to minimize the misclassification risk.
Since no objective distribution can describe the actual prior probability for a given image to
have a face, the algorithm must minimize both the false negative and false positive rates in
order to achieve an acceptable performance.
This task requires an accurate numerical description of what sets human faces apart from
other objects. It turns out that these characteristics can be extracted with a remarkable
committee learning algorithm called AdaBoost, which relies on a committee of weak
classifiers to form a strong one through a voting mechanism. A classifier is weak if, in
general, it cannot meet a predefined classification target in error terms.
To study the algorithm in detail, we start with the image features for the classification task.
3.2.1.1 Features:
The Viola-Jones algorithm uses Haar-like features, that is, a scalar product between the
image and some Haar-like templates. More precisely, let I and P denote an image and a
pattern, both of the same size N × N as shown in Figure 3.6. The feature associated with
pattern P of image I is defined by

feature = Σ_{1≤i≤N, 1≤j≤N} I(i, j) 1[P(i, j) is white] − Σ_{1≤i≤N, 1≤j≤N} I(i, j) 1[P(i, j) is black]
As shown in Figure 3.6, the example rectangle features shown relative to the enclosing
detection window. The sums of the pixels which lie within the White rectangles are
subtracted from the sum of pixels in the grey rectangles. Two-rectangle features are shown
in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-rectangle feature.
To compensate for the effect of different lighting conditions, all the images should be mean and
variance normalized beforehand. Those images with variance lower than one, having little
information of interest in the first place, are left out of consideration.
Our face detection procedure classifies images based on the value of simple features. There
are many motivations for using features rather than the pixels directly. The most common
reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn
using a finite quantity of training data. For this system there is also a second critical
motivation for features: the feature-based system operates much faster than a pixel-based
system.
More specifically, we use three kinds of features. The value of a two-rectangle feature is the
difference between the sums of the pixels within two rectangular regions. The regions have
the same size and shape and are horizontally or vertically adjacent as shown in Figure 3.6.
A three-rectangle feature computes the sum within two outside rectangles subtracted from
the sum in a center rectangle. Finally, a four-rectangle feature computes the difference
between diagonal pairs of rectangles.
Given that the base resolution of the detector is 24 × 24, the exhaustive set of rectangle
features is quite large: 160,000. Note that unlike the Haar basis, the set of rectangle features
is overcomplete.
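The size of this feature set can be checked by brute-force enumeration. With the five template shapes commonly used (two two-rectangle, two three-rectangle, one four-rectangle), the count for a 24 × 24 window comes out at 162,336, the same order of magnitude as the figure quoted above; the exact number depends on which templates are included:

```python
def count_features(W=24, H=24):
    """Count all positions and scales of the Haar-like templates in a
    W x H window: 1x2/2x1 two-rectangle, 1x3/3x1 three-rectangle and
    2x2 four-rectangle shapes."""
    total = 0
    for w, h in [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]:
        for sw in range(w, W + 1, w):        # scaled template widths
            for sh in range(h, H + 1, h):    # scaled template heights
                total += (W - sw + 1) * (H - sh + 1)  # valid placements
    return total
```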
Rectangle features can be computed very rapidly using an intermediate representation for
the image which we call the integral image. The integral image at location (x, y) contains the
sum of the pixels above and to the left of (x, y), inclusive:

ii(x, y) = Σ_{x′≤x, y′≤y} i(x′, y′)
where ii(x, y) is the integral image and i(x, y) is the original image. Using the following
pair of recurrences:

s(x, y) = s(x, y − 1) + i(x, y)
ii(x, y) = ii(x − 1, y) + s(x, y)

where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0, the integral image
can be computed in one pass over the original image. Using the integral image any
can be computed in one pass over the original image. Using the integral image any
rectangular sum can be computed in four array references. Clearly the difference between
two rectangular sums can be computed in eight references. Since the two-rectangle features
defined above involve adjacent rectangular sums they can be computed in six array
references, eight in the case of the three-rectangle features, and nine for four-rectangle
features.
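The recurrences and the four-reference rectangle sum can be sketched directly (Python/NumPy; `cumsum` along each axis realises the one-pass computation):

```python
import numpy as np

def integral_image(img):
    """ii(x, y) = sum of img above and to the left of (x, y),
    inclusive; cumulative sums along rows then columns give the
    one-pass computation described by the recurrences."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum over img[top:bottom+1, left:right+1] using at most four
    array references (D = 4 + 1 - (2 + 3) in Figure 3.7)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total
```

Once the integral image is built, every rectangle sum costs the same regardless of the rectangle's size, which is what makes feature evaluation at all scales so cheap.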
The authors point out that in the case of linear operations (e.g. f·g), any invertible linear
operation can be applied to f or g if its inverse is applied to the result. For example in the
case of convolution, if the derivative operator is applied both to the image and the kernel the
result must then be double integrated:

f ⋆ g = ∫∫ (f′ ⋆ g′)
The authors go on to show that convolution can be significantly accelerated if the derivatives
of f and g are sparse (or can be made so). A similar insight is that an invertible linear
operation can be applied to f if its inverse is applied to g:
(f′) ⋆ (∫∫ g) = f ⋆ g (v)
Viewed in this framework, computation of the rectangle sum can be expressed as a dot
product i · r, where i is the image and r is the box-car image (with value 1 within the
rectangle of interest and 0 outside). This operation can be rewritten

i · r = (∫∫ i) · r″
The integral image is in fact the double integral of the image (first along rows and then along
columns). The second derivative of the rectangle (first in row and then in column) yields
four delta functions at the corners of the rectangle. Evaluation of the second dot product is
accomplished with four array accesses.
As shown in Figure 3.7, the sum of the pixels within rectangle D can be computed with four
array references. The value of the integral image at location 1 is the sum of the pixels in
rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A +
B + C + D. The sum within D can be computed as 4 + 1 − (2 + 3).
How to make sense of these features is the focus of AdaBoost. A classifier maps an
observation to a label valued in a finite set. For face detection, it takes the form f : R^d →
{−1, 1}, where 1 means that there is a face, −1 the contrary, and d is the number of
Haar-like features extracted from an image. Given the probabilistic weights w_i ∈ R_+
assigned to a training set made up of n observation-label pairs (x_i, y_i), AdaBoost aims to
iteratively drive down an upper bound of the empirical loss

Σ_i w_i 1[y_i ≠ f(x_i)]

under mild technical conditions. Remarkably, the decision rule constructed by AdaBoost
remains reasonably simple, so it is not prone to overfitting, which means that the
empirically learned rule often generalizes well.
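A toy version of this scheme, with each weak classifier constrained to a one-feature decision stump, can be sketched as follows (Python/NumPy; an illustrative implementation with our own names, far simpler than a production detector):

```python
import numpy as np

def best_stump(X, y, w):
    """Pick the single feature, threshold and polarity with the lowest
    weighted error; y is in {-1, +1} and w is a probability vector."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):              # one weak learner per feature
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def adaboost(X, y, rounds):
    """Each round reweights the data and selects one feature, so the
    boosting loop doubles as a feature-selection process."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = best_stump(X, y, w)
        err = max(err, 1e-10)                # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)       # upweight the mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the selected single-feature stumps."""
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
                for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)
```

Because each round commits to exactly one feature, reading off the chosen feature indices from `ensemble` gives the small set of critical features the text describes.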
This section describes an algorithm for constructing a cascade of classifiers which achieves
increased detection performance while radically reducing computation time. The key insight
is that smaller, and therefore more efficient, boosted classifiers can be constructed which
reject many of the negative sub-windows while detecting almost all positive instances.
Simpler classifiers are used to reject the majority of sub-windows before more complex
classifiers are called upon to achieve low false positive rates. Stages in the cascade are
constructed by training classifiers using AdaBoost. Starting with a two-feature strong
classifier, an effective face filter can be obtained by adjusting the strong classifier threshold
to minimize false negatives. The initial AdaBoost threshold,

(1/2) Σ_{t=1}^{T} α_t,

is designed to yield a low error rate on the training data. A lower threshold yields higher
detection rates and higher false positive rates. The detection performance of the two-feature
classifier is far from acceptable as a face detection system. Nevertheless, the classifier can
significantly reduce the number of sub-windows that need further processing with very few
operations.
The overall form of the detection process is that of a degenerate decision tree, what we call
a “cascade”. A positive result from the first classifier triggers the evaluation of a second
classifier which has also been adjusted to achieve very high detection rates. A positive result
from the second classifier triggers a third classifier, and so on. A negative outcome at any
point leads to the immediate rejection of the sub-window. The structure of the cascade
reflects the fact that within any single image an overwhelming majority of sub-windows are
negative. As such, the cascade attempts to reject as many negatives as possible at the earliest
stage possible. While a positive instance will trigger the evaluation of every classifier in the
cascade, this is an exceedingly rare event.
Much like a decision tree, subsequent classifiers are trained using those examples which pass
through all the previous stages. As a result, the second classifier faces a more difficult task
than the first. The examples which make it through the first stage are “harder” than typical
examples. At a given detection rate, deeper classifiers have correspondingly higher false
positive rates.
1. User selects values for f, the maximum acceptable false positive rate per layer
and d, the minimum acceptable detection rate per layer.
2. User selects target overall false positive rate, Ftarget.
3. P = set of positive examples
4. N = set of negative examples
5. F0 = 1.0; D0 = 1.0
6. i = 0
7. while Fi > Ftarget
7.1. i ← i + 1
7.2. ni = 0; Fi = Fi−1
7.3. while Fi > f × Fi−1
7.3.1. ni ← ni + 1
7.3.2. Use P and N to train a classifier with ni features using AdaBoost
7.3.3. Evaluate current cascaded classifier on validation set to determine Fi
and Di.
7.3.4. Decrease threshold for the ith classifier until the current cascaded
classifier has a detection rate of at least d×Di−1 (this also affects Fi )
7.4. N ←∅
7.5. If Fi >Ftarget then evaluate the current cascaded detector on the set of non-face
images and put any false detections into the set N
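The detector that this training loop produces, a degenerate decision tree of increasingly expensive stages, can be sketched generically (Python; `stages` here is simply a list of scoring functions with thresholds, standing in for the boosted classifiers the algorithm above would train):

```python
def cascade_classify(window, stages):
    """stages: (classifier, threshold) pairs, cheapest first.
    Any stage scoring below its threshold rejects the sub-window
    immediately, so most negatives cost only a stage or two."""
    for classifier, threshold in stages:
        if classifier(window) < threshold:
            return False          # early rejection
    return True                   # survived every stage

def scan(windows, stages):
    """Keep only the sub-windows that pass the whole cascade."""
    return [w for w in windows if cascade_classify(w, stages)]
```

Since the overwhelming majority of sub-windows fail the first, cheapest stage, the average cost per sub-window stays close to that of the first stage alone.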
Since the final detector is insensitive to small changes in translation and scale, multiple
detections will usually occur around each face in a scanned image. The same is often true of
some types of false positives. In practice it often makes sense to return one final detection
per face. Toward this end it is useful to post-process the detected sub-windows in order to
combine overlapping detections into a single detection. In these experiments detections are
combined in a very simple fashion. The set of detections are first partitioned into disjoint
subsets. Two detections are in the same subset if their bounding regions overlap. Each
partition yields a single final detection. The corners of the final bounding region are the
average of the corners of all detections in the set. In some cases this post processing
decreases the number of false positives since an overlapping subset of false positives is
reduced to a single detection.
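The merging step can be sketched as follows (Python; a naive grouping by pairwise overlap with corner averaging, using our own helper names):

```python
def overlap(a, b):
    """a, b: boxes as (x1, y1, x2, y2); True if the regions intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge_detections(boxes):
    """Partition boxes into overlapping subsets and return one box per
    subset whose corners are the averages of the members' corners."""
    groups = []
    for box in boxes:
        touching = [g for g in groups if any(overlap(box, m) for m in g)]
        for g in touching:                 # union every group the new
            groups.remove(g)               # box connects to
        groups.append(sum(touching, []) + [box])
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```

As the text notes, this also suppresses some false positives: a cluster of spurious overlapping boxes collapses to a single detection.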
The face recognition system detects only the faces from the image scene and extracts the
descriptive features. It then compares them with the database of faces, which is a collection
of faces in different poses. The present system is trained with a database in which the images
are taken in different poses, with glasses, and with and without a beard.
Eigen faces are a set of eigenvectors used in the computer vision problem of human face
recognition. Eigen faces assume ghastly appearance. They refer to an appearance-based
approach to face recognition that seeks to capture the variation in a collection of face images
and use this information to encode and compare images of individual faces in a holistic
manner. Specifically, the Eigen faces are the principal components of a distribution of faces,
or equivalently, the eigenvectors of the covariance matrix of the set of face images, where
an image with N x N pixels is considered a point (or vector) in N2 dimensional space. The
idea of using principal components to represent human faces was developed by Sirovich and
Kirby and used by Turk and Pent land for face detection and recognition .The Eigen face
approach is considered by many to be the first working facial recognition technology, and it
served as the basis for one of the top commercial face recognition technology products. Since
its initial development and publication, there have been many extensions to the original
method and many new developments in automatic face recognition systems. Eigen faces is
still considered as the baseline comparison method to demonstrate the minimum expected
performance of such a system. Eigen faces are mostly used to:
1. Extract the relevant facial information, which may or may not be directly related
to the human intuition of face features such as the eyes, nose, and lips. One way
to do so is to capture the statistical variation between face images.
2. Represent face images efficiently. To reduce the computation and space
complexity, each face image can be represented using a small number of
dimensions.
The Eigenfaces may be considered a set of features which characterize the global variation
among face images. Each face image is then approximated using a subset of the Eigenfaces:
those associated with the largest eigenvalues. These features account for the most variance
in the training set. In the language of information theory, we want to extract the relevant
information in a face image, encode it as efficiently as possible, and compare one face with
a database of models encoded similarly. A simple approach to extracting the information
contained in an image is to somehow capture the variations in a collection of face images,
and to independently encode and compare individual face images.
Each of the faces can be represented exactly as a linear combination of the Eigenfaces. Each
face can also be approximated using only the “best” Eigenfaces: those with the largest
eigenvalues. The best M Eigenfaces span an M-dimensional space called the “face space” of
all the images. The basic idea of using Eigenfaces was proposed by Sirovich and Kirby, as
mentioned earlier, using principal component analysis, and they were successful in
representing faces with this analysis. In their analysis, starting with an ensemble of original
face images, they calculated a best coordinate system for image compression, where each
coordinate is actually an image that they termed an eigenpicture. They argued that, at least
in principle, any collection of face images can be approximately reconstructed by storing a
small collection of weights for each face and a small set of standard pictures (the
eigenpictures). The weights that describe a face can be calculated by projecting each image
onto the eigenpictures. Also, according to Turk and Pentland, face images can be
reconstructed by weighted sums of a small collection of characteristic features or
eigenpictures, and an efficient way to learn and recognize faces could be to build up the
characteristic features by experience, then recognize particular faces by comparing the
feature weights needed to (approximately) reconstruct them with the weights associated
with known individuals.
Each individual, therefore, would be characterized by the small set of feature or eigenpicture
weights needed to describe and reconstruct them, which is an extremely compact
representation of the images when compared to the images themselves.
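The weight-based representation described above can be made concrete. The project itself is implemented in MATLAB; the following is an illustrative Python/NumPy sketch with hypothetical toy data (random 8x8 "faces"), showing that a face is recovered exactly from the mean face plus its eigenpicture weights when all components are kept.

```python
import numpy as np

# Hypothetical toy data: M = 5 "face images" of 8x8 pixels, flattened.
rng = np.random.default_rng(0)
faces = rng.random((5, 64))

mean_face = faces.mean(axis=0)
centered = faces - mean_face

# Eigenfaces via SVD of the centered data; rows of Vt are the eigenfaces,
# ordered by decreasing singular value (i.e. decreasing eigenvalue).
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

# The weights describing each face are its projections onto the eigenfaces.
weights = centered @ Vt.T

# Reconstruction: mean face plus the weighted sum of the eigenfaces.
reconstructed = mean_face + weights @ Vt
print(np.allclose(reconstructed, faces))   # True: exact with all components
```

Keeping only the top few rows of `Vt` (the largest-eigenvalue components) would give the approximate, compact representation the text describes.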
The Eigenface approach involves two main operations:
1. Initialization process
2. Recognition process
The initialization operations can be performed from time to time, whenever there is free
excess operational capacity. The resulting data can be cached and reused in later steps,
eliminating the overhead of re-initialization and decreasing execution time, thereby
increasing the performance of the entire system.
Having initialized the system, the recognition process involves the following steps:
1. Calculate a set of weights based on the input image and the M Eigenfaces by
projecting the input image onto each of the Eigenfaces.
2. Determine whether the image is a face at all (known or unknown) by checking
whether the image is sufficiently close to the “face space”.
3. If it is a face, classify the weight pattern as either a known person or unknown.
4. Update the Eigenfaces or weights. If the same unknown face is seen several
times, calculate its characteristic weight pattern and incorporate it into the known
faces. This last step is not usually a requirement of every system, so it is left
optional and can be implemented when there is a requirement.
1216110074, 1216110080, 1216110091, 1216110094, 1216110109, 1216110124 Page | 30
AUTOMATED ATTENDANCE AND CONCENTRATION ANALYSIS SYSTEM
Let the training set of face images be Γ1, Γ2, …, ΓM. The average face of the set is defined
by,
Ψ = (1/M) ∑Γn, n = 1, …, M …………………………(ix)
Each face differs from the average by the vector,
Φi = Γi − Ψ …...…..………………………(x)
The vectors uk and scalars λk are the eigenvectors and eigenvalues, respectively, of the
covariance matrix,
C = (1/M) ∑Φn Φnᵀ = A Aᵀ, n = 1, …, M ………………………(xii)
where A = [Φ1 Φ2 … ΦM].
The matrix C, however, is N² x N², and determining its N² eigenvectors and eigenvalues is
an intractable task for typical image sizes. A computationally feasible method must be found
to calculate these eigenvectors. If the number of data points in the image space is M
(M < N²), there will be only M − 1 meaningful eigenvectors, rather than N². The
eigenvectors can be determined by solving a much smaller matrix of order M x M, which
reduces the computation from the order of the number of pixels (N²) to the order of the
number of training images (M). Therefore we construct the matrix L,
L = Aᵀ A ……………………………….(xiii)
where,
Lmn = Φmᵀ Φn …………………………..(xiv)
and find the M eigenvectors vl of L. These vectors determine linear combinations of the M
training set face images that form the Eigenfaces ul:
ul = ∑vlk Φk, k = 1, …, M ……………………………..(xv)
where l = 1, …, M.
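The dimensionality trick of equations (xiii)-(xv) can be checked numerically. The project works in MATLAB; the sketch below is an illustrative Python/NumPy version with hypothetical random data, verifying that eigenvectors of the small M x M matrix L = AᵀA, mapped through A, are indeed eigenvectors of the large covariance matrix C = AAᵀ.

```python
import numpy as np

# Hypothetical data: M = 6 flattened training images with N^2 = 100 pixels.
rng = np.random.default_rng(1)
Gamma = rng.random((6, 100))

Psi = Gamma.mean(axis=0)        # average face, eq. (ix)
Phi = Gamma - Psi               # difference faces, eq. (x)
A = Phi.T                       # N^2 x M matrix [Phi_1 ... Phi_M]

# Eigendecompose the small M x M matrix L = A^T A (eqs. (xiii)-(xiv))
# instead of the huge N^2 x N^2 covariance matrix C = A A^T.
eigvals, V = np.linalg.eigh(A.T @ A)

# Discard the near-zero eigenvalue: only M - 1 eigenvectors are meaningful.
keep = eigvals > 1e-10
eigvals, V = eigvals[keep], V[:, keep]

# Map each small eigenvector v_l up to an eigenface u_l = A v_l, eq. (xv).
U = A @ V
U /= np.linalg.norm(U, axis=0)  # normalize each eigenface to unit length

# Verify: the columns of U are eigenvectors of C with the same eigenvalues.
C = A @ A.T
print(np.allclose(C @ U, U * eigvals))   # True
```

The check works because C(Av) = A(AᵀA)v = A(Lv) = λ(Av), so every eigenvector of L yields an eigenvector of C at a fraction of the cost.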
Once the Eigenfaces are created, identification becomes a pattern recognition task. The
Eigenfaces span an M′-dimensional subspace of the original N²-dimensional image space.
The M′ significant eigenvectors of the L matrix are chosen as those with the largest
associated eigenvalues. In the test cases, based on M = 6 face images, M′ = 4 Eigenfaces
were used. The number of Eigenfaces to be used is chosen heuristically based on the
eigenvalues. A new face image Γ is transformed into its Eigenface components (projected
into "face space") by the simple operation,
ωk = ukᵀ (Γ − Ψ) ……………………………..(xvi)
where, k = 1, …, M′.
This describes a set of point-by-point image multiplications and summations. The figures
show three images and their projections into the seven-dimensional face space. The weights
form a vector,
Ωᵀ = [ω1 ω2 … ωM′] ……………………….(xvii)
that describes the contribution of each Eigenface in representing the input face image,
treating the Eigenfaces as a basis set for face images. The vector is used to find which of a
number of predefined face classes, if any, best describes the face. The simplest method for
determining which face class provides the best description of an input face image is to find
the face class k that minimizes the Euclidean distance,
εk = ||Ω − Ωk|| ……………………………..(xix)
A face is classified as belonging to class k when the minimum εk is below some chosen
threshold θε; otherwise the face is classified as "unknown". The distance threshold θε is half
the largest distance between any two face classes, which can be expressed mathematically
as,
θε = ½ max||Ωj − Ωk|| ..…………………….(xx)
where j, k = 1, …, M.
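The classification rule of equations (xix)-(xx) amounts to a nearest-class search in face space. A minimal Python sketch, with hypothetical weight vectors (M′ = 4) for three hypothetical known classes:

```python
import numpy as np

# Hypothetical face-space weight vectors Omega_k for three known classes.
known = {
    "alice": np.array([0.0, 0.0, 0.0, 0.0]),
    "bob":   np.array([1.0, 0.0, 0.0, 0.0]),
    "carol": np.array([0.0, 1.0, 0.0, 0.0]),
}

# theta_eps: half the largest distance between any two face classes, eq. (xx).
theta = 0.5 * max(np.linalg.norm(a - b)
                  for a in known.values() for b in known.values())

def classify(omega):
    """Pick the class k minimizing eps_k = ||omega - Omega_k||, eq. (xix);
    report 'unknown' when even the closest class is farther than theta."""
    name, eps = min(((n, np.linalg.norm(omega - w)) for n, w in known.items()),
                    key=lambda item: item[1])
    return name if eps <= theta else "unknown"

print(classify(known["bob"] + 0.01))    # "bob": slightly perturbed known face
print(classify(known["bob"] + 10.0))    # "unknown": far from every known class
```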
For simplicity, we measure the concentration of a single student per trial. For concentration
analysis, the following steps are performed:
1. Eye Tracking
o Track the eyes in detected faces to identify their view point with respect
to the camera/blackboard.
2. Concentration Quotient Calculation
o Calculate the number of eye-blinks of the student per predefined number
of frames.
o Compare each new set of total blinks with the previous set of total blinks.
o Calculate the concentration percentage using the above collected data.
Based on the concentration percentage, we determine whether the student's concentration
increases or decreases. The main steps are explained below.
First, we detect the user's face with a Haar Cascade Classifier. We again use the Viola-Jones
algorithm, this time to detect the ROI of the student, i.e. the eye pair. It works almost the
same as described previously; the only difference is that it now detects the eye pair instead
of a face. As shown in Figure 3.8, a Haar feature that resembles the eye region, which is
darker than the upper cheeks, is applied to the face.
Our main motive is to calculate the number of eye blinks of the student per predefined
number of frames. For this purpose, thresholding plays the major role. The eye-pair image
is converted into binary format using a threshold chosen according to the illumination. Our
system gives the best results at a threshold of 55.
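The thresholding step can be sketched as follows. The real system processes webcam frames in MATLAB; this illustrative Python/NumPy version uses tiny synthetic frames, with the threshold of 55 reported above.

```python
import numpy as np

THRESHOLD = 55   # threshold reported above to give the best results

def has_white_objects(frame, threshold=THRESHOLD):
    """Binarize an 8-bit grayscale eye-pair frame: pixels brighter than the
    threshold become white. Open eyes leave white regions (the sclera);
    closed eyes give an all-black image, signalling a possible blink."""
    return bool((frame > threshold).any())

# Hypothetical tiny frames standing in for webcam eye-pair images.
open_eye = np.array([[30, 200, 40],
                     [35, 210, 45],
                     [30,  50, 40]], dtype=np.uint8)   # bright sclera pixels
closed_eye = np.full((3, 3), 30, dtype=np.uint8)       # uniformly dark eyelid

print(has_white_objects(open_eye))     # True  -> eyes open
print(has_white_objects(closed_eye))   # False -> eyes closed
```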
When the eyes are closed, the binary image is completely black, while when the eyes are
open, some white objects are visible, as shown in the figure. This forms the basis of blink
detection and is used to calculate the value of s in the algorithm.
In the next step, we compare each new set of total blinks with the previous set of total blinks.
From this collected data the concentration percentage is calculated, which is given by the
formula,
Based on the concentration percentage, we determine whether the student's concentration
increases or decreases.
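The report's exact concentration formula is not reproduced in this excerpt, so the sketch below is a hypothetical stand-in: it counts blinks as open-to-closed transitions within a frame set, then compares consecutive frame sets, assuming (as an illustration only) that more blinking indicates lower concentration.

```python
def count_blinks(open_flags):
    """Count blinks in one frame set: each open -> closed transition of the
    binarized eye state counts as one blink."""
    return sum(1 for prev, cur in zip(open_flags, open_flags[1:])
               if prev and not cur)

def concentration_trend(prev_blinks, new_blinks):
    """Compare consecutive frame sets, under the hypothetical assumption
    that more blinking per frame set indicates lower concentration."""
    if new_blinks > prev_blinks:
        return "decreasing"
    if new_blinks < prev_blinks:
        return "increasing"
    return "steady"

# One frame set: True = eyes open in that frame, False = eyes closed.
frames = [True, True, False, True, False, True, True]
print(count_blinks(frames))              # 2
print(concentration_trend(1, 2))         # "decreasing"
```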
A Data Flow Diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modeling its process aspects. A DFD is often used as a preliminary step
to create an overview of the system, which can later be elaborated. DFDs can also be used
for the visualization of data processing (structured design).
3.4 Advantages
There are various advantages of our system, as follows:
1. Reduced errors: Time and attendance software reduces the risk of human error
and ensures an easy, impartial, and orderly approach to addressing specific needs
without any confusion. In fact, time and attendance software has been shown to
have an accuracy rate of more than 99% versus manual systems, by eliminating
errors in data entry and calculations.
2. Increased productivity: Productivity increases because the process is seamless
and makes day-to-day operations more efficient and convenient.
3. Reduced manual work: As the system is automated, it does not require resources
like a handwritten record of students' attendance; instead, the record is
maintained in the database.
4. The system has lower hardware requirements than other biometric systems, such
as RFID-based ones. It does not require additional components like a
microcontroller; it works with just a camera and a computer.
5. As the system uses fewer resources, its cost is lower.
6. The system also reduces human effort.
7. The system not only marks attendance but also checks the concentration of a
person in the class.
8. The system uses facial recognition technology and can further be used in various
applications, such as surveillance or checking the concentration of a person
while driving.
9. The system is efficient and works well under ideal conditions.
10. The system works in real time.
Software Requirements:
CHAPTER 4
EXPERIMENTAL RESULTS
Here, face detection is performed using the cascade object detector with the Viola-Jones
algorithm. We use a bounding box to mark the faces in the image, detecting the face of each
and every person. The result of using Viola-Jones is efficient, as it detects all the faces in
the images.
In this result, we used the Viola-Jones algorithm to detect faces in a real-time, ongoing
video. The faces are marked using a rectangular annotation with the label “face”.
We have applied our face recognition algorithm, PCA, to match the training images to the
test image. Here we used the KEC database, which was created for testing the algorithm.
The database has proper illumination; the algorithm gives efficient results and recognizes
most of the images correctly.
We have also applied our algorithm on the standard database, which is properly illuminated.
The results we got using this database were very good: the recognition percentage is 100%.
Figure 4.5 Comparisons between Execution Time of Recognition Module vs. Number of Faces
(x-axis: No. of Persons, 10 to 40; y-axis: Average Time Per Person, 0 to 3)
This graph shows that as the number of persons in the database increases, the average
computation time for recognition also increases.
This table provides, for each database, the results of matching the training images with the
test images, along with the average time per match for each database.
In this result of the concentration analysis, we have detected the eyes by applying Viola-
Jones in real time.
CHAPTER 5
CONCLUSION
We have designed a real-time automated attendance system which reduces the time and
resources required to take attendance manually. The system uses face detection and
recognition technology. It also tells us whether a student is concentrating in class by
calculating the person's concentration. Various efficient algorithms are used to obtain the
desired results. The system works well under ideal conditions, and further improvements
can be made for non-ideal conditions, such as improper illumination or lighting.
SCOPE:
REFERENCES
[2]. Paul Viola and Michael J. Jones, “Robust Real-Time Face Detection”,
International Journal of Computer Vision, Vol. 57, No. 2, pp. 137-154, 2004.
[3]. William Robson Schwartz, Huimin Guo, Jonghyun Choi, Larry S. Davis, “Face
Identification Using Large Feature Sets”, IEEE Transactions on Image
Processing, Vol. 21, No. 4, 2012.
[5]. Matthew A. Turk and Alex P. Pentland, “Face Recognition Using Eigenfaces”,
Proceedings CVPR '91, IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, pp. 586-591, 1991.
[6]. Yi-Qing Wang, “An Analysis of the Viola-Jones Face Detection Algorithm”,
Image Processing On Line, Vol. 4, pp. 128-148, 2014.
[7]. Tarik Crnovrsanin, Yang Wang, Kwan-Liu Ma, “Stimulating a Blink: Reduction
of Eye Fatigue with Visual Stimulus”, Conference on Human Factors in
Computing Systems, pp. 2055-2064, 2014.
[8]. Patrik Polatsek, “Eye Blink Detection”, Proceedings of the 9th Student Research
Conference in Informatics and Information Technologies, Bratislava, Slovakia,
STU, 2013.
[9]. Deepak Ghimire, Joonwhoan Lee, “A Robust Face Detection Method Based on
Skin Color and Edges”, Journal of Information Processing Systems, Vol. 9, 2013.
[11]. Richard M. Jiang, Abdul H. Sadka, Huiyu Zhou, “An Automatic Human Face
Detection Method”, International Workshop on Content-Based Multimedia
Indexing, IEEE, 2008.
APPENDIX
1. vision.CascadeObjectDetector()
It uses the Viola-Jones algorithm to detect people's faces, eyes, mouths and upper
bodies.
2. step()
3. eig()
4. imfill()
5. bwareaopen()
6. regionprops()
7. set()
8. insertObjectAnnotation()
9. imread()
10. imshow()
It displays an image.
11. imcrop()
It creates an interactive cropping tool associated with the image displayed in the
current figure, called the target image.
12. imresize()
13. mean()
14. load()
15. save()
16. get()
17. cla()
It deletes from the current axes all graphics objects whose handles are not hidden.
18. peekdata()
19. strcat()
20. strcmp()