ABSTRACT
With the growth of digital technologies, an ever-increasing amount of visual data is created and stored. This explosive growth of image data has driven the growth and development of content-based image retrieval (CBIR). However, research in this area shows that there is a semantic gap between image semantics and content-based image retrieval. To bridge this gap, automatic image annotation (AIA) techniques are used. In this paper, we focus on an automatic image annotation technique that extracts features using the scale-invariant feature transform (SIFT). SIFT is a computer vision algorithm employed to extract keypoints and their feature descriptors. Visual words are then constructed using k-means clustering. The accuracy of a registered image depends on accurate feature detection and matching; hence, SIFT is a robust and accurate method for automatic image registration.
Keywords: AIA (automatic image annotation); BoW (bag-of-words model); DoG (difference of Gaussians); SIFT (scale-invariant feature transform); visterms (visual words)
1. INTRODUCTION
With the advent of digital technologies, the number of digital images has been growing rapidly, and there is a need for effective and efficient tools to find visual information. A huge amount of information is available, and every day gigabytes of visual information are generated, transmitted, and stored. A large amount of research has been carried out in the image retrieval area. Systems using non-textual (image) queries have been proposed, but many users find it hard to represent their information needs using abstract image features. Most users prefer textual queries, which have usually been supported by manually providing keywords or captions.
2. RELATED WORK
In this section, we review some of the popular methods for automatic image annotation.
4. SIFT IMPLEMENTATION
The scale-invariant feature transform (SIFT) is used to extract distinctive local features from images that are invariant to image scale and rotation. The method was proposed by David Lowe in 1999 and has since been applied in many areas. Figure 1 shows the steps involved in detecting image features using SIFT [4].
Fig.1 Steps involved in detecting image features using SIFT: input image, keypoint detection, keypoint orientation assignment, and SIFT descriptor computation.
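As a minimal illustrative sketch (our own, not the authors' code), the whole pipeline in Fig.1 can be exercised with OpenCV's built-in SIFT implementation; the input file name is a placeholder and the printed counts are illustrative:

    import cv2

    # Load a test image in grayscale; "example.jpg" is a placeholder path.
    img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

    # Detect keypoints (location, scale, orientation) and compute their
    # 128-dimensional SIFT descriptors in one call.
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)

    print(len(keypoints), descriptors.shape)  # e.g. 1523 (1523, 128)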
4.1 Difference-of-Gaussian Computation
The scale space of an image is defined as the function L(x, y, σ) = G(x, y, σ) * I(x, y), where * is the convolution operation in x and y, G(x, y, σ) is a variable-scale Gaussian, and I(x, y) is the input image. Stable keypoint locations are detected using the difference-of-Gaussian function convolved with the image,
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ),
where k is a constant multiplicative factor separating two nearby scales.
Fig.3 For computing the difference-of-Gaussian images, the initial image of each octave of scale space is repeatedly convolved with Gaussians to produce a set of scale-space images, and adjacent images are then subtracted to produce the difference-of-Gaussian images. After each octave, the Gaussian image is down-sampled by a factor of 2, and the process is repeated.
The difference-of-Gaussian is a particularly efficient function to compute, as the smoothed images L need to be computed in any case for scale-space feature description, and D can therefore be computed by simple image subtraction, as stated in [5].
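A minimal sketch of this construction, assuming Lowe's default parameters (σ = 1.6, s = 3 intervals per octave) and, for simplicity, blurring each level directly from the octave base rather than incrementally:

    import cv2
    import numpy as np

    def dog_pyramid(img, num_octaves=4, s=3, sigma0=1.6):
        k = 2 ** (1.0 / s)  # constant factor between adjacent scales
        octaves = []
        base = img.astype(np.float32)
        for _ in range(num_octaves):
            # L(x, y, sigma) = G(x, y, sigma) * I(x, y) for each level
            gaussians = [cv2.GaussianBlur(base, (0, 0), sigma0 * k ** i)
                         for i in range(s + 3)]
            # D = L(x, y, k*sigma) - L(x, y, sigma): simple subtraction
            octaves.append([g2 - g1
                            for g1, g2 in zip(gaussians, gaussians[1:])])
            # down-sample by a factor of 2 for the next octave
            base = cv2.resize(gaussians[s],
                              (base.shape[1] // 2, base.shape[0] // 2),
                              interpolation=cv2.INTER_NEAREST)
        return octaves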
4.2 Extrema Detection
This stage finds extrema points in the DoG pyramid. In order to detect the local maxima and minima of D(x, y, σ), each sample point is compared to its eight neighbors in the current image and its nine neighbors in the scale above and below, i.e., 26 neighbors in total. A point is selected only if it is larger than all of these neighbors or smaller than all of them. The cost of this check is reasonably low because most sample points are eliminated after the first few comparisons. Fig.4 shows how sample points are compared [5].

Fig.4 Maxima and minima of the difference-of-Gaussian images are detected by comparing a pixel (marked with X) to its 26 neighbors in 3x3 regions at the current and adjacent scales (marked with circles).
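A direct sketch of this 26-neighbour test (function and variable names are our own):

    import numpy as np

    def is_extremum(d_below, d_mid, d_above, r, c):
        """Test pixel (r, c) of DoG image d_mid against its 26 neighbours."""
        block = np.stack([d[r - 1:r + 2, c - 1:c + 2]
                          for d in (d_below, d_mid, d_above)]).ravel()
        center = block[13]               # centre of the 3x3x3 block
        others = np.delete(block, 13)    # the 26 neighbours
        return center > others.max() or center < others.min()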
4.3 Keypoint Localization
Once a candidate keypoint has been found, it is refined using the Taylor expansion (up to the quadratic terms) of the scale-space function D(x, y, σ), shifted so that the origin is at the sample point:
D(X) = D + (∂D/∂X)ᵀ X + (1/2) Xᵀ (∂²D/∂X²) X
where D and its derivatives are evaluated at the sample point and X = (x, y, σ)ᵀ is the offset from this point. The location of the extremum, X̂, is determined by taking the derivative of this function with respect to X and setting it to zero, giving
X̂ = −(∂²D/∂X²)⁻¹ (∂D/∂X)
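A sketch of solving this equation numerically, approximating the gradient and 3x3 Hessian of D with central finite differences over a (scale, row, column) DoG array (the array layout and names are our own assumptions):

    import numpy as np

    def localize_offset(dog, s, r, c):
        """Return the offset X_hat = -(d2D/dX2)^-1 (dD/dX) at (s, r, c)."""
        # first derivatives in x (columns), y (rows) and scale
        g = 0.5 * np.array([
            dog[s, r, c + 1] - dog[s, r, c - 1],
            dog[s, r + 1, c] - dog[s, r - 1, c],
            dog[s + 1, r, c] - dog[s - 1, r, c],
        ])
        v = dog[s, r, c]
        # second derivatives for the Hessian
        dxx = dog[s, r, c + 1] + dog[s, r, c - 1] - 2 * v
        dyy = dog[s, r + 1, c] + dog[s, r - 1, c] - 2 * v
        dss = dog[s + 1, r, c] + dog[s - 1, r, c] - 2 * v
        dxy = 0.25 * (dog[s, r + 1, c + 1] - dog[s, r + 1, c - 1]
                      - dog[s, r - 1, c + 1] + dog[s, r - 1, c - 1])
        dxs = 0.25 * (dog[s + 1, r, c + 1] - dog[s + 1, r, c - 1]
                      - dog[s - 1, r, c + 1] + dog[s - 1, r, c - 1])
        dys = 0.25 * (dog[s + 1, r + 1, c] - dog[s + 1, r - 1, c]
                      - dog[s - 1, r + 1, c] + dog[s - 1, r - 1, c])
        H = np.array([[dxx, dxy, dxs],
                      [dxy, dyy, dys],
                      [dxs, dys, dss]])
        return -np.linalg.solve(H, g)  # offset (dx, dy, dsigma)

An offset larger than 0.5 in any dimension indicates that the extremum lies closer to a neighbouring sample point, in which case the sample point is changed and the fit repeated [5].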
Fig.5 Computing the keypoint descriptor: 16 histograms × 8 orientations = 128 features.
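A simplified sketch of this layout (our own, omitting the Gaussian weighting and trilinear interpolation of the full method): split a 16x16 patch around the keypoint into a 4x4 grid of cells, and let each cell vote its gradient orientations into an 8-bin histogram.

    import numpy as np

    def descriptor_128(patch):
        """patch: 16x16 float array of grayscale values around a keypoint."""
        gy, gx = np.gradient(patch)
        mag = np.hypot(gx, gy)
        ang = np.arctan2(gy, gx) % (2 * np.pi)   # orientation in [0, 2*pi)
        hist = np.zeros((4, 4, 8))
        for r in range(16):
            for c in range(16):
                b = int(ang[r, c] / (2 * np.pi) * 8) % 8
                hist[r // 4, c // 4, b] += mag[r, c]
        d = hist.ravel()                        # 16 histograms x 8 bins = 128
        return d / (np.linalg.norm(d) + 1e-12)  # normalise against illumination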
5. VISUAL WORD REPRESENTATION
The visual word representation is also called the bag-of-words (BoW) model, and is likewise known as the bag-of-features or bag-of-visterms model for object recognition. The bag-of-words model was first proposed in the text retrieval domain for text recognition. In image analysis, the bag-of-words model is based on vector quantization of local feature descriptors.
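A minimal sketch of the vocabulary construction, assuming scikit-learn's k-means and an illustrative vocabulary size of 500 words:

    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(descriptor_list, num_words=500):
        """descriptor_list: one (N_i x 128) SIFT descriptor array per image."""
        all_desc = np.vstack(descriptor_list)        # pool all descriptors
        kmeans = KMeans(n_clusters=num_words, n_init=10, random_state=0)
        kmeans.fit(all_desc)
        return kmeans   # cluster_centers_ are the visual words (visterms)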
Fig.6 Four steps for constructing the bag-of-words image representation: (1) keypoint detection (detect keypoints); (2) feature extraction (compute descriptors); (3) vector quantization (cluster descriptors); (4) bag of words (build histograms).
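A sketch of these four steps for a single image, reusing the fitted k-means model from the previous sketch (the names are our own assumptions):

    import cv2
    import numpy as np

    def bow_histogram(img, vocab):
        sift = cv2.SIFT_create()
        _, desc = sift.detectAndCompute(img, None)   # steps 1 and 2
        words = vocab.predict(desc)                  # step 3: nearest word
        hist = np.bincount(words, minlength=vocab.n_clusters)
        return hist / max(hist.sum(), 1)             # step 4: histogram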
6. TAG ASSIGNMENT
This section introduces the complete version of our implementation of automatic image annotation. It measures the correlation between visual words and labels (tags) and automatically annotates images. The effectiveness of the implementation is evaluated using a large set of training and testing images.
First, we build visual word distributions for all training images so that labels can be assigned to those distributions. As the labels need to be learned from the visual word distributions of all the training images, each training image is associated with all of its labels, which are stored in an annotation file for the training set.
Before training, the labels carry no knowledge of the visual word distribution. From the annotated dataset we therefore map the objects in an image to the label set and maintain a visual-word/label correlation. The visual words and label information are then trained together to obtain the correlation across the training set of images.
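One way to realise this correlation, sketched under our own assumptions about the data layout (per-image BoW histograms and tag lists; the paper does not specify the exact model):

    import numpy as np

    def train_correlation(histograms, tag_lists, labels):
        """Accumulate a visual-word x label co-occurrence matrix."""
        corr = np.zeros((len(histograms[0]), len(labels)))
        for hist, tags in zip(histograms, tag_lists):
            for tag in tags:
                corr[:, labels.index(tag)] += hist
        # column-normalise so frequent labels do not dominate
        return corr / (corr.sum(axis=0, keepdims=True) + 1e-12)

    def annotate(hist, corr, labels, top_k=5):
        """Score every label for a test histogram; return the best top_k."""
        scores = hist @ corr
        return [labels[i] for i in np.argsort(scores)[::-1][:top_k]]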
7. EXPERIMENTAL RESULTS
All of the experiments were based on the LabelMe dataset [8], in which each image is represented by high-dimensional visual and textual features. From this dataset we consider 1200 ground-truth images. We extracted the visual and textual features in the following ways.
Fig.7 Sample images with their original annotations and the corresponding automatic annotations.
The annotation quality is measured by the F-measure, F = 2.0 × precision × recall / (precision + recall); on this dataset the system obtains an F-measure of 0.933.
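For reference, the per-image F-measure can be computed as sketched below; the example tags are hypothetical:

    def f_measure(predicted, ground_truth):
        """F = 2 * precision * recall / (precision + recall) for tag sets."""
        tp = len(set(predicted) & set(ground_truth))
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(ground_truth) if ground_truth else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)

    # example: f_measure(["sky", "tree"], ["sky", "tree", "car"]) -> 0.8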
9. REFERENCES
1. Li, X., Chen, L., Zhang, L., Lin, F., and Ma, W.-Y., Image annotation by large-scale content-based image retrieval, 2006.
2. Wang, X.-J., Zhang, L., Jing, F., and Ma, W.-Y., AnnoSearch: Image auto-annotation by search, in IEEE Conference on Computer Vision and Pattern Recognition, New York, USA, 2006.
3. Kobus Barnard, Pinar Duygulu, David Forsyth, Nando de Freitas, David M. Blei, and Michael I. Jordan, Matching words and pictures, Journal of Machine Learning Research, 3:1107-1135, 2003.
4. Wikipedia, Scale-invariant feature transform, Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/w/index.php?title=Scale-invariant_feature_transform&oldid=304881559.
5. D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
6. Chih-Fong Tsai, Bag-of-words representation in image annotation, International Scholarly Research Network (ISRN) Artificial Intelligence, vol. 2012.
7. Christian Hentschel, Sebastian Stober, Andreas Nürnberger, and Marcin Detyniecki, Automatic image annotation using a visual dictionary based on reliable image segmentation, LNCS 4918, pp. 45-56, 2008.
8.