
POINT DETECTION

Point detectors are used to find interest points in images. By interest point we simply mean any point in the image for which the signal changes two-dimensionally. Interest points have been used in the context of motion, stereo, and tracking problems. A desirable quality of an interest point is its invariance to changes in illumination and camera viewpoint.
The commonly used interest point detectors are Moravec's interest operator, the Harris interest point detector, the KLT detector, and the SIFT detector.
MORAVEC'S INTEREST OPERATOR
The Moravec interest operator identifies "interesting" points within the image. These points usually correspond to corners of objects, as corners are good points to detect and track. The technique first calculates an edge map based on the horizontal, vertical, and both diagonal directions, and records the maximum response over these edges. Once all maximal edge responses have been calculated, a point that is maximal with respect to its neighbors is identified as "interesting" and marked. In simple terms, the operator computes the variation of image intensities in a 4×4 patch in the horizontal, vertical, diagonal, and anti-diagonal directions and selects the minimum of the four variations as the representative value for the window. A point is declared interesting if its intensity variation is a local maximum in a 12×12 patch.
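As a concrete illustration, here is a minimal NumPy sketch of the operator as described above. The 4×4 variation patch and 12×12 non-maximum window follow the text; the function name, threshold value, and border handling are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def moravec(image, window=4, nms=12, threshold=100.0):
    """Minimal Moravec operator sketch (threshold and borders are illustrative)."""
    img = image.astype(np.float64)
    h, w = img.shape
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, diagonal, anti-diagonal
    r = window // 2
    response = np.zeros_like(img)
    for y in range(r + 1, h - r - 1):
        for x in range(r + 1, w - r - 1):
            patch = img[y - r:y + r, x - r:x + r]
            # intensity variation (sum of squared differences) for each shift direction
            variations = [np.sum((img[y - r + dy:y + r + dy,
                                      x - r + dx:x + r + dx] - patch) ** 2)
                          for dy, dx in shifts]
            response[y, x] = min(variations)  # minimum of the four variations
    # a point is "interesting" if its response is the local maximum in an nms x nms patch
    corners = []
    k = nms // 2
    for y in range(k, h - k):
        for x in range(k, w - k):
            v = response[y, x]
            if v > threshold and v == response[y - k:y + k + 1, x - k:x + k + 1].max():
                corners.append((x, y))
    return corners
```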

[Figure: source image and detected points]

HARRIS INTEREST POINT DETECTOR


The Harris detector computes the first-order image derivatives, (Ix, Iy), in the x and y directions to highlight the directional intensity variations. A second moment matrix M, which encodes this variation, is then evaluated for each pixel over a small neighborhood W:

M = Σ_W  | Ix·Ix   Ix·Iy |
         | Ix·Iy   Iy·Iy |

An interest point is identified using the determinant and the trace of M, which measure the variation in a local neighborhood: R = det(M) − k · tr(M)², where k is a constant. The interest points are marked by thresholding R after applying non-maxima suppression. The Harris detector improves on the approach of Moravec by using the autocorrelation matrix M; the use of discrete directions and discrete shifts is thus avoided. Interest points are detected where the autocorrelation matrix M has two significant eigenvalues.

[Figure: source image and detected points]

SIFT (Scale Invariant Feature Transform)


Corner detectors like Harris are rotation-invariant: even if the image is rotated, we can find the same corners, since corners remain corners in the rotated image. But what about scaling? A corner may not be a corner if the image is scaled. The Harris corner detector is not scale-invariant.
So, in 2004, D. Lowe of the University of British Columbia came up with a new algorithm, the Scale Invariant Feature Transform (SIFT), in his paper "Distinctive Image Features from Scale-Invariant Keypoints", which extracts keypoints and computes their descriptors.
It is composed of four steps. First, a scale space is constructed by convolving the image with Gaussian filters at different scales. The convolved images are used to generate difference-of-Gaussians (DoG) images. Candidate interest points are then selected from the minima and maxima of the DoG images across scales. The next step refines the location of each candidate to sub-pixel accuracy by interpolating the DoG values of neighboring pixels. In the third step, low-contrast candidates, as well as candidates along edges, are eliminated. Finally, the remaining interest points are assigned orientations based on the peaks in the histograms of gradient directions in a small neighborhood around each candidate point. The SIFT detector generates a greater number of interest points than other interest point detectors, because interest points at different scales and different resolutions (pyramid) are accumulated.
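For practical use, OpenCV ships a SIFT implementation (in the main module since OpenCV 4.4). A minimal usage sketch, with an illustrative file name:

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # input image (path illustrative)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# each keypoint carries a location, scale, and orientation; descriptors are 128-D vectors
out = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("sift_keypoints.jpg", out)
```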

BAYES FILTER
The Bayes filter is a general probabilistic approach for estimating an unknown probability density function recursively over time, using incoming measurements and a mathematical process model. In computer science and robotics, a Bayes filter is an algorithm for calculating the probabilities of multiple beliefs to allow a robot to infer its position and orientation. Essentially, Bayes filters allow robots to continuously update their most likely position within a coordinate system based on the most recently acquired sensor data. The algorithm is recursive and consists of two parts: prediction and innovation. If the variables are linear and normally distributed, the Bayes filter becomes equal to the Kalman filter.
The true state is assumed to be an unobserved Markov process, and the measurements are the observed states of a hidden Markov model (HMM).

[Figure: Bayesian network of an HMM]

Because of the Markov assumption, the probability of the current true state given the immediately previous one is conditionally independent of the other earlier states:

p(x_k | x_0, ..., x_{k-1}) = p(x_k | x_{k-1})

Similarly, the measurement at the k-th timestep is dependent only upon the current state, and so is conditionally independent of all other states given the current state:

p(z_k | x_0, ..., x_k) = p(z_k | x_k)

Using these assumptions, the probability distribution over all states of the HMM can be written simply as:

p(x_0, ..., x_k, z_1, ..., z_k) = p(x_0) ∏_{i=1}^{k} p(z_i | x_i) p(x_i | x_{i-1})

However, when using the Kalman filter to estimate the state x, the probability distribution of interest is that of the current state conditioned on the measurements up to the current time step. (This is achieved by marginalising out the previous states and dividing by the probability of the measurement set.)
This leads to the predict and update steps of the Kalman filter written probabilistically.
The probability distribution associated with the predicted state is the sum (integral) of the products of the probability distribution associated with the transition from the (k − 1)-th timestep to the k-th and the probability distribution associated with the previous state, over all possible x_{k-1}:

p(x_k | z_1, ..., z_{k-1}) = ∫ p(x_k | x_{k-1}) p(x_{k-1} | z_1, ..., z_{k-1}) dx_{k-1}

The probability distribution of the update is proportional to the product of the measurement likelihood and the predicted state:

p(x_k | z_1, ..., z_k) ∝ p(z_k | x_k) p(x_k | z_1, ..., z_{k-1})

The denominator p(z_k | z_1, ..., z_{k-1}) is constant relative to x, so we can always substitute it with a coefficient α, which can usually be ignored in practice. The numerator can be calculated and then simply normalized, since its integral must be unity.
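A discrete (histogram) Bayes filter makes the predict and update equations above concrete. The following is a toy sketch; the three-cell state space, transition matrix, and likelihood values are all illustrative assumptions.

```python
import numpy as np

def predict(belief, transition):
    """p(x_k | z_{1:k-1}) = sum over x_{k-1} of p(x_k | x_{k-1}) * p(x_{k-1} | z_{1:k-1})."""
    return transition.T @ belief

def update(belief_pred, likelihood):
    """p(x_k | z_{1:k}) ∝ p(z_k | x_k) * p(x_k | z_{1:k-1}); normalize rather than compute the denominator."""
    posterior = likelihood * belief_pred
    return posterior / posterior.sum()

# toy example: a robot in a 3-cell corridor (all numbers illustrative)
belief = np.array([1/3, 1/3, 1/3])            # uniform prior
transition = np.array([[0.8, 0.2, 0.0],       # row i holds p(x_k = j | x_{k-1} = i)
                       [0.1, 0.8, 0.1],
                       [0.0, 0.2, 0.8]])
likelihood = np.array([0.1, 0.7, 0.2])        # p(z_k | x_k) for the observed measurement
belief = update(predict(belief, transition), likelihood)
```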

KERNEL TRACKING
Kernel tracking is an approach to target representation and localization, the central component in visual tracking of non-rigid objects. The feature-histogram-based target representation is regularized by spatial masking with an isotropic kernel. The masking induces spatially smooth similarity functions suitable for gradient-based optimization, so the target localization problem can be formulated in terms of the basin of attraction of the local maxima. A metric derived from the Bhattacharyya coefficient serves as the similarity measure, and the mean shift procedure performs the optimization.
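As a small illustration of this similarity measure, the Bhattacharyya coefficient between two normalized histograms, and the distance derived from it, can be computed as follows (a minimal sketch, assuming NumPy):

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """Similarity between target histogram p and candidate histogram q (both sum to 1)."""
    return np.sum(np.sqrt(p * q))

def bhattacharyya_distance(p, q):
    """The metric derived from the coefficient; minimized by the mean shift search."""
    return np.sqrt(1.0 - bhattacharyya_coefficient(p, q))
```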
Kernel tracking is typically performed by computing the motion of the object, represented by a primitive object region, from one frame to the next. The object motion is generally in the form of parametric motion (translation, conformal, affine, etc.) or a dense flow field computed in subsequent frames. These algorithms differ in terms of the appearance representation used, the number of objects tracked, and the method used to estimate the object motion.

Tracking Using Template and Density-Based Appearance Models. Templates and density-based appearance models have been widely used because of their relative simplicity and low computational cost. The trackers in this category can be divided into two subcategories based on whether the objects are tracked individually or jointly.
1. Tracking single objects. The most common approach is template matching. Template matching is a brute-force method of searching the image, Iw, for a region similar to the object template, Ot, defined in the previous frame. The position of the template in the current image is computed by a similarity measure (see the sketch after this list). A limitation of template matching is its high computational cost due to the brute-force search. To reduce the computational cost, researchers usually limit the object search to the vicinity of the object's previous position. Instead of templates, other object representations can be used for tracking; for instance, color histograms or mixture models can be computed from the appearance of pixels inside rectangular or ellipsoidal regions.
Jepson et al. [2003] propose an object tracker that tracks an object as a three-component mixture, consisting of stable appearance features, transient features, and a noise process. The stable component identifies the most reliable appearance for motion estimation, that is, the regions of the object whose appearance does not quickly change over time. The transient component identifies the quickly changing pixels. The noise component handles the outliers in the object appearance that arise due to noise.
2. Tracking multiple objects. Modeling objects individually does not take into account the interaction between multiple objects, and between objects and the background, during the course of tracking. An example interaction between objects is one object partially or completely occluding another. The tracking methods given in the following model the complete image, that is, the background and all moving objects are explicitly tracked.
An object tracking method based on modeling the whole image, It, as a set of layers has been proposed. This representation includes a single background layer and one layer for each object. Each layer consists of a shape prior (ellipse), a motion model (translation and rotation), and a layer appearance. Layering is performed by first compensating the background motion, modeled as projective motion, so that the object's motion can be estimated from the compensated image using 2D parametric motion. Then, each pixel's probability of belonging to a layer (object), pi, is computed based on the object's previous motion and shape characteristics. Any pixel far from a layer is assigned a uniform background probability, pb. Later, the object's appearance (intensity, color) probability pa is coupled with pi to obtain the final layer estimate.
However, due to the difficulty of simultaneously estimating all the parameters, the authors estimate one set individually while fixing the others. For instance, they first estimate layer ownership using the intensity of each pixel, then they estimate the motion (rotation and translation) using the appearance probabilities, and finally update layer ownership using this motion. The unknowns for each object are iteratively estimated until the layer ownership probabilities are maximized.
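As referenced in the single-object tracking discussion above, here is a minimal template-matching sketch using OpenCV's matchTemplate with normalized cross-correlation as the similarity measure; the file names are illustrative.

```python
import cv2

frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)        # current image I_w (path illustrative)
template = cv2.imread("template.jpg", cv2.IMREAD_GRAYSCALE)  # object template O_t from the previous frame
# normalized cross-correlation as the similarity measure
scores = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)
x, y = max_loc  # top-left corner of the best-matching region
# in practice, the search is restricted to the vicinity of the object's previous position
```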
