
International Journal of Computer Engineering & Technology (IJCET)

Volume 8, Issue 1, January-February 2017, pp. 19-29, Article ID: IJCET_08_01_003
Available online at
http://www.iaeme.com/IJCET/issues.asp?JType=IJCET&VType=8&IType=1
Journal Impact Factor (2016): 9.3590 (Calculated by GISI) www.jifactor.com
ISSN Print: 0976-6367 and ISSN Online: 0976-6375
IAEME Publication

TRACKING MULTI-TARGETS WITH UNIFIED HANDLING OF VIDEO

Dr. S. China Ramu
Department of Computer Science & Engineering,
Chaitanya Bharathi Institute of Technology, Hyderabad, India

Dr. D. Lakshmi Sreenivasa Reddy
Department of Computer Applications,
Chaitanya Bharathi Institute of Technology, Hyderabad, India

D. Vijaya Prasunna
Department of Computer Science & Engineering,
Chaitanya Bharathi Institute of Technology, Hyderabad, India

ABSTRACT
Data association is an essential component of a human detection and tracking system. Most existing methods, such as bipartite matching, incorporate only a limited temporal locality of the sequence into the data association problem. The GMCP tracker offers a more complete representation of the tracking problem: all pairwise relationships between the detections over the temporal span of a video are considered, so the input to data association is a complete k-partite graph in which each frame forms one partition. In this graph the track of a person forms a clique (a subgraph in which all the nodes are connected to each other). A cost is assigned to each clique, and the clique that optimizes the score function is selected as the best track. This selection is, however, sub-optimal: the GMCP tracker does not optimize all the tracks jointly but finds them one by one, which causes difficulties in detection and tracking when the background is cluttered or the scene is crowded.
Tracking-by-detection methods are used to track multiple targets with unified handling of complex scenarios, where the current detection responses are linked to the previous trajectories. Dummy nodes are added to each trajectory within the standard Hungarian algorithm so that a trajectory may temporarily disappear; the data association is thereby solved implicitly in a global manner even though it is formulated between two consecutive frames. If a trajectory fails to find its matching detection, it is linked to its corresponding dummy node until a matching detection emerges. Source nodes are also incorporated to account for new targets. The dummy nodes tend to accumulate in fake or disappeared trajectories, while they appear only occasionally in real trajectories. This helps to handle inevitable detection failures, which include missed detections, false detections and occlusion, where an object is partially or fully invisible because of the limited camera view. In the reported experiments, the Extended Hybrid Hungarian algorithm is more accurate than GMCP and at least as accurate as the Hybrid Hungarian algorithm. The improvement in tracking and detection holds for videos of different lengths and is largest for short videos.

http://www.iaeme.com/IJCET/index.asp editor@iaeme.com
Key words: Data Association, Human Tracking, GMCP, Hungarian algorithm.
Cite this Article: Dr. S. China Ramu, Dr. D. Lakshmi Sreenivasa Reddy and D. Vijaya Prasunna,
Tracking Multi-Targets with Unified Handling of Video. International Journal of Computer
Engineering & Technology, 8(1), 2017, pp. 19-29.
http://www.iaeme.com/IJCET/issues.asp?JType=IJCET&VType=8&IType=1

1. INTRODUCTION
The problem of finding the detections that correspond to one particular object across different frames of a video is called data association. The input to the data association problem for a sequence can ideally be represented by a graph in which all the detections in each frame are connected to all the detections in the other frames, regardless of their proximity in time, and the association can be computed using the Hungarian algorithm [8]. Similarly, the output can ideally be represented by several subgraphs of the input in which the detections belonging to a common entity are connected.
Identifying the ideal subgraphs that represent the exact solution to data association requires solving an optimization problem that remains unsolved because of its high complexity. One common approximation is to restrict the problem to a limited temporal locality, e.g. two or a few frames of the input graph, and to solve the optimization problem on the smaller and less complex subgraph.
This paper proposes an Extended Hungarian method of data association that incorporates both appearance and motion in a global way. The proposed framework incorporates the whole temporal span of the sequence into the data association problem, but focuses on one object at a time rather than addressing all of them simultaneously. This avoids an extremely complex optimization problem [1, 2]. Although the method solves the data association problem for one object at a time, it also incorporates all the other objects implicitly. Therefore, the approximation in the object domain is significantly less restrictive than those used by other approximate methods, such as limited-temporal-locality. The reason is that limited-temporal-locality methods are literally blind to the information outside the temporal neighborhood they focus on, whereas the proposed method incorporates the whole approximation domain, i.e. all objects, implicitly. Because of these fundamental differences, the proposed algorithm is expected to perform relatively better on the data association problem.

1.1. Tracker
The global data association framework is shown as a block diagram in Fig. 1. The first step is to detect the objects (humans) in each frame. Next, the input video is divided into a number of segments, and the tracklets of the pedestrians within each segment are found using the global method for tracklet generation. In the last step, the tracklets found in all the segments are merged to form the trajectory of each person over the course of the whole video. While the appearance of a pedestrian remains fairly consistent throughout a video, the pattern of motion tends to differ significantly between the short and the long term. In principle, it is difficult to model the motion of one person for a long duration without knowledge of the destination, the structure of the scene, the interactions between people, etc. However, the motion can be modeled sufficiently well using constant velocity or constant acceleration models over a short period of time. Therefore, the way motion is incorporated into the global data association process [6, 7] should differ between the short and the long term. This motivates a hierarchical approach, i.e. determining tracklets first and then merging them into full trajectories.
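The segmentation step of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_into_segments` and the segment length of 50 frames are assumptions chosen for the example, and only the frame count of 1000 comes from the Parking Lot description in Section 4.1.

```python
def split_into_segments(num_frames, frames_per_segment):
    """Return (start, end) frame-index ranges, with `end` exclusive.

    The last segment may be shorter when num_frames is not a multiple
    of frames_per_segment.
    """
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    return [
        (start, min(start + frames_per_segment, num_frames))
        for start in range(0, num_frames, frames_per_segment)
    ]


# Hypothetical setting: 1000 frames cut into segments of 50 frames each.
segments = split_into_segments(num_frames=1000, frames_per_segment=50)
```

Tracklets would then be searched for inside each `(start, end)` range independently, and merged across ranges in a later step.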


Figure 1 The block diagram of the proposed human tracking method.

2. FINDING TRACKLETS
The proposed method divides a video into $s$ segments of $f$ frames each. A data association method is then used to find tracklets that are globally consistent [12] in terms of motion and appearance over the course of each segment. The input to the data association problem for determining tracklets is a graph $G = (V, E, W)$, where $V$, $E$ and $W$ denote the set of nodes, the set of edges and the weights of the edges, respectively. The node set $V$ is divided into $f$ disjoint clusters. Each cluster represents one frame, and the nodes in it represent the object (human) detections in that particular frame. Let $C_i$, where $i \in \mathbb{N}: 1 \le i \le f$, denote frame (cluster) $i$, and let $u_a^i$ denote the $a$-th detection (node) in the $i$-th frame, where $\mathbb{N}$ is the set of natural numbers. Therefore $C_i = \{u_1^i, u_2^i, u_3^i, \ldots\}$. The edges of the graph are defined as $E = \{(u_a^i, u_b^j) \mid i \ne j\}$, which means that all the nodes in $G$ are connected to each other as long as they do not belong to the same cluster. A node $u_a^i$ is associated with a location feature $l_a^i$, the 2-dimensional spatial coordinates of the center of the corresponding detection, and an appearance feature $b_a^i$, the color histogram of the body part of the detection. The weight of an edge between two nodes, $w: E \to \mathbb{R}^+$, represents the similarity between the two corresponding detections:

\[
w(u_a^i, u_b^j) = k(b_a^i, b_b^j) \tag{1}
\]

where $k$ represents the histogram intersection kernel.
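As an illustration of Eq. 1, the sketch below computes an edge weight with the histogram intersection kernel, which is the sum of the bin-wise minima of two histograms. The dictionary layout of a detection (`frame`, `center`, `hist`) and the function names are assumptions made for this example only.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of bin-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def edge_weight(node_a, node_b):
    """Weight of the edge between two detections from different frames."""
    if node_a["frame"] == node_b["frame"]:
        # Nodes of the same cluster (frame) are not connected in G.
        raise ValueError("no edge between detections of the same frame")
    return histogram_intersection(node_a["hist"], node_b["hist"])


# Two hypothetical detections with 3-bin normalized color histograms.
u1 = {"frame": 1, "center": (10.0, 20.0), "hist": [0.5, 0.25, 0.25]}
u2 = {"frame": 2, "center": (12.0, 21.0), "hist": [0.25, 0.5, 0.25]}
weight = edge_weight(u1, u2)
```

For normalized histograms the kernel lies between 0 and 1, with 1 reached only when the two histograms are identical.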


Finding a tracklet requires identifying the detection of one particular person in each frame of a segment. Therefore, a feasible solution to this problem can be represented by a subgraph of $G$ in which one node (detection) is selected from each cluster (frame). Such a subgraph is called a feasible solution $G_{fs} = (V_{fs}, E_{fs}, W_{fs})$. Its node set has the general form $V_s = \{u_a^1, u_b^2, u_c^3, \ldots\}$, which means that the $a$-th node from the 1st cluster, the $b$-th node from the 2nd cluster, and so on are selected to be in $V_s$. By definition, $E_s = \{E(p, q) \mid p \in V_s, q \in V_s\}$ and $W_s = \{W(p, q) \mid p \in V_s, q \in V_s\}$. The feasible solution $G_{fs}$ represents the tracklet of one person; the detections of the remaining people in the segment are not part of it. Fig. 2 illustrates this for a small segment. The left column shows the detections of persons in each frame along with the graph $G$ they form. The middle column shows a feasible solution $G_{fs}$ along with the tracklet it forms. The right column shows the feasible solution with minimal cost along with the tracklet it forms.


Figure 2 Finding a tracklet for a small segment of 6 frames.
The set of nodes $V_s$ is sufficient to compose $G_{fs}$, so a feasible solution is denoted by its node set, written $F_s$, with $F_s(i)$ the node selected from cluster $i$. The appearance cost of the feasible solution $F_s$ is defined as:

\[
\mathrm{appearance}(F_s) = \frac{1}{2} \sum_{i=1}^{f} \sum_{j=1,\, j \ne i}^{f} w(F_s(i), F_s(j)) \tag{2}
\]
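A direct transcription of Eq. 2 is sketched below. The weight function `w` is passed in as a parameter, so the sketch makes no assumption about how edge weights are obtained; the scalar descriptors and the absolute-difference weight in the usage lines are purely hypothetical stand-ins for real detections.

```python
def appearance_cost(solution, w):
    """Eq. 2: half the sum of w over all ordered pairs (i, j) with i != j.

    `solution` holds one selected node per frame, and `w(p, q)` returns
    the weight of the edge between nodes p and q.
    """
    f = len(solution)
    total = 0.0
    for i in range(f):
        for j in range(f):
            if i != j:
                total += w(solution[i], solution[j])
    # Each unordered pair was visited twice, hence the factor 1/2.
    return 0.5 * total


# Hypothetical example: scalar descriptors compared by absolute difference.
toy_solution = [1.0, 2.0, 4.0]
toy_cost = appearance_cost(toy_solution, lambda p, q: abs(p - q))
```

Because every pair of selected nodes contributes, a single badly matching detection raises the cost through all of its edges, not only through the edges to its temporal neighbors.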

Eq. 2 is the cost of the complete graph induced by the nodes in $F_s$. It is a global cost function based on comparing all pairs of detections in the feasible solution, no matter how close they are temporally. This rests on the assumption that the appearance of people does not change drastically within a segment. Overlapping bounding boxes, occlusion, noisy descriptors, background domination, etc. in part of a trajectory can potentially cause failures in limited-temporal-locality methods.
By finding the feasible solution with the minimum appearance cost, i.e. $\arg\min_{V_s}(\mathrm{appearance}(F_s))$, the tracklet of the person with the most stable color histogram features in the segment is found.
Generalized Minimum Clique Problem (GMCP). Generalized graph problems, also known as Generalized Network Design Problems [10], are a class of problems that are commonly built on generalizing the standard subgraph problems. The generalization is commonly done by expanding the definition of a node to a cluster of nodes. For example, the objective in the standard Traveling Salesman Problem (TSP) is to find the minimal Hamiltonian cycle, which visits every node of the input graph exactly once. In the Generalized Traveling Salesman Problem, the nodes of the input graph are grouped into disjoint clusters, and the objective is to find the minimal cycle that visits every cluster exactly once.
Similarly, in the Generalized Minimum Clique Problem [3] the nodes of the input graph are grouped into disjoint clusters. The objective is to find a subset of the nodes that includes exactly one node from each cluster while minimizing the cost of the complete graph that the selected nodes induce [15].
For a more formal definition of GMCP with Hungarian, assume a graph $G = (V, E, W)$, where $G$ is undirected and weighted, $V$ is the set of all nodes, $E$ is the set of edges and $W: E \to \mathbb{R}^+$ gives the weight of each edge. The set of nodes $V$ is divided into $f$ clusters $C_1, C_2, \ldots, C_f$ that are pairwise disjoint and cover $V$: $C_1 \cup C_2 \cup \ldots \cup C_f = V$ and $C_i \cap C_j = \emptyset$ $(1 \le i \ne j \le f)$. A feasible solution to a GMCP with Hungarian instance is a subgraph $G_{fs} = (V_{fs}, E_{fs}, W_{fs})$, where $V_{fs}$ is a subset of $V$ that contains exactly one node from each cluster, $E_{fs}$ is the subset of $E$ induced by $V_{fs}$, and $W_{fs}$ is the set of corresponding weights from $W$. The objective is to find the feasible solution with minimal cost, where the cost is defined as the sum of all the weights in the solution subgraph.
In this formulation, an edge in $E$ exists for every pair of nodes of $V$ that do not belong to the same cluster. Therefore, every feasible solution $G_{fs}$ is a complete subgraph, i.e. a clique.
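To make the definition concrete, the following sketch solves a tiny GMCP instance by exhaustive enumeration: it tries every choice of one node per cluster and keeps the cheapest clique. This is only an illustration of the problem statement under assumed toy data; the number of candidates grows exponentially with the number of clusters, and it is not the solver used in the paper.

```python
from itertools import combinations, product


def clique_cost(nodes, w):
    """Sum of edge weights of the complete subgraph induced by `nodes`."""
    return sum(w(p, q) for p, q in combinations(nodes, 2))


def solve_gmcp_bruteforce(clusters, w):
    """Try every choice of one node per cluster; return (best_nodes, cost)."""
    best_nodes, best_cost = None, float("inf")
    for candidate in product(*clusters):
        cost = clique_cost(candidate, w)
        if cost < best_cost:
            best_nodes, best_cost = candidate, cost
    return best_nodes, best_cost


# Hypothetical instance: three clusters (frames) with two scalar
# descriptors each, compared by absolute difference.
toy_clusters = [[0.0, 10.0], [1.0, 9.0], [0.5, 11.0]]
best_nodes, best_cost = solve_gmcp_bruteforce(
    toy_clusters, lambda p, q: abs(p - q)
)
```

For a symmetric weight function, `clique_cost` equals the value of Eq. 2, since summing over unordered pairs is the same as halving the sum over ordered pairs.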
Comparing this formulation with the data association problem of Eq. 2 shows that GMCP solves the same optimization problem, provided that the input graph $G$ is formed as described above for finding a tracklet.


Therefore, solving GMCP on the graph $G$ yields the optimal solution, i.e. the feasible solution whose appearance features are most consistent over the course of the segment, $\arg\min_{V_s}(\mathrm{appearance}(F_s))$.
To incorporate motion as well as appearance into the data association problem, one more term is added to the cost function, and global data association is defined as the following optimization problem:

\[
\hat{V}_s = \arg\min_{V_s} \big( \mathrm{appearance}(F_s) + \alpha \cdot \mathrm{motion}(F_s) \big) \tag{3}
\]

where $\hat{V}_s$ is the optimal solution that determines the data association for one tracklet and $\alpha$ is the mixture constant that balances the contributions of appearance and motion.
Finding $\hat{V}_s$ by solving Eq. 3 yields the tracklet of one person in the segment. Therefore, the optimization problem of Eq. 3 has to be solved several times in each segment in order to find the tracklets of all the pedestrians. The first time Eq. 3 is solved, the algorithm finds the tracklet with the least total cost, i.e. the one with the most stable appearance features and the motion most consistent with the model [13]. Then the vertices selected in $\hat{V}_s$ are excluded from $G$ and the optimization is repeated to find the tracklet of the next person, and so on. This process is repeated until zero or only a few nodes remain in $G$.
Since the algorithm first finds the tracklets that are most stable and consistent in appearance and motion, the tracklets that are less likely to be confused are computed and excluded from $G$ first. Therefore, the GMCP with Hungarian method does not lower the chance of successful extraction for the tracklets found in the last iterations. The global motion-cost model, which defines the term $\mathrm{motion}(F_s)$, is described next.
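The one-by-one extraction described above can be sketched as a loop that solves for the cheapest clique, removes its nodes from the graph and repeats. The sketch makes two simplifying assumptions that are not taken from the paper: the cheapest clique is found by exhaustive search, and the loop stops as soon as any cluster runs out of nodes (no dummy nodes are used here). The names and toy data are hypothetical.

```python
from itertools import combinations, product


def best_clique(clusters, w):
    """Exhaustively pick one node per cluster with minimal total edge weight."""
    return min(
        product(*clusters),
        key=lambda nodes: sum(w(p, q) for p, q in combinations(nodes, 2)),
    )


def extract_tracklets(clusters, w):
    """Repeatedly find one tracklet, then remove its nodes from the graph."""
    remaining = [list(cluster) for cluster in clusters]  # work on a copy
    tracklets = []
    # Stop when there are no clusters or some cluster has no nodes left.
    while remaining and all(remaining):
        chosen = best_clique(remaining, w)
        tracklets.append(chosen)
        for cluster, node in zip(remaining, chosen):
            cluster.remove(node)
    return tracklets


# Hypothetical segment with three frames and two persons per frame,
# described by scalar features compared by absolute difference.
toy_clusters = [[0.0, 10.0], [1.0, 9.0], [0.5, 11.0]]
toy_tracklets = extract_tracklets(toy_clusters, lambda p, q: abs(p - q))
```

In this toy run the tighter group of descriptors is extracted first, which mirrors the observation that the least ambiguous tracklets are removed from the graph before the harder ones.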

3. PROPOSED METHOD
3.1. Tracklet-Global Motion Model
To incorporate motion into the optimization process [4] of Eq. 3, a cost of the feasible solution $F_s$ based on motion is needed. Let $X_s(i)$ denote the spatial location of the node $F_s(i)$. The spatial velocity vector for the feasible solution $F_s$ is defined as $\dot{X}_s(i) = X_s(i+1) - X_s(i)$, where $1 \le i \le (f-1)$. One common approach to computing the motion cost is to calculate the deviation from a presumed model, such as constant velocity. This can be done by using each velocity vector to predict the spatial location of the detection immediately after it, and summing up the errors between the predicted locations and the corresponding locations in the feasible solution. This piecewise approach is mainly used in bipartite matching and similar approaches [9, 10, 11]. However, in the global framework one feasible solution is meant to represent one tracklet over the course of the whole segment. The motion cost is therefore calculated in a more effective way, which assures both piecewise and global consistency with the model. Assuming the constant velocity model for the motion of pedestrians in one segment, the motion cost is calculated as:

\[
\mathrm{motion}(F_s) = \sum_{i=1}^{f} \sum_{j=1}^{f-1} \Big| \overbrace{X_s(i) - \underbrace{\big[ X_s(j) + \dot{X}_s(j) \cdot (i-j) \big]}_{\text{prediction}}}^{\text{deviation}} \Big| \tag{4}
\]

where the term in brackets is the predicted location for the node $F_s(i)$ using $\dot{X}_s(j)$. Eq. 4 assumes that a person moves at a constant velocity within one segment, and each element of the velocity vector $\dot{X}_s$ is used to predict the location of all the other nodes in the feasible solution $F_s$.
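Eq. 4 can be sketched in a few lines. The sketch assumes that $|\cdot|$ denotes the Euclidean distance between the observed and the predicted location, which the text does not state explicitly, and it uses 0-based frame indices; the function name and the sample tracks are hypothetical.

```python
import math


def motion_cost(positions):
    """Eq. 4 for one feasible solution; `positions` holds one (x, y) per frame."""
    f = len(positions)
    # Velocity vectors: differences between consecutive positions (f - 1 of them).
    velocities = [
        (positions[j + 1][0] - positions[j][0],
         positions[j + 1][1] - positions[j][1])
        for j in range(f - 1)
    ]
    total = 0.0
    for i in range(f):
        for j in range(f - 1):
            # Predict the location at frame i from position and velocity at frame j.
            pred_x = positions[j][0] + velocities[j][0] * (i - j)
            pred_y = positions[j][1] + velocities[j][1] * (i - j)
            # Deviation of the observed location from the prediction
            # (Euclidean norm, an assumption of this sketch).
            total += math.hypot(positions[i][0] - pred_x,
                                positions[i][1] - pred_y)
    return total


# A track moving at constant velocity, and the same track with one outlier.
straight = [(0, 0), (1, 1), (2, 2), (3, 3)]
with_outlier = [(0, 0), (1, 1), (9, 2), (3, 3)]
cost_straight = motion_cost(straight)
cost_outlier = motion_cost(with_outlier)
```

A track that follows the constant velocity model exactly has zero cost, while a single outlier increases the cost both through its own deviations and through the wrong velocities it induces for its neighbors.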


Figure 3 Tracklet-global motion cost. 3 (a) shows the tracklet of a feasible solution with three outliers. 3 (b) and 3 (c) show the cost for an outlier and an inlier, respectively.
Fig. 3 [3] illustrates the tracklet-global motion cost. Fig. 3 (a) shows a feasible solution that is being generated for the person with the red boundary. However, three detections of another person are mistakenly selected in the feasible solution. The three wrong selections are therefore expected to add a large value to the motion cost, while the rest of the selected nodes, which are consistent with each other, add low values. The contribution to Eq. 4 is shown for the two nodes $i = 6$ and $i = 3$ in Fig. 3 (b) and 3 (c), respectively. The black circles show the predicted locations for the node $i$. The red lines depict the distances between the predicted locations and $X_s(i)$, i.e. the deviation from the model. The value that node $i$ adds to the motion cost is the sum of these distances. As shown, the node $i = 6$ is not consistent with the majority of the tracklet and adds a large value to the cost, whereas $i = 3$ adds a lower value.
Therefore, in contrast with piecewise motion models, the motion cost in Eq. 4 measures the deviation from the constant velocity model over a tracklet in a global manner, because all the nodes contribute to the cost of the other nodes. Even though Eq. 4 uses the constant velocity model, the extension to constant acceleration and higher order models for more complicated scenarios is straightforward.

3.2. Merging Tracklets into Trajectories


The proposed method divides the video into $s$ segments and finds the tracklets of all the pedestrians in each segment using the GMCP with Hungarian method. In order to generate the trajectory of a person, the tracklets belonging to that person are merged over the course of the full video. This is itself a data association problem, which can be solved with any available data association method, such as bipartite matching [5, 4, 3]. However, in order to have a fully global framework, the same GMCP with Hungarian method (Hybrid Hungarian) that is used for finding tracklets is also used to merge them. The clusters and nodes in $G$ now represent segments and tracklets, respectively. The appearance feature of a node, which represents one tracklet, is defined as the average appearance of the human detections in the tracklet, and its spatial location is defined as the middle point of the tracklet. Fig. 4 [3] shows six consecutive segments with their tracklets, along with the complete graph induced by their representative nodes. The left column shows six consecutive segments with four tracklets in each, along with $G$. The middle column shows a feasible solution without the hypothetical nodes [8] that handle tracklet occlusion. The right column shows the converged solution $V_s$ along with the generated full trajectory.
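The node representation used at the merging stage can be sketched as follows. The sketch reads "middle point of the tracklet" as the location of the temporally middle detection, which is an interpretation rather than something the text specifies, and the dictionary layout (`hist`, `center`) is a hypothetical choice for the example.

```python
def summarize_tracklet(detections):
    """Collapse a tracklet into one node: mean histogram and middle location."""
    if not detections:
        raise ValueError("a tracklet needs at least one detection")
    n = len(detections)
    bins = len(detections[0]["hist"])
    # Average appearance: bin-wise mean of the color histograms.
    mean_hist = [sum(d["hist"][b] for d in detections) / n for b in range(bins)]
    # Spatial location: center of the temporally middle detection (assumption).
    middle = detections[n // 2]["center"]
    return {"hist": mean_hist, "center": middle}


# A hypothetical tracklet of three detections with 2-bin histograms.
toy_tracklet = [
    {"center": (0.0, 0.0), "hist": [1.0, 0.0]},
    {"center": (1.0, 1.0), "hist": [0.5, 0.5]},
    {"center": (2.0, 2.0), "hist": [0.0, 1.0]},
]
toy_node = summarize_tracklet(toy_tracklet)
```

With tracklets summarized this way, the same clique search that operates on detections within a segment can operate on tracklet nodes across segments.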


Figure 4 Merging tracklets into trajectories.
The data association at the track level is fundamentally different from the one used in finding tracklets. A pedestrian can be assumed to move at a constant velocity within one segment, but modeling human motion over long periods of time in a track becomes extremely difficult. Generally, it is difficult to model the motion of pedestrians for a long duration without knowledge of the scene structure, intentions, destinations, social interactions, etc.

3.3. Handling Using Dummy Nodes


The standard Hungarian algorithm [8] is added to the framework. It operates on a cost matrix with equal numbers of rows and columns; when these numbers differ, for example because of occlusions, dummy nodes are introduced to equalize them. A dummy node is also introduced for each trajectory so that a trajectory may temporarily disappear, and the data association is thereby solved implicitly in a global manner. If a trajectory fails to find its matching detection, it is linked to its corresponding dummy node until a matching detection emerges. The dummy nodes also account for new targets. They tend to accumulate in fake or disappeared trajectories, while they appear only occasionally in real trajectories, which helps to handle inevitable detection failures: missed detections, false detections and occlusion, where an object is partially or fully invisible because of the limited camera view.
In the formulation, each node represents a tracklet of a person, which may not be present in all the clusters or may be miss-detected by the Hungarian algorithm. In order to avoid selecting irrelevant nodes in the track of a person, an additional set of nodes, called dummy nodes, is introduced in each cluster. Dummy nodes are treated the same as the rest of the nodes in the graph, with one difference: the weights of the edges connected to a dummy node are fixed to a pre-defined value $c_d$. The dummy nodes ensure that the track of each person is free of outliers. In other words, when there is no confident tracklet for a clique in a particular cluster, a dummy node from that cluster is selected. The solution is a set of cliques, and a clique may select a dummy node from any cluster. Selected cliques are drawn in different colors and dummy nodes are drawn as triangles. The dummy nodes fill the miss-detection spots whenever needed. Considering the dummy nodes, the cost function expands into four terms as shown below:
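\[
\underbrace{\sum_{j_1} c_{j_1} y_{j_1}}_{\text{real edges}} + \underbrace{\sum_{j_2} c_{j_2} y_{j_2}}_{\text{dummy edges}} + \underbrace{\sum_{j_3} c_{j_3} y_{j_3}}_{\text{real nodes}} + \underbrace{\sum_{j_4} c_{j_4} y_{j_4}}_{\text{dummy nodes}} \tag{5}
\]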

where $y_{j_1}$, $y_{j_2}$, $y_{j_3}$ and $y_{j_4}$ are the four types of variables in the column vector $Y$. $y_{j_1}$ denotes the variables assigned to real edges in the graph, $y_{j_2}$ the variables for dummy edges, i.e. edges that are connected to dummy nodes, $y_{j_3}$ the variables for real nodes representing the tracklets in each cluster, and $y_{j_4}$ the variables for the dummy nodes in the graph. The cost associated with each type of variable is given by $c_{j_1}$, $c_{j_2}$, $c_{j_3}$ and $c_{j_4}$. In the formulation, $c_{j_2} = c_d$, and $c_{j_3}$ and $c_{j_4}$ are set to zero. However, one can also define a cost for the nodes in the graph; e.g. the average detection confidence of a tracklet can define the score of a node. The cost $c_{j_1}$ of a real edge is defined based on the motion and appearance similarity of the two tracklets it connects.
Given the number of clusters and the number of nodes in each cluster, one can define the upper bound for the number of dummy nodes that need to be added to each cluster as shown below:
\[
N_{d_i} = \sum_{j \ne i} N_j \tag{6}
\]

where $N_{d_i}$ is the number of dummy nodes added to cluster $i$ and $N_j$ is the number of true nodes in cluster $j$. Note that this is an upper bound for the number of dummy nodes; it corresponds to the assumption that each track has only one true node among all the clusters. With this upper bound, enough dummy nodes are available to handle the occlusion problems. Dummy nodes are able to robustly replace missed detections as well as detection hypotheses that have low global appearance and motion similarity with the rest of the merging tracklets.
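The role of dummy nodes in an assignment problem can be sketched as below. The first function evaluates the upper bound of Eq. 6. The second pads a rectangular trajectory-by-detection cost matrix to a square one, so that every trajectory can fall back to a dummy column and every detection to a dummy row, each at a fixed cost. This padding layout is an assumption of the sketch rather than the paper's stated construction, and the optimal assignment is found by exhaustive search over permutations as a stand-in for the Hungarian algorithm, so it is only usable on very small matrices.

```python
from itertools import permutations


def dummy_upper_bounds(true_counts):
    """Eq. 6: dummy nodes for cluster i = sum of true nodes in all other clusters."""
    total = sum(true_counts)
    return [total - n for n in true_counts]


def assign_with_dummies(cost, dummy_cost):
    """Match trajectories (rows) to detections (columns) with dummy fallbacks.

    Returns, for each trajectory, the index of the matched detection, or
    None when the trajectory is linked to a dummy node.
    """
    rows = len(cost)
    cols = len(cost[0]) if cost else 0
    size = rows + cols  # lets every row and every column stay unmatched
    padded = [[dummy_cost] * size for _ in range(size)]
    for r in range(rows):
        for c in range(cols):
            padded[r][c] = cost[r][c]  # real trajectory-detection costs
    for r in range(rows, size):
        for c in range(cols, size):
            padded[r][c] = 0.0  # dummy-to-dummy pairs are free
    # Exhaustive search over all assignments (stand-in for the Hungarian method).
    best = min(
        permutations(range(size)),
        key=lambda perm: sum(padded[r][perm[r]] for r in range(size)),
    )
    return [best[r] if best[r] < cols else None for r in range(rows)]


# Hypothetical costs: trajectory 0 has a good match, trajectory 1 has none.
toy_cost = [[1.0, 9.0, 9.0],
            [9.0, 9.0, 9.0]]
toy_matches = assign_with_dummies(toy_cost, dummy_cost=2.0)
toy_bounds = dummy_upper_bounds([3, 2, 4])
```

In this layout a real match is kept only when its cost is lower than the combined cost of leaving both the trajectory and the detection unmatched, i.e. twice `dummy_cost`, so the fixed dummy cost acts as a gating threshold.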

3.4. Extended Hungarian Algorithm


Each row and each column of the cost matrix is multiplied by $n$, where $n$ is the number of rows (or columns). This step is reported to enhance the frames and to improve the accuracy of detection.

4. EXPERIMENTAL RESULTS
4.1. Dataset
The proposed method is evaluated on two benchmark sequences to test its adaptability to different scenarios (e.g., different frame rates, resolutions, time spans and levels of crowdedness). The data sets are taken from [14, 15].
Parking Lot [14]: This sequence was recently introduced for multi-target tracking evaluation. On average, 14 people are visible. There are 1000 frames with moving, crowded targets. Detections are provided.
Town Center [15]: This sequence shows a busy town center street from a single elevated camera. On average, 16 people are visible. Furthermore, many people are not detected because of partial occlusions caused by static scene structures such as benches. The dataset provides coarse manual annotations of the pedestrians. Multi-target tracking is run on videos with different numbers of frames.

Table 1 Comparison of different algorithms on the Parking Lot video with different lengths.

Length of   GMCP algorithm         Hybrid Hungarian algorithm   Extended Hybrid Hungarian algorithm
video       No. of     Accuracy    No. of     Accuracy          No. of     Accuracy
            persons    rate        persons    rate              persons    rate
            detected               detected                     detected
10 secs     5          55%         6          66%               8          88%
15 secs     6          50%         10         83%               10         83%
32 secs     5          35%         12         85%               12         85%

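Table 1 shows that, on the Parking Lot video, the Extended Hybrid Hungarian algorithm is more accurate than the GMCP algorithm at every video length. It is also more accurate than the Hybrid Hungarian algorithm for the 10-second video (88% versus 66%), while the two Hungarian variants reach the same accuracy for the 15-second and 32-second videos.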


[Bar chart "Comparison of Accuracy rate". Horizontal axis: length of Parking Lot video (10, 15 and 32 secs). Vertical axis: accuracy (0% to 100%). Series: GMCP algorithm, Hybrid Hungarian algorithm and Extended Hybrid Hungarian algorithm accuracy rates, with the values listed in Table 1.]

Figure 5 Comparison of accuracy between GMCP algorithm, GMCP with Hungarian algorithm and GMCP with Extended Hungarian algorithm of Parking Lot video.

Table 2 Comparison of different algorithms on the Town Center data set.

Length of   GMCP algorithm         Hybrid Hungarian algorithm   Extended Hybrid Hungarian algorithm
video       No. of     Accuracy    No. of     Accuracy          No. of     Accuracy
            persons    rate        persons    rate              persons    rate
            detected               detected                     detected
15 secs     7          43%         12         75%               14         77%
32 secs     8          50%         13         81%               13         81%

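Table 2 shows that, on the Town Center data set, the Extended Hybrid Hungarian algorithm is more accurate than the GMCP algorithm at both video lengths. It is slightly more accurate than the Hybrid Hungarian algorithm for the 15-second video (77% versus 75%) and equally accurate for the 32-second video (81%).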

[Bar chart "Comparison of accuracy". Horizontal axis: length of Town Center video (15 and 32 secs). Vertical axis: accuracy (0% to 100%). Series: GMCP algorithm, Hybrid Hungarian algorithm and Extended Hybrid Hungarian algorithm accuracy rates, with the values listed in Table 2.]

Figure 6 Comparison of accuracy between GMCP algorithm, GMCP with Hungarian algorithm and GMCP with Extended Hungarian algorithm of Town Center video.


Fig. 5 and Fig. 6 show that the accuracy of the Extended Hybrid Hungarian algorithm is higher than that of the GMCP algorithm, and at least as high as that of the Hybrid Hungarian algorithm, for the Parking Lot and Town Center videos respectively at all the tested lengths.
For the Parking Lot video, the accuracy of GMCP decreases as the number of frames increases (55%, 50%, 35%), whereas the accuracy of the Hybrid Hungarian algorithm increases (66%, 83%, 85%) and that of the Extended Hybrid Hungarian algorithm stays between 83% and 88%. For the Town Center video, the accuracy of all three algorithms increases from the 15-second to the 32-second video.

5. CONCLUSION
This work addresses the detection and tracking of people in video, which supports identifying unwanted entities, tracking their actions and understanding actions that may lead to terrorist activities, thefts and similar incidents. The code was tested on two different datasets. A video is divided into frames, and the frames are passed to the Generalized Minimum Clique Problem (GMCP) with the Hybrid Hungarian algorithm and with the Extended Hybrid Hungarian algorithm to detect persons. The information about the humans detected in one frame is then correlated with the information from the other frames, so that each detected human is tracked through the subsequent frames.
The Parking Lot surveillance sequence was divided into videos with different numbers of frames, and the detection rate improved for each of them. The Town Center dataset was likewise divided into two videos for processing, containing 1000 and 936 frames respectively. With more frames, the detection rate remained relatively better than that of existing methods.
The number of frames in a video has a strong effect on human detection, which in turn affects tracking. GMCP with the Extended Hungarian algorithm performs relatively better than GMCP alone and GMCP with the Hungarian algorithm in human detection and tracking. The experiments show that the proposed method improves tracking on the tested videos.

REFERENCES
[1] Huaizu Jiang, Jinjun Wang, Yihong Gong, Na Rong, Zhenhua Chai, and Nanning Zheng, "Online Multi-Target Tracking With Unified Handling of Complex Scenarios," IEEE Transactions on Image Processing, vol. 24, no. 11, November 2015.
[2] L. Zhang, Y. Li, and R. Nevatia, "Global data association for multi-object tracking using network flows," in Proc. IEEE Conf. CVPR, Jun. 2008, pp. 1-8.
[3] A. R. Zamir, A. Dehghan, and M. Shah, "GMCP-tracker: Global multiobject tracking using generalized minimum clique graphs," in Proc. 12th ECCV, 2012, pp. 343-356.
[4] K. Shafique, M. Shah, "A noniterative greedy algorithm for multiframe point correspondence," in Proc. IEEE Conf. CVPR, Jun. 2005, pp. 1265-1272.
[5] A. Dehghan, S. M. Assari, and M. Shah, "GMMCP tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking," in Proc. IEEE Conf. CVPR, Jun. 2015, pp. 4091-4099.
[6] B. Benfold and I. Reid, "Stable multi-target tracking in real time surveillance video," in Proc. IEEE Conf. CVPR, Jun. 2011.
[7] B. Leibe, K. Schindler, and L. V. Gool, "An online learned CRF model for multi-target tracking," in Proc. IEEE Conf. CVPR, Jun. 2012, pp. 2034-2041.


[8] H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Res. Logistics Quart., vol. 2, nos. 1-2, pp. 83-97, 1955.
[9] L. Zhang, Y. Li, and R. Nevatia, "Globally-optimal greedy algorithms for tracking a variable number of objects," in Proc. IEEE Conf. CVPR, Jun. 2011, pp. 1201-1208.
[10] W. Brendel, M. Amer, and S. Todorovic, "Multi-target tracking by Lagrangian relaxation to min-cost network flow," in Proc. IEEE Conf. CVPR, Jun. 2013, pp. 1846-1853.
[11] J. Berclaz, F. Fleuret, E. Turetken, and P. Fua, "Globally optimal solution to multi-object tracking with merged measurements," in Proc. IEEE ICCV, Nov. 2011, pp. 2470-2477.
[12] C.-H. Kuo, C. Huang, and R. Nevatia, "Multi-target tracking by on-line learned discriminative appearance models," in Proc. IEEE Conf. CVPR, Jun. 2010, pp. 685-692.
[13] A. Milan, K. Schindler, and S. Roth, "Detection- and trajectory-level exclusion in multiple object tracking," in Proc. IEEE Conf. CVPR, Jun. 2013, pp. 3682-3689.
[14] G. Shu, A. Dehghan, and M. Shah, "Improving an Object Detector and Extracting Regions using Superpixels," in CVPR, 2013.
[15] B. Benfold and I. Reid, "Stable multi-target tracking in real-time surveillance video," in CVPR, pp. 3457-3464, June 2011.
[16] Ms. Kavita P. Mahajan, Prof. S. V. Patil, "Tracking and Counting Human In Visual Surveillance System," International Journal of Electronics and Communication Engineering and Technology, 3(3), 2012, pp. 139-146.
[17] G S Akhil, Chole Manjunath and V. Ashuthosh, "Video Calling System Using Biometric Remote Authentication," International Journal of Electronics and Communication Engineering and Technology, 7(5), 2016, pp. 47-57.
