Professional Documents
Culture Documents
AbstractThe application such as video surveillance for trafc is a signicant reduction in the man power involved for doing
control in smart cities needs to analyze the large amount the same.
(hours/days) of video footage in order to locate the people who
are violating the trafc rules. The traditional computer vision However, in order to adopt such automatic solutions, cer-
techniques are unable to analyze such a huge amount of visual tain issues need to be addressed like real-time implementation,
data generated in real-time. So, there is a need for visual big occlusion, direction of motion, temporal changes in weather
data analytics which involves processing and analyzing large scale conditions, and the quality of video feed [2]. So, the processing
visual data such as images or videos to nd semantic patterns
signicant amount of information in a time constraint manner
that are useful for interpretation. In this paper, we propose a
framework for visual big data analytics for automatic detection is a challenging task. As such applications involve tasks like
of bike-riders without helmet in city trafc. We also discuss segmentation, feature extraction, classication, and tracking,
challenges involved in visual big data analytics for trafc control in which a signicant amount of information need to be
in a city scale surveillance data and explore opportunities for processed in short duration to achieve the goal of real-time
future research. implementation [1] [3]. As stated in [1], successful framework
KeywordsVisual Big Data Analytics, Smart City, Trafc Con- for surveillance application should have useful properties such
trol, Machine Learning. as real-time performance, ne tuning, and robust to sudden
changes. Keeping these challenges and desired properties in
I. I NTRODUCTION mind, we propose a visual big data analytics framework based
solution for automatic detection of bike-riders without helmet
Nowadays video surveillance systems become an essential in real time from trafc surveillance camera network of a smart
equipment in order to keep a watch on any kind of criminal city.
or anti law activity in modern civilizations. Law enforcement
agencies in modern cites all over the world deployed large To date many frameworks are proposed for road trafc
network of CCTV cameras covering all sensitive public areas monitoring using surveillance cameras. A trafc monitoring
of a city like airports, railway stations, and road network. The system includes object detection and tracking, behavioral
road trafc monitoring is most important from the point of analysis of trafc patterns, number plate recognition, and
view of tracking criminals, detecting trafc violators, detecting automated security and surveillance on video streams captured
road accidents, collecting evidences for investigation, etc. Au- by surveillance cameras. T. Abdullah et al. [4] presented a
tomatic decision making systems are desirable in order to catch framework for stream processing in cloud that is capable
various kind of trafc violators. Two-wheelers are becoming of detecting vehicles from the recorded video streams. This
very popular mode of transportation everywhere, however, a framework provides an end-to-end solution for video stream
high risk is associated with it because there is no protection to capture, storage, and analysis using a cloud based graphics
the head part of human body. So, the Governments impose the processor unit (GPU) cluster. It empowers trafc control room
use of wearing helmets to protect the head part of body for the operators by automating the process of vehicle identication
persons who are riding two-wheelers. Observing the usefulness and nding events of interest from the recorded video streams.
of helmet, Governments have made it a punishable offense to An operator only species the analysis criteria and the duration
ride a bike without helmet and have adopted manual strategies of video streams to analyze. These video streams are then
to catch the persons who are violating the trafc rules. But, automatically fetched from the cloud storage, decoded and
manual system of tracking people who are violating the trafc analyzed on a Hadoop based GPU cluster without operator
rules is not a feasible solution due to involvement of humans, intervention. It reduces the latency in video analysis process
whose efciency may decrease over a long duration [1]. by porting its compute intensive parts to the GPU cluster.
Automation of this process is highly desirable for reliable and An automatic license plate recognition system is presented
robust monitoring of these trafc rule violations. Also, it can by Y. Chen et al. [5] using cloud computing in order to
signicantly reduce the amount of human resources needed for realize massive data analysis, which enables the detection and
trafc monitoring. To make the urban areas as smart city, many tracking of a target vehicle in a city with a given license plate
countries are adopting systems involving surveillance cameras number. It realizes a fully integrated system with a surveillance
at public places for round the clock security monitoring. So, network of city scale, automatic large scale data retrieval and
this automated solution for trafc monitoring is also cost- analysis, and combination of pattern recognition in order to
effective because it is using the existing infrastructure and there achieve contextual information analysis. C. Zhang et al. [6]
887
on their visual features. basis function (RBF) to arrive at best hyper-plane.
Object classication requires some suitable representation
of visual features. In literature, histogram of oriented gradients C. Recognition of Bike-riders Without Helmet
(HOG) [10], scale invariant feature transform (SIFT) [11], and After the bike-riders are detected in the previous step, the
local binary patterns (LBP) [12] are proven to be efcient for next step is to determine if bike rider is using a helmet or not.
object detection. For this purpose, we analyze three features Usual face detection algorithms would not be sufcient for
viz; HOG, SIFT and LBP. HOG descriptors which are proven this phase due to following reasons : i) Low resolution poses
to be very efcient in object detection. These descriptors a great challenge to capture facial details such as eyes, nose,
capture local shapes through gradients. SIFT tries to cap- mouth. ii) Angle of movement of bike may be at obtuse angles.
ture key-points in the image. For each key-point, it extracts In such cases, face may not be visible at all. So, the proposed
feature vectors. Scale, rotation, and illumination invariance framework detects region around head and then proceed to
of these descriptors provide robustness in varying conditions. determine whether bike-rider is using helmet or not. In order
We used bag-of-words(BoW) technique to create a dictionary. to locate the head of bike-rider, the proposed framework uses
Then mapping SIFT descriptors to dictionary words results in the fact that appropriate location of helmet will probably be in
feature vectors. These feature vectors are used to determine upper areas of bike rider. For that we consider only the upper
similarity between images. LBP captures texture information one fourth part of object. Identied region around head of bike-
in the frame. For each pixel, a binary number is assigned by rider is used to determine if bike-rider is using the helmet or
thresholding the pixels in the circular neighborhood and the not. To achieve this, similar features as used in phase-I i.e.
frequency histogram of these numbers is the feature vector. HOG, SIFT and LBP are used. Fig. 3 visualizes the patterns
Fig. 2 visualizes the patterns of phase-I classication in 2-D for phase-II in 2-D using t-SNE [13]. The distribution of the
space using t-SNE [13]. The distribution of the HOG feature HOG feature vectors show that the two classes i.e non-helmet
vectors show that the two classes i.e bike-riders (Positive (Positive class shown in blue cross) and helmet (Negative
class shown in blue crosses) and others (Negative class class shown in red dot) fall in overlapping regions which shows
shown in red dots) fall in almost distinct regions with only the complexity of representation. However, Table I shows that
few exceptions. This shows that the feature vectors efciently the generated feature vectors contain signicant discriminative
represent the activity and contains discriminative information, information in order to achieve good classication accuracy.
which further gives hope for good classication accuracy.
1000
Negative Class
Positive Class
Negative Class
Positive Class
800
40
600
Principal Component - 2
400
Principal Component - 2
20
200
0 0
200
20
400
600
40
800
1000 500 0 500 1000
40 20 0 20 40 Principal Component - 1
Principal Component - 1
Fig. 3. Visualization of HOG feature vectors for helmet vs non-helmet
Fig. 2. Visualization of HOG feature vectors for bike-rider vs others classication using t-SNE [13]. Red dot indicates helmet class and Green
classication using t-SNE [13]. Blue cross represents bike-rider class and Red cross indicates non-helmet class [Best viewed in color].
dot represents non bike-rider class [Best viewed in color].
The method needs to determine if biker is violating the
After feature extraction, next task is to classify them as law i.e. not using helmet. For this purpose, we consider two
bike-riders vs other objects. Thus, this requires a binary classes : i) Bike-rider not using helmet (Positive Result), and
classier. Any binary classier can be used here, however, we ii) Biker using helmet (Negative Result). The support vector
choose SVM due to its robustness in classication performance machine (SVM) is used for classication using extracted fea-
even when trained from less number of feature vectors. Also, tures from previous step. To analyze the classication results
we use different kernels such as linear, sigmoid (MLP), radial and identify the best solution, different combination of features
888
Local Feature Extract Local Feature
Video Clips Local Feature
Descriptors Descriptors
(Sliding Window) Descriptors
(STIP, HOG, HOF, SIFT)
Video Stream
Distributed
Clustering
(K-Means)
Master
Vocabulary Node-N
(Visual Words) Node-1 Cloud Environment
Node-i
Node-2 Node-3
Generate
Frequency
Histograms
Video
Representation Distributed
Video Data Partitioning
(BoVW) Learning Classifier
Representation (Clustering + Binning)
(BigSVM)
(BoVW)
Fig. 4. Block diagram of visual big data framework over the cloud for trafc monitoring using large surveillance camera network of a smart city. Depicting
use of distributed processing of feature representation using bag-of-words (BoW) and classication using distributed support vector machine (SVM).
and kernels are used. Results along with analysis is included video surveillance application such as trafc monitoring in
in Result section. smart cities needs to analyze the large amount (hours/days)
of video footage in order to locate unusual patterns. The
D. Consolidated Alarm Generation real-time analysis of such a big visual data is a challenging
task due to involvement of large volume of data generated at
From earlier phases, we obtain local results i.e. whether high velocity. Thus, the overall problem becomes a visual big
bike rider is using helmet or not, in a frame. However, till data problem for which existing visual computing techniques
now the correlation between continuous frames is neglected. are unable to meet the desired performance. Also, the up-
So, in order to reduce false alarms, we consolidate local results. front infrastructure investment is costly, so we are leveraging
Consider yi be label forith frame which is either +1 or -1. the benets of cloud computing in order to get computing
n
If for past n frames, n1 i=1 (yi = 1) > Tf , then framework resources on-demand at reduced cost. This framework uses a
triggers violation alarm. Here Tf is the threshold value which hybrid cloud architecture where continuously used resources
is determined empirically. In our case, the value of Tf = 0.8 are provided by private cloud. For computationally expensive
and n = 4 were used. A combination of independent local tasks, it leverages the benet of public cloud. For example,
results from frames is used for nal global decision i.e. biker in surveillance system, some computing resources are used
is using or not using helmet. continuously viz; cameras, and attached computing resources
to check against any malicious activity on a trained model. For
III. V ISUAL B IG DATA A NALYTICS F RAMEWORK this requirement, we can have our own physical infrastructure,
a private cloud or a public cloud. For the purpose of long term
In order to apply the proposed approach on a city scale storage of data and computationally expensive training process,
surveillance camera data network, we proposed a framework it utilizes the services provided by a public cloud. As training
called visual big data framework over the cloud. The entire is a short-term need, it is a cost effective solution. In cloud, we
framework for visual big data analytics over the cloud involves can setup a suitable cluster for distributed processing of visual
visual computing, big data analytics, and cloud computing. data using MapReduce according to the need. Fig. 4 shows the
Visual computing involves processing and analyzing visual block diagram of visual big data framework over the cloud for
data such as images or videos in order to nd semantic trafc monitoring using large surveillance camera network of
patterns that are useful for interpretation. The large scale
889
a smart city. The visual computing applications consists of
two main tasks, namely, feature extraction and classication.
Both these task are computationally complex and while dealing
large visual data, the situation further becomes worse. Here, we
are using cloud computing to full ll our needs of short term
computing resources. Training over a large datasets increase
the performance of the pattern recognition tasks. The proposed
approach involves computationally complex tasks like k-means
clustering for vocabulary generation in bag-of-words (BoW)
feature representation and training of support vector machine
(SVM) classier. For that, we propose a distributed framework
shown in Fig. 4, where we distribute computations for BoW
and SVM over a distributed environment like cluster or cloud.
Bag-of-words approach makes use of clustering, however, the
enlarged volumes of data becomes a challenge to clustering.
Many distributed implementations are available for variety of Fig. 5. Sample frames from dataset
clustering algorithm, however, we use PKMeans, a parallel k-
means clustering algorithm proposed by W. Zhao et al. [14] for
MapReduce. There are three functions in PKMeans, namely, order to validate the performance of each combination of
map, combiner, and reduce. The map function assigns samples representation and model, we conducted experiments using 5-
to a closest center, a combiner function calculates the sum fold cross validation. The experimental results in Table I for
of points assigned to each cluster by a single map function bike vs. non-bike classication, show that average performance
in order to reduce communication, while reduce function of classication using SIFT and LBP features is almost similar.
computes new centers from the arrays of partial sum by each Also, the performance of classication using HOG with MLP
map function. For SVM, we use divide and conquer (DCSVM) and RBF kernels is similar to the performance of SIFT
proposed by Hsieh et al. [15], where the training samples and LBP. However, HOG with linear kernel performs better
are divided into smaller partitions using k-means clustering than all other combinations, because feature vector for this
and local SVM models are trained for each smaller partition representation is sparse in nature which is suitable for linear
independently using LIBSVM library. The support vectors of kernel. We can observe that for head vs. helmet classication,
local SVMs are then passed as an input to next level SVM. average performance of classication using SIFT and LBP is
Finally, a global SVM model is obtained via combining local almost similar. Also, the performance of classication using
SVM models. This reduces the overall time taken, specially HOG with MLP and RBF kernel is similar to the performance
when dealing with large scale data. of SIFT and LBP. However, HOG with linear kernel performs
better than all other combinations.
IV. E XPERIMENTS AND R ESULTS TABLE I. P ERFORMANCE OF CLASSIFICATION (%) OF D ETECTION OF
B IKE - RIDER WITHOUT H ELMET
The experiments are conducted on a cluster of two ma-
Feature Kernel Bike vs. Non-bike head vs. helmet
chines running Ubuntu 16.04 Xenial Xerus havining speci- Linear 98.88 93.80
cations Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz48 HOG MLP 82.89 64.50
processor, 128GB RAM with NVIDIA Corporation GK110GL RBF 82.89 64.50
Linear 82.89 64.51
[Tesla K20c]2 GPUs and Intel(R) Xeon(R) CPU E5-2697 SIFT MLP 82.89 64.51
v2 @ 2.70GHz16 processor, 64GB RAM with NVIDIA RBF 82.89 64.51
Corporation GK110GL [Tesla K20c]6 GPUs, respectively. Linear 82.89 64.53
LBP MLP 82.89 64.53
Programs are written in C++, where for video processing we RBF 82.89 64.53
use OpenCV 3.0, the bag-of-words and SVM are implemented
using OpenMP and OpenMPI with distributed k-means. From the results presented in Table I, it can be observed
that using HOG descriptors help in achieving best perfor-
A. Dataset Used mance.
We collected our own data from the surveillance system Fig. 6 & Fig. 7 presents ROC curves for performance of
at Indian Institute of Technology Hyderabad campus because classiers in detection of bike-riders and detection of bike-
there is no public data set available. Total two hour surveillance riders with or without helmet, respectively. Fig. 6 clearly shows
data is collected at 30 frame per second. Fig. 5 present samples that the accuracy is above 95% with a low false alarm rate less
from the collected dataset. First one hour video is used for than 1% and area under curve (AUC) is 0.9726. Similarly,
training the model and remaining for testing. Training video Fig. 7 clearly shows that the accuracy is above 90% with a
contains 42 bikes, 13 cars, and 40 humans. Whereas, testing low false alarm rate less than 1% and AUC is 0.9328.
video contains 63 bikes, 25 cars, and 66 humans.
C. Computational Complexity
B. Results and Discussion
To test the performance, a surveillance video of around
In this section, we present experimental results and discuss one hour at 30 fps i.e. 107500 frames was used. The proposed
the suitability of the best performing representation and model framework processed the full data in 1245.52 secs i.e. 11.58 ms
over the others. Table. I presents experimental results. In per frame. However, frame generation time is 33.33 ms, so
890
R EFERENCES
1
[1] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, Robust real-time
unusual event detection using multiple xed-location monitors, IEEE
0.8 Transactions on Pattern Analysis and Machine Intelligence, vol. 30,
True Positive Rate (TPR)
Area Under Curve (AUC) = 0.9726 no. 3, pp. 555560, March 2008.
[2] K. Dahiya, D. Singh, and C. K. Mohan, Automatic detection of bike-
0.6 riders without helmet using surveillance videos in real-time, in Proc. of
the Int. Joint Conf. on Neural Networks (IJCNN), Vancouver, Canada,
Jul.24-29 2016.
0.4 [3] B. Duan, W. Liu, P. Fu, C. Yang, X. Wen, and H. Yuan, Real-time
on-road vehicle and motorcycle detection using a single camera, in
Procs. of the IEEE Int. Conf. on Industrial Technology (ICIT), 10-13
0.2 Feb 2009, pp. 16.
[4] T. Abdullah, A. Anjum, M. F. Tariq, Y. Baltaci, and N. Antonopoulos,
Trafc Monitoring Using Video Analytics in Clouds, in IEEE/ACM
0 International Conference on Utility and Cloud Computing, 2014, pp.
0 0.02 0.04 0.06 0.08 0.1
3948.
False Positive Rate (FPR)
[5] Y. L. Chen, T. S. Chen, T. W. Huang, L. C. Yin, S. Y. Wang, and
T. C. Chiueh, Intelligent urban video surveillance system for automatic
Fig. 6. ROC curve for classication of bike-riders vs. others in phase-I vehicle detection and tracking in clouds, in International Conference
showing high area under the curve on Advanced Information Networking and Applications (AINA), 2013,
pp. 814821.
[6] C. Zhang and E.-c. Chang, Processing of Mixed-Sensitivity Video
1 Surveillance Streams on Hybrid Clouds, in IEEE International Con-
ference on Cloud Computing, 2014, pp. 916.
[7] Z. Zivkovic, Improved adaptive gaussian mixture model for back-
True Positive Rate (TPR)
891