
Conference Title

The International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA2016)

Conference Dates

September 6-8, 2016


Conference Venue

Asia Pacific University of Technology and Innovation (APU), Malaysia

ISBN

978-1-941968-37-6 2016 SDIWC

Published by

The Society of Digital Information and Wireless Communications (SDIWC)
Wilmington, New Castle, DE 19801, USA
www.sdiwc.net

Table of Contents

Color Image Segmentation Features and Techniques: A Comparative Study ...... 1

AAICS: Aesthetics-Driven Automatic Image Cropping and Scaling ...... 8

Exploring Concepts and Narratives for First-Person Games ...... 18

Automatic Speech Recognition for the Holy Qur'an, A Review ...... 23

Differential Qiraat Processing Applications using Spectrogram Voice Analysis ...... 30

Execution of an Advanced Data Analytics by Integrating Spark with MongoDB ...... 39

Content Based Image Retrieval Using Uniform Local Binary Patterns ...... 49

Proceedings of the International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA), Kuala Lumpur, Malaysia, 2016

Color Image Segmentation Features and Techniques: A Comparative Study


Jalal Omer Atoum
Southern Arkansas University
100 E. University, Magnolia, AR 71753, USA
jalalomer@saumag.edu

Aalaa Albadarneh
Princess Sumaya University for Technology
1438 Al-Jubaiha, 11941 Jordan
aalaaalbadarneh83@gmail.com

ABSTRACT
Image segmentation is an important digital image pre-processing phase used to enhance the performance of various pattern recognition and computer vision applications. The segmentation process improves image analysis by extracting features from only the relevant parts of an image. In this paper, a comparative study of five different color segmentation techniques is performed. Experimental results on the PSNR and MSE metrics show that the K-means clustering algorithm performs better than the other algorithms, but it still needs to be modified to deal with different types of sharp and smooth edges.

KEYWORDS
Image segmentation, HSV color space, K-means, digital image processing.

1 INTRODUCTION
Image segmentation as a pre-processing phase is an effective process in many fields such as computer vision, remote sensing, traffic control, health care, industry, pattern recognition, and video surveillance. The segmentation process involves splitting an image into a number of regions, where each region contains pixels with high mutual similarity but high divergence from the pixels of other regions. The splitting is based on extracted features that may include color, intensity, shape, depth, and/or gray-level texture; one or more features can be used to perform the segmentation [1]. Several researchers have focused on gray-level features during the image segmentation process [2]. The importance of segmentation lies in detecting the specific part of an image that we are

interested in, which subsequently improves the response time of the application and consequently increases the success of image
analysis procedures. It can be done using
several techniques based on the feature(s) that
the application is using. Some of these techniques are based on clusters, regions, thresholds, saliency, and/or hybrid approaches. In this paper, different segmentation techniques and a comparative study of them are presented. These techniques are: Fuzzy C-Means, Region Growing, Otsu's adaptive thresholding K-means clustering, Masking with Watershed, and Hill-climbing with K-means [1].
In some application areas, such as computer vision and pattern recognition, the gray-level threshold segmentation method is not useful, since it is difficult to detect the borders of some distorted images [3]. In such application areas, however, using the color information of images is very helpful in the image analysis process. Therefore, more effort is needed to improve color image segmentation methods, because color images carry richer feature information than gray images [3].
This paper is organized as follows: Section 2 presents the related literature review and the five image segmentation algorithms to be compared, Section 3 presents the image quality measures used in this study, Section 4 presents the results of the comparative study of these segmentation algorithms, and the last section presents the conclusions of this study and future work.


2 LITERATURE REVIEW
Previous work on image segmentation includes many proposed systems, such as the work of Liu Yucheng, who proposed a new fuzzy morphological fusion-based image segmentation technique [4]. To smooth the image, this work uses the opening and closing morphological operations followed by the gradient operation on the image. In addition, it solves the over-segmentation drawback of the Watershed algorithm. This system showed that the fusion approach preserves the information details of an image and improves the speed of segmentation [5].
Fernando C. Monteiro proposed an image segmentation algorithm based on morphological watershed edge and region information using the spectral method [6]. It applies an enhanced pre-processing stage using a bilateral filter to reduce image noise, after which a preliminary segmentation is performed using region merging. It then computes region similarity and performs region-based grouping using the Multi-class Normalized Cut method.
The image segmentation algorithm of Weihong Cui and Yi Zhang [7] generates a multi-scale image segmentation with an edge-based automatic threshold selection method, and then calculates edge weights. A minimum spanning tree and an edge-based threshold method are used to perform the segmentation. Experimental results have shown that this method maintains object information and keeps object boundaries while segmenting the image.
R. V. Patil claims that K-means image segmentation provides better results if the number of clusters is estimated accurately based on edge detection [8]. Phase congruency is used to detect the edges; these edges, together with a threshold and the Euclidean distance, are used to find the clusters, and K-means is used to find the final segmentation of an image. Experimental results have shown that the estimated number of clusters is accurate and optimal.
Anna Fabijanska [9] used a variance filter for edge detection in the image segmentation process. This process finds the edge positions; a Sobel gradient filter with K-means is also used to extract the edges of an image. Regarding the effect of filtering on images with larger details, she concluded that a small filtering window is preferable.
The following subsections present the five image segmentation algorithms that are used in our comparative study.
2.1 Otsu's Adaptive Thresholding K-means Clustering
V. Jumb, M. Sohani, and A. Shrivas proposed a color image segmentation technique that initially converts the RGB input image to the HSV color space and extracts its V channel [1]. They found that the Separation Factor (SF) has a significant effect on image segmentation through thresholding of the V channel. If the SF is initialized to zero, no edges are detected; hence, the SF is increased slowly until it approaches one, so that the main edges can be detected. To find the optimal separation factor, Otsu [10] used the notion of maximum between-class variance, i.e. maximizing the variance between the classes.
Otsu also defines N, the number of segmentation classes, to be initially equal to two, i.e. starting from two classes, one for the foreground and the other for the background. The value of N is increased each time the SF value is increased, until the SF reaches one. In this method, objects are grouped based on their minimized within-class variance. The K-means clustering method first initializes the two class centres (called centroids) randomly; then, by calculating the histogram-bin distance between each image pixel and its class centroid, each image pixel is reassigned to its closest class centroid. This calculation is then repeated within each group to adjust the centroid positions. Finally,
morphological processing is used as an
enhancement step for better image
segmentation.
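The clustering step described above can be sketched in a few lines. The following is a minimal, illustrative 1-D K-means over pixel values (standing in for the extracted V channel); the function name, toy image, and initialization scheme are our own simplifications, not taken from [1]:

```python
import numpy as np

def kmeans_segment(values, k=2, iters=20, seed=0):
    """Cluster 1-D pixel values (e.g. the V channel) into k classes.

    Simplified sketch of the K-means step described above: centroids are
    initialized randomly, every pixel is assigned to its nearest centroid,
    and centroids are recomputed until they stop moving.
    """
    rng = np.random.default_rng(seed)
    flat = values.ravel().astype(float)
    centroids = rng.choice(flat, size=k, replace=False)
    for _ in range(iters):
        # Assign every pixel to its closest class centroid.
        labels = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        # Recompute each centroid as the mean of its assigned pixels.
        new = np.array([flat[labels == j].mean() if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.array_equal(new, centroids):   # converged
            break
        centroids = new
    return labels.reshape(values.shape), centroids

# Toy "image" with a dark region and a bright region.
img = np.array([[10, 12, 11], [200, 198, 205], [9, 201, 202]])
labels, cents = kmeans_segment(img, k=2)
```

On this toy image the two centroids converge to the dark and bright mean intensities, and the label map separates the two regions.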
2.2 Fuzzy C-means
The literature reports that different transformations have been applied to biomedical and remote sensing images in order to extract or discover anticipated features, and it is generally concluded that more research is needed to develop more realistic classifiers of features that represent the physical process more accurately [11]. Segmentation evaluation techniques can be either supervised or unsupervised; optimal supervised segmentation of remote sensing images cannot be achieved [12].
The algorithm of T. Saikumar et al. [11] does not employ any training data in the color feature extraction process. It first uses decorrelation stretching to enhance the color separation of satellite images, and then uses the Fuzzy C-means algorithm to cluster the regions of an image into five different classes. The L*a*b* color space is used instead of HSV, and C-means clustering is used instead of K-means. The decorrelation stretching applied to enhance the image color separation results in the conversion of the RGB colors into the L*a*b* color space.
The L*a*b* color space has three layers: a luminosity layer 'L*', a chromaticity layer 'a*' that specifies where a color falls along the red-green axis, and a chromaticity layer 'b*' that specifies where a color falls along the blue-yellow axis [13].
Fuzzy C-means clustering is used to classify the colors in the 'a*b*' space. This classification separates groups of objects based on a distance metric that computes how close any two objects are to each other, and on the specified number of clusters to be partitioned. Each object in this clustering is treated as having its own spatial location; objects within a cluster are close to each other and far from objects in other clusters. Then, using the Euclidean
distance metric, the objects are classified into three clusters. Finally, Fuzzy C-means clustering returns the index of each object's cluster, which is used to label each pixel in the image and to produce a color-segmented image [11].
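As a rough illustration of this step, the sketch below clusters 2-D feature vectors (standing in for a*b* values) with the standard Fuzzy C-means updates; the function name, toy data, and parameter choices are assumptions for illustration, not taken from [11]:

```python
import numpy as np

def fuzzy_cmeans(points, c=3, m=2.0, iters=100, seed=0):
    """Fuzzy C-means over feature vectors (e.g. a*b* values of pixels).

    Unlike K-means, every point receives a soft membership in [0, 1] for
    every cluster; a hard label is obtained as the argmax membership.
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    u = rng.random((n, c))
    u /= u.sum(axis=1, keepdims=True)            # memberships sum to 1
    centers = None
    for _ in range(iters):
        w = u ** m
        # Cluster centers are membership-weighted means of the points.
        centers = (w.T @ points) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # guard against zero distance
        # Standard FCM update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
    return u.argmax(axis=1), centers

# Toy a*b* values forming three well-separated color groups.
pts = np.array([[0.0, 0.0], [0.2, 0.1], [8.0, 8.0],
                [8.1, 7.9], [0.0, 9.0], [0.2, 8.9]])
labels, centers = fuzzy_cmeans(pts, c=3)
```

The fuzziness exponent m controls how soft the memberships are; m = 2 is the common default.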
2.3 Region Growing
A. Kumar and P. Kumar discussed how widely used and important color image segmentation is in various application areas, such as multimedia applications. For an effective scanning process over a large number of images that may or may not contain text, it is much better to store and sort them in a directory. Color and texture are considered the two most important features for information retrieval and are also very useful in the indexing and management of data, so new research should focus on color image segmentation [3].
The Watershed Method (WS) is used in a proposed region-based segmentation algorithm [14]. This method addresses problems in existing algorithms, which either face difficulties in feature extraction or over-extend the applicability of the image segmentation algorithm. The watershed idea originally comes from geography, where a landscape flooded with water is divided along lines determined by watersheds. In this analogy, an image is labeled so that all points of a given swamp spot are assigned a unique label. Then the region growing method is applied to collect all similar pixels to form a region. Labeling of the image is done by uniquely labelling all points of a given swamp spot, which differs from the labelling produced by the catchment basins. This method initially selects a starting seed pixel, then groups with this seed all of its similar neighbors; in a continuous manner, each new pixel plays the role of a new seed pixel until no more pixels can be added [13].
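The seed-growing loop described above can be sketched as follows. This is a simplified 4-neighbour, fixed-tolerance version; the function name, tolerance, and toy image are illustrative, not taken from [14]:

```python
import numpy as np
from collections import deque

def region_grow(img, seed, tol=10):
    """Grow a region from a seed pixel: neighbours whose intensity is
    within `tol` of the seed value are absorbed, and each absorbed pixel
    acts as a new seed until no more pixels can be added."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    frontier = deque([seed])
    seed_val = float(img[seed])
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-neighbours
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(float(img[ny, nx]) - seed_val) <= tol:
                mask[ny, nx] = True
                frontier.append((ny, nx))
    return mask

img = np.array([[10, 11, 200],
                [12, 13, 201],
                [210, 205, 202]])
region = region_grow(img, (0, 0), tol=10)
# region marks only the four dark pixels in the top-left corner
```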
2.4 Hill-Climbing with K-means Algorithm
S. Kochra and S. Joshi identified salient regions as regions that have higher contrast than their surrounding regions [15]. This identification is applicable in various image segmentation tasks such as object-based adaptive content delivery, smart image resizing, image retrieval, and adaptive image compression. Another definition refers to the saliency of an image as its local contrast [16][17].
This work can be summarized as follows: given an image as input, the goal is to obtain a set of visually coherent segments. This is done by computing the image color histogram, then starting from a non-zero histogram bin and moving uphill until a peak is reached. Each iteration compares the number of pixels in the current bin with that of the neighbouring bins. If they have different numbers of pixels, the algorithm moves uphill towards the neighbouring bin with the higher count; if they have the same number of pixels, the algorithm keeps checking the next immediate neighbouring bins until it finds a pair of neighbouring bins with different counts. This process is repeated until a bin is reached from which no uphill movement is possible; such a bin is called a peak. Each identified peak represents a cluster in the input image, and all neighbouring pixels that share the same peak form a cluster in the image [15].
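A 1-D sketch of this uphill move is shown below. Note that [15] operates on a 3-D colour histogram and walks past equal-count plateaus; this simplified, assumed version works on a 1-D histogram and treats a tie as a peak:

```python
import numpy as np

def histogram_peaks(hist):
    """Map every non-empty histogram bin to the peak reached by moving
    uphill towards the neighbouring bin with more pixels (1-D sketch of
    the hill-climbing step described above)."""
    n = len(hist)
    peak_of = {}
    for b in range(n):
        if hist[b] == 0:
            continue
        cur = b
        while True:
            nbrs = [i for i in (cur - 1, cur + 1) if 0 <= i < n]
            best = max(nbrs, key=lambda i: hist[i])
            if hist[best] <= hist[cur]:   # no uphill move: cur is a peak
                break
            cur = best
        peak_of[b] = cur
    return peak_of

hist = np.array([1, 3, 7, 4, 2, 6, 9, 5])
peaks = histogram_peaks(hist)
# bins 0-3 climb to the peak at bin 2; bins 4-7 climb to the peak at bin 6,
# so the histogram yields two clusters
```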

2.5 Masking with Watershed Algorithm
Md. Rahman and Md. R. Islam presented an improved method based on calculating the regions of an image using the watershed transform [18]. The improvement smooths images through a convolution function, an adaptive masking operation, thresholding, and local minimum information. Their work starts from an RGB image, then extracts each of the color channels, red (R), green (G), and blue (B), as part of a dynamic threshold selection process that determines the adaptive threshold based on a gray-threshold function. For an N-dimensional convolution, an N-dimensional grid space is generated. For smoothing purposes, image normalization is applied to an N-dimensional convolution image on the three colors, through masking operations applied during cell and nucleus masking.
The Impose Minima operation is then used to create new minima in the image, using the adaptive threshold selection at some chosen locations. The minima function uses nucleus masking and adaptive image masking for morphological processing based on the color channels R, G, and B, on which the watershed transformation algorithm is applied. Finally, the labelling process is done based on thresholding to determine the background and foreground in the generated segmented image [18].

3 MEASURES OF IMAGE QUALITY
In order to evaluate the quality of processed images, some measures of digital image degradation are used. Two types of measures are used in the literature: subjective and objective. It has been reported that subjective evaluation measures are expensive and time consuming [1]. Objective evaluation measures such as Mean Square Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity (SSIM) are therefore the most commonly used to evaluate the quality of images.

MSE is defined as the accumulated squared error between the original and the reconstructed image, whereas PSNR is defined as a measure of the peak error. Let I(x,y) be the original image and I'(x,y) the reconstructed version of this image, with dimensions M and N. The Mean Square Error (MSE) is calculated as the sum of the squared differences over all pixels divided by the total pixel count; the lower the MSE value, the better [18]. MSE and PSNR are defined mathematically in equations (1) and (2):

MSE = (1 / (M*N)) * sum_{x=1..M} sum_{y=1..N} [I(x,y) - I'(x,y)]^2    (1)

PSNR = 10 * log10(255^2 / MSE)    (2)
A high PSNR value is better, since it indicates that the signal-to-noise ratio is higher, where the 'signal' is the original image and the 'noise' is the error in rebuilding this image. Therefore, a scheme that has a lower MSE value and a higher PSNR value is a good scheme [19].
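Both measures follow directly from equations (1) and (2). A minimal sketch (the peak value of 255 assumes 8-bit images; the function names are ours):

```python
import numpy as np

def mse(original, reconstructed):
    """Equation (1): mean of the squared pixel differences."""
    diff = original.astype(float) - reconstructed.astype(float)
    return float(np.mean(diff ** 2))

def psnr(original, reconstructed, peak=255.0):
    """Equation (2): peak signal-to-noise ratio in dB."""
    e = mse(original, reconstructed)
    if e == 0:
        return float("inf")        # identical images
    return 10.0 * np.log10(peak ** 2 / e)

a = np.array([[0, 255], [128, 64]], dtype=np.uint8)
b = np.array([[0, 250], [128, 64]], dtype=np.uint8)
# mse(a, b) = 5**2 / 4 = 6.25 ; psnr(a, b) = 10*log10(65025 / 6.25) ≈ 40.17 dB
```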
4 COMPARISON OF SEGMENTATION ALGORITHMS
In this section, we present a performance comparison of the five color image segmentation algorithms presented earlier in this paper: Hill-climbing with K-means (HKM), Fuzzy C-Means clustering (FCM), Otsu's adaptive thresholding K-means clustering (KMC), Region Growing (RG), and Masking with Watershed (MWS).
The Otsu's adaptive thresholding K-means clustering algorithm has been applied to different types of image edges and has produced good results. The Hill-climbing algorithm is easy to implement [1]. Masking with Watershed has better performance than all the other algorithms, which means it can run in real-time applications [18]. Fuzzy C-Means clustering tries to reduce the computational time [11].
We have tested all of these algorithms on the Berkeley Segmentation Database [20]; the Peak Signal-to-Noise Ratio (PSNR) and Mean Square Error (MSE) were calculated between images in order to assess their segmentation performance. The PSNR is calculated on the segmentations of a color texture image. For a fair and accurate comparison, we selected images that have been tested by all of these algorithms and compared the results obtained by each of them on those images alone. Table 1 presents the PSNR and MSE results of each of the five algorithms on each of the selected images. These results are also depicted in Figures 1 and 2.

Table 1. MSE and PSNR Performance Comparison of the Five Segmentation Algorithms

Image      Metric       KMC      FCM      RG       HKM      MWS
Beach      PSNR (dB)    58.69    52.58    52.35    57.92    57.92
           MSE          0.08     0.36     0.15     0.39     0.11
Bird       PSNR (dB)    60.18    54.48    52.26    54.89    59.04
           MSE          0.06     0.23     0.39     0.21     0.08
Building   PSNR (dB)    57.20    56.53    52.26    56.37    58.08
           MSE          0.12     0.14     0.39     0.15     0.10
Car        PSNR (dB)    57.15    55.32    53.58    55.21    55.59
           MSE          0.12     0.19     0.29     0.20     0.18
Flower     PSNR (dB)    58.69    58.10    51.72    53.21    58.93
           MSE          0.13     0.08     0.44     0.31     0.08


Table 2. MSE and PSNR Average Performance of the Five Segmentation Algorithms

Metric       KMC      FCM      RG       HKM      MWS
PSNR (dB)    58.38    55.40    52.43    55.52    57.91
MSE          0.10     0.20     0.33     0.25     0.11

Table 2 shows that the Otsu's adaptive thresholding K-means clustering algorithm (KMC) has the highest average PSNR value and the lowest average MSE value, which means that the KMC algorithm performs better than the other four segmentation algorithms.

Figure 1. Histogram of PSNR values

Figure 2. Histogram of MSE values

Figure 3 presents an example of an image to be segmented, whereas Figure 4 presents the segmentation results for this image produced by each of the five compared algorithms [1][18].

Figure 3. An Original Image

Figure 4. Segmentation Results using the five Segmentation Algorithms (KMC, FCM, RG, HKM, MWS)

5 CONCLUSION AND FUTURE WORK
Extracting color information for image segmentation is very useful in enhancing image analysis during the image pre-processing phase for several applications, such as computer vision and pattern recognition. Gray-level threshold segmentation is not practical for images that have complex objects, so improving segmentation methods for color images is an important step, because color images carry richer feature information than gray images.
In this paper, a comparative study of five different color segmentation algorithms was performed. Results showed that the K-means clustering algorithm does a better job, but it still needs to be modified to deal with different kinds of sharp and smooth edges. Finally, more future work is needed to come up with better performance measures that accurately reflect the actual differences between various segmentation techniques.

REFERENCES

[1] V. Jumb, M. Sohani, and A. Shrivas, "Color Image Segmentation Using K-Means Clustering and Otsu's Adaptive Thresholding," International Journal of Innovative Technology and Exploring Engineering (IJITEE), Vol. 3, No. 9, pp. 72-76, February 2014.

[2] S. Shirazi, N. Khan, A. Umar, M. R. Naz, and B. AlHaqbani, "Content-Based Image Retrieval Using Texture Color Shape and Region," International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 7, No. 1, pp. 418-426, 2016.

[3] A. Kumar and P. Kumar, "A New Framework for Color Image Segmentation Using Watershed Algorithm," Computer Engineering and Intelligent Systems, www.iiste.org, ISSN 2222-1719 (Paper), ISSN 2222-2863 (Online), Vol. 2, No. 3, pp. 41-46, 2010.

[4] L. Yucheng and L. Yubin, "An algorithm of image segmentation based on fuzzy mathematical morphology," in International Forum on Information Technology and Applications (IFITA'09), pp. 517-520, 2009.

[5] M. Khan, "A Survey: Image Segmentation Techniques," International Journal of Future Computer and Communication, Vol. 3, No. 2, 2014.

[6] F. Monteiro and A. Campilho, "Watershed framework to region-based image segmentation," in Proc. 19th International Conference on Pattern Recognition (ICPR), pp. 1-4, 2008.

[7] W. Cui and Y. Zhang, "Graph based multispectral high resolution image segmentation," in Proc. International Conference on Multimedia Technology (ICMT), pp. 1-5, 2010.

[8] R. Patil and K. Jondhale, "Edge based technique to estimate number of clusters in k-means color image segmentation," in Proc. 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT), pp. 117-121, 2010.

[9] A. Fabijanska, "Variance filter for edge detection and edge-based image segmentation," in Proc. International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), pp. 151-154, 2011.

[10] N. Otsu, "A threshold selection method for grey level histograms," IEEE Transactions on Systems, Man and Cybernetics, Vol. 9, No. 1, pp. 62-66, 1979.

[11] T. Saikumar, P. Yugander, P. Sreenivasa, and B. Smitha, "Colour Based Image Segmentation Using Fuzzy C-Means Clustering," in International Conference on Computer and Software Modeling, IPCSIT Vol. 14, IACSIT Press, pp. 180-185, 2011.

[12] C. Rosenberger, S. Chabrier, H. Laurent, and B. Emile, "Unsupervised and supervised image segmentation evaluation," in Advances in Image and Video Segmentation, Y.-J. Zhang (Ed.), IRM Press, 2006.

[13] R. C. Gonzalez, R. E. Woods, and S. Eddins, Digital Image Processing Using MATLAB, 2nd edition, Gatesmark Publishing, 2009.

[14] T. Srikanth, P. Kumar, and A. Kumar, "Color Image Segmentation using Watershed Algorithm," International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 2, No. 5, pp. 2332-2334, 2011.

[15] R. Vijayanandh and G. Balakrishnan, "Hill-climbing Segmentation with Fuzzy C-Means Based Human Skin Region Detection using Bayes Rule," European Journal of Scientific Research (EJSR), Vol. 76, No. 1, pp. 95-107, 2012.

[16] L. Itti, C. Koch, and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp. 1254-1259, November 1998.

[17] Y. Ma and H. Zhang, "Contrast-based image attention analysis by using fuzzy growing," in Proceedings of the Eleventh ACM International Conference on Multimedia, pp. 374-381, November 2003.

[18] M. Rahman and M. R. Islam, "Segmentation of color image using adaptive thresholding and masking with watershed algorithm," in Informatics, Electronics & Vision (ICIEV), 2013 International Conference on, IEEE, pp. 1-6, 2013.

[19] Q. Huynh-Thu and M. Ghanbari, "Scope of validity of PSNR in image/video quality assessment," Electronics Letters, Vol. 44, No. 13, p. 800, 2008. doi:10.1049/el:20080522

[20] D. Martin and C. Fowlkes, "The Berkeley segmentation database and benchmark," Computer Science Department, University of California, Berkeley, 2001. http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/


AAICS: Aesthetics-Driven Automatic Image Cropping and Scaling


Md Baharul Islam, Chen Tet Khuan, Muhammad Ehsan Rana, Md Kabirul Islam
Asia Pacific University of Technology & Innovation,
Technology Park Malaysia, Bukit Jalil 57000, Kuala Lumpur, Malaysia
Daffodil International University, 102 Shukrabad, Dhaka 1207, Bangladesh
{baharulislam.md, dr.tet.khuan, muhd ehsanrana}@apu.edu.my, kislam@daffodilvarsity.edu.bd
ABSTRACT
Due to the availability of digital cameras and smartphones, everyone is a casual photographer. However, the quality of captured photos is often questionable due to a lack of photographic composition knowledge. Professional photographers utilize this composition knowledge during their photography, whereas casual photographers do not. In this paper, we propose an aesthetics-driven automatic image cropping and scaling (AAICS) approach that recomposes regular photographs based on pre-defined composition rules/guidelines, e.g. the rule of thirds and visual balance. Firstly, our method finds the saliency map of the given image using a saliency detection technique. Then, our system searches for a cropping window on the saliency map, based on the selected photographic composition rules, whose aspect ratio is identical to that of the given image. Finally, the cropping window is rescaled to obtain the enhanced image. Our method helps casual photographers recompose their poorly captured photos to enhance their composition. Experimental results show the effectiveness of our proposed method.

KEYWORDS
Aesthetics, computational photography, composition rules/guidelines, image cropping and scaling, image recomposition

1 INTRODUCTION
Nowadays, everyone is a casual photographer. Even a five-year-old boy can press the shutter release button to capture a moment. The popularity of social media and image sharing websites has increased remarkably over the last few years. According to the statistics (http://mylio.com/true-stories/next/one-trillion-photos-in-2015-2), one trillion photos were taken in 2015, and this number is increasing at 16.2% each year due to the availability of mobile phones with cameras. Statistics also show that 27,800 and 208,300 photos per minute are uploaded to Instagram and Facebook, respectively. Most of these photos are captured by casual photographers. The spatial arrangement of the image components is called composition. Professional photographers utilize composition knowledge in their photography, e.g. the rule of thirds and visual balance. However, casual photographers may not be aware of these composition rules/guidelines when capturing photos, so the quality of casual photographs is questionable. Research attention on this problem has been increasing over the last few years.

Figure 1. Different cropping windows overlaid on the image with different aspect ratios. The five colored rectangles represent five cropping windows.

The goal of image recomposition tools is to enhance image quality in terms of the aesthetics-driven composition of the given image. Casual photographers may utilize these tools to
enhance their poorly taken photos. Image recomposition is a very challenging problem due to the need to preserve the significant content in the recomposed image. Many image recomposition methods have been proposed over the last few years. Image cropping is a popular method due to its artifact-free nature and simplicity. In cropping-based recomposition methods, a cropping window is searched for in the given image based on a set of constraints. Figure 1 shows an example of different cropping windows in a given image. All five cropping windows retain the significant image content (the airplane), but the position of that content differs among them. Interactive and semi-automatic cropping techniques require user interaction to indicate the region of interest (ROI). For aesthetics enhancement, an automatic solution is highly desirable for non-experienced users. To overcome this limitation, we propose an automatic aesthetics-driven image cropping and scaling method, namely AAICS, which changes the image composition to enhance the image aesthetics. The core contributions of this paper are:
We propose an automatic method, namely AAICS, that recomposes the input image according to the selected photographic composition rules.
Our method can retarget the input image to adapt to target display devices while preserving the important content. Users may need to initialize the target image scale prior to optimizing the cropping window.
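To make the cropping-window search concrete, here is a hedged sketch that scans fixed-aspect-ratio windows over a saliency map using an integral image and keeps the window with the highest total saliency. The function name, scale steps, and the saliency-only objective are our own simplifying assumptions; the actual AAICS objective also scores the selected composition rules (e.g. rule of thirds, visual balance):

```python
import numpy as np

def best_crop(saliency, scale_steps=(0.6, 0.7, 0.8, 0.9), stride=1):
    """Search fixed-aspect-ratio cropping windows over a saliency map
    and return (y, x, h, w) of the window containing the most saliency.
    Illustrative only: saliency coverage stands in for the full
    composition-aware objective."""
    H, W = saliency.shape
    # Integral image makes each window sum an O(1) lookup.
    integral = saliency.cumsum(0).cumsum(1)

    def window_sum(y, x, h, w):
        total = integral[y + h - 1, x + w - 1]
        if y > 0: total -= integral[y - 1, x + w - 1]
        if x > 0: total -= integral[y + h - 1, x - 1]
        if y > 0 and x > 0: total += integral[y - 1, x - 1]
        return total

    best, best_score = None, -1.0
    for s in scale_steps:                 # same aspect ratio as the input
        h, w = int(round(H * s)), int(round(W * s))
        for y in range(0, H - h + 1, stride):
            for x in range(0, W - w + 1, stride):
                score = window_sum(y, x, h, w)
                if score > best_score:
                    best_score, best = score, (y, x, h, w)
    return best

sal = np.zeros((10, 10))
sal[6:9, 6:9] = 1.0        # salient object in the bottom-right
crop = best_crop(sal)
```

After the search, the chosen window would be rescaled back to the input size, as described above.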
The rest of the paper is organized as follows. We provide a brief overview of related previous works and of the aesthetics-driven composition rules employed in our proposed work in Sections 2 and 3, respectively. The detailed workflow of our proposed method is discussed in Section 4. The experimental results and a comparison of our method with state-of-the-art methods, along with its limitations, are given in Section 5. Finally, we conclude our work with some future research directions.
2 RELATED WORKS

State of the art image recomposition methods


can be divided into cropping, discrete, continuous, and hybrid which is utilized more than
one image operators. Recently, Islam et al.
[9] compared the existing image recomposition
methods based on the criteria of image recomposition problem. We are recommending the
readers to go through this survey article for
details. In this section, we only give a brief
summary of the existing image recomposition
methods.
Cropping is the most popular method due to its distortion-free nature. Numerous cropping methods, including gaze-based [18], template-based [25], rule-based [6], exemplar-based [17], and, recently, learning-based [15, 23, 5] methods, have been proposed. Suh et al. [20] first introduced automatic thumbnail cropping. Instead of using a computational saliency model, Santella et al. [18] used eye fixation data to identify and compute the ROI of an image. Similar to the thumbnail cropping method [20], Stentiford [19] proposed a method with a slight improvement in determining the significant content. More recent approaches [25, 15, 23, 17] took a step further by adopting an aesthetics-driven approach that considered both the salient content and the visual composition in the cropping process. Greco and Cascia [6] proposed a saliency-based aesthetic cut on images to enhance the subject dominance.
Discrete methods rearrange patches (groups of pixels) in the image frame to obtain the recomposed image. Patch transform [4] and shift-map [16] relocated image pixels like a puzzle game and did not consider aesthetic enhancement during patch relocation. The cut-and-paste methods [1, 24, 3] cut one or more foreground objects from the background layer; the segmented object(s) are then pasted at optimal position(s) based on a set of composition rules and constraints. Seam carving [7, 13] maintained similarity by inserting and deleting seams² in the image frame. Noticeable artifacts are visible in images with complex and geometrical backgrounds due to its discrete nature.
Continuous methods utilize the traditional warping technique to recompose the image content. The non-homogeneous warping method [11] enhanced image aesthetics by minimizing a set of aesthetic quality errors. Recently, Chang et al. [2] proposed an exemplar-based warping method to modify the image composition, where prior composition knowledge is not mandatory. More recently, Islam et al. [10] applied non-homogeneous warping to the stereoscopic image domain. Notably, noticeable feature distortions are unavoidable for large-scale warping due to over-composition.
Hybrid methods [14, 12, 22] utilize the benefits of more than one image operator. Crop-and-retarget [14] used a passive cropping operator and then retargeted the cropping window to obtain an aesthetically pleasing image, which was a relatively slow process. To speed up the process, crop-and-warp [12] used an active crop operator to crop the input image and then applied non-homogeneous warping [11] on the cropped image. Tearable image warping [22] unified cut-and-paste [1] and non-homogeneous warping [11]: this method segments the foreground objects, warps the inpainted background, and then pastes the segmented objects into the optimized background layer. Recently, Tan et al. [21] extended single tearable image warping to tearable stereo image warping.

3 AESTHETICS COMPOSITION

Photographic composition rules alone can enhance image aesthetics. We select the two most popular composition rules and apply them in our AAICS method to enhance the composition of regular photographs.

3.1 Rule of Thirds

In this rule, the image frame is divided by two vertical and two horizontal lines that form four intersection points, called power points. Photographers are advised to keep the center of mass of the most salient object(s) on these power points. Figure 2 shows the rule of thirds composition guideline with an example. The most salient object (the yellow flower) is placed on the bottom-right power point following this rule, as shown in Figure 2. In our AAICS method, we minimize the distance between the power points and the center of the salient objects.

Figure 2. Overview of rule of thirds composition with an example; (left) the rule of thirds grid, where the red dots represent the four power points, and (right) an example photo that adheres to the rule of thirds composition.

Figure 3. Overview of visual balance composition; (left) unbalanced composition of two objects (triangle and rectangle) in the image frame, and (right) visually balanced composition. The red dot represents the visual weight of the two salient objects.

² A seam is a path of least-important interconnected pixels from top to bottom in an image.


3.2 Visual Balance

By this rule, the center of the visual mass of all salient objects should be placed at the image center, which creates harmony between the salient objects in the image frame. Figure 3 shows an example of unbalanced and balanced compositions for two salient objects (a triangle and a rectangle). An unbalanced harmony is created in the frame when the image center and the object centroid are not located at the same position. After minimizing the distance between the center of visual mass and the image center, the salient objects (triangle and rectangle) are balanced, as shown in Figure 3.

Figure 4. An overview of our proposed AAICS algorithm, which has three main parts: (1) significant map generation; (2) application of the cropping operator based on the generated significant map, following some pre-defined composition rules, where the red rectangle represents the cropping window in the given image; and (3) rescaling of the cropped image to obtain the recomposed image. The horizontal and vertical yellow lines represent the rule of thirds photographic composition.
4 PROPOSED AAICS ALGORITHM

Our proposed AAICS method aims to enhance the aesthetics of regular images captured by casual photographers by rearranging the image components. To maximize image aesthetics, our proposed AAICS method minimizes a set of aesthetic quality errors, e.g. the rule of thirds and visual balance errors. Figure 4 shows the overview and workflow of our proposed AAICS method. Given an input image, we first compute the significant map using graph-based visual saliency [8]. The girl is the most important content of the input image, but she does not adhere to the rule of thirds composition. In the second stage, our method minimizes a set of aesthetic quality errors to search for an aesthetically pleasing cropping window. Finally, the cropping window is rescaled to obtain the corresponding enhanced recomposed image.
4.1 Significant Map Generation

We employ graph-based visual saliency [8] on the given image I(M×N), where M and N represent the number of pixels in rows and columns. A pixel with a high saliency value is generally considered a significant pixel. Cutting off significant content may reduce the aesthetic quality perceived by human vision. Figure 5 shows examples of a set of images with their corresponding significant maps and the significant maps overlaid on the given images. Please note that we consider the corresponding significant map to be the saliency of the image.

Figure 5. Examples of significant map generation; (left to right) input image, significant map, and overlay of the significant map on the given image, respectively.
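To make the notion concrete, the weighted centroid of the significant content (used later as the object centroid Oo) can be computed from a thresholded significant map. The following Python sketch is illustrative only; the map values, the threshold, and the function name are our own assumptions, not details from the paper.

```python
def salient_centroid(sig_map, threshold=0.5):
    """Weighted centroid of the significant pixels in a 2-D map.

    sig_map: list of rows, each a list of saliency values in [0, 1].
    Pixels at or above `threshold` are treated as significant.
    """
    total = cx = cy = 0.0
    for y, row in enumerate(sig_map):
        for x, s in enumerate(row):
            if s >= threshold:
                total += s           # accumulate saliency mass
                cx += s * x          # saliency-weighted x coordinate
                cy += s * y          # saliency-weighted y coordinate
    if total == 0:
        return None  # no significant content found
    return (cx / total, cy / total)

# Toy 4x4 significant map with a bright blob in the lower-right corner.
sig = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.2, 0.0],
    [0.0, 0.2, 0.9, 0.8],
    [0.0, 0.0, 0.8, 0.9],
]
print(salient_centroid(sig))
```

The dim pixels (0.1, 0.2) fall below the threshold, so only the bright blob contributes, and its weighted centroid lands near (2.5, 2.5).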


4.2 Aesthetics-driven Cropping Window

We minimize two aesthetic quality errors of the given image in our AAICS method.
4.2.1 Rule of Thirds Error

Rule of thirds is the most popular composition rule in photography; we discussed its details in Section 3. The rule of thirds error is defined as the minimization of the distance between the most important content of the image and one of the power points:

Ep = ‖Dp − Oo‖²    (1)

where Dp is a power point and Oo is the centroid of object o.
4.2.2 Visual Balance Error

Visual balance composition plays an important role in creating harmony among the image components. The visual balance error Evb is defined as
Evb = ‖C(I) − C(O)‖²    (2)

where C(I) is the image center and C(O) is the weighted centroid of the salient objects O. The visual balance error Evb = 0 for an image with only one salient object.
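The two error terms can be sketched directly from Equations (1) and (2). The helper names below are hypothetical, and we assume Ep is minimized over the four power points, as implied by Section 3.1; only the equations themselves come from the paper.

```python
def power_points(width, height):
    """The four rule-of-thirds intersection points of a width x height frame."""
    return [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]

def rule_of_thirds_error(centroid, width, height):
    """Ep, Eq. (1): squared distance from the object centroid Oo to the
    nearest power point Dp (minimized over the four power points)."""
    return min((px - centroid[0]) ** 2 + (py - centroid[1]) ** 2
               for px, py in power_points(width, height))

def visual_balance_error(centroids, weights, width, height):
    """Evb, Eq. (2): squared distance between the image center C(I) and
    the weighted centroid C(O) of all salient objects."""
    total = sum(weights)
    cx = sum(w * c[0] for w, c in zip(weights, centroids)) / total
    cy = sum(w * c[1] for w, c in zip(weights, centroids)) / total
    return (width / 2 - cx) ** 2 + (height / 2 - cy) ** 2

# An object sitting exactly on the bottom-right power point has zero error.
print(rule_of_thirds_error((200, 200), 300, 300))  # 0.0
```

Likewise, a single object already at the image center yields Evb = 0, matching the remark above.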

Figure 6. Searching for cropping windows that minimize the aesthetic quality errors; (top) minimizing the rule of thirds error, where the boat is located at the right power point; (bottom) minimizing the visual balance error. White rectangles represent the optimized cropping windows.

Figure 6 shows the results of cropping window selection. The selected cropping windows minimize the rule of thirds and visual balance composition errors on the salient content.

4.3 Image Scaling

The cropping window is homogeneously rescaled with the respective scaling factors to obtain the recomposed image, which has the same aspect ratio as the given image.
5 EXPERIMENTAL RESULTS

Our proposed AAICS method was tested on a system with an Intel Core i3 CPU at 2.53 GHz and 8 GB of memory. Our method takes about 1-2 seconds to obtain the aesthetics-driven cropping and scaling results. The computation time mainly depends on the significant map of the input image and the image resolution; a large image can take more computation time.
5.1 Recomposition Results

Figure 7 shows different results of our proposed AAICS method. Our results are visually more appealing than their original counterparts. The photos of the airplane and the man (rows 1 and 2) do not follow the rule of thirds composition in the original images, whereas our results relocate the salient content to the nearest power points to obtain aesthetically pleasing photos. The location of the boat in the original image (row 3) is slightly far from the right power point; our recomposed result relocates it to the right power point accurately, which visually enhances its aesthetic value. The landscape image (row 4) ideally has no significant object; our method considers the island, instead of the golden sun, as the most salient part. The recomposed result is still pleasing to viewers, although the location of the sun is nearly identical to the original image. For visual balance, two salient objects (row 5) are required to balance the composition. The original image seems unbalanced due to the random locations of the objects; our result shows more balance and harmony between the two men.

Figure 7. Recomposition results for a set of images; (left to right) original image, graph-based visual saliency (significant) map, saliency map overlaid on the corresponding given image, and recomposed image, respectively. White rectangles represent the optimized cropping windows, which adhere to the rule of thirds and visual balance composition.
Figure 8 shows the comparison of our results with state-of-the-art methods. Although crop-and-retarget [14] and data-driven cropping [17] generate some promising results, these methods may not be able to relocate the salient objects according to the rule of thirds composition. Our result adheres to the rule of thirds composition more convincingly. Figure 9 shows the comparison of our results with a recent method using multiple models for cropping [5]. Both recomposed images enhance the visual aesthetics over their original counterparts. Due to the unavailability of objective evaluation tools for image recomposition methods, it is very difficult to tell which method is better.
Figure 8. Comparison of our result with the results of state-of-the-art recomposition methods; (left to right) original image, result of the crop-and-retarget method [14], result of the data-driven cropping method [17], and our proposed method, respectively.

Figure 9. Comparison of our results with the results of a recent cropping method using triple models [5]; (left to right) original images, results of the triple models method [5], and our results, respectively.

5.2 Retargeting Results

Our method can generate aesthetics-driven retargeting results given a retargeting scale factor whose aspect ratio may not be identical to that of the given input image. Figure 10 shows the image retargeting results. The original image is resized horizontally with two different scaling factors, Sx = 0.6 and Sx = 0.8. Our method enables not only horizontal retargeting but also vertical retargeting and retargeting in both directions; in Figure 10, we only show horizontal retargeting. The significant content (the leaf) of the original image is protected in our retargeting results. Figure 11 shows the comparison of our retargeting results with linear scaling. Linear scaling may not be able to protect the salient object, especially during extreme retargeting. All the given examples show the inability of linear scaling to protect the object (red rectangles), while our method shows better object protection.

Figure 10. Results of image retargeting; (left to right) original image, 40% and 20% horizontal image size reduction, respectively.

5.3 Limitations

Our proposed method has some unavoidable limitations. The main limitation of our approach is the loss of information, especially for an image in which the saliency covers a significant portion of the frame; however, this is a limitation of all cropping-based methods. The result of our method relies completely on the significant map generation, so our method may not work well with an inaccurate significant map, although any advanced saliency detection technique can overcome this limitation. While cropping methods generate promising results when there is sufficient uninteresting background, they suffer loss of information when single or multiple objects are spread out all over the image frame. Figure 12 shows a failure case of our proposed method: the important part of the salient object (the head) is cut off in our result. The possible cause is that the significant content fills the full frame of the original image.

Figure 11. Comparison of our different retargeting results with the results of linear scaling, with the image size reduced by 40%; (top) horizontally (Sx = 0.6, Sy = 1), (middle) horizontally and vertically (Sx = 0.6, Sy = 0.6), and (bottom) vertically (Sx = 1, Sy = 0.6); (left to right) original image, result of linear scaling, and our result, respectively. Red rectangles represent the inability of linear scaling to protect the object.
6 CONCLUSION AND FUTURE DIRECTION

In this paper, we presented AAICS, an automatic, aesthetics-driven image cropping and scaling method. To maximize image aesthetics, we minimize a set of aesthetic quality errors formulated from two photographic composition rules, namely the rule of thirds and visual balance. Experimental results show that our AAICS tool successfully crops the input image based on composition guidelines and rescales it to enhance its beauty. Compared to the non-homogeneous warping method, our method protects the salient object better in the global image context. In future work, we would like to explore aesthetics-driven video cropping and scaling, where the main challenge is maintaining motion consistency between consecutive frames. The subjective evaluation of image recomposition methods is questionable; in the future, an objective evaluation tool is highly needed to compare state-of-the-art recomposition methods.
Acknowledgement. The authors would like to thank all the anonymous photographers who shared their photos at https://www.flickr.com. We would also like to thank Mr. Jacob Sow for proofreading.


Figure 12. A failure case of our proposed method; (left to right) original image, saliency map overlaid on the original image, and our recomposed result, respectively.


REFERENCES

[1] S. Bhattacharya, R. Sukthankar, and M. Shah. A holistic approach to aesthetic enhancement of photographs. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 7(1):1–21, 2011.
[2] H.-T. Chang, P.-C. Pan, Y.-C. F. Wang, and M.-S. Chen. R2P: Recomposition and retargeting of photographic images. In ACM International Conference on Multimedia, pages 927–930. ACM, 2015.
[3] H.-T. Chang, Y.-C. F. Wang, and M.-S. Chen. Transfer in photography composition. In ACM International Conference on Multimedia, pages 957–960. ACM, 2014.
[4] T. S. Cho, M. Butman, S. Avidan, and W. T. Freeman. The patch transform and its applications to image editing. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8. IEEE, 2008.
[5] C. Fang, Z. Lin, R. Mech, and X. Shen. Automatic image cropping using visual composition, boundary simplicity and content preservation models. In ACM International Conference on Multimedia, pages 1105–1108. ACM, 2014.
[6] L. Greco and M. L. Cascia. Saliency based aesthetic cut of digital images. In International Conference on Image Analysis and Processing, pages 151–160, 2013.
[7] Y. Guo, M. Liu, T. Gu, and W. Wang. Improving photo composition elegantly: Considering image similarity during composition optimization. In Computer Graphics Forum, volume 31, pages 2193–2202. Wiley Online Library, 2012.
[8] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In Advances in Neural Information Processing Systems, pages 545–552, 2006.
[9] M. B. Islam, W. Lai-Kuan, and W. Chee-Onn. A survey of aesthetics-driven image recomposition. Multimedia Tools and Applications, pages 1–26, 2016.
[10] M. B. Islam, W. Lai-Kuan, W. Chee-Onn, and K.-L. Low. Stereoscopic image warping for enhancing composition aesthetics. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pages 645–649. IEEE, 2015.
[11] Y. Jin, L. Liu, and Q. Wu. Nonhomogeneous scaling optimization for realtime image resizing. The Visual Computer, 26(6):769–778, 2010.
[12] Y. Jin, Q. Wu, and L. Liu. Aesthetic photo composition by optimal crop-and-warp. Computers & Graphics, 36(8):955–965, 2012.
[13] K. Li, B. Yan, J. Li, and A. Majumder. Seam carving based aesthetics enhancement for photos. Signal Processing: Image Communication, 39:509–516, 2015.
[14] L. Liu, R. Chen, L. Wolf, and D. Cohen-Or. Optimizing photo composition. Computer Graphics Forum, 29(2):469–478, 2010.
[15] J. Park, J.-Y. Lee, Y.-W. Tai, and I. S. Kweon. Modeling photo composition and its application to photo re-arrangement. In IEEE International Conference on Image Processing (ICIP), pages 2741–2744. IEEE, 2012.
[16] Y. Pritch, E. Kav-Venaki, and S. Peleg. Shift-map image editing. In IEEE International Conference on Computer Vision (ICCV), pages 151–158. IEEE, 2009.
[17] A. Samii, R. Mech, and Z. Lin. Data-driven automatic cropping using semantic composition search. In Computer Graphics Forum, volume 34, pages 141–151. Wiley Online Library, 2015.
[18] A. Santella, M. Agrawala, D. DeCarlo, D. Salesin, and M. Cohen. Gaze-based interaction for semi-automatic photo cropping. In ACM Conference on Human Factors in Computing Systems, pages 771–780. ACM, 2006.
[19] F. Stentiford. Attention based auto image cropping. In Workshop on Computational Attention and Applications (ICVS), volume 1. Citeseer, 2007.
[20] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs. Automatic thumbnail cropping and its effectiveness. In ACM Symposium on User Interface Software and Technology, pages 95–104. ACM, 2003.
[21] C.-H. Tan, M. B. Islam, L.-K. Wong, and K.-L. Low. Semantics-preserving warping for stereoscopic image retargeting. In Pacific-Rim Symposium on Image and Video Technology, pages 257–268. Springer, 2015.
[22] L.-K. Wong and K.-L. Low. Tearable image warping for extreme image retargeting. In Computer Graphics International (CGI), pages 1–8, 2012.
[23] J. Yan, S. Lin, S. B. Kang, and X. Tang. Learning the change for automatic image cropping. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 971–978. IEEE, 2013.
[24] F.-L. Zhang, M. Wang, and S.-M. Hu. Aesthetic image enhancement by dependence-aware object recomposition. IEEE Transactions on Multimedia (TMM), 15(7):1480–1490, 2013.
[25] M. Zhang, L. Zhang, Y. Sun, L. Feng, and W. Ma. Auto cropping for digital photographs. In IEEE International Conference on Multimedia and Expo (ICME), pages 4–7. IEEE, 2005.


Exploring Concepts and Narratives for First-Person Games

Mohammad Ahmadi, Babak BashariRad and Derrick Jonathan Arrais


Asia Pacific University of Technology and Innovation, Malaysia
L2-5B, Incubator2, Technology Park Malaysia, Bukit Jalil, 57000 Kuala Lumpur, Malaysia
Ahmadi@apu.edu.my, Dr.Babak.Basharirad@apu.edu.my, and D.J.Arrais@hotmail.com

ABSTRACT

This paper presents a method of overcoming the issues of narrative and concept faced by the game industry. In an age where games can offer more than just simple graphics and gameplay, many games from large companies seem to follow a single formula rather than attempting to diversify their range of games. Why do many developers prefer to abide by a well-used gameplay design? This paper takes on the first-person genre, a genre that takes place through the eyes of the player, and examines the preferences of players to gather data on what makes them want to play a game. Video games are a favorite pastime for people of all ages; ranging from the young child to the middle-aged man, games have provided players with interactive fun as well as interesting narratives. However, not all video games are made with fun or an interesting story in mind. Narratives and concepts have diversified throughout the years: starting from the simple concept of going around shooting enemies and collecting items, first-person games have evolved into games with storytelling and have gone beyond the shooter genre. A major problem with some of the more popular first-person games is that they lack any sense of reward or any incentive for the player to explore. By conducting research and developing a project, we aim to evaluate some solutions to this problem.

KEYWORDS

Narration, First-person Games, RPG, Storytelling, Artificial Intelligence, Interactive Game, Video Games.

1 INTRODUCTION

Video games are a favorite pastime for people of all ages. Ranging from the young child to the middle-aged man, games have provided players with interactive fun as well as interesting narratives. In the 90s, the first-person genre was normally associated with shooting games such as Half-Life, Doom, Quake, and a slew of other popular shooter titles. Nowadays, while the genre is still associated with shooters, other genres are mixed in with it as well. Games like Amnesia utilized the first-person view for immersion in horror games, and the Elder Scrolls series mixes it with role-playing games [1]. First-person games have been around since the late 20th century, and their rise to popularity is commonly attributed to games like Wolfenstein 3D and Doom. While the genre has evolved to include many other subgenres such as horror, arena shooters, and military shooters, the core design remains the same: the game is played through the point of view of the player's character, and whatever the character sees, the player sees.
Much of today's first-person games center around the multiplayer experience, usually neglecting the single-player narrative in the process. While multiplayer adds longevity to a game's lifespan, not everyone wants to play a game for the multiplayer alone; some people prefer the single-player experience, wanting to relax more than compete. In a book by Rogers (2014), he mentions that the game world map not only helps the development team to understand the world, but also provides players with an advantage [2]. In an article by Rouse III (2001), he mentions that a good game design includes moments where even the player is able to come up with a solution the designer never expected [2]. In another article, by Lahti (2013), the game Battlefield has the option to control a squad, but the choices for squad control are fairly limited [3]. In yet another article, by Smith (2012), the author mentioned that the story is rather predictable; despite having different names and themes, the overall story has changed little, and players can easily predict how the story will go next [4].
Narratives and concepts have diversified throughout the years. Starting from the simple concept of going around shooting enemies and collecting items, first-person games have evolved into games with storytelling and have gone beyond the shooter genre [5]. Some have incorporated other elements such as RPG or horror into the mix [6]. Nowadays, some of the more popular first-person games belong to the shooter genre [5]. Games like Call of Duty and Battlefield are the top players when it comes to first-person shooter games, though a game like Doom has been picking up in popularity [3].
A major problem with some of the more popular first-person games is that they lack any sense of reward or any incentive for the player to explore. Games like Call of Duty and Battlefield, for example, are some of the top-selling games of the genre. While there are many other first-person games out there, these two are among the best-selling. Medal of Honour used to be in the top spot too, but has fallen out of favour with a lot of gamers in recent years.
It can be summarized that the problem being faced here is that current first-person games are rather repetitive and dull. They do not have much variation in terms of gameplay, they are linear, and the storylines leave much to be desired. The game can take place in the past, present, or future, but the essence of the story always remains the same.


2 AIM and OBJECTIVES

The aim of the project was to address the lack of interesting narratives in modern first-person games. It also sought to address the lack of any interesting gameplay that can provide players a challenge. With the project done and tested, it is obvious that there are multiple issues that need to be looked into and rectified. Hence, the new aim here is to rectify and fix the existing issues in the project.
In our project, we attempted to create different narrative and gameplay concepts by bringing the first-person view to a horror game. Instead of a fast-paced shooter where players get to go in guns blazing, this is about complex level designs and other mechanics that help keep players on edge and careful. There is an A.I.-controlled enemy to provide some challenge to the player, and it actively seeks out the player. Aside from that, there is narrative in the project: a story which sheds some lore on the game scenario. This came in the form of notes that can be picked up; some present background history of the scenario, while others are directed towards the player. With the issues identified, they can be looked into. The following are the objectives for the completed project:
- To revamp the artificial intelligence system;
- To rework the map design to work with the revamped A.I. system;
- To add more interactive elements to the game;
- To rework the note system.

3 TECHNICAL RESEARCH

According to Bourg and Seemann (2004), A* path-finding is one of the most efficient path-finding methods to date. It always guarantees finding the shortest path to the target, but it does consume some CPU cycles and may cause a slowdown if there are too many units using it at the same time [7]. This path-finding method is widely used in the game industry, and almost all games utilize it. For the project, it is a good idea to utilize A* because there are only a few enemies present in the game itself. Working side-by-side with an intelligent A.I., it can surely give players a challenge.
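As an illustration of the technique discussed above, a minimal A* implementation on a 4-connected grid might look as follows. This is a generic sketch, not code from the project; the grid, unit step costs, and Manhattan heuristic are our own assumptions.

```python
import heapq

def astar(grid, start, goal):
    """A* shortest path on a 4-connected grid.
    grid[y][x] == 1 marks a wall; cells are (x, y) tuples.
    Returns the path from start to goal inclusive, or None."""
    def h(c):  # Manhattan distance: admissible, so the path found is shortest
        return abs(c[0] - goal[0]) + abs(c[1] - goal[1])

    g = {start: 0}                  # best known cost from start to each cell
    came = {}                       # back-pointers to rebuild the path
    frontier = [(h(start), start)]  # priority queue ordered by f = g + h
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            if not (0 <= ny < len(grid) and 0 <= nx < len(grid[0])):
                continue  # outside the map
            if grid[ny][nx] == 1:
                continue  # wall
            ng = g[cur] + 1
            if ng < g.get(nxt, float("inf")):
                g[nxt] = ng
                came[nxt] = cur
                heapq.heappush(frontier, (ng + h(nxt), nxt))
    return None

# A 4x4 room with a wall forcing the enemy to route around it.
room = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
path = astar(room, (0, 0), (0, 2))
print(len(path) - 1)  # 8 steps around the wall
```

The CPU cost mentioned above shows up in the priority-queue operations: each expanded cell pushes up to four neighbours, so many units running searches every frame multiply this work.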
Table 1. Similar Systems

Feature                Amnesia: The Dark Descent
Setting                Haunted castle
Lighting               Light comes mostly from the lantern and lit torches
Engine                 HPL
Sound Effects          Haunting sounds and musical cues
Combat                 None
Point of View          First-person
Enemy Types            Monsters
Darkness is Handicap   Yes
Limited Light Source   Yes

As shown in Table 1, although the three games have different combat systems and enemies, they share a lot of similarities. They utilize darkness as a disadvantage to the player, and the player's ability to light their way is limited by battery or lantern oil. They all take place from a first-person point of view, which makes them rather immersive, as players cannot look behind themselves or peek around corners as with a third-person camera. This not only simulates what an actual person would experience, but also creates challenge and uncertainty.
There is also no doubt that sounds play an important role in the atmosphere, filling players with dread. Music is played according to the occasion, so most of the time players hear only ambient sounds coming from the environment. Hence, the sound design does not break immersion or distract the player.


4 PROBLEMS and SOLUTIONS

It can be summarized that the problem being faced here is that current first-person games are rather repetitive and dull. They do not have much variation in terms of gameplay. They are linear, and the storylines leave much to be desired. The game can take place in the past, present, or future, but the essence of the story always remains the same. Such a problem can be attributed to the stifling of creativity by publishers in favor of a greater bottom line.
Table 1 (continued). Similar Systems

Feature                Outlast                         Afraid of Monsters
Setting                Mental asylum                   Varies
Lighting               Sparse electrical lighting      Electrical light sources and flashlight
Engine                 Unreal                          GoldSrc
Sound Effects          Ambient sounds and whispering   Dark music and ambient sounds
Combat                 None                            Yes
Point of View          First-person                    First-person
Enemy Types            Psychotic people                Hallucinations
Darkness is Handicap   Yes                             Yes
Limited Light Source   Yes                             Yes

The more money publishers want to earn, the less likely a development team has the opportunity to make something truly different. The first part of the problem is the artificial intelligence system. It is by far one of the more important aspects of the game, especially when the game is a first-person horror survival game. The enemy A.I. provides the player a challenge when there are no other players to do so. In this case, there is an A.I. in the game project, but it does not work particularly well, mainly because it is not designed to register terrain-based obstacles. There are two ways this can be fixed: either revamp the A.I. system so that it works with terrain, or restructure the map.
Another part of this revamping effort is to make the A.I. not go straight for the player. Instead, it moves along a set patrol route using navigational markers. That way, it can be set up for future use in stealth mechanics.
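Such a marker-driven patrol route can be sketched as a simple waypoint cycle. The class name and marker coordinates below are purely illustrative, not taken from the project.

```python
class PatrolAI:
    """Cycles an enemy through a fixed list of navigational markers."""

    def __init__(self, waypoints):
        self.waypoints = waypoints  # ordered patrol markers, e.g. (x, y)
        self.index = 0              # which marker we are heading towards

    def current_target(self):
        """The marker the A.I. is currently moving towards."""
        return self.waypoints[self.index]

    def reach(self):
        """Called when the A.I. arrives at its target; advance the route,
        wrapping back to the first marker to form a closed loop."""
        self.index = (self.index + 1) % len(self.waypoints)

# A square patrol loop around a room.
route = PatrolAI([(0, 0), (5, 0), (5, 5), (0, 5)])
route.reach()
print(route.current_target())  # (5, 0)
```

Keeping the route state separate from movement code also leaves room for the stealth extension mentioned above: a stimulus can temporarily override `current_target()` and the patrol resumes afterwards.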
The second part of the rationale is linked to the first point, which is to rework the map design. Currently, the project's map design utilizes mostly terrain-based obstacles. With raised areas serving as obstacles to the A.I., it should work, but the A.I. does not register them: as of now, it attempts to go straight for the player without considering the obstacles in front of it. A proposed solution to this issue is to rework the entire map so that it utilizes object-based obstacles instead; testing done in the test map showed that the A.I. works well with object-based obstacles. Hence, the proposed solution is to build the map object by object, so that instead of spending too much time revamping the A.I. system, time can be used to build an intricate map. The second part of this problem is that the map is too narrow for any sort of stealth-based mechanic: there are too few places to hide, and it is too narrow to run past enemies. The proposed solution is to widen the map and add more cubby holes for players to hide in or behind. The third part of the problem is that the map does not have enough branching paths to really matter; since the player cannot interact with the environment to make their own route, it may as well be a linear game. The solution here is to allow a degree of environmental manipulation so that players can find their own ways to solve a puzzle instead of following fixed routes.
The third part is about having a more
interactive game. As of now, the game is
admittedly barebones. There is not much
the player can do with the environment except
read the notes. While there is environmental
sound, sound in general does not play a huge
role other than providing ambience to the game.
After the project was tested and reviewed by some
experts, the proposed solution here is also
connected to the A.I. For the A.I. part, it is
proposed that the A.I. be given the ability to
respond to external stimuli such as sight and
sound. If the player makes footstep sounds,
then the A.I. will deviate from its patrol route to


investigate the noise. The enemy A.I. will make
a series of vocalizations to indicate its level of
awareness, such as growling softly when
patrolling as usual, growling faster and louder
when it detects some noise or sees something
unusual, and roaring if it confirms the player's
presence. This brings to mind the proposal to
use thrown objects to distract the enemy.
If possible, light should also affect the A.I.
Since the player has a flashlight, the A.I.
should respond if light falls on it. Given that
the light is from an artificial source, the A.I.
should react immediately to it.

Figure 1. Flashlight being used by the player in the project
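The escalation logic described above (patrol normally, investigate on noise, chase on confirmed sight) can be sketched as a small state machine. The following Python sketch is purely illustrative: the class name, thresholds, and stimulus values are assumptions for demonstration, not taken from the game project.

```python
from enum import Enum, auto

class Awareness(Enum):
    PATROL = auto()   # soft growl, follows navigational markers
    ALERT = auto()    # faster, louder growl, investigates the stimulus
    CHASE = auto()    # roar, pursues the player directly

class EnemyAI:
    """Illustrative stimulus model: noise and light raise awareness."""
    NOISE_THRESHOLD = 0.3   # e.g. player footsteps
    SIGHT_THRESHOLD = 0.7   # e.g. flashlight shone on the enemy

    def __init__(self):
        self.state = Awareness.PATROL

    def on_stimulus(self, intensity: float) -> Awareness:
        # Escalate: weak stimulus -> investigate, strong -> confirmed contact.
        if intensity >= self.SIGHT_THRESHOLD:
            self.state = Awareness.CHASE
        elif intensity >= self.NOISE_THRESHOLD:
            if self.state is Awareness.PATROL:
                self.state = Awareness.ALERT
        return self.state

ai = EnemyAI()
print(ai.on_stimulus(0.1))   # below both thresholds: Awareness.PATROL
print(ai.on_stimulus(0.4))   # footstep noise: Awareness.ALERT
print(ai.on_stimulus(0.9))   # flashlight contact: Awareness.CHASE
```

In an actual engine this state would drive the patrol-route deviation and vocalizations rather than just a return value.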

The last point is very important. The note
system works as intended, but the note's size
should not be so big as to obstruct the player's
view. The simplest proposed solution here is to
reduce the image of the note so that it fits, for
example, at the bottom part of the screen. That
way, players can still take note of their
surroundings without having to quickly exit a
note and miss out on the story details.

5 POTENTIAL BENEFITS
The tangible benefit of this project is, of
course, a more interesting game itself. The game
is developed for those who have an interest not
only in horror games, but also in games that
require some thinking as well.


The intangible benefit of this project is the
satisfaction of playing a different game and
having a different experience. This includes
having fun with an interesting map layout and
level design, functional artificial intelligence,
and interesting gameplay mechanics. These are
all intangible benefits that the player can
enjoy.

6 CONCLUSIONS
In this paper, we tried to address the problem of
giving the player an incentive to explore. The
project has been completed and tested, and
issues were found in it that can be rectified in
future improvement efforts. The project itself
works fine and can be improved so that it
better resembles an actual game.
There were initially goals to make the gameplay
harder by involving a time limit: the game gets
progressively harder as time goes on, making it
harder and harder for the player to escape the
castle. Compared to the original grand idea, the
current project realizes only part of it.
However, given enough time and help, the
project can indeed be reworked to be
completely playable, fun, and interactive.
Hopefully, the above solutions can be carried
out so that the game project addresses the
problems and solutions provided. It is possible
to make more levels, details, and new types of
enemy A.I., but all of that will require a
substantial amount of time and manpower to
bring this game to market.

REFERENCES
[1] B. Albert. 2014. Call of Duty: Advanced Warfare Review. [ONLINE] Available at: http://www.ign.com/articles/2014/11/03/call-of-duty-advanced-warfare-review.
[2] Rogers, S. 2014. Level Up! The Guide to Great Video Game Design. 2nd ed. West Sussex: John Wiley & Sons Ltd.
[3] C. Watters. 2013. Battlefield 4 Review. [ONLINE] Available at: http://www.gamespot.com/reviews/battlefield-4-review/1900-6415517/.
[4] E. Makuch. 2013. Sequels a "big problem" for horror genre, says Mikami. [ONLINE] Available at: http://www.gamespot.com/articles/sequels-a-big-problem-for-horror-genre-says-mikami/1100-6411736/.
[5] P. Hernandez. 2012. Do Modern Shooters Take Themselves Too Seriously? [ONLINE] Available at: http://kotaku.com/5943247/do-modern-shooters-take-themselves-too-seriously.
[6] R. Rouse III. 2001. Elements of Gameplay. [ONLINE] Available at: http://www.gamasutra.com/view/feature/131472/game_design__theory_and_practice_.php.
[7] Bourg, D. M. & Seemann, G. 2004. AI for Game Developers. 1st ed. Sebastopol: O'Reilly Media, Inc.


Automatic Speech Recognition for the Holy Quran, A Review


Bilal Yousfi1 and Akram M Zeki2
International Islamic University Malaysia
Kuala Lumpur, Malaysia
1yousfi.bilal@hotmail.fr, 2akramzeki@iium.edu.my

ABSTRACT
Computer science and speech recognition have
enjoyed a long and successful relationship. Speech
recognition has been a useful tool to detect and
record voices. In computer science, a great
challenge is to interpret speech signals as
purposeful and important data and to develop
algorithms and applications that establish an interface
between human voice signals and the computer.
Significant interest has been raised in speech
processing, especially of the Quran, to provide a
second opinion on diagnosis with less error and
higher accuracy and reliability than the results
normally achieved by human experts. This paper
offers an overview of the use of this technology
related to the Holy Quran.

KEYWORDS: Automatic Speech Recognition;
tajweed; Quran; HMM; MFCC; CMU Sphinx;
recognition rate.

1. INTRODUCTION
The Holy Quran is the Holy Book or Scripture of
the Muslims. It is believed to be the word of Allah
revealed to His Prophet Muhammad (PBUH). A
Muslim must be careful to read it without making
mistakes. These mistakes may include misreading
words and incorrect utterance, punctuation, and
pronunciation.
Nowadays, there are many applications related to
the Holy Quran based on the techniques of
Automatic Speech Recognition (ASR). Such
software helps people read Quranic verses
properly through, for example, detection/correction
of specific pronunciation errors, verification of
recitation, memorisation of the Quran, auto reciters,
Quran explorers, and checking of tajweed rules.
This paper highlights the progress in automatic
speech recognition for the Holy Quran.

2. BACKGROUND ABOUT ASR
Speech recognition is a process by which a
computer takes a speech signal (recorded using a
microphone) and converts it into words. It is
achieved by following certain steps using a speech
recognition system.
The objective of automatic speech recognition is to
extract, characterise, and recognise the information
about speech identification. The system consists of
three basic stages: pre-processing, feature
extraction, and feature classification. Figure 1
demonstrates a basic representation of a speech
recognition system.

Figure 1. The basic representation of a speech recognition
system.

A. Pre-processing:
In speech recognition, the first phase is
pre-processing: the recorded speech signal is passed
through a pre-processing block to remove the
noise contained in it and to separate voiced from
unvoiced speech. The other main process is
endpoint detection, where the start and end points
of speech are determined based on both energy and
zero-crossing rates. In this task, the information is
reduced to a group of attributes that conveys only
the intended information and discards useless and
irrelevant information.
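As a rough illustration of energy- and zero-crossing-based endpoint detection, the sketch below marks frames as speech when either measure is high and returns the sample indices bracketing the speech region. The frame length and thresholds are illustrative assumptions, not values from any cited system.

```python
import numpy as np

def detect_endpoints(signal, frame_len=256, energy_thresh=0.01, zcr_thresh=0.25):
    """Rough speech endpoint detection on a mono signal scaled to [-1, 1].

    A frame counts as speech when its short-time energy is high (voiced
    sounds) or its zero-crossing rate is high (unvoiced fricatives).
    """
    n_frames = len(signal) // frame_len
    speech_frames = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)
        # Fraction of sample pairs whose sign flips.
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        if energy > energy_thresh or zcr > zcr_thresh:
            speech_frames.append(i)
    if not speech_frames:
        return None
    # Sample indices of the first and last speech frame.
    return speech_frames[0] * frame_len, (speech_frames[-1] + 1) * frame_len

# Silence, then a 440 Hz tone, then silence again.
sr = 8000
sig = np.zeros(sr)
t = np.arange(sr // 2) / sr
sig[2000:2000 + len(t)] = 0.5 * np.sin(2 * np.pi * 440 * t)
print(detect_endpoints(sig))  # → (1792, 6144): frame-aligned bounds around the tone
```

Real systems typically smooth these decisions over neighbouring frames to avoid clipping weak onsets.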

B. Feature extraction techniques
The most prominent phase in speech recognition is
feature extraction, which is designed to extract
unique, robust, discriminative, and computationally
efficient characteristics from the speech signals [1].
The recognition performance depends on this phase.
The common feature extraction techniques used in
speech recognition are [2]:
Perceptual Linear Prediction (PLP).
Linear Predictive Coding (LPC).
Mel-Frequency Cepstrum Coefficients (MFCC).
Linear Prediction Cepstral Coefficients (LPCC).
Discrete Wavelet Transform (DWT).
Linear Discriminant Analysis (LDA).
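Of these, MFCC is the technique that recurs most in the works reviewed later in this paper. The following is a compact, illustrative sketch of the MFCC pipeline for a single frame (power spectrum, triangular mel filterbank, log compression, DCT-II); the FFT size, filter count, and coefficient count are conventional choices, not values from any cited paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(frame, sr, n_filters=26, n_ceps=13):
    """MFCC of a single frame: power spectrum -> mel filterbank -> log -> DCT-II."""
    n_fft = 512
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    log_energy = np.log(fbank @ spec + 1e-10)
    # DCT-II decorrelates the filterbank energies into cepstral coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energy

sr = 16000
t = np.arange(400) / sr  # one 25 ms frame
coeffs = mfcc(np.sin(2 * np.pi * 300 * t), sr)
print(coeffs.shape)  # (13,)
```

A full front end would apply this per overlapping frame and usually append delta and delta-delta coefficients.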

C. Features classification techniques
There are three approaches for features
classification purposes [3]:
1) Acoustic Phonetic Approach
2) Pattern Recognition Approach
3) Artificial Intelligence Approach


Figure 2. Classification techniques in speech recognition.

Acoustic Phonetic Approach
The acoustic phonetic approach is related to the
study of the different sounds and phonemes in a
language. It is based on the existence of finite and
exclusive phonemes in spoken language, which are
broadly characterised by a set of acoustic
properties extracted from the speech signal over
time, although the acoustic properties of phonetic
units are highly variable, both across speakers and
with neighbouring phonetic units [2].
The segmentation and labelling phase is the first
step in the acoustic phonetic approach; it describes
the acoustic properties of the different phonetic
units after segmenting the speech signal into
discrete regions. The next step is the determination
of valid words from the phonetic labels provided by
the segmentation and labelling phase [2].

Pattern Recognition Approach
The pattern recognition approach has become the
predominant method for speech recognition over
the last six decades. It consists of two phases:
the training phase (training on speech samples) and
the comparison phase. This approach utilises a
well-formulated mathematical framework and
determines consistent speech pattern
representations for reliable pattern comparison from
a collection of labelled training samples using a
formal training algorithm. A speech pattern
representation generally takes the form of a speech
template or a statistical model, which is equally
applicable to a sound, a word, or a phrase. During
the pattern comparison phase, a direct comparison
is made between the spoken words to be recognised
and each possible pattern learned in the training
stage in order to determine the identity of the
unknown based on the goodness of match of the
patterns [4]. There are many techniques that can be
used to classify the features, such as Hidden Markov
Models (HMM), Dynamic Time Warping (DTW),
and Vector Quantisation (VQ).
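As a minimal illustration of template-based pattern comparison, the Dynamic Time Warping sketch below aligns two feature sequences that differ in speaking rate before measuring their distance. The sequences are toy data, not speech features from any cited work.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping between two feature sequences (rows = frames).

    Finds the monotonic alignment between the sequences that minimises the
    accumulated frame-to-frame distance, so utterances spoken at different
    rates can still be compared against a stored template.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or diagonal match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

template = np.array([[0.0], [1.0], [2.0], [1.0], [0.0]])
stretched = np.array([[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]])
print(dtw_distance(template, stretched))  # 0.0: same shape at a slower rate
```

In a recogniser, the unknown utterance would be scored against every stored template and assigned the label of the closest one.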

Artificial Intelligence Approach
The artificial intelligence approach tries to
mechanise the recognition procedure based on the
way a person uses his intelligence to visualise,
analyse, and finally make a decision on the
measured acoustic features. The AI approach is a
hybrid of the pattern recognition approach and the
acoustic phonetic approach. Several methods exist
in this category, namely the Multilayer Perceptron
(MLP), Self-Organising Map (SOM), Time-Delay
Neural Network (TDNN), and Back-propagation
Neural Network (BPNN) [1].

3. PERFORMANCE OF AUTOMATIC
SPEECH RECOGNITION SYSTEM
Word Error Rate (WER) is the most widely used
metric to evaluate speech recognition. The general
difficulty of measuring performance stems from the
fact that the recognised word sequence can have a
different length from the reference word sequence.
The word error rate is derived from the Levenshtein
distance, working at the word level instead of the
phoneme level. WER is defined as the sum of
substitutions (the reference word is replaced by
another word), deletions (a word in the reference
transcription is missed), and insertions (a word is
hypothesised that was not in the reference) divided
by the number of reference words [5].

After calculating the performance of a speech
recognition system, sometimes the Word Recognition
Rate (WRR) is used instead:

WRR = 1 - WER = 1 - (S + D + I) / N = (H - I) / N

where H = N - S - D is the number of correctly
recognised words.
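The WER defined above can be computed with a word-level Levenshtein distance; a minimal sketch (the example strings are illustrative):

```python
def wer(reference, hypothesis):
    """Word Error Rate via word-level Levenshtein distance.

    dist[i][j] = minimum number of substitutions, deletions and insertions
    needed to turn the first i reference words into the first j hypothesis
    words; WER is that distance divided by the reference length N.
    """
    r, h = reference.split(), hypothesis.split()
    dist = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dist[i][0] = i          # i deletions
    for j in range(len(h) + 1):
        dist[0][j] = j          # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dist[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dist[i][j] = min(sub, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    return dist[len(r)][len(h)] / len(r)

ref = "bismillah ir rahman ir rahim"
print(wer(ref, ref))                              # 0.0
print(wer(ref, "bismillah ir rahman ir raheem"))  # 0.2 (one substitution in five words)
```

WRR then follows directly as 1 minus this value.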


4. RELATED WORK
For several years, great effort has been devoted to
the study of Arabic speech by many researchers,
and detailed reviews of the evolution of Arabic
ASR already exist. In this paper, only works
related to the Holy Quran are highlighted, where
several approaches have been proposed to evaluate
and obtain high accuracy for Quranic Arabic
speech recognition.
In recent years, there has been much research
related to Al-Quran or Arabic speech recognition
that uses the CMU Sphinx tools to recognise
Arabic phonemes. (The Carnegie Mellon
University Sphinx 4 toolkit is a statistical,
speaker-independent set of tools based on Hidden
Markov Models (HMM), used to train and evaluate
language models; it provides a flexible framework
for research in speech recognition.) Y. Yekache
et al. describe the process of designing a
task-oriented continuous speech recognition system
for Arabic, based on CMU Sphinx 4, to be used in
the voice interface of a Quranic reader. The concept
of the Quranic reader controlled by speech is
presented, and the collection of the corpus and
creation of the acoustic model are described in
detail, taking into account the specificities of the
Arabic language and the desired application. [6]
Another system, developed by T. Hassan et al.,
demonstrates the analysis and implementation of a
Quranic verse delimitation system using the Sphinx
tools. In this study, MFCC and HMM are used for
feature extraction (to extract verses from the audio
file) and feature classification, respectively. This
research discovered that an automated delimiter for
verses of the Holy Al-Quran is possible to build.
The system carried out two different types of test:
the first recorded 85% and 90% mean recognition
ratios for normal Arabic speech of the short surah
Al-Ikhlas for females and males, respectively.
Meanwhile, the results from the second test, among
13 professional reciters for tajweed and tarteel,
were 90% and 92% mean recognition ratios,
respectively. The system also had a problem with
extra noise due to audio file compression and poor
quality during the recording process. [7]
El Amrani et al. investigated the use of a simplified
set of Arabic phonemes for Arabic Automatic
Speech Recognition (AASR) of the Holy Quran
using the CMU Sphinx tools. The results showed
that it is possible to obtain very interesting
recognition accuracy with this simplified list of
phonemes instead of using the Romanization
method, which is more elaborate to generate. The
Word Error Rate (WER) obtained from a number
of experiments was 1.5%, even with a very small
set of audio files used during the training phase. [8]
Hafeez et al. proposed the development of a
speaker-dependent live Quranic verse recitation
recognition system using (CMU) Sphinx-4 based on
HMM. The system aims to recognise and evaluate
the accuracy of the recitation of Quranic verses. In
order to generate the acoustic model, Hafeez et al.
used a transliteration mechanism for the Arabic
language. The system was trained using recitations
from competent reciters and also from the users
themselves. After being trained in four different
ways, the experiments showed that the word
recognition accuracy obtained was 67% for Arabic
words with the Arabic alphabet, 96% for
transliteration words with syllables, 94% for
transliteration compound words with syllables, and
81% for transliteration syllables with syllables. [9]
B. Putra et al. developed a speech recognition
system for Quranic verse recitation learning
software to perform evaluation and correction in
Quran recitation. The system is built on Mel
Frequency Cepstral Coefficient (MFCC) features
and Gaussian Mixture Model (GMM) modelling. In
order to test the reliability and accuracy of
correction, they ran an experiment in which ten
speakers each read the material in both right and
wrong manners. They achieved 70% correction
accuracy for pronunciation, 90% for recitation law,
and 60% for the combination of pronunciation and
recitation law. [10]
Ibrahim et al. reported the design, implementation,
and evaluation of an automated tajweed checking
rules engine for Quranic learning, which focuses on
recitation rules of the Quran to help learners recite
Al-Quran in an interactive way. The system was
built based on MFCC features and an HMM model
and was trained on a small Quranic chapter (Surat
Al-Fatihah). Its recognition rate reached 91.95%
(ayates) and 86.41% (phonemes). [11]
Muhammad et al. proposed an intelligent system,
E-Hafiz, to help Muslims in the recitation and
memorisation of the Quran. The goal of E-Hafiz is
to facilitate learning/memorising and to minimise
errors or mistakes in recitation of the Holy Quran;
users of the system can enhance their recitation
skills and find mistakes. The system was built using
MFCC and VQ. They reported achieving
recognition rates of 92%, 90%, and 86% for men,
children, and women, respectively, where the
system was trained with ten speakers for each
category. [12]
Abro et al. demonstrated the design and
implementation of Quran recognition for the
purpose of memorisation. The system was
implemented using MFCC for feature extraction
and an ANN for classification. The experiment was
done on six verses of Surah Al-Nas. Two types of
data sets were created in this research: the first by a
fluent reader of the Quran to ensure that there were
no errors, and the second with recitations, within
the scope of this study, to determine and recognise
the mistakes/errors in memorisation. The rate
obtained using training without an error class was
93%, and 72% accuracy was obtained after using
data with missing words as error data, where the
wrong samples were classified as true verses. The
second experiment was done on both groups of
data: a dataset that includes the correct verses and a
group that contains all the wrong verses. The
system achieved a recognition accuracy of 65%,
having 100% accuracy for error data and 0%
accuracy for correct data. [13]
Hassan et al. presented a classification technique
used to recognise Qalqalah Kubra pronunciation (a
tajweed rule). This study described the use of a
Multilayer Perceptron to classify the pronunciation
of Qalqalah Kubra, with MFCC for feature
extraction, to differentiate between correct and
incorrect Qalqalah Kubra pronunciation. The data
were recorded from five words, and the achieved
results ranged from 95% to 100%. [14]
Another system for Qalqalah checking was
developed by Ismail et al. In this paper, a novel
hybrid MFCC-VQ architecture for Qalqalah
tajweed rule checking was presented. The objective
of this research is to help students revise and recite
Al-Quran properly by themselves and to recognise
the types of bouncing sound in both Qalqalah rules,
sughra and kubra, on the five phonemes.
Ismail et al. described the MFCC-VQ procedure
used to develop the tajweed rule checking tool. The
overall real-time factor obtained in the proposed
method was faster than the conventional MFCC
algorithm by 86.928%, 94.495%, and 64.683% for
male, female, and children respectively. The overall
recognition accuracy achieved was 83.9%, 82.1%,
and 95.0%. This research reported that MFCC-VQ
achieved better results than MFCC in terms of
speed performance. [15]
Abdou et al. developed techniques based on
automatic speech recognition to implement a
Computer Aided Pronunciation Learning (CAPL)
system. The HAFSS system was built to teach
recitation of the Holy Quran (tajweed) to students
by accepting their reading of a given verse and then
assessing the quality of their recitation, and to help
non-native speakers learn Arabic pronunciation.
To achieve this goal, MLLR was used in the
speaker adaptation block, mainly to adapt the
acoustic models to each user's acoustic properties
in order to boost system performance. HAFSS was
represented in the form of a linear lattice that is
flexible enough to support error hypothesis
addition, deletion, and overlapping of probable
mispronunciations. HAFSS is based on a tri-phone
tied-state HMM, with each state modelled by a
mixture of Gaussians. The database used for testing
consisted of 507 utterances, tested and evaluated by
language experts and labelled with the actual
pronounced phonemes used to evaluate the system.
The experiments show that the system performance
with confidence reached 96.87% for correct
Quranic recitation, whereas the message-based
system performance with confidence was 80.13%
for correct Quranic recitation. [16]
Mourtaga et al. described a speaker-independent
Quranic recogniser using the tri-phone model. The
dataset used for the training and testing contained
the 30th chapter of the Quran recited by five
famous readers of the Quran; this chapter includes
2000 distinct words. Mourtaga et al. built a
Quranic speaker-independent recogniser based on
Maximum Likelihood Linear Regression (MLLR).
The adaptation process includes global, regression
class tree (RCT), and MLLR adaptation.
Adaptation is performed by using the MLLR to
estimate a series of transforms, or a transformed
model set, which minimises the mismatch between
the current model set and the adaptation data. The
MLLR makes use of an RCT to group the
Gaussians in the model set; the RCT was used to
estimate a set of linear transformations for the mean
and variance parameters of a Gaussian mixture
HMM system. The system has an obstacle with the
recognition time required because of the large
number of states in the Hidden Markov Model. The
level of accuracy ranged from 68% to 85%. [17]
Another article presents a new application of
recitation verification done by Wahidah et al. They
proposed a new way to learn reciting Al-Quran
based on correct makhraj. The system uses MFCC
for feature extraction and Mean Square Error
(MSE) as a pattern matching technique. An
experiment was set up with a dataset containing
recorded speech from people aged between 21 and
23 and from experts in makhraj. The system
performance was measured in terms of accuracy
based on False Reject Rate (FRR) and Wrong
Recognition (WR). The percentage of FRR for all
recitations is 0%, which shows this system
performed with 100% accuracy. On the other hand,
this system has an obstacle for one-to-many
recognition, where the percentage of WR was very
high due to the range of the threshold value. [18]

TABLE I. APPROACHES USED BY PREVIOUS RESEARCHERS.

No. | Reference | Feature Extraction and Classification Techniques | Performance | Real-time Factor
01 | Yekache, Y. et al. (2012) [6] | CMU Sphinx Tools | Not stated | Not reported
02 | T. Hassan et al. (2007) [7] | CMU Sphinx Tools | 85% and 90% mean recognition ratio for Arabic speech (surah Al-Ikhlas) for female and male; 90% and 92% for tajweed and tarteel | Not reported
03 | El Amrani, M. et al. (2015) [8] | CMU Sphinx Tools | WER of 1.5% | Not reported
04 | Hafeez, A. et al. (2014) [9] | CMU Sphinx Tools | 67% for Arabic words with Arabic alphabet; 96% for transliteration words with syllables; 94% for transliteration compound words with syllables; 82% for transliteration syllables with syllables | Not reported
05 | Putra, B. et al. (2012) [10] | MFCC, GMM modelling | 70% correction accuracy for pronunciation, 90% for recitation law, 60% for combination of pronunciation and recitation law | Not reported
06 | Jamaliah Ibrahim, N. et al. (2013) [11] | MFCC features and HMM model | 91.95% for ayates, 86.41% for phonemes | Not reported
07 | Muhammad, A. et al. (2012) [12] | MFCC and VQ | 92%, 90%, 86% for man, children and woman respectively | Not reported
08 | Abro, B. et al. (2012) [13] | MFCC for feature extraction and ANN for classification | 72-93% | Not reported
09 | Hassan, H. et al. (2012) [14] | MFCC, MLP | 95% to 100% | Not reported
10 | Ismail, A. et al. (2014) [15] | Hybrid MFCC-VQ architecture | 83.9%, 82.1% and 95.0% for male, female and children respectively | 0.156, 0.161 and 0.261 for male, female and children respectively
   | (same work, MFCC baseline) | MFCC | 95.4%, 92.5% and 85.0% for male, female and children respectively | 1.192, 2.928 and 0.740 for male, female and children respectively
11 | Abdou, S. et al. (2006) [16] | HMM, MLLR speaker adaptation | 97.58%-96.96%; system performance with confidence = 96.87%; message-based system performance with confidence = 80.13% | Not reported
12 | E. Mourtaga, A. et al. (2007) [17] | Tri-phone/HMM model | Accuracy of the recogniser ranged from 68% to 85% | Not reported
13 | A. N. Wahidah, M. et al. (2012) [18] | MFCC, Mean Square Error (MSE) | FRR for all recitations is 0% (100% accuracy); WR was very high | Not reported

5. CONCLUSION
Researchers focusing on the particular promising
and challenging area of automatic speech
recognition specifically for Quranic verse recitation

of the Holy Quran are collectively moving towards
helping Muslims learn and recite the Quran
properly. An effort has been made through this
paper to present an extensive study of the progress
of automatic speech recognition for the Holy Quran
and to help the reader understand the different
aspects posed by researchers in this area. This is
significant because many researchers have different
approaches to implementing their software, often
not realising that there are other points of view.
There has been much research and discussion on
using the Sphinx tools, where the results were
acceptable, including the highest accuracies
achieved.
Based on the review conducted, the most relevant
technique used was MFCC: it is the most popular,
effective, prevalent, and dominant technique for
feature extraction. Others tried to improve recent
works by applying the hybrid MFCC-VQ approach
to accelerate the execution time, where the response
time is a very important factor for real-time
applications. For feature classification, HMM is the
most suitable method and has been successfully
applied by most researchers in Arabic speech
recognition.

REFERENCES
[1] M. R. Gamit, P. K. Dhameliya, and N. S. Bhatt, "Classification Techniques for Speech Recognition: A Review," vol. 5, no. 2, pp. 58-63, 2015.
[2] S. K. Gaikwad, B. W. Gawali, and P. Yannawar, "A Review on Speech Recognition Technique," Int. J. Comput. Appl., vol. 10, no. 3, pp. 16-24, 2010.
[3] R. Dixit and N. Kaur, "Speech Recognition Using Stochastic Approach," Int. J. Innov. Res. Sci. Eng. Technol., vol. 2, no. 2, pp. 356-361, 2013.
[4] W. Ghai and N. Singh, "Literature Review on Automatic Speech Recognition," Int. J. Comput. Appl., vol. 41, no. 8, pp. 42-50, 2012.
[5] S. T. S and C. Lingam, "Review of Feature Extraction Techniques in Automatic Speech Recognition," Eng. Technol., vol. 484, no. 2, pp. 479-484, 2013.
[6] Y. Yekache, Y. Mekelleche, and B. Kouninef, "Towards Quranic reader controlled by speech," arXiv Prepr. arXiv:1204.1566, 2012.
[7] H. Tabbal, W. El Falou, and B. Monla, "Analysis and implementation of a Quranic verses delimitation system in audio files using speech recognition techniques," in Information and Communication Technologies, 2006. ICTTA'06. 2nd, 2006, vol. 2, pp. 2979-2984.
[8] M. Y. El Amrani, M. M. H. Rahman, M. R. Wahiddin, and A. Shah, "Towards Using CMU Sphinx Tools for the Holy Quran Recitation Verification."
[9] A. H. Hafeez, K. Mohiuddin, and S. Ahmed, "Speaker-dependent live quranic verses recitation recognition system using Sphinx-4 framework," in Multi-Topic Conference (INMIC), 2014 IEEE 17th International, 2014, pp. 333-337.
[10] B. Putra, B. Atmaja, and D. Prananto, "Developing Speech Recognition System for Quranic Verse Recitation Learning Software," Int. J. Informatics Dev., vol. 1, no. 2, 2012.
[11] Raja-Jamilah Raja-Yusof, Fadila Grine, N. Jamaliah Ibrahim, M. Yamani Idna Idris, Z. Razak, and N. Naemah Abdul Rahman, "Automated tajweed checking rules engine for Quranic learning," Multicult. Educ. Technol. J., vol. 7, no. 4, pp. 275-287, 2013.
[12] A. Muhammad, W. M. M. Z. ul Qayyum, S. Tanveer, A. Martinez-Enriquez, and A. Z. Syed, "E-hafiz: Intelligent system to help Muslims in recitation and memorization of Quran," Life Sci. J., vol. 9, no. 1, pp. 534-541, 2012.
[13] B. Abro, A. B. Naqvi, and A. Hussain, "Quran recognition for the purpose of memorisation using Speech Recognition technique," in Multitopic Conference (INMIC), 2012 15th International, 2012, pp. 30-34.
[14] H. A. Hassan, N. H. Nasrudin, M. N. M. Khalid, A. Zabidi, and A. I. Yassin, "Pattern classification in recognizing Qalqalah Kubra pronunciation using multilayer perceptrons," in Computer Applications and Industrial Electronics (ISCAIE), 2012 IEEE Symposium on, 2012, pp. 209-212.
[15] A. Ismail, M. Y. I. Idris, N. M. Noor, Z. Razak, and Z. Yusoff, "MFCC-VQ approach for Qalqalah tajweed rule checking," Malaysian J. Comput. Sci., vol. 27, no. 4, 2014.
[16] S. M. Abdou, S. E. Hamid, M. Rashwan, A. Samir, O. Abdel-Hamid, M. Shahin, and W. Nazih, "Computer aided pronunciation learning system using speech recognition techniques," in INTERSPEECH, 2006.
[17] E. Mourtaga, A. Sharieh, and M. Abdallah, "Speaker independent Quranic recognizer based on maximum likelihood linear regression," in Proceedings of World Academy of Science, Engineering and Technology, 2007, vol. 36, pp. 61-67.
[18] A. N. Wahidah, M. S. Suriazalmi, M. L. Niza, H. Rosyati, N. Faradila, A. Hasan, A. K. Rohana, and Z. N. Farizan, "Makhraj recognition using speech processing," in Computing and Convergence Technology (ICCCT), 2012 7th International Conference on, 2012, pp. 689-693.

29

Proceedings of the International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA), Kuala Lumpur, Malaysia, 2016

Differential Qiraat Processing Applications using Spectrogram Voice Analysis


Muhammad Saiful Ridhwan, Akram M. Zeki and Akeem Olowolayemo
International Islamic University Malaysia, Malaysia
akramzeki@iium.edu.my

ABSTRACT
This study provides a review of efforts made so far
to develop applications that assist in proper Quranic
recitation based on spectrogram voice analysis. The
voice recorded from a user's recitation is compared
with sample recitation features collected from expert
reciters and stored in a database. These comparisons
are used to evaluate the performance of the various
algorithms that have been applied to Quranic
recitation processing. Before reviewing how the
actual applications are developed, studies that
determine the effectiveness of spectrogram voice
analysis are examined. Samples are often collected to
measure the articulation of Arabic letters and are
analyzed using the Fourier analysis technique. The
waveforms of the sound samples are transformed into
spectra, which are frequency representations of the
signals. The spectrogram is used to determine the
formant frequency, which is observed for each subject
to obtain its mean formant frequency. After feature
extraction, a classification algorithm is employed to
compare the features extracted in real time with those
available in the knowledge base. Based on the results
obtained, the processes are often developed into
mobile applications for portability.

KEYWORDS
Fast Fourier Transform, formant frequency, Signal
analysis, spectrogram, voice recognition.

1 INTRODUCTION
The Quran is written in Arabic and, for the
general observance of religious duties, it is read
in Arabic by the majority of Muslims, as well as by
non-Muslims who wish to learn and understand its
content. People who intend to learn the Quran must
possess basic knowledge of Arabic in order to read,
memorize, and understand it.
In recent times, the use of technology
has been studied by researchers to assist the
process of learning and memorizing the Quran.
One of the methods is the adoption of speech
recognition techniques, which have been attracting
attention since the 1990s [1]. Speech is a medium
for communication between human beings. A look at
speech recognition technology today reveals that
many algorithms and methods have been developed
and tested to detect the content of users' speech,
and these have been found useful in numerous voice
recognition applications. Previous authors such as
[2] have outlined two categories of speech sounds:
vowels and consonants. Consonants are produced by
compressing the air inside the lungs, whereas vowel
sounds are produced according to the movements of
the tongue and lips.
As mentioned earlier, the research effort in testing
and developing speech applications has been growing
considerably. Often, however, the real concern in
those applications is not the implementation or use
of modern technology but the processing capabilities
of the algorithms for generating and interpreting
speech, which cannot yet be said to be perfect or
completely effective.
In recent developments, researchers have made use
of the spectrogram to analyze sound. Spectrogram
analysis uses the Fast Fourier Transform (FFT) to
decompose speech so that the spectrogram can capture
its features [3][4]. This technique in voice
processing analyzes the power spectrum of a speech
sample. If the power spectrum is expressed in terms
of intensity, an image called a spectrogram can be
produced, showing how the power distribution changes
with time. The spectrogram reflects the location of
the formants [4]. A formant is a spectral peak of
the sound spectrum of the voice; it is often
measured as an amplitude peak in the frequency
spectrum of the sound, using a spectrogram or a
spectrum analyzer. For different speakers, due to
the frequency differences in their speech, the major
resonance frequencies will most likely differ even
when they utter the same word [3][4]. The purpose of
this write-up is to review previous breakthroughs in
applying spectrogram voice analysis to Quranic
recitation, to aid learners where there may not be
an expert teacher to guide them.
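The FFT-based spectrogram described above can be sketched in a few lines. This is a generic illustration in NumPy with an arbitrary 440 Hz test tone, not code from any of the systems reviewed here:

```python
import numpy as np

def spectrogram(signal, fs, frame_len=256, hop=128):
    """Short-time power spectrum: split the signal into overlapping
    frames, window each frame, and take the squared FFT magnitude."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # One FFT per frame; keep only the non-negative frequencies.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = np.arange(len(frames)) * hop / fs
    return times, freqs, power

# A 440 Hz test tone: the strongest frequency bin should sit near 440 Hz.
fs = 8000
t = np.arange(fs) / fs
times, freqs, power = spectrogram(np.sin(2 * np.pi * 440 * t), fs)
peak_hz = freqs[power.mean(axis=0).argmax()]
```

Plotting `power` (typically on a log scale) against `times` and `freqs` yields the familiar time-frequency image in which the formants appear as dark horizontal bands.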
The remainder of this paper is organized as
follows. The succeeding section reviews the
existing literature related to the topic. It
outlines voice recognition research, algorithms,
and extraction techniques, along with details such
as algorithm information, application programming
interfaces, data collection and feature extraction
procedures, and the data analysis techniques
employed. The final section concludes the treatise
by identifying areas for improvement.
2 PREVIOUS WORK
The Quran is written in Arabic and, over the years,
modern Arabic speakers have used Arabic letters in
different ways in the writing and pronunciation of
conversational speech. In order to recite the Quran
properly, people have to practice until precision is
achieved. Proper pronunciation takes into
consideration the rules of Tajweed. Tajweed, by
definition, means enhancing, improving, and becoming
excellent in Quran recitation, or Qiraat.
Functionally, it means articulating every letter in
the Quranic verses in its correct timing and from
its proper point of articulation [5]. People who
have just started to learn the Quran have to
understand the letters used in it and the ways to
pronounce them, and practice often, before they can
reach fluency in recitation. When people start to
learn the Quran they will normally consult a hafiz
or hafizah (who can be understood as a teacher). A
hafiz (male) or hafizah (female) is a person who has
memorized the whole Quran and has vast knowledge of
Tajweed (the rules and regulations for reciting the
Quran). A hafiz or hafizah has to practice every day
to memorize and improve, such that they can recite
the whole sacred book without seeing it.
Memorizing verses of the Quran is encouraged, and
people who are trying to memorize verses continue to
practice recitation under the guidance and in the
presence of an expert. Even though there may be
challenges in learning the Quran, people are still
encouraged to learn and to memorize as much as they
can, even a few verses. This shows that an automated
aid may be handy to assist such disparate learners.
Most of the time, people try to practice
independently by listening to recitations by famous
reciters. This independent learning is a proactive
attempt to learn with or without the presence of
other people [6]. Self-determined learning helps
learners to make informed choices and to take
responsibility for deciding what they need to do in
order to learn [6]. One of the strategies mentioned
earlier is the use of technology such as media
content, which can attract people to, or ease,
learning to recite the Quran. Nowadays, the
advancement of mobile technology has opened a new
door for researchers and developers to create new
types of content on mobile platforms. This
encourages researchers and developers to develop and
test mobile applications such as Quran speech
recognition, Quran readers, and Quran books in
digital format. Those applications are linked with
sample recitations of the Quran that can be easily
retrieved from the Internet. Looking at speech
recognition technology, many algorithms and methods
have been developed and tested for differential
Qiraat processing. However, there is a lack of
adequate effort related to mobile applications in
the context of Qiraat voice analysis. Efforts have
largely been concentrated on non-automated
applications that evaluate the recitation accuracy
of Quranic verses in comparison to recorded user
recitation.
The literature provides information on the areas
where research should be focused and on the other
issues necessary to achieve an acceptable level of
accuracy. The usual research objectives of most
previous work center around the selection of
appropriate feature extraction techniques to build
an efficient knowledge base; the adoption of feature
classification algorithms to compare features
extracted in real time with features previously
stored in the knowledge base; and efforts to
implement the feature extraction and classification
processes in a mobile application that uses
spectrogram voice analysis, suitable for learners to
correct their mistakes during recitation or
practice. The final part is often to study the
performance of every stage in the voice analysis
process. This survey is significant in that it
contributes to two major areas, namely theory and
practice. Theoretically, this study evaluates the
progress achieved so far in the development and
testing of Qiraat voice recognition. Practically, it
reviews the challenges and progress made in
implementing the process in mobile applications
using the tools and techniques available, to benefit
and ease the practice of all the people who are
trying to read correctly or memorize the Quran.
The previous section outlined the background of
important matters related to conducting research in
voice recognition for Qiraat differentiation. In
this section, a study of previous research work
related to this topic is presented. This review is a
compilation of written reports (journals) on the
research topic by researchers who are authorities in
this field. Before a detailed review of voice
recognition technology is presented, the concepts of
the Quran and Tajweed are first discussed. Then,
experimental studies from the literature are
presented, which show the gap and differences
between general voice analysis and voice analysis in
the Quranic context. Finally, a summary is presented
in tabular form for general voice analysis and for
voice analysis in the Quranic context. Three tables
provide a background of the letters that depict
different sounds in Quranic recitation, as well as
the relevant voice recognition techniques that have
been applied previously. Table 1 presents a basic
summary of the letters in the Quran. Table 2
presents the different symbols and their respective
meanings in Quran recitation. Table 3 provides a
summary of speech recognition techniques that have
been widely used by researchers; the points outlined
there relate to numerous types of voice analysis
extraction, classification, and matching techniques.
2.1 Reciting and Memorizing the Quran
The Quran is continuously being learnt by many
people around the world, Muslim and non-Muslim
alike. In trying to learn the Quran, they have to
understand the fundamentals of the Quranic alphabet.
They have to pronounce correctly so that they guard
against mistakes that are critical to Quran
recitation. In the process of learning the Quran,
people normally consult a hafiz or hafizah to learn
the proper Tajweed of reciting the Quran. Table 1
below contains the list of letters used in the
Quran. Beginners in the study of the Quran have to
understand these letters and the correct ways to
pronounce them, and practice often, before they can
perfect the recitation of the Quran.


2.2 Voice recognition techniques


Voice recognition technologies such as automatic
speech recognition and text-to-speech have been
under development since the early days of computing.
Voice recognition development has been carried out
by many researchers since the early 1990s [7].
Today, users have access to very powerful,
large-vocabulary speech recognition techniques for
the creation of new analysis methods and
experiments.

Table 1. Letters in Quran [8].

Table 2. Symbols and Meanings in Quran Recitation [8].


The aim of designing and developing Qiraat voice
recognition applications is mainly to guide and
assist users who wish to practice reading or
memorizing the Quran where there may not be a hafiz,
or as an independent effort in addition to what is
learnt from a hafiz. With the use of voice
comparison analysis, mobile applications can be
developed to provide assistance where there is no
human expert, or to complement the efforts of human
experts in teaching the Quran. Generally, the
application has a collection of expert Quran
reciters' voice features stored in a database, which
are subsequently compared with learners' real-time
recitations to provide responses on whether they
recited correctly or incorrectly. The comparison
then indicates the user's level of accuracy. To
achieve the research objectives, researchers often
adopt the stages described below:
(i) Samples of Quran recitation are collected
from various famous reciters.
(ii) The collected samples go through spectrogram
analysis to build a database of different
verses of the Quran.
(iii) The spectrogram analysis results are tested
against user-recorded recitations in order
to find the accuracy of recognition.
(iv) A mobile application is created to connect
the processes together in one integrated
system once a reasonable level of
performance is reached.
(v) The mobile application is evaluated with
different reciters from the database to
measure its performance and effectiveness.
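Stages (i) to (iii) above can be sketched in a few lines. This is a minimal illustration that assumes recordings are already available as NumPy arrays; the feature used here (the mean power spectrum) and the cosine-similarity scoring are illustrative stand-ins, not the specific features or classifiers of the systems reviewed:

```python
import numpy as np

def spectral_signature(signal, frame_len=256, hop=128):
    """Average power spectrum over all frames -- a crude stand-in for
    the expert-reciter features stored in the database."""
    window = np.hanning(frame_len)
    frames = np.array([signal[i:i + frame_len] * window
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).mean(axis=0)

def recitation_score(user, reference):
    """Cosine similarity between signatures: 1.0 means identical spectra."""
    a, b = spectral_signature(user), spectral_signature(reference)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 'recitations': the correct one matches the expert's 300 Hz tone,
# the incorrect one is articulated at a different frequency.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
expert = np.sin(2 * np.pi * 300 * t)
correct = expert + 0.01 * rng.standard_normal(fs)
wrong = np.sin(2 * np.pi * 800 * t)
good_score = recitation_score(correct, expert)
bad_score = recitation_score(wrong, expert)
```

A real system would replace the raw power spectrum with MFCC or formant features and would align the two recordings in time before scoring, but the compare-against-stored-reference structure is the same.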

Table 3. Summary of Speech Recognition Techniques.

Principal Component Analysis (PCA)
Description: Principal component analysis is a useful statistical technique that has applications in fields such as face recognition and image compression, and is a common technique for finding patterns in data of high dimension [9].
Settings: Linear feature extraction method; linear map; fast; eigenvector-based [10].

Independent Component Analysis (ICA)
Description: Independent Component Analysis (ICA) is a statistical technique, perhaps the most widely used, for solving the blind source separation problem [11].
Settings: Extracts voice feature vectors based on higher-order statistics from natural scenes and music sounds; these features are localized in both time (space) and frequency. Linear map; iterative; non-Gaussian [12].

Linear Predictive Coding (LPC)
Description: Linear predictive coding (LPC) is defined as a digital method for encoding an analog signal in which a particular value is predicted by a linear function of the past values of the signal [13].
Settings: Uses a mathematical approximation of the vocal tract, represented as a tube of varying diameter. At a particular time t, the speech sample s(t) is represented as a linear sum of the p previous samples. Static feature extraction method; 10 to 16 lower-order coefficients [13].

Cepstral Analysis
Description: Cepstral analysis is used to separate speech into its source and system components without any prior knowledge about the source and/or the system [14].
Settings: Transforms the multiplied source and system components in the frequency domain into a linear combination of the two components in the cepstral domain. Static feature extraction method; power spectrum [10].

Mel-Frequency Cepstral Coefficients (MFCCs)
Description: MFCCs transform the frequency axis from a linear to a nonlinear scale. MFCC is based on human hearing perception, which does not resolve frequencies above 1 kHz on a linear scale; it exploits the known variation of the human ear's critical bandwidth with frequency [15].
Settings: Uses two types of filter spacing: linear at low frequencies below 1000 Hz and logarithmic above 1000 Hz. A subjective pitch is present on the mel-frequency scale to capture important phonetic characteristics of speech. The power spectrum is computed by performing Fourier analysis [16].

Hidden Markov Model (HMM)
Description: Hidden Markov Models (HMMs) provide a simple and effective framework for modelling time-varying spectral vector sequences. Each word is trained independently to obtain the best-likelihood parameters [15].
Settings: Based on a mathematical framework and an implementation structure. The mathematical framework offers a consistent statistical methodology and provides straightforward solutions to related problems; the implementation structure offers inherent flexibility in dealing with various sophisticated speech recognition tasks and ease of implementation [16].

Artificial Neural Network (ANN)
Description: The Artificial Neural Network (ANN), also known as a Neural Network (NN), is mainly used for feature matching or recognition in speech processing.
Settings: Normally used to classify a set of features representing the spectral-domain content of the speech (regions of strong energy at particular frequencies). The features are then converted into phonetic-based categories at each frame, and a search matches the neural-network output scores to the target words (the words assumed to be in the input speech) in order to determine the word most likely uttered.

Vector Quantization (VQ)
Description: A VQ codebook consists of a small number of representative feature vectors, used as an efficient means of characterizing speaker-specific features [17].
Settings: A speaker-specific codebook is generated by clustering the training feature vectors of each speaker. In the recognition stage, an input utterance is vector-quantized using the codebook of each reference speaker, and the VQ distortion accumulated over the entire input utterance is used to make the recognition decision [17][18].

Spectrogram
Description: The interpretation of a spectrum as a spectrogram shows the speech signal varying with time [3]. The spectrogram is widely used to calculate and analyze speech signals; the horizontal axis represents time while the vertical axis represents frequency [4].
Settings: A spectrogram usually encodes the amplitude of the signal in grayscale rather than color [4]; the peaks of the spectrum are black and the valleys are white. Fast Fourier Transform (FFT) coding is utilized to extract features from each recording. The formant frequencies are the average values of the appropriate phonemes [3].
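The LPC entry in Table 3, where s(t) is predicted as a linear sum of the p previous samples, can be made concrete with a small worked example. This is a generic illustration using the autocorrelation (Yule-Walker) normal equations, applied to a synthetic second-order signal whose true prediction coefficients are known:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coeffs(signal, order):
    """Find a[1..p] minimising the error of s(t) ~ sum_k a[k]*s(t-k)
    by solving the autocorrelation (Yule-Walker) normal equations."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    # Toeplitz system: first column r[0..p-1], right-hand side r[1..p].
    return solve_toeplitz(r[:order], r[1:order + 1])

# Synthetic signal with known prediction coefficients 1.5 and -0.7:
rng = np.random.default_rng(1)
s = np.zeros(4000)
for t in range(2, len(s)):
    s[t] = 1.5 * s[t - 1] - 0.7 * s[t - 2] + rng.standard_normal()
a = lpc_coeffs(s, order=2)   # expected to be close to [1.5, -0.7]
```

For speech, the same coefficients (typically 10 to 16 of them, as the table notes) summarize the vocal-tract filter for one analysis frame, and their polynomial roots locate the formant resonances.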

2.3 Review of speech recognition methods applied
to the Quran
In the previous sections, the different methods and
approaches used in general voice recognition and
voice analysis were discussed. This part looks into
the various studies conducted on Quranic voice
recognition. Most of the algorithms were developed
on the Matlab or Java development platforms and
tested on personal computers. The last part of this
review presents a summary that gives an overview of
previously conducted studies in this area.
The studies in [19] and [20] developed a system
using a spectrographic extraction method, in which
spectrographic analysis is conducted on the
intensity of different frequency bands. The reported
accuracy is 93.33%.
Likewise, the study in [21] developed an Arabic
Text-to-Speech (TTS) system. The system was
developed at the Human Language Technologies
laboratory of IBM, Egypt, and uses the IBM trainable
concatenative speech synthesizer. The study's
results are presented as a mean opinion score for
the synthesized speech.
In [22], voice recognition technology for
recognizing Arabic speech using neural networks was
developed. This technique is able to classify
nonlinear problems, based on research performed
using neural networks. The authors proposed a
fully-connected hidden layer between the input and
state nodes and the output, and showed that this
hidden layer makes the learning of complex
classification tasks more efficient. The difference
between LPCC (linear predictive cepstral
coefficients) and MFCC (mel-frequency cepstral
coefficients) in the feature extraction process was
also investigated. To test the effectiveness of the
system, they used six speakers (a mixture of male
and female) in a quiet environment.
Another work, [23], introduced a new method in which
verses are extracted from audio files using speech
recognition techniques. The implementation utilized
the Sphinx-4 framework. The developed system uses
MFCC features to extract the verses from audio
files, and a Hidden Markov Model is used as the
recognizer to translate them into Arabic words,
using a hash map and breadth-first search combined
with beam search over the dictionary database.
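The beam search mentioned for [23] is a standard decoding tool: at each step every partial word hypothesis is extended, then all but the best few are pruned. As a generic illustration (not the Sphinx-4 implementation), with hypothetical per-position word scores:

```python
def beam_search(step_scores, width=3):
    """step_scores: one dict of {word: log_probability} per position.
    Expands every hypothesis at each step, then prunes to `width`."""
    beams = [((), 0.0)]  # (word sequence, accumulated log-probability)
    for scores in step_scores:
        candidates = [(seq + (word,), lp + s)
                      for seq, lp in beams
                      for word, s in scores.items()]
        # Keep only the `width` most probable partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:width]
    return beams[0]

# Hypothetical log-scores for a three-position utterance:
steps = [{"bismi": -0.1, "alhamdu": -2.0},
         {"allahi": -0.3, "lillahi": -1.5},
         {"rahman": -0.2, "rahim": -0.9}]
best_seq, best_lp = beam_search(steps, width=2)
```

In a real recognizer the per-step scores come from the HMM acoustic model and a language model rather than fixed dictionaries, but the prune-as-you-go structure is the same.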

An Arabic automatic speech recognition engine based
on the HMM Toolkit (HTK) was developed by [24],
which can recognize both continuous speech and
isolated words. One of the main functions included
was an Arabic dictionary composing words from their
phones. The speech feature vectors are extracted
using mel-frequency cepstral coefficients (MFCC).
Subsequently, the engine is trained on tri-phones to
estimate the parameters of an HMM. The system was
tested with thirteen native Arabic speakers.
The research work in [10] developed an automated
Tajweed checking rules engine for Quranic verse
recitation. The feature extraction technique used is
mel-frequency cepstral coefficients (MFCC), utilized
to extract the characteristics of Quranic verse
recitations, while a Hidden Markov Model (HMM) was
employed for classification and recognition. The
researchers noted that the developed system achieved
recognition rates between 86.41% and 91.95%.
Spectrogram voice analysis uses the Fourier analysis
technique. The sound waveform is transformed into a
spectrum, which is the frequency representation of
the signal. The spectrogram is then used to
determine the formant frequency; the formant
frequency of each collected signal is determined
through observation of the spectrogram.
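Determining the formant "through observation of the spectrogram" can be automated by picking the strongest spectral peak in each frame and averaging across frames. A minimal sketch, using a synthetic tone in place of a recorded vowel; the 200 Hz floor is an arbitrary choice to skip the pitch region:

```python
import numpy as np

def mean_formant_estimate(signal, fs, frame_len=512, hop=256, fmin=200.0):
    """Crude automation of reading a formant off the spectrogram:
    take the strongest spectral peak above fmin in each frame and
    average that peak frequency across frames."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    keep = freqs >= fmin
    peaks = []
    for i in range(0, len(signal) - frame_len + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[i:i + frame_len] * window))
        peaks.append(freqs[keep][spectrum[keep].argmax()])
    return float(np.mean(peaks))

# Stand-in for a vowel with one dominant resonance at 700 Hz:
fs = 8000
t = np.arange(fs) / fs
f_est = mean_formant_estimate(np.sin(2 * np.pi * 700 * t), fs)
```

Real speech has several formants, so practical systems track multiple peaks (or use LPC root-finding) rather than a single maximum, but the per-frame peak-then-average scheme mirrors the mean-formant procedure described above.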

Table 4. Summary of speech recognition techniques used in Quran experiments.

Extraction Method | Recognition Technique | Performance | References
Spectrographic analysis | Spectrographic analysis based on different frequency bands of intensity | 93.33% | [19][20]
MFCC | Hidden Markov Model (HMM) | 90.2% | [21]
MFCC, LPCC | Recurrent Neural Network (RNN) | MFCC 95.9%-98.6%; LPCC 94.5%-99.3% | [22]
MFCC | Hidden Markov Model (HMM) | 85%-92% | [23]
MFCC | Hidden Markov Model (HMM) | 90.62%, 98.01%, and 97.99% for sentence correction, word correction, and word accuracy respectively | [24]
MFCC | Hidden Markov Model (HMM) | 86.41%-91.95% | [10]
FFT | Spectrogram | Over 90% | [4]
MFCC | Not available | 90%-92% | [25]
MFCC | Vector Quantization (VQ) | 82.1%-95% | [18]

Another work, [4], conducted research on spectrogram
voice analysis and showed that the second and third
formant frequencies (F2 and F3) increase as the
articulation moves through the mouth. Again, [25]
developed and tested a novel system called E-hafiz.
The system was built on the mel-frequency cepstral
coefficient (MFCC) technique to extract voice
features from Quranic verse recitations. The
extracted voice features are mapped against the data
collected during the training phase, making it
possible to identify mismatches or mistakes through
comparison analysis.
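The MFCC-VQ approach of [18], summarized in the VQ rows above, can be sketched generically: cluster each reference speaker's training vectors into a small codebook, then attribute an utterance to the speaker whose codebook yields the lowest accumulated quantization distortion. The 2-D features and cluster count below are illustrative, not the actual features used in [18]:

```python
import numpy as np

def train_codebook(vectors, k=2, iters=20, seed=0):
    """Tiny Lloyd/k-means: k representative codewords per speaker."""
    rng = np.random.default_rng(seed)
    centers = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest codeword, then re-estimate.
        labels = np.argmin(
            ((vectors[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return centers

def distortion(utterance, codebook):
    """Accumulated VQ distortion of an utterance against a codebook."""
    d = ((utterance[:, None, :] - codebook[None]) ** 2).sum(-1)
    return float(d.min(axis=1).sum())

rng = np.random.default_rng(42)
# Toy 2-D 'feature vectors' for two reference speakers:
speaker_a = rng.normal([0.0, 0.0], 0.1, size=(50, 2))
speaker_b = rng.normal([3.0, 3.0], 0.1, size=(50, 2))
books = {"A": train_codebook(speaker_a), "B": train_codebook(speaker_b)}
test = rng.normal([0.0, 0.0], 0.1, size=(20, 2))  # really speaker A
decision = min(books, key=lambda s: distortion(test, books[s]))
```

For Tajweed checking the same machinery compares a learner's MFCC vectors against codebooks built from correct and incorrect pronunciations rather than against different speakers.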

3 CONCLUSION & FUTURE WORK
Based on previous studies, it is understood that
every letter in Arabic has a different formant
pattern. This finding could be used to study the
implementation flow that is crucial for a mobile
application. Considerable samples should be examined
to study the different formant points, and these
studies should not be limited to one type of
participant (e.g. male or female) but should draw on
different sources and genders. The focus of this
treatise has been to describe the background of the
approaches to differential Qiraat processing based
on voice recognition techniques. It provides the
basic approaches and stages involved, as well as the
issues and challenges related to voice-recognition-based
Qiraat processing. Highlights of achievements so far
were provided, as well as the significance and
contribution of the study, specifically with respect
to performance issues and implementation on the
portable mobile devices often used by Quranic
readers.
This paper reviews the various researches on
spectrogram analysis on mobile platforms. The
spectrogram uses the FFT to extract data from audio
files. The FFT offers a fast transform and requires
less computational power to process a given amount
of information, which has appealing advantages for
energy consumption on mobile multimedia devices such
as smartphones [26]. Implementations on mobile
platforms are seldom reported in this area, which
calls for further research on implementation, system
processes, and interface design. These should be the
future directions in this area.

REFERENCES
[1] Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., and Lang, K. J., 1989, "Phoneme recognition using time-delay neural networks," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 37, pp. 328-339.
[2] Hassan, A., 1984, Linguistik Am Untuk Guru Bahasa Malaysia, 5th ed. Selangor, Malaysia: Fajar Bakti.
[3] Dzati, A. R., Salina, A. S., and Aini, H., 2007, "Preprocessing techniques for voice-print analysis for speaker recognition," The 5th Student Conference on Research and Development (SCOReD 2007), 11-12 December 2007, Malaysia.
[4] Abdul-Kadir, N. A., Sudirman, R., and Safri, N. M., 2010, "Modelling of the Arabic plosive consonants characteristics based on spectrogram," 2010 Fourth Asia International Conference on Mathematical/Analytical Modelling and Computer Simulation, IEEE.
[5] Harrag, A. and Mohamadi, T., 2010, "QSDAS: New Quranic speech database for Arabic speaker recognition," The Arabian Journal for Science and Engineering, vol. 35, no. 2C.
[6] Shafiqah, S., 2011, "Independent Learning of Quran (ILoQ)-Alphabet," Degree Thesis, Faculty of Computer Systems & Software Engineering, University Malaysia Pahang, Pahang, Malaysia.
[7] Oberteuffer, J. A., 1995, "Commercial applications of speech interface technology: An industry at the threshold," Proceedings of the National Academy of Sciences, vol. 92, pp. 10007-10010.
[8] Ramzi, A. H. and Omar, E. A., 2007, "CASRA+: A colloquial Arabic speech recognition application," American Journal of Applied Sciences, 4(1):23-32.
[9] Nidaa, A. A., 2009, "Speech scrambling based on principal component analysis," MASAUM Journal of Computing, vol. 1(3), October 2009.
[10] Jamaliah, N., 2010, "Automated Tajweed Checking Rules Engine for Quran Verse Recitation," Master Thesis, Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia.
[11] Ganesh, R. N. and Dinesh, K. K., 2011, "An overview of independent component analysis and its applications," School of Electrical and Computer Engineering, RMIT University, Australia.
[12] Lee, J.-H., Jung, H.-J., Lee, T.-W., and Lee, S.-Y., 1998, "Speech feature extraction using independent component analysis," Proc. ICASSP, vol. 2, pp. 1249-1252.
[13] Bradbury, J., 2000, "Linear predictive coding," [Online] Available at: http://my.fit.edu/~vKepuska/ece5525/lpc_paper.pdf, retrieved on 25 December 2013.
[14] Sakshat Virtual Labs, n.d., "Speech signal processing laboratory: Cepstral analysis of speech," [Online] Available at: http://iitg.vlab.co.in/?sub=59&brch=164&sim=615&cnt=6, retrieved on 25 December 2013.
[15] Gales, M. and Young, S., 2007, "The application of hidden Markov models in speech recognition," Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195-304.
[16] Juang, B. H. and Rabiner, L. R., 1991, "Hidden Markov models for speech recognition," Technometrics, vol. 33, no. 3, American Statistical Association and the American Society for Quality Control.
[17] Kanade, T., Jain, A., and Ratha, N. K. (eds.), 2005, Audio- and Video-Based Biometric Person Authentication, 5th International Conference, AVBPA 2005, Hilton Rye Town, NY, USA, July 2005, Proceedings.
[18] Ismail, A., Idris, M. Y. I., Noor, M. N., Razak, Z., and Yusoff, Z. M., 2014, "MFCC-VQ approach for Qalqalah Tajweed rule checking," Malaysian Journal of Computer Science, vol. 27, no. 4.
[19] Bashir, M. S., Rasheed, S. F., Awais, M. M., Masud, S., and Shamail, S., 2003, "Simulation of Arabic phoneme identification through spectrographic analysis," Department of Computer Science, LUMS, Lahore, Pakistan.
[20] Anwar, M. J., Awais, M. M., Masud, S., and Shamail, S., "Automatic Arabic speech segmentation system," Department of Computer Science, Lahore University of Management Sciences, Lahore, Pakistan.
[21] Youssef, A. and Emam, O., 2004, "An Arabic TTS based on the IBM trainable speech synthesizer," Department of Electronics & Communication Engineering, Cairo University, Giza, Egypt.
[22] Ahmad, A. M., Ismail, S., and Samaon, D. F., 2004, "Recurrent neural network with backpropagation through time for speech recognition," IEEE International Symposium on Communications & Information Technology (ISCIT '04), vol. 1, pp. 98-102.
[23] Tabbal, H., El-Falou, W., and Monla, B., 2006, "Analysis and implementation of a Quranic verses delimitation system in audio files using speech recognition techniques," Proceedings of the 2nd IEEE Conference on Information and Communication Technologies (ICTTA '06), vol. 2, pp. 2979-2984.
[24] Al-Qatab, B. A. Q. and Ainon, R. N., 2010, "Arabic speech recognition using Hidden Markov Model Toolkit (HTK)," Software Engineering Department, Faculty of Computer Science and Information Technology, University of Malaya, Malaysia.
[25] Waqar, M. M., Rizwan, M., Aslam, M., and Martinez, E. A. M., 2010, "Voice content matching system for Quran readers," Ninth Mexican International Conference on Artificial Intelligence, IEEE.
Hasan, M. R., Jamil, M., Rabbani, M. G., and Rahman, M. S., 2004, "Speaker identification using mel frequency cepstral coefficients," 3rd International Conference on Electrical & Computer Engineering (ICECE 2004), 28-30 December 2004, Dhaka, Bangladesh.
[26] Andersson, M., 2012, "A faster Fourier transform: A mathematical upgrade promises a speedier digital world," [Online] Available at: http://www2.technologyreview.com/article/427676/a-faster-fourier-transform/, retrieved on 25 December 2013.

Proceedings of the International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA), Kuala Lumpur, Malaysia, 2016

Execution of an Advanced Data Analytics by Integrating Spark with MongoDB


Ms. C. Vijayalakshmi, M.C.A., M.Phil.,
Department of Computer Science and Engineering, Jubail University College (Female Branch)
PO Box 10074, Jubail Industrial City - 31961, Kingdom of Saudi Arabia
vijimichael50@gmail.com

ABSTRACT

Spark has several advantages over other big data and MapReduce technologies such as Hadoop and Storm. Spark provides a comprehensive, unified framework to manage big data processing requirements with data sets that are diverse in nature (text data, graph data, etc.) as well as in source (batch vs. real-time streaming data). Spark SQL is an easy-to-use and powerful API provided by Apache Spark; it makes reading and writing data for analysis much easier. The MongoDB Connector for Apache Spark is a powerful integration that enables developers and data scientists to create new insights and drive real-time action on live, operational and streaming data. This paper demonstrates, through experimentation with the MongoDB Connector for Apache Spark, how the Spark SQL library can be used to store, retrieve and query structured/semi-structured datasets such as BSON against the non-relational database MongoDB, an open-source and leading NoSQL database.
KEYWORDS
Spark SQL, MongoDB, NoSQL databases,
MongoDB Connector for Apache Spark, Data
Analytics with Spark SQL and MongoDB, Data
Mining on NoSQL data

1 INTRODUCTION

In the era of Big Data, complex data is growing: unstructured data will account for more than 80% of the data collected by organizations, and data is increasingly stored in Non-Relational databases, as shown in Table 1. Non-Relational databases (such as MongoDB, HBASE, etc.) are very efficient in the domains of Big Data, Content Management and Delivery, Mobile and Social Infrastructure, User Data Management and Data Hubs, as MongoDB provides document-oriented storage, an index on any attribute, replication, high availability, auto-sharding, fast in-place updates and rich queries.
Table 1. Comparison of Data

              Relational Databases         Non-Relational Datastores
              (Oracle, SQL Server,         (Hadoop, HBASE,
              MySQL etc.)                  MongoDB etc.)
Volume        GBs-TBs                      TBs-PBs
Structure     Structured                   Structured, Semi-structured
                                           and Unstructured
Schema        Fixed (DBA-controlled)       Dynamic/Flexible
                                           (Application-controlled)

Apache Spark started as a research project at UC Berkeley in the AMPLab, which focuses on big data analytics. The goal was to design a programming model that supports a much wider class of applications than MapReduce, while maintaining its automatic fault tolerance. In particular, MapReduce is inefficient for multi-pass applications that require low-latency data sharing across multiple parallel operations.
These applications [1] are quite common in
analytics, and include:


- Iterative algorithms, including many machine learning algorithms and graph algorithms like PageRank.
- Interactive data mining, where a user would like to load data into RAM across a cluster and query it repeatedly.
- Streaming applications that maintain aggregate state over time.
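The first of these workload classes can be made concrete with a toy example. Below is a minimal Python sketch (illustrative only; the paper's own examples use Scala) of PageRank on a three-page graph. Each iteration re-reads the entire rank table, which is exactly the repeated data access that Spark keeps in memory rather than re-loading from disk as MapReduce would.

```python
# Toy PageRank over a small adjacency list; damping factor 0.85.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 for page in links}

for _ in range(10):  # each pass re-reads the whole rank table
    contribs = {page: 0.0 for page in links}
    for page, outs in links.items():
        share = ranks[page] / len(outs)   # split rank among out-links
        for dest in outs:
            contribs[dest] += share
    # standard damped update
    ranks = {page: 0.15 + 0.85 * c for page, c in contribs.items()}
```

Spark's bundled examples include a PageRank along these lines; the point here is only that the rank table is reused on every pass, so keeping it resident in cluster memory pays off.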

2 SPARK

Apache Spark is an open source big data processing framework built around speed, ease of use and sophisticated analytics. Spark is written in the Scala programming language and runs in a Java Virtual Machine (JVM) environment. Creating and running Spark programs is faster because less code needs to be read and written. Spark supports other languages such as Java, Python and R for developing applications. In addition to Map and Reduce operations, it supports SQL queries, streaming data, machine learning and graph data processing [2], as shown in Figure 1.

Figure 1. Spark Unified Stack

Spark operates on RDDs (Resilient Distributed Datasets), an in-memory data structure. Each RDD represents a chunk of the data, partitioned across the data nodes in the cluster. RDDs are immutable; a new one is created when transformations are applied. RDDs are operated on in parallel using transformations/actions such as mapping and filtering, and these operations are performed simultaneously on all the partitions in parallel. RDDs are resilient: if a partition is lost due to a node crash, it can be reconstructed from the original sources.

Spark Streaming can be used for processing real-time streaming data. It is based on a micro-batch style of computing and processing. Spark Streaming provides an abstraction called DStream (Discretized Stream), which is a continuous stream of data. DStreams are created from an input data stream, from sources such as Kafka and Flume, or by applying operations on other DStreams. A DStream is essentially a sequence of RDDs.

Spark SQL provides the capability to expose Spark datasets over the JDBC API and allows running SQL-like queries on Spark data using traditional BI^1 and visualization tools. Spark SQL can read both SQL and NoSQL data sources. StreamSQL is a Spark component that combines Catalyst^2 and Spark Streaming to perform SQL queries on DStreams.

Spark MLlib [3] is Spark's scalable machine learning library, consisting of common learning algorithms and utilities including classification, regression, clustering, collaborative filtering and dimensionality reduction, as well as underlying optimization primitives.

Spark GraphX [3] is the new (alpha) Spark API for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing the Resilient Distributed Property Graph: a directed multi-graph with properties attached to each vertex and edge. To support graph computation, GraphX exposes a set of fundamental operators (such as subgraph, joinVertices and aggregateMessages) as well as an optimized variant of the Pregel API. In addition, GraphX includes a growing collection of graph algorithms and builders to simplify graph analytics tasks.

^1 Business Intelligence (BI) is a broad category of computer software solutions that enables a company or organization to gain insight into its critical operations through reporting applications and analysis tools.
^2 The Catalyst Optimizer optimizes relational algebra and expressions; it performs query optimization.
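The vertex-centric model behind the Pregel API can be illustrated without a cluster. The following is a small Python sketch (an analogy, not GraphX code) of connected components via label propagation: each vertex repeatedly adopts the smallest label seen among itself and its neighbours, with one pass of the loop playing the role of a superstep.

```python
# Build an undirected neighbour map from an edge list.
edges = [(1, 2), (2, 3), (4, 5)]
neighbours = {}
for u, v in edges:
    neighbours.setdefault(u, set()).add(v)
    neighbours.setdefault(v, set()).add(u)

# Every vertex starts labelled with its own id.
labels = {v: v for v in neighbours}

changed = True
while changed:            # one loop iteration ~ one Pregel superstep
    changed = False
    for v, nbrs in neighbours.items():
        best = min([labels[v]] + [labels[n] for n in nbrs])
        if best < labels[v]:
            labels[v] = best
            changed = True

# Vertices 1-3 converge to label 1; vertices 4-5 to label 4.
```

In GraphX the same idea is expressed with messages exchanged along edges between supersteps, but the convergence structure is identical.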

3 SPARK SQL

Spark SQL is a distributed and fault-tolerant query engine. It allows users to run interactive queries on structured and semi-structured data. Spark SQL supports batch and streaming SQL, runs SQL/Hive queries, and connects existing BI tools to Spark through JDBC. It has bindings for Python, Scala, Java and R. Spark SQL's data source API can read and write DataFrames^3 using a variety of formats such as JSON, CSV, Parquet, MySQL, HDFS, Hive, HBASE, Avro and JDBC.
3.1 Capabilities of Spark SQL [5]

- Seamless integration: Spark SQL allows us to write queries inside Spark programs, using either SQL or a DataFrame API. We can apply normal Spark functions (map, filter, reduceByKey, etc.) to SQL query results.
- Support for a variety of data formats and sources: DataFrames and SQL provide connections to a variety of data sources, including Hive, Avro, Parquet, Cassandra, CSV, ORC, JSON and JDBC. We can load, query and join data across these sources.
- Hive compatibility: We need to change neither the data in an existing Hive metastore nor our Hive queries to make them work with Spark. Spark SQL reuses the Hive frontend and metastore, giving full compatibility with existing Hive data, queries and UDFs^4.
- Standard JDBC/ODBC connectivity: A server mode provides industry-standard JDBC and ODBC connectivity for Business Intelligence tools, so we can use existing BI tools like Tableau.
- Performance and scalability: At the core of Spark SQL is the Catalyst optimizer, which leverages advanced programming language features, together with columnar storage and code generation, to build an extensible query optimizer. It scales to thousands of nodes and multi-hour queries using the Spark engine, which provides full mid-query fault tolerance.
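The load-then-query pattern that Spark SQL applies to semi-structured data can be illustrated without a Spark cluster. The following self-contained Python sketch (an analogy using the standard library, not Spark SQL itself) loads JSON records into an in-memory SQL engine and queries them — the same pattern the experimentation in Section 6 performs against MongoDB; the record values are borrowed from the primer dataset used there.

```python
import json
import sqlite3

# Semi-structured input: one JSON document per record.
records = [
    json.loads('{"name": "Wendy\'S", "borough": "Brooklyn", "cuisine": "Hamburgers"}'),
    json.loads('{"name": "Riviera Caterer", "borough": "Brooklyn", "cuisine": "American"}'),
    json.loads('{"name": "Dj Reynolds Pub", "borough": "Manhattan", "cuisine": "Irish"}'),
]

# "Register" the records as a table, then query with plain SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE primer (name TEXT, borough TEXT, cuisine TEXT)")
db.executemany(
    "INSERT INTO primer VALUES (?, ?, ?)",
    [(r["name"], r["borough"], r["cuisine"]) for r in records],
)
rows = db.execute("SELECT name FROM primer WHERE borough = 'Brooklyn'").fetchall()
```

Spark SQL goes further by inferring the schema automatically and distributing both storage and query execution, but the programming model — materialize documents as a table, then mix SQL with ordinary code — is the same.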

4 MONGODB
MongoDB is an open-source document
database that provides high performance, high
availability and automatic scaling. MongoDB
obviates the need for an Object Relational
Mapping (ORM) to facilitate development. A
record in MongoDB is a document, which is a
data structure composed of field and value
pairs. MongoDB documents are similar to
JSON objects. The values of fields may include
other documents, arrays and arrays of
documents.
Table 2. Comparison of RDBMS terminology with MongoDB [9]

RDBMS          MongoDB
Database       Database
Table          Collection
Tuple/Row      Document
Column         Field

^3 A DataFrame is a distributed collection of rows organized into named columns. It is an abstraction for selecting, filtering, aggregating and plotting structured data.
^4 User-Defined Functions (UDFs) are a feature of Spark SQL for defining new column-based functions that extend the vocabulary of Spark SQL's DSL for transforming Datasets.


Table Join     Embedded Documents
Primary Key    Primary Key (default key _id is provided by MongoDB itself)

MongoDB stores documents in collections. Collections are analogous to tables in relational databases. Unlike a table, however, a collection does not require its documents to have the same schema. In MongoDB, documents stored in a collection must have a unique _id field that acts as a primary key. MongoDB stores data records as BSON^5 documents.
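The document model can be sketched in a few lines of Python (illustrative only; the field values are taken from the primer dataset used in Section 6):

```python
# A MongoDB-style document as a dict of field/value pairs,
# with an embedded document and an array of embedded documents.
doc = {
    "_id": "30075445",
    "name": "Morris Park Bake Shop",
    "address": {                      # embedded document
        "building": "1007",
        "street": "Morris Park Ave",
        "zipcode": "10462",
    },
    "grades": [                       # array of embedded documents
        {"grade": "A", "score": 2},
        {"grade": "B", "score": 14},
    ],
}

# A relational model would need separate, joined tables for the
# address and grades; here they travel inside one record.
best = min(g["score"] for g in doc["grades"])
```

This is why Table 2 maps a relational join to embedded documents: the related data is stored inside the record rather than reassembled at query time.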
4.1 Advantages of MongoDB over RDBMS

- Schema-less: the number of fields, and the content and size of a document, can differ from one document to another.
- The structure of a single object is clear.
- No complex joins.
- Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language.
- Tuning.
- Ease of scaling.
- Conversion/mapping of application objects to database objects is not needed.
- Uses internal memory for storing the (windowed) working set, enabling faster access to data.

5 MONGODB SPARK CONNECTOR

The MongoDB Spark Connector provides integration between MongoDB and Apache Spark. With the connector, we have access to all Spark libraries for use with MongoDB datasets: Datasets for analysis with SQL (benefiting from automatic schema inference), streaming, machine learning, and graph APIs. We can also use the connector with the Spark shell.
The connector enables developers to build more functional applications faster and with less complexity, using a single integrated analytics and database technology stack. With industry estimates assessing that data integration consumes 80% of analytics development, the connector enables data engineers to eliminate the requirement for shuttling data between separate operational and analytics infrastructure, each of which would otherwise demand its own configuration, maintenance and management.
Written in Scala, Apache Spark's native language, the connector provides a more natural development experience for Spark users. The connector exposes all of Spark's libraries, enabling MongoDB data to be materialized as DataFrames and Datasets for analysis with the machine learning, graph, streaming and SQL APIs, further benefiting from automatic schema inference. The connector also takes advantage of MongoDB's aggregation pipeline and rich secondary indexes to extract, filter and process only the range of data it needs.
To maximize performance across large, distributed data sets, the MongoDB Connector for Apache Spark can co-locate Resilient Distributed Datasets (RDDs) with the source MongoDB node, thereby minimizing data movement across the cluster and reducing latency, as shown in Figure 2.

^5 BSON is a binary representation of JSON documents, though it contains more data types than JSON.


Figure 2. Using MongoDB Replica Sets to Isolate Analytics from Operational Workloads

6 EXPERIMENTATION

This experimentation demonstrates how the MongoDB Connector for Apache Spark can access all Spark libraries to analyze a MongoDB dataset [10][11][12]. The MongoDB Spark Connector is compatible with MongoDB 2.6 or later, Apache Spark 1.6.x and Scala 2.10.x (if using the mongo-spark-connector_2.10 package) or Scala 2.11.x (if using the mongo-spark-connector_2.11 package). This experimentation uses the Spark shell with the mongo shell^6.

6.1 Proposed Model

The MongoDB Connector is a plugin for both Hadoop and Spark that provides the ability to use MongoDB as an input source and/or an output destination for jobs running in both environments.

- Start the mongo shell
- Start the MongoDB Spark Connector while starting the Spark shell (with Scala)
- Create a BSON file in Spark
- Save the BSON file (from Spark) into MongoDB (as a collection)
- When needed, load the MongoDB collection into Spark (as a DataFrame)
- Perform the required SQL queries on the Spark DataFrame
- Store the output of the SQL queries (from the Spark DataFrame) into MongoDB (as a collection)

^6 The mongo shell is an interactive JavaScript interface to MongoDB and is a component of the MongoDB package. We can use the mongo shell to query and update data as well as perform administrative operations.

6.2 Model Implementation

Step (1): To start the Mongo shell

hduser@ubuntu:~$ sudo service mongod start

hduser@ubuntu:~$ mongo
MongoDB shell version: 3.2.8
connecting to: test
>

Note: When we run mongo without any arguments, the mongo shell attempts to connect to the MongoDB instance running on the localhost interface on port 27017.

Step (2): To show the current databases of MongoDB

> show dbs

Step (3): To start Scala

hduser@ubuntu:~$ cd /usr/local/scala
hduser@ubuntu:/usr/local/scala$ bin/scala


Step (4): To start the MongoDB Spark Connector

hduser@ubuntu:/usr/local/scala$ cd ../spark
hduser@ubuntu:/usr/local/spark$ bin/spark-shell \
  --conf "spark.mongodb.input.uri=mongodb://127.0.0.1/Sivijaya.primer?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/Sivijaya.primer" \
  --packages org.mongodb.spark:mongo-spark-connector_2.10:1.0.0

Note:
- bin/spark-shell starts the Spark shell.
- The --conf options configure the MongoDB Spark Connector; these settings configure the SparkConf object.
- spark.mongodb.input.uri specifies the MongoDB server address (127.0.0.1), the database to connect to (Sivijaya) and the collection (primer) from which to read data, with the read preference.
- spark.mongodb.output.uri specifies the MongoDB server address (127.0.0.1), the database to connect to (Sivijaya) and the collection (primer) to which to write data.
- The --packages option downloads the MongoDB Spark Connector package mongo-spark-connector_2.10 for use with Scala 2.10.x.

Step (5): To import the necessary packages

scala> import com.mongodb.spark._
scala> import org.bson.Document
scala> import com.mongodb.spark.sql._
scala> import org.apache.spark.sql.SQLContext
scala> val sqlContext = SQLContext.getOrCreate(sc)

Step (6): To create a BSON file in Spark, we took a sample JSON file named primer-dataset.json (from https://raw.githubusercontent.com/mongodb/docs-assets/primer-dataset/primer-dataset.json)

scala> val mydocs = """
scala> val mydocs = """
| {"address": {"building":
"1007", "coord": [-73.856077,
40.848447], "street": "Morris Park
Ave", "zipcode": "10462"}, "borough":
"Bronx", "cuisine": "Bakery",
"grades": [{"date": {"$date":
1393804800000}, "grade": "A", "score":
2}, {"date": {"$date": 1378857600000},
"grade": "A", "score": 6}, {"date":
{"$date": 1358985600000}, "grade":
"A", "score": 10}, {"date": {"$date":
1322006400000}, "grade": "A", "score":
9}, {"date": {"$date": 1299715200000},
"grade": "B", "score": 14}], "name":
"Morris Park Bake Shop",
"restaurant_id": "30075445"}
| {"address": {"building": "469",
"coord": [-73.961704, 40.662942],
"street": "Flatbush Avenue",
"zipcode": "11225"}, "borough":


"Brooklyn", "cuisine": "Hamburgers",


"grades": [{"date": {"$date":
1419897600000}, "grade": "A", "score":
8}, {"date": {"$date": 1404172800000},
"grade": "B", "score": 23}, {"date":
{"$date": 1367280000000}, "grade":
"A", "score": 12}, {"date": {"$date":
1336435200000}, "grade": "A", "score":
12}], "name": "Wendy'S",
"restaurant_id": "30112340"}
| {"address": {"building": "351",
"coord": [-73.98513559999999,
40.7676919], "street": "West 57 Street", "zipcode": "10019"},
"borough": "Manhattan", "cuisine":
"Irish", "grades": [{"date": {"$date":
1409961600000}, "grade": "A", "score":
2}, {"date": {"$date": 1374451200000},
"grade": "A", "score": 11}, {"date":
{"$date": 1343692800000}, "grade":
"A", "score": 12}, {"date": {"$date":
1325116800000}, "grade": "A", "score":
12}], "name": "Dj Reynolds Pub And
Restaurant", "restaurant_id":
"30191841"}
| {"address": {"building":
"2780", "coord": [-73.98241999999999,
40.579505], "street": "Stillwell
Avenue", "zipcode": "11224"},
"borough": "Brooklyn", "cuisine":
"American ", "grades": [{"date":
{"$date": 1402358400000}, "grade":
"A", "score": 5}, {"date": {"$date":
1370390400000}, "grade": "A", "score":
7}, {"date": {"$date": 1334275200000},
"grade": "A", "score": 12}, {"date":
{"$date": 1318377600000}, "grade":
"A", "score": 12}], "name": "Riviera
Caterer", "restaurant_id": "40356018"}
| {"address": {"building":
"8825", "coord": [-73.8803827,
40.7643124], "street": "Astoria
Boulevard", "zipcode": "11369"},
"borough": "Queens", "cuisine":
"American ", "grades": [{"date":
{"$date": 1416009600000}, "grade":
"Z", "score": 38}, {"date": {"$date":
1398988800000}, "grade": "A", "score":
10}, {"date": {"$date":
1362182400000}, "grade": "A", "score":
7}, {"date": {"$date": 1328832000000},
"grade": "A", "score": 13}], "name":
"Brunos On The Boulevard",
"restaurant_id":
"40356151"}""".trim.stripMargin.split(
"[\\r\\n]+").toSeq


Step (7): To save the BSON file into a MongoDB collection

scala> sc.parallelize(mydocs.map(Document.parse)).saveToMongoDB()

Step (8): To view the stored collection (primer) of the database (Sivijaya) from MongoDB

> show dbs
> use Sivijaya
> show collections
> db.primer.find()
> db.primer.find().pretty()


Step (9): To load the stored collection (primer) from MongoDB into Spark (as a DataFrame)

scala> val sample = MongoSpark.load(sqlContext)

Step (10): To store (register) the DataFrame as a table

scala> sample.registerTempTable("primer")

Step (11): To perform SQL queries on the loaded data (registered as a table) in Spark

scala> val mydf1 = sqlContext.sql("select restaurant_id,name,borough,cuisine from primer where cuisine == 'American '")
scala> mydf1.show()
scala> val mydf2 = sqlContext.sql("select restaurant_id,address,grades,borough,cuisine from primer where borough == 'Queens' or cuisine == 'American '")
scala> mydf2.show()

Step (12): To save the outputs of the Step 11 SQL queries from Spark into MongoDB

scala> mydf1.write.option("collection", "Query1_output").mode("overwrite").format("com.mongodb.spark.sql").save()
scala> mydf2.write.option("collection", "Query2_output").mode("overwrite").format("com.mongodb.spark.sql").save()

Step (13): To view the stored outputs of the Step 12 SQL queries (as collections) from the database (Sivijaya) of MongoDB

> use Sivijaya
> show collections
> db.Query1_output.find()
> db.Query2_output.find()

Step (14): To stop the Mongo shell

hduser@ubuntu:~$ sudo service mongod stop
mongod stop/waiting

7 INTEGRATING MONGODB & SPARK ANALYTICS WITH BI TOOLS & HADOOP

Real-time analytics generated by MongoDB and Spark can serve both online operational applications and offline reporting systems, where they can be blended with historical data and analytics from other data sources. To power dashboards, reports and visualizations, MongoDB offers integration with more of the leading BI and analytics platforms than any other non-relational database, including tools from Actuate, Alteryx, Informatica, Qliktech and Talend.
A range of ODBC/JDBC connectors^8 for MongoDB provide integration with additional analytics and visualization platforms, including Tableau [13] and others. At the most fundamental level, each connector provides read and write access to MongoDB. A connector enables the BI platform to access MongoDB documents and parse them before blending them with other data; result sets can also be written back to MongoDB if required. More advanced functionality available in some BI connectors [14] includes integration with the MongoDB aggregation framework for in-database analytics and summarization, schema discovery, and intelligent query routing within a replica set.

^8 For example, http://www.simba.com/ and https://www.progress.com/datadirect-connectors

8 CONCLUSION

The MongoDB Connector is a plugin for both Hadoop and Spark that provides the ability to use MongoDB as an input source and/or an output destination for jobs running in both environments. The connector directly integrates Spark with MongoDB and has no dependency on also having a Hadoop cluster running. Input and output classes are provided, allowing users to read and write against both live MongoDB collections and against BSON (Binary JSON) files that are used to store MongoDB snapshots. Also, JSON-formatted queries and projections can be used to filter the input collection, using a method in the connector that creates a Spark RDD from the MongoDB collection.
MongoDB natively provides a rich analytics framework within the database. Multiple connectors are also available to integrate Spark with MongoDB, enriching analytics capabilities by enabling analysts to apply libraries for machine learning, streaming and SQL to MongoDB data.

REFERENCES

[1] http://spark.apache.org/research.html
[2] http://spark.apache.org/
[3] https://www.infoq.com/articles/apache-spark-introduction
[4] http://www.slideshare.net/SparkSummit/adding-complex-data-to-spark-stack-neeraja-rentachintala
[5] Poonam Ligade, "Processing JSON data using Spark SQL Engine: DataFrame API", EduPristine, October 2015.
[6] http://www.kdnuggets.com/2015/09/spark-sql-realtime-analytics.html
[7] http://www.slideshare.net/databricks/spark-dataframes-simple-and-fast-analytics-on-structured-data-at-spark-summit-2015
[8] https://www.mongodb.com
[9] http://www.tutorialspoint.com/mongodb
[10] https://docs.mongodb.com/
[11] https://www.mongodb.com/press/mongodb-enables-advanced-real-time-analytics-on-fast-moving-data-with-new-connector-for-apache-spark
[12] "Apache Spark and MongoDB: Turning Analytics into Real-Time Action", a MongoDB white paper.
[13] http://www.tableau.com/
[14] http://www.simba.com/webinar/connect-tableau-big-data-source/


Content Based Image Retrieval Using Uniform Local Binary Patterns


Sumaira Muhammad Hayat Khan1, Ayyaz Hussain2
1

Department of Computing and IT, Asia Pacific University, Malaysia


1-5, Incubator -2, Technology Park Malaysia, Bukit Jalil, 57000 Kuala Lumpur Malaysia
sumaira@apu.edu.my
2
Department of Computer Science, International Islamic University Islamabad, Pakistan
Department of computer science, International Islamic University Islamabad, 44000 Islamabad,
Pakistan
ayyaz.hussain@iiu.edu.pk

ABSTRACT

KEYWORDS

This paper proposes a simple yet effective method

Content based image retrieval, feature vector,

for image retrieval based on uniform local binary

similarity measure, and local binary patterns.

patterns. The proposed method is based on


identification of several local binary patterns,
called uniform local binary patterns. These are the
essential properties of image texture and their
histogram is verified to be a dominant texture
feature. A generalized gray-scale and rotation
invariant operator is constructed that identifies

1.0.

INTRODUCTION

Retrieving required images from image


repository (image database) is of great
importance in computer vision. Its application
in almost every field is increasing for the ease

uniform patterns for each spatial resolution and

and convenience of users. The process of

for all quantization of the angular space. Color

image retrieval is based on features extracted

and shape features are also utilized in the

automatically from the images themselves.

calculation of feature vector. Additionally fourier

These systems are thus named as Content

descriptors (FD) and edge histogram descriptors

Based Image Retrieval (CBIR) systems and

(EHD) are computed to extort information at the

have received great attention in the literature

edges thus increasing the performance of the

of image retrieval. CBIR systems have three

system by giving higher precision. Euclidean


distance is used as a similarity measure to find the
distance between query and database image. Our
proposed method demonstrates promising results
for COREL image database compared to several
recent CBIR systems.

ISBN: 978-1-941968-37-6 2016 SDIWC

main tasks named as extraction, selection and


classification [1]. In a large collection of
images it is possible that duplicates of images
exist. There is a similarity in the content and
some parts are shred having different shape,

49

Proceedings of the International Conference on Data Mining, Multimedia, Image Processing and their Applications (ICDMMIPA), Kuala Lumpur, Malaysia, 2016

color, contrast and position. These similar

descriptors (EHD) are also calculated to

images are called image families. Object

extract maximum information for accurate

recognition and image retrieval are very

retrieval. One of the important texture

important

Many

features i.e. local binary patterns is also

technologies have been established for fast

considered during feature vector calculation.

retrieval, indexing and management of digital

LBP was first introduced in [5] as a

images.

for image

corresponding measure for the local image.

retrieval include text based image retrieval

Local binary pattern is the finest method for

methods but their performance is not good

the extraction of structural properties of

because of the inaccurate description of

texture. Its effectiveness originates from

image content by human language. Also

finding various micro patterns such as points,

manual annotation of images is a time

edges, constant areas etc. Local binary

consuming task because of the large database

patterns have shown remarkable performance

sizes. For the solution to such problems CBIR

in several applications where texture is an

systems have been introduced since early

important feature [6, 7, 8]. At present there

1990. CBIR greatly enhances the accuracy of

are some CBIR systems where different

information retrieval and is an important

versions of LBP features are included. In the

substitute for traditional text based image

proposed technique uniform binary patterns

retrieval [3]. The major problem in image

are

retrieval is to determine the similarity of two

transformation in grayscale, rotation and

images. Properties based on the color, shape

contrast. Main contributions of the proposed

and texture are most commonly used to

technique are listed as follows

in

such

Traditional

images

methods

[2].

measure the similarity. In these systems users


can query easily [4].

calculated

which

is

invariant

to

It is realized in the study that some


image features such as FD and EHD

In this study a CBIR system is proposed

when used in the calculation of feature

utilizing all the key image features such as

vector improve the retrieval efficiency

color, texture and shape to improve the

of CBIR systems.

retrieval efficiency and accuracy. In addition


to the three main categories of image features
fourier descriptors (FD) and edge histogram

ISBN: 978-1-941968-37-6 2016 SDIWC

Uniform local binary patterns are


calculated that are the fundamental

50

properties of image texture and are invariant to grayscale and rotation. The operator is further improved by adding a contrast invariant operator, and the two are jointly used to form a powerful tool for texture analysis.

The manuscript is structured as follows: Section 2 reviews the related work on CBIR. The proposed system is presented in Section 3. Section 4 describes the experimentation and its evaluation by comparing the system with other methods from the literature. The conclusion is given in Section 5, together with some future directions.

2.0. RELATED WORK

The QBIC system from IBM [9] is one of the image retrieval systems that were available in the beginning. It uses a color histogram together with shape and texture features. The Photobook system from the Massachusetts Institute of Technology (MIT) [10] is also one of such systems; face recognition technology was later added to it for searching images of particular persons. Blobworld [11] is another image retrieval system, developed at UC Berkeley. Here images are characterized by sections that are established by an expectation-maximization (EM) like segmentation procedure, and queries are posed on a region of interest. This system follows a continuous approach, using a nearest neighbor approach for image retrieval. SIMBA, CIRES, SIMPLIcity, Image Retrieval in Medical Applications (IRMA) and FIRE also use the same approach for image retrieval.

The VIPER/GIFT system describes the local and global properties of the image by incorporating color and texture features [12]. The structure of the system allows for extremely high dimensional feature spaces (> 80,000 dimensions), but for each image only 3,000 to 5,000 features are active [13]. A hierarchical perceptual combination of basic image features, and of the relationships between different features, for structure categorization has been proposed in [14]. Vector quantization has been used on image blocks to build codebooks for retrieval and representation, inspired by text based techniques and data compression. Another hybrid technique, proposed in [15], uses rectangular blocks for the segmentation of background or foreground. Images have also been signified as groups of vectors in RGB-space [16]. A number of techniques proposed in recent years [17, 18, 19] assume that the feature space is a manifold embedded in Euclidean space. The design of the interface has been improved by clustering [20]. A renowned technique proposed in [21] is based

on feature extraction at the global level. Advanced color and texture (ACT) features are used in the calculation of the feature vector. A CBIR technique proposed in [22] utilized local binary patterns, an important texture feature, in the calculation of the feature vector. Two different approaches are proposed, based on the division of the image and the calculation of the LBP histogram: in the first approach the image is divided into blocks and an LBP histogram is calculated from each block, while the second approach calculates a single histogram of the query image. Another technique proposed in [23] introduced local derivative patterns (LDP) for indexing and image retrieval in CBIR systems. Local tetra patterns (LTrPs) are introduced in [24] for feature vector calculation, with a scheme to calculate the n-th order LTrP from the (n-1)-th order vertical and horizontal derivatives. Compared to LBP, which can encode images with two different values, LTrPs can encode images with four different values, extracting more comprehensive information.

3.0. PROPOSED TECHNIQUE

Image retrieval starts with the feature extraction process over the whole image. The color feature is extracted through the calculation of a color histogram; values for mean, median, mode and standard deviation are calculated from the histogram. Different shape features, FDs and EHDs, are also added to the feature vector; these help in object recognition, boundary detection etc. An imperative texture feature, i.e. uniform local binary patterns, is also calculated. The operator assigns a unique label to each pixel and is invariant to translation and rotation. It functions well with another operator added to provide contrast invariance to the system. These operators are discussed in detail in Section 3.1. Retrieved results are greatly improved by the addition of the LBP operator. Feature vectors having 288 different features are constructed and stored in a spreadsheet for all the database images. The distance between the image given to the system and the database images is found by calculating the difference between their feature vectors through the Euclidean distance.
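The color-feature step above (a color histogram plus its mean, median, mode and standard deviation) can be sketched as follows. This is an illustrative sketch only: the bin count, the per-channel handling and the helper name are our assumptions, not details fixed by the paper.

```python
import numpy as np

def color_feature(channel, bins=32):
    """Histogram-based color statistics for one image channel, as in
    Section 3.0: a normalized histogram plus mean, median, mode and
    standard deviation. Bin count is an illustrative assumption."""
    channel = np.asarray(channel, dtype=np.float64).ravel()
    hist, edges = np.histogram(channel, bins=bins, range=(0, 256))
    mode_bin = edges[np.argmax(hist)]  # representative value of the fullest bin
    stats = np.array([channel.mean(), np.median(channel), mode_bin, channel.std()])
    return np.concatenate([hist / hist.sum(), stats])

# toy usage: a flat gray patch of value 128
fv = color_feature(np.full((8, 8), 128))
```

For the flat patch, mean, median and mode all coincide at 128 and the standard deviation is zero, so the four trailing statistics are easy to sanity-check.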

Algorithm

Step 1: A color histogram is created for the extraction of the color feature, for each fragment of the image, where i = 3.

Step 2: A statistical texture feature is constructed by computing the co-occurrence matrix C(i, j).

Step 3: Structural information related to the image texture is added through uniform local binary patterns. The operator used for the calculation of the local binary patterns is represented by the formulas explained in Section 3.1.

Step 4: The coordinates of the i-th pixel on the boundary of a 2D shape with n pixels are given as x(i) and y(i), and a complex number is formed as z(i) = x(i) + j y(i). The FD of this shape is given as the DFT of z(i):

    a(k) = (1/n) * sum_{i=0}^{n-1} z(i) exp(-j 2 pi i k / n)        (1)

Step 5: The edge histogram descriptor (EHD) is computed from four different edge orientations, that is horizontal, 45 degree, vertical and 135 degree, plus one non-directional edge.

Step 6: The Euclidean distance between the query image and the database images is calculated by finding the difference between the feature vector of the query image q and the feature vectors of the database images d_i.
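Step 2's co-occurrence matrix C(i, j) can be sketched as below. The quantization to a small number of gray levels and the (0, 1) pixel offset are our illustrative assumptions; the step itself does not fix them.

```python
import numpy as np

def cooccurrence(img, levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix C(i, j): counts how often gray
    level i occurs with gray level j at offset (dy, dx). Level count
    and offset are illustrative assumptions."""
    img = np.asarray(img)
    q = (img.astype(np.int64) * levels) // 256  # quantize 0..255 into `levels` bins
    C = np.zeros((levels, levels), dtype=np.int64)
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            C[q[y, x], q[y + dy, x + dx]] += 1
    return C
```

On a tiny test image with alternating dark and bright columns, every horizontal pixel pair is (dark, bright), so all counts land in a single cell of C.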

3.1. Uniform Local Binary Patterns

Uniform local binary patterns are detected at the circular zones of every quantization of the angular space. The operator used is applicable to a broad range of neighborhoods of I members that are circularly symmetric neighbors on a circle of radius R. The performance of each operator is evaluated for specific values of (I, R), and the responses of the various operators are then combined for multi-resolution analysis. A histogram of these uniform patterns is then constructed over the whole image or over a region of an image. Hence the structural and statistical approaches to the texture feature are combined effectively, where structural information is extracted through the local binary patterns and statistical information is calculated through the occurrence histogram.
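Step 4's Fourier descriptor, the DFT of the complex boundary sequence z(i) = x(i) + j y(i) from equation (1), can be sketched as follows. Dropping the a(0) term and normalizing by the first harmonic are our added conventions for translation and scale invariance, not details stated in the paper.

```python
import numpy as np

def fourier_descriptors(xs, ys, keep=8):
    """Fourier descriptor of a 2D boundary: pack (x_i, y_i) into
    z_i = x_i + j*y_i and take the DFT (equation (1)). a(0) is dropped
    (translation invariance) and magnitudes are divided by |a(1)|
    (scale invariance) -- both are assumed conventions."""
    z = np.asarray(xs, dtype=np.float64) + 1j * np.asarray(ys, dtype=np.float64)
    a = np.fft.fft(z) / len(z)
    a = a[1:keep + 1]                # drop a(0), keep low frequencies
    return np.abs(a) / np.abs(a[0])  # scale-invariant magnitudes

# toy usage: a circle sampled at 16 boundary points
n = 16
t = 2 * np.pi * np.arange(n) / n
fd = fourier_descriptors(np.cos(t), np.sin(t))
```

Because a translation of the shape changes only the a(0) term, a shifted copy of the same boundary yields an identical descriptor.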


Spatial structure and contrast are two important properties of image texture. Spatial structure is measured well by the LBP operator, but the operator is deprived regarding contrast. Therefore it is combined with another operator, i.e. a local contrast measure. The combination of these two operators works well for rotation invariant texture classification, and the result is a grayscale, rotation and contrast invariant operator for texture feature extraction.

Texture is defined as the mutual (joint) distribution of the gray levels of I image pixels (I > 1) and is given by the equation

    T = t(s_c, s_0, ..., s_{I-1})                                   (2)

where the gray value of the center pixel is denoted by s_c, and the gray values of the pixels on the circle of radius R, forming a symmetric circular neighborhood, are given by s_i (i = 0, ..., I-1). If the coordinates of s_c are (0, 0), then the coordinates of s_i are (-R sin(2 pi i / I), R cos(2 pi i / I)). The circular arrangement of neighbor pixels for different values of (I, R) is shown in Figure 3.

Subtracting the center pixel value s_c from all the pixels of the neighborhood gives

    T = t(s_c, s_0 - s_c, s_1 - s_c, ..., s_{I-1} - s_c)            (3)

It is assumed that the differences s_i - s_c are independent of s_c, and that t(s_c) does not give any useful information about the texture of the image, so only the signs of the differences are retained:

    T ~ t(d(s_0 - s_c), d(s_1 - s_c), ..., d(s_{I-1} - s_c))        (4)

where d(x) = 1 if x >= 0 and d(x) = 0 otherwise. By allocating a binomial factor 2^i to every sign d(s_i - s_c), equation (4) is converted into a single number that characterizes the spatial structure of the local image texture:

    LBP_{I,R} = sum_{i=0}^{I-1} d(s_i - s_c) * 2^i                  (5)

Figure 3: Circular arrangement of neighbor pixels for different values of (I, R)
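Equations (2)-(5), together with the uniformity condition, can be sketched for the common I = 8, R = 1 case as follows. This is an illustrative sketch: the paper's multi-resolution setup also uses other (I, R) pairs, which would require interpolated neighbor positions.

```python
import numpy as np

def lbp_riu2(img, y, x):
    """Rotation-invariant uniform LBP for the 8-neighborhood (I = 8,
    R = 1) of pixel (y, x), following equations (2)-(5): threshold the
    neighbors at the center value s_c; if the circular bit pattern is
    uniform (at most two 0/1 transitions), return the number of 1 bits
    (0..8), otherwise the 'miscellaneous' label 9."""
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]  # neighbors in circular order
    sc = img[y, x]
    bits = [1 if img[y + dy, x + dx] >= sc else 0 for dy, dx in offs]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return sum(bits) if transitions <= 2 else 9

def lbp_histogram(img):
    """Occurrence histogram of the 10 labels over the image interior."""
    h = np.zeros(10, dtype=np.int64)
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            h[lbp_riu2(img, y, x)] += 1
    return h
```

A flat region maps every interior pixel to the all-ones pattern (label 8), while an isolated bright pixel maps to the all-zeros pattern (label 0, a "bright spot").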

The operator produces 2^I dissimilar output values, matching the 2^I diverse binary patterns that can be formed by the I pixels in the neighboring set. When the image is rotated, the gray values s_i move correspondingly along the perimeter of the circle around s_c. The rotation invariant version of the operator for I = 8 can have 36 distinct values. The uniform rotation invariant patterns for circularly symmetric neighboring pixels with 8-bit output are shown in Figure 4. For instance, the first pattern in Figure 4 detects a bright spot, the fourth pattern detects edges, and the last pattern detects dark spots and flat areas [25].

Figure 4: Uniform rotation invariant patterns for circularly symmetric neighboring pixels, for 8-bit output.
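The counts quoted above can be checked by brute force: a rotation of the image corresponds to a circular shift of the 8 neighbor bits, so the rotation-invariant patterns are the equivalence classes of 8-bit strings under circular shifts.

```python
def rot_min(p, n=8):
    """Canonical (minimal) circular rotation of an n-bit pattern."""
    return min(((p >> i) | (p << (n - i))) & ((1 << n) - 1) for i in range(n))

def transitions(p, n=8):
    """Number of 0/1 changes around the circular n-bit pattern."""
    return bin(p ^ (((p >> 1) | (p << (n - 1))) & ((1 << n) - 1))).count("1")

classes = {rot_min(p) for p in range(256)}
uniform = {rot_min(p) for p in range(256) if transitions(p) <= 2}
print(len(classes), len(uniform))  # 36 rotation-invariant classes, 9 of them uniform
```

With the nine uniform classes kept separate and all remaining patterns merged into one "miscellaneous" bin, the operator outputs 10 labels for I = 8, matching the 10-bin histogram used for the texture feature.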

4.0. EXPERIMENTAL RESULTS

4.1. Data Set and Performance Measures

The proposed technique is tested on a database of 1,000 natural images from the COREL image database, available free on the internet. Images are saved in JPEG format with size 384 x 256 or 256 x 384. The entire database has 10 different image categories, where each category contains 100 images. All these image categories comprise diverse semantics, containing vehicle, beach, dinosaur etc. Experiments were performed on 10 randomly chosen image queries, one from each category.

The performance measures used to find the retrieval efficiency of the suggested technique are built on the measures commonly used in image retrieval: precision and recall. Precision is calculated as the ratio of the number of retrieved relevant images to the total number of retrieved images. It indicates accuracy and is represented as Precision = R / N, where R is the number of retrieved images that are also relevant and N is the total number of retrieved images. The best precision is achieved when every retrieved image is relevant; however, precision does not indicate whether all the relevant images were retrieved. Recall is the ratio of the number of relevant retrieved images to the total number of relevant images in the database. It signifies completeness and is represented as Recall = R / M, where M is the total number of relevant images. The best recall is achieved when all the relevant images are retrieved; however, recall does not give any information about the irrelevant images in the database that could also have been retrieved.
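The two measures can be sketched as a small helper; the function name and the toy numbers below are ours, not the paper's.

```python
def precision_recall(retrieved, relevant):
    """Precision = R / N and Recall = R / M as defined above, where
    R = retrieved images that are relevant, N = retrieved images,
    M = relevant images in the database."""
    r = len(set(retrieved) & set(relevant))
    return r / len(retrieved), r / len(relevant)

# toy usage: 10 images retrieved, 100 relevant images in the category,
# 8 of the retrieved images are relevant -> precision 0.8, recall 0.08
p, r = precision_recall(range(10), list(range(2, 10)) + list(range(100, 192)))
```

The example illustrates the asymmetry discussed above: a query can score high precision while recalling only a small fraction of the relevant category.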

4.2. Results

The proposed system is evaluated by comparing its retrieval precision with that of other CBIR systems using local binary patterns. The comparison is given in Table 1, which shows the average percentage precision values for five different categories of images from the Corel image database of 1,000 images. Local binary patterns were further worked upon to improve results, and in [26] local ternary patterns (ICLTP) were introduced.

Table 1: Percentage precision values for 5 different image categories from the Corel image database

Image Category | Proposed Method (%) | LBP_8 (%) | LBP_16 (%) | ICLTP_8 (%) | ICLTP_16 (%) | LDP_8 (%) | LDP_16 (%)
Dinosaur       | 100                 | 96        | 98         | 99          | 100          | 96        | 97
Mountain       | 67                  | 42        | 40         | 57          | 62           | 40        | 44
Horse          | 96                  | 71        | 72         | 92          | 93           | 77        | 76
People         | 78                  | 60        | 62         | 76          | 77           | 62        | 66
Elephant       | 74                  | 42        | 46         | 62          | 65           | 54        | 60
Average        | 83                  | 62        | 64         | 77          | 79           | 66        | 69

Results show that the proposed method produces optimal results in comparison with ICLTP; experiments were performed on 3x3 and 5x5 windows, with 8- and 16-pixel neighbor sets respectively. Figure 5 also shows that our proposed method outperforms another CBIR system, which introduced local derivative patterns (LDP) [23].

Figure 5: Comparison of the retrieval precision (average precision at 10 images retrieved) of the proposed technique with 6 different methods (LBP_8, LBP_16, ICLTP_8, ICLTP_16, LDP_8, LDP_16) for 5 distinct image categories (Dinosaur, Mountain, Horse, People, Elephant).

The technique proposed in this study is assessed by finding the retrieval precision for 10 randomly selected image categories. The proposed method is also compared with the global method (GM) proposed in [21], in which advanced color and texture features are utilized for the creation of the feature vector. The comparison of the proposed method with the global method, in terms of average retrieval precision at different numbers of images retrieved, is shown through the graphs in Figure 6.

Figure 6: Comparison of the proposed method with the global method [21] in terms of average retrieval precision at different numbers of images retrieved, for the categories (a) Dinosaur, (b) Vehicle, (c) Horse, (d) Flower, (e) Beach, (f) Building, (g) Mountain, (h) People, (i) Elephant and (j) Food.

5.0. CONCLUSION

A global level CBIR technique is proposed utilizing all three basic image features, color, texture and shape, along with FDs and EHDs. An essential texture feature has been added to extract information related to the structural arrangement of pixels. The texture property of an image is of two main types, structural and statistical: structural information is provided by the local binary patterns, and statistical information is calculated by the co-occurrence matrix. A generalized rotation and gray scale invariant operator has been developed that identifies uniform local binary patterns in circular zones. The operator is used to calculate uniform local binary patterns that are invariant to grayscale transformation and rotation. It can also be combined with a rotation invariant operator that illustrates the disparity (contrast) of the texture of the image. The two operators used in combination in the experiments produced optimal results for image retrieval. The feature vector created is now better, as all the important features are added into it.

REFERENCES

[1] Ryszard S. Choras, "Image feature extraction techniques and their applications for CBIR and biometrics systems," International Journal of Biology and Biomedical Engineering, Vol. 1, Issue 1, 2007.

[2] M. Aly et al., "Automatic discovery of image families: global vs. local features," International Conference on Image Processing, 2009.

[3] I. FelciRajam and S. Valli, "A survey on content based image retrieval," Life Science Journal, Vol. 10, Issue 2, pp. 2475-2487, 2013.

[4] G. Ohashi and Y. Shimodaira, "Edge-based feature extraction method and its application to image retrieval," International Journal of Systemics, Cybernetics and Informatics, Vol. 1, No. 5, 2003.

[5] T. Ojala et al., "A comparative study of texture measures with classification based on feature distribution," Pattern Recognition, Vol. 29, pp. 51-59, 1996.

[6] T. Ahonen, "Face recognition with local binary patterns," Lecture Notes in Computer Science, Vol. 3021, pp. 469-481, 2004.

[7] T. Maenpaa et al., "Real-time surface inspection by texture," Real-Time Imaging, Vol. 9, pp. 289-296, 2003.

[8] M. Pietikainen et al., "View-based recognition of real-world textures," Pattern Recognition, Vol. 37, pp. 313-323, 2004.

[9] Faloutsos et al., "Efficient and effective querying by image content," Journal of Intelligent Information Systems, Vol. 3, pp. 231-262, July 1994.

[10] Pentland et al., "Photobook: Content-based manipulation of image databases," International Journal of Computer Vision, Vol. 18, Issue 3, pp. 233-254, 1996.

[11] Carson et al., "Blobworld: Image segmentation using expectation maximization and its application to image querying," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, Issue 8, pp. 1026-1038, Aug. 2002.

[12] M. Squire et al., "Content-based query of image databases, inspirations from text retrieval: inverted files, frequency-based weights and relevance feedback," Scandinavian Conference on Image Analysis, pp. 143-149, Greenland, June 1999.

[13] Thomas Deselaers, "Image retrieval, object recognition, and discriminative models," doctoral dissertation, RWTH Aachen University, Germany, December 2008.

[14] Q. Iqbal and J. K. Aggarwal, "Retrieval by classification of images containing large manmade objects using perceptual grouping," Pattern Recognition Journal, Vol. 35, Issue 7, pp. 1463-1479, 2002.

[15] C. Dagli and T. S. Huang, "A framework for grid-based image retrieval," IEEE Proceedings of the International Conference on Pattern Recognition, 2004.

[16] C. Theoharatos, N. A. Laskaris, G. Economou, and S. Fotopoulos, "A generic scheme for color image retrieval based on the multivariate Wald-Wolfowitz test," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, Issue 6, pp. 808-819, 2005.

[17] X. He, "Incremental semi-supervised subspace learning for image retrieval," ACM Proceedings of Multimedia, 2004.

[18] N. Vasconcelos and A. Lippman, "A multiresolution manifold distance for invariant image similarity," IEEE Transactions on Multimedia, Vol. 7, Issue 1, pp. 127-142, 2005.

[19] X. He et al., "Learning an image manifold for retrieval," ACM Proceedings of Multimedia, 2004.

[20] R. Datta et al., "Content-based image retrieval approaches and trends of the new age," Multimedia Information Retrieval, Nov. 2005.

[21] S. Soman et al., "Content based image retrieval using advanced color and texture features," International Conference on Computational Intelligence, 2012.

[22] V. Takala, "Block based methods for image retrieval using local binary patterns," Scandinavian Conference on Image Analysis, pp. 882-889, 2005.

[23] P. V. N. Reddy and K. S. Prasad, "Content based image retrieval using local derivative patterns," Journal of Theoretical and Applied Information Technology, Vol. 28, Issue 2, June 2011.

[24] T. Prathiba and G. Soniah Darathi, "An efficient content based image retrieval using local tetra pattern," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 2, Issue 10, October 2013.

[25] T. Ojala, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, Issue 7, July 2002.

[26] P. V. N. Reddy et al., "Color image retrieval using mixed binary patterns," International Journal of Engineering Sciences Research, Vol. 4, Issue 1, 2013.