T. Jasperline*
Dr. G.U. Pope College of Engineering,
Pope Nagar, Sawyerpuram, Thoothukudi 628251, India
Email: t.jasperlin@gmail.com
*Corresponding author
D. Gnanadurai
St. Joseph University,
Virgin Town, Ikishe Model Village,
Dimapur, Nagaland-797115, India
Email: j_dgd@yahoo.com
1 Introduction
2 Background
This section reviews the literature related to dictionary learning. Signals typically carry a large amount of information, which makes it difficult to extract only the information that is needed. Dictionary learning (DL) methods address this issue by learning an underlying dictionary that represents all the possible signals of a class by means of sparse representation.
A sparse representation expresses a signal as a linear combination of atoms drawn from a dictionary, where the dimension of the signal is smaller than the number of atoms. Sparse representation is one of the most popular ways to represent a signal (Bruckstein et al., 2007; Aharon et al., 2006; Elad and Aharon, 2006; Mairal et al., 2008; Bryt and Elad, 2008). This potential of DL makes it suitable for several real-time applications (Zhang and Li, 2010; Chen et al., 2013). The basic concepts of dictionary learning are presented below.
Consider a set of vectors $V = \{v_1, v_2, v_3, \ldots, v_n\}$, with $v_i$ denoting the $i$th vector. When these vectors are passed as input to a dictionary learning algorithm such as K-SVD, the base dictionary is constructed by solving the following equation:
\[(\hat{D}, \hat{\phi}) = \arg\min_{\hat{D}, \hat{\phi}} \left\| V - \hat{D}\hat{\phi} \right\|_F^2 \quad \text{subject to} \quad \left\| \hat{\gamma}_i \right\|_0 \le T_0 \;\; \forall i \tag{1}\]
where $\hat{\gamma}_i$ is the $i$th column of $\hat{\phi}$ and $T_0$ is the sparsity parameter. The columns of $\hat{\phi}$ are the sparse solutions of the vectors in $V$, $\| V - \hat{D}\hat{\phi} \|_F^2$ is the squared Frobenius norm and $\| \hat{\gamma}_i \|_0$ counts the non-zero-valued components of $\hat{\gamma}_i$. Numerous algorithms exist for sparse coding (Engan et al., 1999; Mallat and Zhang, 1993). The main objective of sparse coding is to find the sparse representation vector for every $v_i$; from these representations, the dictionary is formed and updated column by column.
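The sparse coding step described above can be sketched with a minimal matching-pursuit routine in the spirit of Mallat and Zhang (1993); the tiny dictionary, the query vector and the sparsity level $T_0$ below are illustrative assumptions, not values from the paper:

```python
import math

def matching_pursuit(v, atoms, T0):
    """Greedy sparse coding: approximate v with at most T0 unit-norm
    dictionary atoms, as required by the constraint in equation (1)."""
    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    gamma = [0.0] * len(atoms)      # sparse coefficient vector
    residual = list(v)
    for _ in range(T0):
        # choose the atom most correlated with the current residual
        j = max(range(len(atoms)), key=lambda k: abs(dot(residual, atoms[k])))
        c = dot(residual, atoms[j])
        gamma[j] += c
        residual = [r - c * d for r, d in zip(residual, atoms[j])]
    return gamma, residual

# toy overcomplete dictionary: three unit-norm atoms in R^2
atoms = [(1.0, 0.0), (0.0, 1.0), (math.sqrt(0.5), math.sqrt(0.5))]
gamma, residual = matching_pursuit((1.0, 5.0), atoms, T0=2)
# gamma has at most two non-zero entries, one per selected atom
```

In a full K-SVD loop, such per-vector codes form the columns of $\hat{\phi}$ while the dictionary columns are re-estimated one at a time.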
Motivated by dictionary learning techniques and the scarcity of CBIR systems for fabric images, this paper presents a content-based fabric image retrieval (CBFIR) system based on a dictionary learning approach, which omits a separate training process. The following sections elaborate the proposed CBFIR.
3 Proposed approach
This section presents the overall view of the proposed approach and elaborates all the
sub-phases involved in the system.
Figure 1 Overall flow of the proposed work (see online version for colours)
A dictionary is built for each cluster by the D-KSVD technique, which is proven to have better discrimination ability. The clusters revise the dictionaries through the D-KSVD algorithm, and the dictionaries in turn modify the clusters through the SOMP algorithm. This procedure continues until the clusters converge. In the testing phase, the test image is compared with the dictionaries, and the clustered images linked with the best-matching dictionary are compared by means of a similarity measure so that the relevant images are retrieved. The following subsections elaborate all the phases involved in the proposed approach.
preserves the edges. This operation is carried out over every 11 × 11 pixel window, in which the median value of the pixels is computed and the value of the centre pixel is replaced accordingly. This pre-processing smooths the fabric image. The two important features of fabric images are colour and texture, so the fabric images have to be pre-processed in such a way that the colour feature can be extracted successfully. This work converts the fabric image from the red green blue (RGB) colour model to the hue saturation value (HSV) colour model because the colour information in RGB is highly concentrated, whereas the HSV model distributes the colour in a consistent manner, which supports colour feature extraction. The process of feature extraction is presented in the forthcoming subsection.
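As a rough sketch of this pre-processing stage, the pure-Python fragment below applies a median filter over a clipped window and converts pixels to HSV with the standard library; the 3 × 3 test image and window size are toy assumptions (the paper uses an 11 × 11 window):

```python
import colorsys

def median_filter(img, size=11):
    """Median-filter a greyscale image (list of rows); border pixels
    use a window clipped to the image, so edges are preserved."""
    h, w = len(img), len(img[0])
    half = size // 2
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            window = sorted(img[j][i]
                            for j in range(max(0, y - half), min(h, y + half + 1))
                            for i in range(max(0, x - half), min(w, x + half + 1)))
            out[y][x] = window[len(window) // 2]
    return out

def rgb_to_hsv_pixel(r, g, b):
    """RGB in [0, 255] to HSV in [0, 1] via the standard library."""
    return colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

# a flat patch with one impulse-noise pixel; the filter removes it
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
clean = median_filter(noisy, size=3)
```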
\[\beta_{rn,y} = \left( \frac{1}{N_p} \sum_{x=1}^{N_p} \left( pr_{x,y} - \alpha_{rn,y} \right)^2 \right)^{1/2}, \quad y = H, S, V \tag{3}\]
\[sk_{rn,y} = \left( \frac{1}{N_p} \sum_{x=1}^{N_p} \left( pr_{x,y} - \beta_{rn,y} \right)^3 \right)^{1/3}, \quad y = H, S, V \tag{4}\]
In equations (2)–(4), $pr_{x,y}$ is the probability of the $y$th colour channel at the $x$th position of a fabric image, $N_p$ is the total count of pixels in a region and $rn$ is the count of non-overlapping regions of the image. The size of the colour feature vector extracted by the colour moments (CM) is given by
\[F_{CM}^{size} = rn \times n_{CM} \times n_{cc} \tag{5}\]
where $n_{CM}$ and $n_{cc}$ are the counts of colour moments and colour channels, respectively. The colour moments are calculated for all the colour channels in order to improve the accuracy rate. The overall algorithm for the proposed work is given below:
CBFIR algorithm
Input: Fabric image database
Output: Set of images relevant to the query image
Begin
//Image pre-processing
1. Denoise the images by median filter;
2. Perform image enhancement;
//Feature extraction
3. Extract colour features by colour moments;
4. Extract texture features by GIST descriptor;
5. Form the feature vector;
//Clustering
6. Produce initial clusters from the feature vectors by FCM;
7. Construct a dictionary for each cluster by the D-KSVD algorithm;
8. Update the cluster associated with each dictionary by the SOMP algorithm;
9. Repeat steps 7 and 8 until the clusters converge;
//Image retrieval
10. Get the query image;
11. Compare the query image with the image clusters by searching the associated dictionaries;
12. Return the results;
End
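The colour-moment step of the algorithm above can be sketched per channel as below; the sketch follows the common definition in which the skewness term subtracts the channel mean, and a signed cube root avoids complex results for negative values. The toy hue values are an assumption for illustration:

```python
import math

def colour_moments(channel):
    """First three colour moments of one region of one channel:
    mean, standard deviation and cube-root skewness."""
    n = len(channel)
    mean = sum(channel) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in channel) / n)
    third = sum((p - mean) ** 3 for p in channel) / n
    # signed cube root: negative third moments stay real-valued
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, std, skew

def feature_size(rn, n_cm=3, n_cc=3):
    """Equation (5): regions x moments x colour channels."""
    return rn * n_cm * n_cc

hue = [0.2, 0.4, 0.4, 0.6]          # toy hue values of one region
mean, std, skew = colour_moments(hue)
```

Over, say, 16 regions with three moments and three HSV channels, the colour feature vector holds feature_size(16) = 144 values.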
The texture feature is extracted by the GIST feature descriptor, which is known for its multi-directional and multi-scale analysis. The functionality of the GIST descriptor is based on the Gabor filter; thus, the texture features extracted by GIST are compact and robust. Consider a greyscale image $g(x, y)$ of size $m \times n$ that is divided into several blocks of size $b_k \times b_k$, such that $b_{BL} = b_k \times b_k$. A single grid is denoted as $K_i$, $i \in (1, 2, \ldots, b_{BL})$, of size $m' \times n'$, where $m' = m / b_k$ and $n' = n / b_k$. A Gabor filter with $b_n = s_c \times b$ channels convolves the image, where $s_c$ and $b$ are the numbers of scales and directions. The GIST feature is given by
\[GD_i^K(x, y) = \operatorname{cat}_{b_n} \left( g(x, y) * f_{sc}^{b}(x, y) \right), \quad (x, y) \in K_i \tag{7}\]
where the dimension of $GD^K$ is $b_n \times m' \times n'$ and $GD^K \subseteq GD^I$, the descriptor of the whole image, whose dimension is $b_n \times b_G$. The fabric images are decomposed into blocks of size 4 × 4, and the GIST features are extracted over four scales and eight orientations, which yields 4 × 8 = 32 feature channels.
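A stripped-down, single-scale sketch of this grid-wise Gabor filtering is given below; the kernel size, wavelength, sigma and four orientations are illustrative assumptions (the paper uses four scales and eight orientations):

```python
import math

def gabor_kernel(size, theta, wavelength, sigma):
    """Real Gabor kernel: cosine carrier at orientation theta
    under an isotropic Gaussian envelope."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            env = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(env * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel

def convolve_same(img, kernel):
    """'Same'-size 2-D convolution with zero padding."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - oy, x + i - ox
                    if 0 <= yy < h and 0 <= xx < w:
                        s += img[yy][xx] * kernel[j][i]
            out[y][x] = s
    return out

def gist_like(img, grid=4,
              thetas=(0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)):
    """Average absolute Gabor response per grid cell and orientation:
    len(thetas) * grid * grid numbers in total."""
    h, w = len(img), len(img[0])
    ch, cw = h // grid, w // grid
    feats = []
    for theta in thetas:
        resp = convolve_same(img, gabor_kernel(5, theta, 4.0, 2.0))
        for gy in range(grid):
            for gx in range(grid):
                cell = [abs(resp[y][x])
                        for y in range(gy * ch, (gy + 1) * ch)
                        for x in range(gx * cw, (gx + 1) * cw)]
                feats.append(sum(cell) / len(cell))
    return feats

img = [[(x % 2) * 1.0 for x in range(16)] for _ in range(16)]  # stripes
feats = gist_like(img, grid=4)   # 4 orientations x 16 cells = 64 values
```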
\[\{CL_1, CL_2, \ldots, CL_{CN}\} = \arg\min_{\{CL_1', CL_2', \ldots, CL_{CN}'\}} \sum_{i=1}^{n_p} \sum_{j=1}^{CN} \mu_{ij}^{f} \left\| x_i - y_j \right\|^2 \tag{9}\]
In the above equation, $n_p$ is the total count of pixels and $CN$ is the total number of clusters. $\mu_{ij}$ and $f$ are the fuzzy membership value and the fuzzy factor, respectively, $x_i$ is the $i$th pixel of the fabric image and $y_j$ is the centroid of the $j$th cluster. $CL_i$ stores the members of the $i$th cluster in matrix format.
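A minimal pure-Python FCM sketch of equation (9) is shown below, alternating membership and centroid updates; the deterministic initialisation and the toy 2-D points are assumptions for illustration:

```python
def fcm(points, cn, f=2.0, iters=30):
    """Fuzzy c-means: minimise sum_i sum_j mu_ij^f * ||x_i - y_j||^2
    by alternating membership and centroid updates."""
    # deterministic initialisation: spread centroids across the data
    centroids = [points[i * (len(points) - 1) // (cn - 1)] for i in range(cn)]
    d = len(points[0])
    for _ in range(iters):
        u = []
        for x in points:
            dists = [max(1e-12, sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5)
                     for c in centroids]
            # mu_ij = 1 / sum_k (d_ij / d_ik)^(2 / (f - 1))
            u.append([1.0 / sum((dij / dik) ** (2.0 / (f - 1.0)) for dik in dists)
                      for dij in dists])
        # centroid update: weighted mean with weights mu_ij^f
        centroids = []
        for j in range(cn):
            wsum = sum(u[i][j] ** f for i in range(len(points)))
            centroids.append(tuple(
                sum(u[i][j] ** f * points[i][k] for i in range(len(points))) / wsum
                for k in range(d)))
    return centroids, u

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
centroids, u = fcm(pts, cn=2)   # two well-separated groups
```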
As soon as the initial clusters are generated, the dictionaries are built; that is, once the initial clusters $\{CL_1, CL_2, \ldots, CL_{CN}\}$ are formed, the dictionaries $\{D_1, D_2, \ldots, D_{CN}\}$ are constructed by the D-KSVD algorithm, an enhancement of the K-SVD algorithm. The dictionaries are built by
\[(D_i, W_i, \alpha_i) = \arg\min_{D, W, \alpha} \left\| \begin{pmatrix} Y \\ \gamma H \end{pmatrix} - \begin{pmatrix} D \\ \gamma W \end{pmatrix} \alpha \right\|_2^2 \quad \text{subject to} \quad \left\| \alpha \right\|_0 \le T \tag{10}\]
where $D$ is the dictionary, $Y$ is the matrix of input signals, $W$ is the linear classification parameter, $H$ is the label matrix, $\alpha$ is the sparsity coefficient and $\gamma$ controls the weight of the associated terms. The dictionaries are then updated by the SOMP algorithm through the following equation:
\[(d_k, \alpha_k) = \arg\min_{d_k, \alpha_k} \left\| u_k - e_{mi} \alpha_k \right\|_2 \tag{11}\]
where $\| \cdot \|_F$ denotes the Frobenius norm. Whenever a query image $q_{im}$ is passed, the CBFIR system finds the cluster of images similar to the query image by searching the associated dictionaries, as follows:
\[\rho = \arg\min_{\omega} \left\| q_{im} - D\omega \right\|_2^2 \quad \text{subject to} \quad \left\| \omega \right\|_0 \le T_0 \tag{13}\]
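Under the simplifying assumption that each dictionary has orthonormal atoms, the cluster search of equation (13) reduces to keeping the $T_0$ largest projections of the query feature; the toy dictionaries below are illustrative, not learned ones:

```python
def residual_to_dictionary(q, dictionary, T0):
    """Squared residual of the best T0-sparse approximation of q,
    assuming orthonormal atoms: subtract the energy of the T0
    largest projections from ||q||^2."""
    projections = sorted((sum(a * b for a, b in zip(q, atom)) ** 2
                          for atom in dictionary), reverse=True)
    return sum(x * x for x in q) - sum(projections[:T0])

def nearest_cluster(q, dictionaries, T0=2):
    """Pick the cluster whose dictionary yields the smallest
    sparse-approximation residual, as in equation (13)."""
    residuals = [residual_to_dictionary(q, d, T0) for d in dictionaries]
    return min(range(len(dictionaries)), key=residuals.__getitem__), residuals

# two toy dictionaries over R^3 with orthonormal atoms
D1 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # spans the x-y plane
D2 = [(0.0, 0.0, 1.0)]                     # spans the z axis
best, res = nearest_cluster((2.0, 1.0, 0.1), [D1, D2])
```

The query lies almost entirely in the x-y plane, so the first dictionary leaves the smaller residual and its cluster is selected.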
When the cluster similar to the query image is detected, the relevance of the cluster images with respect to the query image is computed by a similarity metric. The similarity measures employed by this work are the Euclidean distance, the Jaccard distance and the Mahalanobis distance. Thus, the proposed work is unsupervised and hence suitable for all kinds of images.
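Two of these similarity measures can be sketched directly; the Jaccard variant below assumes binarised feature vectors, and the Mahalanobis distance would additionally require the inverse covariance of the cluster features:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_distance(a, b):
    """Jaccard distance on binarised feature vectors:
    1 - |intersection| / |union|."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - inter / union if union else 0.0

d_e = euclidean((1.0, 2.0), (4.0, 6.0))
d_j = jaccard_distance((1, 0, 1, 1), (1, 1, 0, 1))
```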
4 Experimental analysis
This section presents the experimental results of the proposed approach. The proposed work is tested on a fabric dataset that contains nearly 500 images (http://www.textures.com/browse/lace-trims/23436). The experimental analysis is carried out in a MATLAB environment on a computer with 4 GB RAM. The image categories include camouflage, carpet, lace trims, leather, patterned, Persian carpets, plain fabric, wicker, wool and wrinkles. Sample fabric images are presented in Figure 2.
The proposed CBFIR is evaluated by varying the clustering algorithm, the feature extraction technique (colour, texture and colour + texture) and the similarity measure, in terms of precision and recall. Additionally, the accuracy of the proposed work is measured by varying the dictionary size. Sample image retrieval results of the proposed work are given in Figure 3.
The images enclosed in the black box are the query (test) images and those enclosed in the grey box are the results returned by the CBFIR system. The time consumption of the proposed work is analysed and compared with the colour histogram and image segmentation (CHIS) technique (Shekhovtsov et al., 2008; Yousefi et al., 2012; Karadağ and Vural, 2014; Cho and Bui, 2014; Sztandera et al., 2013; Pojala and Somnath, 2014; Walia et al., 2014). CHIS extracts the colour and texture features of fabric images by a colour histogram (CH) and a Markov random field model, respectively, and computes the similarity between the query image and the database by the Euclidean distance. This technique shows a good precision range but lower recall rates (Wang et al., 2014; Solís et al., 2014; Guo et al., 2014; Zhang et al., 2014).
Content-based fabric image retrieval system
Precision: precision is the ratio of the number of relevant images retrieved to the total number of images retrieved:
\[prec = \frac{\text{total relevant images retrieved}}{\text{total images retrieved}} \tag{14}\]
Recall: recall is the standard measure of a CBIR system, defined as the ratio of the number of relevant images retrieved to the total number of relevant images in the image dataset (Jasperlin and Gnanadurai, 2016):
\[rec = \frac{\text{total relevant images retrieved}}{\text{total relevant images}} \tag{15}\]
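Equations (14) and (15) translate directly into code; the image identifiers below are arbitrary placeholders:

```python
def precision_recall(retrieved, relevant):
    """Precision over the retrieved set, recall over the relevant
    set, as in equations (14)-(15)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 3, 5, 6, 7])
# 2 of 4 retrieved images are relevant; 2 of 5 relevant ones are found
```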
The experimental results of the proposed work are presented through Figures 4–7.
Figure 4 K-means clustering with different similarity measures (see online version for colours)
Initially, k-means clustering is employed to form the initial clusters and the similarity measures are varied for analysis. The precision and recall rates are not convincing, reaching only 68.7 and 49.6, respectively. As the k-means algorithm does not perform well, FCM is employed instead for forming the initial clusters. The precision and recall rates of FCM are presented below.
Figure 5 FCM clustering with different similarity measures (see online version for colours)
The FCM clustering algorithm is then employed to form the initial set of clusters and produces better results than the k-means algorithm. It is observed that FCM with the Euclidean distance performs better than with the other distance measures; its maximum precision and recall rates are 93.2 and 82.7, respectively.
Figure 6 Analysis by varying the feature extraction techniques (see online version for colours)
Figure 6 presents the precision and recall rates of the various feature extraction techniques. This graph shows the importance of combining the colour and texture features, which are extracted by colour moments and the GIST descriptor, respectively. Since fabric images are well described only by colour and texture together, neither feature serves well independently. The highest precision and recall rates are achieved when the colour and texture features are combined. The next result presents the accuracy rate with respect to the size of the dictionary.
Figure 7 Accuracy rate w.r.t the dictionary size (see online version for colours)
In order to find the optimal dictionary size for this work, the column size of the dictionary is varied between 60 and 85 and the accuracy rate is measured. The optimal dictionary size is found to be 65, as the maximum accuracy rate is attained there. Finally, the time consumption of the proposed work is analysed and compared with the existing CHIS technique. The experimental results are presented in Table 1.
Table 1 Time consumption analysis w.r.t image type
It is observed that the time consumption of the proposed approach is greater than that of the existing CHIS technique, since this work combines the colour and texture features. However, the performance of the proposed approach is considerably better. The average precision and recall rates of the CHIS scheme are 80.66 and 39.33, respectively, whereas the proposed work achieves 89.3 and 60.3. The time consumption of the CHIS scheme is 0.973 seconds, against 9.022 seconds for the proposed scheme. Although there is a large difference in time consumption, a corresponding difference is observed in performance, and thus the objective of the CBFIR system to reach maximum accuracy, precision and recall rates is met.
5 Conclusions
This paper presents a CBFIR system based on a dictionary learning approach. Recognising that fabric images are rich in colour and texture properties, this work extracts both colour and texture features, by colour moments and the GIST descriptor, respectively. The initial clusters are formed by the FCM algorithm and the dictionary for each cluster is generated by the D-KSVD algorithm. The dictionary update and sparse representation are done by the SOMP algorithm. The performance of the proposed approach is analysed by varying the clustering algorithm, the distance measure, the feature extraction technique and the dictionary size, with respect to precision and recall. The efficacy of the proposed approach is proved by the achieved experimental results. In future work, the time consumption of the proposed approach will be reduced.
References
Aharon, M., Elad, M. and Bruckstein, A. (2006) ‘The K-SVD: an algorithm for designing
overcomplete dictionaries for sparse representation’, IEEE Transactions on Signal Processing,
Vol. 54, No. 11, pp.4311–4322.
Bruckstein, A.M., Donoho, D.L. and Elad, M. (2007) ‘From sparse solutions of systems of
equations to sparse modeling of signals and images’, SIAM Review, Vol. 51, No. 1, pp.34–81.
Bryt, O. and Elad, M. (2008) ‘Compression of facial images using the K-SVD algorithm’, Journal
of Visual Communication and Image Representation, Vol. 19, No. 4, pp.270–282.
Chen, Y.C., Sastry, C.S., Patel, V.M., Phillips, P.J. and Chellappa, R. (2013) ‘In-plane rotation and
scale invariant clustering using dictionaries’, IEEE Trans. Image Process., June, Vol. 22,
No. 6, pp.2166–2180.
Cho, D. and Bui, T.D. (2014) ‘Fast image enhancement in compressed wavelet domain’, Signal
Process., Vol. 62, pp.86–93, DOI: 10.1016/j.sigpro.2013.11.007.
Elad, M. and Aharon, M. (2006) ‘Image denoising via sparse and redundant representations over
learned dictionaries’, IEEE Transactions on Image Processing, Vol. 15, No. 12,
pp.3736–3745.
Engan, K., Aase, S. and Hakon-Husoy, J. (1999) ‘Method of optimal directions for frame design’,
Proceedings of IEEE International Conference on Acoustics, Speech, Signal Processing,
Vol. 5, pp.2443–2446.
Guo, L.Q., Dai, M. and Zhu, M. (2014) ‘Quaternion moment and its invariants for color object
classification’, Inform. Sci., Vol. 273, pp.132–143, DOI: 10.1016/j.ins.2014.03.037.
http://economictimes.indiatimes.com/industry/services/retail/indias-e-commerce-market-expected-
to-cross-rs-2-lakh-crore-in-2016-iamai/articleshow/52638082.cms (accessed 7 June 2016).
http://www.textures.com/browse/lace-trims/23436 (accessed 7 January 2015).
Jasperlin, T. and Gnanadurai, D. (2016) ‘Histopathological image analysis by curvelet based
content based image retrieval system’, Journal of Medical Imaging and Health Informatics,
Vol. 6, No. 8, pp.1–6.
Karadağ, Ö.Ö. and Vural, F.T.Y. (2014) ‘Image segmentation by fusion of low level and domain
specific information via Markov random fields’, Pattern Recogn. Lett., Vol. 46, pp.75–82,
DOI: 10.1016/j.patrec.2014.05.010.
Mairal, J., Elad, M. and Sapiro, G. (2008) ‘Sparse representation for color image restoration’,
IEEE Transactions on Image Processing, Vol. 17, No. 1, pp.53–68.
Mallat, S. and Zhang, Z. (1993) ‘Matching pursuits with time frequency dictionaries’, IEEE Trans.
Signal Process., Vol. 41, No. 12, pp.3397–3415.
Pojala, C. and Somnath, S. (2014) ‘Detection of moving objects using multi-channel kernel
fuzzy correlogram based background subtraction’, IEEE Trans. Cybern., Vol. 44, No. 6,
pp.870–881.
Shekhovtsov, A., Kovtun, I. and Hlavac, V. (2008) ‘Efficient MRF deformation model for
non-rigid image matching’, Comput. Vis. Image. Understand., Vol. 112, No. 1, pp.91–99.
Solís, F., Hernández, M., Pérez, A. and Toxqui, C. (2014) ‘Static digits recognition using rotational
signatures and Hu moments with a multilayer perceptron’, Engineering, Vol. 6, No. 11,
pp.692–698.
Sztandera, L.M., Cardello, A.V., Winterhalter, C. et al. (2013) ‘Identification of the most
significant comfort factors for textiles from processing mechanical, handfeel, fabric
construction, and perceived tactile comfort data’, Text. Res. J., Vol. 83, No. 1, pp.34–43.
Walia, E., Vesal, S. and Pal, A. (2014) ‘An effective and fast hybrid framework for color image
retrieval’, Sens. Imag., Vol. 15, No. 1, pp.1–23.
Wang, X.Y., Niu, P.P., Yang, H.Y. et al. (2014) ‘A new robust color image watermarking using
local quaternion exponent moments’, Inform. Sci., Vol. 277, pp.731–754, DOI: 10.1016/
j.ins.2014.02.158.
Yousefi, S., Azmi, R. and Zahedi, M. (2012) ‘Brain tissue segmentation in MR images based on a
hybrid of MRF and social algorithms’, Med. Image Anal., Vol. 16, No. 4, pp.840–848.
Zhang, Q. and Li, B. (2010) ‘Discriminative K-SVD for dictionary learning in face recognition’,
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition,
pp.2691–2698.
Zhang, W.G., Liu, C.X., Wang, Z.J. et al. (2014) ‘Web video thumbnail recommendation with
content-aware analysis and query-sensitive matching’, Multimed. Tool Appl., Vol. 73,
pp.547–571.