
IEEE - INTERNATIONAL CONFERENCE ON ADVANCES IN ENGINEERING & TECHNOLOGY (ICAET-2014) | ISBN No.: 978-1-4799-4949-6

Firefly Algorithm Approach for the Optimization of Feature Selection to Perform Classification
P. Shunmugapriya1, S. Kanmani2, P. Sindhuja3, V. Yasaswini3, G. Koperundevi3

1 Associate Professor, Department of Information Technology, Dr. SJS Pauls College of Engg. & Tech.
2 Professor, Department of Information Technology, Pondicherry Engineering College
3 Final Year, Department of Information Technology, Pondicherry Engineering College
Puducherry, India.
1 pshunmugapriya@gmail.com

Abstract—Classification is an important task of Data Mining and its performance is highly affected by the redundant, irrelevant and noisy features present in the dataset related to any domain. So, Feature Selection (FS) and optimization of FS are very important in order to filter out these features so as to obtain the optimal feature subset. The Firefly Algorithm (FA) is one of the latest metaheuristic search algorithms and has been found to be applicable to a number of optimization problems. In this paper, FS optimization has been attempted using FA, and hence the FA-FS algorithm has been proposed. The proposed algorithm has been evaluated on 10 datasets from the UCI (University of California, Irvine) repository and the results prove the effectiveness of the proposed FA-FS in optimizing FS.

Keywords—Firefly Algorithm; Feature Selection; Classification; Optimization

I. INTRODUCTION AND BACKGROUND

Pattern Classification, or simply 'Classification', is an important step in Data Mining and Machine Learning. Classification has achieved remarkable success in solving complex real-world problems like credit card approval, medical diagnosis, fraud detection, web page categorization, etc. that are not easily solvable by humans or would take months to provide solutions. In the real world, objects are characterized by a set of measurements called 'attributes' or 'features'. Classification is the action of assigning an object to a category according to the characteristics of the object [1, 2].

Feature Selection (FS) is an important pre-processing step for the task of classification and it is highly effective in improving the performance of classification [3]. FS is seen as an optimization problem, because obtaining the optimal subset of features is very important. This optimization problem has been widely addressed by evolutionary and Swarm Intelligence (SI) algorithms, and in this regard there is extensive ongoing research [4-9].

Motivated by the usage of SI algorithms for FS optimization, we have previously experimented with the Ant Colony Optimization (ACO) and Artificial Bee Colony (ABC) algorithms for optimizing the selection of features [10-14]. These two attempts have yielded optimal feature subsets showing improvements compared to the existing literature. Apart from our proposals, there exists in the literature a huge volume of papers on FS optimization using ACO, ABC, Genetic Algorithms and Particle Swarm Optimization [4-9]. The scenario is that, out of these proposals, one shows better results than another and none has claimed to be a consistent performer across the entire application domain. In this regard, we make our next attempt at optimizing FS using the Firefly Algorithm (FA-FS). The FA-FS algorithm has been proposed, experimented with on UCI datasets and evaluated in comparison with ABC based FS optimization (ABC-FS) [12].

The Firefly Algorithm (FA) is a relatively new stochastic search algorithm proposed by Xin-She Yang in 2007 [15] and since then it has been used widely for different optimization problems [16-18]. Inspired by the results obtained from FA for other optimization problems, it has been utilized in this paper to optimize the selection of feature subsets.


This paper is organized as follows: Section II gives a brief description of feature selection and of classification as it relates to feature selection. The concept of the Firefly Algorithm is explained in Section III. Details of the proposed method are given in Section IV. Computations and results are discussed in Section V, and Section VI concludes the paper.

II. FEATURE SELECTION AND CLASSIFICATION

A. Feature Selection

Feature Selection (FS) is a commonly used pre-processing step in data mining, especially when dealing with a high-dimensional space of features. The main objective is to choose a subset of features from the original set of features as a representation of the entire domain [1]. The use of FS is extensive and it spreads throughout many fields, including text categorization, machine learning, pattern recognition and signal processing. Considering the entire feature set may slow down the learning process and may reduce the performance of the classifier because of redundant and irrelevant features. Thus it is essential to reduce the number of features by selecting the most relevant features to represent a dataset. FS allows the reduction of the feature space, which is crucial in reducing the training time and improving the prediction accuracy. This is achieved by removing irrelevant, redundant and noisy features [3].
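To make this wrapper view of FS concrete, a candidate subset can be encoded as a binary string and scored by the accuracy of a classifier trained only on the selected columns. The sketch below is illustrative, not from the paper: the helper name subset_accuracy is ours, and scikit-learn's CART decision tree merely stands in for a generic classifier.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def subset_accuracy(X, y, mask):
    # Indices of the features switched on in the binary string
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0  # an empty subset cannot classify anything
    clf = DecisionTreeClassifier(random_state=0)
    # Mean cross-validated accuracy serves as the subset's score
    return cross_val_score(clf, X[:, selected], y, cv=5).mean()

For example, with the 8 features of the Pima dataset, the mask [1, 0, 1, 1, 0, 0, 0, 1] keeps only features 0, 2, 3 and 7.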
B. Classification

A classifier works on the entire feature set of an application to perform the task of classification. Features of the application domain affect the performance of the classifier in different ways. So, there are high chances for the irrelevant and noisy features to degrade the performance of the classifier, and hence they should be removed [1, 2]. Not only the irrelevant features, but also the redundancies in the features should be removed for effective classification [6].
III. FIREFLY ALGORITHM

It is a wonderful sight to view the flashing of lights from fireflies in the summer sky. There are a number of species of fireflies, and most species have a unique pattern in their flashing behavior. The fundamental purpose of this flashing behavior is either to attract other fireflies for mating or to attract prey [19, 20]. This principle has been exploited and variants of FA have been proposed.

The FA employed in the proposed work depends on the variation of light intensity and the formulation of attractiveness (adopted from [21]). It depends on three idealized rules and uses them in framing the algorithm. The rules are:

(i) All fireflies within a population are unisex, so that one firefly will be attracted to other fireflies irrespective of their sex;

(ii) Attractiveness between fireflies is proportional to their brightness, implying that for any two flashing fireflies, the less bright one will move towards the brighter one. Attractiveness and brightness both decrease as the distance between fireflies increases. If there is no brighter firefly within its visible vicinity, a particular firefly will move randomly;

(iii) The brightness of a firefly is determined by the landscape of the objective function.

IV. OPTIMIZATION OF FEATURE SELECTION USING FIREFLY ALGORITHM (FA-FS)

In the FA-FS algorithm, FA searches through the feature space and generates possible feature subset combinations; each time feature subsets are generated, they are evaluated by the Prediction Accuracy given by the classifier. The detailed description of the FA-FS algorithm is as follows:

The number of fireflies is set equal to the number of features in the dataset and each firefly is assigned a single feature from the dataset. A binary string (of length equal to the number of features) is assigned to each firefly to represent the selection of features. The support to classification extended by each feature (i.e. its Predictive Accuracy (PA)) is considered as the brightness of the firefly holding that particular feature. PA is the percentage of instances that have been correctly classified as instances of their original category [22]. The firefly with the highest PA is termed the "BRIGHTEST FIREFLY" (BF) of the iteration.

When the search procedure begins, each firefly selects the feature pointed to by the BF for combination. This follows from the principle of the FA: "fireflies get attracted towards the brightest firefly in their vicinity". So the feature subset combinations that result will definitely contain the feature with the highest accuracy. Thus the principle of the FA refines the search, concentrating only on the more promising features. So far, only the feature of the BF has been considered; before forming the feature subset combinations, the new value of x_i needs to be computed using equation (1) as follows:

x_i = x_i + β exp(-γr²)(x_j - x_i) + αε    (1)

where x_i is the solution pointed to by the current firefly (its Classification Accuracy) and x_j is the solution pointed to by the BF. β is the attractiveness measure, between 0 and 1. γ is the variation of attractiveness, whose value is chosen between 0.1 and 10. The distance r is set to 1. α is a randomization parameter, normally selected within the range [0,1], and ε is a vector of random numbers drawn from either a Gaussian or a uniform (generally [-0.5,0.5]) distribution [21].
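A minimal sketch of this update, under our reading of the reconstructed equation (1); the function and its default parameter values are illustrative assumptions, with the distance r fixed to 1 as stated in the text.

import math
import random

def firefly_update(x_i, x_j, beta=0.5, gamma=1.0, alpha=0.2, r=1.0):
    # epsilon: random number drawn from a uniform [-0.5, 0.5] distribution
    eps = random.uniform(-0.5, 0.5)
    # attractiveness decays with distance r as beta * exp(-gamma * r^2)
    attraction = beta * math.exp(-gamma * r ** 2)
    # Eq. (1): move the current solution x_i towards the brighter x_j
    return x_i + attraction * (x_j - x_i) + alpha * eps

For example, a firefly holding PA 0.72 pulled towards a BF holding PA 0.81 would move towards 0.81 by the attraction term, plus a small random perturbation.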


If the newly found value of x_i is greater than the previous value, the feature of the BF and that of the present firefly are combined, passed on to the classifier for evaluation, and the PA extended by the combined features is then set as the brightness of the firefly; otherwise nothing is done and the firefly just keeps holding the feature it was previously holding. Once all the fireflies have completed one iteration, the value of the BF is reset based on the highest value of PA computed. The above procedure is repeated a pre-determined number of times or until an optimal feature subset configuration with a reasonably good accuracy is reached.

As we have set the number of fireflies equal to the number of features, they are distributed throughout the available space and hence the FA converges very quickly [21]. The steps of the proposed FA-FS are summarized in Fig. 1.
1. Cycle = 1
2. Initialize FA parameters
3. Evaluate the fitness of each individual feature
4. Repeat
5. Construct solutions by the fireflies (X_i)
   • Select the BRIGHTEST FIREFLY
   • Assign feature subset configurations (binary bit string) to each firefly
   • Produce new feature subsets
   • Pass the produced feature subset to the classifier
   • Evaluate the fitness of the feature subset by computing the new value of X_i
   • Reset the value of X_i based on either consideration or rejection of the BRIGHTEST FIREFLY
6. Calculate the best feature subset of the cycle
7. Cycle = Cycle + 1
8. Until the pre-determined number of cycles is reached
9. Employ the same searching procedure of fireflies to generate the optimal feature subset configurations

Fig. 1 Steps of the FA-FS algorithm
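The steps of Fig. 1 might be realized along the following lines. This is only a sketch of our reading of the procedure, not the authors' code: it reuses the illustrative subset_accuracy scorer sketched in Section II, and the move_rate parameter is an assumption standing in for the α randomization term.

import numpy as np

def fa_fs(X, y, cycles=20, move_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]  # number of fireflies = number of features
    masks = np.eye(n, dtype=int)  # step 3: each firefly holds one feature
    brightness = np.array([subset_accuracy(X, y, m) for m in masks])
    best_mask, best_pa = masks[0].copy(), brightness[0]
    for _ in range(cycles):  # steps 4-8
        bf = int(np.argmax(brightness))  # the BRIGHTEST FIREFLY
        for i in range(n):
            if i == bf:
                continue
            # combine this firefly's subset with the BF's features
            trial = masks[i] | masks[bf]
            # random perturbation, playing the role of the alpha term
            trial = trial | (rng.random(n) < move_rate).astype(int)
            pa = subset_accuracy(X, y, trial)
            if pa > brightness[i]:  # keep the move only if PA improves
                masks[i], brightness[i] = trial, pa
        cycle_best = int(np.argmax(brightness))  # step 6
        if brightness[cycle_best] > best_pa:
            best_pa = brightness[cycle_best]
            best_mask = masks[cycle_best].copy()
    return best_mask, best_pa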
V. EXPERIMENTAL RESULTS AND DISCUSSION

The performance of the proposed FA-FS algorithm has been tested with 10 different UCI datasets [23], whose description is given in Table I. Since we wanted to have a comparison with our previous proposals on FS optimization, we have adopted the same 10 datasets that were used in all our previous proposals [11-14]. Classification is implemented using a Decision Tree (the J48 algorithm) from the WEKA (Waikato Environment for Knowledge Analysis) tool [24]. To start with, the parameters of the FA are initialized to values in the permitted range. Each feature is evaluated by using the J48 algorithm and the concerned fireflies are assigned their brightness.

TABLE I DATASETS DESCRIPTION

Dataset                 Instances   Features   Classes
Heart-C                 303         14         2
Dermatology             366         34         6
Hepatitis               155         19         2
Lung Cancer             32          56         2
Pima Indian Diabetes    768         8          2
Iris                    150         4          3
Wisconsin               699         9          2
Lymphography            148         18         4
Diabetes                768         9          2
Heart-Statlog           270         13         2
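As a usage illustration, the Pima Indian Diabetes entry of Table I could be run through this setup roughly as follows. This is a sketch only: the CSV path is hypothetical, fa_fs and subset_accuracy are the illustrative helpers sketched earlier, and the paper itself uses WEKA's J48 rather than scikit-learn.

import numpy as np
import pandas as pd

# Hypothetical local copy of the Pima data: 8 feature columns
# followed by the class label column, no header row.
df = pd.read_csv("pima-indians-diabetes.csv", header=None)
X = df.iloc[:, :-1].to_numpy()
y = df.iloc[:, -1].to_numpy()

best_mask, best_pa = fa_fs(X, y, cycles=20)
print("selected features  :", np.flatnonzero(best_mask))
print("predictive accuracy:", round(best_pa, 4))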
As the algorithm proceeds, the possible feature subsets that are generated are passed on to the J48 algorithm for evaluation. After every iteration, the optimal feature subsets are chosen and the fireflies are made to point to them. When the algorithm comes to a halt, the FA-FS algorithm will have yielded the optimal feature subset along with the maximum PA.

The results obtained from the FA-FS algorithm are presented in comparison with ABC-FS [12] in Fig. 2 and Fig. 3. The comparison of the size of the feature subset obtained for each of the datasets from both FA-FS and ABC-FS is given in Fig. 2. It can be seen from Fig. 2 that for some datasets ABC-FS holds good, and for the others FA-FS holds good in substantially reducing the size of the feature set. The PA obtained for all the datasets from both ABC-FS and FA-FS is compared in Fig. 3. Except for the Hepatitis and Lung Cancer datasets, FA-FS has performed better than ABC-FS, yielding higher PA. Thus, the proposed FA-FS algorithm shows promising behavior and has selected the features in a way that maximizes the classification accuracy.

Fig. 2 Chart representing the comparison of features selected by ABC-FS and FA-FS

In order to prove the efficiency of the proposed algorithm, the spread of features of one of the datasets is shown in Fig. 4. Fig. 4 provides a visualization of the Pima dataset, in which Feature A is along the X axis and Class is along the Y axis.


Fig. 3 Chart representing the comparison of PA obtained from ABC-FS and FA-FS

Fig. 4 Distribution of features for the Pima dataset

Blue indicates C1 and red indicates C0, corresponding to the positive and negative classes. The clustered appearance of the features is due to each instance indicating either C1 or C0. The proposed FA-FS algorithm has given a good increase in PA for the Pima dataset even with such a mixed distribution of features belonging to different class categories.

VI. CONCLUSION

Nature-inspired algorithms, being powerful optimization algorithms, have been applied in a large number of domains for problem solving. Having previously attempted ACO and ABC for FS optimization, and since the results were remarkably good, we attempted FA, a relatively new meta-heuristic algorithm. The proposed FA-FS algorithm is found to be promising and its performance is comparable to that of ABC-FS. The concept of the "BRIGHTEST FIREFLY" that we exploited gives the algorithm a good convergence rate and hence a robust and speedy performance.

REFERENCES

[1] R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2nd edition, 2001.
[2] L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2005.
[3] M. Dash and H. Liu, "Feature Selection for Classification", Intelligent Data Analysis, Vol. 39, No. 1, pp. 131-156, 1997.
[4] Ahmed Al-Ani, "Ant Colony Optimization for Feature Subset Selection", WEC (2), pp. 35-38, 2005.
[5] Nadia Abd-Alsabour and Marcus Randall, "Feature Selection for Classification Using an Ant Colony System", Proc. Sixth IEEE International Conference on e-Science Workshops, pp. 86-91, 2010.
[6] M.M. Kabir, M. Shahjahan and K. Murase, "A new hybrid ant colony optimization algorithm for feature selection", Journal of Applied Soft Computing, Vol. 8, pp. 687-697, 2008.
[7] C.J. Tu, L. Chuang, J. Chang and C. Yang, "Feature Selection using PSO-SVM", IAENG International Journal of Computer Science, Vol. 33, No. 1, pp. 18-23, 2007.
[8] N. Suguna and K.G. Thanushkodi, "An Independent Rough Set Approach Hybrid with Artificial Bee Colony Algorithm for Dimensionality Reduction", American Journal of Applied Sciences, Vol. 8, No. 3, pp. 261-266, 2011.
[9] N. Suguna and K.G. Thanushkodi, "A Novel Rough Set Reduct Algorithm for Medical Domain based on Bee Colony Optimization", Journal of Computing, Vol. 2, No. 6, pp. 49-54, 2010.
[10] P. Shunmugapriya and S. Kanmani, "Designing Classifier Ensemble through Multiple-Pheromone Ant Colony based Feature Selection", Journal of Computing, Vol. 4, No. 5, pp. 39-44, May 2012.
[11] P. Shunmugapriya and S. Kanmani, "Classifier Ensemble Design using Artificial Bee Colony based Feature Selection", International Journal of Computer Science Issues, Vol. 9, No. 3, pp. 522-529, May 2012.
[12] P. Shunmugapriya and S. Kanmani, "Artificial Bee Colony Approach for Feature Selection", International Journal of Computer Science Issues, Vol. 9, No. 3, pp. 432-438, May 2012.
[13] P. Shunmugapriya, S. Kanmani, S. Devipriya, J. Pushpa and J. Archana, "Investigation on the Effects of ACO Parameters for Feature Selection and Classification", Proc. Springer Third International Conference on Advances in Communications, Networks and Computing (CNC 2012), LNICST, Chennai, India, February 22-23, pp. 136-145, 2012.
[14] P. Shunmugapriya, S. Kanmani, R. Supraja, K. Saranya and Hemalatha, "Feature Selection Optimization through Enhanced Artificial Bee Colony Algorithm", IEEE Proc. Third International Conference on Recent Trends in Information Technology, pp. 56-61, 2013.
[15] Iztok Fister, Iztok Fister Jr., Xin-She Yang and Janez Brest, "A Comprehensive Review of Firefly Algorithms", Elsevier Swarm and Evolutionary Computation, Vol. 13, pp. 34-46, 2013.
[16] Theofanis Apostolopoulos and Aristidis Vlachos, "Application of the Firefly Algorithm for Solving the Economic Emissions Load Dispatch Problem", International Journal of Combinatorics, Vol. 2011, Article ID 523806, 23 pages, 2011, doi:10.1155/2011/523806.
[17] J. Kwiecień and B. Filipowicz, "Firefly algorithm in optimization of queueing systems", Bulletin of the Polish Academy of Sciences: Technical Sciences, Vol. 60, No. 2, pp. 363-368, Oct. 2012.
[18] Xin-She Yang, "Firefly Algorithms for Multimodal Optimization", Springer-Verlag Proc. 5th International Conference on Stochastic Algorithms: Foundations and Applications, pp. 169-178, 2009.
[19] Xin-She Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, 2nd edition, 2010.
[20] K.N. Krishnanand and D. Ghose, "Glowworm swarm based optimization algorithm for multi-modal functions with collective robotics applications", Multi-agent and Grid Systems, Vol. 2, No. 3, pp. 209-222, 2006.
[21] Raha Imanirad, Xin-She Yang and Julian Scott Yeomans, "Modelling to Generate Alternatives via the Firefly Algorithm", Journal of Applied Operational Research, 2013.
[22] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Academic Press, 2001.
[23] A. Frank and A. Asuncion, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, Irvine, CA: University of California, School of Information and Computer Science, 2010.
[24] WEKA: A Java Machine Learning Package, http://www.cs.waikato.ac.nz/~ml/weka/.
