Franco 2018

Journal of Cleaner Production 191 (2018) 445–457
Clustering of solar energy facilities using a hybrid fuzzy c-means algorithm initialized by metaheuristics
David Gabriel de Barros Franco*, Maria Teresinha Arns Steiner
Pontifical Catholic University of Paraná (PUCPR), Graduate Program in Industrial Engineering and Systems (PPGEPS), Curitiba, Paraná, Brazil
Article info

Article history:
Available online 25 April 2018

Keywords:
Solar energy
Soil reuse
Fuzzy c-means clustering
Differential evolution
Genetic algorithm
Particle swarm optimization

Abstract

Abandoned areas with substances potentially harmful to the environment or human health have raised numerous concerns around the world. The objective of the present study is to analyze the possibility of building solar power plants in unproductive areas, both contaminated and uncontaminated. For this purpose, U.S. data were used from the National Solar Radiation Database (NSRDB), a collection of hourly solar radiation measurements and meteorological data, and from the RE-Powering America's Land project, run by the United States Environmental Protection Agency (EPA). The analysis considered four variables: “mapped area”, “distances to the transmission lines”, “solar direct normal irradiance on a utility scale” and “off-grid direct normal irradiance”. To define the best locations, the data were first pre-processed. A new hybrid fuzzy c-means (HFCM) algorithm was then applied to cluster the data, initialized, comparatively, by three metaheuristics: Differential Evolution (DE), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The number of clusters was validated by three metrics: the Calinski-Harabasz Criterion, the Davies-Bouldin Criterion and the Silhouette Coefficient, with all three unanimously indicating two clusters as the ideal number: one cluster for locations with greater potential for allocating facilities to capture solar energy, and another for locations with lower potential. With the new approach, an increase of 23.3% in the training speed of the HFCM algorithm was identified, which required fewer iterations to achieve the same value of the objective function. In addition, a round of experiments was conducted with six different datasets (instances from the literature), and the results showed that the proposed method can achieve better results (faster convergence and smaller solution cost) than the classic FCM. Visually, the allocation of facilities was predominant in states with a greater average incidence of solar radiation. Therefore, this was the predominant factor in the convergence of the algorithm, which is in accordance with expectations for solar energy. Finally, the social, economic and environmental gains from revitalizing unproductive land by implementing solar power plants in these areas were considered.
© 2018 Elsevier Ltd. All rights reserved.
* Corresponding author.
E-mail address: david.barros@pucpr.br (D.G.B. Franco).
https://doi.org/10.1016/j.jclepro.2018.04.207
0959-6526/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Abandoned areas with substances potentially harmful to the environment or human health have raised numerous concerns around the world (Bergius and Öberg, 2007; Greenberg and Lewis, 2000; Hartmann et al., 2014; Li et al., 2017; van Straalen, 2002). These risk areas, in the American context, include abandoned mines with spoil banks and metal processing plants, generally contaminated by heavy metals (Kovacs and Szemmelveisz, 2017). They also include brownfields, defined as industrial or commercial facilities that are difficult to reuse due to dangerous, pollutant or contaminating substances (U.S. Government Publishing Office, 2002). Other locations are those in the Superfund, a federal government program for locating and cleaning areas contaminated with pollutant or dangerous substances (U.S. Government Publishing Office, 2015) and landfills, which in developed countries basically contain leftover food and packaging (Rong et al., 2017). Finally, there are areas covered by the Resource Conservation and Recovery Act (RCRA), regarding the disposal of solid waste (U.S. Government Publishing Office, 2011).
On the other hand, the growing occupation of urban and rural spaces has become a problem in the modern world (Lambin and
Meyfroidt, 2011; Morio et al., 2013), leading to demands for greater efficiency in territorial occupation, especially the reuse of abandoned areas, which poses one of the greatest challenges (Morio et al., 2013; Nuissl and Schroeter-Schlaack, 2009; U.S. Government Publishing Office, 2002). This situation is further aggravated if such areas are large and contaminated, thus causing, in addition to risks to health and the environment, economic risks (Apostolidis and Hutton, 2006; Cao and Guan, 2007; de Sousa, 2003; Kaufman et al., 2005; Morio et al., 2013).
The aim of the present work is to analyze the use of clustering analysis (CA) to determine the best location to install solar power plants on unproductive areas (whether contaminated or not). By using public information on the main unused areas in the continental United States, a new hybrid clustering methodology was developed. This was composed of the pre-processing of data and the application of a hybrid fuzzy c-means (HFCM) algorithm, initialized comparatively by three metaheuristics: Differential Evolution (DE), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), with a view to classifying these areas as appropriate or inappropriate for the implementation of these power plants. The main innovation of this new approach consists in speeding up the FCM algorithm while achieving equal or better results in clustering problems. Subsequently, the technique can be exploited and expanded to other domains.
As increased energy consumption clashes with the implications of fossil fuel consumption and the consequent emission of toxic and greenhouse gases, investments are required for research and development in new clean and renewable energy sources (Almeida et al., 2017; Baños et al., 2011; Cadez and Czerny, 2016; Manan et al., 2017; Manzano-Agugliaro et al., 2012; Perea-Moreno et al., 2017). Thus, renewable energy sources, such as solar energy, have become strong candidates in the new race for productivity and environmental and social well-being (González et al., 2017; Lima et al., 2013). They are also in vogue in political, business and social discourse in general (Onat et al., 2014; Simas and Pacca, 2013). In this scenario, solar energy is considered an abundant, free and clean energy source (Fernández-García et al., 2015).
The present study is organized as follows. After this introduction, Section 2 looks at the theoretical framework regarding the process of knowledge discovery in databases, FCM, DE, GA and PSO, including a brief literature review and the state of the art of CA techniques. In Section 3, the data used in the case study are introduced with their respective pre-processing, followed by a description of the proposed methodology. The results are presented and discussed in Section 4, with a comparison with instances from the literature in order to check the efficiency of the HFCM. Finally, the study is summarized and concluded in Section 5.

2. Theoretical framework

A literature review is presented below regarding knowledge discovery in databases (KDD), data mining and clustering.
KDD is a field of knowledge dedicated to the extraction of significant patterns of information from databases (Fayyad et al., 1996). KDD is applied in multiple stages, beginning with the selection, pre-processing and transformation of data, which may include removing outliers, substituting missing data, normalization, principal component analysis (PCA) and other techniques, depending on the algorithm of the subsequent stage, data mining (or learning), and ending with the interpretation of results and the generation of knowledge (Fayyad et al., 1996; Gamarra et al., 2016; Orriols-Puig et al., 2013).
The data mining stage may involve one or several algorithms in search of patterns, trends and structures in the database, which can take a variety of forms, such as equations, networks, graphs or sets of rules (Roiger, 2017; Witten et al., 2017). In this learning stage, two distinct approaches can be used. In the first approach, supervised learning, previously defined structures and patterns are considered a priori. In the second approach, unsupervised learning, these possible structures and patterns are not considered. It is left to the algorithm to identify these relationships between the variables (Orriols-Puig et al., 2013).
Clustering belongs to the latter approach, unsupervised learning, in which the instances are clustered based on some rule inherent to the structure, such as the distance between them (Bramer, 2016; Roiger, 2017). Put simply, once the number of clusters and their respective centers (so-called centroids) are defined, each instance is first assigned to the closest centroid, and the centroid locations are then optimized by minimizing the objective function (Aggarwal, 2015).
There are many clustering algorithms of varying types (Fahad et al., 2014; Halkidi et al., 2001; Nanda and Panda, 2014; Xu and Wunsch, 2005). Some of them are highlighted in Table 1, with the respective references from which they originated.

2.1. State of the art of clustering analysis

For the analysis of the state of the art of CA algorithms, the most frequently cited articles in the Elsevier database for each year from 2010 to 2017 were used.
In 2010, Deng et al. presented the Enhanced Soft Subspace Clustering (ESSC) algorithm for inserting between-cluster information (cluster separation) in the objective function of the fuzzy clustering algorithms, as well as within-cluster information (cluster compactness), improving performance in real and synthetic high dimensional datasets. In 2011, Kalogeratos and Likas addressed the issue of positioning the initial centroids. Their proposal, k-synthetic prototypes, consisted of selecting representative centroids of the characteristics of the cluster, applied to text documents, achieving better results in small datasets.
In 2012, a new methodology was proposed to identify a priori the number of clusters that adapt better to the data and which produce robust solutions using the FCM algorithm and graph partitioning (Mok et al., 2012). Continuing on the initialization of the centroids, Khan and Ahmad (2013) used the points that minimize the dissimilarity between clusters (based on the most significant attributes) as initial centroids, leading to a more rapid convergence of the K-modes algorithm in categorical datasets.
Again in relation to the a priori definition of the number of clusters, Shahbaba and Beheshti (2014) proposed the MACE-means, which uses the minimization of the initial Average Central Error (ACE) of the K-means algorithm (estimated from the within-cluster distance) to define the correct number of clusters in real and synthetic datasets. In 2015, a new approach emerged with Bruneau et al., the Cluster Sculptor, an interactive system that allows the user to improve the results of the clustering with the aid of a two-dimensional data projection.
In 2016, Saki and Kehtarnavaz proposed an online frame-based clustering algorithm (OFC) consisting of three phases: removal of outliers based on density, and generation and updating of new clusters. Tests with real and synthetic datasets showed that the authors' proposal outperformed the CluStream, DenStream and SVStream algorithms, which are also online clustering algorithms. Finally, Ramon-Gonen and Gelbard (2017) presented a new temporal (evolutionary) clustering algorithm, Cluster Evolution Analysis (CEA), which addresses three aspects of the problem that vary over time: changes in the number of clusters, evolution of cluster
characteristics and between-cluster migration. The model was tested on data from the financial market and proved capable of identifying the stocks that would lose or maintain their value over time.
Addressing the centroid initialization problem, we propose a new HFCM algorithm for the classification (clustering) of locations as suitable or unsuitable for installing solar power plants on unproductive areas belonging to the RE-Powering America's Land initiative. This new hybrid approach refers to the optimization of the FCM initialization by the global optimization metaheuristics DE, GA and PSO.

Table 1
Clustering algorithms.

| Type | Algorithm | Reference |
|---|---|---|
| Partitional | K-Means | (MacQueen, 1967) |
| | PAM (Partitioning Around Medoids), CLARA (Clustering Large Applications), CLARANS (Clustering Large Applications based on Randomized Search) | (Ng and Han, 1994) |
| | K-prototypes, K-mode | (Huang, 1998) |
| Hierarchical | BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) | (Zhang et al., 1996) |
| | CURE (Clustering Using Representatives) | (Guha et al., 1998) |
| | ROCK (Robust Clustering using links) | (Guha et al., 2000) |
| Density-based | DBSCAN (Density Based Spatial Clustering of Applications with Noise) | (Ester et al., 1996) |
| | DENCLUE (Density-based Clustering) | (Hinneburg and Keim, 2003, 1998) |
| Grid-based | STING (Statistical Information Grid-based method) | (Wang et al., 1997) |
| | WaveCluster | (Sheikholeslami et al., 1998) |
| Fuzzy (soft) | FCM | (Bezdek et al., 1984) |
| | EM (Expectation Maximization) | (Dempster et al., 1977) |
| Neural Network-based | GLVQ (Generalized Learning Vector Quantization), SOFM (Self-Organizing Feature Maps) | (Pal et al., 1993) |
| Kernel-based | SVC (Support Vector Clustering) | (Ben-Hur et al., 2001) |

2.2. FCM clustering

In fuzzy, or soft, clustering, each instance has degrees of participation in the interval $[0, 1]$ in each cluster, so that the total sum of the participation for each instance is equal to “1”. The degrees of participation compose the fuzzy partition matrix, whose values will be determined from the minimization of the objective function of the problem.
This objective function is the FCM functional (Equation (1)), the base for a broad family of fuzzy clustering algorithms (Babuska, 1998; Bezdek, 1981; Dunn, 1974; Lu et al., 2017). The function value can be understood as a measurement of the total variance between each instance and the center of its respective cluster, the centroid (Babuska, 1998). The minimization of this function represents a problem of non-linear optimization, which can be solved in numerous ways, including genetic algorithms (Babu and Murty, 1994), simulated annealing (Desarbo, 1982) and grouped coordinate minimization (Bezdek et al., 1987; Hathaway and Bezdek, 1991). However, the most popular is FCM (Babuska, 1998; Zhou et al., 2017).

$$Z = \sum_{i=1}^{N} \sum_{j=1}^{C} \mu_{ij}^{m} \, \lVert x_i - c_j \rVert^2 \qquad (1)$$

where $\mu_{ij} \in [0, 1]$ is the fuzzy partition matrix ($\sum_{j=1}^{C} \mu_{ij} = 1 \ \forall i$) for the ith instance ($N$ observations) and the jth centroid ($C$ centroids); the exponent $m$ determines the fuzziness of the resulting clusters, varying between “1” (no fuzziness, with the algorithm not being fuzzy) and infinity (total fuzziness and, consequently, no definition of which cluster each point belongs to); and $\lVert x_i - c_j \rVert^2$ is the squared Euclidean distance between the instances $x_i$ and the centroids $c_j$.
The consensus among the authors is a value of $m$ in the interval $[1.25, 4]$, with $m = 2$ being the general value (Ozkan and Turksen, 2007; Wu, 2012; Zhou et al., 2017, 2014).
The FCM algorithm can be formalized as shown in Fig. 1 (Bezdek, 1981; Bezdek et al., 1987, 1984; Dunn, 1974; Zhou et al., 2017). The
number $C$ of clusters is in the interval $[2, N]$. It falls to the researcher to determine (with or without the aid of metrics) the ideal number of clusters for the problem in question. The error $\varepsilon$ is also determined by the researcher in accordance with the application.

Fig. 1. Formalization of FCM algorithm.

2.3. DE algorithm

DE is an evolutionary algorithm for stochastic search and global optimization that maintains, at each iteration $G$, a population of $N$ candidates (called target vectors) $x_{i,G} = [x_{j,i}]$, where $j = 1, 2, \ldots, D$ indexes the parameters of the model ($D$ is the dimension of the problem, $\mathbb{R}^D$) and $i = 1, 2, \ldots, N$, which are submitted to successive stages of selection, mutation and recombination. The mutation expands the search space, creating a new solution based on the weighted difference between two members of the previous population added to a third member, as shown in Equation (2), where $v_{i,G+1}$ is called the donor vector (Brownlee, 2011; Storn and Price, 1997).

$$v_{i,G+1} = x_{r_1,G} + F \left( x_{r_2,G} - x_{r_3,G} \right) \qquad (2)$$

In iteration $G$ the indices $r_1, r_2, r_3 \in \{1, 2, \ldots, N\}$ are selected randomly and are mutually distinct, as well as different from index $i$, which imposes a population $N \geq 4$. $F \in [0, 2]$ is the real weighting constant that controls the amplitude of the differential variation. With a view to increasing the diversity of solutions generated in the mutation stage and reusing fit individuals, a recombination is executed in accordance with Equations (3) and (4). The vector $u_{i,G+1}$ is called the test vector.

$$u_{i,G+1} = \left( u_{1,i,G+1}, u_{2,i,G+1}, \ldots, u_{D,i,G+1} \right) \qquad (3)$$

where

$$u_{j,i,G+1} = \begin{cases} v_{j,i,G+1} & \text{if } rand_{j,i} \leq CR \ \lor \ j = rand_i \\ x_{j,i,G} & \text{if } rand_{j,i} > CR \ \land \ j \neq rand_i \end{cases} \qquad (4)$$

$$i = 1, 2, \ldots, N; \quad j = 1, 2, \ldots, D$$

In Equation (4), $rand_{j,i}$ is a random number uniformly distributed in the interval $[0, 1]$, $CR$ is the recombination constant, belonging to the interval $[0, 1]$, and $rand_i$ is an integer chosen at random from $\{1, 2, \ldots, D\}$, with a view to guaranteeing $v_{i,G+1} \neq x_{i,G}$.
The target vector $x_{i,G}$ is compared with the test vector $u_{i,G+1}$ and the one with the lowest value is selected for the next generation (survival of the fittest), as shown in Equation (5).

$$x_{i,G+1} = \begin{cases} u_{i,G+1} & \text{if } f(u_{i,G+1}) \leq f(x_{i,G}) \\ x_{i,G} & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, N \qquad (5)$$

The DE algorithm is presented in Fig. 2 (Brownlee, 2011; Chakraborty, 2008; Price et al., 2005; Storn and Price, 1997; Xing and Gao, 2014).
In addition to the schema presented above, numerous others are useful. The classification of variants is given by the notation $DE/x/y/z$, where $x$ specifies the mode of mutation, which may be rand (random) or best (the element with the lowest cost for the current population), $y$ is the number of difference vectors and $z$ is the mode of recombination (bin).
For this notation, the classic method is $DE/rand/1/bin$ (Storn and Price, 1997), but an interesting variant (in terms of population diversification, if the population is sufficiently large) is $DE/best/2/bin$, whose mutation is given by Equation (6) (Storn, 1996).

$$v_{i,G+1} = x_{best,G} + F \left( x_{r_1,G} + x_{r_2,G} - x_{r_3,G} - x_{r_4,G} \right) \qquad (6)$$

2.4. GA algorithm

GA is an adaptive global optimization technique belonging to the group of evolutionary algorithms, a branch of evolutionary computation. It was inspired by the genetics and evolution of populations and uses their structures (such as chromosomes, genes and alleles) and mechanisms (such as recombination and mutation), mimicking the survival of the fittest, a concept taken from Darwin's natural selection model (Eberhart and Shi, 2007; Karray and Silva, 2004; Kruse et al., 2016; Steiner et al., 2015; Zgurovsky and Zaychenko, 2017).
Fig. 2. Formalization of DE algorithm.
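Because the pseudocode of Fig. 2 is not reproduced in this text, the following is a minimal sketch of the classic DE/rand/1/bin scheme as described by Equations (2) to (5), not the authors' implementation. The function name `differential_evolution`, its default parameter values and the use of `bounds` only for sampling the initial population are assumptions made for illustration.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=200, seed=0):
    """Minimize f with DE/rand/1/bin; returns (best_vector, best_cost)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs a population of at least 4")
    rng = random.Random(seed)
    D = len(bounds)
    # Target vectors x_{i,G}; bounds are used for initialization only (assumption).
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_cost = [], []
        for i in range(pop_size):
            # r1, r2, r3: mutually distinct and different from i.
            r1, r2, r3 = rng.sample([r for r in range(pop_size) if r != i], 3)
            # Equation (2): donor vector.
            donor = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]
            # Equations (3)-(4): binomial recombination; index j_rand always
            # takes the donor component.
            j_rand = rng.randrange(D)
            trial = [donor[j] if (rng.random() <= CR or j == j_rand) else pop[i][j]
                     for j in range(D)]
            # Equation (5): keep the test vector only if it is not worse.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                new_pop.append(trial)
                new_cost.append(trial_cost)
            else:
                new_pop.append(pop[i])
                new_cost.append(cost[i])
        pop, cost = new_pop, new_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

For example, `differential_evolution(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` searches for the minimum of a three-dimensional sphere function. Replacing `pop[r1]` with the current best vector and adding a second difference term would give the DE/best/2/bin mutation of Equation (6).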

To a population of $N$ elements, initialized randomly, selection, crossover and mutation operations are applied that, over a previously defined number of iterations (generations), create new solutions from the previous population and preserve the fittest elements (in terms of maximization or minimization) according to the value of the objective function, also known as fitness (Steiner et al., 2015).
Fig. 3 presents the pseudocode of the GA algorithm, where $D$ represents the number of parameters of the model, or dimension of the problem ($\mathbb{R}^D$), and $MR$ and $CR$ are the probabilities of mutation and crossover, respectively. The stop criteria are the maximum number of iterations and the target value to be achieved for fitness (Brownlee, 2011; Engelbrecht, 2007; Kruse et al., 2016).
One of the selection models for the parents is fitness-proportionate selection (roulette wheel selection, of which stochastic universal sampling is a variant), where each individual is given a probability of selection proportional to its fitness. In other words, the more adequate the individual is to the solution of the problem, the greater the probability of being selected.
In the recombination phase, or crossover, the parents are partitioned at cut-off points determined by $CR$ and have their coordinates (for the clustering problem) exchanged, giving birth to children who then undergo mutation at the points determined by $MR$, with a view to increasing the diversity of available solutions.

2.5. PSO algorithm

PSO is a global optimization algorithm characterized by sweeping the search space using a swarm of particles. It belongs to the fields of swarm intelligence and collective intelligence, branches of computational intelligence. It was originally inspired by the social behavior of some animals, such as flocks of birds and shoals of fish (Kruse et al., 2016).
The search that the algorithm makes consists of $N$ individuals exploring the neighborhoods of the swarm up to a certain distance and returning information to the members on the discoveries being made. It can be viewed as a method that combines gradient-based search and population-based search, which requires that the function to be optimized should be of the type $f: \mathbb{R}^D \rightarrow \mathbb{R}$ (Brownlee, 2011; Engelbrecht, 2007; Kruse et al., 2016).
Fig. 4 shows the pseudocode of the PSO algorithm, where $w$ corresponds to the inertia factor, $C_1$ and $C_2$ are two constants and $\varphi_1$ and $\varphi_2$ are random values in the interval $[0, 1]$. $P_{pbest}$ and $P_{gbest}$ are the memory of the best solution achieved up to the previous iteration by each particle and by the swarm, respectively.

2.6. Selection of the number of clusters

The number of clusters was validated by three metrics, considered the most frequently used in the literature (Arbelaitz et al., 2013): the Calinski-Harabasz Criterion, the Davies-Bouldin Criterion and the Silhouette Coefficient.
The Calinski-Harabasz criterion, also known as the Variance Ratio Criterion (VRC), may be defined as the ratio between the mean between-cluster variance and the mean within-cluster variance (Equation (7)). The higher its value, the better the data partitioning.

$$CH_k = \frac{\sum_{i=1}^{k} n_i \lVert c_i - \bar{x} \rVert^2}{\sum_{i=1}^{k} \sum_{x \in k_i} \lVert x - c_i \rVert^2} \cdot \frac{N - k}{k - 1} \qquad (7)$$

where $n_i$ is the number of elements in cluster $i$; $c_i$ is the centroid of cluster $i$; $\bar{x}$ is the mean of the data set; $N$ is the number of elements in the data set; $k$ is the number of clusters; $x$ is an element of the data set belonging to cluster $k_i$; and $\lVert \cdot \rVert$ is the Euclidean distance.
The Davies-Bouldin criterion is based on the ratio between the within-cluster and between-cluster distances, in accordance with Equation (8). The lower its value, the better the result.

$$DB_k = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left\{ \frac{d_i + d_j}{d_{i,j}} \right\} \qquad (8)$$

where $k$ is the number of clusters; $d_i$ is the average distance between each point of the ith cluster and the centroid of the respective cluster (analogously for $d_j$); and $d_{i,j}$ is the Euclidean distance between the centroids of clusters $i$ and $j$.
The Silhouette Coefficient, for each point of the data set, is the measurement of how similar one point is to the other points in the same cluster compared with the points of another cluster. Its value, $S_i \in [-1, 1]$, for the ith point is defined in accordance with Equation (9).
Fig. 3. Formalization of the GA algorithm.
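The content of Fig. 3 did not survive in this text, so the sketch below only illustrates the operators described in Section 2.4: roulette wheel selection, single-point crossover applied with probability `CR`, and per-coordinate mutation applied with probability `MR`. The function name `genetic_algorithm`, the fitness transform `1 / (1 + cost)` (which assumes a non-negative cost), the random-reset mutation and the elitism step are illustrative assumptions rather than the authors' choices.

```python
import random

def genetic_algorithm(cost, bounds, pop_size=20, CR=0.8, MR=0.1,
                      generations=100, seed=0):
    """Minimize a non-negative cost with a real-coded GA; returns (best, best_cost)."""
    rng = random.Random(seed)
    D = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for _ in range(generations):
        # Roulette wheel: selection probability proportional to fitness.
        # Assumption: cost >= 0, so 1 / (1 + cost) is a valid positive fitness.
        fitness = [1.0 / (1.0 + cost(x)) for x in pop]
        children = [best[:]]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1, p2 = rng.choices(pop, weights=fitness, k=2)
            c1, c2 = p1[:], p2[:]
            # Single-point crossover with probability CR.
            if D > 1 and rng.random() < CR:
                cut = rng.randrange(1, D)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Mutation: each coordinate is re-sampled with probability MR.
            for child in (c1, c2):
                for j, (lo, hi) in enumerate(bounds):
                    if rng.random() < MR:
                        child[j] = rng.uniform(lo, hi)
                children.append(child)
        pop = children[:pop_size]
        best = min(pop, key=cost)
    return best, cost(best)
```

Because the elite individual is always copied into the next generation, the returned cost never gets worse as `generations` increases for a fixed `seed`.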

Fig. 4. Formalization of the PSO algorithm.
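As the pseudocode of Fig. 4 is not available here, the following sketch uses the standard inertia-weight velocity update, which is consistent with the symbols named in Section 2.5 (inertia factor `w`, constants `C1` and `C2`, random values in $[0, 1]$, and the memories of the personal and global best). It is an assumption that the figure follows this standard form; the function name `particle_swarm` and the default parameter values are hypothetical.

```python
import random

def particle_swarm(f, bounds, n_particles=15, w=0.7, C1=1.5, C2=1.5,
                   iterations=100, seed=0):
    """Minimize f with a basic global-best PSO; returns (gbest, gbest_cost)."""
    rng = random.Random(seed)
    D = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * D for _ in range(n_particles)]
    # Memory of the best position found by each particle and by the swarm.
    pbest = [p[:] for p in x]
    pbest_cost = [f(p) for p in x]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(D):
                phi1, phi2 = rng.random(), rng.random()
                # Inertia + attraction to personal best + attraction to global best.
                v[i][j] = (w * v[i][j]
                           + C1 * phi1 * (pbest[i][j] - x[i][j])
                           + C2 * phi2 * (gbest[j] - x[i][j]))
                x[i][j] += v[i][j]
            c = f(x[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = x[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = x[i][:], c
    return gbest, gbest_cost
```

In this sketch the global best is updated as soon as a particle improves on it, so later particles in the same iteration already see the new value. Updating it only once per iteration is an equally common choice.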
$$S_i = \frac{b_i - a_i}{\max(a_i, b_i)} \qquad (9)$$

where $a_i$ is the average distance between point $i$ and the other points in its cluster and $b_i$ is the minimum average distance between point $i$ and the points belonging to the other clusters.
If most of the points have a high Silhouette Coefficient value, then the solution is appropriate.
The three tests were unanimous, indicating two clusters as the ideal number: one cluster for locations with a high potential for the allocation of facilities for capturing solar energy (especially in states with the most solar irradiance) and another for locations with low potential.

3. Methodology

In this section, the methodological proposal used in this work is presented, from the collection and pre-processing of the data used in the case study to the description of the HFCM algorithm. Fig. 5 summarizes the flowchart of the methodology.

Fig. 5. Flowchart of the methodology.

3.1. Data collection and pre-processing

The data used are from the National Solar Radiation Database (NSRDB), a collection of hourly measurements of solar radiation and meteorological data, and the RE-Powering America's Land project of the United States Environmental Protection Agency (EPA), whose purpose is to identify abandoned and contaminated land considered ideal for renewable energy projects. Fig. 6 shows the union between the mean solar direct normal irradiance (DNI) for 1998 to 2014, measured in kWh/m²/day, and contaminated areas of over 100 acres (approximately 400,000 m²), with potential for implementing solar energy plants. This is the first time that this new dataset has been used to classify abandoned areas for the installation of solar power plants.

Fig. 6. Direct normal irradiance and contaminated areas in the continental United States.

A total of 5063 instances (abandoned areas) were examined. The four variables in the database are “mapped area”, measured in acres; “distances to transmission lines”, in miles; “direct normal irradiance on a utility scale” (requiring potential generation to be greater than 6.5 MW/year and mean DNI greater than 5.0 kWh/m²/day, with the possibility of exporting energy to the network); and “off-grid direct normal irradiance” (requiring that the mean DNI should be greater than 2.5 kWh/m²/day, with no possibility of exporting energy to the network).
The missing values, a total of 12 instances for the variable “distance to the transmission lines”, were replaced by the mean of the set. This was followed by the normalization of the data, which came to have mean “0” and standard deviation “1”, and then by the analysis of the principal components, the aim being to identify and remove the attributes that are correlated with one another and
contribute little to the variance of the set (Theodoridis and Koutroumbas, 2009).
For the variables tested, only “off-grid direct normal irradiance” was removed, as it contributed almost nothing to the variance of the data set (only 0.5829%, compared with 50.5270% for “mapped area”, 25.9960% for “distance to the transmission lines” and 22.8941% for “direct normal irradiance on a utility scale”). Therefore, the final set used in the tests had three variables. This care is necessary to avoid correlated independent variables being included in the model (collinearity). Fig. 7 provides details of the correlation analysis for the four variables in the original data set and shows the strong correlation (Pearson's linear correlation coefficient equal to 0.98) between the variables “direct normal irradiance on a utility scale” and “off-grid direct normal irradiance”.

Fig. 7. Analysis of the correlation for the four variables of the data set.

3.2. Proposed algorithm

As the classic FCM algorithm (Section 2.2) assumes the random initialization of the fuzzy partition matrix, a higher number of iterations is required to achieve the final solution. With this limitation in mind, here the initialization of the fuzzy partition matrix is proposed using three distinct metaheuristics (as detailed in Sections 2.3, 2.4 and 2.5): DE (Storn, 1996; Storn and Price, 1997) and GA (Goldberg, 1989), considered evolutionary strategies, and PSO (Kennedy et al., 2001; Kennedy and Eberhart, 1995; Poli, 2008), one of the swarm strategies.
The pseudocode of the resulting HFCM algorithm is shown in Fig. 8. Line 1 is the initialization of the fuzzy partition matrix ($U_0$) by one of the three metaheuristics: DE, GA or PSO.
Only one iteration was performed for the initialization phase, and a population of 4 individuals was used (the minimum number of individuals for the DE algorithm).
For the visual presentation of the results (Fig. 10), the simple rounding of the fuzzy partition matrix was used, which stores the degrees of participation for each instance in each cluster. Thus, values greater than or equal to “0.5” were rounded up to “1” and values below “0.5” were rounded down to “0”. The intermediate values of degrees of participation (between 0.4 and 0.6, for example) could have represented a third, intermediate cluster. However, the aim of this work (supported by three distinct metrics) was to have only two clusters.
As in the initial stage (metaheuristic and partitional) there is no concept of degree of participation or belonging from fuzzy clustering, the final classification of each instance (as belonging to Cluster 1 or Cluster 2) was binarized and the result used as input for the subsequent (fuzzy) stage.

4. Experiments and results

The suggested proposal succeeded in reducing the number of iterations and, consequently, the total execution time of the algorithm. The classic version of the FCM algorithm required 23 iterations, while the versions initialized by GA and PSO required 13 iterations, and the version initialized by DE required 14 iterations. All of them achieved the same final value for the objective function (8737.761). See Fig. 9.

Fig. 9. Comparison between HFCM (DE-FCM, GA-FCM and PSO-FCM) and FCM.

When the algorithm is initialized by metaheuristics, the location of the centroids is optimized before passing to the FCM algorithm, which speeds up its convergence. Table 3 shows a comparison of the mean and standard deviation (Std.) for the execution time of the algorithm and for the value of the objective function for a round of 100 tests. The percentage difference (%) is relative to the FCM algorithm.
Both the FCM and HFCM algorithms classified 835 instances as belonging to Cluster 1 (black dots), and 4228 as belonging to Cluster 2 (red dots). Fig. 10 shows an overview of the result of the clustering for the HFCM.
Visually, the predominant allocation of facilities in states with a higher mean incidence of irradiation (highlighted in yellow) is notable. However, using only the “distance to transmission lines” variable, an approximate result is obtained, with 734 points in Cluster 1 and 4329 points in Cluster 2. This is the predominant variable in the convergence of the algorithm.
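The two-cluster partition reported in this section relies on the three validation criteria of Section 2.6. As a minimal sketch, the functions below implement Equations (7), (8) and (9) for a hard labelling of the instances, using only the Python standard library. The function names and the convention of returning a silhouette of 0 for a point that is alone in its cluster are assumptions for illustration. The code requires at least two clusters.

```python
import math

def _dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def _centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def _groups(X, labels):
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    return groups

def calinski_harabasz(X, labels):
    """Equation (7): higher is better."""
    groups = _groups(X, labels)
    N, k = len(X), len(groups)
    mean = _centroid(X)
    between = sum(len(g) * _dist(_centroid(g), mean) ** 2 for g in groups.values())
    within = sum(_dist(x, _centroid(g)) ** 2 for g in groups.values() for x in g)
    return (between / within) * (N - k) / (k - 1)

def davies_bouldin(X, labels):
    """Equation (8): lower is better."""
    groups = list(_groups(X, labels).values())
    cents = [_centroid(g) for g in groups]
    # d[i]: average distance of the points of cluster i to its centroid.
    d = [sum(_dist(x, c) for x in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    return sum(max((d[i] + d[j]) / _dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k

def silhouette(X, labels):
    """Mean of S_i from Equation (9) over all points: closer to 1 is better."""
    groups = _groups(X, labels)
    scores = []
    for x, lab in zip(X, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other points of the same cluster.
        a = sum(_dist(x, y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(sum(_dist(x, y) for y in g) / len(g)
                for other, g in groups.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

To choose the number of clusters, these functions would be evaluated on the hardened partition obtained for each candidate number, keeping the value on which the criteria agree.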
Fig. 8. Formalization of the HFCM algorithm.
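Since the pseudocode of Fig. 8 is not reproduced in this text, the following is a hedged sketch of the two-stage idea described in Section 3.2, not the authors' code. A small population of candidate centroid sets is improved for a single generation, the resulting hard assignment is binarized into an initial partition matrix $U_0$, and FCM then iterates from $U_0$. The helper names (`sq_dist`, `hard_cost`, `metaheuristic_init`, `fcm`, `hfcm`), the simplified DE/rand/1 step without crossover, the stopping test on the largest membership change and the guard for an empty cluster are all assumptions.

```python
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def hard_cost(X, centroids):
    """Sum of squared distances from each instance to its closest centroid."""
    return sum(min(sq_dist(x, c) for c in centroids) for x in X)

def metaheuristic_init(X, C, rng, pop_size=4, F=0.8):
    """One simplified DE/rand/1 generation over centroid sets; returns a binary U0."""
    pop = [rng.sample(X, C) for _ in range(pop_size)]  # individual = C centroids
    for i in range(pop_size):
        r1, r2, r3 = rng.sample([r for r in range(pop_size) if r != i], 3)
        trial = [[a + F * (b - c) for a, b, c in zip(c1, c2, c3)]
                 for c1, c2, c3 in zip(pop[r1], pop[r2], pop[r3])]
        if hard_cost(X, trial) <= hard_cost(X, pop[i]):
            pop[i] = trial
    best = min(pop, key=lambda cs: hard_cost(X, cs))
    U0 = []
    for x in X:  # binarize: membership 1 for the closest centroid, 0 otherwise
        nearest = min(range(C), key=lambda j: sq_dist(x, best[j]))
        U0.append([1.0 if j == nearest else 0.0 for j in range(C)])
    return U0

def fcm(X, U0, m=2.0, eps=1e-5, max_iter=300):
    """Fuzzy c-means started from U0; returns (U, centroids, Z, iterations)."""
    N, C, D = len(X), len(U0[0]), len(X[0])
    U = [row[:] for row in U0]
    for it in range(1, max_iter + 1):
        cents = []
        for j in range(C):  # centroids: means weighted by mu_ij^m
            w = [U[i][j] ** m for i in range(N)]
            tot = sum(w) or 1.0  # guard for an empty cluster (illustrative)
            cents.append([sum(w[i] * X[i][d] for i in range(N)) / tot
                          for d in range(D)])
        new_U = []
        for x in X:  # memberships from inverse squared distances
            inv = [max(sq_dist(x, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in cents]
            s = sum(inv)
            new_U.append([v / s for v in inv])
        change = max(abs(a - b) for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if change < eps:
            break
    Z = sum(U[i][j] ** m * sq_dist(X[i], cents[j])  # Equation (1)
            for i in range(N) for j in range(C))
    return U, cents, Z, it

def hfcm(X, C, seed=0):
    """Hybrid FCM: metaheuristic initialization followed by FCM."""
    rng = random.Random(seed)
    return fcm(X, metaheuristic_init(X, C, rng))
```

Calling `fcm` with a random row-normalized matrix instead of the output of `metaheuristic_init` corresponds to the classic FCM, so the returned iteration counts of the two variants can be compared on the same data. The population of 4 and the single generation follow the settings stated in Section 3.2.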

Fig. 10. Result of the clustering.

Table 3
Statistical comparison of the results.

| Algorithm | Time: Mean | Time: % | Time: Std. | Objective function: Mean | Objective function: % | Objective function: Std. |
|---|---|---|---|---|---|---|
| FCM | 0.01532 | - | 0.00147 | 8737.76155 | - | 0.00010 |
| DE-FCM | 0.01175 | 23.30 | 0.00136 | 8737.76152 | 0.00 | 0.00011 |
| GA-FCM | 0.01374 | 10.31 | 0.00682 | 8737.76152 | 0.00 | 0.00011 |
| PSO-FCM | 0.01284 | 16.19 | 0.00429 | 8737.76153 | 0.00 | 0.00010 |

There was a small divergence between the final centroid found by the FCM algorithm and the ones found by the HFCM algorithms, although the latter achieved the same result among themselves (with a divergence only in the fourth decimal place for Var. 1 of the DE-FCM algorithm). This divergence was not sufficiently significant to affect the value of the objective function (at least not to five decimal places).

4.1. Statistical comparison

To analyze the difference between the clusters, Table 4 shows the statistic for each of them. It should be noted that the instances allocated to Cluster 1 have a higher mean for the variables “mapped area” (Var. 1), “direct normal irradiance on a utility scale” (Var. 3) and “off-grid direct normal irradiance” (Var. 4). For “distance to the transmission lines” (Var. 2), Cluster 2 had the lower mean, a difference of 11.6% in relation to Cluster 1. It is interesting that Cluster 2, even with five times as many instances as Cluster 1, has lower standard deviations and variance for all the variables, one of the indicators of quality (compactness) in clustering (Fahad et al., 2014). Therefore, Cluster 1 has the greatest potential for the installation of solar power plants. When choosing the best locations among the 835 possible alternatives in Cluster 1, economic, technical and even political aspects should be taken into account.

4.2. Instances from the literature

In order to validate the potential of the proposed HFCM technique, 100 tests were performed (100 different initializations) with another six datasets (instances from the literature), with different numbers of instances N (samples), dimensions D (number of variables) and clusters C (predefined in their respective references).
The results, in terms of minimum (Min.), mean and maximum (Max.) value, and standard deviation (Std.), are presented in Tables 5–10, highlighting the lower values (minimum, mean and maximum).
As shown, for all six tested datasets, the HFCM algorithm achieved, with a shorter average time, the best solution (lowest cost), or one equivalent to the solution found by the FCM algorithm.

5. Conclusions

In the experiments with the new proposed algorithm there was an increase in the training speed of the FCM algorithm, which required fewer iterations and, consequently, less time to achieve the same or a better value for the objective function (solution cost).
The two resulting clusters for the solar energy facilities location problem showed statistical characteristics that validate the KDD methodology employed, as Cluster 1 had higher means for the four initial variables of the problem and Cluster 2 showed better
Table 4
Statistics for the clusters found.
                     Cluster 1 (836 instances)                Cluster 2 (4227 instances)
                     Var. 1      Var. 2   Var. 3   Var. 4     Var. 1      Var. 2   Var. 3   Var. 4
Minimum              100.80      0.00     3.47     4.17       100.04      0.00     2.17     2.86
Mean                 347.74      2.50     4.76     5.20       255.50      2.21     3.30     4.21
Maximum              996.00      63.84    7.75     6.68       999.56      49.88    4.36     4.79
Standard Deviation   236.30      5.54     0.86     0.48       147.32      4.07     0.17     0.12
Variance             55,839.95   30.66    0.73     0.23       21,702.83   16.54    0.03     0.01
Bold indicates the best values for "Mean" (the highest values for Cluster 1, Var. 1, 3 and 4, and the smallest value for Cluster 2, Var. 2) and "Standard Deviation" (the smallest values).
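The figures quoted in Section 4.1 can be checked directly against Table 4. The short script below is only a worked example of that arithmetic. It copies the means and standard deviations from the table and recomputes the relative difference for Var. 2, the size ratio of the two clusters, and the compactness comparison. The variable names are illustrative.

```python
# Means and standard deviations copied from Table 4 (Var. 1 to Var. 4).
cluster1 = {"n": 836, "mean": [347.74, 2.50, 4.76, 5.20], "std": [236.30, 5.54, 0.86, 0.48]}
cluster2 = {"n": 4227, "mean": [255.50, 2.21, 3.30, 4.21], "std": [147.32, 4.07, 0.17, 0.12]}

# Relative difference in mean distance to the transmission lines (Var. 2),
# expressed as a percentage of the Cluster 1 mean.
var2_diff_pct = (cluster1["mean"][1] - cluster2["mean"][1]) / cluster1["mean"][1] * 100

# How many times larger Cluster 2 is than Cluster 1.
size_ratio = cluster2["n"] / cluster1["n"]

# Cluster 2 counts as more compact here if every standard deviation is smaller.
cluster2_more_compact = all(s2 < s1 for s1, s2 in zip(cluster1["std"], cluster2["std"]))

print(f"Var. 2 mean difference: {var2_diff_pct:.1f}%")
print(f"Cluster 2 / Cluster 1 size ratio: {size_ratio:.2f}")
print(f"Cluster 2 more compact on all variables: {cluster2_more_compact}")
```

The script reproduces the 11.6% difference and the roughly fivefold size ratio stated in the text.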
Table 5
Results with Aggregation dataset.
Dataset       Parameters               Reference
Aggregation   N = 788; D = 2; C = 7    (Gionis et al., 2007)
           Time                                 Solution Cost
           Min.     Mean     Max.     Std.      Min.        Mean        Max.        Std.
FCM        0.0138   0.0184   0.0267   0.0026    6889.2553   6889.2556   6889.2558   0.0001
DE-FCM     0.0108   0.0164   0.0225   0.0025    6889.2548   6889.2555   6889.2558   0.0001
GA-FCM     0.0108   0.0169   0.0276   0.0037    6889.2553   6889.2556   6889.2558   0.0001
PSO-FCM    0.0096   0.0164   0.0263   0.0029    6889.2553   6889.2556   6889.2557   0.0001
Table 6
Results with Compound dataset.
Dataset       Parameters               Reference
Compound      N = 399; D = 2; C = 6    (Zahn, 1971)
           Time                                 Solution Cost
           Min.     Mean     Max.     Std.      Min.        Mean        Max.        Std.
FCM        0.0062   0.0120   0.0226   0.0036    2272.2842   2272.2842   2272.2843   0.0000
DE-FCM     0.0053   0.0118   0.0237   0.0042    2272.2842   2272.2843   2272.2843   0.0000
GA-FCM     0.0050   0.0114   0.0345   0.0048    2272.2842   2272.2843   2272.2844   0.0000
PSO-FCM    0.0052   0.0117   0.0251   0.0044    2272.2841   2272.2843   2272.2845   0.0000
Table 7
Results with D31 dataset.
Dataset       Parameters                 Reference
D31           N = 3100; D = 2; C = 31    (Veenman et al., 2002)
           Time                                 Solution Cost
           Min.     Mean     Max.     Std.      Min.        Mean        Max.        Std.
FCM        0.1416   0.2917   0.4336   0.0823    2018.2199   2102.5906   2143.6115   45.0564
DE-FCM     0.0724   0.2430   0.4468   0.1031    1927.2967   2157.4849   2251.2808   69.0257
GA-FCM     0.0951   0.2389   0.4383   0.0905    1927.2964   2102.8785   2177.2867   59.5833
PSO-FCM    0.1086   0.2526   0.4370   0.0980    1927.2965   2093.8631   2149.7790   48.7054
Table 8
Results with T4.8k dataset.
Dataset       Parameters                Reference
T4.8k         N = 8000; D = 2; C = 6    (Karypis et al., 1999)
           Time                                 Solution Cost
           Min.     Mean     Max.     Std.      Min.         Mean         Max.         Std.
FCM        0.1149   0.1564   0.2210   0.0214    1.7621E+07   1.7621E+07   1.7621E+07   0.0001
DE-FCM     0.0856   0.1316   0.2291   0.0234    1.7621E+07   1.7621E+07   1.7621E+07   0.0001
GA-FCM     0.0850   0.1386   0.2306   0.0299    1.7621E+07   1.7621E+07   1.7621E+07   0.0001
PSO-FCM    0.0905   0.1378   0.2152   0.0263    1.7621E+07   1.7621E+07   1.7621E+07   0.0001
Table 9
Results with Credit Card dataset.
Dataset       Parameters                  Reference
Credit Card   N = 30000; D = 23; C = 2    (Yeh and Lien, 2009)
           Time                                 Solution Cost
           Min.     Mean     Max.     Std.      Min.         Mean         Max.         Std.
FCM        0.4846   0.6766   0.7738   0.0475    5.9365E+14   5.9365E+14   5.9365E+14   1.3211
DE-FCM     0.0802   0.4947   0.7694   0.2514    5.9365E+14   5.9365E+14   5.9365E+14   1.3321
GA-FCM     0.0804   0.5231   0.7257   0.1971    5.9365E+14   5.9365E+14   5.9365E+14   1.3147
PSO-FCM    0.0821   0.5999   0.7726   0.1466    5.9365E+14   5.9365E+14   5.9365E+14   1.3389
Table 10
Results with Wine Quality dataset.
Dataset        Parameters                 Reference
Wine Quality   N = 4898; D = 11; C = 7    (Cortez et al., 2009)
           Time                                 Solution Cost
           Min.     Mean     Max.     Std.      Min.         Mean         Max.         Std.
FCM        0.1136   0.1693   0.1952   0.0166    6.5131E+05   6.5131E+05   6.5131E+05   0.0039
DE-FCM     0.0602   0.1524   0.1905   0.0288    6.5131E+05   6.5131E+05   6.5131E+05   1.4520
GA-FCM     0.0740   0.1410   0.1903   0.0329    6.5131E+05   6.5131E+05   6.5131E+05   0.8123
PSO-FCM    0.0501   0.1474   0.1925   0.0336    6.5131E+05   6.5131E+05   6.5131E+05   0.9910
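The Min., Mean, Max. and Std. columns in Tables 5 to 10 summarize 100 runs per algorithm. A minimal sketch of such a summary is given below. The helper name `summarize_runs`, the use of the seed as the only source of variation between runs, and the choice of the sample standard deviation are assumptions, since the source does not state how the statistics were computed.

```python
import statistics
import time

def summarize_runs(run_once, n_runs=100):
    """Run `run_once(seed)` n_runs times and summarize time and solution cost.

    `run_once` is a hypothetical callable that performs one clustering run
    with the given seed and returns its final objective value.
    """
    times, costs = [], []
    for seed in range(n_runs):
        start = time.perf_counter()
        cost = run_once(seed)
        times.append(time.perf_counter() - start)
        costs.append(cost)

    def stats(values):
        return {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
            "std": statistics.stdev(values),  # sample standard deviation (assumed)
        }

    return {"time": stats(times), "cost": stats(costs)}
```

Passing one callable per algorithm (for example, plain FCM and each metaheuristic-initialized variant) yields one table row per call.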
compactness, both in terms of the variables analyzed and in relation to the geographical location of their instances (see Fig. 10). It should be highlighted that, for an ideal clustering, the variable “distance to the transmission lines” should have shown a lower mean in Cluster 1, as it is a variable of the type “the lower the better”. Nevertheless, Cluster 1 (black dots in Fig. 10) represents the locations recommended for the installation of solar power plants, as it has the highest average areas and highest rate of solar irradiance, the variables that had the greatest impact on the result (see Table 4). Meanwhile, the distance to the transmission lines had the lowest impact.

There is nothing to stop the installations being made in Cluster 2 (red dots in Fig. 10), especially because projects have already been concluded or are in the implementation phase in this region (see Fig. 11). However, we reached the conclusion that it is more effective, both financially (due to the area available) and technically (due to the solar irradiance), to use the areas belonging to Cluster 1.

Fig. 12. Most cited benefits.
It is important to highlight the relevance of this study to the better allocation of the resources available for renewable energy (in relation to conventional forms of power, such as fossil fuels). With clustering, it is possible to determine a priori the locations with the highest potential and, consequently, the highest return on investment. The main benefits of installing solar energy plants include savings on the cost of energy from non-renewable sources, followed by environmental gains, with reduced direct and indirect emissions of greenhouse gases, and job creation. Another important benefit is the cleaning of areas that were once unproductive and contaminated and posed a risk to health and the environment, as well as depreciating the region in which they are located. As a further economic benefit specifically for the United States, we can cite the subsidies for the states, municipalities and businesses that invest in clean energies. One such benefit is the Solar Renewable Energy Certificate (SREC), a specific intangible energy commodity for the generation of solar energy, which is issued when 1 MWh of solar energy is produced and can be sold on the market (Bird et al., 2011).

To illustrate this, the 213 renewable energy generation units (with a cumulative installed capacity of 1235 MW) already installed by the RE-Powering America's Land project in a total of 40 states and territories (Fig. 11 shows these units, with five highlighted areas) have the potential to reduce the emission of greenhouse gases by 1.7 million tons of CO2 per year. This is equivalent to half a million tons of waste or 193,000,000 gallons of gasoline. Regarding jobs, the 2017 Solar Jobs Census estimated that 250,271 workers were employed in the solar energy generation industry in the United States, of which 129,424 were in the solar panel installation sector (52% of the total). In 2010, the total number of employees in the solar energy sector was 93,502, so employment grew to roughly 2.7 times its 2010 level (an increase of about 168%) during the period in question. This number will rise as more projects are implemented, with jobs being the third most frequently cited benefit for the projects that have already been concluded (see Fig. 12).

The application of the proposed methodology is valid for any region of the globe that has a database available with the variables of interest and legislation that allows the recovery of abandoned areas.

Fig. 12 summarizes the benefits most frequently cited by the stakeholders of the units that have already been installed (EPA, 2016), the main ones being economic, environmental and social (PILOT and REC Revenues stand for Payment In Lieu Of Taxes and Renewable Energy Certificate revenues, respectively).

Suggestions for future work include adding further variables to the model, such as contamination level and cost of recovery, and expanding the model to other energy sectors, such as wind energy, biomass and geothermal power. For this purpose, it is necessary to add variables related to each energy source, for example, wind velocity for wind energy or the availability of methane for landfill gas energy.
Fig. 11. Installed renewable energy facilities.

References

Aggarwal, C.C., 2015. Data Mining. Springer, Cham.
Almeida, C.M.V.B., Agostinho, F., Huisingh, D., Giannetti, B.F., 2017. Cleaner Production towards a sustainable transition. J. Clean. Prod. 142, 1–7.
Apostolidis, N., Hutton, N., 2006. Integrated water management in brownfield sites: more opportunities than you think. Desalination 188 (1–3), 169–175.
Arbelaitz, O., Gurrutxaga, I., Muguerza, J., Pérez, J.M., Perona, I., 2013. An extensive comparative study of cluster validity indices. Pattern Recogn. 46 (1), 243–256.
Babu, G.P., Murty, M.N., 1994. Clustering with evolution strategies. Pattern Recogn. 27 (2), 321–329.
Babuska, R., 1998. Fuzzy Modeling for Control. Springer, New York.
Baños, R., Manzano-Agugliaro, F., Montoya, F.G., Gil, C., Alcayde, A., Gómez, J., 2011. Optimization methods applied to renewable and sustainable energy: a review. Renew. Sustain. Energy Rev. 15 (4), 1753–1766.
Ben-Hur, A., Horn, D., Siegelmann, H.T., Vapnik, V., 2001. Support vector clustering. J. Mach. Learn. Res. 2, 125–137.
Bergius, K., Öberg, T., 2007. Initial screening of contaminated land: a comparison of US and Swedish methods. Environ. Manag. 39 (2), 226–234.
Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithms. Springer, New York.
Bezdek, J.C., Ehrlich, R., Full, W., 1984. FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10 (2–3), 191–203.
Bezdek, J.C., Hathaway, R.J., Howard, R.E., Wilson, C.A., Windham, M.P., 1987. Local convergence analysis of a grouped variable version of coordinate descent. J. Optim. Theor. Appl. 54 (3), 471–477.
Bird, L., Heeter, J., Kreycik, C., 2011. Solar Renewable Energy Certificate (SREC) Markets: Status and Trends.
Bramer, M., 2016. Principles of data mining. In: Undergraduate Topics in Computer Science, third ed. Springer, London.
Brownlee, J., 2011. Clever Algorithms. LuLu.
Bruneau, P., Pinheiro, P., Broeksema, B., Otjacques, B., 2015. Cluster Sculptor, an interactive visual clustering system. Neurocomputing 150 (B), 627–644.
Cadez, S., Czerny, A., 2016. Climate change mitigation strategies in carbon-intensive firms. J. Clean. Prod. 112, 4132–4143.
Cao, K., Guan, H., 2007. Brownfield redevelopment toward sustainable urban land use in China. Chin. Geogr. Sci. 17 (2), 127–134.
Chakraborty, U.K. (Ed.), 2008. Advances in Differential Evolution, Studies in Computational Intelligence. Springer, Berlin, Heidelberg.
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J., 2009. Modeling wine preferences by data mining from physicochemical properties. Decis. Support Syst. 47 (4), 547–553.
de Sousa, C.A., 2003. Turning brownfields into green space in the city of Toronto. Landsc. Urban Plan. 62 (4), 181–198.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B - Methodol. 39 (1), 1–38.
Deng, Z., Choi, K.-S., Chung, F.-L., Wang, S., 2010. Enhanced soft subspace clustering integrating within-cluster and between-cluster information. Pattern Recogn. 43 (3), 767–781.
Desarbo, W.S., 1982. Gennclus: new models for general nonhierarchical clustering analysis. Psychometrika 47 (4), 449–475.
Dunn, J.C., 1974. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J. Cybern. 3 (3), 32–57.
Eberhart, R.C., Shi, Y., 2007. Computational Intelligence: Concepts to Implementations. Morgan Kaufmann, Burlington.
Engelbrecht, A.P., 2007. Computational Intelligence: an Introduction, second ed. Wiley.
EPA, 2016. RE-powering America's Land Initiative: Benefits Matrix [WWW Document]. https://goo.gl/XMov1T. (Accessed 3 March 2017).
Ester, M., Kriegel, H.-P., Sander, J., Xu, X., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining - KDD '96. AAAI Press, Portland, pp. 226–231.
Fahad, A., Alshatri, N., Tari, Z., Alamri, A., Khalil, I., Zomaya, A.Y., Foufou, S., Bouras, A., 2014. A survey of clustering algorithms for big data: taxonomy and empirical analysis. IEEE Trans. Emerg. Top. Comput. 2 (3), 267–279.
Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., 1996. From data mining to knowledge discovery in databases. AI Mag. 37–54.
Fernández-García, A., Rojas, E., Pérez, M., Silva, R., Hernandez-Escobedo, Q., Manzano-Agugliaro, F., 2015. A parabolic-trough collector for cleaner industrial process heat. J. Clean. Prod. 89, 272–285.
Gamarra, C., Guerrero, J.M., Montero, E., 2016. A knowledge discovery in databases approach for industrial microgrid planning. Renew. Sustain. Energy Rev. 60, 615–630.
Gionis, A., Mannila, H., Tsaparas, P., 2007. Clustering aggregation. ACM Trans. Knowl. Discov. Data 1 (1), 1–30.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley, Boston.
González, M.O.A., Gonçalves, J.S., Vasconcelos, R.M., 2017. Sustainable development: case study in the implementation of renewable energy in Brazil. J. Clean. Prod. 142, 461–475.
Greenberg, M., Lewis, M.J., 2000. Brownfields redevelopment, preferences and public involvement: a case study of an ethically mixed neighbourhood. Urban Stud. 37 (13), 2501–2514.
Guha, S., Rastogi, R., Shim, K., 2000. Rock: a robust clustering algorithm for categorical attributes. Inf. Syst. 25 (5), 345–366.
Guha, S., Rastogi, R., Shim, K., 1998. CURE: an efficient clustering algorithm for large databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data - SIGMOD '98. ACM, New York, pp. 73–84.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2001. On clustering validation techniques. J. Intell. Inf. Syst. 17 (2–3), 107–145.
Hartmann, B., Török, S., Börcsök, E., Oláhné Groma, V., 2014. Multi-objective method for energy purpose redevelopment of brownfield sites. J. Clean. Prod. 82, 202–212.
Hathaway, R.J., Bezdek, J.C., 1991. Grouped coordinate minimization using Newton's method for inexact minimization in one vector coordinate. J. Optim. Theor. Appl. 71 (3), 503–516.
Hinneburg, A., Keim, D.A., 2003. A general approach to clustering in large databases with noise. Knowl. Inf. Off. Syst. 5 (4), 387–415.
Hinneburg, A., Keim, D.A., 1998. An efficient approach to clustering in large multimedia databases with noise. In: Proceedings of 4th International Conference on Knowledge Discovery and Data Mining - KDD '98. AAAI Press, New York, pp. 58–65.
Huang, Z., 1998. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 2 (3), 283–304.
Kalogeratos, A., Likas, A., 2011. Document clustering using synthetic cluster prototypes. Data Knowl. Eng. Times 70 (3), 284–306.
Karray, F.O., Silva, C. De, 2004. Soft Computing and Intelligent Systems Design: Theory, Tools, and Applications. Addison-Wesley, Harlow.
Karypis, G., Han, Eui-Hong, Kumar, V., 1999. Chameleon: hierarchical clustering using dynamic modeling. Computer 32 (8), 68–75 (Long. Beach. Calif).
Kaufman, M.M., Rogers, D.T., Murray, K.S., 2005. An empirical model for estimating remediation costs at contaminated sites. Water Air Soil Pollut. 167 (1–4), 365–386.
Kennedy, J., Eberhart, R., 1995. Particle swarm optimization. In: Proceedings of ICNN '95-international Conference on Neural Networks. IEEE, Perth, Australia, pp. 1942–1948.
Kennedy, J., Eberhart, R.C., Shi, Y., 2001. Swarm Intelligence, Science. Morgan Kaufmann.
Khan, S.S., Ahmad, A., 2013. Cluster center initialization algorithm for K-modes clustering. Expert Syst. Appl. 40 (18), 7444–7456.
Kovacs, H., Szemmelveisz, K., 2017. Disposal options for polluted plants grown on heavy metal contaminated brownfield lands: a review. Chemosphere 166, 8–20.
Kruse, R., Borgelt, C., Braune, C., Mostaghim, S., Steinbrecher, M., 2016. Computational intelligence: a methodological introduction. In: Texts in Computer Science, second ed. Springer, London.
Lambin, E.F., Meyfroidt, P., 2011. Global land use change, economic globalization, and the looming land scarcity. Proc. Natl. Acad. Sci. U. S. A 108 (9), 3465–3472.
Li, X., Jiao, W., Xiao, R., Chen, W., Liu, W., 2017. Contaminated sites in China: countermeasures of provincial governments. J. Clean. Prod. 147, 485–496.
Lima, F., Ferreira, P., Vieira, F., 2013. Strategic impact management of wind power projects. Renew. Sustain. Energy Rev. 25, 277–290.
Lu, S., Shang, Y., Li, Y., 2017. A research on the application of fuzzy iteration clustering in the water conservancy project. J. Clean. Prod. 151, 356–360.
MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley, pp. 281–297.
Manan, Z.A., Mohd Nawi, W.N.R., Wan Alwi, S.R., Klemes, J.J., 2017. Advances in Process Integration research for CO2 emission reduction – a review. J. Clean. Prod. 167, 1–13.
Manzano-Agugliaro, F., Sanchez-Muros, M.J., Barroso, F.G., Martínez-Sánchez, A., Rojo, S., Pérez-Bañón, C., 2012. Insects for biodiesel production. Renew. Sustain. Energy Rev. 16 (6), 3744–3753.
Mok, P.Y., Huang, H.Q., Kwok, Y.L., Au, J.S., 2012. A robust adaptive clustering analysis method for automatic identification of clusters. Pattern Recogn. 45 (8), 3017–3033.
Morio, M., Schädler, S., Finkel, M., 2013. Applying a multi-criteria genetic algorithm framework for brownfield reuse optimization: improving redevelopment options based on stakeholder preferences. J. Environ. Manag. 130, 331–346.
Nanda, S.J., Panda, G., 2014. A survey on nature inspired metaheuristic algorithms for partitional clustering. Swarm Evol. Comput. 16, 1–18.
Ng, R.T., Han, J., 1994. Efficient and effective clustering methods for spatial data mining. In: Proceedings of the 20th International Conference on Very Large Data Bases - VLDB '94. Morgan Kaufmann, San Francisco, pp. 144–155.
Nuissl, H., Schroeter-Schlaack, C., 2009. On the economic approach to the containment of land consumption. Environ. Sci. Pol. 12 (3), 270–280.
Onat, N.C., Kucukvar, M., Tatari, O., 2014. Integrating triple bottom line input-output analysis into life cycle sustainability assessment framework: the case for US buildings. Int. J. Life Cycle Assess. 19 (8), 1488–1505.
Orriols-Puig, A., Martínez-López, F.J., Casillas, J., Lee, N., 2013. Unsupervised KDD to creatively support managers' decision making with fuzzy association rules: a distribution channel application. Ind. Market. Manag. 42 (4), 532–543.
Ozkan, I., Turksen, I.B., 2007. Upper and lower values for the level of fuzziness in FCM. Inf. Sci. 177 (23), 5143–5152.
Pal, N.R., Bezdek, J.C., Tsao, E.C.-K., 1993. Generalized clustering networks and Kohonen's self-organizing scheme. IEEE Trans. Neural Networks 4 (4), 549–557.
Perea-Moreno, A.-J., García-Cruz, A., Novas, N., Manzano-Agugliaro, F., 2017. Rooftop analysis for solar flat plate collector assessment to achieving sustainability energy. J. Clean. Prod. 148, 545–554.
Poli, R., 2008. Analysis of the publications on the applications of particle swarm optimisation. J. Artif. Evol. Appl. 2008, 1–10.
Price, K.V., Storn, R.M., Lampinen, J.A., 2005. Differential Evolution, Natural Computing Series. Springer, Berlin.
Ramon-Gonen, R., Gelbard, R., 2017. Cluster evolution analysis: identification and detection of similar clusters and migration patterns. Expert Syst. Appl. 83, 363–378.
Roiger, R.J., 2017. Data Mining: a Tutorial-based Primer, second ed. CRC Press.
Rong, L., Zhang, C., Jin, D., Dai, Z., 2017. Assessment of the potential utilization of municipal solid waste from a closed irregular landfill. J. Clean. Prod. 142, 413–419.
Saki, F., Kehtarnavaz, N., 2016. Online frame-based clustering with unknown number of clusters. Pattern Recogn. 57, 70–83.
Shahbaba, M., Beheshti, S., 2014. MACE-means clustering. Signal Process. 105, 216–225.
Sheikholeslami, G., Chatterjee, S., Zhang, A., 1998. Wavecluster: a multi-resolution clustering approach for very large spatial databases. In: Proceedings of 24th International Conference on Very Large Data Bases - VLDB '98. Morgan Kaufmann, San Francisco, pp. 428–439.
Simas, M., Pacca, S., 2013. Energia eólica, geração de empregos e desenvolvimento sustentável. Estud. Avançados 27 (77), 99–116.
Steiner, M.T.A., Datta, D., Steiner Neto, P.J., Scarpin, C.T., Rui Figueira, J., 2015. Multi-objective optimization in partitioning the healthcare system of Parana State in Brazil. Omega 52, 53–64.
Storn, R., 1996. On the usage of differential evolution for function optimization. In: Proceedings of North American Fuzzy Information Processing. IEEE, pp. 519–523.
Storn, R., Price, K., 1997. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11 (4), 341–359.
Theodoridis, S., Koutroumbas, K., 2009. Pattern Recognition, fourth ed. Academic Press.
U.S. Government Publishing Office, 2015. 42 U.S.C. 9601-9628-Hazardous Substances Releases, Liability, Compensation [WWW Document]. United States Code, 2012 Ed. Suppl. 3, Title 42-Public Heal. Welfare, Subchapter I. https://goo.gl/y0ki6N. (Accessed 3 March 2017).
U.S. Government Publishing Office, 2011. 40 C.F.R. 239-282-Solid Wastes [WWW document], annual ed. Code Fed. Regul. https://goo.gl/UBCLDF. (Accessed 3 March 2017).
U.S. Government Publishing Office, 2002. Public Law 107-118-Small Business Liability Relief and Brownfields Revitalization Act [WWW Document]. H.R. 2869. https://goo.gl/UK19n2. (Accessed 3 March 2017).
van Straalen, N.M., 2002. Assessment of soil contamination: a functional perspective. Biodegradation 13 (1), 41–52.
Veenman, C.J., Reinders, M.J.T., Backer, E., 2002. A maximum variance cluster algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 24 (9), 1273–1280.
Wang, W., Yang, J., Muntz, R.R., 1997. STING: a statistical information grid approach to spatial data mining. In: Proceedings of 23rd International Conference on Very Large Data Bases - VLDB '97. Morgan Kaufmann, San Francisco, pp. 186–195.
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J., 2017. Data Mining: Practical Machine Learning Tools and Techniques, fourth ed. Morgan Kaufmann.
Wu, K.-L., 2012. Analysis of parameter selections for fuzzy c-means. Pattern Recogn. 45 (1), 407–415.
Xing, B., Gao, W.-J., 2014. Innovative Computational Intelligence: a Rough Guide to 134 Clever Algorithms, Intelligent Systems Reference Library. Springer, Cham.
Xu, R., Wunsch II, D., 2005. Survey of clustering algorithms. IEEE Trans. Neural Networks 16 (3), 645–678.
Yeh, I.-C., Lien, C., 2009. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst. Appl. 36 (2), 2473–2480.
Zahn, C.T., 1971. Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. C 20 (1), 68–86.
Zgurovsky, M.Z., Zaychenko, Y.P., 2017. The Fundamentals of Computational Intelligence: System Approach, Studies in Computational Intelligence. Springer, Cham.
Zhang, T., Ramakrishnan, R., Livny, M., 1996. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data - SIGMOD '96. ACM, New York, pp. 103–114.
Zhou, K., Fu, C., Yang, S., 2014. Fuzziness parameter selection in fuzzy c-means: the perspective of cluster validation. Sci. China Inf. Sci. 57 (11), 1–8.
Zhou, K., Yang, S., Shao, Z., 2017. Household monthly electricity consumption pattern mining: a fuzzy clustering-based model and a case study. J. Clean. Prod. 141, 900–908.
