Evolutionary algorithm based on different semantic similarity functions for synonym recognition in the biomedical domain

J.M. Chaves-González, J. Martínez-Gil

Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Article history:
Received 12 December 2011
Received in revised form 9 May 2012
Accepted 13 July 2012
Available online xxxx

Keywords:
Semantic similarity
Evolutionary computation
Semantic web
Synonym recognition
Differential evolution

Abstract

One of the most challenging problems in the semantic web field is computing the semantic similarity between different terms. The main difficulty is the lack of accurate domain-specific dictionaries for biomedical, financial, or any other specialized and dynamic field. In this article we propose a new approach which uses different existing semantic similarity methods to obtain precise results in the biomedical domain. Specifically, we have developed an evolutionary algorithm which uses the information provided by different semantic similarity metrics. Our results have been validated against a variety of biomedical datasets and different collections of similarity functions. The proposed system provides very high quality results when compared against similarity ratings provided by human experts (in terms of the Pearson correlation coefficient), surpassing the results of other relevant works previously published in the literature.

0950-7051/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.knosys.2012.07.005

Please cite this article in press as: J.M. Chaves-González, J. Martínez-Gil, Evolutionary algorithm based on different semantic similarity functions for synonym recognition in the biomedical domain, Knowl. Based Syst. (2012), http://dx.doi.org/10.1016/j.knosys.2012.07.005
Euclidean distance is not suitable for all types of input data, such as the case in which we have to compute the distance between different word meanings. For this reason, most previous works have focused on designing new semantic similarity measures.

Traditional semantic similarity techniques use some kind of dictionary in order to compute the degree of similarity between words. The problem here is that most works use general purpose resources such as WordNet [20]. However, these sources offer limited coverage of biomedical terms. For this reason, several resources have been developed in recent years to improve semantic similarity measures, for example, Medical Subject Headings (MeSH) [22] or the Unified Medical Language System (UMLS) [21]. Semantic similarity measures fall into three main categories:

- Path-based measures are based on dictionaries or thesauri. If a given word has two or more meanings in those sources, then multiple paths may exist between the two words. The problem with this method is that it relies on the notion that all links in the taxonomy are at uniform distances.
- Information content measures are based on frequency counts of concepts as they are found in the corpus text (Pedersen et al. [20]). These measures assign higher values to specific concepts (e.g. pitchfork), and lower scores to more general terms (e.g. idea).
- Feature based measures consider the similarity between terms according to their properties. In general, it is possible to estimate semantic similarity according to the number of common features. For example, these methods could be based on the relations between similar terms according to concept descriptions retrieved from dictionaries.

As was previously mentioned, the problem with traditional semantic similarity metrics is that there are no complete and up-to-date dictionaries for specific fields. If we focus on the biomedical domain, several outstanding works have been proposed in recent years. For instance, Pirró [22] proposed a new information content measure using the MeSH biomedical ontology which successfully improved existing similarity methods. However, our study improves on the results obtained in that work by using a combination of several similarity functions. Al-Mubaid and Nguyen [1] also proposed an ontology-based semantic similarity measure and applied it to the biomedical domain. This proposal is based on the path length between concept nodes and the depth of each term in the ontology hierarchy tree. Our results are also nearer to human judgments in this case. Furthermore, there are important works by Sánchez et al. [24,26] in which several semantic similarity measures based on approximating concept semantics in terms of information content are successfully presented. Once again, our technique obtains more precise results. Finally, Pedersen et al. [21] implemented and evaluated a variety of semantic similarity measures based on ontologies and terminologies found in the Unified Medical Language System (UMLS). Our results are also better in this case because we combine the information provided by different methods.

3. Differential evolution for synonym recognition

In this section we describe the problem and the proposed solution. Our approach is based on the similarity scores provided by different atomic similarity functions. The evolutionary algorithm works as a hyper-heuristic which assigns different coefficient values to the similarity scores obtained from the pool of functions included in the system. At the beginning of the process, all functions (or metrics) contribute evenly to the calculation of the semantic similarity value of a specific pair of terms. Then, the system evolves so that the functions which provide the values closest to the human experts' opinion receive the highest coefficients. Fig. 1 shows the working diagram of the proposed approach.

The differential evolution (DE) algorithm [29] was chosen among other candidates because, after a preliminary study, we concluded that DE obtained very competitive results for the problem addressed. The reason lies in how the algorithm makes the solutions evolve. Our system can be considered a hyper-heuristic (HH) which uses differential evolution, HH(DE), to assign a specific coefficient to each similarity function. These values modify the relevance of each function. Differential evolution searches for local optima by making small additions and subtractions between the members of its population (see Section 3.3). This feature fits the problem perfectly, because the algorithm works with the scores provided by the similarity functions (Fig. 1). In fact, the individual is defined as an array of floating-point values, s, where s(fx) is the coefficient which modifies the result provided by the similarity function fx. Fig. 2 illustrates the representation of the individual used in this work.

3.1. The synonym recognition problem

Given two text expressions a and b, the problem consists of providing the degree of synonymy between both expressions. However, synonym recognition usually extends beyond synonymy and also involves semantic similarity measurements. According to Bollegala et al. [3], a certain degree of semantic similarity is observed not only between synonyms (e.g. lift and elevator), but also between metonyms (e.g. car and wheel), hyponyms (e.g. leopard and cat), related words (e.g. blood and hospital), and even between antonyms (e.g. day and night).

Fig. 2. Individual representation.
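To make this representation concrete, the individual can be pictured as a plain array of floating-point coefficients, one per similarity function, and the combined similarity of a term pair as the coefficient-weighted sum of the atomic scores. The following Python sketch is illustrative only; the function and variable names are ours, not taken from the authors' implementation:

```python
def combined_similarity(s, atomic_scores):
    """Aggregate the atomic similarity scores of one term pair,
    weighted by the coefficients held in the individual s,
    where s[x] plays the role of s(fx) for similarity function fx."""
    assert len(s) == len(atomic_scores)
    return sum(w * score for w, score in zip(s, atomic_scores))

# Hypothetical example: two atomic functions scored the pair
# ("lift", "elevator") as 0.7 and 0.9; with equal initial
# coefficients (0.5 each) the combined value is close to 0.8.
individual = [0.5, 0.5]
print(combined_similarity(individual, [0.7, 0.9]))
```

With uniform initial coefficients every function contributes equally, matching the starting state described above; the DE search then reshapes the weight vector so that the most reliable functions dominate.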
Table 1
Classification of the most relevant similarity metrics.
Table 2
Optimal parameter setting.

Algorithm 1. Pseudo-code for the DE algorithm.
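Algorithm 1 summarizes the DE loop. As a rough illustration of the classical scheme of Storn and Price [29], a minimal DE/rand/1/bin loop in Python might look as follows; this is a sketch under our own assumptions, and the population size, F, CR and generation count are arbitrary placeholders, not the optimal setting of Table 2:

```python
import random

def de_optimize(fitness, dim, pop_size=20, F=0.5, CR=0.9, generations=100):
    """Minimal DE/rand/1/bin: mutate by adding the weighted difference of
    two population members to a third, apply binomial crossover, and keep
    the trial vector only if it does not worsen fitness (maximization)."""
    pop = [[random.random() for _ in range(dim)] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # three distinct members, all different from the target i
            a, b, c = random.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = random.randrange(dim)  # guarantee one mutated gene
            trial = [pop[a][j] + F * (pop[b][j] - pop[c][j])
                     if (random.random() < CR or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            f_trial = fitness(trial)
            if f_trial >= fit[i]:  # greedy replacement
                pop[i], fit[i] = trial, f_trial
    best = max(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

In our setting, `dim` would be the number of similarity functions in the pool and `fitness` the correlation of the weighted combination with the human ratings.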
Table 3
Similarity results obtained by our system (last column) against other previously published results.
Table 4
Pearson correlation between computational methods and human judgments.

Similarity function   Correlation
Resnik                0.721
Lin                   0.718
J&C                   0.718
Li                    0.707
P&S                   0.725
HH(DE)                0.732

Table 4 presents the Pearson correlation between the methods which appear in Table 3 [22] and the human expert score. As can be seen, our approach provides the best results of the study. HH(DE) surpasses the other similarity metrics because it is more robust than most existing methods.

The second step in our experimental evaluation consists of using a different set of similarity functions to tackle the same word dataset. Table 5 presents the results obtained with every metric included in the WordNet::Similarity resources [20]. Every metric used has been previously explained in Section 3.2 (Table 1). Our results appear in the last column (Table 5). Our scores are also statistically robust, since they are average results over 100 independent executions with very low standard deviations (see Section 4.1). Although there are functions which do not obtain good correlation results (Table 6), such as HSO (0.332), JCN (0.237), LIN (0.218) or vector_pairs (0.333), they have been included in our system (Fig. 1), because the higher the number of functions used, the better the results obtained (see Table 7). Even when the global correlation of a particular function is not very good, that function is still important in our system, since our hyper-heuristic modifies its coefficient so that the function provides relevant information to the global system.

Once again, we can see that our HH(DE) improves on the results provided by the basic functions because it uses a linear combination of every function (see Section 3). Although a specific function may obtain bad results, it is unlikely that every metric provides poor results.

As may be seen in Table 6, our proposal significantly improves on the rest of the metrics. In fact, the correlation value reached (0.809) is even higher than the result obtained in the previous set of experiments (0.732, Table 4), although in that case every similarity function provided better results individually. The scores provided by every similarity function are worse than in the previous experiment because WordNet is a general purpose resource which is not very appropriate for the biomedical domain [25]. Therefore, we can conclude that the higher the number of similarity functions included in our hyper-heuristic (see Fig. 1), the higher the quality of the global results. Table 7 shows how the global quality of our method improves as more basic functions are incorporated into our pool.

4.2.2. Experiments with different datasets

In this subsection we study the results provided by our system using another biomedical dataset [1]. The configuration of our system is exactly the same; therefore, we can say that the parameter setting of our HH(DE) is consistent across at least two datasets. To the best of our knowledge, there are no works in which more datasets are used.
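The quality figures quoted throughout this evaluation (e.g. 0.732 or 0.809) are Pearson correlation coefficients between the scores computed by each method and the human expert ratings. For reference, the coefficient can be computed as in this short sketch; the helper name is ours, not part of the original system:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists,
    as used to compare computed similarities with human expert ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Perfectly linearly related scores give a correlation close to 1.0,
# regardless of the scale difference between the two raters.
print(pearson([0.1, 0.4, 0.9], [0.2, 0.8, 1.8]))
```

Because Pearson correlation is invariant to linear rescaling, a similarity function does not need to reproduce the human scores exactly, only their relative ordering and spacing.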
Table 5
Similarity results obtained by our system (last column) against WordNet results.
Word pair Human expert HSO JCN WUP PATH LIN LESK RES LCH Vector_pairs Vector HH(DE)
P01 0.031 0.250 0.000 0.842 0.250 0.000 0.004 0.000 0.624 0.010 0.183 0.139
P02 0.156 0.188 0.000 0.800 0.200 0.000 0.002 0.000 0.564 0.007 0.211 0.169
P03 0.06 0.000 0.000 0.480 0.071 0.000 0.002 0.000 0.285 0.013 0.111 0.115
P04 0.156 0.125 0.092 0.720 0.250 0.524 0.010 0.000 0.436 0.080 0.326 0.113
P05 0.156 0.000 0.000 0.174 0.050 0.000 0.002 0.000 0.188 0.018 0.081 0.034
P06 0.155 0.000 0.058 0.300 0.077 0.060 0.010 0.000 0.305 0.022 0.152 0.073
P07 0.06 0.000 0.052 0.375 0.091 0.075 0.028 0.000 0.350 0.051 0.397 0.195
P08 0.031 0.000 0.000 0.609 0.100 0.000 0.002 0.000 0.376 0.011 0.121 0.134
P09 0.031 0.000 0.000 0.556 0.111 0.000 0.003 0.000 0.404 0.003 0.057 0.083
P10 0.5 0.250 0.000 0.842 0.250 0.000 0.007 0.000 0.624 0.018 0.457 0.281
P11 0.156 0.313 0.000 0.889 0.333 0.000 0.078 0.331 0.702 0.195 0.727 0.472
P12 0.406 1.000 0.000 0.900 0.333 0.000 0.019 0.619 0.702 0.221 0.097 0.199
P13 0.406 0.000 0.000 0.720 0.125 0.000 0.007 0.000 0.436 0.070 0.222 0.165
P14 0.593 0.000 0.048 0.267 0.083 0.000 0.009 0.000 0.326 0.016 0.251 0.125
P15 0.375 0.313 0.000 0.923 0.333 0.000 0.013 0.517 0.702 0.227 0.375 0.358
P16 0.5 1.000 0.044 0.182 0.053 0.000 0.105 0.612 0.202 0.041 0.396 0.435
P17 0.468 0.000 0.000 0.250 0.053 0.000 0.001 0.468 0.202 0.034 0.108 0.298
P18 0.656 0.250 0.000 0.963 0.500 0.000 0.050 0.470 0.812 0.062 0.591 0.536
P19 0.187 0.125 0.059 0.400 0.077 0.221 0.011 0.000 0.305 0.043 0.112 0.029
P20 0.437 0.000 0.108 0.571 0.100 0.471 0.051 0.601 0.376 0.074 0.329 0.450
P21 0.593 0.375 0.000 0.900 0.333 0.000 0.153 0.627 0.702 0.167 0.515 0.539
P22 0.437 0.250 0.000 0.842 0.250 0.000 0.029 0.267 0.624 0.042 0.093 0.212
P23 0.718 0.250 0.000 0.957 0.500 0.000 0.428 0.229 0.812 0.020 0.612 0.504
P24 0.75 0.000 0.000 0.720 0.125 0.000 0.038 0.595 0.436 0.034 0.116 0.446
P25 0.562 0.313 0.000 0.933 0.333 0.000 0.074 0.649 0.702 0.359 0.583 0.463
P26 0.75 0.313 0.000 0.889 0.250 0.000 0.228 0.246 0.624 0.134 0.480 0.385
P27 0.531 0.313 0.000 0.917 0.333 0.000 0.100 0.658 0.702 0.438 0.833 0.550
P28 0.625 0.000 0.000 0.636 0.111 0.000 0.029 0.000 0.404 0.125 0.309 0.164
P29 0.843 0.250 0.000 0.857 0.250 0.000 0.017 0.000 0.624 0.218 0.849 0.366
P30 0.937 0.250 1.000 0.941 0.500 1.000 0.661 1.000 0.812 0.119 0.769 0.894
P31 0.843 0.313 0.455 0.952 0.500 0.897 0.098 0.880 0.812 0.197 0.628 0.710
P32 0.875 0.313 0.402 0.947 0.500 0.861 0.152 0.861 0.812 0.050 0.419 0.562
P33 0.875 0.000 0.000 0.571 0.100 0.000 0.040 0.622 0.376 0.022 0.322 0.554
P34 0.906 1.000 0.000 1.000 1.000 0.000 1.000 0.924 1.000 0.333 1.000 0.791
P35 0.968 0.250 0.000 0.966 0.500 0.000 0.752 1.000 0.812 0.055 0.720 1.000
P36 0.875 1.000 0.000 1.000 1.000 0.000 1.000 1.000 1.000 0.500 1.000 0.720
Table 6
Pearson correlation between computational methods calculated in WordNet::Similarity and human judgments.

Table 7
Pearson correlation values for different numbers of similarity functions. HH(DE) uses the highest-quality similarity functions in each case.
Table 8
Human expert scores for another biomedical word dataset.
Table 9
Similarity results obtained by our system (last column) against the results obtained with WordNet using the second biomedical dataset.
Word pair Human expert HSO JCN WUP PATH LIN LESK RES LCH Vector_pairs Vector HH(DE)
WP01 1 1.000 0.000 1.000 1.000 0.000 1.000 0.000 1.000 0.010 0.183 1.000
WP02 0.75 0.313 0.078 0.600 0.111 0.370 0.210 0.320 0.404 0.007 0.211 0.517
WP03 0.7 0.000 0.055 0.333 0.077 0.079 0.060 0.066 0.305 0.013 0.111 0.094
WP04 0.825 1.000 0.000 1.000 1.000 0.000 0.195 0.907 1.000 0.080 0.326 0.742
WP05 0.55 0.188 0.000 0.778 0.200 0.000 0.114 0.469 0.564 0.018 0.081 0.193
WP06 0.35 0.000 0.000 0.556 0.111 0.000 0.047 0.269 0.404 0.022 0.152 0.019
WP07 0.45 0.000 0.000 0.174 0.050 0.000 0.018 0.000 0.188 0.036 0.327 0.050
WP08 0.5 0.000 0.000 0.556 0.111 0.000 0.027 0.269 0.404 0.051 0.397 0.030
WP09 0.325 0.000 0.057 0.333 0.077 0.000 0.074 0.066 0.305 0.011 0.121 0.123
WP10 0.325 0.188 0.000 0.833 0.200 0.000 0.022 1.000 0.564 0.003 0.057 0.128
WP11 0.475 0.000 0.000 0.500 0.200 0.000 0.008 0.297 0.298 0.018 0.457 0.030
WP12 0.275 0.188 0.000 0.846 0.200 0.000 0.114 0.582 0.564 0.195 0.727 0.079
WP13 0.325 0.000 0.098 0.750 0.143 0.540 0.043 0.508 0.472 0.221 0.097 0.056
WP14 0.275 0.000 0.000 0.160 0.046 0.000 0.029 0.000 0.162 0.070 0.222 0.055
WP15 0.25 0.000 0.000 0.560 0.083 0.000 0.095 0.477 0.326 0.016 0.251 0.124
WP16 0.25 0.000 0.043 0.167 0.048 0.000 0.036 0.000 0.175 0.227 0.375 0.023
WP17 0.3 0.000 0.000 0.200 0.059 0.000 0.077 0.000 0.232 0.041 0.396 0.125
WP18 0.25 0.000 0.000 0.300 0.067 0.000 0.040 0.052 0.266 0.034 0.108 0.018
WP19 0.3 0.000 0.000 0.300 0.067 0.000 0.037 0.066 0.266 0.062 0.591 0.066
WP20 0.35 0.000 0.000 0.667 0.100 0.000 0.022 0.508 0.376 0.043 0.112 0.007
WP21 0.25 0.000 0.000 0.222 0.046 0.000 0.066 0.066 0.162 0.500 1.000 0.298
WP22 0.25 0.000 0.104 0.583 0.091 0.540 0.045 0.477 0.350 0.074 0.329 0.029
WP23 0.25 0.000 0.000 0.632 0.125 0.000 0.052 0.352 0.436 0.167 0.515 0.029
WP24 0.25 0.000 0.000 0.286 0.063 0.000 0.012 0.066 0.248 0.042 0.093 0.010
WP25 0.25 0.000 0.000 0.261 0.056 0.000 0.033 0.052 0.216 0.020 0.612 0.018
WP26 0.25 0.000 0.000 0.571 0.100 0.000 0.014 0.352 0.376 0.034 0.116 0.005
WP27 0.25 0.000 0.000 0.692 0.111 0.000 0.022 0.508 0.404 0.359 0.583 0.191
WP28 0.25 0.000 0.000 0.375 0.091 0.000 0.025 0.052 0.350 0.134 0.480 0.081
WP29 0.25 0.000 0.000 0.546 0.091 0.000 0.021 0.320 0.350 0.438 0.833 0.221
WP30 0.25 0.000 0.000 0.250 0.077 0.000 0.048 0.000 0.305 0.125 0.309 0.170
Table 10 shows a wide range of scores for the correlation coefficient. This is mainly because the correlation depends not only on the strategy implemented but also on the amount and quality of the background data. For example, metrics based on vectors are often good; however, they are not successful in our case because the terms examined are not extracted from high-quality corpora. A general purpose dictionary (WordNet) does not contain many biomedical terms, so these functions are not precise in this case.
Table 10

Metrics          Correlation
HSO              0.701
JCN              0.111
WUP              0.483
PATH             0.753
LIN              0.077
LESK             0.712
RES              0.106
LCH              0.687
Vector_pairs     0.351
Vector           0.289
HH(DE)           0.885

Table 11
Pearson correlations between different approaches and the human expert opinion.

Similarity function (metric)   Correlation
Vector [21]                    0.76
LIN [21]                       0.69
J&C [21]                       0.55
RES [21]                       0.55
Path [21]                      0.48
L&C [21]                       0.47
PATH [1]                       0.818
L&C [1]                        0.833
W&P [1]                        0.778
C&K [1]                        0.702
Metrics proposed in [1]        0.836
HH(DE)                         0.885

Our HH(DE) assigns a coefficient to each metric to modify the importance of each metric within the global system and to avoid negative results.

Finally, Table 11 summarizes all the results related to the second dataset. As can be seen, our approach outperforms every other similarity function applied to the same word dataset. There are several reasons which can explain this good behaviour. For example, the approaches presented in [21] are limited by the fact that a single ontology is exploited. According to Sánchez et al. [26], these results rely completely on the coverage and completeness of the input ontology. On the other hand, the metrics proposed in [1] are closer to our results, since their strategies consist of parameters experimentally optimized for the evaluated dataset. However, even compared against those metrics, our hyper-heuristic obtains more successful results.

5. Conclusions and future work

In this work, we have presented a novel approach that surpasses existing similarity functions when dealing with datasets from the biomedical domain. The novelty of our work consists of using other similarity functions as black boxes which are smartly combined. This allows our HH(DE) to make use of the best features of every similarity function.

We also present the novel approach of applying an evolutionary algorithm to this kind of problem. Furthermore, it provides the best similarity scores (see Section 4) when compared against other relevant works published in the literature.

As future work, we propose to explore further possibilities for synonym recognition in other domains, especially those in which

References

[1] H. Al-Mubaid, H.A. Nguyen, Measuring semantic similarity between biomedical concepts within multiple ontologies, IEEE Transactions on Systems, Man, and Cybernetics, Part C 39 (4) (2009) 389–398.
[2] S. Banerjee, T. Pedersen, Extended gloss overlaps as a measure of semantic relatedness, IJCAI (2003) 805–810.
[3] D. Bollegala, Y. Matsuo, M. Ishizuka, Measuring semantic similarity between words using web search engines, in: Proceedings of the World Wide Web Conference, 2007, pp. 757–766.
[4] A. Budanitsky, G. Hirst, Evaluating WordNet-based measures of lexical semantic relatedness, Computational Linguistics 32 (1) (2006) 13–47.
[5] P.-I. Chen, S.-J. Lin, Word AdHoc Network: using Google Core Distance to extract the most relevant information, Knowledge-Based Systems 24 (3) (2011) 393–405.
[6] D. Conrath, J. Jiang, Semantic similarity based on corpus statistics and lexical taxonomy, in: Comp. Linguist Proc., Taiwan, 1997, pp. 19–33.
[7] S. Das, P.N. Suganthan, Differential evolution: a survey of the state-of-the-art, IEEE Transactions on Evolutionary Computation 15 (1) (2011) 4–31.
[8] J. Demsar, Statistical comparison of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006) 1–30.
[9] F.A. Grootjen, T.P. van der Weide, Conceptual query expansion, Data & Knowledge Engineering 56 (2) (2006) 174–193.
[10] A.Y. Halevy, A. Rajaraman, J.J. Ordille, Data integration: the teenage years, VLDB (2006) 9–16.
[11] G. Hirst, D. St-Onge, Lexical chains as representations of context for the detection and correction of malapropisms, in: C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, 1998.
[12] J. Hu, R.S. Kashi, G.T. Wilfong, Comparison and classification of documents based on layout similarity, Information Retrieval 2 (2/3) (2000) 227–243.
[13] A. Java, S. Nirenburg, M. McShane, T.W. Finin, J. English, A. Joshi, Using a natural language understanding system to generate semantic web content, International Journal on Semantic Web and Information Systems 3 (4) (2007) 50–74.
[14] E. Kaufmann, A. Bernstein, Evaluating the usability of natural language query languages and interfaces to Semantic Web knowledge bases, Journal of Web Semantics 8 (3) (2010) 377–393.
[15] C. Leacock, M. Chodorow, G.A. Miller, Using corpus statistics and WordNet relations for sense identification, Computational Linguistics 24 (1) (1998) 147–165.
[16] M. Lesk, Information in data: using the Oxford English Dictionary on a computer, SIGIR Forum 20 (1–4) (1986) 18–21.
[17] D. Lin, An information-theoretic definition of similarity, in: Int. Conf. ML Proc., San Francisco, CA, USA, 1998, pp. 296–304.
[18] C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, Massachusetts, 1999.
[19] E. Mezura-Montes, J. Velázquez-Reyes, C.A. Coello-Coello, A comparative study of differential evolution variants for global optimization, in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO '06), ACM, New York, NY, USA, 2006, pp. 485–492.
[20] T. Pedersen, S. Patwardhan, J. Michelizzi, WordNet::Similarity: measuring the relatedness of concepts, Association for the Advancement of Artificial Intelligence (2004) 1024–1025.
[21] T. Pedersen, S. Pakhomov, S. Patwardhan, C.G. Chute, Measures of semantic similarity and relatedness in the biomedical domain, Journal of Biomedical Informatics 40 (3) (2007) 288–299.
[22] G. Pirró, A semantic similarity metric combining features and intrinsic information content, Data & Knowledge Engineering 68 (11) (2009) 1289–1308.
[23] P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, IJCAI (1995) 448–453.
[24] D. Sánchez, M. Batet, D. Isern, Ontology-based information content computation, Knowledge-Based Systems 24 (2) (2011) 297–303.
[25] D. Sánchez, M. Batet, D. Isern, A. Valls, Ontology-based semantic similarity: a new feature-based approach, Expert Systems with Applications 39 (9) (2012) 7718–7728.
[26] D. Sánchez, A. Solé-Ribalta, M. Batet, F. Serratosa, Enabling semantic similarity estimation across multiple ontologies: an evaluation in the biomedical domain, Journal of Biomedical Informatics 45 (1) (2012) 141–155.
[27] N. Shadbolt, T. Berners-Lee, W. Hall, The semantic web revisited, IEEE Intelligent Systems 21 (3) (2006) 96–101.
[28] J.S. Simonoff, Smoothing Methods in Statistics, Springer, 1996.
[29] R. Storn, K. Price, Differential Evolution: A Simple and Efficient Adaptive Scheme for Global Optimization over Continuous Spaces, TR-95-012, International Computer Science Institute, Berkeley, 1995.
[30] Z. Wu, M. Palmer, Verb semantics and lexical selection, in: Assoc. Comput. Linguist Proc., Las Cruces, NM, USA, 1994, pp. 133–138.
[31] D. Zaharie, A comparative analysis of crossover variants in differential evolution, in: Proceedings of the International Multiconference on Computer Science and Information Technology, Wisla, Poland, 2007, pp. 171–181.