
International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 2, Issue 12, December - 2015. ISSN 2348 4853, Impact Factor 1.317

Learning to Rank Methods for Information Retrieval and Natural Language Processing
Vajenti Mala*, D.K.Lobiyal
School of Computer & Systems Sciences, JNU, New Delhi
er.vajenti@gmail.com* ,lobiyal@gmail.com
ABSTRACT
The most significant way to access useful information is to use information retrieval (IR)
systems. An IR system takes a user's query as input and returns a set of documents relevant to the
query as output. The relevance of the output documents is determined by using learning to rank
techniques. In this paper, we apply different learning to rank methods and evaluate their
performance. The results of merging different ranking algorithms are analyzed to compare the
effectiveness of automatic ranking. By comparing the results of the Rank Position and Borda Count
ranking techniques, it is observed that the latter performs better. Finally, the paper closes with
some general issues, key elements and applications of learning to rank strategies.
Index Terms : Information Retrieval, Natural Language Processing, Search Engine, Merging, MetaSearch Engine, Data Fusion

I. INTRODUCTION

The growing volume of unstructured data on the web makes it difficult and challenging for web users to
extract useful information. Therefore, a good information retrieval system is essential for the user to
obtain effective and efficient results. Ranking is the most significant part of accessing useful
information with information retrieval (IR) and natural language processing (NLP). Ranking is the central
problem in several IR and NLP tasks, including text retrieval, entity search, meta-search, personalized
search, text summarization and question answering. In this paper we focus on text retrieval since,
according to the report by IProspect, 56% of web users use web search every day and 88% use it every
week [2]. Ranking therefore plays a key role in determining the relevance of the retrieved text.

This paper is organized as follows. In the next section, we review related work on learning to rank
methods for information retrieval. In section 3, we discuss the learning (data fusion) techniques.
In sections 4 and 5, we explain the methods that are implemented on the architecture based on the
learning to rank methodology. In sections 6 and 7, the simulation and the applications of the learning
methodology are discussed. In the last section, we conclude the work presented in the paper and give
future research directions.

34 | 2015, IJAFRC All Rights Reserved

www.ijafrc.org


II. RELATED WORKS


Various related works have been carried out in the area of search engines. A good search engine selector
identifies, among different search engines, those that are potentially useful for a given user query. The
main objective is to send the query only to the search engines chosen by the selector, so as to obtain
effective and efficient results according to the user's demand. In [13], W. Meng discussed a
learning-based approach to determine the ranked list in document selection, in order to improve the
efficiency of the results. In [14], a static approach is applied to select appropriate search engines by
using the frequency of each term/keyword as well as the documents from each search engine.
Jiang et al. [15] developed two ranking approaches, in place of classification approaches, to improve
the performance of ranking.
III. LEARNING TO RANK SYSTEM FOR INFORMATION RETRIEVAL
Ranking systems are widely used in information retrieval. Recently, a new field of ranking called
learning to rank has emerged in the literature. It builds on the areas of machine learning, information
retrieval and natural language processing.
A. Representation of Ranking of Documents
Document retrieval covers different retrieval settings, including web search and desktop search.
Document retrieval can be represented as in Fig: 1, in which ranking plays a vital role.

[Figure: the query Qn and the document database Dn = {D1, D2, ..., DN} are input to the retrieval
system, which ranks the documents by relevance and outputs the ranked list Dq,1, Dq,2, ..., Dq,nq.]

Fig: 1 Document Retrieval System
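
The retrieval step in Fig: 1 can be illustrated with a minimal Python sketch. The scoring function `term_overlap`, the function `rank_documents` and the three-document collection are hypothetical and serve only to illustrate the idea; a learning to rank system would replace the scoring function with a learned ranking model.

```python
from typing import Callable, Dict, List


def term_overlap(query: str, document: str) -> float:
    """Toy relevance score (an assumption): number of distinct query terms in the document."""
    query_terms = set(query.lower().split())
    document_terms = set(document.lower().split())
    return float(len(query_terms & document_terms))


def rank_documents(
    query: str,
    documents: Dict[str, str],
    score: Callable[[str, str], float] = term_overlap,
) -> List[str]:
    """Return document ids ordered from most to least relevant to the query."""
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(documents, key=lambda doc_id: score(query, documents[doc_id]), reverse=True)


# Hypothetical collection of three short documents.
collection = {
    "D1": "symptoms of dengue fever",
    "D2": "root canal treatment",
    "D3": "dengue fever treatment and prevention",
}
ranked = rank_documents("dengue treatment", collection)
print(ranked)  # ['D3', 'D1', 'D2']
```

D3 matches both query terms and is ranked first; D1 and D2 each match one term and keep their input order.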


IV. METHODS OF LEARNING TO RANK
A. Borda Rank

In 1770, Jean-Charles de Borda proposed a voting-based data fusion method, now known as Borda Count [11].
It is an unsupervised method for rank aggregation and can be applied in a metasearch engine. Borda Count
ranks the documents based on their positions in the basic rankings: if a document is ranked high in the
basic rankings, it is also ranked high in the final ranking list.
The scores of the documents in the final ranking list can be calculated as

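$$S_{RD} = F(\Sigma) = \sum_{i=1}^{k} S_i \quad (1)$$

$$S_i = \begin{pmatrix} S_{i,1} \\ \vdots \\ S_{i,j} \\ \vdots \\ S_{i,n} \end{pmatrix} \quad (2)$$

where $S_{i,j} = n - \sigma_i(j)$ is the number of documents ranked behind document j in basic ranking
$\sigma_i$, $\sigma_i(j)$ denotes the rank of document j in basic ranking $\sigma_i$, n denotes the
number of documents, and k denotes the number of basic rankings.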
For example, the documents a, b and c can be ranked by three basic rankings, 1, 2 and 3, as follows.

Basic ranking 1: a, b, c
Basic ranking 2: a, c, b
Basic ranking 3: b, a, c

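The ranking scores S_RD of the documents, with rows corresponding to documents c, b and a respectively,
are as below.

$$S_{RD} = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix} \quad (3)$$

The final ranking list created by Borda Count is based on the scores S_RD. Listing the documents in
increasing order of score gives

$$S_{RD} = \begin{pmatrix} c \\ b \\ a \end{pmatrix} \quad (4)$$

so document a, with the highest score, is ranked first, followed by b and then c.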
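
The Borda Count computation for the example with documents a, b and c can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the experiments; the function names `borda_scores` and `borda_ranking` are hypothetical, and the sketch assumes that every basic ranking contains the same set of documents.

```python
from typing import Dict, List, Sequence


def borda_scores(basic_rankings: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Sum, over all basic rankings, the number of documents ranked behind each document."""
    scores: Dict[str, int] = {}
    for ranking in basic_rankings:
        n = len(ranking)
        for rank, doc in enumerate(ranking, start=1):
            # A document at rank r in a list of n documents has n - r documents behind it.
            scores[doc] = scores.get(doc, 0) + (n - rank)
    return scores


def borda_ranking(basic_rankings: Sequence[Sequence[str]]) -> List[str]:
    """Order documents by decreasing Borda score (ties broken alphabetically, an assumption)."""
    scores = borda_scores(basic_rankings)
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# The three basic rankings of the example.
rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
scores = borda_scores(rankings)
final = borda_ranking(rankings)
print(scores)  # {'a': 5, 'b': 3, 'c': 1}
print(final)   # ['a', 'b', 'c']
```

The scores 5, 3 and 1 agree with the totals of the worked example, and document a comes first in the fused list.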

B. Reciprocal Rank

In this approach, the retrieved documents are merged into a unified list and ranked. The method is known
as the rank position or reciprocal rank method. The rank score of document l is computed from the
position of this document in all of the systems (m = 1, ..., n):

$$SR(D_l) = \frac{1}{\sum_{m=1}^{n} \frac{1}{\mathrm{position}(d_{lm})}} \quad (5)$$

First, the method calculates the rank position score of every document; it then combines the documents
by using these rank position

scores. The documents can be sorted in ascending or descending order of score. The top-ranked documents
are known as pseudorels. In our experiment, we used two data fusion techniques for determining the
pseudorels: Rank Position and Borda Count.
Data fusion methods accept several ranked lists and merge them, with the aim of providing better
performance than any individual system used in the fusion [10].
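
A minimal Python sketch of the rank position score of equation (5) is given below. The function names are hypothetical. Two assumptions are made that the text does not fix: a document that a system does not return contributes nothing to the sum, and the fused list is sorted in ascending order of score, since under equation (5) a smaller score corresponds to positions nearer the top.

```python
from typing import Dict, List, Sequence


def rank_position_scores(system_rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Score each document as 1 / sum over systems of 1 / position (positions start at 1)."""
    reciprocal_sums: Dict[str, float] = {}
    for ranking in system_rankings:
        for position, doc in enumerate(ranking, start=1):
            reciprocal_sums[doc] = reciprocal_sums.get(doc, 0.0) + 1.0 / position
    return {doc: 1.0 / total for doc, total in reciprocal_sums.items()}


def rank_position_ranking(system_rankings: Sequence[Sequence[str]]) -> List[str]:
    """Order documents by ascending rank position score (assumption: smaller is better)."""
    scores = rank_position_scores(system_rankings)
    return sorted(scores, key=lambda doc: (scores[doc], doc))


# Same three ranked lists as in the Borda Count example.
systems = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
rp_scores = rank_position_scores(systems)
rp_final = rank_position_ranking(systems)
print(rp_final)  # ['a', 'b', 'c']
```

For document a the sum of reciprocal positions is 1 + 1 + 1/2 = 2.5, which gives a score of 0.4; b and c obtain 6/11 and 6/7, so this small example yields the same order as Borda Count.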
V. ARCHITECTURE OF LEARNING TO RANK FOR IR SYSTEM
Learning to rank is applied here in a metasearch engine (MSE) system, which sends the user's query to
several search engines and aggregates the results obtained from them.
[Figure: components of a metasearch engine: User, Meta-Search Query Interface, Query Dispatcher, Search
Engines, Information Extractor (Record Collector, Result Collector), Result Merger, DBMS, Eliminator
(Duplicate Records), Result Ranker.]
Fig: 2. Standard Architecture of MSEs

User Query Interface: The user sends queries to the metasearch engine, with options for four types of
search and for the search engines to be used.
Query Dispatcher: It generates the actual query for each search engine according to the user query.
Information Extraction and Result Merger:
Information Extraction (IE) is a very important component for extracting the results. It holds the
data records gathered by the collector component. The result merger merges the documents retrieved
from the specific search engines selected by the user and combines them into a single ranked list.
These documents are arranged in descending order of their global


similarity. The top documents, which have the highest global similarity in the ranked list, are
returned to the user interface.
Display: It generates the results page from the replies received, and includes ranking, parsing
and clustering of the search results.
Personalization/Knowledge: It involves weighting of the search results, the queries and the search
engines for each user.

A. Architecture based Example


Example: Query: Tourist places in India. Web pages: results obtained from Yahoo, Google and MSN.
Objective: To merge the ranked lists of web pages obtained from the individual search engines and to
compare the data fusion techniques in an automatic ranking system.
Input: 1. User query.
2. Ranked lists of web pages provided by the different search engines for the specific query.

Output: A final ranked list of web pages using the Rank Position and Borda Count techniques.
Methods:
Step 1. Submit a query to different databases or search engines, such as Yahoo, Google and MSN, to
obtain the results.
Step 2. Take the top k documents from each individual search engine.
Step 3. Take the union of the top k documents obtained from the individual engines and remove the
duplicates, so that only unique web pages remain.
Here k, the number of documents, is 20, and the queries are those given in the query list.
The top k results by Yahoo, Google and MSN are shown in Table 2.
Step 4. Apply the two data fusion techniques and compare their performance in automatic ranking.
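
Steps 2 and 3 of the method can be sketched in Python as follows. This is an illustrative sketch only: the function `merge_top_k`, the placeholder URLs and the choice to treat two results as duplicates when their URL strings are identical are assumptions, not details taken from the experiment.

```python
from typing import Dict, List


def merge_top_k(results_by_engine: Dict[str, List[str]], k: int = 20) -> List[str]:
    """Union of the top k results of every engine, with duplicates removed."""
    merged: List[str] = []
    seen = set()
    for results in results_by_engine.values():
        for url in results[:k]:  # Step 2: keep only the top k results of this engine
            if url not in seen:  # Step 3: skip pages already contributed by another engine
                seen.add(url)
                merged.append(url)
    return merged


# Placeholder result lists; real lists would come from the search engines.
results = {
    "Yahoo": ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    "Google": ["http://example.com/b", "http://example.com/d", "http://example.com/a"],
    "MSN": ["http://example.com/e", "http://example.com/a", "http://example.com/f"],
}
unique_pages = merge_top_k(results, k=2)
print(unique_pages)
# ['http://example.com/a', 'http://example.com/b', 'http://example.com/d', 'http://example.com/e']
```

The resulting list of unique pages is the set of candidates that the data fusion techniques of Step 4 then score and rank.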

VI. EXPERIMENTAL SIMULATION


The simulation is implemented in C++ and MATLAB 2010b. In these experiments we assume that there
are three search engines: Yahoo, Google and MSN. We take the top 20 documents for each query in the
query list that are relevant to the user requirement. The results of the different search engines
after removal of the duplicates are ranked as shown in Figure 1. In Figure 4 we compare the two
techniques to show which search engine gets the topmost results.


Fig: 2 Precision ratios of Three Web Search Engines

Fig: 3 Score of Rank position and Borda Count Method for three different SE

Table I. Natural Language Query List of Medical Documents

Query no.  Query/Keyword   Query no.  Query/Keyword
Q1         Cancer          Q6         Brain Tumor
Q2         Eye Diseases    Q7         Animal Disease
Q3         Blood Sugar     Q8         Heart attack
Q4         Breast Cancer   Q9         X-Ray
Q5         Dengue          Q10        Root Canal

Learning to rank has a wide variety of applications in IR and NLP. Most of them are document
retrieval tasks such as expert search, meta search, personalized search, online advertisement
search, question answering, key phrase extraction, document summarization, and machine
translation.

A. Searching on Web
Web search is the most common application of learning to rank, whose learned functions are also known
as ranking models. It can be used for different problems such as web search, personalized search,
federated search and online advertisement search.

B. Filtering or Recommender System


This application is basically used for examining the ratings of items and producing ranked lists of
items. It can be formalized as a classification problem, because users give rankings to the items
that they are more likely to prefer.

C. Key Phrase Extraction


Key phrase extraction can be formalized as a classification problem and solved with methods such as
decision trees and Naïve Bayes. Jiang et al. [15] have instead presented key phrase extraction as a
ranking problem rather than a classification problem. This application can be viewed as the inverse
problem of document retrieval.

D. Query Dependent Summarization


It is important for the user to be able to search the results and obtain summaries of the documents.
Summarization is important since the retrieved documents may or may not be relevant. This problem is
referred to as query-dependent summarization.

E. Machine Translation
Re-ranking is a typical ranking problem in machine translation, and the re-ranking approach has many
advantages. A machine translation system first produces many candidate translations by using a
generative model. With the help of a discriminative model, the candidate translations can then be
re-ranked. Through re-ranking, the accuracy of translation may be enhanced by using the discriminative
model for the final translation selection. Further, through re-ranking the efficiency of translation
may also be improved.

VII. CONCLUSION AND FUTURE WORK

Information retrieval and natural language processing are very important for retrieving results in a
more effective and efficient manner. We have presented different ranking methods that are applied in
learning to rank for IR. A metasearch engine aggregates the results obtained from different search
engines. The main issues of a Meta Search Engine (MSE) arise in the selection of databases and
documents and in result merging. In this paper, we have shown how to combine the search results from
multiple search engines and obtain them as a single ranked list. We have experimented with the Rank
Position (Reciprocal Ranking) and Borda ranking techniques for retrieval system relevance judgment and
compared the results of both. Our results indicate that natural language search systems can have
practical use in society and that the algorithm can be used to improve the precision of retrieving
medical documents. In future, other text mining techniques can be used for large amounts of text data.
VIII. REFERENCES
[1] Lui, Bing, Web Data Mining. ACM Computing Classification, (1998).

[2] Meng, W., Yu, C. and Liu, K., Building Effective and Efficient Metasearch Engine, ACM Computing
Surveys, Vol. 34, No. 1, (March 2002).

[3] W. Meng, Metasearch Engines, Department of Computer Science, State University of New York at
Binghamton, 2008.

[4] H. Jadidoleslamy, Search Result Merging and Ranking Strategies in Meta-Search Engine: A Survey,
International Journal of Computer Science Issues, Vol. 9, Issue 4, No. 3, (July 2012).

[5] A. Gulli and A. Signorini, Building an Open Source Meta Search Engine, University of Pisa,
Informatica, May 2005.

[6] J. E. Glover, S. Lawrence, P. W. Birmingham, and C. Giles, Architecture of a Metasearch Engine that
Supports User Information Needs, NEC Research Institute, Artificial Intelligence Laboratory,
University of Michigan, In ACM, 1999.

[7] Y. Lu, W. Meng, L. Shu, C. Yu, and K. Liu, Evaluation of Result Merging Strategies for Metasearch
Engines, 6th International Conference on Web Information Systems Engineering (WISE Conference),
New York, 2005.

[8] H. Zhao, W. Meng, Z. Wu, V. Raghavan, and C. Yu, Fully Automatic Wrapper Generation for Search
Engines, World Wide Web Conference, Chiba, Japan, 2005.

[9] S. Souldatos, T. Dalamagas, and T. Sellis, Captain Nemo: A MetaSearch Engine with Personalized
Hierarchical Search Space, School of Electrical and Computer Engineering, National Technical
University of Athens, November 2005.

[10] Croft, W., Combining Approaches to Information Retrieval. In Advances in Information Retrieval:
Recent Research from the Center for Intelligent Information Retrieval, edited by W. Bruce Croft.
Kluwer Academic Publishers (2000), pp. 1-36.

[11] L. Akritidis, D. Katsaros, and P. Bozanis, Effective Ranking Fusion Methods for Personalized
Metasearch Engines, Panhellenic Conference on Informatics (IEEE), 2008.

[12] S. M. Mahabhashy and P. Singgitham, Tadpole: A Metasearch Engine Evaluation of Meta Search
Ranking Strategies, University of Stanford, 2004.

[13] Yuwono, B. and Lee, D., Search and Ranking Algorithm for Locating Resources on the World Wide
Web. In Proceedings of the 5th International Conference on Data Engineering, pp. 164-177, (1996).

[14] Hang Li, Learning to Rank for Information Retrieval and Natural Language Processing.
Morgan & Claypool, 2011.

[15] Jiang, X., Yunhua Hu, Hang Li, A Ranking Approach to Keyphrase Extraction. SIGIR'09, July 19-23,
2009, Boston, Massachusetts, USA.