Today's Overview
- What is a search engine
- Indexing
- Analyzer
- What is Solr
- Config Files
Search Engine
Lucene
Solr
Searcher
What is Lucene
- Lucene is an open-source information retrieval software library.
- It is released under the Apache License.
- The present version is 3.5.
- Lucene takes text as input data and creates an index on it.
- It stores the index on hard disk (file system) or in RAM.
- You can search over the Lucene index.
- The index consists of documents, and each document holds field-value pairs.
- Fields contain classified information about the document.
What Lucene is Not
- It is not a text extraction library.
- It is not a crawler (robot).
- It is not a search server.
- It is not a text analytics library.
Terminology
Core classes and the packages they belong to:
- Analyzer: org.apache.lucene.analysis
- IndexWriter, IndexReader: org.apache.lucene.index
- Document, Field: org.apache.lucene.document
- IndexSearcher: org.apache.lucene.search
Lucene Indexing
org.apache.lucene.index.IndexWriter creates the index:
    IndexWriter writer = new IndexWriter(Directory d, Analyzer a, boolean create);
where
- d is the directory in which the index is stored,
- a is the analyzer applied to the content of the files,
- create is a boolean that indicates whether a new index is created or an existing index is extended.
Then create an instance of Document:
    Document doc = new Document();
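To make the idea of indexing concrete, here is a minimal sketch of an inverted index built with plain JDK collections. It is a toy model and not Lucene code: the class ToyIndex, its methods, and the sample field names are hypothetical, and the tokenization is only a crude stand-in for an analyzer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of indexing; hypothetical names, not part of Lucene.
class ToyIndex {
    // Stored documents: the position in the list is the document id.
    private final List<Map<String, String>> documents = new ArrayList<>();
    // Inverted index: "field:term" -> sorted ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Adds a document (field name -> text) and returns its id.
    int addDocument(Map<String, String> fields) {
        int docId = documents.size();
        documents.add(new LinkedHashMap<>(fields));
        for (Map.Entry<String, String> field : fields.entrySet()) {
            // Stand-in for an analyzer: lower-case, then split at non-letters.
            String[] terms = field.getValue().toLowerCase(Locale.ROOT).split("[^a-z]+");
            for (String term : terms) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(field.getKey() + ":" + term,
                            key -> new TreeSet<Integer>()).add(docId);
                }
            }
        }
        return docId;
    }

    // Returns the ids of the documents whose field contains the term.
    Set<Integer> lookup(String field, String term) {
        String key = field + ":" + term.toLowerCase(Locale.ROOT);
        return postings.getOrDefault(key, new TreeSet<Integer>());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(Map.of("title", "Lucene in Action", "body", "Indexing text"));
        index.addDocument(Map.of("title", "Solr Guide", "body", "Solr builds on Lucene"));
        System.out.println(index.lookup("title", "lucene")); // prints [0]
        System.out.println(index.lookup("body", "lucene"));  // prints [1]
    }
}
```

Keying the postings by field and term together mirrors the point that a document is a set of field-value pairs: the same word can match in one field and not in another.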
Analyzer
SimpleAnalyzer
Lower-cases the text and splits it at non-letter characters
StopAnalyzer
Lower-cases the text, splits it at non-letter characters, and removes stop words
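The behaviour of the two analyzers can be sketched with plain JDK code. The class below is a hypothetical stand-in that mimics the descriptions above; it does not use Lucene, and its stop-word list is a small illustrative assumption rather than Lucene's default set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the analyzers described above; not Lucene classes.
class ToyAnalyzers {
    // Illustrative stop words only (an assumption, not Lucene's default list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // SimpleAnalyzer-like: lower-case letters, start a new token at any non-letter.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // StopAnalyzer-like: the same tokenization, then stop words are dropped.
    static List<String> stopAnalyze(String text) {
        List<String> tokens = simpleAnalyze(text);
        tokens.removeIf(STOP_WORDS::contains);
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown Fox is 3 years old";
        System.out.println(simpleAnalyze(text)); // prints [the, quick, brown, fox, is, years, old]
        System.out.println(stopAnalyze(text));   // prints [quick, brown, fox, years, old]
    }
}
```

Note that the digit 3 disappears in both cases, because splitting at non-letters discards anything that is not a letter.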
Lucene Searching
Create an IndexSearcher object:
    IndexSearcher searcher = new IndexSearcher(Directory indexDir, boolean readOnly);
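What a searcher does with an index can be illustrated with a small sketch that evaluates AND and OR queries over term postings. This is a simplified model under stated assumptions: ToySearcher and its methods are hypothetical names, the postings map is filled in by hand, and no ranking is performed, unlike a real IndexSearcher.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy searcher over an inverted index; hypothetical, not Lucene's IndexSearcher.
class ToySearcher {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings;

    ToySearcher(Map<String, Set<Integer>> postings) {
        this.postings = postings;
    }

    private Set<Integer> docsFor(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptySet());
    }

    // AND query: documents that contain every term.
    Set<Integer> searchAll(String... terms) {
        Set<Integer> hits = null;
        for (String term : terms) {
            if (hits == null) {
                hits = new TreeSet<Integer>(docsFor(term));
            } else {
                hits.retainAll(docsFor(term));
            }
        }
        return hits == null ? new TreeSet<Integer>() : hits;
    }

    // OR query: documents that contain at least one term.
    Set<Integer> searchAny(String... terms) {
        Set<Integer> hits = new TreeSet<Integer>();
        for (String term : terms) {
            hits.addAll(docsFor(term));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        postings.put("lucene", new TreeSet<Integer>(Arrays.asList(0, 1)));
        postings.put("solr", new TreeSet<Integer>(Arrays.asList(1, 2)));
        ToySearcher searcher = new ToySearcher(postings);
        System.out.println(searcher.searchAll("Lucene", "Solr")); // prints [1]
        System.out.println(searcher.searchAny("lucene", "solr")); // prints [0, 1, 2]
    }
}
```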
What is Solr
- Solr is pronounced "solar". The name stands for "Searching on Lucene".
- It is a web-based indexing and searching server.
- By default, it comes bundled with the Jetty server.
- It can also be deployed in any other servlet container, such as Tomcat or Resin.
Advantages of Solr
- Solr can replicate an index across multiple servers.
- It uses REST-based web services for indexing and searching.
- Indexing and searching can be done simultaneously.
- Supports faceted searching.
- Supports result clustering.
- Supports hit highlighting.
- Supports multiple output formats (XML/XSLT and JSON).
Deploying Solr
The steps for deploying Solr in Jetty are as follows:
- Download Solr and install it.
- The root directory of Solr [e.g. D:\tools\solr] is referred to as SOLR_HOME.
- Start the Solr service by executing start.jar, located in SOLR_HOME/example/:
    java -jar start.jar
- The default port for Jetty is 8983. Once the service has started, open the URL
    http://localhost:8983/solr
  The Solr admin screen appears.
schema.xml describes the data:
- Data types and field types
- Analyzer and tokenizer used on fields
- Copy fields
- Default field
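As an illustration, a schema.xml fragment covering those items might look roughly like the following. This is a sketch in the style of the Solr 3.x example schema; the field names (id, title, text) and the choice of tokenizer and filters are assumptions made for the example, not settings taken from this presentation.

```xml
<schema name="example" version="1.4">
  <types>
    <!-- Field types: an untokenized string and an analyzed text type -->
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField">
      <!-- Analyzer: one tokenizer followed by a chain of filters -->
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <!-- Default field: searched when a query names no field -->
  <defaultSearchField>text</defaultSearchField>
  <!-- Copy field: index the title a second time inside the catch-all field -->
  <copyField source="title" dest="text"/>
</schema>
```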
Solr Indexing
In a Java project, use the SolrJ API for indexing:
- Initialize SolrServer.
- Create a document.
- Insert fields into the document.
- Add the document to the server.
- Commit the changes on the server.
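To show what "add" and "commit" amount to on the wire, the sketch below builds an XML update message of the shape accepted by Solr's update handler, using only the JDK. It is not SolrJ: the class ToySolrUpdate, its methods, and the field names id and title are hypothetical, and actually posting the message to a server is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds Solr XML update messages; not part of SolrJ.
class ToySolrUpdate {
    // Escapes the characters that are special in XML text and attribute values.
    static String escapeXml(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Wraps one document (field name -> value) in an <add> message.
    static String toAddMessage(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(field.getKey())).append("\">")
                    .append(escapeXml(field.getValue()))
                    .append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // The message that makes previously added documents visible to searches.
    static String commitMessage() {
        return "<commit/>";
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");                   // create the document and insert fields
        doc.put("title", "Lucene & Solr");
        System.out.println(toAddMessage(doc)); // the "add" step
        System.out.println(commitMessage());   // the "commit" step
    }
}
```

For the sample document, the first line printed is `<add><doc><field name="id">1</field><field name="title">Lucene &amp; Solr</field></doc></add>`.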
Solr Search
One option for searching the Solr index is SolrJ. The user defines the server, creates a query object, and sends the query to the server to fetch the response:
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    SolrServer _server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery solrquery = new SolrQuery();
    solrquery.setQuery(<enter query here>);
    QueryResponse rsp = _server.query(solrquery);
In Solr, search queries are processed by the appropriate SolrRequestHandler. Range, prefix, Boolean, and wildcard queries are allowed.
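Because Solr is queried over HTTP, a search can also be expressed as a URL. The sketch below builds such a URL for the select handler with only the JDK; the class ToySolrQueryUrl and the field names in the sample queries are hypothetical, and it assumes the default handler path /select on the server address used above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds Solr select URLs; not part of SolrJ.
class ToySolrQueryUrl {
    // q is the query string, rows the number of results, wt the response format.
    static String selectUrl(String baseUrl, String query, int rows) {
        try {
            return baseUrl + "/select?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        // Boolean query combined with a wildcard query
        System.out.println(selectUrl(base, "title:lucene AND body:index*", 10));
        // Range query
        System.out.println(selectUrl(base, "price:[10 TO 20]", 5));
    }
}
```

The first call prints `http://localhost:8983/solr/select?q=title%3Alucene+AND+body%3Aindex*&rows=10&wt=json`. The query string must be URL-encoded, since characters such as the colon, brackets, and spaces are not safe to send as-is.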
Why Solr
- Replication of the index
- Scalable and fault tolerant (depending upon the underlying infrastructure)
- Built-in faceted search capability
Resources
http://lucene.apache.org
http://lucene.apache.org/solr
http://minion.dev.java.net/
Thank You
QUESTIONS?