2. SOLR Introduction
Why do we need a Search Engine?
What is Lucene/SOLR?
Advantages of SOLR
SOLR Architecture
Query Syntax
Working with SOLR: Feed data, query data
SOLR installation
SOLR configuration
3. Why do we need a Search Engine?
Google, Bing, Yahoo, …: cannot access our data
Database: yes, that's the normal way, but the problem is response time
So we need a search engine:
Lucene / SOLR
4. What is Lucene/SOLR?
Lucene
Apache Lucene is a free/open source information retrieval
software library.
Lucene is only an indexing and search library, not a complete application
Lucene is written in Java; ports exist for Delphi, Perl, C#, C++, Python, Ruby, and PHP
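To make "indexing and search library" concrete, here is a minimal sketch of the core idea, an inverted index that maps each term to the documents containing it. The class TinyIndex and its methods are hypothetical names for illustration; this is not Lucene's API and ignores analysis, scoring and storage.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: maps each lowercased term to the ids of the documents containing it.
// Illustration only; this is not Lucene's API or its index format.
class TinyIndex {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Split the text on non-alphanumeric characters and record docId under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the ids of the documents containing the term (empty set if none).
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(1, "Apache Lucene is a search library");
        index.add(2, "Solr is built on Lucene");
        System.out.println(index.search("lucene")); // [1, 2]
        System.out.println(index.search("solr"));   // [2]
    }
}
```

Looking up a term is a single map access, which is why an index answers text queries faster than scanning every row.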
5. What is Lucene/SOLR?
Solr
Solr is a search server built on top of Lucene (Java)
Solr is a web application (WAR) which can be deployed in any
servlet container, e.g. Jetty, Tomcat
Solr exposes its features as a REST-like HTTP service
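Because Solr is reached over HTTP, a query is just a URL. The sketch below builds such a URL with the standard library only. The base address and the /select path are assumptions about the deployment, and SelectUrl is a hypothetical helper; q, rows and wt are standard Solr request parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SelectUrl {
    // Assumed base URL; host, port and path depend on how Solr is deployed.
    static final String BASE = "http://localhost:8080/solr/select";

    // Build a search URL: q is the query, rows the page size, wt the response format.
    static String build(String query, int rows) {
        try {
            return BASE + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&rows=" + rows + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://localhost:8080/solr/select?q=title%3A%22foo+bar%22&rows=10&wt=json
        System.out.println(build("title:\"foo bar\"", 10));
    }
}
```

Opening the printed URL in a browser or with any HTTP client returns the matching documents.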
6. SOLR Introduction
Advantages of SOLR
Open source/free
Administration Interface
Rich Document Parsing and Indexing (PDF, Word, HTML, etc)
Full-Text Search
Faceted Search and Filtering
Multi Server support
The comparison of Search Engines:
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
9. Query Syntax
Keyword matching
title:foo - Search for word "foo" in the title field.
title:"foo bar" - Search for the phrase "foo bar" in the title field.
-title:bar - Match documents that do not contain "bar" in the title field.
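The keyword forms above can be generated from code. This is a minimal sketch with a hypothetical FieldQuery helper. It assumes single-word values contain no query special characters, and it only escapes backslashes and double quotes inside phrases.

```java
class FieldQuery {
    // Build field:value; a value containing a space is quoted so it matches as a phrase.
    static String match(String field, String value) {
        if (value.contains(" ")) {
            String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
            return field + ":\"" + escaped + "\"";
        }
        return field + ":" + value;
    }

    // Prefix the clause with '-' to exclude documents that match it.
    static String exclude(String field, String value) {
        return "-" + match(field, value);
    }

    public static void main(String[] args) {
        System.out.println(match("title", "foo"));     // title:foo
        System.out.println(match("title", "foo bar")); // title:"foo bar"
        System.out.println(exclude("title", "bar"));   // -title:bar
    }
}
```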
10. Query Syntax
Wildcard matching
title:foo* - Search for any word that starts with "foo" in the title field.
title:foo*bar - Search for any word that starts with "foo" and ends with "bar" in the title field.
*:* - Match all documents
11. Query Syntax
Proximity matching
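"foo bar"~number - Match "foo" and "bar" when they are within number position moves of the phrase
number = 0: exact phrase match
number = 1: one other word may sit between them, e.g. "foo baz bar"
number = 2: the words may also be swapped, i.e. "bar foo"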
12. Query Syntax
Range searches
field:[a TO z] - Match documents whose field value is in the range a to z (inclusive)
field:[* TO 100] - Search all values less than or equal to 100
field:[100 TO *] - Search all values greater than or equal to 100
field:[* TO *] - Match all documents that have a value in the field
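The range forms above follow one pattern, with * standing for an open end. A minimal sketch, assuming a hypothetical RangeQuery helper and a hypothetical field named price, where a null bound means "open":

```java
class RangeQuery {
    // Build field:[from TO to]; a null bound is rendered as '*' (open-ended).
    static String range(String field, Object from, Object to) {
        return field + ":[" + (from == null ? "*" : from)
                + " TO " + (to == null ? "*" : to) + "]";
    }

    public static void main(String[] args) {
        System.out.println(range("price", null, 100));  // price:[* TO 100]
        System.out.println(range("price", 100, null));  // price:[100 TO *]
        System.out.println(range("price", null, null)); // price:[* TO *]
    }
}
```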
16. SolrJ
Feed data
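import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// make a connection to the Solr server (SolrJ 4.x API; the URL depends on the deployment)
SolrServer server = new HttpSolrServer("http://localhost:8080/solr/");
// prepare a document
final SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", 1);
doc1.addField("firstName", "First Name");
doc1.addField("lastName", "Last Name");
final Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
docs.add(doc1);
// add the docs to Solr, then commit so they become searchable
// (both calls can throw SolrServerException and IOException)
server.add(docs);
server.commit();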
17. SolrJ
Query data
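import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// match all documents, sorted by firstName ascending
final SolrQuery query = new SolrQuery();
query.setQuery("*:*");
query.addSortField("firstName", SolrQuery.ORDER.asc);
// run the query on the server connection created in the feed example
final QueryResponse rsp = server.query(query);
final SolrDocumentList solrDocumentList = rsp.getResults();
for (final SolrDocument doc : solrDocumentList) {
    final String firstName = (String) doc.getFieldValue("firstName");
    // String.valueOf avoids a ClassCastException if id is not stored as a string
    final String id = String.valueOf(doc.getFieldValue("id"));
    System.out.println(id + ": " + firstName);
}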
19. SOLR Introduction
• Extract solr-4.2.1.zip to D:\Project\solr_web\solr-4.2.1
• Copy resource\solr-4.2.1\example\solr to D:\Project\solr_web\solr (this is SOLR_HOME)
• Copy resource\solr-4.2.1\dist\solr-4.2.1.war to SOLR_HOME and rename it to solr.war
• Open SOLR_HOME\collection1\conf\solrconfig.xml and modify the <dataDir>:
<dataDir>${solr.data.dir:D:/Project/solr_web/solr/collection1/data}</dataDir>
• Create a Tomcat Context (solr.xml) file like this:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="D:/Project/solr_web/solr/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String"
value="D:/Project/solr_web/solr" override="true"/>
</Context>
• Copy this file (solr.xml) to tomcat.7.0.35\conf\Catalina\localhost
• Start Tomcat
• Open the SOLR dashboard at http://localhost:8080/solr/#/
20. SOLR Introduction
SOLR Configuration
Ref:
http://wiki.apache.org/solr/SolrConfigXml
http://wiki.apache.org/solr/SchemaXml
In the configuration of a Solr server, we need at least 2 xml files:
solrconfig.xml and schema.xml
solrconfig.xml: contains the common configuration of a core: memory size, data path, transactions, …
schema.xml: contains the data definitions: structure, data types, field names, …
21. SOLR Introduction
SOLR Configuration
Schema.xml
field: a field that Solr will index
<field name="firstName" type="string" indexed="true" stored="true"/>
dynamicField: like a field, but the name is given as a pattern instead of a fixed name
<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
name="*_i" will match any field name ending in _i (like myid_i, z_i)
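The pattern matching described above can be sketched in a few lines. DynamicFieldPattern is a hypothetical helper, not Solr code. It assumes the pattern has a single * at its start or end, which is the form shown in the example.

```java
class DynamicFieldPattern {
    // Return true if fieldName matches a pattern with one '*' at the start or the end.
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName); // no wildcard: exact name
    }

    public static void main(String[] args) {
        System.out.println(matches("*_i", "myid_i"));    // true
        System.out.println(matches("*_i", "z_i"));       // true
        System.out.println(matches("*_i", "firstName")); // false
    }
}
```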
Editor's notes
http://www.solrtutorial.com/
Full-Text Search: In text retrieval, full-text search refers to techniques for searching a single computer-stored document or a collection in a full-text database. Full-text search is distinguished from searches based on metadata or on parts of the original texts represented in databases (such as titles, abstracts, selected sections, or bibliographical references). In a full-text search, the search engine examines all of the words in every stored document as it tries to match search criteria (text specified by a user).
Faceted Search: Faceted search is the dynamic clustering of items or search results into categories that let users drill into search results (or even skip searching entirely) by any value in any field. Each facet displayed also shows the number of hits within the search that match that category. Users can then "drill down" by applying specific constraints to the search results. Faceted search is also called faceted browsing, faceted navigation, guided navigation and sometimes parametric search. Faceted search provides an effective way to allow users to refine search results, continually drilling down until the desired items are found.
Example for Faceted Search: On a page that sells computers, we normally have a panel to select the manufacturer (Sony, IBM, …) and then we search for the appropriate product. Faceted search does the opposite: starting from a query, we show the manufacturers that match, and the user can continue searching based on the current results.
Ref: http://searchhub.org/2009/09/02/faceted-search-with-solr/
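The facet counts described in the note can be sketched as follows. This is an in-memory illustration with a hypothetical FacetCount helper and made-up sample documents, not how Solr computes facets internally: given the hits of a query, count how many fall under each value of a field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FacetCount {
    // Count the hits per distinct value of the given field; hits without the field are skipped.
    static Map<String, Integer> facet(List<Map<String, String>> hits, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : hits) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up search hits, each with a single "manufacturer" field.
        List<Map<String, String>> hits = new ArrayList<>();
        hits.add(Collections.singletonMap("manufacturer", "sony"));
        hits.add(Collections.singletonMap("manufacturer", "ibm"));
        hits.add(Collections.singletonMap("manufacturer", "sony"));
        System.out.println(facet(hits, "manufacturer")); // {ibm=1, sony=2}
    }
}
```

Each entry in the result corresponds to one facet link shown to the user, with the count displayed next to it.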