Apache Solr
Enterprise search platform
from the Apache Lucene project




Rivet Logic Corporation
1800 Alexander Bell Drive
Suite 400
Reston, VA 20191
Ph: 703.955.3480 Fax: 703.234.7711
What is Solr?


 ● Search Server
 ● Built upon Apache Lucene
 ● Very fast
 ● Scalable in both query load and collection size
 ● Interoperable
 ● Extensible
 ● Lucene power exposed over HTTP
 ● Spell checking, highlighting, faceting, and more
 ● Caching
 ● Replication
 ● Distributed search
How does it work?
schema.xml


● Field types
   ○ <fieldType name="text" class="solr.TextField" indexed="true" />


● Fields
   ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>
● Unique key (optional)
   ○ <uniqueKey>id</uniqueKey>
● Copy fields
   ○ <copyField source="developers" dest="df"/>
● Dynamic fields
   ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/>
● Similarity configuration
   ○ Similarity is the scoring routine for each document vs. a query
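
How a field name could be resolved against the schema can be sketched in plain Java. This is an illustration only, not Solr's code: the class and method names are hypothetical, and picking the first matching pattern in declaration order is a simplifying assumption. It assumes explicit fields win over dynamic fields and that a pattern has a single `*` at its start or end.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: resolve a field name to a type using explicit
// fields first, then dynamic-field glob patterns such as "*_dt".
final class DynamicFieldSketch {

    // True if the pattern (one '*' at the start or the end) matches the name.
    static boolean matches(String pattern, String fieldName) {
        if (pattern.equals("*")) {
            return true;
        }
        if (pattern.startsWith("*")) {
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        return pattern.equals(fieldName);
    }

    // Returns the type name for a field, or null when nothing matches.
    static String resolveType(Map<String, String> explicitFields,
                              Map<String, String> dynamicFields,
                              String fieldName) {
        String type = explicitFields.get(fieldName);
        if (type != null) {
            return type;
        }
        // Assumption: first matching pattern in declaration order.
        for (Map.Entry<String, String> e : dynamicFields.entrySet()) {
            if (matches(e.getKey(), fieldName)) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> explicit = new LinkedHashMap<>();
        explicit.put("technologies", "text");
        Map<String, String> dynamic = new LinkedHashMap<>();
        dynamic.put("*_dt", "date");

        System.out.println(resolveType(explicit, dynamic, "technologies"));
        System.out.println(resolveType(explicit, dynamic, "created_dt"));
    }
}
```

With a catch-all pattern such as `*`, every otherwise unknown field name would resolve to that pattern's type.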
solrconfig.xml

● Lucene indexing parameters
   ○ <mergeFactor>10</mergeFactor>
   ○ <ramBufferSizeMB>32</ramBufferSizeMB>
● Cache settings
   ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
● Request handler configuration
   ○ <requestHandler name="dismax" class="solr.SearchHandler" >
● HTTP cache settings
   ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">


● Search components, response writers, query parsers
   ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">


   ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
   ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
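
The idea behind a size-bounded LRU cache, as configured by the `size` attribute above, can be sketched with the JDK alone. This is not Solr's `solr.LRUCache` implementation; the class name is hypothetical and the sketch only shows least-recently-used eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of LRU eviction using an access-ordered LinkedHashMap.
final class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    LruCacheSketch(int initialSize, int maxSize) {
        // accessOrder = true: iteration order goes from least to most recently used.
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    // Called by put(); returning true evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(512, 512);
        cache.put("q=binesh", "cached result list");
        System.out.println(cache.get("q=binesh"));
    }
}
```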
Request Handler

<requestHandler name="/itas" class="solr.SearchHandler">
   <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="title">Solritas</str>

    <str name="wt">velocity</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">df</str>
    <str name="facet.mincount">1</str>
    <str name="hl">true</str>
    <str name="hl.fl">developers</str>
    <str name="qf">
       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
   </lst>
 </requestHandler>
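
Values in the `defaults` list apply only when the request does not supply the same parameter. A minimal sketch of that merge, assuming single-valued parameters for simplicity (the class and method names are hypothetical, and real Solr parameters can be multi-valued):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: request parameters override handler defaults.
final class HandlerDefaultsSketch {

    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request); // a request value replaces the default of the same name
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("wt", "velocity");
        defaults.put("defType", "dismax");
        defaults.put("rows", "10");

        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", "binesh");
        request.put("rows", "5");

        System.out.println(effectiveParams(defaults, request));
    }
}
```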
Response Writer


● A Response Writer generates the formatted response of
  a search.
● The wt parameter selects the Response Writer to be
  used
● Available writers include json, php, phps, python, ruby, xml, xslt, velocity

  <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
   <int name="xsltCacheLifetimeSeconds">5</int>
  </queryResponseWriter>
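
Selecting a writer is just a matter of adding `wt` to the query string. A small sketch of building such a URL with the JDK only; the class name is hypothetical and the base URL assumes the example server on localhost:8983.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build a select URL whose wt parameter picks the Response Writer.
final class SelectUrlSketch {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Appends URL-encoded parameters to the base URL in insertion order.
    static String build(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(separator)
              .append(encode(e.getKey()))
              .append('=')
              .append(encode(e.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("wt", "json"); // ask for the JSON Response Writer
        System.out.println(build("http://localhost:8983/solr/select", params));
    }
}
```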
Analyzers, Tokenizers, Filters

● The Analyzer class is a native Lucene concept that determines
  how tokens are produced from a piece of text
   <fieldType name="nametext" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
   </fieldType>

● The job of a tokenizer is to break up a stream of text into
  tokens
● A filter looks at each token in the stream sequentially
  and decides whether to pass it along, replace it, or discard
  it
    <fieldType name="text" class="solr.TextField">
       <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StandardFilterFactory"/>
       </analyzer>
    </fieldType>
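
The tokenizer-then-filters chain can be illustrated without Lucene. The sketch below is plain Java, not the Lucene analysis API; the class name, the whitespace splitting, and the stop word list are assumptions chosen for the example. One filter replaces tokens (lowercasing) and one discards them (stop words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an analysis chain: one tokenizer followed by two filters.
final class AnalysisChainSketch {

    // Tokenizer: breaks the text into tokens on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter that replaces each token with its lowercase form.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter that discards stop words and passes everything else along.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    static List<String> analyze(String text, Set<String> stopWords) {
        return removeStopWords(lowerCase(tokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        System.out.println(analyze("The Power of Lucene", stopWords));
    }
}
```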
Other features

● Highlighting
   ○ &hl=true&hl.fl=developers
● Synonyms
   ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"
    expand="true"/>
● Spell check
   ○ The spell check component can return a list of alternative spelling
      suggestions.
   ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
● Content Streams
   ○ Let the Solr server fetch local or remote data itself. Remote streaming must be enabled in
    solrconfig.xml

● Solr Cell
   ○ Uses Apache Tika to extract and index rich documents such as Word, PDF, HTML, and many
    other types

● More like this
   ○ http://wiki.apache.org/solr/MoreLikeThis
Indexing with SolrJ



SolrServer solr =
        new CommonsHttpSolrServer(
                    new URL("http://localhost:8983/solr"));
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);
solr.commit(); // after a batch, not per document
solr.optimize(); // periodically, if/when needed
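
For comparison, the same document can be expressed in Solr's XML update format (`<add><doc><field name="...">`), the kind of message posted to the update handler by the example `post.jar` tool. The helper below is a hypothetical, dependency-free sketch that only builds the message; sending it and issuing `<commit/>` afterwards is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: build an XML <add> message for a single document.
final class UpdateXmlSketch {

    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    static String addDocument(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "EXAMPLEDOC01");
        doc.put("title", "NOVAJUG SolrJ Example");
        System.out.println(addDocument(doc));
    }
}
```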
Data Import Handler

● Indexes relational databases, XML data, and e-mail
  sources
● Supports full and incremental/delta indexing
● Highly extensible with custom data sources,
  transformers, etc.
● http://wiki.apache.org/solr/DataImportHandler
Replication


● The master is polled by its replicas
● Each replica pulls the Lucene index and, optionally, the Solr
  configuration files
● Query throughput scaling: replicate and load balance
● http://wiki.apache.org/solr/SolrReplication
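
"Replicate and load balance" usually means a client or proxy spreads queries across the replicas. A minimal round-robin sketch, assuming a fixed list of replica URLs; the class name and URLs are hypothetical, and a real setup would more likely use an HTTP load balancer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: rotate queries across query replicas in round-robin order.
final class RoundRobinSketch {
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> replicas) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("at least one replica is required");
        }
        this.replicas = new ArrayList<>(replicas);
    }

    // Returns the replica that should serve the next query.
    String pick() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch balancer = new RoundRobinSketch(Arrays.asList(
                "http://replica1:8983/solr",   // assumed host names
                "http://replica2:8983/solr"));
        for (int i = 0; i < 4; i++) {
            System.out.println(balancer.pick() + "/select?q=binesh");
        }
    }
}
```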
Demo

● Download Solr
   ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/
● Start Solr
   ○ cd <solr_home>/example
   ○ java -jar start.jar
● Post documents
   ○ cd <solr_home>/example/exampledocs
   ○ java -jar post.jar *.xml
   ○ java -jar post.jar cw.xml
● Access Solr
   ○ http://localhost:8983/solr/admin/
● Query Solr
   ○ http://localhost:8983/solr/select/?q=binesh
   ○ http://localhost:8983/solr/select/?q=binny
   ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
   ○ http://localhost:8983/solr/itas/
● Luke
   ○ http://www.getopt.org/luke/
Liferay + Solr: Motivation


● Centralizing search index in clustered Liferay
  environment

● Performance improvement
   ○ Re-indexing costs too much for large databases
   ○ The indexes of Liferay nodes in a cluster are often not
     synchronized
Liferay + Solr: Configuration 1


Install Solr (http://lucene.apache.org/solr)

Set up environment variables
 ● SOLR_HOME = /${solr installed folder}
 ● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"
solr.xml
 ● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content

   <?xml version="1.0" encoding="utf-8"?>
   <Context docBase="$SOLR_HOME/apache-solr-1.4.0.war"
           debug="0" crossContext="true">
       <Environment name="solr/home" type="java.lang.String"
                     value="$SOLR_HOME" override="true" />
   </Context>
Liferay + Solr: Configuration 2

schema.xml
 ● This file tells Solr how to index the data coming from Liferay, and can be
   customized for your installation.
 ● Copy this file from the solr-web plugin to $SOLR_HOME/conf in your
   Solr home folder (you may have to create the conf directory).
...
<fields>
  <field name="comments" type="text" indexed="true" stored="true" />
  <field name="content" type="text" indexed="true" stored="true" />
  <field name="description" type="text" indexed="true" stored="true" />
  <field name="name" type="text" indexed="true" stored="true" />
  <field name="properties" type="text" indexed="true" stored="true" />
  <field name="title" type="text" indexed="true" stored="true" />
  <field name="uid" type="string" indexed="true" stored="true" />
  <field name="url" type="text" indexed="true" stored="true" />
  <field name="userName" type="text" indexed="true" stored="true" />
  <field name="version" type="text" indexed="true" stored="true" />
  <dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>uid</uniqueKey>
<defaultSearchField>content</defaultSearchField>
...
<copyField source="comments" dest="content"/>
...
Liferay + Solr: Configuration 3



Copy WAR file
 ● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war
   into $SOLR_HOME/example, where ${solr.version} is the Solr
   version number, e.g., 1.4.0.


Start Liferay/tomcat
 ● Tomcat picks up Solr and deploys it automatically as "solr" under the
   ${tomcat}/webapps folder


Install solr-web Liferay plugin
 ● The latest solr-web plugin can be checked out from the following location:
http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web
 ● Build the checked-out plugin and deploy it
Liferay + Solr: Configuration 4


Final Step
 ● Rebuild the Liferay search indexes
 ● Control Panel > Server Administration
Liferay + Solr: How it works


 solr-spring.xml (from solr-web plugin)

  ...
  <bean id="solrServer"
         class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
      <constructor-arg type="java.lang.String"
                     value="http://localhost:8080/solr" />
  </bean>
  <bean id="indexSearcher.solr"
         class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
      <property name="solrServer" ref="solrServer" />
  </bean>
  <bean id="indexWriter.solr"
         class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
      <property name="commit" value="true" />
      <property name="solrServer" ref="solrServer" />
  </bean>
  ...
Liferay + Solr: Back to the default?


● Simply undeploy the solr-web plugin
● Rebuild the search indexes from the Control Panel, as described
  in the previous step
