Apache Solr
Enterprise search platform
from the Apache Lucene project




Rivet Logic Corporation
1800 Alexander Bell Drive
Suite 400
Reston, VA 20191
Ph: 703.955.3480 Fax: 703.234.7711
What is Solr?


 ● Search Server
 ● Built upon Apache Lucene
 ● Very fast
 ● Scalable in both query load and collection size
 ● Interoperable
 ● Extensible
 ● Lucene power exposed over HTTP
 ● Spell checking, highlighting, faceting, and more
 ● Caching
 ● Replication
 ● Distributed search
How does it work?
schema.xml


● Field types
   ○ <fieldType name="text" class="solr.TextField" indexed="true" />


● Fields
   ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/>
● Unique key (optional)
   ○ <uniqueKey>id</uniqueKey>
● copy fields
   ○ <copyField source="developers" dest="df"/>
● dynamic fields
   ○ <dynamicField name="*_dt" type="date"       indexed="true" stored="true"/>
● similarity configuration
   ○ Similarity is the scoring routine for each document vs. a query
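
A dynamic field such as `*_dt` applies to any field name that matches its pattern and is not declared explicitly. The following is a minimal sketch of that matching idea in plain Java, assuming a pattern carries a single leading or trailing `*`. The class and method names are hypothetical, and this is not Solr's own implementation.

```java
public class DynamicFieldSketch {

    // Returns true when fieldName matches a pattern that has a single
    // leading or trailing '*' (assumed pattern shape, e.g. "*_dt" or "attr_*").
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_dt" matches anything ending in "_dt"; a bare "*" matches everything
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches anything starting with "attr_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: the names must be identical
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        System.out.println(matches("*_dt", "created_dt")); // true
        System.out.println(matches("*_dt", "title"));      // false
    }
}
```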
solrconfig.xml

● Lucene indexing parameters
   ○ <mergeFactor>10</mergeFactor>
   ○ <ramBufferSizeMB>32</ramBufferSizeMB>
● Cache settings
   ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
● Request handler configuration
   ○ <requestHandler name="dismax" class="solr.SearchHandler" >
● HTTP cache settings
   ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr">


● Search components, response writers, query parsers
   ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">


   ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/>
   ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
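
The queryResultCache entry above names an LRU (least recently used) cache with a maximum size. As a rough illustration of that eviction policy only, here is a sketch built on the JDK's `LinkedHashMap`. The class name `LruCacheSketch` is hypothetical, and nothing here is taken from the `solr.LRUCache` source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCacheSketch(int initialSize, int maxSize) {
        // accessOrder = true keeps entries ordered from least to most recently used
        super(initialSize, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every put: drop the least recently used entry when over capacity
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, String> cache = new LruCacheSketch<>(2, 2);
        cache.put("q=solr", "results A");
        cache.put("q=lucene", "results B");
        cache.get("q=solr");                  // touch: q=solr is now most recently used
        cache.put("q=liferay", "results C");  // over capacity: q=lucene is evicted
        System.out.println(cache.keySet());   // [q=solr, q=liferay]
    }
}
```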
Request Handler

<requestHandler name="/itas" class="solr.SearchHandler">
   <lst name="defaults">
    <str name="v.template">browse</str>
    <str name="v.properties">velocity.properties</str>
    <str name="title">Solritas</str>

    <str name="wt">velocity</str>
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="fl">*,score</str>
    <str name="facet">on</str>
    <str name="facet.field">df</str>
    <str name="facet.mincount">1</str>
    <str name="hl">true</str>
    <str name="hl.fl">developers</str>
    <str name="qf">
       text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
    </str>
   </lst>
 </requestHandler>
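
The `defaults` list in this handler supplies parameter values that apply when the request does not set them itself. A small sketch of that precedence in plain Java follows. It assumes single-valued parameters for simplicity (real requests may repeat a parameter), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HandlerDefaultsSketch {

    // Start from the handler defaults, then let request parameters override them.
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("wt", "velocity");
        defaults.put("defType", "dismax");
        defaults.put("rows", "10");

        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", "solr");
        request.put("rows", "5"); // overrides the default of 10

        System.out.println(effectiveParams(defaults, request));
        // {wt=velocity, defType=dismax, rows=5, q=solr}
    }
}
```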
Response Writer


● A Response Writer generates the formatted response of
  a search.
● The wt parameter selects the Response Writer to be
  used
● json, php, phps, python, ruby, xml, xslt, velocity

  <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter">
   <int name="xsltCacheLifetimeSeconds">5</int>
  </queryResponseWriter>
Analyzers, Tokenizers, Filters

● The Analyzer class is a native Lucene concept that determines
  how tokens are produced from a piece of text
   <fieldType name="nametext" class="solr.TextField">
      <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
   </fieldType>

● The job of a tokenizer is to break up a stream of text into
  tokens
● A filter looks at each token in the stream sequentially
  and decides whether to pass it along, replace it, or discard
  it
    <fieldType name="text" class="solr.TextField">
       <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StandardFilterFactory"/>
       </analyzer>
    </fieldType>
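
To make the tokenizer-then-filter flow concrete, here is a toy analysis chain in plain Java: a whitespace tokenizer, a filter that replaces each token with its lowercase form, and a filter that discards stop words. It does not use the Lucene API. All names and the stop-word list are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {

    // Tokenizer: break the text into tokens on runs of whitespace
    static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Filter that replaces each token with its lowercase form
    static List<String> lowercaseFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter that discards tokens found in the stop-word set and passes the rest along
    static List<String> stopFilter(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // The chain: tokenizer first, then each filter in order
    static List<String> analyze(String text) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        return stopFilter(lowercaseFilter(tokenize(text)), stopWords);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Power of Lucene")); // [power, lucene]
    }
}
```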
Other features

● Highlighting
   ○ &hl=true&hl.fl=developers
● Synonyms
   ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
● Spell check
   ○ The spell check component can return a list of alternative spelling
      suggestions.
   ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
● Content Streams
   ○ Allow the Solr server to fetch local or remote data itself. Remote streaming must be
    enabled in solrconfig.xml

● Solr Cell
   ○ Leverages Apache Tika to extract and index rich documents such as Word, PDF, HTML, and many
    other types

● More like this
   ○ http://wiki.apache.org/solr/MoreLikeThis
Indexing with solrJ



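import java.net.URL;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Connect to the Solr instance started from the example directory
SolrServer solr = new CommonsHttpSolrServer(new URL("http://localhost:8983/solr"));
// Build one document and send it to the index
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "EXAMPLEDOC01");
doc.addField("title", "NOVAJUG SolrJ Example");
solr.add(doc);
solr.commit(); // after a batch, not per document
solr.optimize(); // periodically, if/when needed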
Data Import Handler

● Indexes relational databases, XML data, and e-mail
  sources
● Supports full and incremental/delta indexing
● Highly extensible with custom data sources,
  transformers, etc
● http://wiki.apache.org/solr/DataImportHandler
Replication


● The master is polled for index changes
● Each replicant pulls the Lucene index and, optionally, the Solr
  configuration files
● Query throughput scaling: replicate and load balance
● http://wiki.apache.org/solr/SolrReplication
Demo

● Download solr
   ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/
● Start solr
   ○ cd <solr_home>/example
   ○ java -jar start.jar
● Post documents
   ○ cd <solr_home>/example/exampledocs
   ○ java -jar post.jar *.xml
   ○ java -jar post.jar cw.xml
● Access Solr
   ○ http://localhost:8983/solr/admin/
● Querying solr
   ○ http://localhost:8983/solr/select/?q=binesh
   ○ http://localhost:8983/solr/select/?q=binny
   ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
   ○ http://localhost:8983/solr/itas/
● Luke
   ○ http://www.getopt.org/luke/
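
The query URLs in the demo are ordinary HTTP GET requests, so a client can assemble them from a parameter map. The sketch below does this with the JDK only. The helper name `buildSelectUrl` is hypothetical, and the base URL is the one used in the demo.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SelectUrlSketch {

    // Join URL-encoded name=value pairs onto the base URL of a request handler
    static String buildSelectUrl(String baseUrl, Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder url = new StringBuilder(baseUrl);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(URLEncoder.encode(param.getKey(), "UTF-8"))
               .append('=')
               .append(URLEncoder.encode(param.getValue(), "UTF-8"));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        System.out.println(buildSelectUrl("http://localhost:8983/solr/select/", params));
        // http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
    }
}
```

Encoding matters once a query contains reserved characters: with this helper, a `q` value of `*:*` is sent as `*%3A*`.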
Liferay + Solr: Motivation


● Centralizing search index in clustered Liferay
  environment

● Performance improvement
   ○ Re-indexing costs too much for large databases
   ○ Indexes of Liferay deployments in a cluster are often not
     synchronized
Liferay + Solr: Configuration 1


Install Solr (http://lucene.apache.org/solr)

Setting up environment variables
 ● SOLR_HOME = /${solr installed folder}
 ● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data"
solr.xml
 ● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content

   <?xml version="1.0" encoding="utf-8"?>
   <Context docBase="$SOLR_HOME/apache-solr-1.4.0.war"
           debug="0" crossContext="true">
       <Environment name="solr/home" type="java.lang.String"
                     value="$SOLR_HOME" override="true" />
   </Context>
Liferay + Solr: Configuration 2

schema.xml
 ● This file tells Solr how to index the data coming from Liferay, and can be
   customized for your installation.
 ● Copy this file from the solr-web plugin to $SOLR_HOME/conf (you may have
   to create the conf directory) in your Solr home folder.
... <fields>
<field name="comments" type="text" indexed="true" stored="true" />
<field name="content" type="text" indexed="true" stored="true" />
<field name="description" type="text" indexed="true" stored="true" />
<field name="name" type="text" indexed="true" stored="true" />
<field name="properties" type="text" indexed="true" stored="true" />
<field name="title" type="text" indexed="true" stored="true" />
<field name="uid" type="string" indexed="true" stored="true" />
<field name="url" type="text" indexed="true" stored="true" />
<field name="userName" type="text" indexed="true" stored="true" />
<field name="version" type="text" indexed="true" stored="true" />
<dynamicField name="*" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>uid</uniqueKey>
<defaultSearchField>content</defaultSearchField>
       ... <copyField source="comments" dest="content"/> ... ...
Liferay + Solr: Configuration 3



Copy WAR file
 ● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war
   into $SOLR_HOME/example, where ${solr.version} represents the Solr
   version number, e.g., 1.4.0.


Start Liferay/Tomcat
 ● Solr will be picked up and "solr" will be deployed automatically under
   ${tomcat}/webapps folder


Install solr-web Liferay plugin
 ● Latest Liferay plugin can be checked out from the following location
http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web
 ● Build the checked out plugin and deploy it
Liferay + Solr: Configuration 4


Final Step
 ● We need to rebuild Liferay search indexes
 ● Control Panel > Server Administration
Liferay + Solr: How it works


 solr-spring.xml (from solr-web plugin)

  ...
  <bean id="solrServer"
         class="com.liferay.portal.search.solr.server.BasicAuthSolrServer">
      <constructor-arg type="java.lang.String"
                     value="http://localhost:8080/solr" />
  </bean>
  <bean id="indexSearcher.solr"
         class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
      <property name="solrServer" ref="solrServer" />
  </bean>
  <bean id="indexWriter.solr"
         class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
      <property name="commit" value="true" />
      <property name="solrServer" ref="solrServer" />
  </bean>
  ...
Liferay + Solr: Back to the default?


● Simply undeploy the solr-web plugin
● Rebuild the search indexes using the Control Panel, as described
  in the previous step

Más contenido relacionado

La actualidad más candente

Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachAlexandre Rafalovitch
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solrpittaya
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From SolrRamzi Alqrainy
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksSearching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksAlexandre Rafalovitch
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engineth0masr
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
From content to search: speed-dating Apache Solr (ApacheCON 2018)
From content to search: speed-dating Apache Solr (ApacheCON 2018)From content to search: speed-dating Apache Solr (ApacheCON 2018)
From content to search: speed-dating Apache Solr (ApacheCON 2018)Alexandre Rafalovitch
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 

La actualidad más candente (20)

Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Solr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approachSolr Troubleshooting - TreeMap approach
Solr Troubleshooting - TreeMap approach
 
Solr 4
Solr 4Solr 4
Solr 4
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From Solr
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasksSearching for AI - Leveraging Solr for classic Artificial Intelligence tasks
Searching for AI - Leveraging Solr for classic Artificial Intelligence tasks
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
From content to search: speed-dating Apache Solr (ApacheCON 2018)
From content to search: speed-dating Apache Solr (ApacheCON 2018)From content to search: speed-dating Apache Solr (ApacheCON 2018)
From content to search: speed-dating Apache Solr (ApacheCON 2018)
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 

Similar a Apache solr liferay

Apache Solr + ajax solr
Apache Solr + ajax solrApache Solr + ajax solr
Apache Solr + ajax solrNet7
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr WorkshopJSGB
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdutionXuan-Chao Huang
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentAlkacon Software GmbH & Co. KG
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrJayesh Bhoyar
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBertrand Delacretaz
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampKais Hassan, PhD
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 

Similar a Apache solr liferay (20)

Apache Solr + ajax solr
Apache Solr + ajax solrApache Solr + ajax solr
Apache Solr + ajax solr
 
Solr a.b-ab
Solr a.b-abSolr a.b-ab
Solr a.b-ab
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdution
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve contentOpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
OpenCms Days 2012 - OpenCms 8.5: Using Apache Solr to retrieve content
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Solr workshop
Solr workshopSolr workshop
Solr workshop
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Beyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and SolrBeyond full-text searches with Lucene and Solr
Beyond full-text searches with Lucene and Solr
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Information Retrieval - Data Science Bootcamp
Information Retrieval - Data Science BootcampInformation Retrieval - Data Science Bootcamp
Information Retrieval - Data Science Bootcamp
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Solr5
Solr5Solr5
Solr5
 

Último

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Nikki Chapple
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 

Último (20)

UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.How Tech Giants Cut Corners to Harvest Data for A.I.
How Tech Giants Cut Corners to Harvest Data for A.I.
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 

Apache solr liferay

  • 1. Apache Solr Enterprise search platform from the Apache Lucene project Rivet Logic Corporation 1800 Alexander Bell Drive Suite 400 Reston, VA 20191 Ph: 703.955.3480 Fax: 703.234.7711
  • 2. What is Solr? ● Search Server ● Built upon Apache Lucene ● Fast, very ● Scalable, query load and collection size ● Interoperable ● Extensible ● Lucene power exposed over HTTP ● Spell checking, highlighting, faceting and etc. ● Caching ● Replication ● Distributed search
  • 4. schema.xml ● Field types ○ <fieldType name="text" class="solr.TextField" indexed="true" /> ● Fields ○ <field name="technologies" type="text" indexed="true" stored="true" multiValued="true"/> ● Unique key (optional) ○ <uniqueKey>id</uniqueKey> ● copy fields ○ <copyField source="developers" dest="df"/> ● dynamic fields ○ <dynamicField name="*_dt" type="date" indexed="true" stored="true"/> ● similarity configuration ○ Similarity is the scoring routine for each document vs. a query
  • 5. solrconfig.xml ● Lucene indexing parameters ○ <mergeFactor>10</mergeFactor> ○ <ramBufferSizeMB>32</ramBufferSizeMB> ● Cache settings ○ <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/> ● Request handler configuration ○ <requestHandler name="dismax" class="solr.SearchHandler" > ● HTTP cache settings ○ <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"> ● Search components, response writers, query parsers ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> ○ <queryResponseWriter name="velocity" class="org.apache.solr.request.VelocityResponseWriter"/> ○ <queryParser name="lucene" class="org.apache.solr.search.LuceneQParserPlugin"/>
  • 6. Request Handler <requestHandler name="/itas" class="solr.SearchHandler"> <lst name="defaults"> <str name="v.template">browse</str> <str name="v.properties">velocity.properties</str> <str name="title">Solritas</str> <str name="wt">velocity</str> <str name="defType">dismax</str> <str name="q.alt">*:*</str> <str name="rows">10</str> <str name="fl">*,score</str> <str name="facet">on</str> <str name="facet.field">df</str> <str name="facet.mincount">1</str> <str name="hl">true</str> <str name="hl.fl">developers</str> <str name="qf"> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 </str> </lst> </requestHandler>
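The `defaults` list in the request handler above supplies parameter values that apply only when the request itself does not set them. A minimal sketch of that precedence rule in plain Java follows. `HandlerDefaults` and `effectiveParams` are hypothetical names used for illustration, not part of Solr, and the sketch assumes single-valued parameters (real Solr parameters such as `facet.field` may carry several values).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: models how handler defaults and request parameters combine.
class HandlerDefaults {

    // Request parameters win; handler defaults fill in whatever the request omits.
    static Map<String, String> effectiveParams(Map<String, String> defaults,
                                               Map<String, String> request) {
        Map<String, String> merged = new LinkedHashMap<>(defaults);
        merged.putAll(request); // overrides any default with the same name
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("defType", "dismax");
        defaults.put("rows", "10");
        defaults.put("facet", "on");

        Map<String, String> request = new LinkedHashMap<>();
        request.put("q", "solr");
        request.put("rows", "5"); // overrides the default of 10

        // prints {defType=dismax, rows=5, facet=on, q=solr}
        System.out.println(effectiveParams(defaults, request));
    }
}
```

With this rule, a request to `/itas` that passes only `q` still gets dismax parsing, faceting on `df`, and highlighting, because those values come from the handler's defaults.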
  • 7. Response Writer ● A Response Writer generates the formatted response of a search. ● The wt parameter selects the Response Writer to be used ● json, php, phps, python, ruby, xml, xslt, velocity <queryResponseWriter name="xslt" class="org.apache.solr.request.XSLTResponseWriter"> <int name="xsltCacheLifetimeSeconds">5</int> </queryResponseWriter>
  • 8. Analyzers, Tokenizers, Filters ● The Analyzer class is a native Lucene concept that determines how tokens are produced from a piece of text <fieldType name="nametext" class="solr.TextField"> <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/> </fieldType> ● The job of a tokenizer is to break up a stream of text into tokens ● A filter looks at each token in the stream sequentially and decides whether to pass it along, replace it, or discard it <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> </analyzer> </fieldType>
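The tokenizer-then-filters pipeline described on this slide can be sketched in plain Java without any Lucene classes. Everything below is an illustration under stated assumptions: `AnalysisSketch`, its methods, and the stop-word list are hypothetical, and the whitespace splitting is far simpler than what `StandardTokenizerFactory` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: a hand-rolled analysis chain, not the Lucene/Solr API.
class AnalysisSketch {

    // Assumed stop-word list for the example.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the", "of"));

    // Tokenizer: breaks the text into tokens on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter that replaces each token with its lowercase form.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter that discards stop words and passes every other token along.
    static List<String> removeStopWords(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Analyzer: one tokenizer followed by the filters, in order.
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        // prints [quick, brown, fox]
        System.out.println(analyze("The Quick brown Fox"));
    }
}
```

Filter order matters: lowercasing runs first here so that "The" matches the lowercase stop-word list.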
  • 9. Other features ● Highlighting ○ &hl=true&hl.fl=developers ● Synonyms ○ <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> ● Spell check ○ The spell check component can return a list of alternative spelling suggestions. ○ <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> ● Content Streams ○ Allows Solr server to fetch local or remote data itself. Must enable remote streaming in solrconfig.xml ● Solr Cell ○ leveraging Tika, extracts and indexes rich documents such as Word, PDF, HTML, and many other types ● More like this ○ http://wiki.apache.org/solr/MoreLikeThis
  • 10. Indexing with solrJ SolrServer solr = new CommonsHttpSolrServer( new URL("http://localhost:8983/solr")); SolrInputDocument doc = new SolrInputDocument(); doc.addField("id", "EXAMPLEDOC01"); doc.addField("title", "NOVAJUG SolrJ Example"); solr.add(doc); solr.commit(); /* after a batch, not per document */ solr.optimize(); /* periodically, if/when needed */
  • 11. Data Import Handler ● Indexes relational database, XML data, and e-mail sources ● Supports full and incremental/delta indexing ● Highly extensible with custom data sources, transformers, etc ● http://wiki.apache.org/solr/DataImportHandler
  • 12. Replication ● Master is polled ● Replicant pulls Lucene index and optionally also Solr configuration files ● Query throughput scaling: replicate and load balance ● http://wiki.apache.org/solr/SolrReplication
  • 13. Demo ● Download solr ○ http://mirrors.ibiblio.org/pub/mirrors/apache/lucene/solr/1.4.0/ ● Start solr ○ cd <solr_home>/example ○ java -jar start.jar ● Post documents ○ cd <solr_home>/example/exampledocs ○ java -jar post.jar *.xml ○ java -jar post.jar cw.xml ● Access Solr ○ http://localhost:8983/solr/admin/ ● Querying solr ○ http://localhost:8983/solr/select/?q=binesh ○ http://localhost:8983/solr/select/?q=binny ○ http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1 ○ http://localhost:8983/solr/itas/ ● Luke ○ http://www.getopt.org/luke/
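The query URLs in the demo are ordinary HTTP GET requests against the `select` handler, so they can be assembled programmatically. The sketch below builds such a URL with JDK classes only. `SelectUrlSketch` and `selectUrl` are hypothetical names, and the base URL is the one from the demo slide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: assembles a Solr select URL from query parameters.
class SelectUrlSketch {

    // URL-encodes one parameter name or value as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Appends the encoded parameters, in insertion order, to <baseUrl>/select/?
    static String selectUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/select/?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "binesh");
        params.put("facet", "true");
        params.put("facet.field", "df");
        params.put("facet.mincount", "1");
        // prints http://localhost:8983/solr/select/?q=binesh&facet=true&facet.field=df&facet.mincount=1
        System.out.println(selectUrl("http://localhost:8983/solr", params));
    }
}
```

Encoding matters once the query contains spaces or Lucene syntax: a value such as `*:*` becomes `*%3A*` in the URL.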
  • 14. Liferay + Solr: Motivation ● Centralizing the search index in a clustered Liferay environment ● Performance improvement ○ Re-indexing costs too much for large databases ○ The indexes of Liferay nodes in a cluster are often out of sync
  • 15. Liferay + Solr: Configuration 1 Install Solr (http://lucene.apache.org/solr) Setting up environment variables ● SOLR_HOME = /${solr installed folder} ● JAVA_OPTS = "$JAVA_OPTS -Dsolr.solr.home=$SOLR_HOME/example/solr/data" solr.xml ● Place the file under ${tomcat}/conf/Catalina/localhost/ with the following content <?xml version="1.0" encoding="utf-8"?> <Context docBase="$SOLR_HOME/apache-solr-1.4.0.war" debug="0" crossContext="true"> <Environment name="solr/home" type="java.lang.String" value="$SOLR_HOME" override="true" /> </Context>
  • 16. Liferay + Solr: Configuration 2 schema.xml ● This file tells Solr how to index the data coming from Liferay, and can be customized for your installation. ● Copy this file from solr-web plugin to $SOLR_HOME/conf (you may have to create the conf directory) in your Solr home folder. ... <fields> <field name="comments" type="text" indexed="true" stored="true" /> <field name="content" type="text" indexed="true" stored="true" /> <field name="description" type="text" indexed="true" stored="true" /> <field name="name" type="text" indexed="true" stored="true" /> <field name="properties" type="text" indexed="true" stored="true" /> <field name="title" type="text" indexed="true" stored="true" /> <field name="uid" type="string" indexed="true" stored="true" /> <field name="url" type="text" indexed="true" stored="true" /> <field name="userName" type="text" indexed="true" stored="true" /> <field name="version" type="text" indexed="true" stored="true" /> <dynamicField name="*" type="string" indexed="true" stored="true" /> </fields> <uniqueKey>uid</uniqueKey> <defaultSearchField>content</defaultSearchField> ... <copyField source="comments" dest="content"/> ... ...
  • 17. Liferay + Solr: Configuration 3 Copy WAR file ● Copy the WAR file $SOLR_HOME/dist/apache-solr-${solr.version}.war into $SOLR_HOME/example, where ${solr.version} is the Solr version number, e.g., 1.4.0. Start Liferay/tomcat ● Solr will be picked up and "solr" will be deployed automatically under the ${tomcat}/webapps folder Install solr-web Liferay plugin ● The latest Liferay plugin can be checked out from http://svn.liferay.com/repos/public/plugins/trunk/webs/solr-web ● Build the checked-out plugin and deploy it
  • 18. Liferay + Solr: Configuration 4 Final Step ● We need to rebuild Liferay search indexes ● Control Panel > Server Administration
  • 19. Liferay + Solr: How it works solr-spring.xml (from solr-web plugin) ... <bean id="solrServer" class="com.liferay.portal.search.solr.server.BasicAuthSolrServer"> <constructor-arg type="java.lang.String" value="http://localhost:8080/solr" /> </bean> <bean id="indexSearcher.solr" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl"> <property name="solrServer" ref="solrServer" /> </bean> <bean id="indexWriter.solr" class="com.liferay.portal.search.solr.SolrIndexWriterImpl"> <property name="commit" value="true" /> <property name="solrServer" ref="solrServer" /> </bean> ...
  • 20. Liferay + Solr: Back to the default? ● Simply undeploy the solr-web plugin ● Rebuild the search indexes from the Control Panel, as described in the previous step