SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
Beyond full-text searches
With Lucene and Solr
Bertrand Delacrétaz
ApacheCon EU 2007, Amsterdam
bdelacretaz@apache.org
www.codeconsult.ch

slides revision: 2007-05-03
Slides at
http://wiki.apache.org/apachecon/Eu2007OnlineSessionSlides


How to graft a Lucene-based
dynamic navigation system
on an search-challenged CMS
using Solr.

As seen from the “Solr integrator” point of
view.

Beyond full-text?
tsrvideo.ch - a Solr client
tsrvideo.ch - a Solr client
The Project
Deliver a rich video player experience

Users explore much more than they search

Existing content stored in two separate CMSes
with very different content models
(and http/XML interfaces)
Client system overview

 Ajax + HTML

                Apache           Solr
                HTTP server      Search server


          HTTP/JSON                       Lucene
                                          index
                       replicated index
                       for backup
the Solr search server
What is Solr?

                                     Solr
                        HTTP/XML

                                    servlet


                                   Lucene
                                    index



See also http://wiki.apache.org/apachecon/Eu2007OnlineSessionSlides
Solr architecture




Diagram by Yonik Seeley
Indexing in Solr
<add>
 <doc>
  <field name=quot;idquot;>9885A004</field>
  <field name=quot;namequot;>Canon PowerShot SD500</field>
  <field name=quot;categoryquot;>camera</field>                 HTTP
  <field name=quot;featuresquot;>3x optical zoom</field>        POST
  <field name=quot;featuresquot;>aluminum case</field>
  <field name=quot;weightquot;>6.4</field>
  <field name=quot;pricequot;>329.95</field>
 </doc>
</add>


“Solr XML” documents are POSTed to Solr via HTTP
Field names and types are defined in the Solr schema
Solr indexing schema
<field name=quot;idquot; type=quot;stringquot; indexed=quot;truequot; stored=quot;truequot;/>
<field name=quot;categoryquot; type=quot;text_wsquot; indexed=quot;truequot; stored=quot;truequot;/>

<dynamicField name=quot;*_twsquot; type=quot;text_wsquot; indexed=quot;truequot; stored=quot;truequot;/>
<dynamicField name=quot;*_dtquot; type=quot;datequot; indexed=quot;truequot; stored=quot;truequot;/>

<fieldtype name=quot;sfloatquot;
 class=quot;Solr.SortableFloatFieldquot;sortMissingLast=quot;truequot;/>

<uniqueKey>id</uniqueKey>

<copyField source=quot;catquot; dest=quot;textquot;/>
<copyField source=quot;namequot; dest=quot;textquot;/>
<copyField source=quot;namequot; dest=quot;nameSortquot;/>
<copyField source=quot;manuquot; dest=quot;textquot;/>
Field content analysis
<fieldtype name=quot;text_frquot; class=quot;Solr.TextFieldquot;>
                                                   Le Châtelain
 <analyzer>
                                                   et ses chevaux
  <tokenizer
    class=quot;Solr.StandardTokenizerFactoryquot;/>
  <filter
    class=quot;Solr.ISOLatin1AccentFilterFactoryquot;/>
  <filter
    class=quot;Solr.LowerCaseFilterFactoryquot;/>
  <filter
    class=quot;Solr.StopFilterFactoryquot;
    words=quot;french-stopwords.txtquot;
    ignoreCase=quot;truequot;/>
  <filter
                                                   chatelain
    class=quot;Solr.SnowballPorterFilterFactoryquot;
                                                   cheval
    language=quot;Frenchquot;/>
  </analyzer>
</fieldtype>
Solr Field Analysis test page
Solr queries
http://solr.xy.com/select?q=apache & fl=solr_id,title

<result numFound=quot;2quot; start=quot;0quot;>
   <doc>
    <str name=quot;solr_idquot;>tsr.ch/story/4336075</str>
    <str name=quot;titlequot;>ApacheCon Amsterdam</str>
   </doc>
   <doc>
    <str name=quot;solr_idquot;>tsr.ch/story/1715414</str>
    <str name=quot;titlequot;>Geeks are upon us</str>
   </doc>
</result>                  Enhanced Lucene query
                           language as standard
Play it again, JSON!
http://solr.xy.com/select?q=apache&fl=solr_id,title&wt=json

{quot;responsequot;:
  {quot;numFoundquot;:54,quot;startquot;:0,
   quot;docsquot;:[
    {quot;solr_idquot;:quot;tsr.ch/story/4336075quot;,
      quot;titlequot;:quot;ApacheCon Amsterdamquot;
    },

   {quot;solr_idquot;:quot;tsr.ch/story/4336032quot;,
      quot;titlequot;:quot;Geeks are upon usquot;
    },
...
Solr live statistics
Solr Function Query
 A Function query influences the score by a function
 of a field's numeric value or ordinal.
 // OrdFieldSource
 ord(myfield)

 // ReverseOrdFieldSource
 rord(myfield)

 // LinearFloatFunction on numeric field value
 linear(myfield,1,2)

 // MaxFloatFunction of LinearFloatFunction on numeric field value or constant
 max(linear(myfield,1,2),100)

 // ReciprocalFloatFunction on numeric field value
 recip(myfield,1,2,3)

_val_:quot;linear(recip(rord(broadcast_date),1,1000,1000),11,0)quot;
That’s our client

 Ajax + HTML

                Apache        Solr
                HTTP server   Search server
                           Solr schema
                          and analyzers


          HTTP/JSON                Lucene
                                   index
Indexing
Indexing Process

                       cron
                       scheduler


   Legacy              curl,
            HTTP/XML               Solr XML

   CMS                 XSLT
                         Issues:
                         Change/delete signals?
                         Polling? RSS feeds?
                         Legacy content structure and consistency
                         Indexing delay
                         Deleted/retired documents
                         Non-transactional behavior
Content Normalization

   cms A
   XML
                                      HTTP
           Content           Solr
    HTTP
           Normalization     XML
   cms B     Convert to “Solr XML”.
   XML       Common field names.
             Normalized values.
             -> unified acces
Normalized and unified values
<field name=”solr.id”>story.cmsA.12129</field>

<field name=”role”>story</field>
<field name=”topic”>news</field>
<field name=”topic”>sports</field>

<field name=”author”>Bob S. Ponge</field>
<field name=”author.id”>person.438</field>

<field name=”link.related”>story.cmsB.73-1</field>
More than “just” full-text searches

<field name=”author.id”>person.438</field>

<field name=”link.related”>story.cmsB.73-1</field>
Content Mining?




                     content
                   normalization




      Unified navigation
      and queries
Testing
“How do I break this thing before it breaks by itself?”




                                  “testing” picture: taliesin on morguefile.com
Use-cases based testing
Do I find “cheval” when searching for “chevaux”?

Is document 98.345 found when searching for
“+montreux -casino AND role:story”?

etc...
Reference data required for such tests: Solr indexes
are collection of files that can easily be saved

Why not automate these? read on...
Automated functional testing
# Scenarii are executed by our auto-test tool, based
# on htmlunit (http://htmlunit.sourceforge.net/)

# test a query that returns no results
request : /solr/select?wt=xmlt&q=thismustnotbefound
match : /response/result/@numFound : 0
dontMatch : /response/result/@numFound : 1

# test a title query
request : /solr/select?q=title%3Afootball
match : contains(/response//doc[1]/str[@name='title'],'ootball') : true

Test are run as JUnit tests, against a Solr instance.
Stress tests
Generate heaps of
semi-random query
URLs, and replay
them in many HTTP
clients simultaneously
using httpstone                                         http://code.google.com/p/httpstone/

http://solr...&q=quot;attirerquot; role:audio quot;enfantsquot; quot;fidéliserquot;
http://solr...&q=quot;fidéliserquot; quot;carottesquot; role:story quot;enfants...quot;
http://solr...&q=quot;surtoutquot; quot;adultesquot; quot;histoirequot; quot;L'avisquot;
http://solr...&q=quot;Résultatsquot; quot;enfants...quot; quot;publicitéquot; role:video
http://solr...&q=quot;lunettesquot; quot;différences?quot; quot;Résultatsquot; quot;fabrications,quot;
http://solr...&q=quot;attirerquot; role:story quot;solaires:quot; quot;rend-t-onquot;
http://solr...&q=role:audio quot;quellesquot; quot;Mêmesquot; quot;Mêmesquot;
...
Test outcomes
Explain search features with use cases

Avoid regressions with automated tests

Tune the index and analyzers with automated
functional tests

Get a feel for scalability with stress tests

Build confidence before launches!
Lessons Learned
Lessons Learned (a.k.a “conclusion”)
Solr opens the doors to Lucene!

Designing the “right” indexing
content model takes time.

Do not hesitate to duplicate
fields with different indexing
parameters, denormalized, aggregated, etc.

Content unification enables “content mining”.

Tune and run automated tests.
Repeat. Repeat. Repeat...
References

 http://lucene.apache.org/solr
 http://wiki.apache.org/solr/SolrResources
 http://lucene.apache.org/java
 http://lucenebook.com/

 “Modern Information Retrieval”, Ricardo Baeza-Yates
 http://www.ischool.berkeley.edu/~hearst/irbook/

 Other ApacheCon EU 2007 presentations:
 http://wiki.apache.org/apachecon/Eu2007OnlineSessionSlides

Más contenido relacionado

La actualidad más candente

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHPPaul Borgermans
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solrpittaya
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy SokolenkoProvectus
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature PreviewYonik Seeley
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache SolrBiogeeks
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From SolrRamzi Alqrainy
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Alexandre Rafalovitch
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptLucidworks
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Lucidworks
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and SparkLucidworks
 

La actualidad más candente (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes Workshop
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Apache Solr/Lucene Internals by Anatoliy Sokolenko
Apache Solr/Lucene Internals  by Anatoliy SokolenkoApache Solr/Lucene Internals  by Anatoliy Sokolenko
Apache Solr/Lucene Internals by Anatoliy Sokolenko
 
Solr 6 Feature Preview
Solr 6 Feature PreviewSolr 6 Feature Preview
Solr 6 Feature Preview
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Building a Search Engine Using Lucene
Building a Search Engine Using LuceneBuilding a Search Engine Using Lucene
Building a Search Engine Using Lucene
 
Building your own search engine with Apache Solr
Building your own search engine with Apache SolrBuilding your own search engine with Apache Solr
Building your own search engine with Apache Solr
 
Retrieving Information From Solr
Retrieving Information From SolrRetrieving Information From Solr
Retrieving Information From Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
Rebuilding Solr 6 examples - layer by layer (LuceneSolrRevolution 2016)
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 
Data Science with Solr and Spark
Data Science with Solr and SparkData Science with Solr and Spark
Data Science with Solr and Spark
 

Destacado

Architecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's ThesisArchitecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's ThesisJosiane Gamgo
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
PhD Dissertation - Manuscrit de thèse de doctorat
PhD Dissertation - Manuscrit de thèse de doctoratPhD Dissertation - Manuscrit de thèse de doctorat
PhD Dissertation - Manuscrit de thèse de doctoratSaïd Radhouani
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1YI-CHING WU
 
Portable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej BialeckiPortable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej Bialeckilucenerevolution
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Luceneotisg
 
Analytics in olap with lucene & hadoop
Analytics in olap with lucene & hadoopAnalytics in olap with lucene & hadoop
Analytics in olap with lucene & hadooplucenerevolution
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simonlucenerevolution
 
Lucandra
LucandraLucandra
Lucandraotisg
 
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자Donghyeok Kang
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...Lucidworks
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMLucidworks
 
Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?Adrien Grand
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartLucidworks
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introductionotisg
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and SolrTommaso Teofili
 

Destacado (20)

Architecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's ThesisArchitecture and Implementation of Apache Lucene: Marter's Thesis
Architecture and Implementation of Apache Lucene: Marter's Thesis
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
PhD Dissertation - Manuscrit de thèse de doctorat
PhD Dissertation - Manuscrit de thèse de doctoratPhD Dissertation - Manuscrit de thèse de doctorat
PhD Dissertation - Manuscrit de thèse de doctorat
 
Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1Introduction to Lucene and Solr - 1
Introduction to Lucene and Solr - 1
 
Portable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej BialeckiPortable Lucene Index Format & Applications - Andrzej Bialecki
Portable Lucene Index Format & Applications - Andrzej Bialecki
 
Lucene And Solr Intro
Lucene And Solr IntroLucene And Solr Intro
Lucene And Solr Intro
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Lucene
 
Apache lucene
Apache luceneApache lucene
Apache lucene
 
Analytics in olap with lucene & hadoop
Analytics in olap with lucene & hadoopAnalytics in olap with lucene & hadoop
Analytics in olap with lucene & hadoop
 
Lucene
LuceneLucene
Lucene
 
Lucene and MySQL
Lucene and MySQLLucene and MySQL
Lucene and MySQL
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Lucandra
LucandraLucandra
Lucandra
 
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
[제1회 루씬 한글분석기 기술세미나] solr로 나만의 검색엔진을 만들어보자
 
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
The Evolution of Lucene & Solr Numerics from Strings to Points: Presented by ...
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
 
Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?Berlin Buzzwords 2013 - How does lucene store your data?
Berlin Buzzwords 2013 - How does lucene store your data?
 
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, FlipkartNear Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
Near Real Time Indexing: Presented by Umesh Prasad & Thejus V M, Flipkart
 
Lucene Introduction
Lucene IntroductionLucene Introduction
Lucene Introduction
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 

Similar a Beyond full-text searches with Lucene and Solr

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialSourcesense
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid prototyping search applications with solr
Rapid prototyping search applications with solrRapid prototyping search applications with solr
Rapid prototyping search applications with solrLucidworks (Archived)
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)Pat Patterson
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSematext Group, Inc.
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdutionXuan-Chao Huang
 
Drupal for ng_os
Drupal for ng_osDrupal for ng_os
Drupal for ng_osdstuartnz
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrJayesh Bhoyar
 
Rest with Java EE 6 , Security , Backbone.js
Rest with Java EE 6 , Security , Backbone.jsRest with Java EE 6 , Security , Backbone.js
Rest with Java EE 6 , Security , Backbone.jsCarol McDonald
 
Add Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with SolrAdd Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with Solradunne
 
04 data accesstechnologies
04 data accesstechnologies04 data accesstechnologies
04 data accesstechnologiesBat Programmer
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Lucidworks
 

Similar a Beyond full-text searches with Lucene and Solr (20)

Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Solr Presentation
Solr PresentationSolr Presentation
Solr Presentation
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping search applications with solr
Rapid prototyping search applications with solrRapid prototyping search applications with solr
Rapid prototyping search applications with solr
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Apache solr liferay
Apache solr liferayApache solr liferay
Apache solr liferay
 
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
OData: Universal Data Solvent or Clunky Enterprise Goo? (GlueCon 2015)
 
Solr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for YouSolr Search Engine: Optimize Is (Not) Bad for You
Solr Search Engine: Optimize Is (Not) Bad for You
 
20150210 solr introdution
20150210 solr introdution20150210 solr introdution
20150210 solr introdution
 
Drupal for ng_os
Drupal for ng_osDrupal for ng_os
Drupal for ng_os
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Rest
RestRest
Rest
 
Rest with Java EE 6 , Security , Backbone.js
Rest with Java EE 6 , Security , Backbone.jsRest with Java EE 6 , Security , Backbone.js
Rest with Java EE 6 , Security , Backbone.js
 
Add Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with SolrAdd Powerful Full Text Search to Your Web App with Solr
Add Powerful Full Text Search to Your Web App with Solr
 
04 data accesstechnologies
04 data accesstechnologies04 data accesstechnologies
04 data accesstechnologies
 
Webinar: What's New in Solr 6
Webinar: What's New in Solr 6Webinar: What's New in Solr 6
Webinar: What's New in Solr 6
 
ApacheCon 2005
ApacheCon 2005ApacheCon 2005
ApacheCon 2005
 

Más de Bertrand Delacretaz

VanillaJS & the Web Platform, a match made in heaven?
VanillaJS & the Web Platform, a match made in heaven?VanillaJS & the Web Platform, a match made in heaven?
VanillaJS & the Web Platform, a match made in heaven?Bertrand Delacretaz
 
Surviving large online communities with conciseness and clarity
Surviving large online communities with conciseness and clarity Surviving large online communities with conciseness and clarity
Surviving large online communities with conciseness and clarity Bertrand Delacretaz
 
Repoinit: a mini-language for content repository initialization
Repoinit: a mini-language for content repository initializationRepoinit: a mini-language for content repository initialization
Repoinit: a mini-language for content repository initializationBertrand Delacretaz
 
The Moving House Model, adhocracy and remote collaboration
The Moving House Model, adhocracy and remote collaborationThe Moving House Model, adhocracy and remote collaboration
The Moving House Model, adhocracy and remote collaborationBertrand Delacretaz
 
GraphQL in Apache Sling - but isn't it the opposite of REST?
GraphQL in Apache Sling - but isn't it the opposite of REST?GraphQL in Apache Sling - but isn't it the opposite of REST?
GraphQL in Apache Sling - but isn't it the opposite of REST?Bertrand Delacretaz
 
How to convince your left brain (or manager) to follow the Open Source path t...
How to convince your left brain (or manager) to follow the Open Source path t...How to convince your left brain (or manager) to follow the Open Source path t...
How to convince your left brain (or manager) to follow the Open Source path t...Bertrand Delacretaz
 
L'Open Source change le Monde - BlendWebMix 2019
L'Open Source change le Monde - BlendWebMix 2019L'Open Source change le Monde - BlendWebMix 2019
L'Open Source change le Monde - BlendWebMix 2019Bertrand Delacretaz
 
Shared Neurons - the Secret Sauce of Open Source communities?
Shared Neurons - the Secret Sauce of Open Source communities?Shared Neurons - the Secret Sauce of Open Source communities?
Shared Neurons - the Secret Sauce of Open Source communities?Bertrand Delacretaz
 
Sling and Serverless, Best Friends Forever?
Sling and Serverless, Best Friends Forever?Sling and Serverless, Best Friends Forever?
Sling and Serverless, Best Friends Forever?Bertrand Delacretaz
 
Serverless - introduction et perspectives concrètes
Serverless - introduction et perspectives concrètesServerless - introduction et perspectives concrètes
Serverless - introduction et perspectives concrètesBertrand Delacretaz
 
State of the Feather - ApacheCon North America 2018
State of the Feather - ApacheCon North America 2018State of the Feather - ApacheCon North America 2018
State of the Feather - ApacheCon North America 2018Bertrand Delacretaz
 
Karate, the black belt of HTTP API testing?
Karate, the black belt of HTTP API testing?Karate, the black belt of HTTP API testing?
Karate, the black belt of HTTP API testing?Bertrand Delacretaz
 
Open Source at Scale: the Apache Software Foundation (2018)
Open Source at Scale: the Apache Software Foundation (2018)Open Source at Scale: the Apache Software Foundation (2018)
Open Source at Scale: the Apache Software Foundation (2018)Bertrand Delacretaz
 
They don't understand me! Tales from the multi-cultural trenches
They don't understand me! Tales from the multi-cultural trenchesThey don't understand me! Tales from the multi-cultural trenches
They don't understand me! Tales from the multi-cultural trenchesBertrand Delacretaz
 
Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)
Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)
Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)Bertrand Delacretaz
 
Project and Community Services the Apache Way
Project and Community Services the Apache WayProject and Community Services the Apache Way
Project and Community Services the Apache WayBertrand Delacretaz
 
La Fondation Apache - keynote au Paris Open Source Summit 2017
La Fondation Apache - keynote au Paris Open Source Summit 2017La Fondation Apache - keynote au Paris Open Source Summit 2017
La Fondation Apache - keynote au Paris Open Source Summit 2017Bertrand Delacretaz
 
Asynchronous Decision Making - FOSS Backstage 2017
Asynchronous Decision Making - FOSS Backstage 2017Asynchronous Decision Making - FOSS Backstage 2017
Asynchronous Decision Making - FOSS Backstage 2017Bertrand Delacretaz
 
Building an Apache Sling Rendering Farm
Building an Apache Sling Rendering FarmBuilding an Apache Sling Rendering Farm
Building an Apache Sling Rendering FarmBertrand Delacretaz
 

Más de Bertrand Delacretaz (20)

VanillaJS & the Web Platform, a match made in heaven?
VanillaJS & the Web Platform, a match made in heaven?VanillaJS & the Web Platform, a match made in heaven?
VanillaJS & the Web Platform, a match made in heaven?
 
Surviving large online communities with conciseness and clarity
Surviving large online communities with conciseness and clarity Surviving large online communities with conciseness and clarity
Surviving large online communities with conciseness and clarity
 
Repoinit: a mini-language for content repository initialization
Repoinit: a mini-language for content repository initializationRepoinit: a mini-language for content repository initialization
Repoinit: a mini-language for content repository initialization
 
The Moving House Model, adhocracy and remote collaboration
The Moving House Model, adhocracy and remote collaborationThe Moving House Model, adhocracy and remote collaboration
The Moving House Model, adhocracy and remote collaboration
 
GraphQL in Apache Sling - but isn't it the opposite of REST?
GraphQL in Apache Sling - but isn't it the opposite of REST?GraphQL in Apache Sling - but isn't it the opposite of REST?
GraphQL in Apache Sling - but isn't it the opposite of REST?
 
Open Source Changes the World!
Open Source Changes the World!Open Source Changes the World!
Open Source Changes the World!
 
How to convince your left brain (or manager) to follow the Open Source path t...
How to convince your left brain (or manager) to follow the Open Source path t...How to convince your left brain (or manager) to follow the Open Source path t...
How to convince your left brain (or manager) to follow the Open Source path t...
 
L'Open Source change le Monde - BlendWebMix 2019
L'Open Source change le Monde - BlendWebMix 2019L'Open Source change le Monde - BlendWebMix 2019
L'Open Source change le Monde - BlendWebMix 2019
 
Shared Neurons - the Secret Sauce of Open Source communities?
Shared Neurons - the Secret Sauce of Open Source communities?Shared Neurons - the Secret Sauce of Open Source communities?
Shared Neurons - the Secret Sauce of Open Source communities?
 
Sling and Serverless, Best Friends Forever?
Sling and Serverless, Best Friends Forever?Sling and Serverless, Best Friends Forever?
Sling and Serverless, Best Friends Forever?
 
Serverless - introduction et perspectives concrètes
Serverless - introduction et perspectives concrètesServerless - introduction et perspectives concrètes
Serverless - introduction et perspectives concrètes
 
State of the Feather - ApacheCon North America 2018
State of the Feather - ApacheCon North America 2018State of the Feather - ApacheCon North America 2018
State of the Feather - ApacheCon North America 2018
 
Karate, the black belt of HTTP API testing?
Karate, the black belt of HTTP API testing?Karate, the black belt of HTTP API testing?
Karate, the black belt of HTTP API testing?
 
Open Source at Scale: the Apache Software Foundation (2018)
Open Source at Scale: the Apache Software Foundation (2018)Open Source at Scale: the Apache Software Foundation (2018)
Open Source at Scale: the Apache Software Foundation (2018)
 
They don't understand me! Tales from the multi-cultural trenches
They don't understand me! Tales from the multi-cultural trenchesThey don't understand me! Tales from the multi-cultural trenches
They don't understand me! Tales from the multi-cultural trenches
 
Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)
Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)
Prise de Décisions Asynchrone, Devoxx France 2018 (avec vidéo)
 
Project and Community Services the Apache Way
Project and Community Services the Apache WayProject and Community Services the Apache Way
Project and Community Services the Apache Way
 
La Fondation Apache - keynote au Paris Open Source Summit 2017
La Fondation Apache - keynote au Paris Open Source Summit 2017La Fondation Apache - keynote au Paris Open Source Summit 2017
La Fondation Apache - keynote au Paris Open Source Summit 2017
 
Asynchronous Decision Making - FOSS Backstage 2017
Asynchronous Decision Making - FOSS Backstage 2017Asynchronous Decision Making - FOSS Backstage 2017
Asynchronous Decision Making - FOSS Backstage 2017
 
Building an Apache Sling Rendering Farm
Building an Apache Sling Rendering FarmBuilding an Apache Sling Rendering Farm
Building an Apache Sling Rendering Farm
 

Último

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Beyond full-text searches with Lucene and Solr

  • 1. Beyond full-text searches With Lucene and Solr Bertrand Delacrétaz ApacheCon EU 2007, Amsterdam bdelacretaz@apache.org www.codeconsult.ch slides revision: 2007-05-03
  • 2. Slides at http://wiki.apache.org/apachecon/Eu2007OnlineSessionSlides How to graft a Lucene-based dynamic navigation system on an search-challenged CMS using Solr. As seen from the “Solr integrator” point of view. Beyond full-text?
  • 3. tsrvideo.ch - a Solr client
  • 4. tsrvideo.ch - a Solr client
  • 5. The Project Deliver a rich video player experience Users explore much more than they search Existing content stored in two separate CMSes with very different content models (and http/XML interfaces)
  • 6. Client system overview Ajax + HTML Apache Solr HTTP server Search server HTTP/JSON Lucene index replicated index for backup
  • 8. What is Solr? Solr HTTP/XML servlet Lucene index See also http://wiki.apache.org/apachecon/Eu2007OnlineSessionSlides
  • 10. Indexing in Solr <add> <doc> <field name=quot;idquot;>9885A004</field> <field name=quot;namequot;>Canon PowerShot SD500</field> <field name=quot;categoryquot;>camera</field> HTTP <field name=quot;featuresquot;>3x optical zoom</field> POST <field name=quot;featuresquot;>aluminum case</field> <field name=quot;weightquot;>6.4</field> <field name=quot;pricequot;>329.95</field> </doc> </add> “Solr XML” documents are POSTed to Solr via HTTP Field names and types are defined in the Solr schema
  • 11. Solr indexing schema <field name=quot;idquot; type=quot;stringquot; indexed=quot;truequot; stored=quot;truequot;/> <field name=quot;categoryquot; type=quot;text_wsquot; indexed=quot;truequot; stored=quot;truequot;/> <dynamicField name=quot;*_twsquot; type=quot;text_wsquot; indexed=quot;truequot; stored=quot;truequot;/> <dynamicField name=quot;*_dtquot; type=quot;datequot; indexed=quot;truequot; stored=quot;truequot;/> <fieldtype name=quot;sfloatquot; class=quot;Solr.SortableFloatFieldquot;sortMissingLast=quot;truequot;/> <uniqueKey>id</uniqueKey> <copyField source=quot;catquot; dest=quot;textquot;/> <copyField source=quot;namequot; dest=quot;textquot;/> <copyField source=quot;namequot; dest=quot;nameSortquot;/> <copyField source=quot;manuquot; dest=quot;textquot;/>
  • 12. Field content analysis <fieldtype name=quot;text_frquot; class=quot;Solr.TextFieldquot;> Le Châtelain <analyzer> et ses chevaux <tokenizer class=quot;Solr.StandardTokenizerFactoryquot;/> <filter class=quot;Solr.ISOLatin1AccentFilterFactoryquot;/> <filter class=quot;Solr.LowerCaseFilterFactoryquot;/> <filter class=quot;Solr.StopFilterFactoryquot; words=quot;french-stopwords.txtquot; ignoreCase=quot;truequot;/> <filter chatelain class=quot;Solr.SnowballPorterFilterFactoryquot; cheval language=quot;Frenchquot;/> </analyzer> </fieldtype>
  • 14. Solr queries http://solr.xy.com/select?q=apache & fl=solr_id,title <result numFound=quot;2quot; start=quot;0quot;> <doc> <str name=quot;solr_idquot;>tsr.ch/story/4336075</str> <str name=quot;titlequot;>ApacheCon Amsterdam</str> </doc> <doc> <str name=quot;solr_idquot;>tsr.ch/story/1715414</str> <str name=quot;titlequot;>Geeks are upon us</str> </doc> </result> Enhanced Lucene query language as standard
  • 15. Play it again, JSON! http://solr.xy.com/select?q=apache&fl=solr_id,title&wt=json {quot;responsequot;: {quot;numFoundquot;:54,quot;startquot;:0, quot;docsquot;:[ {quot;solr_idquot;:quot;tsr.ch/story/4336075quot;, quot;titlequot;:quot;ApacheCon Amsterdamquot; }, {quot;solr_idquot;:quot;tsr.ch/story/4336032quot;, quot;titlequot;:quot;Geeks are upon usquot; }, ...
  • 17. Solr Function Query A Function query influences the score by a function of a field's numeric value or ordinal. // OrdFieldSource ord(myfield) // ReverseOrdFieldSource rord(myfield) // LinearFloatFunction on numeric field value linear(myfield,1,2) // MaxFloatFunction of LinearFloatFunction on numeric field value or constant max(linear(myfield,1,2),100) // ReciprocalFloatFunction on numeric field value recip(myfield,1,2,3) _val_:quot;linear(recip(rord(broadcast_date),1,1000,1000),11,0)quot;
  • 18. That’s our client Ajax + HTML Apache Solr HTTP server Search server Solr schema and analyzers HTTP/JSON Lucene index
  • 20. Indexing Process cron scheduler Legacy curl, HTTP/XML Solr XML CMS XSLT Issues: Change/delete signals? Polling? RSS feeds? Legacy content structure and consistency Indexing delay Deleted/retired documents Non-transactional behavior
  • 21. Content Normalization cms A XML HTTP Content Solr HTTP Normalization XML cms B Convert to “Solr XML”. XML Common field names. Normalized values. -> unified acces
  • 22. Normalized and unified values <field name=”solr.id”>story.cmsA.12129</field> <field name=”role”>story</field> <field name=”topic”>news</field> <field name=”topic”>sports</field> <field name=”author”>Bob S. Ponge</field> <field name=”author.id”>person.438</field> <field name=”link.related”>story.cmsB.73-1</field>
  • 23. More than “just” full-text searches <field name=”author.id”>person.438</field> <field name=”link.related”>story.cmsB.73-1</field>
  • 24. Content Mining? content normalization Unified navigation and queries
  • 25. Testing “How do I break this thing before it breaks by itself?” “testing” picture: taliesin on morguefile.com
  • 26. Use-cases based testing Do I find “cheval” when searching for “chevaux”? Is document 98.345 found when searching for “+montreux -casino AND role:story”? etc... Reference data required for such tests: Solr indexes are collection of files that can easily be saved Why not automate these? read on...
  • 27. Automated functional testing # Scenarii are executed by our auto-test tool, based # on htmlunit (http://htmlunit.sourceforge.net/) # test a query that returns no results request : /solr/select?wt=xmlt&q=thismustnotbefound match : /response/result/@numFound : 0 dontMatch : /response/result/@numFound : 1 # test a title query request : /solr/select?q=title%3Afootball match : contains(/response//doc[1]/str[@name='title'],'ootball') : true Test are run as JUnit tests, against a Solr instance.
  • 28. Stress tests Generate heaps of semi-random query URLs, and replay them in many HTTP clients simultaneously using httpstone http://code.google.com/p/httpstone/ http://solr...&q=quot;attirerquot; role:audio quot;enfantsquot; quot;fidéliserquot; http://solr...&q=quot;fidéliserquot; quot;carottesquot; role:story quot;enfants...quot; http://solr...&q=quot;surtoutquot; quot;adultesquot; quot;histoirequot; quot;L'avisquot; http://solr...&q=quot;Résultatsquot; quot;enfants...quot; quot;publicitéquot; role:video http://solr...&q=quot;lunettesquot; quot;différences?quot; quot;Résultatsquot; quot;fabrications,quot; http://solr...&q=quot;attirerquot; role:story quot;solaires:quot; quot;rend-t-onquot; http://solr...&q=role:audio quot;quellesquot; quot;Mêmesquot; quot;Mêmesquot; ...
  • 29. Test outcomes Explain search features with use cases Avoid regressions with automated tests Tune the index and analyzers with automated functional tests Get a feel for scalability with stress tests Build confidence before launches!
  • 31. Lessons Learned (a.k.a “conclusion”) Solr opens the doors to Lucene! Designing the “right” indexing content model takes time. Do not hesitate to duplicate fields with different indexing parameters, denormalized, aggregated, etc. Content unification enables “content mining”. Tune and run automated tests. Repeat. Repeat. Repeat...
  • 32. References http://lucene.apache.org/solr http://wiki.apache.org/solr/SolrResources http://lucene.apache.org/java http://lucenebook.com/ “Modern Information Retrieval”, Ricardo Baeza-Yates http://www.ischool.berkeley.edu/~hearst/irbook/ Other ApacheCon EU 2007 presentations: http://wiki.apache.org/apachecon/Eu2007OnlineSessionSlides