Enterprise Search platform
Building solid scalable enterprise search REST services on top of Apache Lucene




Tommaso Teofili
Agenda

• Apache Lucene overview


• Why do we need Apache Solr?


• Everyman tales from Solr


• Enterprise what?


• One step beyond...
Apache Lucene overview

• Information Retrieval library


• Inverted indexes are quick and efficient


• Vector space model


• Advanced search options (synonyms, stopwords, similarity, proximity)


• Different language implementations (Java, .NET, C, Python)
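To make the first three bullets concrete, here is a toy Python sketch of an inverted index scored with a plain TF-IDF vector space model and cosine similarity. It is an illustration under simplifying assumptions (whitespace tokenization, tf * log(N/df) weights), not Lucene's implementation: Lucene's actual similarity formula uses its own tf/idf variants plus norms and boosts.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive analysis: lowercase and split on whitespace (real analyzers do much more)
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency}, i.e. a toy postings list."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def tfidf_vector(term_counts, index, n_docs):
    # Weight = tf * log(N / df); terms that are not in the index are dropped
    vec = {}
    for term, tf in term_counts.items():
        df = len(index.get(term, {}))
        if df:
            vec[term] = tf * math.log(n_docs / df)
    return vec


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query, docs, index):
    n = len(docs)
    q_vec = tfidf_vector(Counter(tokenize(query)), index, n)
    # The inverted index yields the candidates directly: docs sharing at least one query term
    candidates = {doc_id for term in q_vec for doc_id in index[term]}
    hits = []
    for doc_id in candidates:
        d_vec = tfidf_vector(Counter(tokenize(docs[doc_id])), index, n)
        hits.append((cosine(q_vec, d_vec), doc_id))
    return sorted(hits, reverse=True)
```

The speed comes from the candidate step: only the postings of the query terms are touched, never the whole collection.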
The Lucene API

• Lucene indexes are built on a Directory


• Directory can be accessed by IndexReaders and IndexWriters


• IndexSearchers are built on top of Directories and IndexReaders


• IndexWriters write Documents into the index


• Documents are made of Fields


• Fields have value(s) and options


• Directory > IndexReader/Writer > Document > Field
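A minimal sketch of how these pieces relate, written as hypothetical Python toy classes (ToyDirectory, ToyIndexWriter, ToyIndexSearcher, Field and Document are illustrative names, not the Lucene Java API). The reader role is folded into the searcher for brevity. The point is the practical effect of field options: indexed fields are searchable, stored fields are returned.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Field:
    # Hypothetical toy counterpart of a Lucene Field: a name, a value and two options
    name: str
    value: str
    stored: bool = True   # value can be returned with search results
    indexed: bool = True  # value is searchable through the inverted index


@dataclass
class Document:
    fields: list = field(default_factory=list)


class ToyDirectory:
    """Plays the role of the Directory: holds postings and stored values."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field name, term) -> doc ids
        self.stored = []                  # doc id -> {field name: value}


class ToyIndexWriter:
    def __init__(self, directory):
        self.directory = directory

    def add_document(self, doc):
        doc_id = len(self.directory.stored)
        kept = {}
        for f in doc.fields:
            if f.indexed:
                for term in f.value.lower().split():
                    self.directory.postings[(f.name, term)].add(doc_id)
            if f.stored:
                kept[f.name] = f.value
        self.directory.stored.append(kept)
        return doc_id


class ToyIndexSearcher:
    def __init__(self, directory):
        self.directory = directory

    def search(self, field_name, term):
        ids = sorted(self.directory.postings.get((field_name, term.lower()), set()))
        # Only stored fields come back; indexed-only fields are searchable but not retrievable
        return [self.directory.stored[i] for i in ids]
```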
Indexing Lucene

• A Lucene index has one or more segments and a generation


• Changes to the index must be committed to become visible (optionally followed by an optimize, which merges the segments)


• No fixed schema


• Each field can be STORED, INDEXED and ANALYZED


• Each field can have NORMS and TERM VECTORS
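The segment and commit behaviour can be modelled with a small hypothetical Python class: added documents stay invisible until a commit writes them out as a new immutable segment and bumps the generation, and an optimize merges everything into a single segment. This is a simplification; real Lucene also merges segments in the background according to a merge policy.

```python
class ToySegmentedIndex:
    """Hypothetical model: buffered docs become visible only after commit()."""

    def __init__(self):
        self.segments = []   # committed, immutable tuples of docs
        self.buffer = []     # added but not yet committed
        self.generation = 0  # bumped on every commit

    def add(self, doc):
        self.buffer.append(doc)

    def commit(self):
        if not self.buffer:
            return  # nothing to commit
        # Each commit writes a new immutable segment; existing segments are never modified
        self.segments.append(tuple(self.buffer))
        self.buffer = []
        self.generation += 1

    def optimize(self):
        # Merge all segments into one, then record the result as a new commit
        merged = tuple(doc for seg in self.segments for doc in seg)
        self.segments = [merged] if merged else []
        self.generation += 1

    def visible_docs(self):
        # A reader only sees committed segments
        return [doc for seg in self.segments for doc in seg]
```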
Searching Lucene

• Open an IndexSearcher on top of an IndexReader over a Directory


• Many query types: TermQuery, MultiTermQuery, BooleanQuery,
  WildcardQuery, PhraseQuery, PrefixQuery, MultiPhraseQuery, FuzzyQuery,
  TermRangeQuery, NumericRangeQuery


• Get results from a TopDocs object
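A few of the query types above can be imitated over a toy in-memory index to show their matching semantics, with top_docs playing the role of a TopDocs result (the n highest-scoring hits). This is a hedged Python sketch with made-up constant scores, not Lucene's classes or scoring; note also that fnmatch treats [...] as a character class, which Lucene's WildcardQuery does not.

```python
import fnmatch
import heapq

# Toy index: doc id -> set of (already analyzed) terms
DOCS = {
    1: {"apache", "lucene", "library"},
    2: {"apache", "solr", "server"},
    3: {"lucid", "imagination"},
}


def term_query(term):
    return lambda docs: {d: 1.0 for d, terms in docs.items() if term in terms}


def prefix_query(prefix):
    return lambda docs: {d: 1.0 for d, terms in docs.items()
                         if any(t.startswith(prefix) for t in terms)}


def wildcard_query(pattern):
    # '*' matches any sequence and '?' a single character
    return lambda docs: {d: 1.0 for d, terms in docs.items()
                         if any(fnmatch.fnmatchcase(t, pattern) for t in terms)}


def boolean_query(must=(), should=(), must_not=()):
    def run(docs):
        must_results = [q(docs) for q in must]
        should_results = [q(docs) for q in should]
        candidates = set(docs) if must else set()
        for r in must_results:
            candidates &= set(r)
        if not must:
            # With no MUST clauses, a document has to match at least one SHOULD clause
            for r in should_results:
                candidates |= set(r)
        for q in must_not:
            candidates -= set(q(docs))
        # Score: sum of the scores of the clauses a document matches
        return {d: sum(r.get(d, 0.0) for r in must_results + should_results)
                for d in candidates}
    return run


def top_docs(query, docs, n):
    """Return the n best (score, doc id) pairs, like a TopDocs result."""
    scores = query(docs)
    return heapq.nlargest(n, ((s, d) for d, s in scores.items()))
```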
Why do we need Apache Solr?

• Lucene is a library

• Lucene by itself can only be queried programmatically

• Often the search system has to be totally independent of other
  systems (e.g. a CMS)

• A ready-to-deploy search server is what you need

• Need to scale both vertically and horizontally
The Solar System
Everyman tales with Solr
Apache Solr - Overview

• Ready-to-use enterprise search server


• REST (and programmatic) API


• Results in XML, JSON, PHP, Ruby, etc.


• Exploits the full power of Lucene


• Scaling capabilities (replication, distributed search)


• Easy administration interface


• Easy to extend and customize (plugin architecture)
Apache Solr - Project status

• Latest release: 1.4.1 (June 2010)


• Lots of new features on trunk


• Most of the new features are also available on branch 3.0


• A large and very active community


• Project backed by Lucid Imagination
Solr - 5 minutes tutorial

• Download latest release (1.4.1)


• cd $SOLR_HOME/example


• java -server -jar start.jar


• You now have a running Solr instance (on top of the bundled Jetty server) at
  http://localhost:8983/solr


• cd $SOLR_HOME/example/exampledocs


• Index with the command: sh post.sh *.xml


• Search with your browser
Solr - Query syntax

• The default operator is OR (override it by adding &q.op=AND to the HTTP request)


• Query specific fields with fieldname:value


• The usual +, -, AND, OR and NOT operators


• Range queries on date or numeric fields, e.g. timestamp:[* TO NOW]


• Term boosting, e.g. roma^2 inter


• Fuzzy search, e.g. roam~0.6


• ...
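When query strings are built programmatically, user input containing syntax characters must be escaped. The helpers below are a sketch assuming the classic Lucene query parser's special characters (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \); the helper names are hypothetical. The fuzzy syntax follows the Solr 1.4 era, where the number after ~ is a minimum similarity between 0 and 1.

```python
# Characters with special meaning in the classic Lucene/Solr query syntax
# ('&' and '|' are escaped one by one, since && and || are operators)
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape(value):
    return "".join("\\" + c if c in SPECIAL_CHARS else c for c in value)


def field_query(field, value, boost=None):
    """Build field:value, quoting multi-word values as a phrase and adding ^boost."""
    if " " in value:
        # Inside a phrase only backslash and double quote need escaping
        term = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    else:
        term = escape(value)
    q = f"{field}:{term}"
    return f"{q}^{boost}" if boost is not None else q


def range_query(field, low="*", high="*"):
    # Inclusive range, e.g. timestamp:[* TO NOW]
    return f"{field}:[{low} TO {high}]"


def fuzzy_query(term, similarity):
    # Solr 1.4-era fuzzy syntax: term~minSimilarity
    return f"{escape(term)}~{similarity}"
```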
Solr - Basic configuration steps
• Define fields, types and analysis inside schema.xml


• Play with solrconfig.xml:


    • request handlers (update, search)


    • index parameters


    • caches


    • deletion policy


    • autowarming


    • replication, clustering, etc...
Solr - schema.xml

• Types


• Analyzers to use for each type


• Fields with name, type and options


• Unique key


• Dynamic fields


• Copy fields


• Don’t use the default schema.xml, write it from scratch!
Solr - Type definition
• Analyzers for querying and indexing are defined inside the schema
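The idea of an analyzer chain (a tokenizer followed by token filters, applied at both index and query time) can be sketched in a few lines of Python. The tokenizer regex and the stopword list are assumptions standing in for what a fieldType would configure in schema.xml; the functions are not Solr's actual tokenizer or filter classes.

```python
import re

# Hypothetical stopword list, standing in for a stopwords file
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def tokenizer(text):
    # Split on anything that is not a word character
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def make_analyzer(*filters):
    def analyze(text):
        tokens = tokenizer(text)
        for f in filters:  # filters run in order, as in a schema.xml chain
            tokens = f(tokens)
        return tokens
    return analyze


# The same chain at index and query time, so "LUCENE" in a query matches "Lucene" in a document
index_analyzer = make_analyzer(lowercase_filter, stop_filter)
query_analyzer = make_analyzer(lowercase_filter, stop_filter)
```

Filter order matters: lowercasing first is what lets "The" be removed by the stop filter.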
Solr - solrconfig.xml

• Where Solr will write the index


• Index merge factor


• Control different caches: documents, query results, filters


• Request handlers that serve (HTTP) requests; typically at least a (standard)
  search handler and an update handler are defined


• Update request processor chains to configure indexing behavior


• Event listeners (newSearcher, firstSearcher)


• and much more...
Solr - Indexing

• Index updates are sent as XML commands via HTTP POST


• <add> to insert or update documents (a document whose unique key already exists is replaced)


• <delete> to remove documents by unique key (<id>) or by query (<query>)
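The update messages can be generated with Python's standard ElementTree. This sketch assumes the Solr 1.4 example setup, where updates are POSTed to http://localhost:8983/solr/update with an XML content type; post_update is a hypothetical helper and is not executed here. Changes become visible to searchers only after a <commit/> is sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def add_xml(docs):
    """Build an <add> message: one <doc> per dict, one <field name=...> per value."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:  # multivalued fields repeat the <field> element
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_by_id_xml(doc_id):
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


def delete_by_query_xml(query):
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


COMMIT_XML = "<commit/>"


def post_update(xml_message, url="http://localhost:8983/solr/update"):
    # Assumes a running local example instance; sends one XML command via HTTP POST
    req = urllib.request.Request(url, data=xml_message.encode("utf-8"),
                                 headers={"Content-Type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```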
Solr - Searching

• HTTP GET to the Solr instance with the mandatory q parameter, which specifies the
  query


• df - the default field to query


• fl - the list of fields to return (stored fields only)


• sort - the fields used for sorting; defaults to score desc (score is not a real field)


• start, rows - paging parameters


• wt - response writer type; defaults to xml, can be json, php, ruby, etc.
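Putting the parameters together, a search URL can be built with urllib.parse.urlencode. The function below is a hypothetical helper that assumes the example instance at http://localhost:8983/solr; extra parameters (hl, fq, facet, ...) are passed as (name, value) pairs so that repeated names, such as several fq values, are preserved.

```python
from urllib.parse import urlencode


def select_url(q, base="http://localhost:8983/solr/select", df=None, fl=None,
               sort=None, start=0, rows=10, wt=None, extra=()):
    """Build a Solr search URL; parameters left as None are simply omitted."""
    params = [("q", q), ("start", start), ("rows", rows)]
    for name, value in (("df", df), ("fl", fl), ("sort", sort), ("wt", wt)):
        if value is not None:
            params.append((name, value))
    params.extend(extra)  # names may repeat, e.g. two ("fq", ...) pairs
    return base + "?" + urlencode(params)
```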
Solr - Data import

• Typically “old” systems rely on databases


• Data can be imported from DBs using the DataImportHandler component


• Define the data source, the JDBC driver and the field mappings
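The DataImportHandler does this declaratively in its configuration (usually a data-config.xml). To show what the mapping step amounts to, here is a hand-written Python sketch in which an in-memory sqlite3 database stands in for the "old" system; the table, column names and FIELD_MAPPING are hypothetical, and this is not how DIH itself is implemented.

```python
import sqlite3

# Hypothetical mapping from database columns to Solr field names
FIELD_MAPPING = {"product_id": "id", "product_name": "name", "category": "cat"}


def rows_to_docs(conn, sql, mapping):
    """Yield one Solr-style document (dict) per row, keeping only mapped, non-NULL columns."""
    cur = conn.execute(sql)
    columns = [c[0] for c in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        yield {mapping[col]: val for col, val in record.items()
               if col in mapping and val is not None}
```

Unmapped columns (internal notes, audit data) simply never reach the index, which is the usual reason to declare the mappings explicitly.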
Solr - Highlighting

• Useful when snippets of the matching text are needed in the search results


• In Solr 1.4.1 only stored fields can be highlighted


• Add &hl=true&hl.fl=field1,field2 to the HTTP search request to enable
  highlighting on field1 and field2
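As a rough illustration of what a highlighted snippet looks like, the toy function below wraps matching terms in <em>...</em> and trims the text to a fragment around the first match, loosely mimicking the idea behind hl.fragsize. It is a sketch, not Solr's highlighter, which works on analyzed tokens rather than on a regular expression.

```python
import re


def highlight(text, terms, pre="<em>", post="</em>", fragsize=100):
    """Wrap query terms in pre/post tags and return a fragment around the first match."""
    if not terms:
        return None
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
                         re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return None  # no snippet when the text does not match
    start = max(0, match.start() - fragsize // 2)
    fragment = text[start:start + fragsize]
    return pattern.sub(lambda m: pre + m.group(0) + post, fragment)
```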
Solr - Faceting

• Break up search results into multiple categories showing counts for each


• Often used in online stores


• Can be very useful in guiding the user experience


• Users can then drill down into the results of a specific category
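In Solr, faceting on a field is requested with parameters such as &facet=true&facet.field=cat. The sketch below only shows what is being computed (a count per field value over the matching documents) plus the drill-down step; it is a hypothetical Python illustration, not Solr's faceting code.

```python
from collections import Counter


def _values(doc, field):
    v = doc.get(field, [])
    return v if isinstance(v, list) else [v]


def facet_counts(results, field, mincount=1):
    """Count how many matching documents fall into each value of `field`."""
    counts = Counter()
    for doc in results:
        counts.update(set(_values(doc, field)))  # a document counts once per distinct value
    return [(v, c) for v, c in counts.most_common() if c >= mincount]


def drill_down(results, field, value):
    # What a click on a facet value does: keep only that category
    # (in Solr this is typically sent as an extra fq=field:value)
    return [doc for doc in results if value in _values(doc, field)]
```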
Solr - Filter queries

• Queries used to filter the results of the main query


• Restrict the set of matching documents without influencing the score


• Useful for domain-specific queries where you want users to search only in
  certain “areas” of the index


• Add &fq=somefilterquery using the standard Solr query syntax
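The key property of filter queries, restricting the result set without touching scores, can be shown with a toy Python sketch. The cache dictionary stands in for Solr's filterCache, which keeps the document set of each filter query so that repeated filters are cheap; the field:value parsing is a deliberate simplification and the function names are hypothetical.

```python
def filter_set(fq, docs, cache):
    """Doc ids matching a simple field:value filter, computed once per distinct fq."""
    if fq not in cache:
        field, value = fq.split(":", 1)
        cache[fq] = frozenset(d["id"] for d in docs if d.get(field) == value)
    return cache[fq]


def search_with_filters(scores, fqs, docs, cache):
    """scores: doc id -> score of the main query q. Filters restrict, never re-score."""
    allowed = set(scores)
    for fq in fqs:
        allowed &= filter_set(fq, docs, cache)
    return sorted(((scores[d], d) for d in allowed), reverse=True)
```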
Solr - Enterprise what?
Multicore
Replication
Distributed search
...
Solr - Multi core

• Define multiple Solr cores inside a single Solr instance


• Each core maintains its own index


• Unified administration interface


• Runtime commands to create, swap, load, unload, delete cores
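The runtime commands are issued as HTTP requests to the CoreAdmin handler. Below is a hypothetical helper for building such requests, assuming the admin path /solr/admin/cores of the multicore example and the CREATE (name, instanceDir), SWAP (core, other) and UNLOAD (core) parameters; check the exact actions available in your Solr version.

```python
from urllib.parse import urlencode

# Assumed CoreAdmin endpoint of a local multicore example instance
ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def core_admin_url(action, **params):
    """Build a CoreAdmin request URL; keyword arguments keep their given order."""
    return ADMIN_URL + "?" + urlencode([("action", action), *params.items()])
```

SWAP is what makes zero-downtime reindexing practical: build a fresh index in a spare core, then swap it with the live one.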
Solr - Replication

• Under high traffic it is useful to replicate a Solr instance and split the queries
  among the replicas (possibly with a load balancer in front)


• The master holds the original index


• The slave polls the master for the latest index version


• If the slave's index is older, it fetches only the differences from the master
  (rsync-like)


• Meanwhile, the indexes remain available for searching
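The polling logic can be sketched as follows. This is a toy Python model in which dictionaries of file name to size stand in for index files and an integer stands in for the index version; it is not Solr's replication handler, and only shows why the transfer is rsync-like: just the files the slave is missing, or that differ, are copied.

```python
def files_to_fetch(master_files, slave_files):
    """Names of master files the slave lacks or holds in a different version."""
    return sorted(n for n, meta in master_files.items() if slave_files.get(n) != meta)


def poll(master, slave):
    """One polling round: if the master's index is newer, copy only the differing files."""
    if master["version"] <= slave["version"]:
        return []  # already up to date
    needed = files_to_fetch(master["files"], slave["files"])
    for name in needed:
        slave["files"][name] = master["files"][name]
    # Files no longer referenced by the master's index are dropped
    for name in set(slave["files"]) - set(master["files"]):
        del slave["files"][name]
    slave["version"] = master["version"]
    return needed
```

Because segments are immutable, unchanged segment files never need to be transferred again.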
Solr - Distributed search

• When an index is too large, in terms of disk space or memory required, it can be
  useful to split it into two or more shards


• A shard is a Solr instance and can be searched or indexed independently


• It is also possible to query all the shards at once, with the results merged
  from the sub-results of each shard


• http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=category:information


• Note that the distribution of documents among the shards is up to the user (or
  whoever feeds the indexes)
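Since the distribution is up to the user, a common choice is a stable hash of the unique key. The sketch below shows that hypothetical routing rule and a merge of per-shard results by score, roughly what the instance receiving the shards request does with the sub-results. Keep in mind that scores from different shards are only approximately comparable, because term statistics such as idf are computed locally on each shard.

```python
import heapq
import itertools
import zlib


def shard_for(doc_id, n_shards):
    # Hypothetical routing rule chosen by whoever feeds the indexes:
    # a stable hash of the unique key, so the same document always lands on the same shard
    return zlib.crc32(doc_id.encode("utf-8")) % n_shards


def merge_shard_results(shard_results, rows):
    """shard_results: one list of (score, doc id) per shard, each sorted by descending score."""
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, rows))
```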
One step beyond...

• Solr in the cloud


• Spatial search


• Solr & UIMA :-)
Más contenido relacionado

La actualidad más candente

Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conferenceErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache SolrEdureka!
 
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Lucidworks
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solrKnoldus Inc.
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache SolrChristos Manios
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakTommaso Teofili
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered LuceneErik Hatcher
 
Make your gui shine with ajax solr
Make your gui shine with ajax solrMake your gui shine with ajax solr
Make your gui shine with ajax solrlucenerevolution
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solrsagar chaturvedi
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr Tommaso Teofili
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrJayesh Bhoyar
 

La actualidad más candente (20)

Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development Tutorial
 
Introduction to Apache solr
Introduction to Apache solrIntroduction to Apache solr
Introduction to Apache solr
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
Flexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit OakFlexible search in Apache Jackrabbit Oak
Flexible search in Apache Jackrabbit Oak
 
Solr Powered Lucene
Solr Powered LuceneSolr Powered Lucene
Solr Powered Lucene
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Make your gui shine with ajax solr
Make your gui shine with ajax solrMake your gui shine with ajax solr
Make your gui shine with ajax solr
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Enterprise Search Using Apache Solr
Enterprise Search Using Apache SolrEnterprise Search Using Apache Solr
Enterprise Search Using Apache Solr
 
Scaling search in Oak with Solr
Scaling search in Oak with Solr Scaling search in Oak with Solr
Scaling search in Oak with Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 

Similar a Enterprise Search platform building scalable REST services on Apache Lucene

Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesRahul Jain
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relationJay Bharat
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Mary Jo Sminkey
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0Anshum Gupta
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrRahul Jain
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Shalin Shekhar Mangar
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for DrupalChris Caple
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudAnshum Gupta
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Shalin Shekhar Mangar
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2GokulD
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrSease
 
Apachesolr presentation
Apachesolr presentationApachesolr presentation
Apachesolr presentationfreeformkurt
 
PLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR IntegrationPLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR IntegrationAlfresco Software
 

Similar a Enterprise Search platform building scalable REST services on Apache Lucene (20)

Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Solr
SolrSolr
Solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Apache solr
Apache solrApache solr
Apache solr
 
Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)Solr/Elasticsearch for CF Developers (and others)
Solr/Elasticsearch for CF Developers (and others)
 
What's new in Solr 5.0
What's new in Solr 5.0What's new in Solr 5.0
What's new in Solr 5.0
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Intro to Apache Solr for Drupal
Intro to Apache Solr for DrupalIntro to Apache Solr for Drupal
Intro to Apache Solr for Drupal
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Lucene Bootcamp - 2
Lucene Bootcamp - 2Lucene Bootcamp - 2
Lucene Bootcamp - 2
 
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/SolrLet's Build an Inverted Index: Introduction to Apache Lucene/Solr
Let's Build an Inverted Index: Introduction to Apache Lucene/Solr
 
Apachesolr presentation
Apachesolr presentationApachesolr presentation
Apachesolr presentation
 
PLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR IntegrationPLAT-4 Understanding the SOLR Integration
PLAT-4 Understanding the SOLR Integration
 

Más de Tommaso Teofili

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRTommaso Teofili
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in SlingTommaso Teofili
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industryTommaso Teofili
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and SolrTommaso Teofili
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache HamaTommaso Teofili
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiTommaso Teofili
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaTommaso Teofili
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in SolrTommaso Teofili
 
Apache UIMA - Hands on code
Apache UIMA - Hands on codeApache UIMA - Hands on code
Apache UIMA - Hands on codeTommaso Teofili
 
Apache UIMA Introduction
Apache UIMA IntroductionApache UIMA Introduction
Apache UIMA IntroductionTommaso Teofili
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU TourTommaso Teofili
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesTommaso Teofili
 
Apache UIMA and Metadata Generation
Apache UIMA and Metadata GenerationApache UIMA and Metadata Generation
Apache UIMA and Metadata GenerationTommaso Teofili
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the WebTommaso Teofili
 
Apache UIMA and Semantic Search
Apache UIMA and Semantic SearchApache UIMA and Semantic Search
Apache UIMA and Semantic SearchTommaso Teofili
 

Más de Tommaso Teofili (16)

Affect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IRAffect Enriched Word Embeddings for News IR
Affect Enriched Word Embeddings for News IR
 
Data replication in Sling
Data replication in SlingData replication in Sling
Data replication in Sling
 
Search engines in the industry
Search engines in the industrySearch engines in the industry
Search engines in the industry
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 
Machine learning with Apache Hama
Machine learning with Apache HamaMachine learning with Apache Hama
Machine learning with Apache Hama
 
Adapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGiAdapting Apache UIMA to OSGi
Adapting Apache UIMA to OSGi
 
Oak / Solr integration
Oak / Solr integrationOak / Solr integration
Oak / Solr integration
 
Domeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and ClerezzaDomeo, Text Mining, UIMA and Clerezza
Domeo, Text Mining, UIMA and Clerezza
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Apache UIMA - Hands on code
Apache UIMA - Hands on codeApache UIMA - Hands on code
Apache UIMA - Hands on code
 
Apache UIMA Introduction
Apache UIMA IntroductionApache UIMA Introduction
Apache UIMA Introduction
 
OSS Enterprise Search EU Tour
OSS Enterprise Search EU TourOSS Enterprise Search EU Tour
OSS Enterprise Search EU Tour
 
Information Extraction with UIMA - Usecases
Information Extraction with UIMA - UsecasesInformation Extraction with UIMA - Usecases
Information Extraction with UIMA - Usecases
 
Apache UIMA and Metadata Generation
Apache UIMA and Metadata GenerationApache UIMA and Metadata Generation
Apache UIMA and Metadata Generation
 
Data and Information Extraction on the Web
Data and Information Extraction on the WebData and Information Extraction on the Web
Data and Information Extraction on the Web
 
Apache UIMA and Semantic Search
Apache UIMA and Semantic SearchApache UIMA and Semantic Search
Apache UIMA and Semantic Search
 

Último

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Último (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024

Enterprise Search platform building scalable REST services on Apache Lucene

  • 1. Enterprise Search platform Building solid scalable enterprise search REST services on top of Apache Lucene Tommaso Teofili
  • 2. Agenda • Apache Lucene overview • Why do we need Apache Solr? • Everyman tales from Solr • Enterprise what? • One step beyond...
  • 3. Apache Lucene overview • Information Retrieval library • Inverted indexes are quick and efficient • Vector space model • Advanced search options (synonyms, stopwords, similarity, nearness) • Implementations in different languages (Java, .NET, C, Python)
  • 4. The Lucene API • Lucene indexes are built on a Directory • Directory can be accessed by IndexReaders and IndexWriters • IndexSearchers are built on top of Directories and IndexReaders • IndexWriters can write Documents inside the index • Documents are made of Fields • Fields have value(s) and options • Directory > IndexReader/Writer > Document > Field
  • 6. Indexing Lucene • A Lucene index has one or more segments and a generation • Changes to the index must be committed (and optionally optimized) • No fixed schema • Each field can be STORED, INDEXED and ANALYZED • Each field can have NORMS and TERM VECTORS
  • 7. Searching Lucene • Open an IndexSearcher on top of an IndexReader over a Directory • Many query types: TermQuery, MultiTermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, MultiPhraseQuery, FuzzyQuery, TermRangeQuery, NumericRangeQuery • Get results from a TopDocs object
  • 8. Why do we need Apache Solr? • Lucene is a library • Lucene by itself can only be queried programmatically • Often the search system has to be totally independent from other systems (e.g. a CMS) • A ready-to-deploy search server is what you need • Need to scale both vertically and horizontally
  • 11. Apache Solr - Overview • Ready-to-use enterprise search server • REST (and programmatic) API • Results in XML, JSON, PHP, Ruby, etc. • Exploits the power of Lucene • Scaling capabilities (replication, distributed search) • Easy administration interface • Easy to extend and customize (plugin architecture)
  • 12. Apache Solr - Project status • Latest release: 1.4.1 (June 2010) • Lots of new features on trunk • Most of the new features are available on the 3.0 branch • A large, very active community • Project supported by Lucid Imagination
  • 13. Solr - 5-minute tutorial • Download the latest release (1.4.1) • cd $SOLR_HOME/example • java -server -jar start.jar • You now have a running Solr instance (on top of Jetty) at http://localhost:8983/solr • cd $SOLR_HOME/example/exampledocs • Index the example documents with: sh post.sh *.xml • Search with your browser
  • 14. Solr - Query syntax • Default operator is OR (override it by adding &q.op=AND to the HTTP request) • Query a specific field with fieldname:value • Common + - AND OR NOT modifiers • Range queries on date or numeric fields, e.g. timestamp:[* TO NOW] • Boost terms, e.g. roma^2 inter • Fuzzy search, e.g. roam~0.6 • ...
  • 15. Solr - Basic configuration steps • Define fields, types and analysis inside schema.xml • Play with solrconfig.xml: • request handlers (update, search) • index parameters • caches • deletion policy • autowarming • replication, clustering, etc...
  • 16. Solr - schema.xml • Types • Analyzers to use for each type • Fields with name, type and options • Unique key • Dynamic fields • Copy fields • Don’t use the default schema.xml, write it from scratch!
  • 17. Solr - Type definition • Analyzers for querying and indexing are defined inside the schema
  • 18. Solr - solrconfig.xml • Where Solr writes the index • Index merge factor • Control of the different caches: documents, query results, filters • Request handlers that consume (HTTP) requests; typically at least a (standard) search handler and an update handler are defined • Update request processor chains to configure indexing behavior • Event listeners (newSearcher, firstSearcher) • and much more...
  • 19. Solr - Indexing • Updates to the index are sent as XML commands via HTTP POST • <add> to insert and update • <delete> to remove by unique key or by query
  • 20. Solr - Searching • HTTP GET to the Solr instance with the mandatory q parameter, which specifies the query • df - the default field to query • fl - the list of fields to return (stored fields only) • sort - fields used for sorting; defaults to score (which is not a field) • start, rows - paging parameters • wt - response type; defaults to xml, can be json, php, ruby, etc.
  • 21. Solr - Data import • Legacy systems typically rely on databases • Data can be imported from databases using the DataImportHandler component • Define the data source, JDBC driver and field mappings
  • 22. Solr - Highlighting • Useful when snippets of the matching text are needed in the search results • In Solr 1.4.1 only stored fields can be highlighted • Add &hl=true&hl.fl=field1,field2 to the HTTP search request to enable highlighting on field1 and field2
  • 23. Solr - Faceting • Breaks search results up into multiple categories, showing the count for each • Often used in online stores • Can be very useful in guiding the user experience • Users can then drill down into the results of a single category
  • 24. Solr - Filter queries • Queries used as filters on top of the main query • Restrict the set of matching documents without influencing the score • Useful for domain-specific queries where you want users to search only certain “areas” of the index • Add &fq=somefilterquery using the standard Solr query syntax
  • 26. Solr - Multi core • Define multiple Solr cores inside a single Solr instance • Each core maintains its own index • Unified administration interface • Runtime commands to create, swap, load, unload, delete cores
  • 27. Solr - Replication • Under high traffic it is useful to replicate a Solr instance and split the queries among the copies (possibly with a load balancer in front) • The master holds the original index • The slave polls the master for the latest index version • If the slave has an older version of the index, it fetches only the difference from the master (rsync-like) • Meanwhile, the indexes remain available
  • 28. Solr - Distributed search • When an index is too large, in terms of disk space or memory required, it can be useful to split it into two or more shards • A shard is a Solr instance and can be searched or indexed independently • At the same time it is possible to query all the shards and have the result merged from the sub-results of each shard • http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=category:information • Note that distributing the documents among the shards is up to the user (or whoever feeds the indexes)
  • 29. One step beyond... • Solr in the cloud • Spatial search • Solr & UIMA :-)
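
Slide 3 mentions inverted indexes and the vector space model. This toy Python sketch shows both ideas together. It is not Lucene code, and the `ToyIndex` class and its methods are hypothetical. Each term points to the documents that contain it, and the query and each document are compared as tf-idf vectors using cosine similarity. Lucene's real scoring differs in the details: its classic similarity, for example, takes the square root of the term frequency and adds field length norms and boosts.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Crude analysis: lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


class ToyIndex:
    """Hypothetical toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: tf}
        self.docs = {}                     # doc_id -> Counter of its terms

    def add(self, doc_id, text):
        counts = Counter(tokenize(text))
        self.docs[doc_id] = counts
        for term, tf in counts.items():
            self.postings[term][doc_id] = tf

    def idf(self, term):
        # Rarer terms weigh more; 1 + log(N / df) keeps every weight positive.
        df = len(self.postings.get(term, {}))
        return 1.0 + math.log(len(self.docs) / df) if df else 0.0

    def weight(self, tf, term):
        return tf * self.idf(term)

    def search(self, query):
        q_vec = {t: self.weight(tf, t) for t, tf in Counter(tokenize(query)).items()}
        q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
        dots = defaultdict(float)
        # Only documents in a query term's posting list can get a score > 0.
        for term, q_w in q_vec.items():
            for doc_id, tf in self.postings.get(term, {}).items():
                dots[doc_id] += q_w * self.weight(tf, term)
        hits = []
        for doc_id, dot in dots.items():
            d_norm = math.sqrt(sum(self.weight(tf, t) ** 2
                                   for t, tf in self.docs[doc_id].items()))
            hits.append((doc_id, dot / (q_norm * d_norm)))  # cosine similarity
        return sorted(hits, key=lambda h: (-h[1], h[0]))


idx = ToyIndex()
idx.add("d1", "Apache Lucene is a search library")
idx.add("d2", "Apache Solr is a search server built on Lucene")
idx.add("d3", "Jetty is a servlet container")
hits = idx.search("lucene library")  # d1 ranks above d2; d3 never matches
```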
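
Slide 6 says that each field can be STORED, INDEXED and ANALYZED. The toy model below illustrates roughly what those three options mean. `ToyField` and `ToyFieldIndex` are hypothetical stand-ins, not Lucene classes. Indexed fields become searchable terms, and they are split into tokens only when analyzed. Only stored fields can be returned with a hit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToyField:
    # Hypothetical stand-in for a Lucene field plus its options.
    name: str
    value: str
    stored: bool = True    # keep the original value so it can be returned
    indexed: bool = True   # make the field searchable
    analyzed: bool = True  # split into lowercase tokens instead of one exact term


class ToyFieldIndex:
    def __init__(self):
        self.terms = defaultdict(set)  # (field, term) -> ids of matching docs
        self.stored = {}               # doc id -> {field name: stored value}

    def add(self, doc_id, fields):
        self.stored[doc_id] = {}
        for f in fields:
            if f.stored:
                self.stored[doc_id][f.name] = f.value
            if f.indexed:
                tokens = f.value.lower().split() if f.analyzed else [f.value]
                for token in tokens:
                    self.terms[(f.name, token)].add(doc_id)

    def search(self, field, term):
        # Return the stored fields of every matching document.
        return [self.stored[d] for d in sorted(self.terms.get((field, term), set()))]


index = ToyFieldIndex()
index.add("1", [
    ToyField("id", "DOC-1", analyzed=False),
    ToyField("title", "Apache Lucene in Action"),
    ToyField("body", "full text of the book", stored=False),
])
```

The unanalyzed `id` matches only its exact value. The unstored `body` is searchable but is never returned, which is a common trade-off for large text fields.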
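
Several of the query types on slide 7 (PrefixQuery, WildcardQuery, FuzzyQuery) are expanded against the terms in the index. The sketch below imitates that expansion over a plain Python set of terms. The function names are hypothetical. `fnmatch` stands in for Lucene's wildcard syntax, though it also understands `[...]`. The similarity `1 - distance / shorter length` only approximates the edit-distance measure used by FuzzyQuery in Lucene 2.x/3.x.

```python
from fnmatch import fnmatchcase


def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def prefix_terms(terms, prefix):
    return sorted(t for t in terms if t.startswith(prefix))


def wildcard_terms(terms, pattern):
    # '*' matches any run of characters, '?' exactly one.
    return sorted(t for t in terms if fnmatchcase(t, pattern))


def fuzzy_terms(terms, target, min_similarity=0.5):
    matches = []
    for t in terms:
        if not t or not target:
            continue  # avoid dividing by zero on empty strings
        similarity = 1 - levenshtein(t, target) / min(len(t), len(target))
        if similarity >= min_similarity:
            matches.append(t)
    return sorted(matches)


TERMS = {"roma", "roam", "rome", "road", "inter", "internet", "milan"}
```

Under this measure, `roma` is two edits away from `roam`, because swapping adjacent letters costs two substitutions. It therefore scores 0.5 and falls below the 0.6 threshold of the `roam~0.6` example on slide 14.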
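
The query syntax on slide 14 can also be assembled programmatically, as the minimal sketch below shows. The helper names are hypothetical. The escaped characters are the ones handled by Lucene's `QueryParser.escape` in recent versions. Older parsers do not treat `/` as special, but a backslash before any character is accepted, so escaping it is harmless.

```python
from urllib.parse import urlencode

# Characters escaped by Lucene's QueryParser.escape in recent versions.
SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape(value):
    # Backslash-escape special characters; whitespace is left alone, so a
    # multi-word value still splits into several clauses.
    return "".join("\\" + c if c in SPECIAL else c for c in str(value))


def term(field, value, boost=None):
    clause = f"{field}:{escape(value)}"
    return clause if boost is None else f"{clause}^{boost}"


def fuzzy(field, value, min_similarity):
    return f"{field}:{escape(value)}~{min_similarity}"


def range_query(field, lower="*", upper="*", inclusive=True):
    # [a TO b] includes the bounds, {a TO b} excludes them.
    start, end = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{start}{lower} TO {upper}{end}"


q = " ".join([term("city", "roma", boost=2), term("team", "inter")])
params = urlencode({"q": q, "q.op": "AND"})  # AND instead of the default OR
```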
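
Slides 16 and 17 describe analyzers declared per type in schema.xml, with separate chains for indexing and querying. The toy Python pipeline below mimics such a chain: a whitespace tokenizer followed by lowercase, stop word and synonym filters. This is roughly what Solr's WhitespaceTokenizerFactory, LowerCaseFilterFactory, StopFilterFactory and SynonymFilterFactory do. The word lists are hypothetical, and expanding synonyms only at index time is just one possible design.

```python
STOPWORDS = {"a", "an", "and", "of", "the"}   # hypothetical stopwords.txt
SYNONYMS = {"tv": ["television"]}             # hypothetical synonyms.txt


def whitespace_tokenizer(text):
    return text.split()


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def synonym_filter(tokens):
    # Emit each token followed by its synonyms, if any.
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out


# The index-time chain expands synonyms; the query-time chain then does not need to.
INDEX_CHAIN = [lowercase_filter, stop_filter, synonym_filter]
QUERY_CHAIN = [lowercase_filter, stop_filter]


def analyze(text, chain):
    tokens = whitespace_tokenizer(text)
    for token_filter in chain:
        tokens = token_filter(tokens)
    return tokens


indexed = analyze("The TV of the Year", INDEX_CHAIN)  # ['tv', 'television', 'year']
```

A query for "Television" is analyzed to `television`, which matches the expanded index tokens even though the original text only said "TV".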
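
For slide 19, the sketch below builds Solr's XML update messages with the standard library and posts them. It assumes the example server from slide 13, with its update handler at http://localhost:8983/solr/update. The helper names are hypothetical. Changes become visible to searches only after a `<commit/>` is sent, which is what post.sh does at the end.

```python
import urllib.request
import xml.etree.ElementTree as ET


def add_xml(docs):
    # <add><doc><field name="...">value</field>...</doc></add>
    # A list value becomes a multi-valued field (one <field> per value).
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_xml(doc_id=None, query=None):
    # <delete><id>...</id></delete> or <delete><query>...</query></delete>
    delete = ET.Element("delete")
    if doc_id is not None:
        ET.SubElement(delete, "id").text = doc_id
    if query is not None:
        ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


def commit_xml():
    return ET.tostring(ET.Element("commit"), encoding="unicode")


def post(update_url, body):
    # Real HTTP POST; needs a running Solr instance.
    request = urllib.request.Request(
        update_url, data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"})
    with urllib.request.urlopen(request) as response:
        return response.status


body = add_xml([{"id": "SOLR1000", "cat": ["software", "search"]}])
# post("http://localhost:8983/solr/update", body)
# post("http://localhost:8983/solr/update", commit_xml())
```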
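
The search parameters on slide 20 map directly onto a GET request to the `/select` handler. This sketch assumes the tutorial instance at http://localhost:8983/solr. It requests `wt=json` so that Python's `json` module can read the response. `select_url`, `search` and `summarize` are hypothetical helper names, and only `search` actually contacts the server. The sample response in the usage lines is hypothetical and trimmed to the fields read here.

```python
import json
import urllib.parse
import urllib.request


def select_url(base_url, q, **options):
    # options may hold df, fl, sort, start, rows; wt defaults to json here.
    params = {"q": q, "wt": "json", **options}
    return f"{base_url}/select?{urllib.parse.urlencode(params)}"


def search(base_url, q, **options):
    # Performs a real HTTP GET; requires a running Solr instance.
    with urllib.request.urlopen(select_url(base_url, q, **options)) as resp:
        return json.load(resp)


def summarize(response, key="id"):
    # Hits live under "response": numFound, start and the page of docs.
    body = response["response"]
    return body["numFound"], [doc.get(key) for doc in body["docs"]]


url = select_url("http://localhost:8983/solr", "category:information",
                 fl="id,name", sort="price asc", start=0, rows=10)
sample = {"responseHeader": {"status": 0},
          "response": {"numFound": 2, "start": 0,
                       "docs": [{"id": "a"}, {"id": "b"}]}}
```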
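
DataImportHandler itself is configured declaratively, through a data-config.xml containing a data source, the entity's SQL query and column-to-field mappings. It is triggered through its request handler, commonly with `command=full-import`. To show what the mapping step amounts to, here is a hand-rolled Python alternative (not DIH) that reads rows with `sqlite3` and turns them into Solr documents. The table, the columns and `MAPPING` are hypothetical.

```python
import sqlite3

# Hypothetical column -> Solr field mapping, playing the role of
# DIH's <field column="..." name="..."/> entries.
MAPPING = {"item_id": "id", "item_name": "name", "price": "price"}


def rows_to_docs(conn, sql, mapping):
    cur = conn.execute(sql)
    columns = [col[0] for col in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # Keep only mapped columns, skip NULLs, rename to Solr field names.
        yield {mapping[c]: v for c, v in record.items()
               if c in mapping and v is not None}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (item_id TEXT, item_name TEXT, price REAL, internal_note TEXT)")
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?)",
                 [("1", "item one", 39.9, "x"), ("2", "item two", None, "y")])
docs = list(rows_to_docs(conn, "SELECT * FROM item ORDER BY item_id", MAPPING))
```

The resulting dicts could then be sent to Solr as `<add>` commands. DIH does the same job inside Solr, without an external script.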
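
For slide 22, this sketch covers the request parameters and reading the `highlighting` section of a JSON response. That section maps each document's unique key to its fields and their snippets. The sketch assumes the default `<em>`/`</em>` markers, which can be changed with `hl.simple.pre`/`hl.simple.post`. The helper names and the sample response are hypothetical.

```python
import re


def highlight_params(q, fields):
    # hl.fl takes a comma-separated list of fields to highlight.
    return {"q": q, "hl": "true", "hl.fl": ",".join(fields), "wt": "json"}


def snippets(response, doc_id, field):
    # "highlighting" maps unique key -> field name -> list of snippets.
    return response.get("highlighting", {}).get(doc_id, {}).get(field, [])


def emphasized(snippet, pre="<em>", post="</em>"):
    # Extract the text wrapped in the highlight markers.
    return re.findall(re.escape(pre) + "(.*?)" + re.escape(post), snippet)


sample = {"highlighting": {"doc1": {"body": ["Apache <em>Lucene</em> is a <em>search</em> library"]}}}
```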
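
Facet counts are requested with `facet=true&facet.field=...`. With `wt=json` and the default `json.nl=flat`, Solr returns each field's counts as a flat list of alternating values and counts. The sketch below turns that list into a dict. Drilling down then amounts to adding a filter query on the chosen value. The function names and the sample counts are hypothetical.

```python
from urllib.parse import urlencode


def facet_counts(response, field):
    # With json.nl=flat the list alternates value, count, value, count, ...
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def drill_down(q, field, value):
    # Keep the user's query and faceting, and filter on the chosen value.
    # (Assumes the value contains no double quotes.)
    return urlencode([("q", q), ("facet", "true"), ("facet.field", field),
                      ("fq", f'{field}:"{value}"')])


sample = {"facet_counts": {"facet_fields": {"cat": ["electronics", 14, "memory", 3]}}}
```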
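
The key property behind slide 24 is that a filter only narrows the result set, while the scores still come from `q` alone. Solr also caches each `fq` separately in its filterCache, which makes repeated filters cheap. This toy Python sketch models that behaviour with plain sets and shows how several `fq` parameters are sent in one request. `apply_filters` is a hypothetical name.

```python
from urllib.parse import urlencode


def apply_filters(scores, filters):
    # scores: {doc_id: score} produced by the main query q
    # filters: one set of matching doc ids per fq; a hit must match all of them
    allowed = set(scores)
    for matching in filters:
        allowed &= matching
    # Scores are copied unchanged: filters never contribute to relevance.
    return {doc_id: scores[doc_id] for doc_id in allowed}


hits = apply_filters({"a": 2.0, "b": 1.5, "c": 0.7}, [{"a", "c", "x"}, {"a", "c"}])
query_string = urlencode([("q", "ipod"), ("fq", "cat:electronics"), ("fq", "inStock:true")])
```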
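
The runtime core commands on slide 26 go through the CoreAdmin handler, usually configured at `/admin/cores` in solr.xml. They take an `action` parameter such as STATUS, CREATE, RELOAD, SWAP or UNLOAD. A minimal URL builder follows; the helper name and the core names are hypothetical. A common use of SWAP is to rebuild an index in a spare core and then exchange it with the live one.

```python
from urllib.parse import urlencode


def core_admin_url(base_url, action, **params):
    # action is a CoreAdmin action, e.g. STATUS, CREATE, RELOAD, SWAP, UNLOAD.
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/cores?{query}"


swap = core_admin_url("http://localhost:8983/solr", "SWAP", core="core1", other="core0")
create = core_admin_url("http://localhost:8983/solr", "CREATE", name="core2", instanceDir="core2")
```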
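
In Solr 1.4, replication is handled by the ReplicationHandler, with commands such as `indexversion` and `fetchindex`. The toy model below is not that implementation. It only illustrates why the rsync-like approach on slide 27 works: Lucene segment files are never modified once written. A slave with an older index version therefore only needs to download the files it is missing and drop the ones the master has merged away. The class names and file names are hypothetical.

```python
class ToyMaster:
    def __init__(self):
        self.version = 0
        self.files = {}  # file name -> contents; written once, never modified

    def commit(self, added=None, merged_away=()):
        self.files.update(added or {})
        for name in merged_away:
            self.files.pop(name, None)
        self.version += 1


class ToySlave:
    def __init__(self):
        self.version = 0
        self.files = {}

    def poll(self, master):
        """Fetch only what changed; return the names of downloaded files."""
        if master.version <= self.version:
            return []  # already up to date, nothing to transfer
        missing = sorted(n for n in master.files if n not in self.files)
        for name in missing:
            self.files[name] = master.files[name]
        for name in [n for n in self.files if n not in master.files]:
            del self.files[name]  # segments the master no longer uses
        self.version = master.version
        return missing
```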
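
Slide 28 leaves document distribution to whoever feeds the shards. Below is a sketch of one possible strategy, a stable CRC32 hash of the unique key modulo the number of shards. It also shows the score-based merge of per-shard top hits that a distributed query performs. Both functions are hypothetical illustrations. In Solr 1.4, each shard scores with its own term statistics, so merged scores are only approximately comparable.

```python
import heapq
import zlib


def shard_for(doc_id, num_shards):
    # Stable across runs and machines (unlike Python's salted hash() for str).
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def merge_shard_results(shard_results, rows=10):
    # Each shard returns its own top hits as (score, doc_id); keep the best overall.
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(rows, all_hits, key=lambda hit: hit[0])


shard_a = [(3.2, "a1"), (1.0, "a2")]
shard_b = [(2.5, "b1"), (0.4, "b2")]
top = merge_shard_results([shard_a, shard_b], rows=3)
```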
