Introduction to Apache Solr
"Software is eating the world"
The search is eating the software
April 2014
Alexandre Rafalovitch
www.outerthoughts.com
Web search engines are quite sophisticated
But the real search needs are much DEEPER and BROADER
Searching code
Searching people and companies
Searching products
Searching library material
Searching languages
Understanding full-text search
SELECT *
FROM database
WHERE field LIKE '%word%'
This DOES NOT scale
Instead:
break text into tokens
domain-specific processing (e.g. lower-casing)
build fast-access structures
algorithms for term, phrase, and proximity search
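The steps listed on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (tokenize, lower-case, build a positional inverted index, then answer term and phrase queries from it), not how Lucene or Solr implement it. The function names and sample documents are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Break text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search_term(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), {}))

def search_phrase(index, phrase):
    """Return ids of documents where the phrase terms appear consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return []
    hits = []
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            # Every following term must sit exactly one position further on.
            if all(start + i in index.get(t, {}).get(doc_id, [])
                   for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)

# Hypothetical sample documents
docs = {
    1: "Solr is a search server",
    2: "Search the server logs",
    3: "A server for full-text search",
}
index = build_index(docs)
print(search_term(index, "Search"))           # ids of documents mentioning "search"
print(search_phrase(index, "search server"))  # ids where the two words are adjacent
```

Unlike the LIKE '%word%' query, a lookup here touches only the entries for the query terms instead of scanning every row.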
Basic search engine features
Search (Duh!): keyword, phrase, field-specific
Positive and negative terms
Sort: relevancy, recency
Pagination
Compact summary in results
SPEED
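As a rough illustration, most of these features map onto standard Solr request parameters (q, sort, start, rows, hl). The sketch below only builds the request URL. The base URL, the core name collection1, and the field names name, features, and cat are assumptions borrowed from a typical Solr 4 example setup, and build_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode

def build_select_url(base_url, query, page=0, page_size=10, sort="score desc"):
    """Build a Solr /select URL covering query, sorting, paging and highlighting."""
    params = {
        "q": query,                 # keyword, phrase, field-specific, +/- terms
        "sort": sort,               # "score desc" sorts by relevancy
        "start": page * page_size,  # pagination offset
        "rows": page_size,          # page size
        "hl": "true",               # ask for highlighted snippets (compact summaries)
        "wt": "json",               # response format
    }
    return base_url + "/select?" + urlencode(params)

# Assumed local core and field names, for illustration only
url = build_select_url(
    "http://localhost:8983/solr/collection1",
    'name:"hard drive" +features:cache -cat:music',
    page=2,
)
print(url)
```

In the query string, a quoted value is a phrase, field: restricts a clause to one field, + marks a required term, and - marks an excluded one.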
Advanced search engine features
Facets/Taxonomy-based navigation with live counts
Language-specific processing
Domain-specific text processing (WiFi = Wi-Fi = WIFI)
Geographic search
More-like-this, did-you-mean, autocomplete
Scaling/Clustering
NOT web crawling - different, but related
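Two of these ideas are easy to show in miniature. The sketch below collapses spelling variants such as WiFi, Wi-Fi, and WIFI to a single key, and computes facet counts over a set of matching documents. In Solr this kind of processing is configured in a field type's analyzer chain and in faceting request parameters rather than written by hand, so the functions and sample data here are purely hypothetical.

```python
import re
from collections import Counter

def normalize(term):
    """Lower-case and drop hyphens/whitespace so spelling variants share one key."""
    return re.sub(r"[-\s]+", "", term).lower()

def facet_counts(docs, field):
    """Count how many of the given documents carry each value of a field."""
    return Counter(doc[field] for doc in docs if field in doc)

# Hypothetical result set for some query
results = [
    {"name": "Wi-Fi router", "cat": "network"},
    {"name": "WIFI adapter", "cat": "network"},
    {"name": "WiFi camera", "cat": "camera"},
]

print({normalize(word) for word in ("WiFi", "Wi-Fi", "WIFI")})  # a single shared key
print(facet_counts(results, "cat"))  # per-category counts for this result set
```

The counts are "live" in the sense that they are computed over the current result set, so they change as the query is refined.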
Search engine solutions?
Solr
Elasticsearch
Xapian
Sphinx
Zoie
Groonga
Searchdaimon
{F}lexSearch
Algolia (SaaS)
Searchify (SaaS)
ForageJS
Lunr.js
FACT-Finder
DtSearch
MarkLogic
Verity
Fast
Most databases
…AND MORE
Open Source Search Evolution
Used with permission from SemaText
Secret Ingredient - Lucene
Solr
Elasticsearch
Zoie
SwiftType
PyLucene (Python wrapper)
Lucene.net (C# port)

Scalable, high-performance indexing
Incremental indexing
Full-text search
Information-Retrieval algorithms
Implemented in Java
Written in 1999, still going strong
Secret Ingredient - Solr
Certified distributions: LucidWorks, HelioSearch
Big Data platforms: Cloudera, Hortonworks HDP
Hosted and SaaS: Amazon CloudSearch, WebSolr, SolrHQ, SearchBox

Lucene full-text search
XML and REST config
Schema/Schemaless
SolrCloud (clustering)
Caching
Near real-time
Rich-document indexing (Tika inside)
Plugins, components, processors
Solr Ecosystem sample
Drupal
Project Blacklight
LuxDB
SolrMeter
CrafterCMS
Typo3
Magento
HippoCMS
ColdFusion
SolrNet
DataStax
Dovecot
NGData Lily
Basho Riak
YaCy
Apache ManifoldCF
Apache Camel
Franz AllegroGraph
BitNami Solr Stack
Carrot2
Broadleaf Commerce
Cloudera CDK
CodeLibs Fess (フェス)
Splunk
Alfresco
Rosette by BasisTech
Luwak by Flax
Quepid by OSC
TwigKit
SPM by SemaText
SILK by LucidWorks
Banana (O/S Solr Kibana)
DEMO Time
DEMO - Basic
Unzip
Go to example directory
Run Solr
Import some documents from example docs
grep -l store *.xml | xargs ./post.sh
Show off Solr 4 admin panel
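The shell pipeline in this demo picks the example XML files that mention "store" and hands them to post.sh for indexing. The selection half can be sketched in Python as below. files_containing is a hypothetical helper, and the sketch deliberately stops before the upload step, which post.sh handles in the demo.

```python
from pathlib import Path

def files_containing(directory, needle, pattern="*.xml"):
    """Return sorted names of files matching pattern whose text contains needle."""
    matches = []
    for path in sorted(Path(directory).glob(pattern)):
        # Read leniently: sample files may not all be clean UTF-8.
        if needle in path.read_text(encoding="utf-8", errors="ignore"):
            matches.append(path.name)
    return matches

if __name__ == "__main__":
    # Roughly what `grep -l store *.xml` lists when run in the current directory
    for name in files_containing(".", "store"):
        print(name)
```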
DEMO - Browse handler
Restart Solr with -Dsolr.clustering.enabled=true
Visit http://localhost:8983/solr/browse/
Show off:
Search
Facets - Categories and Ranges
Spatial/Geo-distance
Clusters
DEMO - Thai specific
Index Thai and English text
Search in English, Thai, auto-transliterated Thai
Show Analysis screen
Code at: https://github.com/arafalov/solr-thai-test
Getting into Solr
Start for free
Download, unzip, cd example; java -jar start.jar
Go through basic tutorial in docs/tutorial.html
Copy example directory, modify schema.xml until happy
If coming from Elasticsearch, look at example-schemaless
Do NOT follow this path to production: the example schema is a kitchen sink!!!
Accelerate your learning
Buy my book - seriously. That's what it's for
All code/data is at: https://github.com/arafalov/solr-indexing-book
Buy Solr in Action - just published and a great reference
Use my www.solr-start.com resource and join the mailing list
Join solr-user mailing list - full of advanced hackers
Watch Lucid Revolution videos for background
Start helping out on Stack Overflow #solr
Blog what you learned, tweet with #Solr
Pick a project - make it happen
Solr + Dart => Better search experience for Dart packages
Solr consultants discovery website
Visualise Solr search request - step by step
Solr + your language => is the client library up to date?
TodoMVC for Solr clients
Package LARGE dataset for others (e.g. Project Gutenberg)
Rebuild lernu.net Esperanto dictionary with Solr backend
With Solr, how far can I go?
Cloudera (Big Data) has over US$1,000,000,000 in investments - opportunities?
8M+ searches/day, 40 languages, 100ms NRT, 1024 cores, 256 shards, 32 servers on #solr at Bloomberg http://bit.ly/1jmG72G (via @FlaxSearch)
Other Search-related books
Designing the Search Experience: The Information Architecture of Discovery - by a TwigKit creator +1
Search Analytics for Your Site: Conversations with Your Customers by Louis Rosenfeld - see also Quepid
Enterprise Search by Martin White
