Powerful Full-Text Search
with Solr
Jay Bharat
jay@carmatec.com
Carmatec It solution, Bangalore
1 July 2013

1
An introduction to Solr
Implementing search with free
software

2
Solr Tm -1/2

3
Solr Tm-2/2

4
What is Solr?
•  Solr is an open source enterprise search
server based on the Lucene Java search
library.
•  Solr runs in a Java servlet container such
as Tomcat or Jetty
•  Solr is free software and a project of the
Apache Software Foundation
•  Solr is a sub-project of Lucene and can be
found at http://lucene.apache.org/solr/

5
Key Features
•  Advanced Full-Text search
•  Optimized for High Volume Web Traffic
•  Standards Based Open Interfaces – XML and
HTTP
•  Comprehensive HTML Administration Interface
•  Server statistics exposed over JMX for monitoring
•  Scalability through efficient replication
•  Flexibility with XML configuration and Plugins
•  Push vs Crawl indexing method
6
Solr Clients
•  Solr can be integrated with, among others…
–  Ruby
–  PHP
–  Java
–  Python
–  JSON
–  Forrest/Cocoon
–  C# or Deveel Solr Client or solrnet
–  Coldfusion
–  Drupal or apacheSolr project for Drupal
7
Indexing
•  Push vs Crawl
•  Schema.xml
•  Add documents
•  HTML interface
–  Update
–  Delete
–  Commit

•  DataImportHandler
–  For searching databases

8
Searching
•  Full text search
http://localhost:8983/solr/select?q=Iraq
•  Search only within a field
http://localhost:8983/solr/select?q=category:news
•  Control which fields are displayed in result
http://localhost:8983/solr/select?q=video&fl=id,category
•  Provide ranges to fields
9
More Searching
•  Faceting information
http://localhost:8983/solr/select?q=news&fl=id,description&facet=true&facet.field=category
•  More like this (MLT)
http://localhost:8983/solr/select?q=Iraq&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100
•  More information on how this works and the options available can be found at
http://wiki.apache.org/solr/MoreLikeThis
10
QueryResponseWriter
•  A QueryResponseWriter is a Solr plugin that defines the response format for any request
•  All of the requests we have made so far are formatted with the XMLResponseWriter
•  Other formats can be applied by appending wt=format to the search string like this:
http://localhost:8983/solr/select?q=date:

11
Acknowledgements
•  Search smarter with Apache Solr, Part 1:
Essential features and the Solr schema
–  http://www.ibm.com/developerworks/java/
library/j-solr1/

•  Solr Tutorial from Lucid Imagination
–  http://www.lucidimagination.com/Community/
Hear-from-the-Experts/Podcasts-and-Videos/
Solr-Tutorial

•  Solr Wiki
–  http://wiki.apache.org/solr/

12
Powered by Lucene
•  Wikipedia
•  Internet Archive
•  LinkedIn
•  monster.com

13
Indexing
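[Figure: inverted index. Documents: 0 = Little Red Riding Hood, 1 = Robin Hood, 2 = Little Women. Terms and the documents that contain them: hood: 0, 1; little: 0, 2; red: 0; riding: 0; robin: 1; women: 2. The term list is sorted alphabetically and runs from "aardvark" to "zoo".]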
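The inverted index pictured on this slide can be sketched in a few lines. This is only an illustration using the three titles from the slide; the function name `build_inverted_index` is made up here, and real Lucene indexes store far more (positions, frequencies, norms) than this toy mapping.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Whitespace tokenizing plus lowercasing stands in for real text analysis.
        for term in text.lower().split():
            index[term].add(doc_id)
    # Sort the terms so the result resembles an alphabetical term dictionary.
    return {term: sorted(ids) for term, ids in sorted(index.items())}


titles = ["Little Red Riding Hood", "Robin Hood", "Little Women"]
index = build_inverted_index(titles)
for term, postings in index.items():
    print(term, postings)
```

Looking up a term is then a dictionary access, which is why term queries stay fast as the collection grows.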
14
Search
•  Core parameters
•  qt – query type (request handler)
•  wt – writer type (response writer)

•  Common parameters
•  q
•  sort
•  start
•  rows
•  fq – filters
•  fl – return fields
15
Search Syntax
•  field:term (*:* returns everything)
•  A score is generated at query time. Its absolute value has no meaning; scores are only meaningful relative to each other (a scale)
•  fq filters the query results based on a supplied condition
•  wt is the response format of the results (xml, json, etc.)
•  qt is the request handler used to process the request (default is "standard")
•  fl is the list of fields to return (fields must be stored)
•  q is the query string
•  You can specify the start offset and the maximum number of rows (start, rows)
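As a rough sketch of how these parameters combine into one request, the snippet below assembles a select URL with Python's standard library. The helper name `build_select_url` and its defaults are assumptions made for illustration; only the parameter names (q, fl, sort, start, rows, wt) come from the slides.

```python
from urllib.parse import urlencode


def build_select_url(base, q, fl=None, sort=None, start=0, rows=10, wt="json"):
    """Assemble a Solr /select URL from the common query parameters."""
    params = [("q", q)]
    if fl:
        params.append(("fl", ",".join(fl)))   # fields to return
    if sort:
        params.append(("sort", sort))         # e.g. "price asc"
    params += [("start", start), ("rows", rows), ("wt", wt)]
    # urlencode percent-escapes reserved characters such as ':' and ','.
    return base + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr",
    "powers:agility",
    fl=["supername", "category"],
    rows=2,
)
print(url)
```

Escaping the values matters in practice because query strings often contain spaces, colons, and brackets.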

16
What is Lucene
•  High performance, scalable, full-text
search library
•  Focus: Indexing + Searching Documents
–  “Document” is just a list of name+value pairs

•  No crawlers or document parsing
•  Flexible Text Analysis (tokenizers + token
filters)
•  100% Java, no dependencies, no config
files
18
What is SOLR
•  Solr (pronounced "solar") is an open source
enterprise search platform from the Apache
Lucene project. Its major features include full-text search, hit highlighting, faceted search,
dynamic clustering, database integration, and
rich document (e.g., Word, PDF) handling.
Providing distributed search and index
replication, Solr is highly scalable.[1] Solr is the
most popular enterprise search engine.[2] Solr 4
adds NoSQL features.[3]
19
Solr Features
•  Advanced Full-Text Search Capabilities
•  Optimized for High Volume Web Traffic
•  Standards Based Open Interfaces - XML, JSON and
HTTP
•  Comprehensive HTML Administration Interfaces
•  Linearly scalable, auto index replication, auto failover
and recovery
•  Near Real-time indexing
•  Flexible and Adaptable with XML configuration
•  Extensible Plugin Architecture
21
Indexing Data
HTTP POST to http://localhost:8983/solr/update
<add><doc>
<field name="id">05991</field>
<field name="name">Peter Parker</field>
<field name="supername">Spider-Man</field>
<field name="category">superhero</field>
<field name="powers">agility</field>
<field name="powers">spider-sense</field>
</doc></add>
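One way to produce such an update message without hand-writing XML is sketched below with Python's standard `xml.etree.ElementTree`. The helper `build_add_xml` is hypothetical; it only builds the string and does not post it to Solr. A list value becomes one `<field>` element per item, matching the repeated `powers` field above.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize a dict as a Solr <add><doc>…</doc></add> update message."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # Multi-valued fields are emitted as repeated <field> elements.
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


xml_body = build_add_xml({"id": "05991", "powers": ["agility", "spider-sense"]})
print(xml_body)
```

ElementTree also escapes characters such as `&` and `<` in field values, which is easy to forget when concatenating strings.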
22
Indexing CSV data
Guru, Saurabh, Vivek, Siddhartha | Lubaib
, Venugopal|superhero, php|bangalore|benguluru,
Magneto, Mumbai|Bombay, GB|gigabytes, cm|centimeter,
Purvankara

http://localhost:8983/solr/update/csv?
fieldnames=supername,Vivek,Magento,gb
&separator=,
&f.name.split=true&f.name.separator=|
&f.powers.split=true&f.powers.separator=|
23
Data upload methods
URL=http://localhost:8983/solr/update/csv

•  HTTP POST body (curl, HttpClient, etc)
curl $URL -H 'Content-type:text/plain; charset=utf-8' --data-binary @info.csv

•  Multi-part file upload (browsers)
•  Request parameter
?stream.body='Cyclops, Scott Summers,…'

•  Streaming from URL (must enable)
?stream.url=file://data/info.csv

24
Indexing with SolrJ
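// Solr's Java client API… remote or embedded/local!
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("player", "Dravid");
doc.addField("name", "Kumar Rahul");
doc.addField("category", "superhero");
server.add(doc);    // send the document to Solr
server.commit();    // make it visible to searches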

25
Deleting Documents
•  Delete by Id, most efficient
<delete>
<id>05591</id>
<id>32552</id>
</delete>
•  Delete by Query
<delete>
<query>category:supervillain</query>
</delete>
26
Commit
•  <commit/> makes changes visible
–  Triggers static cache warming in solrconfig.xml
–  Triggers autowarming from existing caches (default on)

•  <optimize/> same as commit, but also merges all index segments for faster searching

[Figure: Lucene index segments. Segment _0 consists of the files _0.fnm, _0.fdt, _0.fdx, _0.frq, _0.tis, _0.tii, _0.prx, _0.nrm and _0_1.del; segment _1 of _1.fnm, _1.fdt, _1.fdx, […]]

27
Searching
http://localhost:8983/solr/select?q=powers:agility
&start=0&rows=2&fl=supername,category
<response>
<result numFound="427" start="0">
<doc>
<str name="supername">Spider-Man</str>
<str name="category">superhero</str>
</doc>
<doc>
<str name="supername">Mystique</str>
<str name="category">supervillain</str>
</doc>
</result>
</response>

28
Response Format
•  Add &wt=json for JSON formatted response
{"response": {"numFound":427, "start":0,
"docs": [
{"supername":"Spider-Man", "category":"superhero"},
{"supername":"Magento", "category":"Purvankara"}
]
}}
•  Also Python, Ruby, PHP, SerializedPHP, XSLT
29
Scoring
•  Query results are sorted by score descending
•  VSM – Vector Space Model
•  tf – term frequency: number of matching terms in field
•  lengthNorm – number of tokens in field
•  idf – inverse document frequency
•  coord – coordination factor, number of matching terms
•  document boost
•  query clause boost
http://lucene.apache.org/java/docs/scoring.html
30
Explain
http://solr/select?q=super fast&indent=on&debugQuery=on
<lst name="debug">
<lst name="explain">
<str name="id=Flash,internal_docid=6">
0.16389132 = (MATCH) product of:
0.32778263 = (MATCH) sum of:
0.32778263 = (MATCH) weight(text:fast in 6), product of:
0.5012072 = queryWeight(text:fast), product of:
2.466337 = idf(docFreq=5)
0.20321926 = queryNorm
0.65398633 = (MATCH) fieldWeight(text:fast in 6), product of:
1.4142135 = tf(termFreq(text:fast)=2)
2.466337 = idf(docFreq=5)
0.1875 = fieldNorm(field=fast, doc=6)
0.5 = coord(1/2)
</str>
<str name="id=Superman,internal_docid=7">
0.1365761 = (MATCH) product of:

31
Lucene Query Syntax
1.  justice league
•  Equiv: justice OR league
•  QueryParser default operator is "OR"/optional
2.  +justice +league -name:aquaman
•  Equiv: justice AND league NOT name:aquaman
3.  "justice league" -name:aquaman
4.  title:spiderman^10 description:spiderman
5.  description:"spiderman movie"~100

32
Lucene Query Examples 2
1.  releaseDate:[2000 TO 2007]
2.  Wildcard searches: sup?r, su*r, super*
3.  spider~
•  Fuzzy search: Levenshtein distance
•  Optional minimum similarity: spider~0.7
4.  *:*
5.  (Superman AND "Lex Luthor") OR (+Batman +Joker)
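Fuzzy search relies on Levenshtein (edit) distance, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The function below is a plain dynamic-programming sketch of that distance, not Lucene's own implementation, and how Lucene converts a distance into the similarity threshold in spider~0.7 is not shown here.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


for candidate in ["spider", "spyder", "spiders", "slider", "superman"]:
    print(candidate, levenshtein("spider", candidate))
```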
33
DisMax Query Syntax
•  Good for handling raw user queries
–  Balanced quotes for phrase query
–  '+' for required, '-' for prohibited
–  Separates query terms from query structure
http://solr/select?qt=dismax
&q=super man                  // the user query
&qf=title^3 subject^2 body    // fields to query
&pf=title^2,body              // fields to do phrase queries
&ps=100                       // slop for those phrase queries
&tie=.1                       // multi-field match reward
&mm=2                         // # of terms that should match
&bf=popularity                // boost function
34
DisMax Query Form
•  The expanded Lucene Query:

+( DisjunctionMaxQuery( title:super^3 | subject:super^2 | body:super)
DisjunctionMaxQuery( title:man^3 | subject:man^2 | body:man)
)
DisjunctionMaxQuery(title:"super man"~100^2 body:"super man"~100)
FunctionQuery(popularity)
•  Tip: set up your own request handler with default parameters to avoid clients having to specify them
35
Function Query
•  Allows adding function of field value to score
–  Boost recently added or popular documents

•  Current parser only supports function
notation
•  Example: log(sum(popularity,1))
•  sum, product, div, log, sqrt, abs, pow
•  scale(x, target_min, target_max)
–  calculates min & max of x across all docs

•  map(x, min, max, target)
–  useful for dealing with defaults
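To make the function notation concrete, here is a small Python sketch of what a few of these functions compute. The helper names are invented, `scale` here works on an in-memory list rather than across an index, and the sketch assumes Solr's `log` is base 10; treat it as an approximation of the semantics, not as Solr code.

```python
import math


def fq_map(x, lo, hi, target):
    """map(x, min, max, target): replace values inside [lo, hi] with target."""
    return target if lo <= x <= hi else x


def fq_scale(values, target_min, target_max):
    """scale(x, target_min, target_max): linearly rescale values into the target range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case chosen for this sketch: map everything to target_min.
        return [float(target_min) for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


def popularity_boost(popularity):
    """The slide's example: log(sum(popularity,1)), assuming log is base 10."""
    return math.log10(popularity + 1)


print(popularity_boost(99))
print(fq_map(0, 0, 0, 1), fq_map(5, 0, 0, 1))
print(fq_scale([0, 5, 10], 1, 2))
```

Adding 1 before taking the logarithm keeps documents with zero popularity from producing an undefined value, and `map(x,0,0,1)` is a typical way to substitute a default for missing values.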

36
Boosted Query
•  Score is multiplied instead of added
–  New local params {!...} syntax added

&q={!boost b=sqrt(popularity)}super man
•  Parameter dereferencing in local params
&q={!boost b=$boost v=$userq}
&boost=sqrt(popularity)
&userq=super man
37
Configuring Relevancy

<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
</analyzer>
</fieldType>
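The analyzer chain above runs its stages in order: tokenize on whitespace, lowercase, expand synonyms, drop stop words, then stem unless the word is protected. The toy pipeline below imitates that order. The synonym, stop word, and protected word sets are made-up stand-ins for synonyms.txt, stopwords.txt, and protwords.txt, and `naive_stem` is a crude suffix stripper, not the Porter stemmer.

```python
SYNONYMS = {"tv": ["tv", "television"]}   # hypothetical synonyms.txt entry
STOPWORDS = {"the", "a", "of"}            # hypothetical stopwords.txt
PROTECTED = {"series"}                    # hypothetical protwords.txt


def naive_stem(token):
    """Strip a few common suffixes; protected words pass through unchanged."""
    if token in PROTECTED:
        return token
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Apply the stages in the same order as the analyzer definition."""
    tokens = text.split()                            # whitespace tokenizer
    tokens = [t.lower() for t in tokens]             # lowercase filter
    expanded = []
    for t in tokens:                                 # synonym filter
        expanded.extend(SYNONYMS.get(t, [t]))
    tokens = [t for t in expanded if t not in STOPWORDS]  # stop filter
    return [naive_stem(t) for t in tokens]           # stemming filter


terms = analyze("The TV Series of Flying Heroes")
print(terms)
```

The order matters: lowercasing before the synonym and stop filters lets those word lists be written in lowercase only.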
38
Field Definitions
•  Field Attributes: name, type, indexed, stored, multiValued, omitNorms, termVectors
<field name="id" type="string" indexed="true" stored="true"/>
<field name="sku" type="textTight" indexed="true" stored="true"/>
<field name="name" type="text" indexed="true" stored="true"/>
<field name="inStock" type="boolean" indexed="true" stored="false"/>
<field name="price" type="sfloat" indexed="true" stored="false"/>
<field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/>

•  Dynamic Fields
<dynamicField name="*_i" type="sint" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>
39
copyField
•  Copies one field to another at index time
•  Usecase #1: Analyze same field different ways
–  copy into a field with a different analyzer
–  boost exact-case, exact-punctuation matches
–  language translations, thesaurus, soundex

<field name="title" type="text"/>
<field name="title_exact" type="text_exact" stored="false"/>
<copyField source="title" dest="title_exact"/>
•  Usecase #2: Index multiple fields into single
searchable field
40
41
42
43
Facet Query

http://solr/select?q=foo&wt=json&indent=on
&facet=true&facet.field=cat
&facet.query=price:[0 TO 100]
&facet.query=manu:IBM
{"response":{"numFound":26,"start":0,"docs":[…]},
"facet_counts":{
"facet_queries":{
"price:[0 TO 100]":6,
"manu:IBM":2},
"facet_fields":{
"cat":[ "electronics",14, "memory",3,
"card",2, "connector",2]
}}}
44
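In this response the counts under facet_fields arrive as a flat list that alternates value and count. A client usually pairs them up before rendering; the sketch below does that with the standard `json` module. The helper name `facet_counts` is made up, and the sample body is a trimmed copy of the response above.

```python
import json


def facet_counts(flat):
    """Turn ["electronics", 14, "memory", 3, ...] into {"electronics": 14, ...}."""
    # Even positions hold the facet values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))


body = (
    '{"facet_counts": {"facet_fields": {"cat": '
    '["electronics", 14, "memory", 3, "card", 2, "connector", 2]}}}'
)
cats = facet_counts(json.loads(body)["facet_counts"]["facet_fields"]["cat"])
for value, count in cats.items():
    print(value, count)
```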
Filters
•  Filters are restrictions in addition to the query
•  Use in faceting to narrow the results
•  Filters are cached separately for speed
1. User queries for memory, query sent to Solr is
&q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
&q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
&q=memory&fq=inStock:true&fq=size:1GB
&fq=type:DDR2&…
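The drill-down sequence above amounts to keeping a list of active filters and sending each as its own fq parameter. The sketch below shows one way a client might build those query strings; `drilldown_query` is a hypothetical helper, and the field names are the ones from the slide.

```python
from urllib.parse import urlencode


def drilldown_query(q, filters):
    """Build the query string for the user's query plus every active filter."""
    # Each filter gets its own fq parameter rather than being AND-ed into q.
    params = [("q", q)] + [("fq", f) for f in filters] + [("facet", "true")]
    return urlencode(params)


filters = ["inStock:true"]
step1 = drilldown_query("memory", filters)

filters.append("size:1GB")      # user selects 1GB memory size
step2 = drilldown_query("memory", filters)

filters.append("type:DDR2")     # user selects DDR2 memory type
step3 = drilldown_query("memory", filters)

print(step1)
print(step2)
print(step3)
```

Keeping the filters as separate fq parameters fits the slide's point that filters are cached separately: a filter reused across requests can be answered from its own cache entry.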
45
Highlighting
http://solr/select?q=lcd&wt=json&indent=on
&hl=true&hl.fl=features
{"response":{"numFound":5,"start":0,"docs":[
{"id":"3007WFP", "price":899.95}, …]},
"highlighting":{
"3007WFP":{ "features":["30\" TFT active matrix <em>LCD</em>, 2560 x 1600"]},
"VA902B":{ "features":["19\" TFT active matrix <em>LCD</em>, 8ms response time, 1280 x 1024 native resolution"]}}}
46
MoreLikeThis
•  Selects documents that are “similar” to the
documents matching the main query.
&q=id:6H500F0
&mlt=true&mlt.fl=name,cat,features
"moreLikeThis":{ "6H500F0":{"numFound":5,"start":0,
"docs": [
{"name":"Apple 60 GB iPod with Video
Playback Black", "price":399.0,
"inStock":true, "popularity":10, […]
}, […]
]
[…]

47
High Availability

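[Figure: high-availability architecture. App servers (dynamic HTML generation) send HTTP search requests through a load balancer to the Solr searchers. An updater reads from the DB and sends updates to the Solr master, which also receives admin queries and updates from an admin terminal. The index is replicated from the Solr master to the Solr searchers.]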
48
Resources
•  WWW
–  http://lucene.apache.org/solr
–  http://lucene.apache.org/solr/tutorial.html
–  http://wiki.apache.org/solr/

•  Mailing Lists
–  solr-user-subscribe@lucene.apache.org
–  solr-dev-subscribe@lucene.apache.org

49

Linked Data in Production: Moving Beyond Ontologies
 

Solr search engine with multiple table relation

  • 1. Powerful Full-Text Search with Solr Jay Bharat jay@carmatec.com Carmatec It solution, Bangalore 1 July 2013
  • 2. An introduction to Solr Implementing search with free software
  • 5. What is Solr? • Solr is an open source enterprise search server based on the Lucene Java search library. • Solr runs in a Java servlet container such as Tomcat or Jetty • Solr is free software and a project of the Apache Software Foundation • Solr is a sub-project of Lucene and can be found at http://lucene.apache.org/solr/
  • 6. Key Features • Advanced Full-Text search • Optimized for High Volume Web Traffic • Standards Based Open Interfaces – XML and HTTP • Comprehensive HTML Administration Interface • Server statistics exposed over JMX for monitoring • Scalability through efficient replication • Flexibility with XML configuration and Plugins • Push vs Crawl indexing method
  • 7. Solr Clients • Solr can be integrated with, among others… – Ruby – PHP – Java – Python – JSON – Forrest/Cocoon – C# or Deveel Solr Client or solrnet – Coldfusion – Drupal or apacheSolr project for Drupal
  • 8. Indexing • Push vs Crawl • Schema.xml • Add documents • HTML interface – Update – Delete – Commit • DataImportHandler – For searching databases
  • 9. Searching • Full text search http://localhost:8983/solr/select?q=Iraq • Search only within a field http://localhost:8983/solr/select?q=category:news • Control which fields are displayed in result http://localhost:8983/solr/select?q=video&fl=id,category • Provide ranges to fields
  • 10. More Searching • Faceting information http://localhost:8983/solr/select?q=news&fl=id,description&facet=true&facet.field=category • More like this (MLT) http://localhost:8983/solr/select?q=Iraq&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100 • More information on how this works and the options available can be found at http://wiki.apache.org/solr/MoreLikeThis
  • 11. QueryResponseWriter • A QueryResponseWriter is a Solr Plugin that defines the response format for any request • All of the requests we have made so far are formatted with the XMLResponseWriter • Other formats can be applied by appending wt=format to the search string like this: http://localhost:8983/solr/select?q=date:
  • 12. Acknowledgements • Search smarter with Apache Solr, Part 1: Essential features and the Solr schema – http://www.ibm.com/developerworks/java/library/j-solr1/ • Solr Tutorial from Lucid Imagination – http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Solr-Tutorial • Solr Wiki – http://wiki.apache.org/solr/
  • 13. Powered by Lucene • Wikipedia • Internet Archive • LinkedIn • monster.com
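  • 14. Indexing [Figure: inverted index] • Documents: 0 = Little Red Riding Hood, 1 = Robin Hood, 2 = Little Women • Term dictionary (sorted, aardvark … zoo) with the ids of the documents containing each term: hood → 0, 1 • little → 0, 2 • red → 0 • riding → 0 • robin → 1 • women → 2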
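The inverted index pictured on this slide can be reproduced with a few lines of Python. This is only a sketch of the idea, assuming whitespace tokenization and lowercasing; the function name `build_inverted_index` is made up for illustration, and Lucene's real index format is far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Naive analysis: lowercase, then split on whitespace.
        for term in text.lower().split():
            postings[term].add(doc_id)
    # Sort the terms so the result resembles a term dictionary.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


docs = ["Little Red Riding Hood", "Robin Hood", "Little Women"]
index = build_inverted_index(docs)
for term, ids in index.items():
    print(term, ids)
```

Looking up a term is then a dictionary access, which is why a search for "hood" does not need to scan every document.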
  • 15. Search • Core parameters • qt – query type (request handler) • wt – writer type (response writer) • Common parameters • q • sort • start • rows • fq – filters • fl – return fields
  • 16. Search Syntax • field:term (*:* returns everything) • A score is generated at query time; the value itself has no absolute meaning, scores are only meaningful relative to each other (a scale) • fq filters the results by a supplied condition • wt is the return type of the results (xml, json, etc.) • qt is the request handler used to process the request (default is "standard") • fl is the list of fields to return (fields must be stored) • q is the query string • You can specify the start offset (start) and the maximum number of rows to return (rows)
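As a rough illustration of how the parameters above combine into one request, the following Python sketch assembles a select URL. The helper `build_select_url` is hypothetical and the field names are borrowed from the superhero example used elsewhere in this deck; only the parameter names (q, fq, fl, start, rows, wt) come from the slides.

```python
from urllib.parse import urlencode

# Base URL as used on the slides; adjust for your own installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(q, fq=None, fl=None, start=0, rows=10, wt="json"):
    """Assemble a select URL from the common query parameters."""
    params = [("q", q)]
    if fq:
        params.append(("fq", fq))             # filter query
    if fl:
        params.append(("fl", ",".join(fl)))   # fields to return
    params += [("start", start), ("rows", rows), ("wt", wt)]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_select_url(
    "powers:agility",
    fq="category:superhero",
    fl=["supername", "category"],
    rows=2,
)
print(url)
```

`urlencode` percent-encodes the colon and comma, which Solr decodes again on arrival, so the request is equivalent to the hand-written URLs on the slides.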
  • 18. What is Lucene • High performance, scalable, full-text search library • Focus: Indexing + Searching Documents – "Document" is just a list of name+value pairs • No crawlers or document parsing • Flexible Text Analysis (tokenizers + token filters) • 100% Java, no dependencies, no config files
  • 19. What is SOLR • Solr (pronounced "solar") is an open source enterprise search platform from the Apache Lucene project. Its major features include fulltext search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Providing distributed search and index replication, Solr is highly scalable.[1] Solr is the most popular enterprise search engine.[2] Solr 4 adds NoSQL features.[3]
  • 21. Solr Features • Advanced Full-Text Search Capabilities • Optimized for High Volume Web Traffic • Standards Based Open Interfaces - XML, JSON and HTTP • Comprehensive HTML Administration Interfaces • Linearly scalable, auto index replication, auto failover and recovery • Near Real-time indexing • Flexible and Adaptable with XML configuration • Extensible Plugin Architecture
  • 22. Indexing Data HTTP POST to http://localhost:8983/solr/update <add><doc> <field name="id">05991</field> <field name="name">Peter Parker</field> <field name="supername">Spider-Man</field> <field name="category">superhero</field> <field name="powers">agility</field> <field name="powers">spider-sense</field> </doc></add>
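The add message on this slide can also be generated programmatically. The sketch below builds the same XML with Python's standard library and prepares (but does not send) the HTTP POST; `build_add_xml` is an illustrative helper, and the content type header is an assumption about what the update handler expects.

```python
import urllib.request
import xml.etree.ElementTree as ET

UPDATE_URL = "http://localhost:8983/solr/update"  # URL from the slide


def build_add_xml(fields):
    """Serialize (name, value) pairs as an <add><doc>…</doc></add> message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        # Repeating a name yields a multi-valued field, as with "powers".
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="utf-8")


fields = [
    ("id", "05991"),
    ("name", "Peter Parker"),
    ("supername", "Spider-Man"),
    ("category", "superhero"),
    ("powers", "agility"),
    ("powers", "spider-sense"),
]
body = build_add_xml(fields)
request = urllib.request.Request(
    UPDATE_URL,
    data=body,
    headers={"Content-Type": "text/xml; charset=utf-8"},  # assumed header
)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without a Solr server. A commit is still needed afterwards.
print(body.decode("utf-8"))
```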
  • 23. Indexing CSV data Guru, Saurabh, Vivek, Siddhartha | Lubaib , Venugopal|superhero, php|bangalore|benguluru, Magneto, Mumbai|Bombay, GB|gigabytes, cm|centimeter, Purvankara http://localhost:8983/solr/update/csv?fieldnames=supername,Vivek,Magento,gb&separator=,&f.name.split=true&f.name.separator=|&f.powers.split=true&f.powers.separator=|
  • 24. Data upload methods URL=http://localhost:8983/solr/update/csv • HTTP POST body (curl, HttpClient, etc) curl $URL -H 'Content-type:text/plain; charset=utf-8' --data-binary @info.csv • Multi-part file upload (browsers) • Request parameter ?stream.body='Cyclops, Scott Summers,…' • Streaming from URL (must enable) ?stream.url=file://data/info.csv
  • 25. Indexing with SolrJ // Solr's Java Client API… remote or embedded/local! SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); SolrInputDocument doc = new SolrInputDocument(); doc.addField("player","Dravid"); doc.addField("name","Kumar Rahul"); doc.addField("category","superhero"); server.add(doc); server.commit();
  • 26. Deleting Documents • Delete by Id, most efficient <delete> <id>05591</id> <id>32552</id> </delete> • Delete by Query <delete> <query>category:supervillain</query> </delete>
  • 27. Commit • <commit/> makes changes visible – Triggers static cache warming in solrconfig.xml – Triggers autowarming from existing caches (on by default) • <optimize/> same as commit, but also merges all index segments for faster searching • Lucene index segments (files on disk): _0.fnm _0.fdt _0.fdx _0.frq _0.tis _0.tii _0.prx _0.nrm _0_1.del _1.fnm _1.fdt _1.fdx […]
  • 28. Searching http://localhost:8983/solr/select?q=powers:agility&start=0&rows=2&fl=supername,category <response> <result numFound="427" start="0"> <doc> <str name="supername">Spider-Man</str> <str name="category">superhero</str> </doc> <doc> <str name="supername">Mystique</str> <str name="category">supervillain</str> </doc> </result> </response>
  • 29. Response Format • Add &wt=json for JSON formatted response {"result": {"numFound":427, "start":0, "docs": [ {"supername":"Spider-Man", "category":"superhero"}, {"supername":"Magento", "category":"Purvankara"} ] } } • Also Python, Ruby, PHP, SerializedPHP, XSLT
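A JSON response is convenient because most languages can parse it directly. The following sketch reads a payload shaped like the one on the slide; the sample string and the helper `summarize` are illustrative. Note that the slide nests the hits under "result", whereas Solr releases wrap them under "response", so the key is left as a parameter.

```python
import json

# Sample payload in the shape shown on the slide (hand-written, not real output).
raw = """
{"result": {"numFound": 427, "start": 0, "docs": [
    {"supername": "Spider-Man", "category": "superhero"},
    {"supername": "Mystique", "category": "supervillain"}
]}}
"""


def summarize(raw_json, key="result"):
    """Return the total hit count and the supername of each returned document."""
    payload = json.loads(raw_json)[key]
    names = [doc["supername"] for doc in payload["docs"]]
    return payload["numFound"], names


total, names = summarize(raw)
print(total, names)
```

`numFound` is the total number of matches, while `docs` holds only the page selected by start and rows, which is why the two counts differ.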
  • 30. Scoring • Query results are sorted by score descending • VSM – Vector Space Model • tf – term frequency: how often the term occurs in the field • lengthNorm – normalization by the number of tokens in the field (shorter fields score higher) • idf – inverse document frequency (rarer terms score higher) • coord – coordination factor: fraction of the query terms that match • document boost • query clause boost http://lucene.apache.org/java/docs/scoring.html
  • 31. Explain http://solr/select?q=super fast&indent=on&debugQuery=on <lst name="debug"> <lst name="explain"> <str name="id=Flash,internal_docid=6"> 0.16389132 = (MATCH) product of: 0.32778263 = (MATCH) sum of: 0.32778263 = (MATCH) weight(text:fast in 6), product of: 0.5012072 = queryWeight(text:fast), product of: 2.466337 = idf(docFreq=5) 0.20321926 = queryNorm 0.65398633 = (MATCH) fieldWeight(text:fast in 6), product of: 1.4142135 = tf(termFreq(text:fast)=2) 2.466337 = idf(docFreq=5) 0.1875 = fieldNorm(field=text, doc=6) 0.5 = coord(1/2) </str> <str name="id=Superman,internal_docid=7"> 0.1365761 = (MATCH) product of: […]
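The explain output is easier to read once the arithmetic is replayed by hand. The sketch below recomputes the score of the "Flash" document from the leaf values printed on the slide. It assumes the classic Lucene similarity, in which tf is the square root of the term frequency (consistent with tf = 1.4142135 for termFreq = 2); the variable names are mine.

```python
import math

# Leaf values copied from the explain output for document "Flash".
idf = 2.466337           # idf(docFreq=5)
query_norm = 0.20321926  # queryNorm
field_norm = 0.1875      # fieldNorm for doc 6
term_freq = 2            # "fast" occurs twice in the field
coord = 1 / 2            # one of the two query terms matched

tf = math.sqrt(term_freq)              # 1.4142135 in the output
query_weight = idf * query_norm        # 0.5012072 in the output
field_weight = tf * idf * field_norm   # 0.65398633 in the output
term_weight = query_weight * field_weight
score = coord * term_weight            # 0.16389132 in the output

print(round(query_weight, 7), round(field_weight, 7), round(score, 7))
```

Reading the tree bottom-up this way shows that idf enters twice (once on the query side, once on the field side) and that coord halves the score because "super" did not match.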
  • 32. Lucene Query Syntax 1. justice league • Equiv: justice OR league • QueryParser default operator is "OR"/optional 2. +justice +league -name:aquaman • Equiv: justice AND league NOT name:aquaman 3. "justice league" -name:aquaman 4. title:spiderman^10 description:spiderman 5. description:"spiderman movie"~100
  • 33. Lucene Query Examples 2 1. releaseDate:[2000 TO 2007] 2. Wildcard searches: sup?r, su*r, super* 3. spider~ • Fuzzy search: Levenshtein distance • Optional minimum similarity: spider~0.7 4. *:* 5. (Superman AND "Lex Luthor") OR (+Batman +Joker)
  • 34. DisMax Query Syntax • Good for handling raw user queries – Balanced quotes for phrase query – '+' for required, '-' for prohibited – Separates query terms from query structure http://solr/select?qt=dismax &q=super man // the user query &qf=title^3 subject^2 body // fields to query &pf=title^2,body // fields to do phrase queries &ps=100 // slop for those phrase q's &tie=.1 // multi-field match reward &mm=2 // # of terms that should match &bf=popularity // boost function
  • 35. DisMax Query Form • The expanded Lucene Query: +( DisjunctionMaxQuery( title:super^3 | subject:super^2 | body:super) DisjunctionMaxQuery( title:man^3 | subject:man^2 | body:man) ) DisjunctionMaxQuery(title:"super man"~100^2 body:"super man"~100) FunctionQuery(popularity) • Tip: set up your own request handler with default parameters to avoid clients having to specify them
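To see what the tie parameter does inside each DisjunctionMaxQuery, it helps to compute the combination by hand: the best-scoring field wins, and every other matching field contributes only a tie-scaled share. The sketch below shows that rule in isolation; the per-field scores are invented numbers, and `dismax_score` is an illustrative name, not a Solr API.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores for one term: max plus tie times the rest."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores for the term "super" in title, subject and body.
scores = [3.0, 2.0, 1.0]
print(dismax_score(scores, tie=0.0))  # pure max: only the best field counts
print(dismax_score(scores, tie=0.1))  # small reward for matching other fields
print(dismax_score(scores, tie=1.0))  # plain sum, as in a boolean OR
```

With tie=.1 as on the previous slide, a document matching a term in several fields ranks slightly above one matching it in a single field, without letting weak fields dominate.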
  • 36. Function Query • Allows adding function of field value to score – Boost recently added or popular documents • Current parser only supports function notation • Example: log(sum(popularity,1)) • sum, product, div, log, sqrt, abs, pow • scale(x, target_min, target_max) – calculates min & max of x across all docs • map(x, min, max, target) – useful for dealing with defaults
  • 37. Boosted Query • Score is multiplied instead of added – New local params {!...} syntax added &q={!boost b=sqrt(popularity)}super man • Parameter dereferencing in local params &q={!boost b=$boost v=$userq} &boost=sqrt(popularity) &userq=super man
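The functions listed here are simple enough to mimic in Python, which can help when choosing a boost. The sketch below follows the descriptions on the slide, assuming that Solr's `log` is base 10 and that `map` replaces values inside the inclusive range [min, max]; the Python names are illustrative stand-ins, not Solr APIs.

```python
import math


def log_sum_boost(popularity):
    """Mimic log(sum(popularity,1)); the +1 avoids log(0) for unpopular docs."""
    return math.log10(popularity + 1)


def map_value(x, lo, hi, target):
    """Mimic map(x, min, max, target): values in [lo, hi] become target."""
    return target if lo <= x <= hi else x


def scale(values, target_min, target_max):
    """Mimic scale(x, min, max): linearly rescale using the min and max over all docs."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [float(target_min) for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


popularity = [0, 5, 10]
print([round(log_sum_boost(p), 3) for p in popularity])
print([map_value(p, 0, 0, 1) for p in popularity])  # treat 0 as a default of 1
print(scale(popularity, 1, 2))
```

The logarithm flattens large popularity values so that a very popular document gets a modest boost rather than overwhelming the text relevance score.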
  • 38. Configuring Relevancy <fieldType name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/> <filter class="solr.StopFilterFactory" words="stopwords.txt"/> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> </analyzer> </fieldType>
  • 39. Field Definitions • Field Attributes: name, type, indexed, stored, multiValued, omitNorms, termVectors <field name="id" type="string" indexed="true" stored="true"/> <field name="sku" type="textTight" indexed="true" stored="true"/> <field name="name" type="text" indexed="true" stored="true"/> <field name="inStock" type="boolean" indexed="true" stored="false"/> <field name="price" type="sfloat" indexed="true" stored="false"/> <field name="category" type="text_ws" indexed="true" stored="true" multiValued="true"/> • Dynamic Fields <dynamicField name="*_i" type="sint" indexed="true" stored="true"/> <dynamicField name="*_s" type="string" indexed="true" stored="true"/> <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  • 40. copyField • Copies one field to another at index time • Usecase #1: Analyze same field different ways – copy into a field with a different analyzer – boost exact-case, exact-punctuation matches – language translations, thesaurus, soundex <field name="title" type="text"/> <field name="title_exact" type="text_exact" stored="false"/> <copyField source="title" dest="title_exact"/> • Usecase #2: Index multiple fields into single searchable field
  • 44. Facet Query http://solr/select?q=foo&wt=json&indent=on &facet=true&facet.field=cat &facet.query=price:[0 TO 100] &facet.query=manu:IBM {"response":{"numFound":26,"start":0,"docs":[…]}, "facet_counts":{ "facet_queries":{ "price:[0 TO 100]":6, "manu:IBM":2}, "facet_fields":{ "cat":[ "electronics",14, "memory",3, "card",2, "connector",2]}}}
  • 45. Filters • Filters are restrictions in addition to the query • Use in faceting to narrow the results • Filters are cached separately for speed 1. User queries for memory, query sent to Solr is &q=memory&fq=inStock:true&facet=true&… 2. User selects 1GB memory size &q=memory&fq=inStock:true&fq=size:1GB&… 3. User selects DDR2 memory type &q=memory&fq=inStock:true&fq=size:1GB &fq=type:DDR2&…
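In the JSON above, each entry under facet_fields is a flat list that alternates value and count. A client usually wants pairs or a mapping instead; the following sketch converts the "cat" list from the slide. The helper name `facet_pairs` is illustrative.

```python
def facet_pairs(flat):
    """Turn [value, count, value, count, …] into an ordered {value: count} dict."""
    return dict(zip(flat[0::2], flat[1::2]))


# Facet counts for the "cat" field, copied from the slide.
cat = ["electronics", 14, "memory", 3, "card", 2, "connector", 2]
counts = facet_pairs(cat)
for value, count in counts.items():
    print(f"{value} ({count})")
```

Since Python dictionaries preserve insertion order, the values stay in the order Solr returned them, which in this example is by descending count.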
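The drill-down sequence above amounts to appending one fq parameter per user selection while leaving q untouched. The sketch below generates the three query strings; `drill_down_query` is a hypothetical helper, and the field names are those used on the slide.

```python
from urllib.parse import urlencode


def drill_down_query(q, filters):
    """Build a query string with one fq parameter per active filter."""
    params = [("q", q)]
    params += [("fq", f) for f in filters]  # repeated fq parameters are all applied
    params.append(("facet", "true"))
    return urlencode(params)


selected = []
for choice in ["inStock:true", "size:1GB", "type:DDR2"]:
    selected.append(choice)
    print(drill_down_query("memory", selected))
```

Keeping each restriction in its own fq, rather than folding everything into q, is what lets Solr cache and reuse the individual filters as the slide describes.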
  • 46. Highlighting http://solr/select?q=lcd&wt=json&indent=on &hl=true&hl.fl=features {"response":{"numFound":5,"start":0,"docs":[ {"id":"3007WFP", "price":899.95}, …]}, "highlighting":{ "3007WFP":{ "features":["30\" TFT active matrix <em>LCD</em>, 2560 x 1600"]}, "VA902B":{ "features":["19\" TFT active matrix <em>LCD</em>, 8ms response time, 1280 x 1024 native resolution"]}}}
  • 47. MoreLikeThis • Selects documents that are "similar" to the documents matching the main query. &q=id:6H500F0 &mlt=true&mlt.fl=name,cat,features "moreLikeThis":{ "6H500F0":{"numFound": 5,"start":0, "docs": [ {"name":"Apple 60 GB iPod with Video Playback Black", "price":399.0, "inStock":true, "popularity":10, […] }, […] ] […]
  • 48. High Availability [Architecture diagram; labels: Appservers (dynamic HTML generation), HTTP search requests, Load Balancer, Solr Searchers, Index Replication, Solr Master, Updater, DB, updates, admin terminal, admin queries]
  • 49. Resources • WWW – http://lucene.apache.org/solr – http://lucene.apache.org/solr/tutorial.html – http://wiki.apache.org/solr/ • Mailing Lists – solr-user-subscribe@lucene.apache.org – solr-dev-subscribe@lucene.apache.org