Here you can learn how to use the Solr search engine and integrate it into your own application, for example one built on PHP and MySQL.
I also introduce how to handle data from multiple tables in Solr.
5. What is Solr?
• Solr is an open source enterprise search
server based on the Lucene Java search
library.
• Solr runs in a Java servlet container such
as Tomcat or Jetty
• Solr is free software and a project of the
Apache Software Foundation
• Solr is a sub-project of Lucene and can be
found at http://lucene.apache.org/solr/
6. Key Features
• Advanced Full-Text search
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces – XML and
HTTP
• Comprehensive HTML Administration Interface
• Server statistics exposed over JMX for monitoring
• Scalability through efficient replication
• Flexibility with XML configuration and Plugins
• Push vs Crawl indexing method
7. Solr Clients
• Solr can be integrated with, among others…
– Ruby
– PHP
– Java
– Python
– JSON
– Forrest/Cocoon
– C# (Deveel Solr Client or SolrNet)
– ColdFusion
– Drupal (apacheSolr project for Drupal)
9. Searching
• Full text search
http://localhost:8983/solr/select?q=Iraq
• Search only within a field
http://localhost:8983/solr/select?q=category:news
• Control which fields are displayed in result
http://localhost:8983/solr/select?q=video&fl=id,category
• Provide ranges to fields
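The slide gives no URL for the range case, so here is a minimal sketch of how one might be built. It assumes the standard Lucene range syntax `field:[low TO high]` and a hypothetical numeric field called `price`; only the endpoint and the `fl` value come from the slides.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # endpoint used in the slides


def range_query(field, low, high):
    """Return an inclusive Lucene range clause, e.g. price:[10 TO 100]."""
    return f"{field}:[{low} TO {high}]"


# "price" is a hypothetical field name used only for illustration.
params = {"q": range_query("price", 10, 100), "fl": "id,category"}

# urlencode percent-encodes the colon, brackets and spaces for us.
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Letting `urlencode` do the escaping avoids hand-writing `%5B` and `%5D` for the brackets.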
10. More Searching
• Faceting information
http://localhost:8983/solr/select?q=news&fl=id,description&facet=true&facet.field=category
• More like this (MLT)
http://localhost:8983/solr/select?q=Iraq&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100
• More information on how this works and the
options available can be found at
http://wiki.apache.org/solr/MoreLikeThis
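As a rough illustration of what comes back from the faceting request, the sketch below turns facet counts into a dictionary. It assumes the response was requested as JSON and uses the default flat layout, where values and counts alternate in one list; the sample counts are made up.

```python
import json

# Hypothetical, trimmed response body for facet.field=category.
sample_body = """
{"facet_counts": {"facet_fields":
  {"category": ["news", 12, "sports", 5, "video", 0]}}}
"""


def facet_counts(response, field):
    """Pair up the alternating value/count entries of a facet field."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


counts = facet_counts(json.loads(sample_body), "category")
print(counts)
```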
11. QueryResponseWriter
• A QueryResponseWriter is a Solr Plugin that defines the response format for any request
• All of the requests we have made so far are formatted with the XMLResponseWriter
• Other formats can be applied by appending wt=format to the search string like this:
http://localhost:8983/solr/select?q=date:
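To make the `wt` parameter concrete, here is a small sketch that asks for JSON instead of XML and reads the document ids out of the result. The response body is a hand-written sample in the usual `responseHeader` / `response` shape, not real server output, and the `category:news` query is borrowed from the earlier searching slide.

```python
import json
from urllib.parse import urlencode


def select_url(query, wt="json", base="http://localhost:8983/solr/select"):
    """Build a select URL that asks Solr for a specific response format."""
    return base + "?" + urlencode({"q": query, "wt": wt})


# Hypothetical JSON body standing in for what the server would return.
sample_body = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0,'
    ' "docs": [{"id": "doc1", "category": "news"}]}}'
)

parsed = json.loads(sample_body)
ids = [doc["id"] for doc in parsed["response"]["docs"]]

print(select_url("category:news"))
print(ids)
```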
12. Acknowledgements
• Search smarter with Apache Solr, Part 1: Essential features and the Solr schema
– http://www.ibm.com/developerworks/java/library/j-solr1/
• Solr Tutorial from Lucid Imagination
– http://www.lucidimagination.com/Community/Hear-from-the-Experts/Podcasts-and-Videos/Solr-Tutorial
• Solr Wiki
– http://wiki.apache.org/solr/
13. Powered by Lucene
• Wikipedia
• Internet Archive
• LinkedIn
• monster.com
16. Search Syntax
• field:term (*:* returns everything)
• A score is generated at query time. The value itself has no absolute meaning; scores are only meaningful relative to each other
• fq filters the query results by a supplied condition
• wt is the response format of the results (xml, json, etc.)
• qt is the request handler used to process the request (default is “standard”)
• fl is the list of fields to return (a field must be stored to be returned)
• q is the query string
• start and rows set the offset of the first result and the maximum number of rows returned
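The start offset and row limit are what paging is usually built on. The sketch below, which assumes the standard `start` and `rows` parameter names and a hypothetical 1-based page number, shows one way to derive them.

```python
from urllib.parse import urlencode


def page_params(query, page, per_page=10, fields=("id", "score")):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {
        "q": query,
        "fl": ",".join(fields),          # only stored fields can be returned
        "start": (page - 1) * per_page,  # offset of the first result
        "rows": per_page,                # maximum number of results
        "wt": "json",
    }


params = page_params("*:*", page=3)
print(urlencode(params))
```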
18. What is Lucene
• High performance, scalable, full-text
search library
• Focus: Indexing + Searching Documents
– “Document” is just a list of name+value pairs
• No crawlers or document parsing
• Flexible Text Analysis (tokenizers + token
filters)
• 100% Java, no dependencies, no config
files
19. What is SOLR
• Solr (pronounced "solar") is an open source
enterprise search platform from the Apache
Lucene project. Its major features include full-text search, hit highlighting, faceted search,
dynamic clustering, database integration, and
rich document (e.g., Word, PDF) handling.
Providing distributed search and index
replication, Solr is highly scalable.[1] Solr is the
most popular enterprise search engine.[2] Solr 4
adds NoSQL features.[3]
21. Solr Features
• Advanced Full-Text Search Capabilities
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and
HTTP
• Comprehensive HTML Administration Interfaces
• Linearly scalable, auto index replication, auto failover
and recovery
• Near Real-time indexing
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
22. Indexing Data
HTTP POST to http://localhost:8983/solr/update
<add><doc>
<field name="id">05991</field>
<field name="name">Peter Parker</field>
<field name="supername">Spider-Man</field>
<field name="category">superhero</field>
<field name="powers">agility</field>
<field name="powers">spider-sense</field>
</doc></add>
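In application code the update message is normally generated rather than typed. A minimal sketch using Python's standard `xml.etree.ElementTree` is shown below; the helper name `build_add_xml` is an assumption of this example, and the resulting string would be sent as the body of the HTTP POST.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Serialize a dict as a Solr <add><doc> update message.

    List or tuple values become repeated <field> elements, which is how
    multi-valued fields such as "powers" are expressed.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


payload = build_add_xml({
    "id": "05991",
    "name": "Peter Parker",
    "supername": "Spider-Man",
    "category": "superhero",
    "powers": ["agility", "spider-sense"],
})
print(payload)
```

Building the message with an XML library also takes care of escaping characters such as `&` and `<` in field values.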
30. Scoring
• Query results are sorted by score descending
• VSM – Vector Space Model
• tf – term frequency: number of matching terms in field
• lengthNorm – number of tokens in field
• idf – inverse document frequency
• coord – coordination factor, number of matching terms
• document boost
• query clause boost
http://lucene.apache.org/java/docs/scoring.html
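To show how these factors combine, here is a simplified sketch modelled on Lucene's classic TF-IDF similarity. It is an approximation under stated assumptions: the query normalization factor is left out, a single field is scored, and the function and argument names are invented for this example.

```python
import math


def tf(freq):
    """Term frequency factor: grows with the square root of occurrences."""
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    """Inverse document frequency: rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def length_norm(num_tokens):
    """Shorter fields score higher for the same match."""
    return 1.0 / math.sqrt(num_tokens)


def coord(matched_terms, query_terms):
    """Fraction of the query terms that the document matched."""
    return matched_terms / query_terms


def score(term_stats, num_docs, field_length, query_terms, boost=1.0):
    """term_stats maps each matched term to (freq_in_field, doc_freq)."""
    total = sum(
        tf(freq) * idf(doc_freq, num_docs) ** 2 * length_norm(field_length)
        for freq, doc_freq in term_stats.values()
    )
    return boost * coord(len(term_stats), query_terms) * total


# One of two query terms matches, 4 times, in a 4-token field.
print(score({"super": (4, 9)}, num_docs=10, field_length=4, query_terms=2))
```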
34. DisMax Query Syntax
• Good for handling raw user queries
– Balanced quotes for phrase query
– ‘+’ for required, ‘-’ for prohibited
– Separates query terms from query structure
http://solr/select?qt=dismax
&q=super man // the user query
&qf=title^3 subject^2 body // fields to query
&pf=title^2,body // fields to do phrase queries
&ps=100 // slop for those phrase queries
&tie=.1 // multi-field match reward
&mm=2 // # of terms that should match
&bf=popularity // boost function
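The parameters above contain spaces and `^` characters, so they have to be URL-encoded before being sent. A minimal sketch, reusing the slide's values and its placeholder host `http://solr/select`:

```python
from urllib.parse import parse_qs, urlencode

# Parameter values copied from the slide; the host is the slide's placeholder.
dismax_params = [
    ("qt", "dismax"),
    ("q", "super man"),                # the raw user query
    ("qf", "title^3 subject^2 body"),  # fields to query, with boosts
    ("pf", "title^2,body"),            # fields for phrase queries
    ("ps", 100),                       # phrase slop
    ("tie", 0.1),                      # multi-field match reward
    ("mm", 2),                         # number of terms that should match
    ("bf", "popularity"),              # boost function
]

query_string = urlencode(dismax_params)
url = "http://solr/select?" + query_string
print(url)

# Decoding the string again shows that nothing was lost in the encoding.
decoded = parse_qs(query_string)
```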
35. DisMax Query Form
• The expanded Lucene Query:
+( DisjunctionMaxQuery( title:super^3 |
subject:super^2 | body:super)
DisjunctionMaxQuery( title:man^3 |
subject:man^2 | body:man)
)
DisjunctionMaxQuery(title:"super man"~100^2
body:"super man"~100)
FunctionQuery(popularity)
• Tip: set up your own request handler with default parameters to avoid clients having to specify them
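For each term, a DisjunctionMaxQuery takes the best-scoring field and adds a fraction of the other fields' scores, controlled by the tie parameter. The sketch below illustrates that rule with made-up per-field scores; `dismax_score` is a name chosen for this example.

```python
def dismax_score(field_scores, tie=0.0):
    """Best field score plus tie times the sum of the remaining scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical scores for one term in title, subject and body.
scores = [3.0, 2.0, 1.0]

print(dismax_score(scores))           # tie=0: only the best field counts
print(dismax_score(scores, tie=0.1))  # other fields add a small reward
print(dismax_score(scores, tie=1.0))  # tie=1: behaves like a plain sum
```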
36. Function Query
• Allows adding function of field value to score
– Boost recently added or popular documents
• Current parser only supports function
notation
• Example: log(sum(popularity,1))
• sum, product, div, log, sqrt, abs, pow
• scale(x, target_min, target_max)
– calculates min & max of x across all docs
• map(x, min, max, target)
– useful for dealing with defaults
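To get a feel for what these functions compute, here is a plain-Python imitation of a few of them. It is only a sketch of the arithmetic, not Solr code: the `fq_` names are invented, `log` is taken to be base 10 as in Solr's function queries, and `scale` is applied to an in-memory list instead of an index.

```python
import math


def fq_sum(*values):
    return sum(values)


def fq_log(x):
    """Base-10 logarithm."""
    return math.log10(x)


def fq_map(x, lo, hi, target):
    """Replace values inside [lo, hi] with target; leave others unchanged."""
    return target if lo <= x <= hi else x


def fq_scale(values, target_min, target_max):
    """Rescale values linearly so min/max land on target_min/target_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [float(target_min) for _ in values]
    span = target_max - target_min
    return [target_min + (v - lo) * span / (hi - lo) for v in values]


popularity = [0, 5, 10]  # hypothetical field values

print(fq_log(fq_sum(99, 1)))                    # log(sum(popularity,1)) for 99
print([fq_map(p, 0, 0, 1) for p in popularity]) # treat 0 as a default of 1
print(fq_scale(popularity, 1, 2))
```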
37. Boosted Query
• Score is multiplied instead of added
– New local params {!...} syntax added
&q={!boost b=sqrt(popularity)}super man
• Parameter dereferencing in local params
&q={!boost b=$boost v=$userq}
&boost=sqrt(popularity)
&userq=super man
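A practical benefit of dereferencing is that the user's text travels in its own parameter and never has to be spliced into the local params string. A small sketch under that reading, with `boosted_query` as a hypothetical helper:

```python
from urllib.parse import parse_qs, urlencode


def boosted_query(user_query, boost_function):
    """Build parameters for a boost query using $param dereferencing."""
    return [
        ("q", "{!boost b=$boost v=$userq}"),  # fixed template
        ("boost", boost_function),            # multiplied into the score
        ("userq", user_query),                # raw user input, kept separate
    ]


query_string = urlencode(boosted_query("super man", "sqrt(popularity)"))
print(query_string)

# Round-trip check: the braces, "$" and "!" survive URL encoding.
decoded = parse_qs(query_string)
```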
40. copyField
• Copies one field to another at index time
• Use case #1: Analyze same field different ways
– copy into a field with a different analyzer
– boost exact-case, exact-punctuation matches
– language translations, thesaurus, soundex
<field name="title" type="text"/>
<field name="title_exact" type="text_exact"
stored="false"/>
<copyField source="title" dest="title_exact"/>
• Use case #2: Index multiple fields into single
searchable field
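The effect of use case #1 can be imitated in a few lines. This is a toy model only: the two analyzers are crude stand-ins for the `text` and `text_exact` field types, and `apply_copy_fields` is a name made up for this sketch.

```python
def analyze_text(value):
    """Stand-in for a 'text' type: lowercase and split on hyphens/spaces."""
    return value.lower().replace("-", " ").split()


def analyze_exact(value):
    """Stand-in for a 'text_exact' type: keep case and punctuation."""
    return value.split()


ANALYZERS = {"title": analyze_text, "title_exact": analyze_exact}
COPY_FIELDS = [("title", "title_exact")]  # (source, dest)


def apply_copy_fields(doc):
    """Copy the raw source value to dest, then analyze each field its own way."""
    expanded = dict(doc)
    for source, dest in COPY_FIELDS:
        if source in doc:
            expanded[dest] = doc[source]
    return {name: ANALYZERS[name](value) for name, value in expanded.items()}


indexed = apply_copy_fields({"title": "Spider-Man Returns"})
print(indexed)
```

The copy happens on the raw value before analysis, which is why the same input can yield two different token streams.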
45. Filters
• Filters are restrictions in addition to the query
• Use in faceting to narrow the results
• Filters are cached separately for speed
1. User queries for memory; the query sent to Solr is
&q=memory&fq=inStock:true&facet=true&…
2. User selects 1GB memory size
&q=memory&fq=inStock:true&fq=size:1GB&…
3. User selects DDR2 memory type
&q=memory&fq=inStock:true&fq=size:1GB
&fq=type:DDR2&…
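The drill-down above amounts to appending one `fq` parameter per selection. A minimal sketch, assuming the `http://localhost:8983/solr/select` endpoint from the earlier slides and a hypothetical helper name:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"


def drill_down_url(query, filters):
    """Build a faceted search URL with one fq parameter per active filter."""
    params = [("q", query)]
    params += [("fq", clause) for clause in filters]
    params.append(("facet", "true"))
    return SOLR_SELECT + "?" + urlencode(params)


selected = ["inStock:true"]
print(drill_down_url("memory", selected))  # step 1

selected.append("size:1GB")                # step 2: user picks a size
print(drill_down_url("memory", selected))

selected.append("type:DDR2")               # step 3: user picks a type
print(drill_down_url("memory", selected))
```

Keeping each condition in its own `fq` lets every filter be cached and reused independently.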
47. MoreLikeThis
• Selects documents that are “similar” to the
documents matching the main query.
&q=id:6H500F0
&mlt=true&mlt.fl=name,cat,features
"moreLikeThis":{ "6H500F0":{"numFound":5,"start":0,
"docs": [
{"name":"Apple 60 GB iPod with Video Playback Black", "price":399.0,
"inStock":true, "popularity":10, […]
}, […]
]
[…]
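Reading the similar documents back out is a matter of looking up the source document's id under `moreLikeThis`. The sketch below assumes the JSON shape shown above; the sample body is a trimmed, hand-written stand-in with a single document, not the full response.

```python
import json

# Trimmed, hypothetical response body following the shape on the slide.
sample_body = """
{"moreLikeThis": {"6H500F0": {"numFound": 1, "start": 0, "docs": [
  {"name": "Apple 60 GB iPod with Video Playback Black",
   "price": 399.0, "inStock": true, "popularity": 10}
]}}}
"""


def similar_names(response, doc_id):
    """Return the names of the documents reported as similar to doc_id."""
    entry = response.get("moreLikeThis", {}).get(doc_id, {})
    return [doc["name"] for doc in entry.get("docs", [])]


response = json.loads(sample_body)
print(similar_names(response, "6H500F0"))
```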