Does your website have a ton of data? How do your users find the relevant pages among all the noise in your site?
Solr can help deliver the pertinent search results to your users regardless of your site's size.
Apache Solr is a Java application that integrates with Drupal through a contrib module, letting your users quickly search millions of records and narrow down the results with minimal system impact.
7. What is Search?
Search (v): to go or look through (a place, area, etc.) carefully in order to find something missing or lost: I searched the desk for the letter.
Source: http://dictionary.reference.com/browse/search
@Mediacurrent
8. Why Users Search
• Navigation doesn't make sense
• It can be faster
• Lots of data
• Frequent data changes
• Might just be looking for something
11. History
Solr was initially created in 2004 as an in-house project for CNET. It was open sourced in 2006 and donated to the Apache Software Foundation.
12. Lucene
• Solr is a layer on top of Lucene
• Lucene is a library
• Solr stores files in Lucene format
15. Speed
• Important!
• It scales well
• No database required
• Clustering & Sharding
• Netflix runs 1.2 million queries/day on 4 servers*
*http://wiki.apache.org/solr/SolrPerformanceData
16. Natural Results
• Stemming: Blogging vs. Blog
• Stop Word Removal: The
• Synonyms: Tissue vs Kleenex
• Highly Configurable
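The synonym bullet above can be illustrated with a small sketch. This is not how Solr implements synonyms internally; the `SYNONYMS` table and the `expand_query` and `matches` helpers are hypothetical names chosen for illustration only.

```python
# Hypothetical synonym table, mirroring the "Tissue vs Kleenex" example.
SYNONYMS = {"tissue": {"kleenex"}, "kleenex": {"tissue"}}


def expand_query(terms):
    """Turn each query term into a set holding the term plus its synonyms."""
    expanded = []
    for term in terms:
        t = term.lower()
        expanded.append({t} | SYNONYMS.get(t, set()))
    return expanded


def matches(doc_terms, expanded):
    """A document matches if it contains at least one word from every group."""
    doc = {t.lower() for t in doc_terms}
    return all(group & doc for group in expanded)


query = expand_query(["tissue"])
print(matches(["Kleenex", "box"], query))  # document uses the synonym
print(matches(["paper", "towel"], query))  # document has neither word
```

A search for "tissue" then also finds documents that only mention "Kleenex".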
17. Drupal Search
• Not stemmed by default
• Queries the database
• Stores tokenized words in a single large table
• Much slower to index
20. Ordering
• Score
• Comes from Lucene
• Not "out of 100"
• Bigger score first
More Info: http://lucene.apache.org/core/3_6_1/scoring.html
[Figure: example result scores, highest first: 201, 200, 199, 184]
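As a rough sketch of score ordering: results are sorted by raw score, largest first, and the score is not a percentage. The page titles and the `rank` helper below are hypothetical; the scores are illustrative values only.

```python
# Hypothetical hits as (title, score) pairs. Scores are raw relevance
# values, not percentages, so they are only meaningful relative to each other.
hits = [("Page C", 199.0), ("Page A", 201.0), ("Page D", 184.0), ("Page B", 200.0)]


def rank(hits):
    """Return hits sorted so that the highest score comes first."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


ranked = rank(hits)
for title, score in ranked:
    print(f"{score:6.1f}  {title}")
```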
21. Facets
• Users do the work
• Helps when there is too much data
• Native to Solr
• Requires the Facet API module
• Shopping Sites
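To make the idea of facets concrete, here is a minimal sketch of what a facet does conceptually: count the values of a field across the current results, then let the user narrow by one of them. It does not use Solr or the Facet API module; the sample documents and the `facet_counts` and `apply_facet` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical search results from a shopping site.
docs = [
    {"title": "Red shirt", "category": "shirts", "color": "red"},
    {"title": "Blue shirt", "category": "shirts", "color": "blue"},
    {"title": "Red hat", "category": "hats", "color": "red"},
]


def facet_counts(docs, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


def apply_facet(docs, field, value):
    """Keep only the results whose field equals the chosen facet value."""
    return [doc for doc in docs if doc.get(field) == value]


print(facet_counts(docs, "color"))
narrowed = apply_facet(docs, "color", "red")
print([doc["title"] for doc in narrowed])
```

The counts are what a shopping site shows next to each filter link; clicking a link corresponds to `apply_facet`.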
23. Index?
• Index contains Documents
• Documents have Fields
• Fields have Terms
• ~2 minutes for updates
• Uses Lucene syntax
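The index, document, field, and term hierarchy can be sketched as a toy inverted index. This is a simplification for illustration, not Lucene's actual storage format; `build_index` and `lookup` are hypothetical helpers. The lookup takes a field and a term, which echoes the `field:term` form of Lucene query syntax.

```python
from collections import defaultdict


def build_index(documents):
    """Map each (field, term) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[(field, term)].add(doc_id)
    return index


def lookup(index, field, term):
    """Return sorted ids of documents whose field contains the term."""
    return sorted(index.get((field, term.lower()), set()))


# Hypothetical documents, each with a title field and a body field.
documents = {
    1: {"title": "Solr search", "body": "Solr is fast"},
    2: {"title": "Drupal search", "body": "Drupal queries the database"},
}
index = build_index(documents)

print(lookup(index, "title", "search"))  # comparable to the query title:search
print(lookup(index, "body", "solr"))     # comparable to the query body:solr
```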
24. Tokenizing
• Splits words and numbers
"this" "is" "blogging"
• Excludes Stopwords
"this" "blogging"
• Handles Stemming (if enabled)
"this" "blog"
• Very configurable
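The three steps above can be sketched in a few lines. This is a toy pipeline, not Solr's analyzer chain: the stopword set is a hypothetical one chosen to match the example above, and `toy_stem` is a deliberately naive stand-in for a real stemmer.

```python
import re

# Hypothetical stopword list, chosen so the output matches the example above.
STOPWORDS = {"is"}


def tokenize(text):
    """Split text into lowercase word and number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def toy_stem(token):
    """Naive stemmer: strip a trailing 'ing' and undo a doubled consonant."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
        if len(token) >= 2 and token[-1] == token[-2]:
            token = token[:-1]
    return token


tokens = tokenize("This is blogging")
kept = remove_stopwords(tokens)
stemmed = [toy_stem(t) for t in kept]
print(tokens, kept, stemmed)
```

Because the same steps run on both indexed content and the search query, a search for "blog" can match a page that only says "blogging".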
25. Bias
• Adjusts the order of search results
• Works on: Content Type, Fields, Comments, Promoted to Home Page and more
• Can be dynamic with custom modules.
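One way to picture bias is as a multiplier applied to a result's base score when it has certain properties. The sketch below assumes simple multiplicative boosts, which is a simplification and not necessarily how Solr or the Apache Solr module combines them; the boost table, field names, and `biased_score` helper are hypothetical.

```python
# Hypothetical boost table: (field, value) -> multiplier.
BOOSTS = {("type", "article"): 2.0, ("promoted", True): 1.5}


def biased_score(doc, base_score, boosts=BOOSTS):
    """Multiply the base score by every boost whose condition the doc meets."""
    multiplier = 1.0
    for (field, value), factor in boosts.items():
        if doc.get(field) == value:
            multiplier *= factor
    return base_score * multiplier


# Hypothetical results as (document, base score) pairs.
results = [
    ({"id": "a", "type": "page", "promoted": False}, 10.0),
    ({"id": "b", "type": "article", "promoted": False}, 6.0),
    ({"id": "c", "type": "article", "promoted": True}, 5.0),
]

reordered = sorted(results, key=lambda r: biased_score(r[0], r[1]), reverse=True)
print([doc["id"] for doc, _ in reordered])
```

Without bias the order would be a, b, c; with these boosts the promoted article moves to the top.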
27. Modules
• Apache Solr (apachesolr)
• Facet API (facetapi)
• Chaos tool suite (ctools)
28. Overall
• Search is becoming more and more important
• You want to control your search results
• If you don't provide a good search experience, somebody else will.
• Solr doesn't have to be complex.
• Solr is fast and scales.
In this example, Walmart found that conversion rates were directly affected by site load times. Although the example concerns page load times in general, the same principle applies to search.