Scaling Solr 4 to Power Big Search in Social Media
Analytics
Timothy Potter
Architect, Big Data Analytics, Dachis Group / Co-author Solr In Action
® 2011 Dachis Group.
dachisgroup.com
• Anyone running SolrCloud in
production today?
• Who is running a pre-Solr 4 version in
production?
• Who has fired up Solr 4.x in SolrCloud
mode?
• Personal interest – who has
purchased Solr in Action in MEAP?
Audience poll
• Gain insights into the key design decisions you need
to make when using SolrCloud
Wish I knew back then ...
• Solr 4 feature overview in context
• Zookeeper
• Distributed indexing
• Distributed search
• Real-time GET
• Atomic updates
• A day in the life ...
• Day-to-day operations
• What happens if you lose a node?
Goals of this talk
Our business intelligence platform analyzes relationships, behaviors, and
conversations between 30,000 brands and 100M social accounts every 15 minutes.
About Dachis Group
• In production on 4.2.0
• 18 shards ~ 33M docs / shard, 25GB on disk per shard
• Multiple collections
• ~620 Million docs in main collection (still growing)
• ~100 Million docs in 30-day collection
• Inherent Parent / Child relationships (tweets and re-tweets)
• ~5M atomic updates to existing docs per day
• Batch-oriented updates
• Docs come in bursts from Hadoop; 8,000 docs/sec
• 3-4M new documents per day (deletes too)
• Business Intelligence UI, low(ish) query volume
Solution Highlights
• Scalability
Scale-out: sharding and replication
A little scale-up too: Fast disks (SSD), lots of RAM!
• High-availability
Redundancy: multiple replicas per shard
Automated fail-over: automated leader election
• Consistency
Distributed queries must return consistent results
Accepted writes must be on durable storage
• Simplicity (work in progress)
Self-healing; easy to set up, maintain,
and troubleshoot
• Elasticity (work in progress)
Add more replicas per shard at any time
Split large shards into two smaller ones
Pillars of my ideal search solution
Nuts and Bolts
Nice tag cloud wordle.net!
1. Zookeeper needs at least 3 nodes to establish quorum with fault
tolerance. Embedded Zookeeper is for evaluation only; deploy a
stand-alone ensemble for production
2. Every Solr core creates ephemeral “znodes” in Zookeeper which
automatically disappear if the Solr process crashes
3. Zookeeper pushes notifications to all registered “watchers” when a
znode changes; Solr caches cluster state
4. Zookeeper provides “recipes” for solving common problems faced
when building distributed systems, e.g. leader election
5. Zookeeper provides centralized configuration distribution, leader
election, and cluster state notifications
Zookeeper in a nutshell
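The ephemeral-znode and watcher semantics above can be sketched with a toy in-memory stand-in. This is not the real ZooKeeper API (in practice you would use Curator or kazoo against a running ensemble); it only illustrates why a crashed Solr process is noticed by the rest of the cluster:

```python
class ToyZk:
    """Minimal in-memory stand-in for ZooKeeper's znode + watcher semantics."""
    def __init__(self):
        self.znodes = {}     # path -> node data
        self.watchers = {}   # path -> list of callbacks

    def create(self, path, data, ephemeral=False):
        self.znodes[path] = {"data": data, "ephemeral": ephemeral}

    def watch(self, path, callback):
        self.watchers.setdefault(path, []).append(callback)

    def set(self, path, data):
        self.znodes[path]["data"] = data
        # watches are one-shot notifications, like in real ZooKeeper
        for cb in self.watchers.pop(path, []):
            cb(path, data)

    def session_expired(self, owner_paths):
        # ephemeral znodes vanish when the owning session (Solr process) dies
        for path in owner_paths:
            if self.znodes.pop(path, None) is not None:
                for cb in self.watchers.pop(path, []):
                    cb(path, None)   # None = znode deleted

zk = ToyZk()
seen = []
zk.create("/live_nodes/solr1", "up", ephemeral=True)
zk.watch("/live_nodes/solr1", lambda p, d: seen.append((p, d)))
zk.session_expired(["/live_nodes/solr1"])   # Solr process crashes
print(seen)  # [('/live_nodes/solr1', None)] -> watchers learn the node is gone
```

This is how the other Solr nodes learn about a crash without polling: the znode disappears with the session, and every registered watcher is notified once.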
• Number and size of indexed fields
• Number of documents
• Update frequency
• Query complexity
• Expected growth
• Budget
Number of shards?
Yay for shard splitting in 4.3 (SOLR-3755)!
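As a back-of-the-envelope check, the shard count follows directly from the document count on the Solution Highlights slide:

```python
# Sanity-check the numbers from the deck: ~620M docs over 18 shards
total_docs = 620_000_000
shards = 18

docs_per_shard = total_docs / shards
print(round(docs_per_shard / 1e6, 1))  # ~34.4M docs per shard, near the quoted ~33M
```

Working the arithmetic backwards is the usual sizing exercise: pick a per-shard target your hardware can handle (docs, index size on disk, RAM for the page cache), then divide projected corpus size by that target to get the shard count, leaving headroom for the expected growth listed above.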
We use Uwe Schindler’s advice on 64-bit Linux:
<directoryFactory name="DirectoryFactory"
class="${solr.directoryFactory:solr.MMapDirectoryFactory}"/>
See: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
java -Xmx4g ...
(hint: rest of our RAM goes to the OS to load index in memory mapped I/O)
Small cache sizes with aggressive eviction – spread GC penalty out over time vs. all at once every time
you open a new searcher
<filterCache class="solr.LFUCache" size="50"
initialSize="50" autowarmCount="25"/>
Index Memory Management
• Not a master
• Leader is a replica (handles queries)
• Accepts update requests for the shard
• Increments the _version_ on the new or
updated doc
• Sends updates (in parallel) to all
replicas
Leader = Replica + Add’l Work
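The leader duties listed above can be sketched as a toy simulation (real Solr forwards updates over HTTP in parallel, and `_version_` values come from a timestamp-based clock; this stand-in uses a simple counter):

```python
import itertools

class ToyReplica:
    def __init__(self):
        self.index = {}   # doc id -> doc

class ToyLeader(ToyReplica):
    """Toy sketch of SolrCloud leader duties: the leader is itself a
    replica, but additionally version-stamps each doc and forwards it
    to every other replica in the shard."""
    _clock = itertools.count(1)   # stand-in for Solr's version clock

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas

    def add(self, doc):
        doc = dict(doc, _version_=next(self._clock))  # leader sets _version_
        self.index[doc["id"]] = doc                   # index locally (leader = replica)
        for r in self.replicas:                       # forward versioned doc
            r.index[doc["id"]] = doc
        return doc["_version_"]

replica = ToyReplica()
leader = ToyLeader([replica])
v = leader.add({"id": "tweet-1", "text": "hello"})
print(v, replica.index["tweet-1"]["_version_"])  # same version on leader and replica
```

The key point the sketch shows: because only the leader assigns `_version_`, every replica of the shard ends up with an identical, totally ordered view of updates.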
Don’t let your tlogs get too big – use “hard” commits with openSearcher="false"
Distributed Indexing
[Diagram – view of cluster state from Zk: 2 shards with 1 replica each
across Node 1 and Node 2. CloudSolrServer, the “smart client”, (1) gets
the URLs of the current leaders from Zookeeper, (2) hashes on docID to
pick a shard, and (3) sends the update to that shard’s leader; the
leader (4) sets the _version_ and writes its tlog, then (5) forwards
the update to the shard’s replica, which writes its own tlog.]
<autoCommit>
<maxDocs>10000</maxDocs>
<maxTime>60000</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
8,000 docs / sec
to 18 shards
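The “hash on docID” routing step can be illustrated with a simplified router. Real SolrCloud’s compositeId router maps a MurmurHash3 value into per-shard hash ranges; the crc32-mod stand-in below only shows the property that matters here – every node computes the same shard for a given id, with no central coordinator:

```python
import zlib

NUM_SHARDS = 18   # matches the deck's production cluster

def route(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Stand-in hash: crc32 mod shard count. Solr actually maps a
    # MurmurHash3 value into contiguous per-shard hash ranges, but the
    # effect is the same: routing is deterministic and uniform-ish.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

shard_a = route("tweet-12345")
shard_b = route("tweet-12345")
assert shard_a == shard_b   # any node routes the same doc to the same shard
print(shard_a)
```

Deterministic routing is also what makes real-time GET cheap: a lookup by id goes straight to the one shard that can own the document.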
Send query request to any node
Two-stage process
1. Query controller sends query to all
shards and merges results
One host per shard must be online
or queries fail
2. Query controller sends 2nd query to
all shards with documents in the
merged result set to get requested
fields
Solr client applications built for 3.x do
not need to change (our query code still
uses SolrJ 3.6)
Limitations
JOINs / Grouping need custom hashing
Distributed search
[Diagram – view of cluster state from Zk: same 2-shard cluster.
CloudSolrServer (1) gets the URLs of all live nodes from Zookeeper (or
just a load balancer works too) and (2) sends q=*:* to any node, which
acts as the query controller; the controller (3) scatters the query to
one replica per shard and merges the results, then (4) sends a second
request to get fields for the merged documents.]
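The two-stage process can be sketched with toy in-memory shards (real Solr scatters these requests over HTTP and merges by score and sort criteria; this is only the shape of the algorithm):

```python
def distributed_search(shards, matches, top_k=10):
    """Toy two-stage scatter/gather.
    shards: list of dicts, doc id -> {"score": float, "fields": {...}}.
    Stage 1: each shard returns (id, score); the controller merges top-k.
    Stage 2: the controller fetches stored fields only for the winners."""
    # Stage 1: gather ids + scores from every shard
    hits = []
    for shard_no, shard in enumerate(shards):
        for doc_id, doc in shard.items():
            if matches(doc):
                hits.append((doc["score"], doc_id, shard_no))
    hits.sort(reverse=True)
    winners = hits[:top_k]
    # Stage 2: fetch requested fields only for merged winners
    return [dict(id=doc_id, **shards[shard_no][doc_id]["fields"])
            for _, doc_id, shard_no in winners]

shards = [
    {"a": {"score": 0.9, "fields": {"title": "doc a"}}},
    {"b": {"score": 0.7, "fields": {"title": "doc b"}}},
]
print(distributed_search(shards, matches=lambda d: True, top_k=1))
```

The stage-1/stage-2 split is why the second request exists at all: shipping full stored fields for every candidate from every shard would waste bandwidth, since most candidates don’t survive the merge.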
Search by daily activity volume
Drive analysis
that measures
the impact of
a social message
over time ...
Company posts
a tweet on Monday,
how much activity
around that message
on Thursday?
Problem: Find all documents that had activity on a specific day
• tweets that had retweets or YouTube videos that had comments
• Use Solr join support to find parent documents by matching on child criteria
fq=_val_:"{!join from=echo_grouping_id_s to=id}day_tdt:[2013-05-01T00:00:00Z
TO 2013-05-02T00:00:00Z}" ...
... But, joins don’t work in distributed queries and are probably too slow anyway
Solution: Index daily activity into multi-valued fields. Use real-time GET to lookup
document by ID to get the current daily volume fields
fq=daily_volume_tdtm('2013-05-02')
sort=daily_vol(daily_volume_s,'2013-04-01','2013-05-01')+desc
daily_volume_tdtm: [2013-05-01, 2013-05-02] <= doc has child signals on May 1 and 2
daily_volume_ssm: 2013-05-01|99, 2013-05-02|88 <= stored only field, doc had 99 child signals on May 1, 88 on May 2
daily_volume_s: 13050288|13050199 <= flattened multi-valued field for sorting using a custom ValueSource
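The flattened `daily_volume_s` encoding can be reconstructed from the example values above. This sketch assumes each entry packs a two-digit-year date (yymmdd) followed by a zero-padded count, pipe-joined newest-day-first; the real padding and ordering rules live in the talk’s custom ValueSource, so treat this as a hypothetical reading of the slide:

```python
from datetime import date

def flatten_daily_volume(day_counts):
    """Hypothetical reconstruction of the flattened daily_volume_s field:
    pack yymmdd + zero-padded count per day, pipe-joined, newest first,
    so a custom ValueSource can sort docs by recent activity."""
    parts = sorted(
        (f"{d:%y%m%d}{n:02d}" for d, n in day_counts.items()),
        reverse=True,   # newest day first, matching the slide's example
    )
    return "|".join(parts)

# 99 child signals on May 1, 88 on May 2 (the slide's example doc)
print(flatten_daily_volume({date(2013, 5, 1): 99, date(2013, 5, 2): 88}))
# -> 13050288|13050199, matching daily_volume_s above
```

Packing date and count into one sortable token is the trick that lets a single stored/indexed string field stand in for a per-day join, which is exactly what the distributed-join limitation forces here.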
Atomic updates and real-time get
Will it work? Definitely!
Search can be addictive for your organization: the queries we
tested 6 months ago vs. what we have today are vastly
different
Buy RAM – OOMs and aggressive garbage collection
cause many issues
Give the RAM from above to the OS – MMapDirectory
Need a disaster recovery process in addition to Solr cloud
replication; helps with migrating to new hardware too
Use Jetty ;-)
Store all fields! Atomic updates are a life saver
Lessons learned
Schema will evolve – we thought we understood our data model but have since
added at least 10 new fields and deprecated some too
Partition if you can! e.g. 30-day collection
We don't optimize – segment merging works great
Size your staging environment so that shards have about as many docs and same
resources as prod. I have many more nodes in prod but my staging servers have
roughly the same number of docs per shard, just fewer shards.
Don’t be afraid to customize Solr! It’s designed to be customized with plug-ins
• ValueSource is very powerful
• Check out PostFilters:
{!frange l=1 u=1 cost=200 cache=false}imca(53313,employee)
Lessons learned cont.
• Backups
.../replication?command=backup&location=/mnt/backups
• Monitoring
Replicas serving queries?
All replicas report same number of docs?
Zookeeper health
New searcher warm-up time
• Configuration update process
Our solrconfig.xml changes frequently – see Solr’s zkCli.sh
• Upgrade Solr process (it’s moving fast right now)
• Recover failed replica process
• Add new replica
• Kill the JVM on OOM (from Mark Miller)
-XX:OnOutOfMemoryError=/home/solr/on_oom.sh
-XX:+HeapDumpOnOutOfMemoryError
Minimum DevOps Reqts
Nodes will crash! (ephemeral znodes)
Or, sometimes you just need to restart a
JVM (rolling restarts to upgrade)
Peer sync via update log (tlog) if the
replica missed 100 updates or fewer, else ...
Good ol’ Solr replication from leader to
replica
Node recovery
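The recovery decision above (peer sync for small gaps, full replication otherwise) can be sketched as a toy; the 100-update threshold matches the slide, while the tlog-as-list representation is only illustrative:

```python
PEER_SYNC_LIMIT = 100  # Solr peer-syncs from the tlog only for small gaps

def recover(replica_version, leader_tlog):
    """Toy recovery decision. leader_tlog: list of (version, doc) updates.
    If the returning replica missed <= PEER_SYNC_LIMIT updates, replay
    just those from the update log; otherwise fall back to copying the
    whole index from the leader (old-style Solr replication)."""
    missed = [(v, d) for v, d in leader_tlog if v > replica_version]
    if len(missed) <= PEER_SYNC_LIMIT:
        return ("peer_sync", missed)       # replay only the missed updates
    return ("full_replication", None)      # copy the whole index from leader

tlog = [(v, f"doc{v}") for v in range(1, 201)]      # leader saw versions 1..200
mode, _ = recover(replica_version=150, leader_tlog=tlog)  # missed 50 updates
print(mode)   # peer_sync
mode2, _ = recover(replica_version=0, leader_tlog=tlog)   # missed all 200
print(mode2)  # full_replication
```

This is why short restarts (e.g. rolling upgrades) are cheap – the replica usually comes back within the tlog window – while a long outage triggers a full index copy.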
• Moving to a near real-time streaming model using Storm
• Buying more RAM per node
• Looking forward to shard splitting as it has
become difficult to re-index 600M docs
• Re-building the index with DocValues
• We've had shards get out of sync after a major failure –
resolved it by going back to raw data and doing a key by key
comparison of what we expected to be in the index and re-indexing
any missing docs.
• Custom hashing to put all docs for a specific brand in the same
shard
Roadmap / Futures
If you find yourself in this
situation, buy more RAM!
Obligatory lolcats slide
CONTACT
Timothy Potter
thelabdude@gmail.com
twitter: @thelabdude
