SlideShare a Scribd company logo
1 of 20
Download to read offline
Solr Power FTW
   Robby Morgan
What Will I Cover?

● Who I am

● Why/how we use SOLR

● Can SOLR handle 20K queries per second?

● Lessons learned: large scale multi data center deployment

● Conclusion
Robby Morgan

● Software Engineering Lead

● 3 yrs experience w/
  Solr @ Bazaarvoice

● Jack of all trades, master of some :)
Bazaarvoice

● Bazaarvoice is a software as a service
  company powering User Generated Content
  such as ratings and reviews
  on thousands of web sites


● 5 billion page views per month

● 230 billion impressions

● 75 million UGC
SOLR Case Study
SOLR Case Study
Life Before SOLR

● Indexes for sorting and filtering

● Aggregate tables for stats

● Nightly jobs

● Bugs...
Enter SOLR

● Index content and product catalog
● De-normalization
● Filtering, faceting/stats and sorting
● Index every 15 minutes (20 seconds NRT)
SOLR usage @ Bazaarvoice

   Documents         250 MM

   Index size        200 GB

   QPS, avg          2,350

   QPS, max          10,200

   Response time, avg 12 ms

   Servers           6+20
SOLR Cloud @ Bazaarvoice

● Multiple cores (100+ per server)

● Re-balance indexes across cores and servers
   ○ Automatic
   ○ Manual

● Deployment map stored in MySQL
   ○ Host - Core - Partition
   ○ Statistics

● Partition lifecycle
Scaling reads - Replication
Replication - Multiple Data Centers
Lessons: SOLR Performance

● SOLR loves RAM!
                                     SOLR

● Simulate and measure
   ○ Same config, same hardware

● Get the most out of one instance
Lessons: Cross-DC Replication

Chatty if using multiple cores

Relay
 ● Core auto-warming disabled
 ● Connection wait and read timeouts increased
 ● Replication poll interval increased (15 min)
 ● Compression enabled
...
<str name="httpConnTimeout">20000</str>
<str name="httpReadTimeout">65000</str>
<str name="pollInterval">00:15:00</str>
<str name="compression">internal</str>
...
Performance Tuning

 ● Heap size
 ● Cache sizing
 ● Auto-warming
 ● Stored fields
 ● Merge factor
 ● Commit frequency
 ● Optimize frequency
Process: Simulate and measure
 ● Replay logs
 ● Analyze metrics
 ● Monitor GC
Performance Tuning - GC

# Java memory usage settings
# Force the NewSize to be larger than the JVM typically allocates.
# In practice, the JVM has been allocating an extremely small Young generation
which objects to be prematurely promoted to the Tenured generation
JAVA_MEM_OPTS="-Xms27g -Xmx27g -XX:NewRatio=8"

# -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps --> Turn on GC Logging
# -XX:+UseConcMarkSweepGC        --> Use the concurrent collector
# -XX:+CMSIncrementalMode        --> Incremental mode for the concurrent collector
# -XX:+CMSIncrementalPacing      --> Let the JVM adjust the amount of incremental
collection

JAVA_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:
+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=55 -XX:
ParallelGCThreads=8 -XX:SurvivorRatio=4"
Lessons: Schema Changes

● Re-indexing is time consuming for large indexes

● Process:

  1. Full re-index off-line prior to the release
  2. Incremental indexing after the release

● Bottleneck: reading from MySQL

● Goal: Transparent on-line re-indexing
Conclusion - SOLR Strengths

● Not only full-text search

● Lightning fast given enough RAM

● Good scale out support
  including multi-data center

● Great community, wide range of use cases
Conclusion - SOLR's Gaps

● Not fully elastic

● Real time takes work

● Secondary data store = sync overhead, inconsistencies

● Schema changes
Questions




            robby.morgan@
            bazaarvoice.com

        developer.bazaarvoice.com

More Related Content

What's hot

Xldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseXldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouse
liqiang xu
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-
Boyang Niu
 

What's hot (20)

Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubole
 
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
 
MongoDB SF Ruby
MongoDB SF RubyMongoDB SF Ruby
MongoDB SF Ruby
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
 
10 EZ Steps to SOLR Domination - Berlin Buzzwords 2012
10 EZ Steps to SOLR Domination - Berlin Buzzwords 201210 EZ Steps to SOLR Domination - Berlin Buzzwords 2012
10 EZ Steps to SOLR Domination - Berlin Buzzwords 2012
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
 
Xldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouseXldb2011 tue 1120_youtube_datawarehouse
Xldb2011 tue 1120_youtube_datawarehouse
 
Aerospike - fast and furious caching @ Burgasconf 2016
Aerospike - fast and furious caching @ Burgasconf 2016Aerospike - fast and furious caching @ Burgasconf 2016
Aerospike - fast and furious caching @ Burgasconf 2016
 
Monitoring with Clickhouse
Monitoring with ClickhouseMonitoring with Clickhouse
Monitoring with Clickhouse
 
Hadoop con2016 - Implement Real-time Centralized logging System by Elastic Stack
Hadoop con2016 - Implement Real-time Centralized logging System by Elastic StackHadoop con2016 - Implement Real-time Centralized logging System by Elastic Stack
Hadoop con2016 - Implement Real-time Centralized logging System by Elastic Stack
 
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
MongoDB vs Scylla: Production Experience from Both Dev & Ops Standpoint at Nu...
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
 
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
Cost Effective Presto on AWS with Spot Nodes - Strata SF 2019
 
Amazon Web Services lection 4
Amazon Web Services lection 4  Amazon Web Services lection 4
Amazon Web Services lection 4
 
Scaling Up with PHP and AWS
Scaling Up with PHP and AWSScaling Up with PHP and AWS
Scaling Up with PHP and AWS
 
2013 DATA @ NFLX (Tableau User Group)
2013 DATA @ NFLX (Tableau User Group)2013 DATA @ NFLX (Tableau User Group)
2013 DATA @ NFLX (Tableau User Group)
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
 
Open source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applicationsOpen source big data landscape and possible ITS applications
Open source big data landscape and possible ITS applications
 
Scylla Summit 2018: Cassandra and ScyllaDB at Yahoo! Japan
Scylla Summit 2018: Cassandra and ScyllaDB at Yahoo! JapanScylla Summit 2018: Cassandra and ScyllaDB at Yahoo! Japan
Scylla Summit 2018: Cassandra and ScyllaDB at Yahoo! Japan
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-
 

Viewers also liked

Mining the Web for Information using Hadoop
Mining the Web for Information using HadoopMining the Web for Information using Hadoop
Mining the Web for Information using Hadoop
Steve Watt
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
Erik Hatcher
 

Viewers also liked (7)

Mining the Web for Information using Hadoop
Mining the Web for Information using HadoopMining the Web for Information using Hadoop
Mining the Web for Information using Hadoop
 
Living with garbage
Living with garbageLiving with garbage
Living with garbage
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
Solr Black Belt Pre-conference
Solr Black Belt Pre-conferenceSolr Black Belt Pre-conference
Solr Black Belt Pre-conference
 
Deep Dive - DynamoDB
Deep Dive - DynamoDBDeep Dive - DynamoDB
Deep Dive - DynamoDB
 
Scaling Solr with Solr Cloud
Scaling Solr with Solr CloudScaling Solr with Solr Cloud
Scaling Solr with Solr Cloud
 
Amazon DynamoDB Design Patterns for Ultra-High Performance Apps (DAT304) | AW...
Amazon DynamoDB Design Patterns for Ultra-High Performance Apps (DAT304) | AW...Amazon DynamoDB Design Patterns for Ultra-High Performance Apps (DAT304) | AW...
Amazon DynamoDB Design Patterns for Ultra-High Performance Apps (DAT304) | AW...
 

Similar to SOLR Power FTW: short version

JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdf
RichHagarty
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
Tier1 App
 

Similar to SOLR Power FTW: short version (20)

Solr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World OverSolr Power FTW: Powering NoSQL the World Over
Solr Power FTW: Powering NoSQL the World Over
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance Optimization
 
Tweaking performance on high-load projects
Tweaking performance on high-load projectsTweaking performance on high-load projects
Tweaking performance on high-load projects
 
Aggregations at Scale for ShareChat —Using Kafka Streams and ScyllaDB
Aggregations at Scale for ShareChat —Using Kafka Streams and ScyllaDBAggregations at Scale for ShareChat —Using Kafka Streams and ScyllaDB
Aggregations at Scale for ShareChat —Using Kafka Streams and ScyllaDB
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge
 
State of GeoServer at FOSS4G-NA
State of GeoServer at FOSS4G-NAState of GeoServer at FOSS4G-NA
State of GeoServer at FOSS4G-NA
 
JITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdfJITServerTalk-OSS-2023.pdf
JITServerTalk-OSS-2023.pdf
 
Am I reading GC logs Correctly?
Am I reading GC logs Correctly?Am I reading GC logs Correctly?
Am I reading GC logs Correctly?
 
Evolution of DBA in the Cloud Era
 Evolution of DBA in the Cloud Era Evolution of DBA in the Cloud Era
Evolution of DBA in the Cloud Era
 
ScaleDB Technical Presentation
ScaleDB Technical PresentationScaleDB Technical Presentation
ScaleDB Technical Presentation
 
Tweaking perfomance on high-load projects_Думанский Дмитрий
Tweaking perfomance on high-load projects_Думанский ДмитрийTweaking perfomance on high-load projects_Думанский Дмитрий
Tweaking perfomance on high-load projects_Думанский Дмитрий
 
Cold fusion is racecar fast
Cold fusion is racecar fastCold fusion is racecar fast
Cold fusion is racecar fast
 
The State of the GeoServer project
The State of the GeoServer projectThe State of the GeoServer project
The State of the GeoServer project
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Become a GC Hero
Become a GC HeroBecome a GC Hero
Become a GC Hero
 
Performance tuning jvm
Performance tuning jvmPerformance tuning jvm
Performance tuning jvm
 
OpenDS_Jazoon2010
OpenDS_Jazoon2010OpenDS_Jazoon2010
OpenDS_Jazoon2010
 
Become a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day DevopsBecome a Java GC Hero - All Day Devops
Become a Java GC Hero - All Day Devops
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

SOLR Power FTW: short version

  • 1. Solr Power FTW Robby Morgan
  • 2. What Will I Cover? ● Who I am ● Why/how we use SOLR ● Can SOLR handle 20K queries per second? ● Lessons learned: large scale multi data center deployment ● Conclusion
  • 3. Robby Morgan ● Software Engineering Lead ● 3 yrs experience w/ Solr @ Bazaarvoice ● Jack of all trades, master of some :)
  • 4. Bazaarvoice ● Bazaarvoice is a software as a service company powering User Generated Content such as ratings and reviews on thousands of web sites ● 5 billion page views per month ● 230 billion impressions ● 75 million UGC
  • 7. Life Before SOLR ● Indexes for sorting and filtering ● Aggregate tables for stats ● Nightly jobs ● Bugs...
  • 8. Enter SOLR ● Index content and product catalog ● De-normalization ● Filtering, faceting/stats and sorting ● Index every 15 minutes (20 seconds NRT)
  • 9. SOLR usage @ Bazaarvoice Documents 250 MM Index size 200 GB QPS, avg 2,350 QPS, max 10,200 Response time, avg 12 ms Servers 6+20
  • 10. SOLR Cloud @ Bazaarvoice ● Multiple cores (100+ per server) ● Re-balance indexes across cores and servers ○ Automatic ○ Manual ● Deployment map stored in MySQL ○ Host - Core - Partition ○ Statistics ● Partition lifecycle
  • 11. Scaling reads - Replication
  • 12. Replication - Multiple Data Centers
  • 13. Lessons: SOLR Performance ● SOLR loves RAM! SOLR ● Simulate and measure ○ Same config, same hardware ● Get the most out of one instance
  • 14. Lessons: Cross-DC Replication Chatty if using multiple cores Relay ● Core auto-warming disabled ● Connection wait and read timeouts increased ● Replication poll interval increased (15 min) ● Compression enabled ... <str name="httpConnTimeout">20000</str> <str name="httpReadTimeout">65000</str> <str name="pollInterval">00:15:00</str> <str name="compression">internal</str> ...
  • 15. Performance Tuning ● Heap size ● Cache sizing ● Auto-warming ● Stored fields ● Merge factor ● Commit frequency ● Optimize frequency Process: Simulate and measure ● Replay logs ● Analyze metrics ● Monitor GC
  • 16. Performance Tuning - GC # Java memory usage settings # Force the NewSize to be larger than the JVM typically allocates. # In practice, the JVM has been allocating an extremely small Young generation which objects to be prematurely promoted to the Tenured generation JAVA_MEM_OPTS="-Xms27g -Xmx27g -XX:NewRatio=8" # -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps --> Turn on GC Logging # -XX:+UseConcMarkSweepGC --> Use the concurrent collector # -XX:+CMSIncrementalMode --> Incremental mode for the concurrent collector # -XX:+CMSIncrementalPacing --> Let the JVM adjust the amount of incremental collection JAVA_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX: +UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=55 -XX: ParallelGCThreads=8 -XX:SurvivorRatio=4"
  • 17. Lessons: Schema Changes ● Re-indexing is time consuming for large indexes ● Process: 1. Full re-index off-line prior to the release 2. Incremental indexing after the release ● Bottleneck: reading from MySQL ● Goal: Transparent on-line re-indexing
  • 18. Conclusion - SOLR Strengths ● Not only full-text search ● Lightning fast given enough RAM ● Good scale out support including multi-data center ● Great community, wide range of use cases
  • 19. Conclusion - SOLR's Gaps ● Not fully elastic ● Real time takes work ● Secondary data store = sync overhead, inconsistencies ● Schema changes
  • 20. Questions robby.morgan@ bazaarvoice.com developer.bazaarvoice.com