SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Cross Datacenter Replication in Apache Solr 6
Shalin Shekhar Mangar
Lucidworks Inc.
@shalinmangar
The standard
for enterprise
search.
of Fortune 500
uses Solr.
90%
Agenda
• Review a typical Solr deployment architecture
• Challenges of running a Solr deployment across data centers
• Cross Data Centre Replication (CDCR) in Solr
• Setup and configuration
• Limitations
• Alternative strategies
• Future work
Client ClientClient
Solr
Zookeeper
Datacenter
CDCR Anti-patterns - Remote Solr instances
C
Solr
Zookeeper
DC 1
C C
DC 2
C C C
CDCR Anti-patterns - Remote ZK and Solr
C
Solr
Zookeeper
DC 1
C C
DC 2
C C C
CDCR Anti-patterns - Remote ZK and Solr
C
Solr
Zookeeper
DC 1
C C
DC 2
C C C
DC 3
Why not a single Solr Cloud?
• Same update is transferred to each replica
• Synchronous indexing means burst-indexing is constrained by cross
DC bandwidth
• Increased latency for indexing operations
• Need a ZooKeeper node in a 3rd DC to break ties
• Search requests are not DC-aware, may choose a remote replica
Cross Datacenter Replication in Solr
• Let’s call it CDCR for short
• Accommodate two or more data centres
• Active/passive setup for disaster recovery
• Support limited bandwidth links
• Eventually consistent passive cluster
Source: http://yonik.com/solr-cross-data-center-replication/
CDCR in Solr 6
• Scalable: no SPoF and/or bottleneck
• Peer cluster can have a different replication factor
• Asynchronous updates; no penalty for indexing
• Push operations for low latency replication
• Low overhead — uses existing transaction logs and indexes
• Leader-to-leader communication ensures update is sent only once
to peer cluster
Target Cluster
Tune replication
Synchronize logs
CdcrUpdateLog
Enable APIs
Update chains
Update chains
Update log
CDCR APIs
• http://host:port/solr/collection_name/cdcr?action=START
• Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS
• Monitoring APIs: QUEUES, OPS, ERRORS
How to failover?
• Change configuration on target to make it the source
• Point indexers to the new target
• Change configuration on source to make it the new target
• May require stopping indexing during the conversion process —
especially if you want to revert the change
CDCR support in Solr 6+
• Active/passive setup either for disaster recovery or for low latency
querying
• Solr clusters with existing data can be converted to a source cluster
from Solr 6.2 onwards
• Low to medium indexing traffic
CDCR Limitations and gotchas
• By default CDCR is disabled — invoke START to enable on both
source and target
• Soft commits are not replicated to target — must schedule
autoSoftCommit explicitly on target
• Different set of configurations required on source and target
• Daisy-chaining is possible but not well tested — add all targets to
the same source cluster
CDCR Limitations and gotchas
• Not suitable for applications requiring high throughput indexing —
some knobs exist for tuning replication speeds
• Update log buffers can grow indefinitely when target clusters are
down — can work around by disabling buffering for the time being
if there is only one target
• No automatic failover between source and target — explicit actions
required to modify configurations and point indexing pipelines to
the new source
• No Active/active setup
Alternative strategy
• Use a proper queue such as Apache Kafka to feed source and target DCs
simultaneously
• Use external versions in conjunction with versions generated by Solr —
DocBasedVersionConstraintsProcessorFactory
• Watch the video for “Solr Cross-Datacenter Replication and Consistency at Scale”
by Oliver Bates, Apple Inc. — http://sched.co/8ArU
• Pros: Supports high indexing throughputs and active/active replication
• Cons: Additional systems required, managing consistency is difficult and requires in
depth Solr expertise, all atomic updates must go to a single DC, cannot support
delete-by-query
Problems we solved
• Synchronous indexing to replicas — build separate asynchronous
indexing pipeline
• Limited size of the update log — use update log as the queue
• How to track replication progress to preserve consistency on target
clusters in case the source leader dies — checkpoints
• Bootstrapping target cluster with indexes when update logs are
incomplete
• New replicas on source have no logs to replicate — replicate
update logs during recovery
Future work
• Move configuration out of solrconfig.xml and into API calls
• Dynamically add/remove/change target cluster information
• Cap update log to a max size and fall back to index replication if
necessary
• Refactor and combine CdcrUpdateLog
• Better monitoring: capture transfer rate and latency info
• Add support for rate limiting replication between source and target
• Active/active?
Resources
• CDCR page on ref guide — https://cwiki.apache.org/confluence/
pages/viewpage.action?pageId=62687462
• http://yonik.com/solr-cross-data-center-replication/
• https://cwiki.apache.org/confluence/display/solr/
Updating+Parts+of+Documents
Thank you!
shalin@apache.org

Más contenido relacionado

La actualidad más candente

SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and Testing
Mark Miller
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
thelabdude
 

La actualidad más candente (20)

Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4Scaling Through Partitioning and Shard Splitting in Solr 4
Scaling Through Partitioning and Shard Splitting in Solr 4
 
GIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big DataGIDS2014: SolrCloud: Searching Big Data
GIDS2014: SolrCloud: Searching Big Data
 
Solr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloudSolr Exchange: Introduction to SolrCloud
Solr Exchange: Introduction to SolrCloud
 
Best practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloudBest practices for highly available and large scale SolrCloud
Best practices for highly available and large scale SolrCloud
 
Solrcloud Leader Election
Solrcloud Leader ElectionSolrcloud Leader Election
Solrcloud Leader Election
 
Call me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networksCall me maybe: Jepsen and flaky networks
Call me maybe: Jepsen and flaky networks
 
Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)Solr cluster with SolrCloud at lucenerevolution (tutorial)
Solr cluster with SolrCloud at lucenerevolution (tutorial)
 
SolrCloud Failover and Testing
SolrCloud Failover and TestingSolrCloud Failover and Testing
SolrCloud Failover and Testing
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
Scaling SolrCloud to a Large Number of Collections - Fifth Elephant 2014
 
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
Lucene Revolution 2013 - Scaling Solr Cloud for Large-scale Social Media Anal...
 
Solr 4: Run Solr in SolrCloud Mode on your local file system.
Solr 4: Run Solr in SolrCloud Mode on your local file system.Solr 4: Run Solr in SolrCloud Mode on your local file system.
Solr 4: Run Solr in SolrCloud Mode on your local file system.
 
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ah...
 
NYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / SolrNYC Lucene/Solr Meetup: Spark / Solr
NYC Lucene/Solr Meetup: Spark / Solr
 
First oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoyFirst oslo solr community meetup lightning talk janhoy
First oslo solr community meetup lightning talk janhoy
 
The First Class Integration of Solr with Hadoop
The First Class Integration of Solr with HadoopThe First Class Integration of Solr with Hadoop
The First Class Integration of Solr with Hadoop
 
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
Cross Data Center Replication for the Enterprise: Presented by Adam Williams,...
 
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBMBuilding and Running Solr-as-a-Service: Presented by Shai Erera, IBM
Building and Running Solr-as-a-Service: Presented by Shai Erera, IBM
 
How SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded EnvironmentHow SolrCloud Changes the User Experience In a Sharded Environment
How SolrCloud Changes the User Experience In a Sharded Environment
 
Search-time Parallelism: Presented by Shikhar Bhushan, Etsy
Search-time Parallelism: Presented by Shikhar Bhushan, EtsySearch-time Parallelism: Presented by Shikhar Bhushan, Etsy
Search-time Parallelism: Presented by Shikhar Bhushan, Etsy
 

Similar a Cross Datacenter Replication in Apache Solr 6

Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
Lucidworks
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Lucidworks (Archived)
 
Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...
Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...
Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...
Lucidworks
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
James Chen
 

Similar a Cross Datacenter Replication in Apache Solr 6 (20)

Cdcr apachecon-talk
Cdcr apachecon-talkCdcr apachecon-talk
Cdcr apachecon-talk
 
Failover-Apachecon-Asia-2022.pptx
Failover-Apachecon-Asia-2022.pptxFailover-Apachecon-Asia-2022.pptx
Failover-Apachecon-Asia-2022.pptx
 
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
Scaling SolrCloud to a Large Number of Collections: Presented by Shalin Shekh...
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco(Re)Indexing Large Repositories in Alfresco
(Re)Indexing Large Repositories in Alfresco
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
SolrCloud Cluster management via APIs
SolrCloud Cluster management via APIsSolrCloud Cluster management via APIs
SolrCloud Cluster management via APIs
 
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.Updated Power of the AWR Warehouse, Dallas, HQ, etc.
Updated Power of the AWR Warehouse, Dallas, HQ, etc.
 
Scaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of CollectionsScaling SolrCloud to a large number of Collections
Scaling SolrCloud to a large number of Collections
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...  Automated Cluster Management and Recovery  for Large Scale Multi-Tenant Sea...
Automated Cluster Management and Recovery for Large Scale Multi-Tenant Sea...
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...
Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...
Cross Data Center Replication Options - A Practical Guide to CDCR - Patrick H...
 
Autoscaling Solr - Shalin Shekhar Mangar, Lucidworks
Autoscaling Solr - Shalin Shekhar Mangar, LucidworksAutoscaling Solr - Shalin Shekhar Mangar, Lucidworks
Autoscaling Solr - Shalin Shekhar Mangar, Lucidworks
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 
01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
 
Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)Capacity Management/Provisioning (Cloud's full, Can't build here)
Capacity Management/Provisioning (Cloud's full, Can't build here)
 

Más de Shalin Shekhar Mangar

Más de Shalin Shekhar Mangar (7)

Solr BoF (Birds of a Feather) session at Fifth Elephant 2018
Solr BoF (Birds of a Feather) session at Fifth Elephant 2018Solr BoF (Birds of a Feather) session at Fifth Elephant 2018
Solr BoF (Birds of a Feather) session at Fifth Elephant 2018
 
Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6Parallel SQL and Streaming Expressions in Apache Solr 6
Parallel SQL and Streaming Expressions in Apache Solr 6
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
Inside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene MeetupInside Solr 5 - Bangalore Solr/Lucene Meetup
Inside Solr 5 - Bangalore Solr/Lucene Meetup
 
Introduction to Apache Solr
Introduction to Apache SolrIntroduction to Apache Solr
Introduction to Apache Solr
 
SolrCloud and Shard Splitting
SolrCloud and Shard SplittingSolrCloud and Shard Splitting
SolrCloud and Shard Splitting
 
Get involved with the Apache Software Foundation
Get involved with the Apache Software FoundationGet involved with the Apache Software Foundation
Get involved with the Apache Software Foundation
 

Último

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 

Último (20)

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 

Cross Datacenter Replication in Apache Solr 6

  • 1.
  • 2. Cross Datacenter Replication in Apache Solr 6 Shalin Shekhar Mangar Lucidworks Inc. @shalinmangar
  • 3. The standard for enterprise search. of Fortune 500 uses Solr. 90%
  • 4. Agenda • Review a typical Solr deployment architecture • Challenges of running a Solr deployment across data centers • Cross Data Centre Replication (CDCR) in Solr • Setup and configuration • Limitations • Alternative strategies • Future work
  • 6. CDCR Anti-patterns - Remote Solr instances C Solr Zookeeper DC 1 C C DC 2 C C C
  • 7. CDCR Anti-patterns - Remote ZK and Solr C Solr Zookeeper DC 1 C C DC 2 C C C
  • 8. CDCR Anti-patterns - Remote ZK and Solr C Solr Zookeeper DC 1 C C DC 2 C C C DC 3
  • 9. Why not a single Solr Cloud? • Same update is transferred to each replica • Synchronous indexing means burst-indexing is constrained by cross DC bandwidth • Increased latency for indexing operations • Need a ZooKeeper node in a 3rd DC to break ties • Search requests are not DC-aware, may choose a remote replica
  • 10. Cross Datacenter Replication in Solr • Let’s call it CDCR for short • Accommodate two or more data centres • Active/passive setup for disaster recovery • Support limited bandwidth links • Eventually consistent passive cluster
  • 12. CDCR in Solr 6 • Scalable: no SPoF and/or bottleneck • Peer cluster can have a different replication factor • Asynchronous updates; no penalty for indexing • Push operations for low latency replication • Low overhead — uses existing transaction logs and indexes • Leader-to-leader communication ensures update is sent only once to peer cluster
  • 14. Enable APIs Update chains Update chains Update log
  • 15. CDCR APIs • http://host:port/solr/collection_name/cdcr?action=START • Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS • Monitoring APIs: QUEUES, OPS, ERRORS
  • 16. How to failover? • Change configuration on target to make it the source • Point indexers to the new target • Change configuration on source to make it the new target • May require stopping indexing during the conversion process — especially if you want to revert the change
  • 17. CDCR support in Solr 6+ • Active/passive setup either for disaster recovery or for low latency querying • Solr clusters with existing data can be converted to a source cluster from Solr 6.2 onwards • Low to medium indexing traffic
  • 18. CDCR Limitations and gotchas • By default CDCR is disabled — invoke START to enable on both source and target • Soft commits are not replicated to target — must schedule autoSoftCommit explicitly on target • Different set of configurations required on source and target • Daisy-chaining is possible but not well tested — add all targets to the same source cluster
  • 19. CDCR Limitations and gotchas • Not suitable for applications requiring high throughput indexing — some knobs exist for tuning replication speeds • Update log buffers can grow indefinitely when target clusters are down — can work around by disabling buffering for the time being if there is only one target • No automatic failover between source and target — explicit actions required to modify configurations and point indexing pipelines to the new source • No Active/active setup
  • 20. Alternative strategy • Use a proper queue such as Apache Kafka to feed source and target DCs simultaneously • Use external versions in conjunction with versions generated by Solr — DocBasedVersionConstraintsProcessorFactory • Watch the video for “Solr Cross-Datacenter Replication and Consistency at Scale” by Oliver Bates, Apple Inc. — http://sched.co/8ArU • Pros: Supports high indexing throughputs and active/active replication • Cons: Additional systems required, managing consistency is difficult and requires in depth Solr expertise, all atomic updates must go to a single DC, cannot support delete-by-query
  • 21. Problems we solved • Synchronous indexing to replicas — build separate asynchronous indexing pipeline • Limited size of the update log — use update log as the queue • How to track replication progress to preserve consistency on target clusters in case the source leader dies — checkpoints • Bootstrapping target cluster with indexes when update logs are incomplete • New replicas on source have no logs to replicate — replicate update logs during recovery
  • 22. Future work • Move configuration out of solrconfig.xml and into API calls • Dynamically add/remove/change target cluster information • Cap update log to a max size and fall back to index replication if necessary • Refactor and combine CdcrUpdateLog • Better monitoring: capture transfer rate and latency info • Add support for rate limiting replication between source and target • Active/active?
  • 23. Resources • CDCR page on ref guide — https://cwiki.apache.org/confluence/ pages/viewpage.action?pageId=62687462 • http://yonik.com/solr-cross-data-center-replication/ • https://cwiki.apache.org/confluence/display/solr/ Updating+Parts+of+Documents