Distributed RDBMS
Challenges, Solutions & Trade-offs
by Ahmed Magdy
Blog: ahmed.a1cv.com
DBMS Rank by Popularity
• http://db-engines.com/en/ranking
Agenda
• Distributed RDBMS: What & Why
• Fallacies of Distributed Computing
• ACID Challenges
• Two Phase Commit Algorithm (2PC)
• Distributed Relational Challenges
• How to Distribute
• Replication
• Sharding
• Trade-offs
• Latency vs Throughput
• Consistency vs Availability
• Consistency vs Latency
• Some Distributed RDBMS Providers
Before we start!
If I had an hour to solve a problem
I'd spend 55 minutes thinking
about the problem and 5 minutes
thinking about solutions.
Distributed RDBMS: What & Why
What?
• Relational DBMS installed on multiple machines/VMs sharing the same database
serving common applications.
Why?
• Scalability (more throughput)
• Performance (geographically nearer servers)
• Availability (fail-over)
Fallacies of Distributed Computing
• The network is reliable.
• Latency is zero.
• Bandwidth is infinite.
• The network is secure.
• Topology doesn't change.
• There is one administrator.
• Transport cost is zero.
• The network is homogeneous.
ACID Challenges
Atomicity
• Either all operations occur or none do
• Preventing partial updates
Consistency
• Transactions do not violate data integrity:
• Entity integrity
• Referential integrity
• Domain integrity
• User-defined
• Sequential consistency (Serializability)
• Eventual Consistency
Isolation
Isolation level     Dirty reads   Non-repeatable reads   Phantoms
Read Uncommitted    may occur     may occur              may occur
Read Committed      -             may occur              may occur
Repeatable Read     -             -                      may occur
Serializable        -             -                      -
Read phenomena:
• Dirty reads: read uncommitted writes
• Non-repeatable reads: the data changes between reads
• Phantom reads: two identical queries return different sets of rows
Isolation Levels: Read Uncommitted, Read Committed,
Repeatable Read, Serializable
Durability
• Changes are persisted to disk before reporting
the transaction as committed
Approaches:
• Write representation to disk
• Write operations to transaction log
Two Phase Commit Algorithm (2PC)
1. Commit-request Phase (voting phase):
• Coordinator: “Hi Participants, do you agree to commit this transaction?”
• Participants: “Yes Sir” (a single “No” makes the coordinator roll back all participants)
2. Commit Phase:
• Coordinator: “Let’s do it guys!”
• Blocks the client until all participants commit or roll back
• Provides Atomicity & Consistency
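The two phases above can be sketched in a few lines. This is a minimal in-memory illustration, not a production protocol: the names Participant and twoPhaseCommit are hypothetical, and a real implementation would also need timeouts, durable logging and coordinator recovery.

```javascript
// Minimal in-memory sketch of two-phase commit (2PC).
// Participant and twoPhaseCommit are hypothetical names, not a library API.
class Participant {
  constructor(name, canCommit = true) {
    this.name = name;
    this.canCommit = canCommit; // simulates the vote this node will cast
    this.state = "idle";
  }
  // Phase 1: vote on whether the local part of the transaction can commit.
  prepare() {
    this.state = this.canCommit ? "prepared" : "aborted";
    return this.canCommit;
  }
  // Phase 2: apply the coordinator's global decision.
  commit() { this.state = "committed"; }
  rollback() { this.state = "rolled back"; }
}

function twoPhaseCommit(participants) {
  // Phase 1 (voting): ask every participant for its vote.
  const votes = participants.map((p) => p.prepare());
  const allYes = votes.every(Boolean);
  // Phase 2: commit only on a unanimous yes, otherwise roll back everywhere.
  for (const p of participants) {
    if (allYes) p.commit();
    else p.rollback();
  }
  return allYes ? "committed" : "rolled back";
}

const ok = twoPhaseCommit([new Participant("A"), new Participant("B")]);
const failed = twoPhaseCommit([new Participant("A"), new Participant("B", false)]);
console.log(ok, failed); // committed rolled back
```

The client only gets an answer after the loop in phase 2 has finished, which is the blocking behaviour the slide mentions.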
How to Distribute
• Replication
• Master-slave
• Multi-master
• Partitioning
• Horizontal (Sharding)
• Vertical (like one-to-one relationships)
• Functional (like in Microservice architecture)
Replication
Benefits:
• Higher Availability
• Load balancing
• Performance gains by replication to geographically nearer data centers
Master-Slave vs Multi-Master Replication
Master-Slave Replication:
• Master for writing, slaves for reading
• The master is a single point of failure for writes
• A slave can be promoted to master (manually or automatically) if the master is down
Multi-Master Replication:
• High fault tolerance
• Better load balancing of write and read operations
• Requires complex transactional conflict prevention
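As a rough illustration of the master-slave setup, the sketch below routes writes to the master, rotates reads over the slaves, and promotes a slave on failover. ReplicaRouter and the node names are made up for the example; real deployments usually rely on a proxy or driver feature for this.

```javascript
// Hypothetical router: writes go to the master, reads rotate over the slaves.
class ReplicaRouter {
  constructor(master, slaves) {
    this.master = master;
    this.slaves = slaves;
    this.next = 0; // index of the slave that serves the next read
  }
  routeWrite() {
    return this.master;
  }
  routeRead() {
    // Fall back to the master when no slave is available.
    if (this.slaves.length === 0) return this.master;
    const node = this.slaves[this.next];
    this.next = (this.next + 1) % this.slaves.length;
    return node;
  }
  // Failover: promote one slave to master when the old master is down.
  promote(slaveName) {
    const i = this.slaves.indexOf(slaveName);
    if (i === -1) throw new Error("unknown slave: " + slaveName);
    this.slaves.splice(i, 1);
    this.master = slaveName;
    this.next = 0;
  }
}

const router = new ReplicaRouter("db-master", ["db-slave-1", "db-slave-2"]);
const reads = [router.routeRead(), router.routeRead(), router.routeRead()];
router.promote("db-slave-1");
```

After the promotion, writes go to db-slave-1 and the remaining slave serves reads.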
Pessimistic vs Optimistic Replication
Pessimistic Replication:
• Eager / synchronous
• Higher latency (blocking)
• Better Consistency
• Conflicting transactions are detected before commit, so they can be rolled back.
Optimistic Replication:
• Lazy / asynchronous
• Lower latency
• Eventual Consistency
• Complex conflict resolution:
• Syntactic
• Semantic
Sharding
Strategies
• Lookup (routing table & virtual shards)
• Range (better for range queries)
• Hash (fewer hotspots for monotonic shard keys)
Best Practices:
• Ensure that shard keys are unique.
• Use stable data for the shard key.
• Keep shards balanced to handle similar volumes of I/O.
• Shard the data to support the most frequently performed queries
• Use parallel tasks if you need to access more than 1 shard
• Minimize operations that affect data in multiple shards
• Shards can be geo-located to reduce latency
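The difference between the range and hash strategies on this slide can be sketched as follows. hashShard and rangeShard are illustrative helpers, the shard boundaries are invented, and the string hash is a simple djb2-style function chosen for brevity rather than quality.

```javascript
// Hash strategy: a deterministic string hash (djb2 style), modulo the shard count.
function hashShard(key, shardCount) {
  let h = 5381;
  for (const ch of String(key)) {
    h = (h * 33 + ch.codePointAt(0)) >>> 0; // keep h an unsigned 32-bit integer
  }
  return h % shardCount;
}

// Range strategy: shard i owns the keys below upperBounds[i] (sorted ascending);
// keys at or above the last bound go to one extra, final shard.
function rangeShard(key, upperBounds) {
  const i = upperBounds.findIndex((bound) => key < bound);
  return i === -1 ? upperBounds.length : i;
}

// Monotonically increasing ids, such as an auto-increment primary key.
const ids = [1001, 1002, 1003, 1004];
const byRange = ids.map((id) => rangeShard(id, [1000, 2000, 3000]));
const byHash = ids.map((id) => hashShard(id, 4));
console.log(byRange); // [ 1, 1, 1, 1 ]
console.log(byHash);
```

With range sharding, all new ids land on the same shard, which keeps range queries local but creates a write hotspot. Hashing spreads them out at the cost of scattering range queries across shards.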
Latency vs Throughput
Latency = 1 min
Throughput = 1 car / min
Latency = 1 min
Throughput = 3 cars / min
Latency = 1.5 min
Throughput = 1 car every 1.5 min = 0.67 cars / min
Latency = 1.5 min
Throughput = 3 cars every 2.5 min = 1.2 cars / min
• Adding more nodes improves throughput, but latency deteriorates.
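The arithmetic behind the four car scenarios can be checked with a short script. The scenario objects simply restate the numbers from the slides; the field names are made up for the example.

```javascript
// Throughput is completed items per unit of time;
// latency is the time a single item takes from start to finish.
function throughput(items, minutes) {
  return items / minutes;
}

// The four scenarios from the slides: latency, and cars completed per interval.
const scenarios = [
  { latencyMin: 1, cars: 1, everyMin: 1 },
  { latencyMin: 1, cars: 3, everyMin: 1 },
  { latencyMin: 1.5, cars: 1, everyMin: 1.5 },
  { latencyMin: 1.5, cars: 3, everyMin: 2.5 },
];

// Round to two decimals for display.
const rates = scenarios.map((s) => Number(throughput(s.cars, s.everyMin).toFixed(2)));
console.log(rates); // [ 1, 3, 0.67, 1.2 ]
```

Comparing the first and last scenarios shows the trade-off: throughput rises from 1 to 1.2 cars per minute while latency grows from 1 to 1.5 minutes.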
Consistency vs Availability
CAP Theorem
Partitioned
Consistency vs Latency
DDBS        P+A   P+C   E+L   E+C
Dynamo      Yes   -     Yes   -
Cassandra   Yes   -     Yes   -
Riak        Yes   -     Yes   -
MySQL       -     Yes   -     Yes
MongoDB     Yes   -     -     Yes
PACELC Theorem
PACELC: Partitioned → Availability | Consistency
Else → Latency | Consistency
Distributed Relational Challenges
• The Aggregation Challenge
SELECT AVG(salary) FROM employees;
• The Distinctive Values Challenge
SELECT DISTINCT country_id FROM employees;
• The Joins Challenge
SELECT e.first_name, e.last_name, d.name FROM employees AS e INNER JOIN
departments AS d ON e.department_id = d.id;
• The Sub-Queries Challenge
SELECT first_name, last_name FROM employees WHERE department_id IN
(SELECT id FROM departments WHERE rating > 4);
• The “Combination” Challenge
SELECT AVG(salary) FROM employees WHERE department_id IN
(SELECT id FROM departments WHERE rating > 4);
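To see why aggregation is a challenge once employees is sharded, consider the AVG and DISTINCT queries above. The sketch below uses made-up rows and plain arrays in place of real shards: each shard computes a partial result, and a coordinator merges them.

```javascript
// Each inner array stands for the employees rows stored on one shard (made-up data).
const shards = [
  [{ salary: 1000, country_id: "EG" }, { salary: 3000, country_id: "DE" }],
  [{ salary: 5000, country_id: "EG" }],
];

// Step 1 (on each shard): compute a partial aggregate locally.
const partials = shards.map((rows) => ({
  sum: rows.reduce((acc, r) => acc + r.salary, 0),
  count: rows.length,
  countries: new Set(rows.map((r) => r.country_id)),
}));

// Step 2 (on the coordinator): merge the partial aggregates.
const totalSum = partials.reduce((acc, p) => acc + p.sum, 0);
const totalCount = partials.reduce((acc, p) => acc + p.count, 0);
const avgSalary = totalSum / totalCount; // 9000 / 3 = 3000

// DISTINCT is the union of the per-shard distinct sets.
const distinctCountries = [...new Set(partials.flatMap((p) => [...p.countries]))].sort();

// Averaging the per-shard averages gives the wrong answer: (2000 + 5000) / 2 = 3500.
const naiveAvg = partials.reduce((acc, p) => acc + p.sum / p.count, 0) / partials.length;

console.log(avgSalary, naiveAvg, distinctCountries); // 3000 3500 [ 'DE', 'EG' ]
```

Joins and sub-queries are harder still, because the matching departments rows may live on a different shard than the employees rows that reference them.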
Average Salary Query with MapReduce in MongoDB
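// mapReduce is deprecated as of MongoDB 5.0; shown here to illustrate the technique.
var mapFunc = function() {
  // Emit a partial aggregate so that reduce can safely be re-applied to its own output.
  emit(0, {sum: this.salary, count: 1});
};
var reduceFunc = function(key, values) {
  // Merge partial aggregates; the result has the same shape as the emitted values.
  var acc = {sum: 0, count: 0};
  values.forEach(function(v) { acc.sum += v.sum; acc.count += v.count; });
  return acc;
};
var finalizeFunc = function(key, acc) {
  // Divide only once, after all partial results have been merged.
  return acc.sum / acc.count;
};
db.employees.mapReduce(
  mapFunc,
  reduceFunc,
  {
    out: {inline: 1},
    finalize: finalizeFunc
  }
)['results'][0]['value'];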
Some Distributed RDBMS Providers
Questions?
