SlideShare una empresa de Scribd logo
1 de 31
Descargar para leer sin conexión
Scalable Persistent Storage for Erlang Theory and Practice 
Amir Ghaffari Jon Meredith 
Natalia Chechina , Phil Trinder 
London Riak Meetup - October 22, 2013 
1 
http://www.release-project.eu
Outline 
•RELEASE Project 
•General principles of scalable DBMSs 
•NoSQL DBMSs for Erlang 
•Riak 1.1.1 Scalability in Practice 
•Investigating the scalability of distributed Erlang 
•Riak Elasticity 
•Conclusion & Future work 
2
RELEASE project 
•RELEASE is an European project aiming to scale Erlang onto commodity architectures with 100,000 cores. 
3
RELEASE project 
The RELEASE consortium work at following levels: 
Virtual machine 
Language 
scalable Computation model 
Scalable In-memory data structures 
Scalable Persistent data structures 
Infrastructure levels 
Profiling and refactoring tools 
4
General principles of scalable DBMSs 
Data Fragmentation 
1.Decentralized model (e.g. P2P model) 
2.Systematic load balancing (make life easier for developer) 
3.Location transparency 
5 
0-2K 
2k-4K 
4k-6K 
16k-18K 
18k-20K 
20kB 
e.g. 20k data is fragmented among 10 nodes
General principles of scalable DBMSs 
Replication 
1.Decentralized model (e.g. P2P model) 
2.Location transparency 
3.Asynchronous replication (write is considered complete as soon as on node acknowledges it) 
6 
X 
e.g. Key X is replicated on three nodes 
. 
. 
X 
. 
. 
X 
. . X
General principles of scalable DBMSs 
Consistency 
Availability 
Partition Tolerance 
ACID Systems 
Eventual Consistency 
CAP theorem: cannot simultaneously guarantee: 
•Partition tolerance: system continues to operate despite nodes can't talk to each other 
•Availability: guarantee that every request receives a response 
•Consistency: all nodes see the same data at the same time 
Not achievable because network failures are inevitable 
7 
Solution: Eventual consistency and reconciling conflicts via data versioning 
ACID=Atomicity, Consistency, Isolation, Durability
NoSQL DBMSs for Erlang 
Mnesia 
CouchDB 
Riak 
Cassandra 
Fragmentation 
•Explicit placement 
•Client-server 
•Automatic by using a hash function 
•Explicit placement 
•Multi-server 
•Lounge is not part of each CouchDB node 
•Implicit placement 
•Peer to peer 
•Automatic by using consistent hash technique 
•Implicit placement 
•Peer to peer 
•Automatic by using consistent hash technique 
Replication 
•Explicit placement 
•Client-server 
•Asynchronous ( Dirty operation) 
•Explicit placement 
•Multi-server 
•Asynchronous 
•Implicit placement 
•Peer to peer 
•Asynchronous 
•Implicit placement 
•Peer to peer 
•Asynchronous 
Partition Tolerant 
•Strong consistency 
•Eventual consistency 
•Multi-Version Concurrency Control for reconciliation 
•Eventual consistency 
•Vector clocks for reconciliation 
•Eventual consistency 
•Use timestamp to reconcile 
Query Processing 
& 
Backend Storage 
•The largest possible Mnesia table is 4Gb 
•No limitation 
•Supports Map/Reduce Queries 
•Bitcask has memory limitation 
•LevelDB has no limitation 
•Supports Map/Reduce queries 
•No limitation 
•Supports Map/Reduce queries 
8
Initial Evaluation Results 
General Principles 
Initial Evaluation 
•Mnesia 
•CouchDB 
•Riak 
•Cassendra 
Scalable persistent storage for SD Erlang can be provided by Dynamo-style DBMSs such as Riak,Cassandra 
9
Riak Scalability in Practice 
•Basho Bench: a benchmarking tool for Riak 
•We measure Basho Bench on 348-node Kalkyl cluster 
•Scalability: How does adding more Riak nodes affect the throughput? 
•There are two kinds of nodes in a cluster: 
•Traffic generators 
•Riak nodes 
10
Node Organisation 
11 
Heuristic: one traffic generator per 3 Riak nodes
Traffic Generator 
12
Riak 1.1.1 Scalability 
Benchmark on 100-node cluster (800 cores) 
13
Failures 
14
Profiling Resource Usage 
15 
CPU Usage
Profiling Resource Usage 
16 
DISK Usage
Profiling Resource Usage 
17 
Memory Usage
Profiling Resource Usage 
18 
Network Traffic of Generator Nodes
Profiling Resource Usage 
19 
Network Traffic of Riak Nodes
Bottleneck for Riak Scalability 
CPU, RAM, Disk, and Network profiling reveal that they can't be bottleneck for Riak scalability. 
Is the Riak scalability limits due to limits in distributed Erlang? 
To find out, let’s measure the scalability of distributed Erlang. 
20
DE-Bench 
21 
•DE-Bench: a benchmarking tool for distributed Erlang 
•It is based on Basho Bench 
•Measures the throughput of a cluster of Erlang nodes 
•Records the latency of distributed Erlang commands individually
Distributed Erlang Commands 
22 
• Spawn/RPC: peer to peer commands 
• register_name : global name tables located on every node 
• unregister_name : global name tables located on every node 
• whereis_name : a lookup in the local table 
Register 
Unregister 
Erlang VM 
Erlang VM 
Erlang VM 
Erlang VM 
Global name table 
Global name table 
Global name table 
Global name table
DE-Bench’s P2P Design 
23 
Physical host 1 
Physical host 2
Frequency of Global Operation 
24 
Frequently Max Throughput 1% 30 nodes 0.5% 50 nodes 0.33% 70 nodes 0% 1600 nodes 
Global Operations limit the scalability of distributed Erlang
Riak Software Scalability 
•Monitoring global.erl module from OTP library shows that Riak does NOT use any global operation. 
•Instrumenting gen_server.erl module reveals that: 
 Of the 15 most time-consuming operations, only the time of rpc:call grows with cluster size. 
Moreover, of the five Riak RPC calls, only start_put_fsm function from module riak_kv_put_fsm_sup grows with cluster size. 
25
Eliminating the Bottlenecks 
•Independently, Basho identified that two supervisor processes, i.e. riak_kv_get/put_fsm_sup, become bottleneck under heavy load, exhibiting build up in message queue length. 
•To improve the Riak scalability in version 1.3 and 1.4 Basho applied a number of techniques and introduced new library sidejob (https://github.com/basho/sidejob). 
26
Riak1.1.1 Elasticity 
Time-line shows Riak cluster losing and gaining nodes 
27
Riak1.1.1 Elasticity 
How Riak cluster deals with nodes leaving and joining 
28
Observation 
• Number of failures (37) 
• Number of successful operations (approximately 3.41 million) 
• When failed nodes come back up, the throughput has grown that shows Riak1.1.1 has a good elasticity. 
29
Conclusion and Future work 
 Our benchmark confirms that Riak has a good elasticity. 
 We establish for the first time scientifically the scalability limit of Riak 1.1.1 as 60 nodes. 
 We have shown how global operations limits the scalability of distributed Erlang. 
Riak scalability bottelnecks are eliminated in Riak versions 1.3 and upcoming versions. 
 In RELEASE, we are working to scale up distributed Erlang by grouping nodes in smaller partitions. 
30
References 
 Benchmarking Riak https://github.com/amirghaffari/benchmark_riak 
Basho Bench http://docs.basho.com/riak/latest/ops/building/benchmarking/ 
 DE-Bench https://github.com/amirghaffari/DEbench 
 A. Ghaffari, N. Chechina, P. Trinder, and J. Meredith. Scalable Persistent Storage for Erlang: Theory and Practice. In Proceedings of the Twelfth ACM SIGPLAN Workshop on Erlang, pages 73-74, September 2013. ACM Press. 
 Clusters at UPPMAX http://www.uppmax.uu.se/hardware 
 Sidejob https://github.com/basho/sidejob 
31

Más contenido relacionado

La actualidad más candente

OpenStack Control Plane High Availability
OpenStack Control Plane High AvailabilityOpenStack Control Plane High Availability
OpenStack Control Plane High AvailabilityMichael Solberg
 
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...confluent
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High AvailabilityJakub Pavlik
 
vSphere With OpenStack
vSphere With OpenStackvSphere With OpenStack
vSphere With OpenStackKenneth Hui
 
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per DayRedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per DayRedis Labs
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpNathan Handler
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kiloSteven Li
 
Building Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak CoreBuilding Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak CoreAndy Gross
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelinesSumant Tambe
 
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...HostedbyConfluent
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?Ricardo Paiva
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Chef cookbooks for OpenStack HA
Chef cookbooks for OpenStack HAChef cookbooks for OpenStack HA
Chef cookbooks for OpenStack HAAdam Spiers
 
How Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in EuropeHow Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in EuropeRicardo Paiva
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021StreamNative
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedInGuozhang Wang
 
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...confluent
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 

La actualidad más candente (20)

OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
OpenStack Control Plane High Availability
OpenStack Control Plane High AvailabilityOpenStack Control Plane High Availability
OpenStack Control Plane High Availability
 
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
A Tale of Two Data Centers: Kafka Streams Resiliency (Anna McDonald, Confluen...
 
OpenStack High Availability
OpenStack High AvailabilityOpenStack High Availability
OpenStack High Availability
 
vSphere With OpenStack
vSphere With OpenStackvSphere With OpenStack
vSphere With OpenStack
 
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per DayRedisConf18 - Redis at LINE - 25 Billion Messages Per Day
RedisConf18 - Redis at LINE - 25 Billion Messages Per Day
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
Open stack ha design & deployment kilo
Open stack ha design & deployment   kiloOpen stack ha design & deployment   kilo
Open stack ha design & deployment kilo
 
Building Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak CoreBuilding Distributed Systems With Riak and Riak Core
Building Distributed Systems With Riak and Riak Core
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
The Log of All Logs: Raft-based Consensus Inside Kafka | Guozhang Wang, Confl...
 
How is Kafka so Fast?
How is Kafka so Fast?How is Kafka so Fast?
How is Kafka so Fast?
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Chef cookbooks for OpenStack HA
Chef cookbooks for OpenStack HAChef cookbooks for OpenStack HA
Chef cookbooks for OpenStack HA
 
How Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in EuropeHow Criteo is managing one of the largest Kafka Infrastructure in Europe
How Criteo is managing one of the largest Kafka Infrastructure in Europe
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 

Similar a Scalable Persistent Storage for Erlang: Theory and Practice

Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the CloudInes Sombra
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStackTesora
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackMatt Lord
 
A Performance Comparison of Container-based Virtualization Systems for MapRed...
A Performance Comparison of Container-based Virtualization Systems for MapRed...A Performance Comparison of Container-based Virtualization Systems for MapRed...
A Performance Comparison of Container-based Virtualization Systems for MapRed...Miguel Xavier
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHungWei Chiu
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Mac Moore
 
How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
How does Riak compare to Cassandra? [Cassandra London User Group July 2011]How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
How does Riak compare to Cassandra? [Cassandra London User Group July 2011]Rainforest QA
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaHelena Edelson
 
Orchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failuresOrchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failuresDocker, Inc.
 
NAVER Ceph Storage on ssd for Container
NAVER Ceph Storage on ssd for ContainerNAVER Ceph Storage on ssd for Container
NAVER Ceph Storage on ssd for ContainerJangseon Ryu
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Big data talk barcelona - jsr - jc
Big data talk   barcelona - jsr - jcBig data talk   barcelona - jsr - jc
Big data talk barcelona - jsr - jcJames Saint-Rossy
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oakMichael Dürig
 

Similar a Scalable Persistent Storage for Erlang: Theory and Practice (20)

Getting started with Riak in the Cloud
Getting started with Riak in the CloudGetting started with Riak in the Cloud
Getting started with Riak in the Cloud
 
MySQL Options in OpenStack
MySQL Options in OpenStackMySQL Options in OpenStack
MySQL Options in OpenStack
 
OpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStackOpenStack Days East -- MySQL Options in OpenStack
OpenStack Days East -- MySQL Options in OpenStack
 
A Performance Comparison of Container-based Virtualization Systems for MapRed...
A Performance Comparison of Container-based Virtualization Systems for MapRed...A Performance Comparison of Container-based Virtualization Systems for MapRed...
A Performance Comparison of Container-based Virtualization Systems for MapRed...
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
 
How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
How does Riak compare to Cassandra? [Cassandra London User Group July 2011]How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
How does Riak compare to Cassandra? [Cassandra London User Group July 2011]
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 
Orchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failuresOrchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failures
 
MySQL on Ceph
MySQL on CephMySQL on Ceph
MySQL on Ceph
 
My SQL on Ceph
My SQL on CephMy SQL on Ceph
My SQL on Ceph
 
NAVER Ceph Storage on ssd for Container
NAVER Ceph Storage on ssd for ContainerNAVER Ceph Storage on ssd for Container
NAVER Ceph Storage on ssd for Container
 
Oss4b - pxc introduction
Oss4b   - pxc introductionOss4b   - pxc introduction
Oss4b - pxc introduction
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Big data talk barcelona - jsr - jc
Big data talk   barcelona - jsr - jcBig data talk   barcelona - jsr - jc
Big data talk barcelona - jsr - jc
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
The architecture of oak
The architecture of oakThe architecture of oak
The architecture of oak
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 

Último

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 

Último (20)

Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 

Scalable Persistent Storage for Erlang: Theory and Practice

  • 1. Scalable Persistent Storage for Erlang Theory and Practice Amir Ghaffari Jon Meredith Natalia Chechina , Phil Trinder London Riak Meetup - October 22, 2013 1 http://www.release-project.eu
  • 2. Outline •RELEASE Project •General principles of scalable DBMSs •NoSQL DBMSs for Erlang •Riak 1.1.1 Scalability in Practice •Investigating the scalability of distributed Erlang •Riak Elasticity •Conclusion & Future work 2
  • 3. RELEASE project •RELEASE is an European project aiming to scale Erlang onto commodity architectures with 100,000 cores. 3
  • 4. RELEASE project The RELEASE consortium work at following levels: Virtual machine Language scalable Computation model Scalable In-memory data structures Scalable Persistent data structures Infrastructure levels Profiling and refactoring tools 4
  • 5. General principles of scalable DBMSs Data Fragmentation 1.Decentralized model (e.g. P2P model) 2.Systematic load balancing (make life easier for developer) 3.Location transparency 5 0-2K 2k-4K 4k-6K 16k-18K 18k-20K 20kB e.g. 20k data is fragmented among 10 nodes
  • 6. General principles of scalable DBMSs Replication 1.Decentralized model (e.g. P2P model) 2.Location transparency 3.Asynchronous replication (write is considered complete as soon as on node acknowledges it) 6 X e.g. Key X is replicated on three nodes . . X . . X . . X
  • 7. General principles of scalable DBMSs Consistency Availability Partition Tolerance ACID Systems Eventual Consistency CAP theorem: cannot simultaneously guarantee: •Partition tolerance: system continues to operate despite nodes can't talk to each other •Availability: guarantee that every request receives a response •Consistency: all nodes see the same data at the same time Not achievable because network failures are inevitable 7 Solution: Eventual consistency and reconciling conflicts via data versioning ACID=Atomicity, Consistency, Isolation, Durability
  • 8. NoSQL DBMSs for Erlang Mnesia CouchDB Riak Cassandra Fragmentation •Explicit placement •Client-server •Automatic by using a hash function •Explicit placement •Multi-server •Lounge is not part of each CouchDB node •Implicit placement •Peer to peer •Automatic by using consistent hash technique •Implicit placement •Peer to peer •Automatic by using consistent hash technique Replication •Explicit placement •Client-server •Asynchronous ( Dirty operation) •Explicit placement •Multi-server •Asynchronous •Implicit placement •Peer to peer •Asynchronous •Implicit placement •Peer to peer •Asynchronous Partition Tolerant •Strong consistency •Eventual consistency •Multi-Version Concurrency Control for reconciliation •Eventual consistency •Vector clocks for reconciliation •Eventual consistency •Use timestamp to reconcile Query Processing & Backend Storage •The largest possible Mnesia table is 4Gb •No limitation •Supports Map/Reduce Queries •Bitcask has memory limitation •LevelDB has no limitation •Supports Map/Reduce queries •No limitation •Supports Map/Reduce queries 8
  • 9. Initial Evaluation Results General Principles Initial Evaluation •Mnesia •CouchDB •Riak •Cassendra Scalable persistent storage for SD Erlang can be provided by Dynamo-style DBMSs such as Riak,Cassandra 9
  • 10. Riak Scalability in Practice •Basho Bench: a benchmarking tool for Riak •We measure Basho Bench on 348-node Kalkyl cluster •Scalability: How does adding more Riak nodes affect the throughput? •There are two kinds of nodes in a cluster: •Traffic generators •Riak nodes 10
  • 11. Node Organisation 11 Heuristic: one traffic generator per 3 Riak nodes
  • 13. Riak 1.1.1 Scalability Benchmark on 100-node cluster (800 cores) 13
  • 15. Profiling Resource Usage 15 CPU Usage
  • 16. Profiling Resource Usage 16 DISK Usage
  • 17. Profiling Resource Usage 17 Memory Usage
  • 18. Profiling Resource Usage 18 Network Traffic of Generator Nodes
  • 19. Profiling Resource Usage 19 Network Traffic of Riak Nodes
  • 20. Bottleneck for Riak Scalability CPU, RAM, Disk, and Network profiling reveal that they can't be bottleneck for Riak scalability. Is the Riak scalability limits due to limits in distributed Erlang? To find out, let’s measure the scalability of distributed Erlang. 20
  • 21. DE-Bench 21 •DE-Bench: a benchmarking tool for distributed Erlang •It is based on Basho Bench •Measures the throughput of a cluster of Erlang nodes •Records the latency of distributed Erlang commands individually
  • 22. Distributed Erlang Commands 22 • Spawn/RPC: peer to peer commands • register_name : global name tables located on every node • unregister_name : global name tables located on every node • whereis_name : a lookup in the local table Register Unregister Erlang VM Erlang VM Erlang VM Erlang VM Global name table Global name table Global name table Global name table
  • 23. DE-Bench’s P2P Design 23 Physical host 1 Physical host 2
  • 24. Frequency of Global Operation 24 Frequently Max Throughput 1% 30 nodes 0.5% 50 nodes 0.33% 70 nodes 0% 1600 nodes Global Operations limit the scalability of distributed Erlang
  • 25. Riak Software Scalability •Monitoring global.erl module from OTP library shows that Riak does NOT use any global operation. •Instrumenting gen_server.erl module reveals that:  Of the 15 most time-consuming operations, only the time of rpc:call grows with cluster size. Moreover, of the five Riak RPC calls, only start_put_fsm function from module riak_kv_put_fsm_sup grows with cluster size. 25
  • 26. Eliminating the Bottlenecks •Independently, Basho identified that two supervisor processes, i.e. riak_kv_get/put_fsm_sup, become bottleneck under heavy load, exhibiting build up in message queue length. •To improve the Riak scalability in version 1.3 and 1.4 Basho applied a number of techniques and introduced new library sidejob (https://github.com/basho/sidejob). 26
  • 27. Riak1.1.1 Elasticity Time-line shows Riak cluster losing and gaining nodes 27
  • 28. Riak1.1.1 Elasticity How Riak cluster deals with nodes leaving and joining 28
  • 29. Observation • Number of failures (37) • Number of successful operations (approximately 3.41 million) • When failed nodes come back up, the throughput has grown that shows Riak1.1.1 has a good elasticity. 29
  • 30. Conclusion and Future work  Our benchmark confirms that Riak has a good elasticity.  We establish for the first time scientifically the scalability limit of Riak 1.1.1 as 60 nodes.  We have shown how global operations limits the scalability of distributed Erlang. Riak scalability bottelnecks are eliminated in Riak versions 1.3 and upcoming versions.  In RELEASE, we are working to scale up distributed Erlang by grouping nodes in smaller partitions. 30
  • 31. References  Benchmarking Riak https://github.com/amirghaffari/benchmark_riak Basho Bench http://docs.basho.com/riak/latest/ops/building/benchmarking/  DE-Bench https://github.com/amirghaffari/DEbench  A. Ghaffari, N. Chechina, P. Trinder, and J. Meredith. Scalable Persistent Storage for Erlang: Theory and Practice. In Proceedings of the Twelfth ACM SIGPLAN Workshop on Erlang, pages 73-74, September 2013. ACM Press.  Clusters at UPPMAX http://www.uppmax.uu.se/hardware  Sidejob https://github.com/basho/sidejob 31