In-Memory Data Grid
Ampool
- Girish Verma
- Chinmay Kulkarni
Latency
- Why do we care about it?
- Because Amazon, Google, and financial firms care about it :)
- Google: 500 ms == 20% traffic drop
- Citi: 100 ms == $1M
- How do we reduce it?
- Reduce data access time
- Cache
- Redis, Memcached
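The caching idea above can be sketched as a cache-aside read path. This is only an illustration and not code from the talk: `slow_lookup` is a hypothetical stand-in for a database query, and a plain dict stands in for Redis or Memcached.

```python
import time


def slow_lookup(key, delay=0.05):
    """Hypothetical backing store: simulates a slow database query."""
    time.sleep(delay)  # simulated disk/network latency
    return f"value-for-{key}"


class CacheAside:
    """Check the in-memory cache first; fall back to the slow store on a miss."""

    def __init__(self, loader):
        self.loader = loader
        self.store = {}  # a dict stands in for Redis/Memcached
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        value = self.loader(key)  # slow path: go to the backing store
        self.store[key] = value   # populate the cache for later reads
        return value


if __name__ == "__main__":
    cache = CacheAside(slow_lookup)
    for attempt in ("cold", "warm"):
        start = time.perf_counter()
        cache.get("user:42")
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{attempt} read: {elapsed_ms:.2f} ms")
```

The second read skips the simulated delay entirely, which is the whole point of putting a cache in front of the data store.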
Redis
- Master-slave configuration
- Slaves are just redundant copies
- Mesh topology with TCP connections between nodes
- How does a client read the data?
Memcached
- There is no such thing as a Memcached cluster
- Everything needs to be managed by the client
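Because the servers do not know about each other, the client has to decide which server owns each key. A minimal sketch of that client-side sharding, assuming simple hash-modulo placement; the server addresses are made up and a dict per server stands in for a real Memcached connection.

```python
import hashlib


class ShardingClient:
    """Toy client that picks a server per key; the servers never talk to each other."""

    def __init__(self, servers):
        self.servers = list(servers)
        # One dict per server stands in for a real Memcached connection.
        self.nodes = {server: {} for server in self.servers}

    def server_for(self, key):
        # Stable hash so every client instance maps a key to the same server.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.servers[int(digest, 16) % len(self.servers)]

    def set(self, key, value):
        self.nodes[self.server_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.server_for(key)].get(key)


if __name__ == "__main__":
    client = ShardingClient(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
    client.set("user:42", "Girish")
    print(client.server_for("user:42"), client.get("user:42"))
```

With modulo placement, changing the server list remaps most keys, so handling that is also the client's job.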
In-Memory Data Grid
- Sophisticated In-memory data store
- Low latency Reads and Writes
- Partitioning and Replication
- Highly Scalable and Available
- Work with your existing data store
Ampool
- Operational Analytics
- Store, Analyse and Serve your data from the same place
- Active Data Store between compute and long-term storage
- Benefits
*Reference - http://docs.ampool-inc.com/adocs/core/index.html
Ampool
Ampool Architecture
- Based on Apache Geode
- Topology : Client -> Locator(s) -> Servers
- Data Partitioning and Replication
- Recoverability
Ampool Vs Others
- In-memory Data Grids (GridGain, Hazelcast)
- Designed for low latency, no or embedded analytics, limited persistence options
- In-memory File Systems (Alluxio)
- FS interface with high serialization overhead, no low-latency workloads
- In-memory Databases (MemSQL, SAP HANA)
- Vertically integrated, designed for transactions, proprietary and expensive, local persistence only
Demo time !!!
Old Query
- MySQL + query using mysql shell: More than 1 hour (may be machine issue)
- MySQL + Spark: OOM error (can’t be done on my machine :P)
- Ampool + Spark (1 node each): 17 mins
New Query
- MySQL + Spark: 6 mins
- Ampool + Spark, 1 node each: 5 mins
- Ampool + Spark, 2 nodes each (6 cores per spark executor): 1.4 mins
- Ampool + Spark, 2 nodes each (8 cores per spark executor): 1.2 mins
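The relative speedups for the new query follow directly from the timings above. A small sketch of the arithmetic, using only the reported numbers; the dictionary labels are shorthand introduced here, not names from the demo.

```python
# Timings in minutes for the new query, as reported above.
new_query_minutes = {
    "MySQL + Spark": 6.0,
    "Ampool + Spark, 1 node each": 5.0,
    "Ampool + Spark, 2 nodes each (6 cores/executor)": 1.4,
    "Ampool + Spark, 2 nodes each (8 cores/executor)": 1.2,
}


def speedups(timings, baseline):
    """Return how many times faster each setup is than the baseline setup."""
    base = timings[baseline]
    return {name: base / minutes for name, minutes in timings.items()}


if __name__ == "__main__":
    for name, factor in speedups(new_query_minutes, "MySQL + Spark").items():
        print(f"{name}: {factor:.1f}x")
```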
Thank you
Extra Info
Redis vs IMDGs (Ampool/GemFire/Geode)
- Redis: No SQL support. IMDGs: SQL support (Ampool).
- Redis: Master-slave architecture. IMDGs: Peer-to-peer configuration.
- Redis: No member discovery service; managing slaves is a bit difficult, and it is not possible to bring up a crashed slave. IMDGs: Built-in member discovery service (locators).
- Redis: Single-threaded. IMDGs: Multi-threaded, configurable.
- Redis: Application-level sharding. IMDGs: Auto-sharding, auto-rebalancing.
- Redis: Application must know which node has the data and which node to send the request to. IMDGs: Application is unaware of the partitioning; queries are automatically routed to the node where the data resides.
- Redis: Based on the Redis virtual memory subsystem; stores Redis objects. IMDGs: JVM-based.
Redis and in-memory data grids are pretty different animals. I would characterize IMDGs like Geode as
concurrent write intensive, with flexible data models. Geode also scales out better than Redis, in a
more automated fashion.
Redis is a great read-intensive cache. It also has a powerful data model, but you have to use its data
models. Example: if you want to run calculations on lists or sets, it has powerful operations you can
call.
IMDGs such as Geode were built with the rise of automated trading in the finance industry.
https://news.ycombinator.com/item?id=10596859
http://vschart.com/compare/memcached/vs/gemfire
http://www.infoworld.com/article/3063161/application-development/why-redis-beats-memcached-for-caching.html
If avoiding disk I/O is the goal, why not achieve that through database caching?
Caching is the process whereby on-disk databases keep frequently-accessed records in memory, for
faster access. However, caching only speeds up retrieval of information, or “database reads.” Any
database write – that is, an update to a record or creation of a new record – must still be written through
the cache, to disk. So, the performance benefit only applies to a subset of database tasks. In addition,
managing the cache is itself a process that requires substantial memory and CPU resources, so even a
“cache hit” underperforms an in-memory database.
http://www.mcobject.com/in_memory_database
http://www.slideshare.net/MaxAlexejev/from-distributed-caches-to-inmemory-data-grids
https://spiegela.com/2014/04/30/but-i-need-a-database-that-scales-part-2/
Distributed in-memory cache
● Group membership and failure detection
● Consistent hashing to distribute data across a cluster of nodes
● Fault tolerant
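The consistent hashing bullet can be illustrated with a small hash ring. This is a generic sketch of the technique, not the placement scheme of Ampool, Geode, or any other product; the node names and the number of virtual nodes are arbitrary choices.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = {}  # ring position -> node name
        self._sorted = []  # sorted ring positions
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns several points so load spreads more evenly.
        for i in range(self.vnodes):
            self._points[_hash(f"{node}#{i}")] = node
        self._sorted = sorted(self._points)

    def remove(self, node):
        self._points = {p: n for p, n in self._points.items() if n != node}
        self._sorted = sorted(self._points)

    def node_for(self, key):
        if not self._sorted:
            raise LookupError("ring is empty")
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect_right(self._sorted, _hash(key)) % len(self._sorted)
        return self._points[self._sorted[idx]]


if __name__ == "__main__":
    ring = HashRing(["server1", "server2", "server3"])
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.remove("server3")
    moved = sum(1 for k in keys if ring.node_for(k) != before[k])
    print(f"keys moved after removing server3: {moved} of {len(keys)}")
```

Only the keys that lived on the removed node move; keys on the surviving nodes stay where they were, which is what makes the scheme attractive when cluster membership changes.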
Comparisons
Data - ~1 GB / ~15 million records
Local - 1 Ampool server, 1 Spark node with 4 threads -> 10 mins
Local - 1 Ampool server, 1 Spark node with 2 threads
AWS - 1 Ampool server, 1 Spark node with 4 threads
AWS - 2 Ampool servers, 2 Spark nodes
Rewrite SQL query with Spark
AWS - change Spark version and try with Parquet data file
Ampool cluster
With no redundant copies for table:
Initial cluster members - locator, server1
-> Stop server1 - no queries can be served
Restart server1 - everything works
-> Start server2 now and stop server1 - no queries can be served
Data distribution doesn’t happen automatically
Ampool cluster
With redundant copies for table set to 1:
Initial cluster members - locator, server1, server2
Load data - which will get distributed to both the servers
-> Stop one of the servers - everything works fine
-> Stop both servers - no queries can be served
Start one of the servers - still no queries can be served
Start both servers - everything works fine
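The effect of the redundant-copies setting when servers are stopped can be sketched with a toy model. This is not Ampool's API or its actual placement logic: `ToyCluster`, the bucket count, and the round-robin placement are assumptions made for illustration, and restart/recovery behaviour is deliberately not modelled.

```python
class ToyCluster:
    """Toy model: a table is split into buckets, each stored on 1 + redundant_copies servers."""

    def __init__(self, servers, redundant_copies, num_buckets=8):
        servers = list(servers)
        if redundant_copies + 1 > len(servers):
            raise ValueError("not enough servers for the requested number of copies")
        self.up = set(servers)
        # bucket id -> servers holding a copy (primary plus redundant copies),
        # assigned round-robin; this placement rule is an assumption.
        self.placement = {
            bucket: [servers[(bucket + i) % len(servers)]
                     for i in range(redundant_copies + 1)]
            for bucket in range(num_buckets)
        }

    def stop(self, server):
        self.up.discard(server)

    def can_serve_queries(self):
        # A full-table query needs at least one live copy of every bucket.
        return all(any(s in self.up for s in holders)
                   for holders in self.placement.values())


if __name__ == "__main__":
    no_copies = ToyCluster(["server1"], redundant_copies=0)
    no_copies.stop("server1")
    print("0 redundant copies, server1 stopped:", no_copies.can_serve_queries())

    one_copy = ToyCluster(["server1", "server2"], redundant_copies=1)
    one_copy.stop("server1")
    print("1 redundant copy, one server stopped:", one_copy.can_serve_queries())
    one_copy.stop("server2")
    print("1 redundant copy, both servers stopped:", one_copy.can_serve_queries())
```

The model covers only the "stop" observations above; what happens when stopped servers come back depends on recovery details that the sketch leaves out.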
Ampool cluster
With redundant copies for table set to 1:
Initial cluster members - locator, server1
Load data - the data will be on only one server
-> Start a new server (server2) and stop server1
Queries work :) - because data is replicated to server2 when it started
-> Start server1 and stop server2
Queries still work - same reason as above
-> Stop both servers and start one of them
Queries work :)
-> When only server1 is up, add data to it
Start server2 and stop server1
