Non-Stop Hadoop 
Enterprise Ready Hadoop 
Presentation for Big Data Meetup 
October 8, 2014
REALIZING THE POSSIBILITIES OF BIG DATA 2 
WWW.WANDISCO.COM 
WANdisco Background 
• WANdisco: Wide Area Network Distributed Computing 
– Enterprise ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability 
• Leader in tools for software engineers – Subversion 
– Apache Software Foundation sponsor 
• Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) 
• US patented active-active replication technology granted, November 2012 
• Global locations 
– San Ramon (CA) 
– Chengdu (China) 
– Tokyo (Japan) 
– Boston (MA) 
– Sheffield (UK) 
– Belfast (UK)
Customers
Non-Stop Hadoop 
Non-Intrusive Plugin 
Provides Continuous Availability 
In the LAN / Across the WAN 
Active/Active
3 Key Problems For Multi Cluster Hadoop 
LAN / WAN
Enterprise Ready Hadoop 
Characteristics of Mission Critical Applications 
• Require 100% Uptime of Hadoop 
– SLAs, Regulatory Compliance 
• Require HDFS to be Deployed Globally 
– Share Data Between Data Centers 
– Data is Consistent, Not Eventually Consistent 
• Ease Administrative Burden 
– Reduce Operational Complexity 
– Simplify Disaster Recovery 
– Lower RTO/RPO 
• Allow Maximum Utilization of Resource 
– Within the Data Center 
– Across Data Centers
Breaking Away from Active/Passive 
What’s in a NameNode 
Single Standby 
• Inefficient utilization of resource 
– Journal Nodes 
– ZooKeeper Nodes 
– Standby Node 
• Performance Bottleneck 
• Still tied to the beeper 
• Limited to LAN scope 
Active / Active 
• All resources utilized 
– Only NameNode configuration 
– Scale as the cluster grows 
– All NameNodes active 
• Load balancing 
• Set resiliency (# of active NN) 
• Global Consistency
Breaking Away from Active/Passive 
What’s in a Data Center 
Standby Datacenter 
• Idle Resource 
– Single Data Center Ingest 
– Disaster Recovery Only 
• One way synchronization 
– DistCp 
• Error Prone 
– Clusters can diverge over time 
• Difficult to scale beyond 2 Data Centers 
– Complexity of sharing data increases 
Active / Active 
• DR Resource Available 
– Ingest at all Data Centers 
– Run Jobs in both Data Centers 
• Replication is Multi-Directional 
– active/active 
• Absolute Consistency 
– Single HDFS spans locations 
• ‘N’ Data Center support 
– Global HDFS allows appropriate data to be shared
One Cluster Approach 
• Example Applications 
– HBASE 
– RT Query 
– Map Reduce 
• Poor Resource Management 
– Data Locality Issues 
– Network Use 
– Complex 
Multiple Clusters
Creating Multiple Clusters 
• Example Applications 
– HBASE 
– RT Query 
– Map Reduce 
• Need to share data between clusters 
– DistCp / Stale Data 
– Inefficient use of storage and/or network 
– Some clusters may not be available 
Multiple Clusters
Cluster Zones 
Zoning for Optimal Efficiency 
• 1 HDFS 
• 100% Consistency 
• Absolute Consistency 
• Maximum Resource Use 
• Lower Recovery Time/Point 
Multi Datacenter Hadoop 
Disaster Recovery 
• WAN Replication 
• Replicate Only What You Want 
• Better Utilization of Power/Cooling 
• Lower TCO 
• LAN Speed Performance
Technical Overview 
Hadoop Powered by WANdisco
Multi Data Center Hadoop Today 
What's wrong with the status quo 
Periodic Synchronization 
DistCp 
Parallel Data Ingest 
Load Balancer, Streaming
Multi Data Center Hadoop Today 
Hacks currently in use 
Periodic Synchronization 
DistCp 
• Runs as a MapReduce job 
• DR Data Center is read only 
• Over time, Hadoop clusters become inconsistent 
• Manual and labor intensive process to reconcile differences 
• Inefficient use of the network
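The drift between periodically synchronized clusters can be illustrated with a toy model. This is a minimal sketch and not DistCp itself: two in-memory dicts stand in for the primary and DR namespaces, and `periodic_copy` and `divergence` are hypothetical helpers. The copy modelled here is purely additive (it never deletes), which is an assumption made for illustration.

```python
def periodic_copy(source: dict, target: dict) -> None:
    """Hypothetical stand-in for a scheduled one-way copy: adds and overwrites, never deletes."""
    target.update(source)


def divergence(source: dict, target: dict):
    """Return (paths missing or stale on target, paths that exist only on target)."""
    stale = sorted(p for p, v in source.items() if target.get(p) != v)
    extra = sorted(p for p in target if p not in source)
    return stale, extra


# Primary cluster namespace: path -> content version (toy data).
primary = {"/logs/a": "v1", "/logs/b": "v1"}
dr = {}

# Right after a copy run, the two sides match.
periodic_copy(primary, dr)
in_sync = divergence(primary, dr)

# Writes that land on the primary before the next scheduled run.
primary["/logs/c"] = "v1"   # new file
primary["/logs/a"] = "v2"   # modified file
del primary["/logs/b"]      # deleted file

# Until the next run, the DR side is stale and still holds the deleted path.
drift = divergence(primary, dr)
print(in_sync, drift)
```

The window between copy runs is what bounds the recovery point in this model: everything in `drift` would be lost or wrong if the primary failed before the next run.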
Multi Data Center Hadoop Today 
Hacks currently in use 
Parallel Data Ingest 
Load Balancer, Flume 
• Hiccups in either Hadoop cluster cause the two file systems to diverge 
• Potential to run out of buffer when WAN is down 
• Requires constant attention and sys-admin hours to keep running 
• Data created on the cluster is not replicated 
• Streaming technologies (like Flume) used for data redirection only handle streaming data
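The buffer exhaustion risk during a WAN outage can be sketched with a bounded queue. This is an illustrative model only: `ForwardingBuffer` is a hypothetical class, not a Flume component, and dropping the oldest event on overflow is just one possible policy chosen for the example.

```python
from collections import deque


class ForwardingBuffer:
    """Hypothetical bounded buffer that forwards events to a remote cluster."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0

    def ingest(self, event, wan_up: bool, remote: list) -> None:
        self.pending.append(event)
        if wan_up:
            # Link is healthy: drain everything buffered so far.
            while self.pending:
                remote.append(self.pending.popleft())
        elif len(self.pending) > self.capacity:
            # Link is down and the buffer is full: the oldest event is lost.
            self.pending.popleft()
            self.dropped += 1


local, remote = [], []
buf = ForwardingBuffer(capacity=3)
for i in range(10):
    wan_up = not (2 <= i <= 7)   # simulated outage while events 2..7 arrive
    local.append(i)              # the local cluster always receives the event
    buf.ingest(i, wan_up, remote)

missing = [e for e in local if e not in remote]
print(remote, missing, buf.dropped)
```

In this run the local cluster holds all ten events while the remote one is missing the events that overflowed during the outage, which is the kind of divergence the bullets above describe.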
PAXOS 
Paxos is a family of protocols for solving consensus in a network of 
unreliable processors. 
Consensus is the process of agreeing on one result among a group of 
participants. 
This problem becomes difficult when the participants or their 
communication medium may experience failures. 
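To make the idea of consensus concrete, here is a minimal sketch of textbook single-decree Paxos (one value agreed among three acceptors). It is not WANdisco's implementation; the `Acceptor` class, the `propose` function and the example values are assumptions for illustration, and unreachable participants are modelled simply by leaving them out of the `reachable` list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(reachable: List[Acceptor], cluster_size: int,
            ballot: int, value: str) -> Optional[str]:
    """Return the chosen value, or None if no majority could be reached."""
    majority = cluster_size // 2 + 1
    replies = [a.prepare(ballot) for a in reachable]
    granted = [prev for ok, prev in replies if ok]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, the proposer must adopt
    # the one with the highest ballot instead of its own.
    prior = [p for p in granted if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.accept(ballot, value) for a in reachable)
    return value if acks >= majority else None


a, b, c = Acceptor(), Acceptor(), Acceptor()
first = propose([a, b], 3, ballot=1, value="create /data")    # c unreachable
second = propose([b, c], 3, ballot=2, value="delete /data")   # a unreachable
third = propose([c], 3, ballot=3, value="modify /data")       # only c reachable
print(first, second, third)
```

The second proposer reaches a different majority but still ends up with the value chosen first, and the third proposer, with only a minority reachable, cannot decide anything. Those two properties are what make the result safe when participants or links fail.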
DConE 
Distributed Coordination Engine 
• WANdisco’s patented WAN capable Paxos implementation 
– Mathematically proven 
– Provides distributed co-ordination of File system metadata 
• Active/Active (All locations) 
• Create, Modify, Delete 
• Shared nothing (No Leader) 
• No restrictions on distance between datacenters 
– US Patent granted for time independent implementation of Paxos 
• Not based on SAN block device synchronization such as EMC SRDF 
– SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage 
– Possible distribution of corrupted blocks
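One way to picture coordinated metadata operations is a replicated log: if every site applies the same create, modify and delete operations in the same agreed order, every site ends with the same namespace. The sketch below illustrates that general idea only. The slides do not describe DConE's internals, so the sequence numbers, the operation tuples and the helper names are all assumptions.

```python
def apply_op(namespace: dict, op: tuple) -> None:
    """Apply one metadata operation (kind, path, data) to an in-memory namespace."""
    kind, path, data = op
    if kind in ("create", "modify"):
        namespace[path] = data
    elif kind == "delete":
        namespace.pop(path, None)


def replay(order, log: dict) -> dict:
    """Apply the operations from `log` in the given order of sequence numbers."""
    namespace = {}
    for seq in order:
        apply_op(namespace, log[seq])
    return namespace


# Hypothetical agreed log: global sequence number -> operation.
agreed_log = {
    1: ("create", "/a", "x"),
    2: ("modify", "/a", "y"),
    3: ("create", "/b", "z"),
    4: ("delete", "/b", None),
}

# Messages may reach each site in a different order over the WAN.
arrival_site_1 = [1, 2, 3, 4]
arrival_site_2 = [2, 4, 1, 3]

# Applying in the agreed sequence order gives identical namespaces everywhere.
site_1 = replay(sorted(arrival_site_1), agreed_log)
site_2 = replay(sorted(arrival_site_2), agreed_log)

# Applying in raw arrival order does not.
uncoordinated = replay(arrival_site_2, agreed_log)
print(site_1, site_2, uncoordinated)
```

The agreement layer's job in this picture is only to fix the sequence numbers; once the order is agreed, each site can apply operations locally without a leader.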
How DConE Works 
WANdisco Active/Active Replication 
• Majority Quorum 
– A fixed number of participants 
– The Majority must agree for change 
• Failure 
– Failed nodes are unavailable 
– Normal operation continues on nodes with quorum 
• Recovery / Self Healing 
– Nodes that rejoin stay in safe mode until they are caught up 
• Disaster Recovery 
– A complete loss can be brought back from another replica 
[Figure: three nodes, A, B and C, each holding the same sequence of transactions (TX ids 168 to 173). Proposals 170, 171, 172 and 173 are each answered with Agree messages.]
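The quorum, failure and recovery bullets above can be sketched as follows. This is a simplified model under stated assumptions: the `Node` class, the `catch_up` helper and the operation strings are hypothetical, and only the transaction ids are borrowed from the slide.

```python
def majority(total: int) -> int:
    """Smallest number of participants that forms a majority quorum."""
    return total // 2 + 1


def can_commit(total: int, alive: int) -> bool:
    """A change can be agreed only while a majority is reachable."""
    return alive >= majority(total)


def tolerated_failures(total: int) -> int:
    """How many participants can fail while the rest keep a quorum."""
    return total - majority(total)


class Node:
    """Hypothetical replica holding an ordered log: txid -> operation."""

    def __init__(self, log=None):
        self.log = dict(log or {})
        self.safe_mode = False


def catch_up(rejoining: Node, healthy: Node) -> None:
    """A rejoining node stays in safe mode until it has every agreed transaction."""
    rejoining.safe_mode = True
    for txid in sorted(healthy.log):
        if txid not in rejoining.log:
            rejoining.log[txid] = healthy.log[txid]
    rejoining.safe_mode = False


# Node A saw every transaction; node B was down while 170..173 were agreed.
node_a = Node({txid: f"op-{txid}" for txid in range(168, 174)})
node_b = Node({168: "op-168", 169: "op-169"})

print(tolerated_failures(3), tolerated_failures(5))
print(can_commit(3, alive=2), can_commit(3, alive=1))

catch_up(node_b, node_a)
print(node_b.log == node_a.log, node_b.safe_mode)
```

With a fixed membership of three, one node can be lost without stopping writes; with five, two can. The same catch-up path covers the disaster recovery bullet, since an empty node can be rebuilt from any healthy replica.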
Architecture of a Non-Stop Hadoop
Use Cases 
• Eliminate The Performance Bottleneck of a Single Active NameNode 
• Multi Data-Center Ingest 
– Information doesn't need to be sent to one DC and then copied back to the other using DistCp 
– Parallel ingest methods don’t require redirected data streams 
– Ingest data at, or close to the source 
– Global Analysis (Logs, Click Streams, etc…) 
• Cluster Zones 
– Efficient use of resource based on application profile 
– HBASE, IMPALA, Storm, Map Reduce, SPARK, etc… 
– Heterogeneous Clusters Supported 
• Maximize Data Center Resource Utilization 
– All datacenters can be used to run different jobs concurrently 
• Disaster Recovery 
– Data is as current as possible (no periodic syncs) 
– Virtually zero downtime to recover from regional data center failure 
– Regulatory compliance
Use Case: Heterogeneous Hardware 
• Optimized hardware profiles for job specific tasks 
– Batch 
– Real-time 
– NoSQL (HBASE) 
• Set replication factors per sub-cluster 
• Use at LAN or WAN scope 
• Resilient to NameNode failures
Use Case: Sub-Clusters 
• Maximize Resource Utilization 
– No idle standby 
• Isolate Dev and Test Clusters 
– Share data not resource 
• Carve off hardware for a specific group 
– Prevents a bad map/reduce job from bringing down the cluster 
• Guarantee Consistency and availability of data 
– Data is instantly available
Non-Stop Hadoop Demonstration
Question and Answer 
Feel free to submit your questions 
Q & A
Thank you

 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 

Último (20)

Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

SD Big Data Monthly Meetup #4 - Session 2 - WANDisco

  • 1. Non-Stop Hadoop: Enterprise Ready Hadoop. Presentation for Big Data Meetup, October 8, 2014
  • 2. REALIZING THE POSSIBILITIES OF BIG DATA 2 WWW.WANDISCO.COM
    WANdisco Background
    • WANdisco: Wide Area Network Distributed Computing
      – Enterprise ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability
    • Leader in tools for software engineers
      – Subversion
      – Apache Software Foundation sponsor
    • Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
    • US patent for active-active replication technology granted, November 2012
    • Global locations
      – San Ramon (CA)
      – Chengdu (China)
      – Tokyo (Japan)
      – Boston (MA)
      – Sheffield (UK)
      – Belfast (UK)
  • 3. Customers
  • 4. Non-Stop Hadoop
    Non-Intrusive Plugin
    Provides Continuous Availability
    In the LAN / Across the WAN
    Active/Active
  • 5. 3 Key Problems For Multi Cluster Hadoop
    LAN / WAN
  • 6. Enterprise Ready Hadoop
    Characteristics of Mission Critical Applications
    • Require 100% Uptime of Hadoop
      – SLAs, Regulatory Compliance
    • Require HDFS to be Deployed Globally
      – Share Data Between Data Centers
      – Data is Consistent and Not Eventual
    • Ease Administrative Burden
      – Reduce Operational Complexity
      – Simplify Disaster Recovery
      – Lower RTO/RPO
    • Allow Maximum Utilization of Resource
      – Within the Data Center
      – Across Data Centers
  • 7. Breaking Away from Active/Passive: What’s in a NameNode
    Single Standby
    • Inefficient utilization of resource
      – Journal Nodes
      – ZooKeeper Nodes
      – Standby Node
    • Performance Bottleneck
    • Still tied to the beeper
    • Limited to LAN scope
    Active / Active
    • All resources utilized
      – Only NameNode configuration
      – Scale as the cluster grows
      – All NameNodes active
    • Load balancing
    • Set resiliency (# of active NN)
    • Global Consistency
  • 8. Breaking Away from Active/Passive: What’s in a Data Center
    Standby Datacenter
    • Idle Resource
      – Single Data Center Ingest
      – Disaster Recovery Only
    • One way synchronization
      – DistCp
    • Error Prone
      – Clusters can diverge over time
    • Difficult to scale > 2 Data Centers
      – Complexity of sharing data increases
    Active / Active
    • DR Resource Available
      – Ingest at all Data Centers
      – Run Jobs in both Data Centers
    • Replication is Multi-Directional
      – active/active
    • Absolute Consistency
      – Single HDFS spans locations
    • ‘N’ Data Center support
      – Global HDFS allows appropriate data to be shared
  • 9. One Cluster Approach
    • Example Applications
      – HBASE
      – RT Query
      – Map Reduce
    • Poor Resource Management
      – Data Locality Issues
      – Network Use
      – Complex Multiple Clusters
  • 10. Creating Multiple Clusters
    • Example Applications
      – HBASE
      – RT Query
      – Map Reduce
    • Need to share data between clusters
      – DistCp / Stale Data
      – Inefficient use of storage and/or network
      – Some clusters may not be available
    [Diagram label: Multiple Clusters]
  • 11. Cluster Zones
    Zoning for Optimal Efficiency
    1 HDFS
    100% Consistency
  • 12. Multi Datacenter Hadoop Disaster Recovery
    • Absolute Consistency
    • Maximum Resource Use
    • Lower Recovery Time/Point
    • WAN Replication
    • Replicate Only What You Want
    • Better Utilization of Power/Cooling
    • Lower TCO
    • LAN Speed Performance
  • 13. Technical Overview: Hadoop Powered by WANdisco
  • 14. Multi Data Center Hadoop Today: What's wrong with the status quo
    • Periodic Synchronization: DistCp
    • Parallel Data Ingest: Load Balancer, Streaming
  • 15. Multi Data Center Hadoop Today: Hacks currently in use
    Periodic Synchronization: DistCp
    • Runs as MapReduce
    • DR Data Center is read only
    • Over time, Hadoop clusters become inconsistent
    • Manual and labor intensive process to reconcile differences
    • Inefficient use of the network
  • 16. Multi Data Center Hadoop Today: Hacks currently in use
    Parallel Data Ingest: Load Balancer, Flume
    • A hiccup in either Hadoop cluster causes the two file systems to diverge
    • Potential to run out of buffer when the WAN is down
    • Requires constant attention and sys-admin hours to keep running
    • Data created on the cluster is not replicated
    • Streaming technologies (like Flume) used for data redirection only handle streaming data
  • 17. PAXOS
    Paxos is a family of protocols for solving consensus in a network of unreliable processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
    DConE: Distributed Coordination Engine
    • WANdisco’s patented WAN capable Paxos implementation
      – Mathematically proven
      – Provides distributed co-ordination of file system metadata
    • Active/Active (All locations)
    • Create, Modify, Delete
    • Shared nothing (No Leader)
    • No restrictions on distance between datacenters
      – US Patent granted for time independent implementation of Paxos
    • Not based on SAN block device synchronization such as EMC SRDF
      – SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
      – Possible distribution of corrupted blocks
  • 18. How DConE Works: WANdisco Active/Active Replication
    • Majority Quorum
      – A fixed number of participants
      – The majority must agree for a change
    • Failure
      – Failed nodes are unavailable
      – Normal operation continues on nodes with quorum
    • Recovery / Self Healing
      – Nodes that rejoin stay in safe mode until they are caught up
    • Disaster Recovery
      – A complete loss can be brought back from another replica
    [Diagram: nodes A, B and C each hold the same transaction sequence (TX id 168 to 173); proposals 170 to 173 are exchanged between the nodes and agreed.]
  • 19. Architecture of a Non-Stop Hadoop
  • 20. Use Cases
    • Eliminate the Performance Bottleneck of a Single Active NameNode
    • Multi Data-Center Ingest
      – Information doesn't need to be sent to one DC and then copied back to the other using DistCp
      – Parallel ingest methods don’t require redirected data streams
      – Ingest data at, or close to, the source
      – Global Analysis (Logs, Click Streams, etc.)
    • Cluster Zones
      – Efficient use of resource based on application profile
      – HBASE, IMPALA, Storm, Map Reduce, SPARK, etc.
      – Heterogeneous Clusters Supported
    • Maximize Data Center Resource Utilization
      – All datacenters can be used to run different jobs concurrently
    • Disaster Recovery
      – Data is as current as possible (no periodic syncs)
      – Virtually zero downtime to recover from regional data center failure
      – Regulatory compliance
  • 21. Use Case: Heterogeneous Hardware
    • Optimized hardware profiles for job specific tasks
      – Batch
      – Real-time
      – NoSQL (HBASE)
    • Set replication factors per sub-cluster
    • Use at LAN or WAN scope
    • Resilient to NameNode failures
  • 22. Use Case: Sub-Clusters
    • Maximize Resource Utilization
      – No idle standby
    • Isolate Dev and Test Clusters
      – Share data not resource
    • Carve off hardware for a specific group
      – Prevents a bad map/reduce job from bringing down the cluster
    • Guarantee Consistency and availability of data
      – Data is instantly available
  • 23. Non-Stop Hadoop Demonstration
  • 24. Question and Answer: Feel free to submit your questions
  • 25. Thank you
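
The majority-quorum behaviour listed on slide 18 (fixed membership, majority agreement, continued operation while quorum holds, catch-up before a rejoining node takes part again) can be illustrated with a small Python sketch. This is a toy model only, not DConE and not a Paxos implementation: the `Node` and `Cluster` classes and every method name are hypothetical, and the agreed sequence is held in a single Python object, which a real shared-nothing system would not do.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical replica: tracks liveness and the transactions it has applied."""
    name: str
    up: bool = True
    log: list = field(default_factory=list)  # applied transactions, in agreed order


class Cluster:
    """Toy model of majority-quorum agreement over a fixed membership."""

    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}  # fixed number of participants
        self.agreed = []                          # globally agreed sequence (simplification)

    def quorum(self):
        # A strict majority of the fixed membership, not of the live nodes.
        return len(self.nodes) // 2 + 1

    def propose(self, tx):
        """Apply tx only if a majority of participants is available to agree."""
        live = [n for n in self.nodes.values() if n.up]
        if len(live) < self.quorum():
            return False                          # no quorum: the change is refused
        self.agreed.append(tx)
        for n in live:
            n.log.append(tx)
        return True

    def fail(self, name):
        self.nodes[name].up = False               # failed nodes are unavailable

    def rejoin(self, name):
        """Replay the missed transactions first (the 'safe mode' step), then go live."""
        node = self.nodes[name]
        node.log.extend(self.agreed[len(node.log):])
        node.up = True


if __name__ == "__main__":
    c = Cluster(["A", "B", "C"])
    print(c.propose(170))   # all three nodes available
    c.fail("C")
    print(c.propose(171))   # two of three still form a majority
    c.fail("B")
    print(c.propose(172))   # one of three: no quorum
    c.rejoin("C")
    print(c.nodes["C"].log) # C has replayed what it missed
```

Quorum is computed against the fixed membership rather than the currently live nodes, so a minority partition cannot accept changes on its own.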