Page 1
DataWorks Summit - Breakout Session
Disaster Recovery Experience at CACIB:
Hardening Hadoop for Critical Financial Applications
March 21st, 2019
Abdelkrim HADJIDJ – Cloudera
Mohamed Mehdi BEN AISSA – CA-GIP
Page 2
Speakers
Mohamed Mehdi BEN AISSA
Big Data Technical Architect at CA-GIP
Big Data Infrastructure Technical Owner for CA-CIB
Abdelkrim HADJIDJ
Solution Engineer at Cloudera
Page 3
Agenda
Big Data at CA-GIP & CA-CIB
Disaster Recovery Strategies
Stretch Cluster : Architecture & Configuration
Questions & Answers
Page 4
Big Data at CA-GIP & CA-CIB
Page 5
Big Data at CA-GIP & CA-CIB
CA-GIP (Infrastructure B&R):
• Creation date : 2019
• 80% of CA Group infrastructure
• 1500 collaborators
• 17 sites in France
CA-CIB:
• The world's n°13 bank (in 2017, measured by Tier One Capital)
• 8000 collaborators
• 36 locations around the world
Big Data platform:
• 15 Big Data experts (Run Team & Build Team)
• 8PB of Big Data storage
• 36TB of memory
• 4000 cores
Page 6
Big Data at CA-GIP & CA-CIB : Use Cases
Risk
Management
Decision
Making
Cash
Management
Regulations
Page 7
Big Data at CA-GIP & CA-CIB : Principal Use Cases
Risk Management/ Regulation
• Aims to replace the current market risk ecosystem and phase out the legacy system
(over 10 applications to decommission) to provide the bank with a golden source on
deal & risk indicators across business lines and worldwide
• Address ongoing and future regulations (LBF/Volcker rules, FRTB, BCBS239, Initial
Margin, Stress EBA/AQR …)
• 3PB of Data on Production to date
Cash Management Transformation
• Strategic program for CA-CIB new business
• Real time Transaction Processing
• Redesign the payment information system (SI) for CACIB and international deployment
• Target : 800 million transactions/day (8 TB/day)
Data-Lake
Real Time
Processing
Page 8
Big Data at CA-GIP & CA-CIB : Service Offer Architecture
INGESTION  PROCESSING  ACCESS
Scheduling, Security, Monitoring & Administration
STORAGE &
MESSAGING
DATA SOURCES
Data storage
Messaging
Batch processing
Stream
Processing
App 1
App 2
…
App n
Records
Documents
Files
Messages
Streams
Dataviz Data Governance
APPLICATIONS
Batch Mode
Stream Mode
Data query (SQL)
NoSQL Database
Indexed Data
OLAP
RAW DATA  /  ENHANCED DATA  /  OPTIMIZED DATA
Page 9
Big Data at CA-GIP & CA-CIB : Service Level Agreements
Disaster Recovery Performance Security
Resiliency
Service Availability
24/24 7/7
Zero Data Loss
Distributed Systems
Scalability
Data Locality
In-Memory Processing
Authentication
Authorization
Data Protection
Audit
Page 10
Disaster Recovery Strategies
Page 11
Disaster Recovery vs Backup vs Archive
Disaster Recovery (DR)
• Protects from the complete outage of a data center (e.g. natural disaster)
• Disaster Recovery includes replication, but also incorporates failover and failback
• Disaster Recovery Site can be an on-premise or cloud cluster
Backup / Restore
• Protects against logical errors (e.g. accidental deletion, corruption of data, etc)
• Incremental/full backup mechanisms are required to restore data to a previous Point
In Time (PIT) version. This usually involves a snapshot mechanism for PIT protection.
• Backups/snapshots are kept for a relatively short time (from days to months)
Archive
• A single static copy of data for long-term preservation (several years)
• This is required by some regulations
Page 12
Objective of a Disaster Recovery plan
• SLA (Service-Level Agreement) : Particular aspects of the service (quality, availability,
responsibilities) :
• RTO (Recovery Time Objective) : acceptable service interruption measured in time
• RPO (Recovery Point Objective) : maximum acceptable amount of data loss measured in
time
Goals : minimize service interruption (RTO), minimize data loss (RPO), reduce costs (€), guarantee consistency, optimize performance
Page 13
DR options
Three options, each spanning DC1 and DC2 (the multi-DC option adds a third DC) :
• Dual ingest : Low RPO/RTO
• Mirroring : High RPO/RTO
• Multiple DC : Low RPO/RTO
Page 14
Dual ingest
DR Cluster
PROD Cluster
Synchronicity Checks / Checksums
Pub-sub/
Streaming / Batch
Routing
Data sources Global
Traffic
Manager
Local Traffic
Manager
Local Traffic
Manager
End Applications/
Users
• Significant investment
• Might meet RPO=0 (in sync)
• Active/active site
Page 15
Dual ingest pros and cons
Pros
• Very low RPO/RTO (almost 0)
• Dual run makes failover and failback
easier
• Easy to implement from an infrastructure standpoint; tools like NiFi or Kafka make implementation easier
• Helps detect application bugs/errors (except ML)
Cons
• Requires two clusters with preferably
iso-resources
• Requires dual configuration injection (and automation)
• Impact on applications makes implementation complex (e.g. self-service)
• Requires a cluster diff implementation
• Data export should be run once
Page 16
Mirroring
Raw Data Ingest
Replicated Data
PROD Cluster DR Cluster
Pub-sub/
Streaming / Batch
Routing
Global
Traffic
Manager
Local Traffic
Manager
Local Traffic
Manager
End Applications/
Users
• Can meet RPO = 1h to 24 hrs
• Active/passive site
Data sources
Page 17
Mirroring pros and cons
Pros
• Loose requirements, easy to
implement
• Big Data technologies are designed
for this architecture
• Better performance (throughput,
network, latency)
• Can support other use cases
(isolation, geo-locality, legal, etc)
Cons
• Requires two clusters
• High RPO: potential data loss (async replication) that could be recovered from the source
• Requires a replication layer
• Need to define fail-over/fail-back logic and processes that go beyond just data
Page 18
Things to consider for mirroring
Applications
(Spark jobs, Hive queries, Zeppelin
notebooks, etc)
Data
(HDFS Files, Hive tables, Kafka
msgs, etc)
Infrastructure
(network, hardware, etc)
Configurations
(OS, Binaries, Ambari, Agents, RPM,
etc)
Process
(SLAs, Business
Continuity, Dev, etc)
Metadata
(Atlas, Ranger, Topics, etc)
Client configurations
(BI tools, HBase client, REST API, etc)
Infrastructure
services
(LDAP, AD, LB, etc)
Page 19
Replication tools
Diagram : two HDFS clusters (NameNode + DataNodes each); HDFS data is copied from the source cluster to the DR cluster, with inotify events from the source NameNode signalling namespace changes.
Page 20
What RPO can we realistically target?
We can achieve a smaller replication frequency and a better RPO (e.g. 10 mins) – but
this depends on several parameters:
Data : data volume, data bursts, # of partitions/files/tables, insert vs update ratio
Infrastructure : internal/external bandwidth, latency, dedicated/shared (day/time), CPU **
Software : synchronicity*, incremental replication, latency (snapshots, compression, encryption, integrity)
* Synchronous: very low RPO by throttling writes (impact on performance)
** Asynchronous: RPO = F( max(data_generation_rate), available_bandwidth )
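As a rough illustration of the asynchronous formula above (hypothetical numbers, not CACIB figures): if ingestion bursts to 1 TB/hour while the replication link can only drain 0.5 TB/hour, the DR copy falls behind by 0.5 TB for every hour the burst lasts, and the effective RPO stretches to however long it takes to absorb that backlog once the burst ends. Only the ratio between the peak data generation rate and the available bandwidth matters, which is exactly what the formula captures.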
Page 21
Spanning Multiple Data Centers
Data sources
Raw Data Ingest
DC1 DC2
DN
NN1
ZK1 JN1
DN
NN2
ZK2 JN2
Traffic
Manager
End Applications/
Users
DC3 (witness)
ZK3 JN3
• Restricted to data centers within a geographic region (a few km).
• Strong constraints: 3 DCs, single digit ms latency,
guaranteed bandwidth *
• Multi-DC is not native in Hadoop
* https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_metal.pdf
Page 22
Multiple Data Centers pros and cons
Pros
• Better RPO (synch replication)
• Cheaper, it’s just one cluster
• Simpler for applications
• No need for fail-over/fail-back
Cons
• Strong constraints: nearby 3 DCs, single
digit ms latency, guaranteed bandwidth *
• Advanced configurations: replica placement strategy, YARN labels, etc
• Performance impact from the inter-DC network
• Not suited for all the animals in the Zoo (e.g. streaming)
* https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_metal.pdf
Page 23
Stretch Cluster : Architecture & Configuration
Page 24
Stretch Cluster : Why !?
• SLA (Service-Level Agreement) : Particular aspects of the service (quality, availability,
responsibilities) :
• RTO (Recovery Time Objective) : The targeted duration of time and a service level within
which a business process must be restored after a disaster
• RPO (Recovery Point Objective) : The maximum targeted period in which data might be
lost
• Goals :
RTO -> 0 (24/7), RPO = 0, Reduce Costs (€), Consistency, Performance
Page 25
Stretch Cluster : Why !?
• SLA (Service-Level Agreement) : Particular aspects of the service (quality, availability,
responsibilities) :
• RTO (Recovery Time Objective) : The targeted duration of time and a service level within
which a business process must be restored after a disaster
• RPO (Recovery Point Objective) : The maximum targeted period in which data might be
lost
• Goals :
RTO -> 0 (24/7), RPO = 0, Reduce Costs (€), Consistency, Performance
Financial Context
Page 26
Stretch Cluster : Architecture
Control Nodes
Gateway Node
Witness Nodes
Master Nodes
Worker Nodes
Gateway Node
DC1 DC2
DC3
Page 27
Stretch Cluster for HDFS: Architecture & Configuration
Page 28
Stretch Cluster : HDFS Architecture
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK : Zookeeper
JN : JournalNode
NN : NameNode
ZK + JN
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
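One consequence of this layout is worth spelling out: with five ZooKeeper servers and five JournalNodes spread 2 + 2 + 1 across the three sites, the loss of any single data center leaves at least three of the five members running, so both the ZooKeeper ensemble and the JournalNode quorum keep their majority and the NameNode in the surviving data center can remain (or become) Active.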
Page 29
Stretch Cluster : HDFS Architecture – Before Rack (One-Layer) Awareness
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK + JN
B1
B1
B1
B1
2 replicas per DC / 1 replica per Rack
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
NN : NameNode
Page 30
Stretch Cluster : HDFS Architecture – After Rack (One-Layer) Awareness
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK + JN
B1
B1
B1B1
Rack Awareness Configuration
/dc1/rack1 /dc1/rack2 /dc2/rack3 /dc2/rack4
1
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
2 replicas per DC / 1 replica per Rack
ZK : Zookeeper
JN : JournalNode
NN : NameNode
Page 31
Stretch Cluster : HDFS Architecture – After Rack (One-Layer) Awareness
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK + JN
B1
B1
B1B1
Rack Awareness Configuration
/dc1/rack1 /dc1/rack2 /dc2/rack3 /dc2/rack4
1
2 replicas per DC / 1 replica per Rack
ZK : Zookeeper
JN : JournalNode
NN : NameNode
Page 32
Stretch Cluster : HDFS Architecture – After Rack (One-Layer) Awareness
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK + JN
B1
B1
B1B1
Rack Awareness Configuration
/dc1/rack1 /dc1/rack2 /dc2/rack3 /dc2/rack4
1
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
2 replicas per DC / 1 replica per Rack
HDFS (Default) Block Placement Strategy :
• One replica on local Node
• Second replica on a remote Rack
• Third replica on same remote Rack
• Additional replicas are randomly placed
ZK : Zookeeper
JN : JournalNode
NN : NameNode
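A minimal sketch of how the /dcX/rackY paths above could be declared, assuming the static TableMapping resolver is used rather than a topology script (the hostnames, file path and mapping file are illustrative, not taken from the CACIB deployment):

    <!-- core-site.xml (sketch) -->
    <property>
      <name>net.topology.node.switch.mapping.impl</name>
      <value>org.apache.hadoop.net.TableMapping</value>
    </property>
    <property>
      <name>net.topology.table.file.name</name>
      <value>/etc/hadoop/conf/topology.data</value>
    </property>
    <!-- /etc/hadoop/conf/topology.data maps each DataNode host to a
         DC-prefixed rack path, one "hostname rack" pair per line, e.g.:
         datanode1.example.com   /dc1/rack1
         datanode4.example.com   /dc1/rack2
         datanode7.example.com   /dc2/rack3
         datanode10.example.com  /dc2/rack4 -->

Ambari deployments would more commonly set the rack of each host through the UI or API; whichever mechanism is used, the important point is that the rack string carries the data-center prefix so the placement policy can distinguish the two sites.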
Page 33
Stretch Cluster : HDFS Architecture – Advanced Configuration (Two-Layers Awareness)
Topology (Data Center) Awareness & advanced Replicator
• core-site.xml
• net.topology.impl -> org.apache.hadoop.net.NetworkTopologyWithNodeGroup
• net.topology.nodegroup.aware -> true
• dfs.block.replicator.classname -> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup
Adjust Timeouts (RTO -> 0)
• core-site.xml
• dfs.heartbeat.interval
• dfs.namenode.heartbeat.recheck-interval
Recovery from Close Failure (DFSOutputStream)
• hdfs-site.xml
• dfs.client.block.write.replace-datanode-on-failure.enable -> true
• dfs.client.block.write.replace-datanode-on-failure.best-effort -> true
2
3 4
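A hedged sketch of how these properties might be laid out in the corresponding *-site.xml files (values are illustrative, not the tuned production settings; the deck lists some of them under core-site.xml, which also works since Hadoop daemons load both files):

    <!-- core-site.xml (sketch) : two-layer (node group / data center) topology -->
    <property>
      <name>net.topology.impl</name>
      <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
    </property>
    <property>
      <name>net.topology.nodegroup.aware</name>
      <value>true</value>
    </property>

    <!-- hdfs-site.xml (sketch) : node-group aware placement, faster failure
         detection and tolerant write-pipeline recovery -->
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
    </property>
    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value>       <!-- seconds between DataNode heartbeats -->
    </property>
    <property>
      <name>dfs.namenode.heartbeat.recheck-interval</name>
      <value>45000</value>   <!-- ms; lowering it shortens the ~10 min default dead-DataNode detection delay -->
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
      <value>true</value>
    </property>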
Page 34
Stretch Cluster : HDFS Architecture – After Rack Awareness & Advanced Configuration
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK + JN
B1 B1 B1 B1
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
1 replica per Rack / 2 replicas per DC
ZK : Zookeeper
JN : JournalNode
NN : NameNode
Page 35
Stretch Cluster : HDFS Architecture – Failover Management
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK + JN
B1 B1 B1 B1
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
NN : NameNode
Page 36
Stretch Cluster : HDFS Architecture – Failover Management
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
Datanode 1
Datanode 2
Datanode 3
Datanode 4
Datanode 5
Datanode 6
Datanode 7
Datanode 8
Datanode 9
Datanode 10
Datanode 11
Datanode 12
ZK + JN + NN ZK + JN ZK + JN + NN ZK + JN
ZK + JN
B1 B1 B1 B1
Keep Only 2 replicas per DC
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
NN : NameNode
Page 37
Stretch Cluster for YARN: Architecture & Configuration
Page 38
Stretch Cluster : YARN Architecture
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
NodeManager 1
NodeManager 2
NodeManager 3
NodeManager 4
NodeManager 5
NodeManager 6
NodeManager 7
NodeManager 8
NodeManager 9
NodeManager 10
NodeManager 11
NodeManager 12
ZK + RM ZK ZK + RM ZK
ZK : Zookeeper
JN : JournalNode
RM : ResourceManager
ZK
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
Page 39
Stretch Cluster : YARN Architecture – Advanced Configuration
Topology (Data Center) Awareness : additional layer with node & Rack
• yarn-site.xml
• org.apache.hadoop.mapreduce.v2.app.rm.ScheduledRequestsWithNodeGroup ->
net.topology.with.nodegroup
• yarn.resourcemanager.scheduler.elements.factory.impl ->
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerElementsFactoryWithNodeGroup
Adjust Timeouts (RTO -> 0)
• core-site.xml
• ipc.client.connection.maxidletime
• yarn-site.xml
• yarn.nodemanager.health-checker.interval-ms
• yarn.nm.liveness-monitor.expiry-interval-ms
1
2
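The node-group scheduler classes above rely on the same two-layer topology extension as on the HDFS side; the timeout part, by contrast, uses standard YARN/IPC properties, and a hedged sketch could look like this (values are illustrative, not the CACIB production settings):

    <!-- core-site.xml (sketch) -->
    <property>
      <name>ipc.client.connection.maxidletime</name>
      <value>10000</value>    <!-- ms before an idle IPC connection is dropped -->
    </property>

    <!-- yarn-site.xml (sketch) : faster detection of lost NodeManagers -->
    <property>
      <name>yarn.nodemanager.health-checker.interval-ms</name>
      <value>60000</value>
    </property>
    <property>
      <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
      <value>120000</value>   <!-- default is 600000 ms (10 min) -->
    </property>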
Page 40
Stretch Cluster : YARN Architecture – Before Node Labels
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
NodeManager 1
NodeManager 2
NodeManager 3
NodeManager 4
NodeManager 5
NodeManager 6
NodeManager 7
NodeManager 8
NodeManager 9
NodeManager 10
NodeManager 11
NodeManager 12
ZK + RM ZK ZK + RM ZK
ZK
A1
A1A1
A1
Inter-DCs exchange Optimization
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
RM : ResourceManager
Page 41
Stretch Cluster : YARN Architecture – After Node Labels
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
NodeManager 1
NodeManager 2
NodeManager 3
NodeManager 4
NodeManager 5
NodeManager 6
NodeManager 7
NodeManager 8
NodeManager 9
NodeManager 10
NodeManager 11
NodeManager 12
ZK + RM ZK ZK + RM ZK
ZK
Node Labels Configuration
Node.label: dc1 Node.label: dc2
A1
A1
A1
A1
3
Inter-DCs exchange Optimization
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
RM : ResourceManager
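The deck does not detail how the dc1/dc2 labels are assigned to the NodeManagers; one common approach, sketched here with centralized, non-exclusive node labels and illustrative hostnames and paths, is:

    <!-- yarn-site.xml (sketch) : enable node labels -->
    <property>
      <name>yarn.node-labels.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.node-labels.fs-store.root-dir</name>
      <value>hdfs:///system/yarn/node-labels</value>
    </property>
    <!-- With centralized label management, the dc1/dc2 labels would then be
         created and attached to NodeManagers from the command line, e.g.:
         yarn rmadmin -addToClusterNodeLabels "dc1(exclusive=false),dc2(exclusive=false)"
         yarn rmadmin -replaceLabelsOnNode "nodemanager1.example.com=dc1 nodemanager7.example.com=dc2"
         Queues are finally granted access to the labels in capacity-scheduler.xml. -->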
Page 42
Stretch Cluster : YARN Architecture – Failover
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
NodeManager 1
NodeManager 2
NodeManager 3
NodeManager 4
NodeManager 5
NodeManager 6
NodeManager 7
NodeManager 8
NodeManager 9
NodeManager 10
NodeManager 11
NodeManager 12
ZK + RM ZK ZK + RM ZK
ZK
Node.label: dc1 Node.label: dc2
A1
A1
A1
A1
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
RM : ResourceManager
Page 43
Stretch Cluster : YARN Architecture – Failover
DC1 DC2
DC3
Rack 1 Rack 2 Rack 3 Rack 4
NodeManager 1
NodeManager 2
NodeManager 3
NodeManager 4
NodeManager 5
NodeManager 6
NodeManager 7
NodeManager 8
NodeManager 9
NodeManager 10
NodeManager 11
NodeManager 12
ZK + RM ZK ZK + RM ZK
ZK
Node.label: dc1 Node.label: dc2
A1
A1
A1
A1
Automatic Failover Management
Inter-DCs Link
Bandwidth : 100 Gbits/s, Latency < 1ms
ZK : Zookeeper
JN : JournalNode
RM : ResourceManager
Page 44
Conclusion
Page 45
Conclusion
• DRP Tests & Concept Validation (including Infrastructures & Applications) :
• Disk Failure
• Node Failure
• Rack Failure
• DC Failure
• Inter-DCs Link Failure (avoid Split-Brain scenario)
• The Stretch Cluster is implemented and validated for all HDP components :
Ambari, Kafka, Storm, AMS, HBase, Ranger, etc.
• SLAs Validation : Performance, RPO=0, RTO -> 0, Consistency, etc.
• Advanced Monitoring : Infrastructure, Inter-DCs Link, Applications, etc.
Page 46
Questions ?
Más contenido relacionado

La actualidad más candente

서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)
서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)
서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017
클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017
클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017Amazon Web Services Korea
 
AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화
AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화
AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화Amazon Web Services Korea
 
Terraform in deployment pipeline
Terraform in deployment pipelineTerraform in deployment pipeline
Terraform in deployment pipelineAnton Babenko
 
Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019Anton Babenko
 
Drupalによる大規模サイトの設計・実装 において何に気をつけるべきか
Drupalによる大規模サイトの設計・実装において何に気をつけるべきかDrupalによる大規模サイトの設計・実装において何に気をつけるべきか
Drupalによる大規模サイトの設計・実装 において何に気をつけるべきかdgcircus
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 
[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...
[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...
[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...Amazon Web Services Korea
 
Hashicorp-Certified-Terraform-Associate-v3-edited.pptx
Hashicorp-Certified-Terraform-Associate-v3-edited.pptxHashicorp-Certified-Terraform-Associate-v3-edited.pptx
Hashicorp-Certified-Terraform-Associate-v3-edited.pptxssuser0d6c88
 
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...SlideTeam
 
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015Amazon Web Services Korea
 
FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...
FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...
FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...FIWARE
 
DSpace-CRIS, anticipating innovation
DSpace-CRIS, anticipating innovationDSpace-CRIS, anticipating innovation
DSpace-CRIS, anticipating innovation4Science
 
Observability and Management on OCI - Logging and Monitoring
Observability and Management on OCI - Logging and MonitoringObservability and Management on OCI - Logging and Monitoring
Observability and Management on OCI - Logging and MonitoringKnoldus Inc.
 
Getting started with DSpace 7 REST API
Getting started with DSpace 7 REST APIGetting started with DSpace 7 REST API
Getting started with DSpace 7 REST API4Science
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform TrainingYevgeniy Brikman
 

La actualidad más candente (20)

서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)
서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)
서버리스 IoT 백엔드 개발 및 구현 사례 : 윤석찬 (AWS 테크에반젤리스트)
 
클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017
클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017
클라우드 마이그레이션을 통한 비지니스 성공 사례- AWS Summit Seoul 2017
 
AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화
AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화
AWS 12월 웨비나 │성공적인 마이그레이션을 위한 클라우드 아키텍처 및 운영 고도화
 
Terraform in deployment pipeline
Terraform in deployment pipelineTerraform in deployment pipeline
Terraform in deployment pipeline
 
Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019Terraform modules and some of best-practices - March 2019
Terraform modules and some of best-practices - March 2019
 
NiFi 시작하기
NiFi 시작하기NiFi 시작하기
NiFi 시작하기
 
Drupalによる大規模サイトの設計・実装 において何に気をつけるべきか
Drupalによる大規模サイトの設計・実装において何に気をつけるべきかDrupalによる大規模サイトの設計・実装において何に気をつけるべきか
Drupalによる大規模サイトの設計・実装 において何に気をつけるべきか
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...
[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...
[AWS Dev Day] 앱 현대화 | 코드 기반 인프라(IaC)를 활용한 현대 애플리케이션 개발 가속화, 우리도 할 수 있어요 - 김필중...
 
Anthos
AnthosAnthos
Anthos
 
02 terraform core concepts
02 terraform core concepts02 terraform core concepts
02 terraform core concepts
 
Hashicorp-Certified-Terraform-Associate-v3-edited.pptx
Hashicorp-Certified-Terraform-Associate-v3-edited.pptxHashicorp-Certified-Terraform-Associate-v3-edited.pptx
Hashicorp-Certified-Terraform-Associate-v3-edited.pptx
 
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
 
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
오토스케일링 제대로 활용하기 (김일호) - AWS 웨비나 시리즈 2015
 
FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...
FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...
FIWARE Global Summit - The Scorpio NGSI-LD Broker: Features and Supported Arc...
 
Terraform on Azure
Terraform on AzureTerraform on Azure
Terraform on Azure
 
DSpace-CRIS, anticipating innovation
DSpace-CRIS, anticipating innovationDSpace-CRIS, anticipating innovation
DSpace-CRIS, anticipating innovation
 
Observability and Management on OCI - Logging and Monitoring
Observability and Management on OCI - Logging and MonitoringObservability and Management on OCI - Logging and Monitoring
Observability and Management on OCI - Logging and Monitoring
 
Getting started with DSpace 7 REST API
Getting started with DSpace 7 REST APIGetting started with DSpace 7 REST API
Getting started with DSpace 7 REST API
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform Training
 

Similar a Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financial Applications

Audax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data ExplosionAudax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data Explosionactifio
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Sumeet Singh
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successDataWorks Summit
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Sumeet Singh
 
SQL PASS Taiwan 七月份聚會-1
SQL PASS Taiwan 七月份聚會-1SQL PASS Taiwan 七月份聚會-1
SQL PASS Taiwan 七月份聚會-1SQLPASSTW
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics
 
Cloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryCloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryAli Hodroj
 
Compare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerCompare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerAlexDepo
 
From Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the UnexpectedFrom Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the UnexpectedDataCore Software
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Prolifics
 
Cloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsCloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsVMware Tanzu
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-finalHaluk Ulubay
 
Tổng quan công nghệ Net backup - Phần 1
Tổng quan công nghệ Net backup - Phần 1Tổng quan công nghệ Net backup - Phần 1
Tổng quan công nghệ Net backup - Phần 1NguyenDat Quoc
 
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationInside Analysis
 
CtrlS: Cloud Solutions for Retail & eCommerce
CtrlS: Cloud Solutions for Retail & eCommerceCtrlS: Cloud Solutions for Retail & eCommerce
CtrlS: Cloud Solutions for Retail & eCommerceeTailing India
 
Data center disaster recovery.ppt
Data center disaster recovery.ppt Data center disaster recovery.ppt
Data center disaster recovery.ppt omalreda
 

Similar a Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financial Applications (20)

Audax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data ExplosionAudax Group: CIO Perspectives - Managing The Copy Data Explosion
Audax Group: CIO Perspectives - Managing The Copy Data Explosion
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
 
Greenplum feature
Greenplum featureGreenplum feature
Greenplum feature
 
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
Hadoop Summit San Jose 2015: What it Takes to Run Hadoop at Scale Yahoo Persp...
 
SQL PASS Taiwan 七月份聚會-1
SQL PASS Taiwan 七月份聚會-1SQL PASS Taiwan 七月份聚會-1
SQL PASS Taiwan 七月份聚會-1
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
 
Cloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryCloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster Recovery
 
Compare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL ServerCompare Clustering Methods for MS SQL Server
Compare Clustering Methods for MS SQL Server
 
From Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the UnexpectedFrom Disaster to Recovery: Preparing Your IT for the Unexpected
From Disaster to Recovery: Preparing Your IT for the Unexpected
 
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliabi...
 
Infrastructure Strategies 2007
Infrastructure Strategies 2007Infrastructure Strategies 2007
Infrastructure Strategies 2007
 
Cloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native appsCloud-Native Data: What data questions to ask when building cloud-native apps
Cloud-Native Data: What data questions to ask when building cloud-native apps
 
Data core overview - haluk-final
Data core overview - haluk-finalData core overview - haluk-final
Data core overview - haluk-final
 
Tổng quan công nghệ Net backup - Phần 1
Tổng quan công nghệ Net backup - Phần 1Tổng quan công nghệ Net backup - Phần 1
Tổng quan công nghệ Net backup - Phần 1
 
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
Denodo Platform 7.0: Redefine Analytics with In-Memory Parallel Processing an...
 
First in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter IntegrationFirst in Class: Optimizing the Data Lake for Tighter Integration
First in Class: Optimizing the Data Lake for Tighter Integration
 
CtrlS: Cloud Solutions for Retail & eCommerce
CtrlS: Cloud Solutions for Retail & eCommerceCtrlS: Cloud Solutions for Retail & eCommerce
CtrlS: Cloud Solutions for Retail & eCommerce
 
Data center disaster recovery.ppt
Data center disaster recovery.ppt Data center disaster recovery.ppt
Data center disaster recovery.ppt
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financial Applications

  • 1. Page 1 DataWorks Summit - Breakout Session Disaster Recovery Experience at CACIB: Hardening Hadoop for Critical Financial Applications March 21st, 2019 Abdelkrim HADJIDJ – Cloudera Mohamed Mehdi BEN AISSA – CA-GIP
  • 2. Page 2 Speakers Mohamed Mehdi BEN AISSA Big Data Technical Architect at CA-GIP Big Data Infrastructure Technical Owner for CA-CIB Abdelkrim HADJIDJ Solution Engineer at Cloudera
  • 3. Page 3 Agenda Big Data at CA-GIP & CA-CIB Disaster Recovery Strategies Stretch Cluster : Architecture & Configuration Questions & Answers
  • 4. Page 4 Big Data at CA-GIP & CA-CIB
  • 5. Page 5 Big Data at CA-GIP & CA-CIB Big Data at CA-GIP & CA-CIB 15 Infrastructure B&R Big Data Experts Big Data Run Team Big Data BuildTeam Big Data Storage 8PB 2019 80% 1500 CA Group Infrastructure Collaborators Sites in FranceCreation Date 17 8000 CollaboratorsThe world's n°13 bank * 13 36 Locations around World * In 2017, measured by Tier One Capital 36TB of Memory 4000 Cores
  • 6. Page 6 Big Data at CA-GIP & CA-CIB : Use Cases Risk Management Decision Making Cash Management Regulations
  • 7. Page 7 Big Data at CA-GIP & CA-CIB : Principal Use Cases Risk Management/ Regulation • Aims to replace the current market risk eco-system and phase out the legacy system (over 10 applications to decommission) to provide the bank with a golden source on deal & risk indicators across business lines and worldwide • Address ongoing and future regulations (LBF/Volker rules, FRTB, BCBS239, Initial Margin, Stress EBA/AQR …) • 3PB of Data on Production to date Cash Management Transformation • Strategic program for CA-CIB new business • Real time Transaction Processing • Redesign the SI payment for CACIB and international deployment • Target : 800 millions transactions/day (8 TB/day) Data-Lake Real Time Processing
  • 8. Page 8 Big Data at CA-GIP & CA-CIB : Service Offer Architecture ACCESSPROCESSINGINGESTION Scheduling, Security, Monitoring & Administration STORAGE & MESSAGING DATA SOURCES Data storage Messaging Batch processing Stream Processing App 1 App 2 … App n Records Documents Files Messages Streams Dataviz Data Governance APPLICATIONS Batch Mode Stream Mode Data query (SQL) NoSQL Database Indexed Data OLAP RAW DATA ENHANCED DATA OPTIMIZED DATA RAW DATA ENHANCED DATA OPTIMIZED DATA
  • 9. Page 9 Big Data at CA-GIP & CA-CIB : Service Level Agreements Disaster Recovery Performance Security Resiliency Service Availability 24/24 7/7 Zero Data Loss Distributed Systems Scalability Data Locality In-Memory Processing Authentication Authorization Data Protection Audit
  • 11. Page 11 Disaster Recovery vs Backup vs Archive Disaster Recovery (DR) • Protects from the complete outage of a data center (eg. Natural disaster) • Disaster Recovery includes replication, but also incorporates failover and failback • Disaster Recovery Site can be an on-premise or cloud cluster Backup / Restore • Protects against the logical errors (e.g. accidental deletion, corruption of data, etc) • Incremental/full backup mechanisms are required to restore data from previous Point In Time version (PIT). This usually involves a snapshot mechanism for PIT protection. • Backups/Snapshots are kept for relatively short time (from days to months) Archive • A single static copy of data for long-term preservation (several years) • This is required by some regulations
  • 12. Page 12 Objective of a Disaster Recovery plan • SLA (Service-Level Agreement): the agreed aspects of the service (quality, availability, responsibilities), expressed through: • RTO (Recovery Time Objective): the acceptable service interruption, measured in time • RPO (Recovery Point Objective): the maximum acceptable amount of data loss, measured in time • Goals: minimize service interruption (RTO), minimize data loss (RPO), reduce costs, guarantee consistency, optimize performance
  • 13. Page 13 DR options [Three diagrams comparing the options across data centers] • Dual ingest: data is sent to both DC1 and DC2 – low RPO/RTO • Mirroring: data lands in DC1 and is replicated to DC2 – high RPO/RTO • Multiple DC: a single cluster spans DC1, DC2 and DC3 – low RPO/RTO
  • 14. Page 14 Dual ingest [Diagram: data sources are routed by a global traffic manager to both the PROD cluster and the DR cluster via pub-sub / streaming / batch ingestion, with synchronicity checks / checksums between the two; end applications and users reach either site through local traffic managers] • Significant investment • Might meet RPO=0 (in sync) • Active/active site
  • 15. Page 15 Dual ingest pros and cons Pros • Very low RPO/RTO (almost 0) • Dual run makes failover and failback easier • Easy to implement from an infrastructure standpoint; tools like NiFi or Kafka make implementation easier • Helps detect application bugs/errors (except ML) Cons • Requires two clusters, preferably with identical resources • Requires injecting configurations twice (and automation) • Impact on applications makes implementation complex (e.g. self-service) • Requires a cluster diff implementation • Data exports should be run only once
  • 16. Page 16 Mirroring [Diagram: raw data is ingested into the PROD cluster and replicated to the DR cluster; data sources are routed by a global traffic manager via pub-sub / streaming / batch ingestion, and end applications/users reach the sites through local traffic managers] • Can meet RPO = 1 h to 24 h • Active/passive site
  • 17. Page 17 Mirroring pros and cons Pros • Loose requirements, easy to implement • Big Data technologies are designed for this architecture • Better performance (throughput, network, latency) • Can support other use cases (isolation, geo-locality, legal, etc.) Cons • Requires two clusters • High RPO: potential data loss (async replication) that could be recovered from the source • Requires a replication layer • Need to define a fail-over/fail-back logic and process that goes beyond just data
  • 18. Page 18 Things to consider for mirroring Applications (Spark jobs, Hive queries, Zeppelin notebooks, etc.) Data (HDFS files, Hive tables, Kafka messages, etc.) Infrastructure (network, hardware, etc.) Configurations (OS, binaries, Ambari, agents, RPMs, etc.) Process (SLAs, business continuity, dev, etc.) Metadata (Atlas, Ranger, topics, etc.) Client configurations (BI tools, HBase client, REST API, etc.) Infrastructure services (LDAP, AD, LB, etc.)
  • 19. Page 19 Replication tools [Diagram: two HDFS clusters, each with a NameNode and DataNodes; inotify events on the source NameNode drive the replication of HDFS data to the target cluster]
  • 20. Page 20 What RPO can we realistically target? We can achieve a smaller replication frequency and a better RPO (e.g. 10 minutes), but this depends on several parameters: • Data: data volume, data bursts, number of partitions/files/tables, insert vs update ratio • Infrastructure: internal/external bandwidth, latency, dedicated/shared links (day/time), CPU • Software: synchronicity*, incremental replication, latency (snapshots, compression, encryption, integrity) * Synchronous: very low RPO, but only by throttling writes (impact on performance) ** Asynchronous: RPO = F( max(data_generation_rate), available_bandwidth )
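To make the asynchronous case concrete, here is a rough back-of-the-envelope bound (an illustration of ours, not a formula from the deck), valid only when the available replication bandwidth exceeds the peak data generation rate:

    \mathrm{RPO} \approx T_{\mathrm{cycle}} + \frac{r_{\max} \cdot T_{\mathrm{cycle}}}{BW_{\mathrm{avail}}}

For example, with a 10-minute replication cycle, a burst of 100 GB generated during that cycle and a dedicated 10 Gbit/s link (about 1.25 GB/s), the backlog drains in roughly 80 seconds, so the effective RPO stays around 11–12 minutes; if the generation rate exceeds the available bandwidth, the backlog grows and the RPO degrades without bound.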
  • 21. Page 21 Spanning Multiple Data Centers [Diagram: one cluster stretched over DC1 (DN, NN1, ZK1, JN1) and DC2 (DN, NN2, ZK2, JN2), with a witness DC3 hosting ZK3 and JN3; data sources feed raw data ingest and end applications/users are routed by a traffic manager] • Restricted to data centers within a geographic region (a few km) • Strong constraints: 3 DCs, single-digit ms latency, guaranteed bandwidth * • Multi-DC is not native in Hadoop * https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_metal.pdf
  • 22. Page 22 Multiple Data Centers pros and cons Pros • Better RPO (sync replication) • Cheaper: it's just one cluster • Simpler for applications • No need for fail-over/fail-back Cons • Strong constraints: 3 nearby DCs, single-digit ms latency, guaranteed bandwidth * • Advanced configurations: replica placement strategy, YARN labels, etc. • Performance impact from the inter-DC network • Not suited for all the animals in the zoo (e.g. streaming) * https://www.cloudera.com/documentation/other/reference-architecture/PDF/cloudera_ref_arch_metal.pdf
  • 23. Page 23 Stretch Cluster : Architecture & Configuration
  • 24. Page 24 Stretch Cluster : Why !? • SLA (Service-Level Agreement): the agreed aspects of the service (quality, availability, responsibilities): • RTO (Recovery Time Objective): the targeted duration of time and service level within which a business process must be restored after a disaster • RPO (Recovery Point Objective): the maximum targeted period in which data might be lost • Goals: 24/7 availability, RTO -> 0, RPO = 0, reduce costs, consistency, performance
  • 25. Page 25 Stretch Cluster : Why !? Same SLA definitions and goals as above, framed by the financial context: 24/7 availability, RTO -> 0, RPO = 0, reduce costs, consistency, performance
  • 26. Page 26 Stretch Cluster : Architecture [Diagram: the cluster spans three data centers – DC1 and DC2 each host control nodes, master nodes, worker nodes and a gateway node; DC3 hosts the witness nodes]
  • 27. Page 27 Stretch Cluster for HDFS: Architecture & Configuration
  • 28. Page 28 Stretch Cluster : HDFS Architecture [Diagram: DC1 (Rack 1, Rack 2) and DC2 (Rack 3, Rack 4) each host 6 DataNodes (12 in total); DC1 and DC2 each run a ZK + JN + NN node plus a ZK + JN node; DC3 runs a witness ZK + JN node] ZK : ZooKeeper, JN : JournalNode, NN : NameNode. Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 29. Page 29 Stretch Cluster : HDFS Architecture – Before Rack (One-Layer) Awareness [Diagram: placement of the four replicas of block B1 without rack/DC awareness – the desired placement of 2 replicas per DC / 1 replica per rack is not guaranteed] Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 30. Page 30 Stretch Cluster : HDFS Architecture – After Rack (One-Layer) Awareness • Step 1 – Rack awareness configuration: racks are mapped to /dc1/rack1, /dc1/rack2, /dc2/rack3 and /dc2/rack4 (see the configuration sketch below) [Diagram: replicas of B1 spread across the racks – 2 replicas per DC / 1 replica per rack]
  • 31. Page 31 Stretch Cluster : HDFS Architecture – After Rack (One-Layer) Awareness [Same diagram: resulting placement of B1's replicas with the rack awareness configuration applied]
  • 32. Page 32 Stretch Cluster : HDFS Architecture – After Rack (One-Layer) Awareness HDFS (default) block placement strategy: • One replica on the local node • Second replica on a remote rack • Third replica on the same remote rack • Additional replicas are randomly placed
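By way of illustration (a sketch of ours, not the production configuration): rack awareness in HDFS is typically fed by a topology script declared in core-site.xml; the script path and host names below are hypothetical, only the /dcX/rackY paths come from the slide.

    <!-- core-site.xml : declare a topology script (hypothetical path) -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>

The script simply echoes the /dcX/rackY location for each host name or IP passed to it, e.g. /dc1/rack1 for the first DataNode and /dc2/rack3 for a DataNode in the second data center.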
  • 33. Page 33 Stretch Cluster : HDFS Architecture – Advanced Configuration (Two-Layer Awareness) • Step 2 – Topology (data center) awareness & advanced replicator (core-site.xml): net.topology.impl -> org.apache.hadoop.net.NetworkTopologyWithNodeGroup; net.topology.nodegroup.aware -> true; dfs.block.replicator.classname -> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup • Step 3 – Adjust timeouts (RTO -> 0): dfs.heartbeat.interval; dfs.namenode.heartbeat.recheck-interval • Step 4 – Recovery from close failure (DFSOutputStream) (hdfs-site.xml): dfs.client.block.write.replace-datanode-on-failure.enable -> true; dfs.client.block.write.replace-datanode-on-failure.best-effort -> true
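A minimal sketch of these properties as they might appear in the config files (illustration only, not CA-GIP's exact files; the dfs.* properties are shown in hdfs-site.xml here, where they conventionally live, and the timeout values are placeholders to be tuned per cluster):

    <!-- core-site.xml : two-layer (data center + rack) topology -->
    <property>
      <name>net.topology.impl</name>
      <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
    </property>
    <property>
      <name>net.topology.nodegroup.aware</name>
      <value>true</value>
    </property>

    <!-- hdfs-site.xml : nodegroup-aware block placement and faster failure handling -->
    <property>
      <name>dfs.block.replicator.classname</name>
      <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
    </property>
    <property>
      <name>dfs.heartbeat.interval</name>
      <value>3</value>        <!-- seconds; placeholder, tune together with the recheck interval -->
    </property>
    <property>
      <name>dfs.namenode.heartbeat.recheck-interval</name>
      <value>45000</value>    <!-- milliseconds; placeholder, lowers dead-DataNode detection time for RTO -> 0 -->
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name>
      <value>true</value>     <!-- lets a client close a file even if a replacement DataNode cannot be found -->
    </property>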
  • 34. Page 34 Stretch Cluster : HDFS Architecture – After Rack Awareness & Advanced Configuration [Diagram: with the two-layer awareness, the four replicas of B1 are placed 2 per DC / 1 per rack across DC1 and DC2] Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 35. Page 35 Stretch Cluster : HDFS Architecture – Failover Management [Diagram: same topology, illustrating the failover scenario] Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 36. Page 36 Stretch Cluster : HDFS Architecture – Failover Management [Diagram: same topology; during failover, only 2 replicas per DC are kept]
  • 37. Page 37 Stretch Cluster for YARN: Architecture & Configuration
  • 38. Page 38 Stretch Cluster : YARN Architecture [Diagram: DC1 (Rack 1, Rack 2) and DC2 (Rack 3, Rack 4) each host 6 NodeManagers (12 in total); DC1 and DC2 each run a ZK + RM node plus a ZK node; DC3 runs a witness ZK node] ZK : ZooKeeper, RM : ResourceManager. Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 39. Page 39 Stretch Cluster : YARN Architecture – Advanced Configuration • Step 1 – Topology (data center) awareness, an additional layer on top of node & rack (yarn-site.xml): org.apache.hadoop.mapreduce.v2.app.rm.ScheduledRequestsWithNodeGroup -> net.topology.with.nodegroup; yarn.resourcemanager.scheduler.elements.factory.impl -> org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerElementsFactoryWithNodeGroup • Step 2 – Adjust timeouts (RTO -> 0): core-site.xml: ipc.client.connection.maxidletime; yarn-site.xml: yarn.nodemanager.health-checker.interval-ms, yarn.nm.liveness-monitor.expiry-interval-ms
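A minimal, illustrative sketch of the timeout side only (the values are placeholders, not CA-GIP's production settings; the nodegroup-aware scheduler classes above appear to be custom extensions and are therefore omitted here):

    <!-- yarn-site.xml : detect lost NodeManagers faster after a DC failure -->
    <property>
      <name>yarn.nodemanager.health-checker.interval-ms</name>
      <value>60000</value>    <!-- placeholder: run the node health check every minute -->
    </property>
    <property>
      <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
      <value>120000</value>   <!-- placeholder: declare a NodeManager dead after 2 minutes without heartbeats (default 10 minutes) -->
    </property>

    <!-- core-site.xml : drop idle IPC connections sooner -->
    <property>
      <name>ipc.client.connection.maxidletime</name>
      <value>10000</value>    <!-- placeholder, in milliseconds -->
    </property>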
  • 40. Page 40 Stretch Cluster : YARN Architecture – Before Node Labels [Diagram: the containers of application A1 are spread across DC1 and DC2, generating inter-DC exchanges that need to be optimized] Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 41. Page 41 Stretch Cluster : YARN Architecture – After Node Labels • Step 3 – Node labels configuration: NodeManagers in DC1 carry the node label dc1, NodeManagers in DC2 carry dc2 [Diagram: the containers of application A1 stay within a single DC, optimizing inter-DC exchanges]
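For illustration, a sketch of the standard YARN node-labels mechanism (the label names dc1/dc2 come from the slide; the store path and host names are hypothetical, and this is not necessarily the exact procedure used at CA-GIP):

    <!-- yarn-site.xml : enable node labels -->
    <property>
      <name>yarn.node-labels.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.node-labels.fs-store.root-dir</name>
      <value>hdfs:///yarn/node-labels</value>  <!-- hypothetical store location -->
    </property>

    <!-- Labels are then created and assigned with yarn rmadmin, e.g.:
         yarn rmadmin -addToClusterNodeLabels "dc1,dc2"
         yarn rmadmin -replaceLabelsOnNode "nodemanager1=dc1 nodemanager7=dc2"   (host names hypothetical) -->

Queues in the capacity scheduler must also be given access to the labels so that applications can request containers on dc1 or dc2 only.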
  • 42. Page 42 Stretch Cluster : YARN Architecture – Failover [Diagram: same topology with node labels dc1/dc2, illustrating the failover scenario] Inter-DC link: bandwidth 100 Gbit/s, latency < 1 ms
  • 43. Page 43 Stretch Cluster : YARN Architecture – Failover [Diagram: automatic failover management across the node-labelled DCs]
  • 45. Page 45 Conclusion • DRP tests & concept validation (covering infrastructure & applications): • Disk failure • Node failure • Rack failure • DC failure • Inter-DC link failure (avoiding the split-brain scenario) • The stretch cluster is implemented and validated for all HDP components: Ambari, Kafka, Storm, AMS, HBase, Ranger, etc. • SLA validation: performance, RPO = 0, RTO -> 0, consistency, etc. • Advanced monitoring: infrastructure, inter-DC link, applications, etc.

Editor's notes

  1. DR: be able to keep data, services and applications available even if a disaster causes the failure of a complete data center. A separate site (or sites) is used to recover from the disaster; it can be < 100 km away (dark fiber) or > 100 km away (WAN). Synchronous replication is desired (RPO is almost 0) but hard at large scale. Backup: a consistent backup occurs when the database is in a consistent state, meaning you can restore the backup and open it without performing media recovery. When a database is restored from an inconsistent backup, the database must perform media recovery before it can be opened, applying any pending changes from the redo logs.