Hadoop 1.x vs Hadoop 2
Rommel Garcia
Solutions Engineer - Big Data
Hortonworks
Transition To Big Data
Relational
Dimensional (EDW)
Big Data
Data Explosion
3 Design Dimensions
Key Hadoop Data Types
Sentiment
Clickstream
Sensor/Machine
Geographic
Server Logs
Text
Hadoop is NOT
ESB
NoSQL
HPC
Relational
Real-time
The “Jack of all Trades”
Hadoop 1
Limited to at most 4,000 nodes per cluster
Scalability: O(# of tasks in a cluster)
JobTracker bottleneck: resource management, job scheduling and monitoring
Only one namespace for managing HDFS
Map and Reduce slots are static
MapReduce is the only kind of job that can run
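To make the static-slot limitation concrete, here is a minimal sketch of one worker node with fixed map and reduce slots. The class name, field names and slot counts are hypothetical and chosen only for illustration; this is not Hadoop's scheduler code.

```python
from dataclasses import dataclass


@dataclass
class StaticSlotNode:
    """Hypothetical model of one Hadoop 1 worker with fixed slot counts."""
    map_slots: int
    reduce_slots: int

    def schedule(self, pending_maps: int, pending_reduces: int) -> dict:
        # A map task may only occupy a map slot, and a reduce task a reduce slot.
        running_maps = min(pending_maps, self.map_slots)
        running_reduces = min(pending_reduces, self.reduce_slots)
        idle = (self.map_slots - running_maps) + (self.reduce_slots - running_reduces)
        return {
            "running_maps": running_maps,
            "running_reduces": running_reduces,
            "idle_slots": idle,
            "waiting_maps": pending_maps - running_maps,
            "waiting_reduces": pending_reduces - running_reduces,
        }


node = StaticSlotNode(map_slots=4, reduce_slots=2)
result = node.schedule(pending_maps=10, pending_reduces=0)
print(result)
```

In this toy run, six map tasks wait while two reduce slots sit idle, because a slot's type cannot change at runtime.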
Hadoop 1 - Basics
[Diagram: data blocks labeled A, B and C spread across the cluster nodes]
MapReduce (Computation Framework)
HDFS (Storage Framework)
Hadoop 1 - Reading Files
[Diagram]
Components: Hadoop Client; NameNode (fsimage/edit); SNameNode; DN | TT worker nodes (DataNode | TaskTracker) in Rack1, Rack2, Rack3 ... RackN
Flow labels: read file; return DNs, block ids, etc.; read blocks; heartbeat/block report; checkpoint
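The read path on this slide can be sketched as a small in-memory model: the client asks the NameNode for block locations, then reads each block directly from a DataNode. The class names mirror the slide, but the methods, paths and block ids are illustrative assumptions, not the HDFS client API.

```python
class NameNode:
    """Holds metadata only: which blocks form a file and where replicas live."""

    def __init__(self):
        self.files = {}  # path -> list of (block_id, [DataNode names])

    def get_block_locations(self, path):
        return self.files[path]


class DataNode:
    """Stores the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


def read_file(path, namenode, datanodes):
    """Ask the NameNode where the blocks are, then read them from DataNodes."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        for dn_name in holders:
            blocks = datanodes[dn_name].blocks
            if block_id in blocks:  # otherwise try the next replica
                data += blocks[block_id]
                break
        else:
            raise IOError(f"no replica available for {block_id}")
    return data


datanodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
datanodes["dn2"].blocks["blk_1"] = b"hello "  # dn1 is listed first but has no copy
datanodes["dn2"].blocks["blk_2"] = b"world"
datanodes["dn3"].blocks["blk_2"] = b"world"

namenode = NameNode()
namenode.files["/logs/a.txt"] = [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])]

content = read_file("/logs/a.txt", namenode, datanodes)
print(content)
```

The NameNode never serves file data in this model; it only returns DataNode names and block ids, which matches the "return DNs, block ids, etc." label.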
Hadoop 1 - Writing Files
[Diagram]
Components: Hadoop Client; NameNode (fsimage/edit); SNameNode; DN | TT worker nodes in Rack1, Rack2, Rack3 ... RackN
Flow labels: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint
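Replication pipelining can be illustrated with a short recursive sketch: the client hands a block to the first DataNode, each DataNode stores it and forwards it to the next, and acknowledgements travel back up the chain. This is a simplified model with hypothetical names; real HDFS streams a block as packets and forwards them concurrently rather than storing a whole block before forwarding.

```python
class DataNode:
    """Hypothetical in-memory DataNode."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


def pipeline_write(block_id, data, pipeline):
    """Store the block on the head node, forward it downstream, collect acks.

    Returns the DataNode names in the order their acks reach the client side:
    the last node in the pipeline acknowledges first.
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    head.blocks[block_id] = data                  # store locally
    acks = pipeline_write(block_id, data, rest)   # forward to the next replica
    return acks + [head.name]                     # ack flows back upstream


dn1, dn2, dn3 = DataNode("dn1"), DataNode("dn2"), DataNode("dn3")
acks = pipeline_write("blk_1", b"payload", [dn1, dn2, dn3])
print(acks)
```

The client sends the data once; the DataNodes themselves carry it to the remaining replicas.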
Hadoop 1 - Running Jobs
[Diagram]
Components: Hadoop Client; JobTracker; DN | TT worker nodes in Rack1, Rack2, Rack3 ... RackN
Flow labels: submit job; deploy job; map; shuffle; reduce; part 0
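The map, shuffle and reduce stages named on this slide can be sketched in plain Python with a word count. The function names and the toy byte-sum partitioner are assumptions made for illustration; nothing here uses the Hadoop API.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs, num_reducers):
    """Group values by key and route each key to exactly one reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        index = sum(key.encode("utf-8")) % num_reducers  # toy partitioner
        partitions[index][key].append(value)
    return partitions


def reduce_phase(partition):
    """Sum the grouped counts for every key in one partition."""
    return {key: sum(values) for key, values in sorted(partition.items())}


lines = ["Hadoop stores data", "Hadoop processes data"]
partitions = shuffle_phase(map_phase(lines), num_reducers=2)
outputs = [reduce_phase(p) for p in partitions]  # one result per reducer
totals = {key: count for out in outputs for key, count in out.items()}
print(totals)
```

Each reducer produces its own output, which is why a job's results land in separate part files.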
Hadoop 1 - Security
[Diagram]
Components: Users; FIREWALL; LDAP/AD; Client Node/Spoke Server; KDC; Hadoop Cluster; Encryption Plugin
Flow labels: authN/authZ; service request; block token; delegation token
* block token is for accessing data
* delegation token is for running jobs
Hadoop 1 - APIs
org.apache.hadoop.mapreduce.Partitioner
org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Reducer
org.apache.hadoop.mapreduce.Job
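As an illustration of what a Partitioner decides, the sketch below mirrors the rule used by Hadoop's default HashPartitioner: clear the sign bit of the key's hash, then take it modulo the number of reduce tasks. It is written in Python and assumes Java String hashing over ASCII keys; Hadoop key types such as Text compute their hash differently, so the partition numbers are illustrative only.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() for ASCII strings (signed 32-bit)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def get_partition(key: str, num_reduce_tasks: int) -> int:
    """Clear the sign bit, then take the remainder to pick a reducer."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


keys = ["clickstream", "sensor", "server-logs", "sentiment"]
assignment = {key: get_partition(key, 4) for key in keys}
print(assignment)
```

Masking the sign bit keeps the result non-negative even when the hash itself is negative, so every key maps to a valid reducer index.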
Hadoop 2
Potentially up to 10,000 nodes per cluster
Scalability: O(cluster size)
Supports multiple namespaces for managing HDFS
Efficient cluster utilization (YARN)
MRv1 backward and forward compatible
Any application can integrate with Hadoop
Beyond Java
Hadoop 2 - Basics
Hadoop 2 - Reading Files (w/ NN Federation)
[Diagram]
Components: Hadoop Client; NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4 (fsimage/edit copy); SNameNode per NN or Backup NN per NN; DN | NM worker nodes (DataNode | NodeManager) in Rack1, Rack2, Rack3 ... RackN
Flow labels: read file; return DNs, block ids, etc.; read blocks; register/heartbeat/block report; checkpoint; fs sync
Block Pools: ns1 (dn1, dn2); ns2 (dn1, dn3); ns3 (dn4, dn5); ns4 (dn4, dn5)
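The relationship between namespaces and block pools can be sketched as follows: each NameNode owns one namespace and one block pool, while a DataNode may hold blocks from several pools and reports to each NameNode only the blocks in that NameNode's pool. The class names, pool ids and block ids are hypothetical.

```python
from collections import defaultdict


class FederatedDataNode:
    """Hypothetical DataNode that stores blocks for several block pools."""

    def __init__(self, name):
        self.name = name
        self.pools = defaultdict(dict)  # pool_id -> {block_id: bytes}

    def store(self, pool_id, block_id, data):
        self.pools[pool_id][block_id] = data

    def block_report(self, pool_id):
        """Report only the blocks that belong to one NameNode's pool."""
        return sorted(self.pools[pool_id])


class FederatedNameNode:
    """Hypothetical NameNode that manages one namespace and its block pool."""

    def __init__(self, namespace, pool_id):
        self.namespace = namespace
        self.pool_id = pool_id
        self.block_map = defaultdict(set)  # block_id -> {DataNode names}

    def receive_report(self, datanode):
        for block_id in datanode.block_report(self.pool_id):
            self.block_map[block_id].add(datanode.name)


dn1, dn2 = FederatedDataNode("dn1"), FederatedDataNode("dn2")
dn1.store("BP-ns1", "blk_1", b"a")
dn2.store("BP-ns1", "blk_1", b"a")
dn1.store("BP-ns2", "blk_7", b"b")

nn1 = FederatedNameNode("ns1", "BP-ns1")
nn2 = FederatedNameNode("ns2", "BP-ns2")
for nn in (nn1, nn2):
    for dn in (dn1, dn2):
        nn.receive_report(dn)

print(dict(nn1.block_map), dict(nn2.block_map))
```

Here dn1 serves both namespaces, yet neither NameNode learns about the other's blocks, so the namespaces stay independent while sharing the same storage nodes.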
Hadoop 2 - Writing Files
[Diagram]
Components: Hadoop Client; NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4 (fsimage/edit copy); SNameNode per NN or Backup NN per NN; DN | NM worker nodes in Rack1, Rack2, Rack3 ... RackN
Flow labels: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint; fs sync
Hadoop 2 - Running Jobs
[Diagram]
Components: Hadoop Client 1; Hadoop Client 2; ResourceManager (ASM, Scheduler, queues); three NodeManagers in each of Rack1, Rack2 ... RackN
Running on the NodeManagers: AM1 with containers C1.1, C1.2, C1.3, C1.4; AM2 with containers C2.1, C2.2, C2.3
Flow labels: create app1; submit app1; create app2; submit app2; status report
Legend:
ASM negotiates Containers
NM reports to ASM
Scheduler partitions Resources
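The flow on this slide can be sketched as a toy model: a client submits an application, the ResourceManager places a container for its ApplicationMaster, and the ApplicationMaster then requests task containers that land on whichever NodeManager has capacity. All class names, memory sizes and the "most free memory" placement rule are assumptions for illustration; this is not the YARN API or one of YARN's real schedulers.

```python
class NodeManager:
    """Hypothetical worker that tracks free memory and hosted containers."""

    def __init__(self, name, memory_mb):
        self.name = name
        self.free_mb = memory_mb
        self.containers = []


class ResourceManager:
    """Hypothetical cluster-wide allocator."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self.app_count = 0

    def allocate(self, label, memory_mb):
        """Place a container on the node with the most free memory (toy policy)."""
        node = max(self.node_managers, key=lambda nm: nm.free_mb)
        if node.free_mb < memory_mb:
            return None  # no node can fit the request
        node.free_mb -= memory_mb
        node.containers.append(label)
        return node.name

    def submit_app(self, am_memory_mb):
        """Register an application and start its ApplicationMaster container."""
        self.app_count += 1
        app_id = self.app_count
        if self.allocate(f"AM{app_id}", am_memory_mb) is None:
            raise RuntimeError("no capacity for the ApplicationMaster")
        return app_id


class ApplicationMaster:
    """Hypothetical per-application master that asks for task containers."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.granted = 0

    def request_containers(self, count, memory_mb):
        placed = []
        for _ in range(count):
            label = f"C{self.app_id}.{self.granted + 1}"
            node = self.rm.allocate(label, memory_mb)
            if node is None:
                break  # cluster is full; stop asking
            self.granted += 1
            placed.append((label, node))
        return placed


nodes = [NodeManager(f"nm{i}", 4096) for i in (1, 2, 3)]
rm = ResourceManager(nodes)

am1 = ApplicationMaster(rm.submit_app(1024), rm)
placed1 = am1.request_containers(4, 1024)

am2 = ApplicationMaster(rm.submit_app(1024), rm)
placed2 = am2.request_containers(3, 2048)

print(placed1, placed2)
```

Unlike fixed map and reduce slots, containers here are sized per request, so two applications with different memory needs share the same nodes.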
Hadoop 2 - Security
[Diagram]
Components: FIREWALL (shown twice); DMZ; LDAP/AD; Enterprise/Cloud SSO Provider; KDC; Knox Gateway Cluster; Hadoop Cluster; Native Hive/HBase Encryption
Clients: JDBC Client; REST Client; Browser (HUE)
Hadoop 2 - APIs
org.apache.hadoop.yarn.api.ApplicationClientProtocol
org.apache.hadoop.yarn.api.ApplicationMasterProtocol
org.apache.hadoop.yarn.api.ContainerManagementProtocol
Resources
http://hortonworks.com/products/hortonworks-sandbox/
http://hortonworks.com/products/hdp-2/
http://hortonworks.com/resources/
http://hadoopsummit.org/san-jose/
Hadoop Summit 2014
Thank you!
www.linkedin.com/in/rommelgarcia
twitter.com/rommelgarcia
rgarcia@hortonworks.com
Hortonworks
