ANATOMY OF FILE WRITE IN HADOOP
RAJESH_1290K@YAHOO.COM
A SAMPLE HADOOP CLUSTER
[Diagram: data center D1 contains a Name Node and two racks. Rack R1 holds nodes R1N1, R1N2, R1N3, R1N4; Rack R2 holds nodes R2N1, R2N2, R2N3, R2N4.]

1. This is our example Hadoop cluster.
2. It has one name node and two racks, R1 and R2, in a data center D1. Each rack has 4
nodes, uniquely identified as R1N1, R1N2, and so on.
3. Replication factor is 3.
4. HDFS block size is 64 MB.
FACTS TO KNOW
1. The name node saves part of the HDFS metadata, such as file locations and permissions,
in files called the namespace image and edit log. Files are stored in HDFS as blocks,
but block locations are not saved in any file. Instead they are gathered every time the
cluster is started, and this information is held in the name node’s memory.
2. Replica Placement: Assuming a replication factor of 3, when a file is written from
a data node (say R1N1), Hadoop attempts to save the first replica on that same data
node (R1N1). The second replica is written to a node (R2N2) on a different rack
(R2). The third replica is written to another node (R2N1) on the same rack (R2) that
holds the second replica.
3. Hadoop takes a simple approach in which the network is represented as a tree, and
the distance between two nodes is the sum of their distances to their closest
common ancestor. The levels are “Data Center” > “Rack” > “Node”.
For example, ‘/d1/r1/n1’ represents a node named n1 on rack r1 in data
center d1. Distance calculation has 4 possible scenarios:

1. distance(/d1/r1/n1, /d1/r1/n1) = 0 [processes on the same node]
2. distance(/d1/r1/n1, /d1/r1/n2) = 2 [different nodes on the same rack]
3. distance(/d1/r1/n1, /d1/r2/n3) = 4 [nodes on different racks in the same data center]
4. distance(/d1/r1/n1, /d2/r3/n4) = 6 [nodes in different data centers]
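The tree-distance rule above can be sketched in a few lines of Python. This is a conceptual illustration, not Hadoop's actual NetworkTopology code; the `/datacenter/rack/node` path format follows the example in the text.

```python
def distance(a: str, b: str) -> int:
    """Sum of each node's hops up to the closest common ancestor
    of two /datacenter/rack/node paths."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    # Length of the shared prefix = depth of the closest common ancestor.
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

print(distance("/d1/r1/n1", "/d1/r2/n3"))  # different racks -> prints 4
```

Running it against the four scenarios above reproduces the distances 0, 2, 4, and 6.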
ANATOMY OF FILE WRITE – HAPPY PATH
[Diagram: on R1N1’s JVM, the HDFS Client calls create() on DistributedFileSystem, which makes an RPC call to the Name Node to create “sfo_crimes.csv”; when the RPC call completes, the client receives an FSDataOutputStream wrapping a DFSOutputStream.]

• Let’s say we are trying to write the “sfo_crimes.csv” file from R1N1.
• So an HDFS client program will run on R1N1’s JVM.
• First the HDFS client program calls the method create() on the Java class
DistributedFileSystem (a subclass of FileSystem).
• DFS makes an RPC call to the name node to create a new file in the file system's
namespace. No blocks are associated with the file at this stage.
• The name node performs various checks: it ensures the file doesn't already exist and
that the user has the right permissions to create it. Then the name node creates a
record for the new file.
• Then DFS creates an FSDataOutputStream for the client to write data to. FSDOS wraps
a DFSOutputStream, which handles communication with the DNs and the NN.
• In response to ‘FileSystem.create()’, the HDFS client receives this FSDataOutputStream.
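The name node's part of this exchange can be modelled as a toy Python class. This is a sketch of the checks described above, not Hadoop's real RPC interface; the class and method names are invented for illustration.

```python
class NameNode:
    """Toy model of the name node's create-file checks (not real Hadoop)."""

    def __init__(self):
        self.namespace = {}  # file path -> list of blocks

    def create_file(self, path: str, user: str, can_write: bool = True):
        # Check 1: the file must not already exist.
        if path in self.namespace:
            raise FileExistsError(path)
        # Check 2: the user needs write permission.
        if not can_write:
            raise PermissionError(user)
        # Record the new file; no blocks are associated yet.
        self.namespace[path] = []

nn = NameNode()
nn.create_file("/sfo_crimes.csv", "rajesh")
```

After the call, the namespace holds a record for the file with an empty block list, matching the "no blocks at this stage" behaviour described above.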
[Diagram: on R1N1’s JVM, the HDFS Client calls write() on the FSDataOutputStream; inside the DFSOutputStream are a Data Queue, an Ack Queue, and a DataStreamer that communicates with the Name Node.]
• From now on, the HDFS client deals with the FSDataOutputStream.
• The HDFS client invokes write() on the stream.
• The following are the important components involved in a file write:
• Data Queue: when the client writes data, the DFSOS splits it into packets and
writes them into this internal queue.
• DataStreamer: this component consumes the data queue and also
communicates with the name node for block allocation.
• Ack Queue: packets consumed by the DataStreamer are temporarily saved in
this internal queue.
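The packet-splitting step can be sketched as follows. This is a toy model, not the real DFSOutputStream; the 64 KB packet size is HDFS's typical default, while the function name is invented for illustration.

```python
from collections import deque

PACKET_SIZE = 64 * 1024  # HDFS streams data in small packets (64 KB is typical)

def enqueue_write(data: bytes, data_queue: deque) -> None:
    """Toy DFSOutputStream.write(): split the data into fixed-size
    packets and append them to the internal data queue."""
    for i in range(0, len(data), PACKET_SIZE):
        data_queue.append(data[i:i + PACKET_SIZE])

dq = deque()
enqueue_write(b"x" * (150 * 1024), dq)  # 150 KB -> 3 packets (64 + 64 + 22 KB)
```

The last packet is simply whatever remains, so packets need not all be full-sized.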
[Diagram: packets P1–P6 in the Data Queue; the DataStreamer feeds a pipeline of data nodes R1N1 → R2N1 → R1N2.]

• As said, data written by the client is converted into packets and stored in the data queue.
• The DataStreamer communicates with the NN to allocate new blocks; the NN picks a list of
suitable DNs to store the replicas, using ‘Replica Placement’ as the strategy to pick
the DNs for a block.
• The list of DNs forms a pipeline. Since the replication factor is assumed to be 3,
the NN picks 3 nodes.
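The replica-placement strategy the NN applies here can be sketched as a small Python function. This is a simplified deterministic model (real HDFS chooses among candidates randomly); `pick_pipeline` and the rack map are invented for illustration.

```python
def pick_pipeline(writer: str, racks: dict) -> list:
    """Toy replica placement for factor 3: first replica on the writer's
    own node, second on a node in a different rack, third on another
    node in that second rack. `racks` maps node name -> rack name."""
    first = writer
    second = next(n for n, r in racks.items() if r != racks[writer])
    third = next(n for n, r in racks.items()
                 if r == racks[second] and n != second)
    return [first, second, third]

cluster = {"R1N1": "R1", "R1N2": "R1", "R2N1": "R2", "R2N2": "R2"}
print(pick_pipeline("R1N1", cluster))  # local node, off-rack node, its rack-mate
```

With the sample cluster this yields a write pipeline of one local replica and two replicas on the other rack, matching the replica-placement fact stated earlier.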
[Diagram: packets P3–P8 wait in the Data Queue while P1 and P2 sit in the Ack Queue; P1 is streamed down the pipeline R1N1 → R2N1 → R1N2, and each DN sends an ack back. The client may also call close() on the FSDataOutputStream.]
• The DataStreamer consumes a few packets from the data queue. A copy of the consumed data
is stored in the ack queue.
• The DataStreamer streams a packet to the first node in the pipeline. Once the data is
written to DN1, it is forwarded to the next DN. This repeats until the last DN.
• Once the packet is written to the last DN, an acknowledgement is sent from each DN back
to the DFSOS. The packet P1 is then removed from the ack queue.
• The whole process continues until a block is filled. After that, the pipeline is closed and
the DataStreamer asks the NN for a fresh set of DNs for the next block. The cycle then repeats.
• The HDFS client calls the close() method once the write is finished. This flushes all the
remaining packets into the pipeline and waits for acks before informing the NN that the
write is complete.
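The happy-path loop above can be simulated in a few lines. This toy model collapses the per-DN forwarding and acks into one synchronous pass; the function name and `stored` map are invented for illustration.

```python
from collections import deque

def stream_block(data_queue: deque, pipeline: list) -> dict:
    """Toy DataStreamer loop: push each packet through every DN in the
    pipeline, holding a copy in the ack queue until all DNs acknowledge."""
    ack_queue = deque()
    stored = {dn: [] for dn in pipeline}  # what each DN has written
    while data_queue:
        packet = data_queue.popleft()
        ack_queue.append(packet)          # keep a copy until acked
        for dn in pipeline:               # DN1 forwards to DN2, and so on
            stored[dn].append(packet)
        ack_queue.popleft()               # all DNs acked: drop the copy
    return stored

dq = deque(["P1", "P2", "P3"])
replicas = stream_block(dq, ["R1N1", "R2N1", "R1N2"])
```

At the end every DN in the pipeline holds every packet, the data queue is drained, and the ack queue is empty, which is exactly the state close() waits for.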
ANATOMY OF FILE WRITE – DATA NODE WRITE ERROR
[Diagram: the same write pipeline as before, with packets queued in the Data Queue and P1, P2 in the Ack Queue, at the moment the write to R2N1 fails.]
• A normal write begins with a write() call from the HDFS client on the stream. Let’s say an
error occurs while writing to R2N1.
• The pipeline is closed.
• Packets in the ack queue are moved to the front of the data queue.
• The current block on the good DNs is given a new identity, which is communicated to the NN,
so that the partial block on the failed DN will be deleted if that DN recovers later.
• The failed data node is removed from the pipeline, and the remaining data is written to the
remaining two DNs.
• The NN notices that the block is under-replicated and arranges for a further replica to be
created on another node.
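The queue shuffling during recovery can be sketched as follows. This is a toy model of the two recovery steps that involve the client side (requeuing unacked packets and shrinking the pipeline); the block re-identification on the NN is omitted, and the function name is invented for illustration.

```python
from collections import deque

def handle_dn_failure(data_queue: deque, ack_queue: deque,
                      pipeline: list, failed_dn: str) -> list:
    """Toy recovery: unacknowledged packets go back to the front of the
    data queue (preserving order) and the failed DN leaves the pipeline."""
    while ack_queue:
        # Pop from the back of the ack queue so the oldest unacked
        # packet ends up at the very front of the data queue.
        data_queue.appendleft(ack_queue.pop())
    pipeline.remove(failed_dn)
    return pipeline

dq = deque(["P3", "P4"])                  # not yet sent
aq = deque(["P1", "P2"])                  # sent but unacked
pipe = handle_dn_failure(dq, aq, ["R1N1", "R2N1", "R1N2"], "R2N1")
```

After recovery the data queue is P1, P2, P3, P4 in order, so no data is lost, and the write continues on the two surviving DNs until the NN re-replicates the block.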
THE END

SORRY FOR MY POOR ENGLISH. :)
PLEASE SEND YOUR VALUABLE FEEDBACK TO
RAJESH_1290K@YAHOO.COM
