INSE 6620 (Cloud Computing Security and Privacy)
Cloud Computing 101
Prof. Lingyu Wang
Enabling Technologies
Cloud computing relies on:
1. Hardware advancements
2. Web x.0 technologies
3. Virtualization
4. Distributed file system
Ghemawat et al., The Google File System; Dean et al., MapReduce: Simplified Data Processing on Large Clusters;
Chang et al., Bigtable: A Distributed Storage System for Structured Data
Google Server Farms
Early days…
…today
How Does it Work?
How are data stored?
The Google File System (GFS)
How are data organized?
The Bigtable
How are computations supported?
MapReduce
Google File System (GFS) Motivation
Need a scalable DFS for
Large distributed data-intensive applications
Performance, Reliability, Scalability and Availability
More than traditional DFS
Component failure is norm, not exception
built from inexpensive commodity components
Files are large (multi-GB)
Workloads: Large streaming reads, sequential writes
Co-design applications and file system API
Sustained bandwidth more critical than low latency
File Structure
Files are divided into chunks
Fixed-size chunks (64MB)
Replicated over chunkservers, called replicas
3 replicas by default
Unique 64-bit chunk handles
Chunks as Linux files
[Figure: a file is split into fixed-size chunks; each chunk is stored as blocks of a Linux file]
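A minimal sketch of this layout in Python (illustrative only: the class names, the create_file helper, and the "first three chunkservers" placement are invented here, not GFS's actual data structures or placement policy):

```python
import secrets
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024   # fixed-size 64MB chunks
DEFAULT_REPLICAS = 3            # 3 replicas by default

@dataclass
class Chunk:
    handle: int        # unique 64-bit chunk handle
    replicas: list     # chunkservers holding a copy of this chunk

@dataclass
class GFSFile:
    name: str
    chunks: list = field(default_factory=list)

def create_file(name: str, size_bytes: int, chunkservers: list) -> GFSFile:
    """Split a file into 64MB chunks, each with a fresh handle and 3 replicas."""
    f = GFSFile(name)
    n_chunks = (size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE
    for _ in range(n_chunks):
        handle = secrets.randbits(64)                       # 64-bit handle
        f.chunks.append(Chunk(handle, chunkservers[:DEFAULT_REPLICAS]))
    return f
```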
Architecture
[Figure: client exchanges metadata with the single master and data with the chunkservers]
Contact single master
Obtain chunk locations
Contact one of chunkservers
Obtain data
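The read path above can be written as hypothetical client-side pseudocode (the master.lookup and chunkserver read calls are stand-ins for illustration, not the real GFS API):

```python
CHUNK_SIZE = 64 * 1024 * 1024

def gfs_read(master, filename: str, offset: int, length: int) -> bytes:
    """Sketch of a GFS read: metadata from the master, data from a chunkserver."""
    data = b""
    while length > 0:
        chunk_index = offset // CHUNK_SIZE           # which chunk holds this offset
        chunk_offset = offset % CHUNK_SIZE
        # 1. Contact the single master: obtain chunk handle and replica locations
        chunk_handle, replicas = master.lookup(filename, chunk_index)
        # 2. Contact one of the chunkservers: obtain the data itself
        n = min(length, CHUNK_SIZE - chunk_offset)   # stay within this chunk
        data += replicas[0].read(chunk_handle, chunk_offset, n)
        offset += n
        length -= n
    return data
```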
Architecture - Master
Master stores three types of meta data
File & chunk namespaces
Mapping from files to chunks
Location of chunk replicas
Stored in memory
Heartbeats
Having one master
Global knowledge allows better placement / replication
Simplifies design
Mutation Operations
Primary replica
Holds lease assigned by master
Assigns serial order for all mutation
operations performed on replicas
Write operation
1-2: client obtains replica locations
and identity of primary replica
3: client pushes data to replicas
4: client issues update request to
primary
5: primary forwards/performs write request
6: primary receives replies from replicas
7: primary replies to client
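A compact sketch of the same write flow from the client's point of view (the master, primary, and replica objects and their methods are assumptions for illustration, not the real GFS interfaces):

```python
def gfs_write(master, filename, chunk_index, data):
    """Sketch of the GFS write protocol as seen by the client."""
    # 1-2: obtain replica locations and the identity of the primary (lease holder)
    chunk_handle, primary, secondaries = master.get_lease_holder(filename, chunk_index)
    # 3: push data to all replicas (buffered, not yet applied)
    for replica in [primary] + secondaries:
        replica.push_data(chunk_handle, data)
    # 4: issue the update request to the primary, which assigns the serial order
    # 5-6: the primary forwards the write to the secondaries and collects replies
    ok = primary.commit(chunk_handle, secondaries)
    # 7: the primary replies to the client
    return ok
```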
Fault Tolerance and Diagnosis
Fast Recovery
Both master and chunkserver are designed to restart in seconds
Chunk replication
Each chunk is replicated on multiple chunkservers on different racks
Master replication
Master’s state is replicated
Monitoring outside GFS may restart master process
Data integrity
Checksumming to detect corruption of stored data
Each chunkserver independently verifies integrity
same data may look different on different chunk servers
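A sketch of the checksumming idea (the GFS paper checksums each chunk in 64 KB blocks; the helpers below are illustrative, not the actual implementation):

```python
import zlib

BLOCK_SIZE = 64 * 1024  # each chunk is checksummed in 64 KB blocks

def compute_checksums(chunk_data: bytes) -> list[int]:
    """One CRC32 checksum per 64 KB block of a chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK_SIZE])
            for i in range(0, len(chunk_data), BLOCK_SIZE)]

def verify(chunk_data: bytes, stored_checksums: list[int]) -> bool:
    """A chunkserver independently verifies its own copy before serving it."""
    return compute_checksums(chunk_data) == stored_checksums
```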
Conclusion
Major Innovations
File system API tailored to stylized workload
Single-master design to simplify coordination
Metadata fit in memory
Flat namespace
MapReduce Motivation
Recall “Cost associativity”: 1k servers*1hr=1server*1k hrs
Nice, but how?
How to run my task on 1k servers?
Distributed computing, many things to worry about
Customized task, can’t use standard applications
MapReduce: a programming model/abstraction that supports this while hiding messy details:
Parallelization
Data distribution
Fault-tolerance
Load balancing
Map/Reduce
Map/Reduce
Inspired by LISP
(map square ‘(1 2 3 4))
(1 4 9 16)
(reduce + ‘(1 4 9 16))
(+ 16 (+ 9 (+ 4 1) ) )
30
(reduce + (map square (map - l1 l2)))
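The same expressions in Python's built-in map/reduce, as a rough equivalent of the LISP above (l1 and l2 are example lists chosen here):

```python
from functools import reduce
import operator

print(list(map(lambda x: x * x, [1, 2, 3, 4])))   # [1, 4, 9, 16]
print(reduce(operator.add, [1, 4, 9, 16]))        # (+ 16 (+ 9 (+ 4 1))) = 30

# (reduce + (map square (map - l1 l2))): sum of squared differences
l1, l2 = [5, 7], [2, 3]
print(reduce(operator.add, map(lambda x: x * x, map(operator.sub, l1, l2))))  # 9 + 16 = 25
```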
Programming Model
Input & Output: each a set of key/value pairs
Programmer specifies two functions:
map (in_key, in_value) -> list(out_key, intermediate_value)
Processes input key/value pair to generate intermediate
pairs
(transparently, the underlying system groups/sorts intermediate values based on out_keys)
reduce (out_key, list(intermediate_value)) -> list(out_value)
Given all intermediate values for a particular key,
produces a set of merged output values (usually just one)
Many real world problems can be represented using these two functions
Example: Count Word Occurrences
Input consists of (url, contents) pairs
map(key=url, val=contents):
For each word w in contents, emit (w, “1”)
reduce(key=word, values=uniq_counts):
Sum all “1”s in values list
Emit result “(word, sum)”
Example: Count Word Occurrences
map(key=url, val=contents):
For each word w in contents, emit (w, “1”)
reduce(key=word, values=uniq_counts):
Sum all “1”s in values list
Emit result “(word, sum)”
Input: “see bob throw” and “see spot run”
Map emits: (see, 1) (bob, 1) (throw, 1) and (see, 1) (spot, 1) (run, 1)
After grouping/sorting: bob → (1), run → (1), see → (1, 1), spot → (1), throw → (1)
Reduce emits: (bob, 1) (run, 1) (see, 2) (spot, 1) (throw, 1)
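Putting the pieces together, here is a single-machine sketch of the model in Python: the user supplies map_fn and reduce_fn, while run_mapreduce stands in for the grouping/sorting that the real (C++) library performs across many machines.

```python
from collections import defaultdict

def map_fn(url, contents):
    for word in contents.split():
        yield word, "1"                               # emit (w, "1") per occurrence

def reduce_fn(word, values):
    yield word, str(sum(int(v) for v in values))      # sum all the "1"s

def run_mapreduce(map_fn, reduce_fn, inputs):
    intermediate = defaultdict(list)
    for key, value in inputs:                         # map phase
        for out_key, out_value in map_fn(key, value):
            intermediate[out_key].append(out_value)   # group values by out_key
    results = []
    for out_key in sorted(intermediate):              # reduce phase, sorted keys
        results.extend(reduce_fn(out_key, intermediate[out_key]))
    return results

docs = [("url1", "see bob throw"), ("url2", "see spot run")]
print(run_mapreduce(map_fn, reduce_fn, docs))
# [('bob', '1'), ('run', '1'), ('see', '2'), ('spot', '1'), ('throw', '1')]
```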
Example: Distributed Grep
Input consists of (url+offset, single line)
map(key=url+offset, val=line):
If contents matches regexp, emit (line, “1”)
reduce(key=line, values=uniq_counts):
Don’t do anything; just emit line
Reverse Web-Link Graph
Map
For each target URL found in page source
Emit a <target, source> pair
Reduce
Concatenate a list of all source URLs
Outputs: <target, list(source)> pairs
Inverted Index
Map
Reduce
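The slide leaves the two functions to a figure. A common formulation, following the inverted-index example in the MapReduce paper (an assumption here, not a transcription of the missing slide content), is:

```python
def map_fn(doc_id, contents):
    for word in set(contents.split()):     # parse the document
        yield word, doc_id                 # emit a <word, document ID> pair

def reduce_fn(word, doc_ids):
    yield word, sorted(doc_ids)            # emit <word, list(document ID)>
```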
More Examples
Distributed sort
Map: extracts key from each record, emits a <key,
record>
Reduce: emits all pairs unchanged
Relies on underlying partitioning and ordering functionalities
Widely Used at Google
Example uses:
distributed grep, distributed sort, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document clustering, machine learning, statistical machine translation, ...
Usage in Aug 2004
Number of jobs 29,423
Average job completion time 634 secs
Machine days used 79,186 days
Input data read 3,288 TB
Intermediate data produced 758 TB
Output data written 193 TB
Average worker machines per job 157
Average worker deaths per job 1.2
Average map tasks per job 3,351
Average reduce tasks per job 55
Unique map implementations 395
Unique reduce implementations 269
Unique map/reduce combinations 426
Implementation Overview
Typical cluster:
100s-1000s of 2-CPU x86 machines, 2-4 GB of
memory
100 Mbps or 1 Gbps, but limited bisection bandwidth
Storage is on local IDE disks
GFS: distributed file system manages data
Job scheduling system: jobs made up of tasks,
scheduler assigns tasks to machines
Implementation is a C++ library linked into user programs
Parallelization
How is task distributed?
Partition input key/value pairs into equal-sized
chunks of 16-64MB, run map() tasks in parallel
After all map()s are complete, consolidate all emitted values for each unique emitted key
Now partition space of output map keys, and run
reduce() in parallel
Typical setting:
2,000 machines
M = 200,000
R = 5,000
Execution Overview
(0) mapreduce(spec, &result)
[Figure: execution overview — the input is divided into M splits of 16-64MB each; intermediate data is partitioned into R regions]
• Read all intermediate data
• Sort it by intermediate keys
Partitioning function: hash(intermediate_key) mod R
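The partitioning function simply hashes the intermediate key modulo R, so all values for a given key land in the same reduce task. A minimal sketch (md5 is used here only to get a hash that is stable across processes):

```python
import hashlib

def partition(intermediate_key: str, R: int) -> int:
    """hash(intermediate_key) mod R — assigns the key to one of R reduce tasks."""
    h = int.from_bytes(hashlib.md5(intermediate_key.encode()).digest(), "big")
    return h % R

# Every occurrence of the same key maps to the same region, so one reduce
# task sees all intermediate values for the keys assigned to it.
print(partition("see", 5), partition("bob", 5))
```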
Execution Details
Task Granularity & Pipelining
Fine granularity tasks: map tasks >> machines
Minimizes time for fault recovery
Better dynamic load balancing
Often use 200,000 map & 5000 reduce tasks
Running on 2000 machines
Fault Tolerance
Worker failure handled via re-execution
Detect failure via periodic heartbeats
Re-execute completed + in-progress map tasks
Due to inaccessible results
Only re-execute in progress reduce tasks
Results of completed tasks stored in global file system
Robust: lost 80 machines once, finished ok
Master failure not handled
Rare in practice
Abort and re-run at client
Refinement: Redundant Execution
Problem: Slow workers may significantly delay completion time when close to end of tasks
Other jobs consuming resources on machine
Bad disks w/ soft errors transfer data slowly
Weird things: processor caches disabled
Solution: Near end of phase, spawn backup tasks
Whichever one finishes first “wins”
Dramatically shortens job completion time
Refinement: Locality Optimization
Network bandwidth is a relatively scarce resource, so to save it:
Input data stored on local disks in GFS
Schedule a map task on machine hosting a replica
If can’t, schedule it close to a replica (e.g., a host using the same switch)
Effect
Thousands of machines read input at local disk speed
Without this, rack switches limit read rate
Refinement: Combiner Function
Purpose: reduce data sent over network
Combiner function: performs partial merging of
intermediate data at the map worker
Typically, combiner function == reducer function
Only difference is how to handle output
E.g. word count
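For word count, a combiner that partially sums the “1”s on the map worker before anything crosses the network could look like the sketch below (illustrative; in the real library the combiner shares the reducer's signature, and its output goes to an intermediate file rather than the final output):

```python
from collections import defaultdict

def combine(map_output):
    """Partially merge intermediate (word, "1") pairs at the map worker."""
    partial = defaultdict(int)
    for word, count in map_output:
        partial[word] += int(count)
    return [(word, str(total)) for word, total in partial.items()]

pairs = [("the", "1"), ("the", "1"), ("dog", "1"), ("the", "1")]
print(combine(pairs))   # [('the', '3'), ('dog', '1')] — 2 pairs sent instead of 4
```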
Performance
Tests run on cluster of 1800 machines:
4 GB of memory, dual-processor 2 GHz Xeons
Dual 160 GB IDE disks
Gigabit Ethernet NIC, bisection bandwidth 100 Gbps
Two benchmarks:
Grep: scan 10^10 100-byte records to extract records matching a rare pattern (92K matching records)
M=15,000 (input split size about 64MB)
R=1
Sort: sort 10^10 100-byte records
M=15,000 (input split size about 64MB)
R=4,000
Grep
Locality optimization helps:
1800 machines read 1 TB at peak ~31 GB/s
W/out this, rack switches would limit to 10 GB/s
Startup overhead is significant for short jobs
Total time about 150 seconds; 1 minute startup time
Sort
[Figure: Sort performance graphs; annotations mark runs that are 44% longer and 5% longer]
Experience
Rewrote Google's production indexing system using MapReduce
Set of 10, 14, 17, 21, 24 MapReduce operations
New code is simpler, easier to understand
3800 lines of C++ reduced to ~700 lines
Easier to understand and change indexing process (from months to days)
Easier to operate
MapReduce handles failures, slow machines
Easy to improve performance
Add more machines
Conclusion
MapReduce proven to be useful abstraction
Greatly simplifies large-scale computations
Fun to use:
focus on problem,
let library deal w/ messy details
Bigtable Motivation
Storage for (semi-)structured data
e.g., Google Earth, Google Finance, Personalized
Search
Scale
Lots of data
Millions of machines
Different project/applications
Hundreds of millions of users
Why Not a DBMS?
Few DBMS’s support the requisite scale
Required DB with wide scalability, wide applicability,
high performance and high availability
Couldn’t afford it if there was one
Most DBMSs require very expensive infrastructure
DBMSs provide more than Google needs
E.g., full transactions, SQL
Google has highly optimized lower-level systems that could be exploited
GFS, Chubby, MapReduce, Job scheduling, …
Bigtable
“A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes.”
Data Model
(row, column, timestamp) -> cell contents
Rows
Arbitrary string
Access to data in a row is atomic
Ordered lexicographically
Data Model
Column
Two-level name structure: column families and columns
Column Family is the unit of access control
Data Model
Timestamps
Store different versions of data in a cell
Lookup options
Return most recent K values
Return all values
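The data model of the last few slides can be mimicked with a toy dict keyed by (row, column) with per-timestamp versions (purely illustrative; Bigtable's storage looks nothing like an in-memory dict, and the row/column names below are just examples):

```python
from collections import defaultdict

table = defaultdict(dict)   # (row key, column key) -> {timestamp: uninterpreted bytes}

def put(row: str, column: str, timestamp: int, value: bytes) -> None:
    table[(row, column)][timestamp] = value

def get(row: str, column: str, k=None) -> list:
    """Lookup options: the k most recent versions, or all versions if k is None."""
    newest_first = sorted(table[(row, column)].items(), reverse=True)
    return newest_first if k is None else newest_first[:k]

put("com.cnn.www", "contents:", 1, b"<html>v1</html>")
put("com.cnn.www", "contents:", 2, b"<html>v2</html>")
print(get("com.cnn.www", "contents:", k=1))   # [(2, b'<html>v2</html>')]
```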
Data Model
The row range for a table is dynamically partitioned into “tablets”
Tablet is the unit for distribution and load balancing
Building Blocks
Google File System (GFS)
stores persistent data
Scheduler
schedules jobs onto machines
Chubby
Lock service: distributed lock manager
e.g., master election, location bootstrapping
MapReduce (optional)
Data processing
Read/write Bigtable data
Implementation
Single-master distributed system
Three major components
Library that is linked into every client
One master server
Assigning tablets to tablet servers
Addition and expiration of tablet servers, balancing tablet-server load
Metadata Operations
Many tablet servers
Tablet servers handle read and write requests to their tablets
Split tablets that have grown too large
Implementation
How to locate a Tablet?
Given a row, how do clients find the location of the tablet whose row range covers the target row?
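The slide answers this with a figure (in the Bigtable paper, a three-level lookup hierarchy rooted in a Chubby file). Whatever the path, the final step is a range search over sorted tablet boundaries; a toy sketch with an invented tablet list:

```python
import bisect

# Invented example: tablets listed by their (exclusive) end row keys, kept sorted;
# "\xff" acts as a sentinel so the last tablet covers everything remaining.
tablet_end_keys = ["apple", "grape", "peach", "\xff"]
tablet_servers  = ["ts1", "ts2", "ts3", "ts4"]

def locate_tablet(row_key: str) -> str:
    """Return the tablet server whose tablet's row range covers row_key."""
    i = bisect.bisect_right(tablet_end_keys, row_key)  # first tablet whose end key exceeds row_key
    return tablet_servers[i]

print(locate_tablet("banana"))   # ts2: "banana" falls in the range ending at "grape"
```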
Tablet Assignment
Chubby
Tablet server registers itself by getting a lock in a specific Chubby directory
Chubby gives “lease” on lock, must be renewed periodically
Server loses lock if it gets disconnected
Master monitors this directory to find which servers exist/are alive
If server not contactable/has lost lock, master grabs lock
and reassigns tablets