SlideShare una empresa de Scribd logo
1 de 26
 Cluster Architecture Web Search for a Planet and much more…. Abhijeet Desai desaiabhijeet89@gmail.com 1
2
Google Query Serving Infrastructure Elapsed time: 0.25s, machines involved: 1000s+ 3
PageRank PageRankTM is the core technology to measure the importance of a page Google's theory 	– If page A links to page B 		# Page B is important 		# The link text is irrelevant 	– If many important links point to page A 		# Links from page A are also important 4
5
Key Design Principles Software reliability Use replication for better request throughput and availability  Price/performance beats peak performance Using commodity PCs reduces the cost of computation  6
The Power Problem High density of machines (racks) 	– High power consumption 400-700 W/ft2 		# Typical data center provides 70-150 W/ft2 		# Energy costs 	– Heating 		# Cooling system costs Reducing power 	– Reduce performance (c/p may not reduce!) 	– Faster hardware depreciation (cost up!) 7
Parallelism Lookup of matching docs in a large index 	--> many lookups in a set of smaller indexes followed by a merge step A query stream 	--> multiple streams 		(each handled by a cluster) Adding machines to a pool increases serving capacity 8
Hardware Level Consideration  Instruction level parallelism does not help Multiple simple, in-order, short-pipeline core Thread level parallelism Memory system with moderate sized L2 cache is enough Large shared-memory machines are not required to boost the performance 9
GFS (Google File System) Design ,[object Object]
 Data transfers happen directly between clients/chunk servers
 Files broken into chunks (typically 64 MB)10
GFS Usage @ Google • 200+ clusters • Many clusters of 1000s of machines • Pools of 1000s of clients • 4+ PB Filesystems • 40 GB/s read/write load 	– (in the presence of frequent HW failures) 11
The Machinery 12
Architectural view of the storage hierarchy 13
Clusters through the years “Google” Circa 1997 (google.stanford.edu) Google (circa 1999) 14
Clusters through the years Google (new data center 2001) Google Data Center (Circa 2000) 3 days later 15
Current Design • In-house rack design • PC-class motherboards • Low-end storage and networking hardware • Linux • + in-house software 16
Container Datacenter  17
Container Datacenter  18
Multicore Computing 19
Comparison Between Custom built & High-end Servers  20
Implications of the Computing Environment Stuff Breaks • If you have one server, it may stay up three years (1,000 days) • If you have 10,000 servers, expect to lose ten a day • “Ultra-reliable” hardware doesn’t really help • At large scale, super-fancy reliable hardware still fails, albeit less often 	– software still needs to be fault-tolerant 	– commodity machines without fancy hardware give better performance/$ • Reliability has to come from the software • Making it easier to write distributed programs 21
Infrastructure for Search Systems Several key pieces of infrastructure: 	– GFS 	– MapReduce 	– BigTable 22
MapReduce • A simple programming model that applies to many large-scale computing problems • Hide messy details in MapReduce runtime library: 	– automatic parallelization 	– load balancing 	– network and disk transfer              optimizations 	– handling of machine failures 	– robustness 	– improvements to core library benefit all users of library! 23
Typical problem solved by MapReduce • Read a lot of data • Map: extract something you care about from each record • Shuffle and Sort • Reduce: aggregate, summarize, filter, or transform • Write the results • Outline stays the same, map and reduce change to fit the problem 24

Más contenido relacionado

La actualidad más candente

High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
Divyen Patel
 
Parallel Processing Presentation2
Parallel Processing Presentation2Parallel Processing Presentation2
Parallel Processing Presentation2
daniyalqureshi712
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
Manish Singh
 
Distributed Computing Report
Distributed Computing ReportDistributed Computing Report
Distributed Computing Report
IIT Kharagpur
 

La actualidad más candente (20)

Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
Virtual Machine
Virtual MachineVirtual Machine
Virtual Machine
 
Parallel Processing Presentation2
Parallel Processing Presentation2Parallel Processing Presentation2
Parallel Processing Presentation2
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Parallel Computing
Parallel Computing Parallel Computing
Parallel Computing
 
Message passing interface
Message passing interfaceMessage passing interface
Message passing interface
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
An introduction to the Design of Warehouse-Scale Computers
An introduction to the Design of Warehouse-Scale ComputersAn introduction to the Design of Warehouse-Scale Computers
An introduction to the Design of Warehouse-Scale Computers
 
Distributed & parallel system
Distributed & parallel systemDistributed & parallel system
Distributed & parallel system
 
Introduction to Parallel Computing
Introduction to Parallel ComputingIntroduction to Parallel Computing
Introduction to Parallel Computing
 
The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1The Anatomy Of The Google Architecture Fina Lv1.1
The Anatomy Of The Google Architecture Fina Lv1.1
 
IBM GPFS
IBM GPFSIBM GPFS
IBM GPFS
 
Grid computing ppt
Grid computing pptGrid computing ppt
Grid computing ppt
 
Google File System
Google File SystemGoogle File System
Google File System
 
Distributed Computing Report
Distributed Computing ReportDistributed Computing Report
Distributed Computing Report
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Distributed Systems
Distributed SystemsDistributed Systems
Distributed Systems
 

Destacado

Google Architecture - Breaking it Open
Google Architecture - Breaking it OpenGoogle Architecture - Breaking it Open
Google Architecture - Breaking it Open
HARMAN Services
 
Google history nd architecture
Google history nd architectureGoogle history nd architecture
Google history nd architecture
Divyangee Jain
 
It 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processingIt 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processing
Harish Khodke
 
Digital image processing unit 1
Digital image processing unit 1Digital image processing unit 1
Digital image processing unit 1
Anantharaj Manoj
 

Destacado (20)

Google Architecture - Breaking it Open
Google Architecture - Breaking it OpenGoogle Architecture - Breaking it Open
Google Architecture - Breaking it Open
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
Google history nd architecture
Google history nd architectureGoogle history nd architecture
Google history nd architecture
 
Cluster computing pptl (2)
Cluster computing pptl (2)Cluster computing pptl (2)
Cluster computing pptl (2)
 
Cluster computer
Cluster  computerCluster  computer
Cluster computer
 
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.OVERVIEW  OF FACEBOOK SCALABLE ARCHITECTURE.
OVERVIEW OF FACEBOOK SCALABLE ARCHITECTURE.
 
Cluster Computers
Cluster ComputersCluster Computers
Cluster Computers
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
 
Facebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challengeFacebook architecture presentation: scalability challenge
Facebook architecture presentation: scalability challenge
 
Facebook Architecture - Breaking it Open
Facebook Architecture - Breaking it OpenFacebook Architecture - Breaking it Open
Facebook Architecture - Breaking it Open
 
Computer cluster
Computer clusterComputer cluster
Computer cluster
 
It 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processingIt 4-yr-1-sem-digital image processing
It 4-yr-1-sem-digital image processing
 
Digital image processing unit 1
Digital image processing unit 1Digital image processing unit 1
Digital image processing unit 1
 
Dip Unit Test-I
Dip Unit Test-IDip Unit Test-I
Dip Unit Test-I
 
OGSA
OGSAOGSA
OGSA
 
Globus ppt
Globus pptGlobus ppt
Globus ppt
 
MPI
MPIMPI
MPI
 
Beowulf cluster
Beowulf clusterBeowulf cluster
Beowulf cluster
 
cluster computing
cluster computingcluster computing
cluster computing
 

Similar a Google cluster architecture

Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
programmermag
 
Google data centers
Google data centersGoogle data centers
Google data centers
Ali Al-Ugali
 
Cloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureCloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sure
Nguyen Duong
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
Khazret Sapenov
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
Arun Nagarajan
 

Similar a Google cluster architecture (20)

CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptCENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
 
Google Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 DayGoogle Cloud Computing on Google Developer 2008 Day
Google Cloud Computing on Google Developer 2008 Day
 
Google data centers
Google data centersGoogle data centers
Google data centers
 
54665962-Nav-Cluster-Computing.pptx
54665962-Nav-Cluster-Computing.pptx54665962-Nav-Cluster-Computing.pptx
54665962-Nav-Cluster-Computing.pptx
 
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
An Introduction to Cloud Computing by Robert Grossman 08-06-09 (v19)
 
CDP_2(1).pptx
CDP_2(1).pptxCDP_2(1).pptx
CDP_2(1).pptx
 
Cloud computing overview
Cloud computing overviewCloud computing overview
Cloud computing overview
 
The Rise of Cloud Computing Systems
The Rise of Cloud Computing SystemsThe Rise of Cloud Computing Systems
The Rise of Cloud Computing Systems
 
Architecting Cloudy Applications
Architecting Cloudy ApplicationsArchitecting Cloudy Applications
Architecting Cloudy Applications
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
Cloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sureCloud computing skepticism - But i'm sure
Cloud computing skepticism - But i'm sure
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Introduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big DataIntroduction to Cloud Computing and Big Data
Introduction to Cloud Computing and Big Data
 
Data Center of the Future v1.0.pptx
Data Center of the Future v1.0.pptxData Center of the Future v1.0.pptx
Data Center of the Future v1.0.pptx
 
node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
 
Cloud
CloudCloud
Cloud
 
Designing Scalable Applications
Designing Scalable ApplicationsDesigning Scalable Applications
Designing Scalable Applications
 
Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?Webinar: SQL for Machine Data?
Webinar: SQL for Machine Data?
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Google cluster architecture

  • 1. Cluster Architecture Web Search for a Planet and much more…. Abhijeet Desai desaiabhijeet89@gmail.com 1
  • 2. 2
  • 3. Google Query Serving Infrastructure Elapsed time: 0.25s, machines involved: 1000s+ 3
  • 4. PageRank PageRankTM is the core technology to measure the importance of a page Google's theory – If page A links to page B # Page B is important # The link text is irrelevant – If many important links point to page A # Links from page A are also important 4
  • 5. 5
  • 6. Key Design Principles Software reliability Use replication for better request throughput and availability Price/performance beats peak performance Using commodity PCs reduces the cost of computation 6
  • 7. The Power Problem High density of machines (racks) – High power consumption 400-700 W/ft2 # Typical data center provides 70-150 W/ft2 # Energy costs – Heating # Cooling system costs Reducing power – Reduce performance (c/p may not reduce!) – Faster hardware depreciation (cost up!) 7
  • 8. Parallelism Lookup of matching docs in a large index --> many lookups in a set of smaller indexes followed by a merge step A query stream --> multiple streams (each handled by a cluster) Adding machines to a pool increases serving capacity 8
  • 9. Hardware Level Consideration Instruction level parallelism does not help Multiple simple, in-order, short-pipeline core Thread level parallelism Memory system with moderate sized L2 cache is enough Large shared-memory machines are not required to boost the performance 9
  • 10.
  • 11. Data transfers happen directly between clients/chunk servers
  • 12. Files broken into chunks (typically 64 MB)10
  • 13. GFS Usage @ Google • 200+ clusters • Many clusters of 1000s of machines • Pools of 1000s of clients • 4+ PB Filesystems • 40 GB/s read/write load – (in the presence of frequent HW failures) 11
  • 15. Architectural view of the storage hierarchy 13
  • 16. Clusters through the years “Google” Circa 1997 (google.stanford.edu) Google (circa 1999) 14
  • 17. Clusters through the years Google (new data center 2001) Google Data Center (Circa 2000) 3 days later 15
  • 18. Current Design • In-house rack design • PC-class motherboards • Low-end storage and networking hardware • Linux • + in-house software 16
  • 22. Comparison Between Custom built & High-end Servers 20
  • 23. Implications of the Computing Environment Stuff Breaks • If you have one server, it may stay up three years (1,000 days) • If you have 10,000 servers, expect to lose ten a day • “Ultra-reliable” hardware doesn’t really help • At large scale, super-fancy reliable hardware still fails, albeit less often – software still needs to be fault-tolerant – commodity machines without fancy hardware give better performance/$ • Reliability has to come from the software • Making it easier to write distributed programs 21
  • 24. Infrastructure for Search Systems Several key pieces of infrastructure: – GFS – MapReduce – BigTable 22
  • 25. MapReduce • A simple programming model that applies to many large-scale computing problems • Hide messy details in MapReduce runtime library: – automatic parallelization – load balancing – network and disk transfer optimizations – handling of machine failures – robustness – improvements to core library benefit all users of library! 23
  • 26. Typical problem solved by MapReduce • Read a lot of data • Map: extract something you care about from each record • Shuffle and Sort • Reduce: aggregate, summarize, filter, or transform • Write the results • Outline stays the same, map and reduce change to fit the problem 24
  • 27. Conclusions For a large scale web service system like Google – Design the algorithm which can be easily parallelized – Design the architecture using replication to achieve distributed computing/storage and fault tolerance – Be aware of the power problem which significantly restricts the use of parallelism 25
  • 28. References 1. Luiz André Barroso , Jeffrey Dean , UrsHölzle, Web Search for a Planet: The Google Cluster Architecture, IEEE Micro, v.23 n.2, p.22-28, March 2003 [doi>10.1109/MM.2003.1196112] 2. S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Proc. Seventh World Wide Web Conf. (WWW7), International World Wide Web Conference Committee (IW3C2), 1998, pp. 107-117. 3. “TPC Benchmark C Full Disclosure Report for IBM eserverxSeries 440 using Microsoft SQL Server 2000 Enterprise Edition and Microsoft Windows .NET Datacenter Server 2003, TPC-C Version 5.0,” http://www.tpc.org/results/FDR/TPCC/ibm.x4408way.c5.fdr.02110801.pdf. 4. D. Marr et al., “Hyper-Threading Technology Architecture and Microarchitecture: A Hypertext History,” Intel Technology J., vol. 6, issue 1, Feb. 2002. 5. L. Hammond, B. Nayfeh, and K. Olukotun, “A Single-Chip Multiprocessor,” Computer, vol. 30, no. 9, Sept. 1997, pp. 79-85. 6. L.A. Barroso et al., “Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing,” Proc. 27th ACM Int’l Symp. Computer Architecture, ACM Press, 2000, pp. 282-293. 7. L.A. Barroso, K. Gharachorloo, and E. Bugnion, “Memory System Characterization of Commercial Workloads,” Proc. 25th ACM Int’l Symp. Computer Architecture, ACM Press, 1998, pp. 3-14. 26