ENTERPRISE ARCHITECTURE

THE 5 PRINCIPLES OF GOOGLE’S “CLOUD”
Patrik Svensson, 2011, ptrksvnssn@gmail.com

Thursday, 12 May 2011

THE VISION OF GOOGLE

THE 5 PRINCIPLES
•   Everything is a service (or an application in Android)
•   Relentless technical focus (thinking at nanoscale)
•   Data centers are the foundation
•   Code is king, data is King Kong
•   Identify and keep track of your users

#1 EVERYTHING IS A SERVICE (OR AN APPLICATION)

#2 RELENTLESS TECHNICAL FOCUS
•   Jedis build their own lightsabres
•   Parallelize, Distribute, Cache, Compress, Redundantize everything
•   Latency is VERY evil
Source: http://www.flickr.com/photos/60994749@N07/5557591956/
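
“Parallelize” and “Distribute” usually mean fanning one request out to many machines at once and merging the partial answers (scatter-gather). A minimal Python sketch, assuming an in-process thread pool stands in for remote shard servers; `SHARDS`, `query_shard` and the documents are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds one slice of a document index (doc id -> text).
SHARDS = [
    {1: "google cloud", 2: "data center"},
    {3: "cloud computing", 4: "mapreduce"},
    {5: "cloud storage"},
]

def query_shard(shard: dict[int, str], term: str) -> list[int]:
    """Return the ids of the documents in one shard that contain the term."""
    return [doc_id for doc_id, text in shard.items() if term in text.split()]

def scatter_gather(term: str) -> list[int]:
    """Query every shard concurrently (scatter), then merge the results (gather)."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: query_shard(shard, term), SHARDS)
        return sorted(doc_id for part in partials for doc_id in part)

print(scatter_gather("cloud"))  # [1, 3, 5]
```

With real remote shards, the wall-clock time of such a fan-out is bounded roughly by the slowest shard rather than by the sum of all shards, which is one reason latency gets so much attention at this scale.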

EXAMPLE: “NUMBERS EVERYONE SHOULD KNOW”
1,000,000 ns = 1 ms
1,000,000,000 ns = 1 s
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
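
Conversions like these are what make back-of-the-envelope estimates possible. A minimal Python sketch comparing serial and parallel reads of many small files from disk; the figures used (10 ms per disk seek, 30 MB/s sequential throughput, 30 files of 256 KB, each file on a different disk for the parallel case) are illustrative assumptions, not values recovered from the slide:

```python
NS_PER_MS = 1_000_000        # 1,000,000 ns = 1 ms
NS_PER_S = 1_000_000_000     # 1,000,000,000 ns = 1 s

# Illustrative assumptions (decimal units), not figures taken from the slide.
SEEK_MS = 10.0               # one disk seek
THROUGHPUT_MB_PER_S = 30.0   # sequential disk read throughput

def transfer_ms(size_kb: float) -> float:
    """Time to stream size_kb kilobytes sequentially, in milliseconds."""
    return size_kb / 1000 / THROUGHPUT_MB_PER_S * 1000

def serial_read_ms(n_files: int, size_kb: float) -> float:
    """Read files one after another: every file pays a seek plus its transfer."""
    return n_files * (SEEK_MS + transfer_ms(size_kb))

def parallel_read_ms(size_kb: float) -> float:
    """Read all files at once from different disks: about one seek plus one transfer."""
    return SEEK_MS + transfer_ms(size_kb)

print(f"serial:   {serial_read_ms(30, 256):.0f} ms")   # serial:   556 ms
print(f"parallel: {parallel_read_ms(256):.1f} ms")     # parallel: 18.5 ms
print(f"1 s = {NS_PER_S // NS_PER_MS} ms")             # 1 s = 1000 ms
```

Under these assumptions the parallel design is about 30 times faster, which is the kind of design decision such numbers are meant to support before any code is written.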

#3 DATA CENTERS ARE THE FOUNDATION

ECONOMIES OF SCALE
•   ~40 data centers in 2009, 1,000,000 machines
Source: http://techcrunch.com/2008/04/11/where-are-all-the-google-data-centers/

#4 CODE IS KING, DATA IS KING KONG

Enterprise Architecture

Technical Architecture, i.e. which technologies do we use:
•   DATA CENTERS: “We need cooling, power, perimeter networks, containers, racks, switches & hardware at low cost that scale”
•   DATA: “We need one distributed file system, one distributed shared memory & common data formats to get scale and low cost”
•   CODE: “We need to build applications and services; application, integration & data platforms; parallel computing platforms; and use an open source OS, upon our data center/data platform”
•   CONTROL: “We need scheduling, synchronization and lock services, i.e. various forms of control mechanisms for data and code”
•   USERS: “We need to identify our users to be able to interact, differentiate and customize the user experience”

Implementation Architecture, i.e. how do we implement the technologies:
•   DATA CENTERS: Google container-based data centers
•   DATA: GFS, BigTable, Protocol Buffers
•   CODE (top to bottom): Android, Chrome; App Engine, Gmail, Search, Index; Python, Java, C++; Protocol Buffers, JSON; Sawzall, Dremel, Percolator; MapReduce; Linux
•   CONTROL: GFS master, Google Work Queue, Chubby, NetScaler, Google HTTP Server, (Spanner)
•   USERS: OpenID, OAuth, Google Accounts available for most services

ABOUT DATA
“Google’s mission is to organize the world’s information and make it universally accessible and useful”
[Chart: data volumes by type. Structured, numerical: ~2.5 terabytes; unstructured, textual: ~10 terabytes/day; communication, traffic: 20+ petabytes/day]
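
The chart’s labels (20+ petabytes/day of communication and traffic against ~10 terabytes/day of text) support a quick sanity check of scale. A minimal Python sketch, assuming decimal (SI) units and treating the chart’s approximate values as exact:

```python
TB = 10**12   # assumption: decimal units, 1 terabyte = 10^12 bytes
PB = 10**15   # 1 petabyte = 10^15 bytes
SECONDS_PER_DAY = 24 * 60 * 60

traffic_per_day = 20 * PB   # "+20 Petabyte/day" (communication, traffic)
textual_per_day = 10 * TB   # "~10 Terabyte/day" (unstructured, textual)

ratio = traffic_per_day / textual_per_day
bytes_per_second = traffic_per_day / SECONDS_PER_DAY

print(f"traffic is ~{ratio:,.0f}x the textual volume")      # ~2,000x
print(f"~{bytes_per_second / 10**9:,.0f} GB/s sustained")  # ~231 GB/s
```

Since the chart says “+20”, both results are lower bounds rather than exact figures.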

DATA CENTER “ENTRY”
•   The same entry to each data center
•   ~50 caching (using Squid)
•   Built their own HTTP servers/farms
Source: Ed Austin, “The Anatomy of the Google Architecture”
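
Squid is a caching proxy: placed in front of the HTTP servers, it answers repeated requests from its cache instead of hitting the backend. A minimal Python sketch of that idea, assuming a least-recently-used (LRU) eviction policy and a hypothetical `fetch_from_backend` function; it is an in-process illustration, not Squid’s actual behavior or configuration:

```python
from collections import OrderedDict
from typing import Callable

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int, fetch: Callable[[str], str]):
        self.capacity = capacity
        self.fetch = fetch                     # called only on a cache miss
        self.entries: OrderedDict[str, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        if url in self.entries:
            self.entries.move_to_end(url)      # mark as most recently used
            self.hits += 1
            return self.entries[url]
        self.misses += 1
        body = self.fetch(url)
        self.entries[url] = body
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return body

backend_calls: list[str] = []

def fetch_from_backend(url: str) -> str:
    """Hypothetical slow backend; records every request it receives."""
    backend_calls.append(url)
    return f"<html>{url}</html>"

cache = LRUCache(capacity=2, fetch=fetch_from_backend)
for url in ["/a", "/b", "/a", "/c", "/b"]:
    cache.get(url)
print(cache.hits, cache.misses, backend_calls)  # 1 4 ['/a', '/b', '/c', '/b']
```

Only the repeated `/a` is served from the cache; `/b` was evicted when `/c` arrived, so it goes to the backend again. A larger capacity trades memory for fewer backend round trips.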

INSIDE THE CONTAINERS
•   Customized commodity servers in customized racks, housed in containers (1,000+ servers each) and organized into clusters
•   All containers are “cloned” and look the same
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”

THE SAME HW, OS AND FILESYSTEM EVERYWHERE
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”

BIGTABLE AS DATABASE
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”

BIGTABLE IS COLUMN-BASED
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
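
The Bigtable paper describes the data model as a sparse, sorted map indexed by row key, column (“family:qualifier”) and timestamp, with columns grouped into column families. A minimal in-memory Python sketch of that model; the `TinyTable` class, its methods and the sample rows (modeled on the paper’s reversed-URL webtable example) are hypothetical and are not the Bigtable client API:

```python
import bisect
from typing import Iterator, Optional

class TinyTable:
    """Sparse map: (row key, "family:qualifier") -> {timestamp: value}."""

    def __init__(self) -> None:
        self.row_keys: list[str] = []   # kept sorted, so row ranges can be scanned
        self.cells: dict[tuple[str, str], dict[int, str]] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        i = bisect.bisect_left(self.row_keys, row)
        if i == len(self.row_keys) or self.row_keys[i] != row:
            self.row_keys.insert(i, row)
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if it was never written."""
        versions = self.cells.get((row, column))
        return versions[max(versions)] if versions else None

    def scan(self, start: str, end: str, family: str) -> Iterator[tuple[str, str, str]]:
        """Yield (row, column, newest value) for rows in [start, end) in one column family."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, end)
        wanted = set(self.row_keys[lo:hi])
        for (row, column), versions in sorted(self.cells.items()):
            if row in wanted and column.startswith(family + ":"):
                yield row, column, versions[max(versions)]

t = TinyTable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
t.put("com.google.www", "contents:", 1, "<html>g1")
print(t.get("com.cnn.www", "contents:"))              # <html>v5
print(list(t.scan("com.cnn", "com.cnn.x", "anchor")))  # [('com.cnn.www', 'anchor:cnnsi.com', 'CNN')]
```

Reading one column family across a range of rows touches only the cells of that family, which is the sense in which such a store is “column-based”.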

BIGTABLE NEEDS GFS
•   Use GFS to store data and logs
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
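
As described in the GFS paper, GFS splits each file into fixed-size 64 MB chunks, replicates every chunk on several chunkservers (three by default), and keeps the file-to-chunk metadata on a single master. A minimal Python sketch of that bookkeeping; the round-robin placement and the server names are simplifying assumptions (real placement also weighs disk usage and racks):

```python
from itertools import cycle

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as in the GFS paper
REPLICAS = 3                    # default replication factor

def chunk_index(byte_offset: int) -> int:
    """Which chunk of a file holds the given byte offset."""
    return byte_offset // CHUNK_SIZE

def num_chunks(file_size: int) -> int:
    """Number of chunks needed for a file (the last chunk may be partial)."""
    return -(-file_size // CHUNK_SIZE)   # ceiling division

def place_chunks(file_size: int, servers: list[str]) -> dict[int, list[str]]:
    """Master-side metadata: chunk index -> chunkservers holding a replica.

    Round-robin placement is an assumption of this sketch; it needs at least
    REPLICAS servers so that each chunk's replicas land on distinct servers.
    """
    ring = cycle(servers)
    return {i: [next(ring) for _ in range(REPLICAS)] for i in range(num_chunks(file_size))}

servers = ["cs1", "cs2", "cs3", "cs4"]
layout = place_chunks(200 * 1024 * 1024, servers)   # a 200 MB log file
print(len(layout))                                  # 4
print(layout[chunk_index(130 * 1024 * 1024)])       # ['cs3', 'cs4', 'cs1']
```

A client asks the master only which chunkservers hold a chunk, then reads the data from one of those servers directly, so bulk data never flows through the master.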

MAPREDUCE - A PARALLEL COMPUTING PLATFORM
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
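
The canonical MapReduce example, from Dean and Ghemawat’s MapReduce paper, is counting words: a map function emits (word, 1) pairs, the framework groups the pairs by key (the shuffle), and a reduce function sums each group. A minimal single-process Python sketch of those three phases; a real MapReduce runs the map and reduce tasks on many machines and handles failures, which this sketch does not attempt:

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in a document."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: sum all the counts emitted for one word."""
    return word, sum(counts)

def map_reduce(documents: dict[str, str]) -> dict[str, int]:
    # Map phase: on a cluster, each document (input split) would go to a different worker.
    intermediate = [pair for doc_id, text in documents.items() for pair in map_fn(doc_id, text)]
    # Shuffle phase: group the intermediate values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(word, counts) for word, counts in sorted(groups.items()))

docs = {"d1": "data is king kong", "d2": "code is king"}
print(map_reduce(docs))  # {'code': 1, 'data': 1, 'is': 2, 'king': 2, 'kong': 1}
```

Because map calls are independent and reduce calls only need their own key’s values, both phases can be spread across as many machines as there are splits and keys.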

ABOUT CODING AT GOOGLE
•   Linux as the operating system everywhere: open source and highly customized for this purpose (Android is also a highly customized version of Linux)
•   Serialization/integration: Protocol Buffers (for RPC) are very fast and used internally for “everything”; JSON and REST are used for external APIs
•   Application-oriented programming languages: mainly Python, Java and C++
•   Data-oriented languages and tools: Sawzall, Dremel and Percolator for various data processing tasks (specialized tools for data!)
•   The business applications: Gmail, Search, App Engine etc., built on top of the data center infrastructure, the data platform and the layers above
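
Part of why Protocol Buffers are compact is the wire format: each field is written as a small key (field number and wire type) followed by its value, and integers use base-128 “varints”, so small numbers take fewer bytes. A minimal hand-written Python sketch that encodes one integer field and compares it with the equivalent JSON text; it handles only non-negative varint fields and is not the protobuf library’s API:

```python
import json

def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    """Key = (field_number << 3) | wire_type, where wire type 0 means varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)

binary = encode_int_field(1, 150)          # a message whose field 1 is set to 150
text = json.dumps({"a": 150}).encode()
print(binary.hex(), len(binary))           # 089601 3
print(text, len(text))                     # b'{"a": 150}' 10
```

The binary form omits field names (only the number from the schema is sent), which is one reason it suits high-volume internal RPC, while self-describing JSON is friendlier for external APIs.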

#5 IDENTIFY AND KEEP TRACK OF YOUR USERS
•   You need a Google account to start using Android properly
•   OpenSocial is a collaborative effort to compete against Facebook
•   OpenID is an identity standard and OAuth is a standard for authorizing services
•   Google is identifying and tracking every step you take within its domains
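
In OAuth 2.0 (RFC 6749), a service that wants to act on a user’s behalf first redirects the user to the identity provider with an authorization request. A minimal Python sketch of building that request URL; the endpoint, client id, redirect URI and scope are hypothetical placeholders, and OAuth 1.0, which was also common in 2011, uses a different signed-request flow:

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

def build_authorization_url(endpoint: str, client_id: str, redirect_uri: str,
                            scope: str, state: str) -> str:
    """Authorization-code request; the provider later redirects back with ?code=...&state=..."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,   # random value the client checks on return, against CSRF
    }
    return f"{endpoint}?{urlencode(params)}"

state = secrets.token_urlsafe(16)
url = build_authorization_url(
    "https://accounts.example.com/o/oauth2/auth",   # hypothetical endpoint
    "my-client-id", "https://app.example.com/callback", "profile", state)
print(url)
print(parse_qs(urlparse(url).query)["scope"])   # ['profile']
```

The user signs in at the provider, never at the requesting service, which is what lets one account identify the same user across many services.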
