The 5 Principles of Google's Cloud
1. ENTERPRISE ARCHITECTURE
THE 5 PRINCIPLES OF GOOGLE’S “CLOUD”
Patrik Svensson, 2011, ptrksvnssn@gmail.com
Thursday, 12 May 2011
2. THE VISION OF GOOGLE
3. THE 5 PRINCIPLES
• Everything is a service (or an application in Android)
• Relentless technical focus (thinking at nanoscale)
• Data centers are the foundation
• Code is king, data is King Kong
• Identify and keep track of your users
4.
5. #1 EVERYTHING IS A SERVICE (OR AN APPLICATION)
6. #2 RELENTLESS TECHNICAL FOCUS
• Jedis build their own lightsabers
• Parallelize, distribute, cache, compress and add redundancy to everything
• Latency is VERY evil
Source: http://www.flickr.com/photos/60994749@N07/5557591956/
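To make the slogans above concrete, here is a minimal Python sketch, assuming a hypothetical in-memory "cluster" of four shards in which every item is stored on two shards. It shows distribution (hashing keys to shards), redundancy (writing each item to two shards), compression (zlib) and parallelism (scanning all shards concurrently). The names `NUM_SHARDS`, `REPLICAS`, `put` and `search` are illustrative only and do not come from any Google system.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4   # hypothetical cluster size
REPLICAS = 2     # hypothetical: every item is stored on two shards

# Each "shard" stands in for one machine's local storage.
shards = [dict() for _ in range(NUM_SHARDS)]


def owners(key: str) -> list[int]:
    """Distribute: hash the key to a shard, then use the following shards as replicas."""
    first = zlib.crc32(key.encode("utf-8")) % NUM_SHARDS
    return [(first + i) % NUM_SHARDS for i in range(REPLICAS)]


def put(key: str, text: str) -> None:
    """Compress once, then store a copy on every owning shard (redundancy)."""
    blob = zlib.compress(text.encode("utf-8"))
    for shard_id in owners(key):
        shards[shard_id][key] = blob


def search_shard(shard_id: int, word: str) -> list[str]:
    """Scan one shard, decompressing each value before matching."""
    return [key for key, blob in shards[shard_id].items()
            if word in zlib.decompress(blob).decode("utf-8")]


def search(word: str) -> list[str]:
    """Parallelize: scan all shards concurrently, then merge the hits."""
    hits = set()
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        for part in pool.map(lambda s: search_shard(s, word), range(NUM_SHARDS)):
            hits.update(part)  # replicas return duplicates; the set drops them
    return sorted(hits)


put("doc1", "latency is very evil")
put("doc2", "jedis build their own lightsabers")
print(search("evil"))  # ['doc1']
```

Storing two copies doubles the storage cost but lets a lookup survive the loss of one shard; the set in `search` is what keeps those duplicate copies from showing up twice in the result.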
7. EXAMPLE: “NUMBERS EVERYONE SHOULD KNOW”
1,000,000 ns = 1 ms
1,000,000,000 ns = 1 s
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
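The two conversions on the slide are enough for back-of-the-envelope latency estimates. The Python sketch below compares reading 30 images from disk one after another with reading them in parallel from separate disks. The disk figures (about 10 ms per seek and about 30 MB/s sequential reads) and the 250 KB image size are assumptions chosen for illustration, not values read from the slide.

```python
NS_PER_MS = 1_000_000        # 1,000,000 ns = 1 ms
NS_PER_S = 1_000_000_000     # 1,000,000,000 ns = 1 s

# Assumed, illustrative hardware figures (not from the slide):
DISK_SEEK_NS = 10 * NS_PER_MS       # one disk seek: ~10 ms
DISK_BYTES_PER_S = 30_000_000       # sequential disk read: ~30 MB/s


def read_ns(num_bytes: int) -> float:
    """Latency of one seek followed by a sequential read of num_bytes."""
    return DISK_SEEK_NS + num_bytes / DISK_BYTES_PER_S * NS_PER_S


def page_latency_ms(num_images: int, image_bytes: int, parallel: bool) -> float:
    """Serial reads add up; parallel reads on separate disks cost about one read."""
    one_read = read_ns(image_bytes)
    total_ns = one_read if parallel else num_images * one_read
    return total_ns / NS_PER_MS


serial = page_latency_ms(30, 250_000, parallel=False)
parallel = page_latency_ms(30, 250_000, parallel=True)
print(f"serial: {serial:.0f} ms, parallel: {parallel:.1f} ms")
# serial: 550 ms, parallel: 18.3 ms
```

Under these assumptions the seek time dominates, which is why spreading reads over many machines pays off so strongly.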
8. #3 DATA CENTERS ARE THE FOUNDATION
9. ECONOMIES OF SCALE
• ~40 data centers and ~1,000,000 machines in 2009
Source: http://techcrunch.com/2008/04/11/where-are-all-the-google-data-centers/
10.
11. #4 CODE IS KING, DATA IS KING KONG
Enterprise Architecture
Technical Architecture, i.e. which technologies do we use:
• DATA CENTERS: "We need cooling, power, perimeter networks, containers, racks, switches & hardware at low cost that scale"
• DATA: "We need one distributed file system, distributed shared memory & common data formats to get scale and low cost"
• CODE: "We need to build applications and services, application-, integration- & data platforms and parallel computing platforms & services, and use an open source OS, upon our data center/data platform"
• CONTROL: "We need scheduling, synchronization and lock services, i.e. various forms of control mechanisms for data and code"
• USERS: "We need to identify our users to be able to interact with them, differentiate and customize the user experience"
Implementation Architecture, i.e. how do we implement the technologies:
• DATA CENTERS: Google container-based data centers
• DATA: GFS, BigTable, Protocol Buffers
• CODE: Android, Chrome, App Engine, Gmail, Search, Index, Python, Java, C++, Protocol Buffers, JSON, Sawzall, Dremel, Percolator, MapReduce, Linux
• CONTROL: GFS master, Google Work Queue, Chubby, NetScaler, HTTP Server, (Spanner)
• USERS: OpenID, OAuth, Google Accounts available for most services
12. ABOUT DATA
"Google's mission is to organize the world's information and make it available to all"
[Chart: data volumes by category (Structured, numerical; Unstructured, textual; Communication, traffic), labeled ~2.5 terabytes, ~10 terabytes/day and 20+ petabytes/day]
13. DATA CENTER “ENTRY”
• The same entry point into each data center
• ~50 caching (using Squid)
• Built their own HTTP servers/farms
Source: Ed Austin, “The Anatomy of the Google Architecture”
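Caching at the entry point means a popular response is served from memory instead of being rebuilt by the servers behind it. The sketch below is a minimal least-recently-used (LRU) response cache in Python, assuming a hypothetical `fetch` function that stands in for the backend servers. It is not Squid's implementation, and LRU is only one of several possible eviction policies.

```python
from collections import OrderedDict
from typing import Callable


class ResponseCache:
    """Minimal LRU cache keyed by URL, standing in for a caching front end."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()   # url -> response body, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, url: str, fetch: Callable[[str], str]) -> str:
        if url in self.entries:
            self.entries.move_to_end(url)        # mark as most recently used
            self.hits += 1
            return self.entries[url]
        self.misses += 1
        body = fetch(url)                        # cache miss: ask the backend
        self.entries[url] = body
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)     # evict the least recently used
        return body

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def fetch(url: str) -> str:                      # hypothetical backend
    return f"<html>{url}</html>"


cache = ResponseCache(capacity=2)
for url in ["/a", "/a", "/b", "/c", "/a"]:
    cache.get(url, fetch)
print(cache.hits, cache.misses)  # 1 4
```

With room for only two entries, `/a` is evicted when `/c` arrives, so the final request for `/a` misses again; a larger capacity or a different request mix changes the hit ratio directly.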
14. INSIDE THE CONTAINERS
• Customized commodity servers in customized racks inside containers (1,000+ servers each), organized into clusters
• All containers are “cloned” and look the same
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
15. THE SAME HW, OS AND FILESYSTEM EVERYWHERE
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
16. BIGTABLE AS DATABASE
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
17. BIGTABLE IS COLUMN-BASED
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
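The Bigtable paper describes a table as a sparse, distributed, persistent multidimensional sorted map indexed by row key, column key and timestamp, with columns grouped into column families written as `family:qualifier`. The single-process Python sketch below mimics only that data model; the class name `TinyTable` is hypothetical, and distribution, persistence and tablets are not modeled. The row key `com.cnn.www` follows the reversed-URL example from the Bigtable paper.

```python
import bisect
from typing import Iterator, Optional


class TinyTable:
    """(row key, column key, timestamp) -> value, with rows kept in sorted order."""

    def __init__(self) -> None:
        self.rows: dict = {}        # row -> {"family:qualifier": {timestamp: value}}
        self.row_keys: list = []    # row keys in lexicographic order

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        self.rows[row].setdefault(column, {})[timestamp] = value

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of one cell, or None if the cell is empty."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None

    def scan(self, start: str, end: str, family: str) -> Iterator[tuple]:
        """Yield (row, column, newest value) for rows in [start, end), one family only."""
        i = bisect.bisect_left(self.row_keys, start)
        while i < len(self.row_keys) and self.row_keys[i] < end:
            row = self.row_keys[i]
            for column in sorted(self.rows[row]):
                if column.split(":", 1)[0] == family:
                    yield row, column, self.get(row, column)
            i += 1


t = TinyTable()
t.put("com.cnn.www", "contents:", 3, "<html>v3")
t.put("com.cnn.www", "contents:", 5, "<html>v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
print(t.get("com.cnn.www", "contents:"))        # <html>v5
print(list(t.scan("com.", "com.d", "anchor")))  # [('com.cnn.www', 'anchor:cnnsi.com', 'CNN')]
```

Keeping row keys sorted is what makes range scans cheap, and reading one column family at a time is the "column-based" access pattern the slide refers to.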
18. BIGTABLE NEEDS GFS
• Use GFS to store data and logs
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
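The GFS paper describes files split into fixed-size 64 MB chunks, each replicated on several chunkservers (three by default), with a single master holding the metadata; a client turns a byte offset into a chunk index before asking the master where that chunk lives. The Python sketch below shows that arithmetic. The placement rule in `place_replicas` is a hypothetical stand-in: the real master also takes disk usage and rack placement into account.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as in the GFS paper
REPLICATION = 3                 # default number of replicas per chunk


def chunk_index(offset: int) -> int:
    """Client side: translate a byte offset within a file into a chunk index."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset: int, length: int) -> list[int]:
    """All chunk indices touched by a read of `length` bytes (length > 0)."""
    return list(range(chunk_index(offset), chunk_index(offset + length - 1) + 1))


def place_replicas(chunk_handle: int, servers: list[str]) -> list[str]:
    """Hypothetical placement: REPLICATION consecutive servers, wrapping around."""
    if len(servers) < REPLICATION:
        raise ValueError("not enough chunkservers for the replication factor")
    start = chunk_handle % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(REPLICATION)]


MB = 1024 * 1024
print(chunks_for_range(60 * MB, 10 * MB))               # [0, 1]
print(place_replicas(5, ["cs1", "cs2", "cs3", "cs4"]))  # ['cs2', 'cs3', 'cs4']
```

A 10 MB read starting at 60 MB crosses a chunk boundary, so the client needs the locations of two chunks.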
19. MAPREDUCE: A PARALLEL COMPUTING PLATFORM
Source: Jeff Dean, “Designs, Lessons and Advice from Building Large Distributed Systems”
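In the MapReduce model, a user-supplied map function emits intermediate key/value pairs, the framework groups the pairs by key, and a user-supplied reduce function merges all values for each key. The Python sketch below runs the three phases sequentially in one process on the classic word-count example. The real system spreads map and reduce tasks over many machines and handles failures, none of which is modeled here; the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_fn(doc_id: str, text: str):
    """Map: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: list[int]):
    """Reduce: merge all counts emitted for one word."""
    return word, sum(counts)


def map_reduce(inputs: dict, mapper, reducer) -> dict:
    # Map phase: apply the mapper to every input record.
    intermediate = chain.from_iterable(mapper(k, v) for k, v in inputs.items())
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order.
    return dict(reducer(k, groups[k]) for k in sorted(groups))


docs = {"d1": "latency is evil", "d2": "latency is very evil"}
print(map_reduce(docs, map_fn, reduce_fn))
# {'evil': 2, 'is': 2, 'latency': 2, 'very': 1}
```

Because each map call and each reduce call only sees its own input, the framework is free to run them on different machines in parallel.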
20. ABOUT CODING AT GOOGLE
• Linux as the operating system everywhere: open source and highly customized for this purpose (Android is also built on a highly customized Linux kernel)
• Serialization/integration: Protocol Buffers (used with RPC) are very fast and used internally for “everything”; JSON and RESTful interfaces are used for external APIs
• Application-oriented programming languages: mainly Python, Java and C++
• Data-oriented languages and tools: Percolator, Sawzall and Dremel for various data-processing tasks (specialized tools for data!)
• The business applications: Gmail, Search, App Engine etc., built on top of the data center infrastructure, the data platform and the layers above
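Part of why Protocol Buffers are fast and compact is the wire format: integers are written as base-128 varints (seven payload bits per byte, low-order group first, high bit set on every byte except the last), and each field is prefixed by a tag of `(field_number << 3) | wire_type`. The Python sketch below hand-encodes a message with one integer field (field 1, value 150) and compares its size with the equivalent JSON. It illustrates the encoding only and is not the `protobuf` library's API.

```python
import json


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least significant group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)   # continuation bit set: more bytes follow
        else:
            out.append(low7)          # last byte: continuation bit clear
            return bytes(out)


def decode_varint(data: bytes) -> int:
    result, shift = 0, 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def encode_int_field(field_number: int, value: int) -> bytes:
    """Tag (wire type 0 = varint) followed by the varint-encoded value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


wire = encode_int_field(1, 150)
print(wire.hex())                               # 089601
print(len(wire), len(json.dumps({"a": 150})))  # 3 10
```

The binary form carries no field names, only field numbers, which is why it is smaller than JSON; the trade-off is that both sides need the same schema to interpret it.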
21. #5 IDENTIFY AND KEEP TRACK OF YOUR USERS
• You need a Google account to start using Android properly
• OpenSocial is a collaborative effort to compete against Facebook
• OpenID is an identity standard and OAuth is a standard for authorizing access to services
• Google identifies and tracks every step you take within its domains
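OAuth 1.0 (RFC 5849) lets a client prove that a request is authorized without sending its secrets: it builds a "signature base string" from the HTTP method, the base URL and the sorted, percent-encoded request parameters, then signs it with HMAC-SHA1 using the consumer secret and token secret as the key. The Python sketch below follows those steps. The URL, parameters and secrets are hypothetical, and a real client must also add the full set of `oauth_*` protocol parameters (nonce, timestamp, consumer key and so on) and normalize the URL.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(s: str) -> str:
    """RFC 5849 percent-encoding: only A-Z, a-z, 0-9 and - . _ ~ stay unencoded."""
    return quote(s, safe="")


def signature_base_string(method: str, base_url: str, params: dict) -> str:
    # This sketch takes a dict, so repeated parameter names are not supported.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(base_url), pct(normalized)])


def hmac_sha1_signature(base_string: str, consumer_secret: str, token_secret: str = "") -> str:
    key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


base = signature_base_string(
    "get", "https://photos.example.net/photos",
    {"size": "original", "oauth_nonce": "abc123"},
)
print(base)
# GET&https%3A%2F%2Fphotos.example.net%2Fphotos&oauth_nonce%3Dabc123%26size%3Doriginal
print(hmac_sha1_signature(base, "consumer-secret", "token-secret"))
```

The server repeats the same computation with its copy of the secrets and accepts the request only if the signatures match, so the secrets themselves never travel with the request.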