1. Cloud Computing:
What it is, DOs and DON'Ts
Svet Ivantchev, eFaber
Fourth Workshop on
Advanced Computing Techniques in the Microworld,
April 2011
Sunday, 1 May 2011
2. Our plan for today
• What Is Cloud Computing?
• Enabling technologies
• Public vs Private Clouds
• Idea of MapReduce with two examples
3. Our plan for tomorrow
• Create an HPC cluster with:
• 184 GB RAM
• 13 TB local disk space and 800 GB persistent storage
• 64 cores @ 2.9 GHz, Intel Nehalem = 268 ECUs
(roughly 268 1.2 GHz Xeons from 2007)
• 10 Gbit/s network connection between them
4. (Kind of) Evolution
• Grid Computing
• Utility Computing
• Cloud Computing
• Software as a Service (SaaS)
5. Grid Computing
Grid computing is a term referring to the combination of
computer resources from multiple administrative domains
to reach a common goal. The grid can be thought of as a
distributed system with non-interactive workloads that
involve a large number of files.
http://en.wikipedia.org/wiki/Grid_computing
6. Utility Computing
Utility Computing is the packaging of computing
resources, such as computation, storage and services, as a
metered service similar to a traditional public utility (such
as electricity, water, natural gas, or telephone network).
http://en.wikipedia.org/wiki/Utility_computing
7. Cloud Computing
McKinsey & Co. Report
8. Cloud Computing
Cloud computing is a model for enabling convenient, on-
demand network access to a shared pool of configurable
computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned
and released with minimal management effort or service
provider interaction.
NIST
9. Cloud Computing
1. The illusion of infinite computing resources...
2. The elimination of an up-front commitment...
3. The ability to pay for use ... as needed.
UC Berkeley RAD Lab
10. So, what is it?
• Pay-per-use
• Resources are abstracted (virtualized)
• Upscale and downscale on demand
• Self service interface (API included)
11. Enabling technologies
• Virtualisation
• Virtualised Storage
• Web Services
12. Virtualisation
• Xen
• KVM
• VMware
• more...
13. Abstracted Storage
• Distributed File Systems; examples:
• Amazon S3
• RackSpace’s CloudFiles
• HDFS
14. Stack
Software as a Service (SaaS)
Platform as a Service (PaaS)
Infrastructure as a Service (IaaS)
Cloud Enabler(s)
Hardware
15. Public Cloud Services
• Amazon EC2
• RackSpace
• 100s more ...
27. MapReduce
• High level vs low level languages
• Example: MPI/PVM vs MapReduce
28. MapReduce's “Hello world”
Unix-style
“en un lugar de la Mancha de cuyo nombre no quiero
acordarme no ha mucho tiempo que vivía un hidalgo ...”
$ cat i.txt | tr ' ' '\n' | sort | uniq -c
1 Mancha
1 acordarme
1 cuyo
2 de
...
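The pipeline above can be mirrored in a few lines of Python, which makes the "split, sort, count" structure explicit. This is only a sketch: the function name `word_count` and the constant `SAMPLE` are made up for illustration, and `str.split()` splits on any whitespace, whereas `tr ' ' '\n'` only replaces single spaces.

```python
from collections import Counter

# First words of the quote from the slide (illustrative sample input).
SAMPLE = "en un lugar de la Mancha de cuyo nombre no quiero acordarme"


def word_count(text):
    """Count whitespace-separated words, like `tr | sort | uniq -c`."""
    counts = Counter(text.split())
    # Sort by word in code-point order, so capitalised words come first.
    return sorted(counts.items())


if __name__ == "__main__":
    for word, n in word_count(SAMPLE):
        print(f"{n:7d} {word}")
```

As in the shell version, "Mancha" sorts ahead of the lower-case words and "de" is counted twice.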
30. Google Books
• 129 000 000 books have been published so far
• 15 000 000 books scanned (1700-2010)
• 5 000 000 classified and with metadata
Science, Vol. 331, no. 6014, pp. 176-182 (Jan 14, 2011)
33. MapReduce
map: (k1, v1) → list(k2, v2)
reduce: (k2, list(v2)) → list(v2)
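The two signatures above can be written down as Python types, together with a tiny single-process driver that applies them. This is a sketch for reasoning about the model, not how a real framework works: the names `Mapper`, `Reducer`, `run_mapreduce`, `split_words` and `add_up` are invented here, and everything is held in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")

# map:    (k1, v1)       -> list(k2, v2)
Mapper = Callable[[K1, V1], Iterable[Tuple[K2, V2]]]
# reduce: (k2, list(v2)) -> list(v2)
Reducer = Callable[[K2, List[V2]], Iterable[V2]]


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to every (k1, v1), group by k2, then reduce each group."""
    groups: Dict = defaultdict(list)
    for k1, v1 in records:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    return {k2: list(reducer(k2, values)) for k2, values in groups.items()}


def split_words(name, text):
    """Example mapper: emit (word, 1) for every word in the document."""
    for word in text.split():
        yield word, 1


def add_up(word, counts):
    """Example reducer: emit the sum of the counts for one word."""
    yield sum(counts)


if __name__ == "__main__":
    print(run_mapreduce([("doc1", "a b a")], split_words, add_up))
```

The user supplies only the two functions; the grouping by `k2` in the middle is the part the framework does for you.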
34. MapReduce: Mapper
map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, 1);

Input:
“en un lugar de la Mancha de cuyo nombre no quiero acordarme
no ha mucho tiempo que vivía un hidalgo”

Output:
“en”, 1
“un”, 1
“lugar”, 1
“de”, 1
“la”, 1
“Mancha”, 1
“de”, 1
...
35. MapReduce: Reducer
reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    result = 0;
    for each v in values:
        result += v;
    Emit(result);

Input            Output
“en”, [1]        “en”, 1
“un”, [1,1]      “un”, 2
“lugar”, [1]     “lugar”, 1
“de”, [1,1]      “de”, 2
...              ...
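Between the mapper and the reducer the framework sorts the intermediate pairs and groups the values by key, which is how the reducer comes to receive a list such as [1,1]. The sketch below imitates that step with `sorted` and `itertools.groupby`; the names `TEXT`, `mapper`, `shuffle` and `reducer` are illustrative, and a real cluster does this across many machines rather than in one list.

```python
from itertools import groupby
from operator import itemgetter

# The quote used on the slides.
TEXT = ("en un lugar de la Mancha de cuyo nombre no quiero acordarme "
        "no ha mucho tiempo que vivía un hidalgo")


def mapper(key, value):
    """Emit (word, 1) for each word in the document contents."""
    for word in value.split():
        yield word, 1


def shuffle(pairs):
    """Sort pairs by key and collect each key's values into one list."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key, values):
    """Sum the counts for one word."""
    return key, sum(values)


if __name__ == "__main__":
    for word, counts in shuffle(mapper("quijote", TEXT)):
        print(word, counts, "->", reducer(word, counts)[1])
```

The sort before the grouping plays the same role as `sort` before `uniq -c` in the Unix-style version.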
36. Dean, J. and Ghemawat, S., Comm. ACM, Vol. 51, pp. 107-113 (2008)
37. Our input
$ ls -l donquijote_s?.txt
-rw-r--r-- 1 svet staff 1037413 23 abr 18:26 donquijote_s1.txt
-rw-r--r-- 1 svet staff 1099078 23 abr 18:22 donquijote_s2.txt
$ head -6 donquijote_s1.txt
El ingenioso hidalgo don Quijote de la Mancha
TASA
Yo, Juan Gallo de Andrada, escribano de Camara del Rey
nuestro senor, de los que residen en su Consejo, certifico
y doy fe que, habiendo visto por los senores del un libro
38. Python Mapper
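#!/usr/bin/python
import re
import sys

# A word starts with a letter and continues with letters or digits.
WORD = re.compile(r"[a-zA-Z][a-zA-Z0-9]*")

def main(argv):
    # Hadoop Streaming feeds the input on stdin, one line at a time.
    for line in sys.stdin:
        for word in WORD.findall(line):
            # Emit "LongValueSum:<word><TAB>1"; the prefix asks Hadoop's
            # aggregate reducer to sum the values for each key.
            sys.stdout.write("LongValueSum:" + word.lower() + "\t1\n")

if __name__ == "__main__":
    main(sys.argv)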
39. Test the mapper
$ cat donquijote_s1.txt | ./wsplit.py
LongValueSum:el 1
LongValueSum:ingenioso 1
LongValueSum:hidalgo 1
LongValueSum:don 1
LongValueSum:quijote 1
LongValueSum:de 1
LongValueSum:la 1
LongValueSum:mancha 1
LongValueSum:tasa 1
LongValueSum:yo 1
LongValueSum:juan 1
LongValueSum:gallo 1
LongValueSum:de 1
LongValueSum:andrada 1
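To check the mapper output end to end without a cluster, the summing step can be imitated locally. The sketch below assumes the lines are tab-separated as `LongValueSum:<word><TAB><count>` and simply adds up the counts per word; the function name `sum_long_values` is made up, and this is a stand-in for the reduce step, not Hadoop's own code.

```python
from collections import defaultdict

PREFIX = "LongValueSum:"


def sum_long_values(lines):
    """Sum the counts per key from mapper output lines."""
    totals = defaultdict(int)
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if not value:
            continue  # skip blank or malformed lines
        if key.startswith(PREFIX):
            key = key[len(PREFIX):]
        totals[key] += int(value)
    return dict(totals)


if __name__ == "__main__":
    # A few lines in the shape of the mapper output shown above.
    sample = [
        "LongValueSum:quijote\t1\n",
        "LongValueSum:de\t1\n",
        "LongValueSum:la\t1\n",
        "LongValueSum:de\t1\n",
    ]
    for word, total in sorted(sum_long_values(sample).items()):
        print(word + "\t" + str(total))
```

Passing `sys.stdin` instead of the sample list would let it sit at the end of the `cat ... | ./wsplit.py` pipeline.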
58. Final result
$ awk '{print $2 " " $1}' part-00000 | sort -r -n
21477 que
18297 de
18189 y
10363 la
9824 a
9490 el
8243 en
6335 no
5079 se
4748 los
4202 con
3940 por
3468 las
3461 lo
3398 le
3352 su
2647 don
2623 del
2539 como
2345 me
2312 si
2284 mas
2207 mi
2175 quijote
2148 sancho
2142 es
2077 yo
1938 un
1808 dijo
1740 al
1463 para
1400 porque
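The same ranking can be produced in Python, which is handy if further processing is needed. The sketch assumes each line of the output file holds a word and a count separated by whitespace, as the `awk` command implies; `top_words` is a hypothetical helper name.

```python
def top_words(lines, limit=None):
    """Return (count, word) pairs sorted by descending count."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        rows.append((int(count), word))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    # Three entries taken from the result above, in arbitrary order.
    sample = ["de\t18297\n", "que\t21477\n", "y\t18189\n"]
    for count, word in top_words(sample):
        print(count, word)
```

To run it on the real output, pass an open file, for example `top_words(open("part-00000"), limit=20)`.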
59. Command-line alternative
$ elastic-mapreduce --create \
    --stream \
    --input s3n://mrbg/input \
    --mapper s3://mrbg/prog/wsplit.py \
    --output s3n://mgbr/output/run2
$ elastic-mapreduce --create
61. MapReduce: Mapper
#!/usr/bin/ruby
ARGF.each do |line|
  mcsteps = line.strip
  unless mcsteps.length == 0
    begin
      inside = 0
      mcsteps.to_i.times do
        x, y = rand, rand
        inside += 1 if Math.hypot(x,y) < 1.0
      end
      puts inside.to_s
    rescue
      # couldn't parse mc steps
    end
  end
end
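For readers more at home in Python, the mapper above could be ported roughly as follows. This is a sketch, not the author's code: the names `count_inside` and `map_lines` are invented, and unlike Ruby's `to_i`, which turns unparsable text into 0, this version skips lines that are not integers.

```python
import math
import random


def count_inside(steps, rng=random):
    """Throw `steps` random points into the unit square and count those
    that fall inside the quarter circle of radius 1."""
    inside = 0
    for _ in range(steps):
        x, y = rng.random(), rng.random()
        if math.hypot(x, y) < 1.0:
            inside += 1
    return inside


def map_lines(lines, rng=random):
    """For each input line holding a step count, yield the hit count."""
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            steps = int(text)  # int() also accepts "200_000_000"
        except ValueError:
            continue  # couldn't parse mc steps
        yield count_inside(steps, rng)


if __name__ == "__main__":
    for hits in map_lines(["1000\n"]):
        print(hits)
```

Each input line is an independent unit of work, so several lines or files can be processed by different mappers in parallel.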
62. Pi
$ cat mcs.txt
1000
$ cat mcs.txt | ./mc-pi-mr.rb
776
... create more mcs.txts:
200_000_000
200_000_000
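The count printed by the mapper turns into an estimate of π because the points are uniform in the unit square and the quarter circle covers π/4 of it, so π ≈ 4 · inside / total. A short sketch of that arithmetic, with a rough binomial error estimate added as an extra (the function names are illustrative):

```python
import math


def estimate_pi(inside, total):
    """pi is about 4 times the fraction of points inside the quarter circle."""
    return 4.0 * inside / total


def standard_error(inside, total):
    """Rough one-sigma uncertainty of the estimate (binomial statistics)."""
    p = inside / total
    return 4.0 * math.sqrt(p * (1.0 - p) / total)


if __name__ == "__main__":
    # The run shown above: 776 hits out of 1000 steps.
    print(estimate_pi(776, 1000))     # 3.104
    print(standard_error(776, 1000))  # roughly 0.05
```

The error shrinks only as one over the square root of the number of steps, which is why the larger inputs use hundreds of millions of steps per line.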
63. MapReduce: Reducer
#!/usr/bin/ruby
count = 0
ARGF.each do |line|
  count += line.to_i
end
puts "#{count} points inside"
64. Prepare the EMR
• upload mcsnn.txt to mrbg/mcinput/
• upload mc-mapper.rb to mrbg/prog/
• upload mc-reducer.rb to mrbg/prog/
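The input files for the first step could be generated with a small script like the one below. It is only a sketch under assumptions: the file names `mcs01.txt`, `mcs02.txt`, ... are one guess at what "mcsnn.txt" stands for, and two lines of `200_000_000` per file follows the earlier example; the helper `write_inputs` is hypothetical.

```python
import tempfile
from pathlib import Path


def write_inputs(directory, n_files, lines_per_file=2, steps="200_000_000"):
    """Write n_files input files, each holding one step count per line."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, n_files + 1):
        path = out / f"mcs{i:02d}.txt"  # assumed naming scheme
        path.write_text((steps + "\n") * lines_per_file)
        paths.append(path)
    return paths


if __name__ == "__main__":
    target = tempfile.mkdtemp()  # scratch directory for the demo
    for path in write_inputs(target, 4):
        print(path)
```

The resulting files would then be uploaded to the input location listed above.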