Million Monkeys User Group

Jesse Anderson | Curriculum Developer and Instructor
November 2012


About Me
• Cloudera - Educational Services Team
• Twitter - @jessetanderson
• Blog and more info: http://www.jesse-anderson.com
• Screencasts on Pragmatic Programmers: available at http://www.jesse-anderson.com
• President – Northern Nevada Software Developers Group


About Cloudera
• Cloudera is “The commercial Hadoop company”
• Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo
• Provides consulting and training services for Hadoop users
• Staff includes committers to virtually all Hadoop projects


Introduction

• Infinite Monkey Theorem
• Hadoop
• Million Monkeys Algorithm
• Business Case


Exponential Growth (aka Big Data)

The odds of finding a specific group of contiguous characters are 1 in 26 raised to the power of the number of contiguous characters.

Contiguous Characters    Combinations
8                        208,827,064,576
9                        5,429,503,678,976
10                       141,167,095,653,376
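The figures above are plain powers of 26. A minimal Python sketch, assuming a 26-letter alphabet with no spaces or punctuation (as the table does), reproduces them:

```python
def combinations(length: int, alphabet_size: int = 26) -> int:
    """Number of distinct strings of `length` characters over the alphabet."""
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (8, 9, 10):
        # Print each length with its count, using thousands separators.
        print(f"{n:>2}  {combinations(n):,}")
```

Each extra character multiplies the search space by 26, which is why the problem quickly outgrows a single machine.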


Hadoop

• Apache Project
• Reliable, Scalable, Distributed Computing
• Software Framework
• MapReduce
• Distributed File System (HDFS)
• Other projects


Map
Create or process the input data
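As a hypothetical illustration (not the project's actual code), a map step for this problem could create random fixed-length strings and emit only the ones that appear in a target text. The names `monkey_map`, `TARGET_TEXT` and `TARGET_WINDOWS`, and the 5-character window, are assumptions chosen to keep the example small:

```python
import random
import string
from typing import Iterator, Tuple

# Hypothetical target text (an assumption for illustration only).
TARGET_TEXT = "tobeornottobethatisthequestion"
WINDOW = 5  # assumed length of each random string

# Every WINDOW-length substring of the target, for constant-time membership checks.
TARGET_WINDOWS = {TARGET_TEXT[i:i + WINDOW] for i in range(len(TARGET_TEXT) - WINDOW + 1)}


def monkey_map(seed: int, attempts: int) -> Iterator[Tuple[str, int]]:
    """Map step: create random input and emit (match, 1) for each hit."""
    rng = random.Random(seed)  # per-mapper seed so each task is reproducible
    for _ in range(attempts):
        candidate = "".join(rng.choices(string.ascii_lowercase, k=WINDOW))
        if candidate in TARGET_WINDOWS:
            yield candidate, 1
```

With 26^5 (11,881,376) possible strings and only a couple dozen target windows, almost every attempt produces nothing, so the work is naturally split across many independent mappers.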


Reduce
Process data from Map into something usable
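As a hypothetical illustration, a reduce step receives every value emitted for a key and collapses them into something usable, here a total count per matched substring. The `shuffle` and `monkey_reduce` helpers are illustrative stand-ins for what the Hadoop framework and a user-written reducer would do; they are not Hadoop APIs:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group map output by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def monkey_reduce(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: turn all counts for one key into a single total."""
    return key, sum(values)


if __name__ == "__main__":
    # Hypothetical output collected from two mappers.
    map_output = [("tobeo", 1), ("thequ", 1), ("tobeo", 1)]
    for key, values in sorted(shuffle(map_output).items()):
        print(monkey_reduce(key, values))
```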


Million Monkeys Algorithm
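One hypothetical way to measure the monkeys' progress, not necessarily the approach used by the Million Monkeys project, is to mark every position of a target text covered by some randomly found substring and report the covered percentage. `mark_found` and `coverage` are illustrative names:

```python
from typing import List


def mark_found(target: str, found: str, covered: List[bool]) -> None:
    """Mark every position of `target` where `found` occurs as covered."""
    start = target.find(found)
    while start != -1:
        for i in range(start, start + len(found)):
            covered[i] = True
        start = target.find(found, start + 1)  # also catch overlapping matches


def coverage(covered: List[bool]) -> float:
    """Percentage of target characters matched so far."""
    return 100.0 * sum(covered) / len(covered) if covered else 0.0


if __name__ == "__main__":
    target = "tobeornottobe"  # hypothetical target text
    covered = [False] * len(target)
    mark_found(target, "tobe", covered)
    print(f"{coverage(covered):.1f}% of the target recreated")
```

Because each found substring only flips booleans, results from many independent workers can be merged in any order.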


Hadoop Scalability
[Figure: Percent of Linear Scalability (0 to 100 percent) versus number of nodes (1 to 20), comparing RDBMS (Relational Database) and Hadoop]
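The y-axis measures how close each system comes to ideal linear scaling. Assuming the usual definition (speedup on n nodes divided by n, times 100; the slide does not state its exact formula), the metric can be sketched as follows. The throughput numbers in the usage line are hypothetical, not readings from the chart:

```python
def percent_of_linear(throughput_one_node: float, throughput_n_nodes: float, nodes: int) -> float:
    """Percent of ideal linear scaling achieved on `nodes` nodes."""
    if nodes < 1 or throughput_one_node <= 0:
        raise ValueError("nodes must be >= 1 and single-node throughput must be positive")
    speedup = throughput_n_nodes / throughput_one_node
    return 100.0 * speedup / nodes


if __name__ == "__main__":
    # Hypothetical: 1 node does 100 jobs/hour, 20 nodes do 1,800 jobs/hour.
    print(percent_of_linear(100.0, 1800.0, 20))
```

A value of 100 means doubling the nodes doubles the work done; anything lower means coordination overhead is eating into each added machine.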


Business Value of Scalability

• Scaling does not require massive re-engineering and complete rewrites of code (SAVE)
• Adding more computers to the cluster gives a predictable increase in computational power and storage (SAVE)


Going Viral (and taking over the world)

• Covered internationally by the BBC, the Wall Street Journal, Wired and Slashdot
• 26,000 unique visits from 119 countries in one day


Next Steps
• Books
  • Hadoop: The Definitive Guide - Tom White
  • Hadoop Operations - Eric Sammer
• Cloudera Training
  • Developer, Admin, Hive and Pig, HBase, Essentials
• CDH
  • Cloudera's Distribution Including Apache Hadoop
  • Open Source
  • VM Image


Conclusion

• MapReduce breaks up the problem efficiently
• No code changes to scale
• Incredible scalability
• Enables previously impossible tasks
