21. What do you do with a
240-core Cluster?
Use the power of many
machines to analyze Big
Data sets.
22. How do you get computers to
work together like that?
That’s what Hadoop is for.
23. An Example
Daily Hansard: transcript of
Canadian parliament since 1994
Swearwords.txt (http://www.bannedwordlist.com)
Who are the most foul-mouthed
Federal MPs?
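A minimal single-process sketch of how this question might be phrased as a map step and a reduce step, the pattern Hadoop distributes across machines. The word list, the (speaker, text) record layout, and the sample statements are all assumptions for illustration; they are not the real Swearwords.txt or Hansard data, and this is not necessarily how the original job was written.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for Swearwords.txt (the real list is not reproduced here)
SWEARWORDS = {"darn", "heck"}

def map_statement(speaker, text):
    """Map step: emit (speaker, 1) for every flagged word in one statement."""
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in SWEARWORDS:
            yield speaker, 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per speaker."""
    totals = defaultdict(int)
    for speaker, count in pairs:
        totals[speaker] += count
    return dict(totals)

def count_swearwords(statements):
    """Run the map step over every statement, then reduce the results."""
    pairs = (
        pair
        for speaker, text in statements
        for pair in map_statement(speaker, text)
    )
    return reduce_counts(pairs)

# Made-up statements, assumed to be (speaker, text) records
sample = [
    ("MP A", "Well, darn it, this bill is a heck of a mess."),
    ("MP B", "I thank the honourable member."),
    ("MP A", "Heck, no."),
]

result = count_swearwords(sample)
# Rank speakers by swearword count, highest first
ranking = sorted(result.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Because each statement is mapped independently, the map step can run on any machine that holds a slice of the transcripts; only the per-speaker sums need to be brought together.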
26. Results
• 20 years of House of Commons statements
• 511,341 Statements analyzed
• 121,985,310 Words spoken
• 3,839 Swearwords spoken
• 1 in 133 statements has a swearword
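The ratios behind these totals can be checked with a few lines of arithmetic. This sketch uses only the three totals listed above. The "1 in 133" figure appears to assume at most one swearword per statement, so it reads best as an upper bound on the share of statements that contain one.

```python
# Totals as reported in the results
statements = 511_341
words = 121_985_310
swearwords = 3_839

statements_per_swearword = statements / swearwords
words_per_swearword = words / swearwords
words_per_statement = words / statements

print(f"1 swearword per {statements_per_swearword:.0f} statements")
print(f"1 swearword per {words_per_swearword:,.0f} words")
print(f"{words_per_statement:.0f} words per statement on average")
```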
27. Top 5 Swearers
(absolute)
Pat Martin NDP 98
Randy White Conservative 88
Alexa McDonough NDP 52
Jim Silye Conservative 50
Yvan Loubier Bloc Quebecois 49
28. Top 5 Swearers
(relative)
MP              Party         Rate    Swearwords  Words spoken
Randy White     Conservative  0.037%  88          299,114
Dennis Mills    Liberal       0.023%  14          62,221
Gerry Ritz      Conservative  0.022%  22          99,037
John McCallum   Conservative  0.017%  38          226,155
John McKay      Liberal       0.016%  44          268,188
29. Top 5 Words Spoken
Paul Szabo 1,482,106
Pat Martin 1,053,365
Don Boudria 867,204
Yvan Loubier 861,888
Peter MacKay 844,130
31. "The best minds of my generation are
thinking about how to make people click
ads"
- Jeff Hammerbacher (Facebook, Accel,
Cloudera)
Editor's notes
In a 2001 research report [20] and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources).
1 exabyte = 1,000 petabytes = 1 million terabytes = 1 billion gigabytes. A popular expression claims that "all words ever spoken by human beings" could be stored in approximately 5 exabytes of data.
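As a quick check on these unit conversions, here is a small sketch that assumes decimal (SI) prefixes, where each step up is a factor of 1,000. The constant names are chosen for illustration.

```python
# Decimal (SI) storage units, expressed in bytes
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

# How many of each smaller unit fit in one exabyte
petabytes_per_exabyte = EXABYTE // PETABYTE
terabytes_per_exabyte = EXABYTE // TERABYTE
gigabytes_per_exabyte = EXABYTE // GIGABYTE

print(f"1 EB = {petabytes_per_exabyte:,} PB")
print(f"1 EB = {terabytes_per_exabyte:,} TB")
print(f"1 EB = {gigabytes_per_exabyte:,} GB")

# The "all words ever spoken" estimate of 5 exabytes, in gigabytes
all_words_gigabytes = 5 * gigabytes_per_exabyte
print(f"5 EB = {all_words_gigabytes:,} GB")
```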
In big data there are no requests, no predefined parameters, and no structured responses. You are free to intersect anything with anything: you can analyse, mutate, group, split, and reorder the data in any way you can imagine.