Big Data
    big problems.
What is Big Data?
Volume
Velocity
Variety
Volume
Billions of Things:
    Posts, Tweets and Likes
    Web Transactions
    Sensor Readings
Velocity
Streaming Data:
   Twitter: 500,000,000 TPD
   Walmart: 20,000,000 TPD
   Hopper: 750,000,000 TPD
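To give a feel for these volumes, the daily figures above can be converted to average per-second rates. This is a rough sketch: it reads "TPD" as "events per day" and assumes traffic is spread evenly across the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes as quoted on the slide; "TPD" is read here as "events per day".
daily_volumes = {
    "Twitter": 500_000_000,
    "Walmart": 20_000_000,
    "Hopper": 750_000_000,
}


def per_second(events_per_day: int) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


rates = {name: per_second(volume) for name, volume in daily_volumes.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.0f} per second")
```

Peak rates would be higher than these averages, so a streaming system has to be sized for bursts rather than for the mean.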
Variety
Integrating Many Sources of Data:
   Unstructured Web Content
   Semi-structured Logs
   Relational Databases
   Images, Video, Audio
So What’s Changed?
Mobile devices
Social Web
Sensors, Metrics
Digitization of everything
Open Source Tools
•   Hadoop: distributed processing
•   R: predictive analytics for big data
•   Hive, Pig: ad-hoc analytics for Hadoop
•   Mahout: machine learning for Hadoop
•   HBase, Cassandra: distributed databases
•   ElasticSearch: distributed search engine
•   Storm: distributed processing for data streams
"The best minds of my generation are
thinking about how to make people click
ads"
- Jeff Hammerbacher (Facebook, Accel,
Cloudera)
Big Minds + Big Data
Aggregate, Summarize
Detect Patterns
Model, Simulate
Forecast, Predict
Open Data

Reports
Request/Response APIs
Small Data
Hack/reduce
Open Hackspace in Boston
Home for Pre-seed projects,
Community events
Not-for-profit sponsored by
local industry and government
Hack/reduce Cluster
240-core cluster sponsored
by GoGrid, a cloud
computing company.
Available for use at today’s
Open Data Day.
What do you do with a
240-core cluster?
Use the power of many
machines to analyze Big
Data sets.
How do you get computers to
work together like that?

That’s what Hadoop is for.
An Example
Daily Hansard: transcript of
Canadian parliament since 1994
Swearwords.txt (http://www.bannedwordlist.com)
Who are the most foul-mouthed
Federal MPs?
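The question above fits the map/reduce pattern that Hadoop distributes across a cluster. The following is a minimal single-machine sketch in plain Python, not the Hadoop API and not the code used for the talk. The input format (speaker, text pairs), the function names, and the sample data are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

WORD_RE = re.compile(r"[a-z']+")


def map_statement(speaker: str, text: str,
                  swearwords: Set[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (speaker, 1) for every swearword in one statement."""
    for word in WORD_RE.findall(text.lower()):
        if word in swearwords:
            yield speaker, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key.
    On a cluster the framework does this between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each speaker."""
    return {speaker: sum(values) for speaker, values in grouped.items()}


def count_swearwords(statements: Iterable[Tuple[str, str]],
                     swearwords: Set[str]) -> Dict[str, int]:
    pairs = (pair
             for speaker, text in statements
             for pair in map_statement(speaker, text, swearwords))
    return reduce_counts(shuffle(pairs))


# Made-up stand-ins for Swearwords.txt and the Hansard statements.
swearwords = {"darn", "heck"}
statements = [
    ("MP A", "What the heck is this darn bill?"),
    ("MP B", "I thank the honourable member."),
    ("MP A", "Heck, no."),
]

totals = count_swearwords(statements, swearwords)
print(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

Because each statement is mapped independently and each speaker is reduced independently, both steps can be split across many machines, which is the part a framework like Hadoop takes care of.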
Results

• 20 years of House of Commons statements
• 511,341 Statements analyzed
• 121,985,310 Words spoken
• 3,839 Swearwords spoken
• 1 in 133 statements has a swearword
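The "1 in 133" figure can be reproduced from the totals above. A small sketch of the arithmetic follows; note that dividing statements by swearwords only equals "statements containing a swearword" if no statement contains more than one, which is an assumption here.

```python
statements_analyzed = 511_341
words_spoken = 121_985_310
swearwords_spoken = 3_839

# Statements per swearword: the basis of the "1 in 133" figure.
statements_per_swearword = statements_analyzed / swearwords_spoken

# Two related ratios derived from the same totals.
words_per_swearword = words_spoken / swearwords_spoken
words_per_statement = words_spoken / statements_analyzed

print(f"1 swearword per {statements_per_swearword:.0f} statements")
print(f"1 swearword per {words_per_swearword:,.0f} words")
print(f"{words_per_statement:.0f} words per statement on average")
```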
Top 5 Swearers (absolute)
Name              Party            Swearwords
Pat Martin        NDP              98
Randy White       Conservative     88
Alexa McDonough   NDP              52
Jim Silye         Conservative     50
Yvan Loubier      Bloc Quebecois   49
Top 5 Swearers (relative)
Name            Party          Rate     Swearwords   Words spoken
Randy White     Conservative   0.037%   88           299,114
Dennis Mills    Liberal        0.023%   14           62,221
Gerry Ritz      Conservative   0.022%   22           99,037
John McCallum   Conservative   0.017%   38           226,155
John McKay      Liberal        0.016%   44           268,188
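The relative ranking appears to divide each MP's swearword count by their total words spoken. That reading is an interpretation of the slide, not something it states; the sketch below applies it to the Dennis Mills row.

```python
def swear_rate_percent(swearwords: int, words_spoken: int) -> float:
    """Swearwords as a percentage of all words spoken."""
    if words_spoken <= 0:
        raise ValueError("words_spoken must be positive")
    return 100 * swearwords / words_spoken


# Dennis Mills: 14 swearwords in 62,221 words.
rate = swear_rate_percent(14, 62_221)
print(f"{rate:.3f}%")  # 0.023%
```

Normalizing by words spoken keeps frequent speakers from dominating the ranking simply because they talk more.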
Top 5 Words Spoken
Name           Words spoken
Paul Szabo     1,482,106
Pat Martin     1,053,365
Don Boudria    867,204
Yvan Loubier   861,888
Peter MacKay   844,130
Prime Ministers
Name             Swearwords   Words spoken
Jean Chrétien    11           604,431
Paul Martin      6            485,990
Stephen Harper   22           620,999
"The best minds of my generation are
thinking about how to make people click
ads"
- Jeff Hammerbacher (Facebook, Accel,
Cloudera)
Joost ouwerkerk

Editor's notes

  1. In a 2001 research report [20] and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e. increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources).
  2. Exabyte = 1,000 petabytes = 1 million terabytes = 1 billion gigabytes. A popular expression claims that "all words ever spoken by human beings" could be stored in approximately 5 exabytes of data.
  3. In big data there are no requests, no predefined parameters and no structured responses. You are free to intersect anything with anything. You can analyse, mutate, group, split, reorder in any way you can imagine.
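The unit relationships in note 2 can be checked with a few constants. This sketch assumes decimal (SI) prefixes, where each step is a factor of 1,000; binary units (1,024) would give slightly different figures.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

print(f"1 exabyte = {EXABYTE // PETABYTE:,} petabytes")
print(f"1 exabyte = {EXABYTE // TERABYTE:,} terabytes")
print(f"1 exabyte = {EXABYTE // GIGABYTE:,} gigabytes")
```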