Map Reduce
  An Example
Who am I?
My name is James Grant (james@queeg.org).

I'm a developer here at Brandwatch.

For the last three years I've been a Data
Engineer at Last.fm and the maintainer of their
Hadoop Cluster.
Coming up…
●   What happens during MapReduce?
●   Plays and Reach from music listening data
●   The Mapper pseudo code
●   The Reducer pseudo code
●   The result
●   What if…?
What happens during MapReduce?

[Diagram: the Input Data is split into Data Fragments. Each fragment is passed to a Mapper, which produces Map Output. The Map Output is sorted to form the Reduce Input, and each Reducer writes its Reduce Output as Data Fragments.]
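The map, sort and reduce stages can be imitated in a single process. The Python sketch below is only an illustration of the flow, not how Hadoop implements it: `run_map_reduce`, `length_mapper` and `count_reducer` are hypothetical names, and a real cluster runs many mappers and reducers in parallel on separate data fragments.

```python
from itertools import groupby
from operator import itemgetter


def run_map_reduce(records, mapper, reducer):
    """Simulate the map, sort and reduce stages in a single process."""
    # Map: every input record may produce any number of (key, value) pairs.
    map_output = [pair for record in records for pair in mapper(record)]

    # Sort: ordering by key places all values for one key next to each other.
    map_output.sort(key=itemgetter(0))

    # Reduce: the reducer is called once per key with that key's values.
    reduce_output = []
    for key, pairs in groupby(map_output, key=itemgetter(0)):
        values = (value for _, value in pairs)
        reduce_output.extend(reducer(key, values))
    return reduce_output


def length_mapper(word):
    # Hypothetical mapper: key each word by its length.
    yield len(word), word


def count_reducer(length, words):
    # Hypothetical reducer: count the words seen for each length.
    yield length, sum(1 for _ in words)


counts = run_map_reduce(["pig", "hive", "map", "sort"], length_mapper, count_reducer)
print(counts)  # [(3, 2), (4, 2)]
```

The sort is what guarantees that a reducer sees every value for its key in one call.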
Plays and Reach from music
listening data
● Plays - The number of times that song has
  been played
● Reach - The number of unique listeners to
  that song
● Similar to hits and uniques for web
  properties
● Input data has columns for user id and song
  id (amongst others)
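To make the two definitions concrete, here is a small Python sketch that computes both figures directly, without MapReduce. The `listens` sample is made up for illustration and is not taken from real listening data.

```python
from collections import defaultdict

# Hypothetical sample of (user id, song id) listening records.
listens = [(1, 10), (1, 10), (2, 10), (3, 20), (3, 20), (3, 20)]

plays = defaultdict(int)      # song id -> number of plays
listeners = defaultdict(set)  # song id -> distinct user ids

for user, song in listens:
    plays[song] += 1
    listeners[song].add(user)

# Pair each song with (plays, reach).
stats = {song: (plays[song], len(listeners[song])) for song in plays}
print(stats)  # {10: (3, 2), 20: (3, 1)}
```

Both songs have three plays, but song 20 was played by a single user, so its reach is 1.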
The Mapper
function map(Integer user, Integer song):
  emit(song, user);
The Reducer
function reduce(Integer song, Iterator users):
  Integer plays = 0;
  Set uniqueUsers = [];

  foreach user in users:
    increment plays;
    if user not within uniqueUsers:
      uniqueUsers.add(user);

  result.plays = plays;
  result.reach = uniqueUsers.cardinality();
  emit(song, result);
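The mapper and reducer pseudo code can be written as Python generators. This is a sketch under assumptions: `map_listen` and `reduce_song` are hypothetical names, the sample data is invented, and the sort-and-group step stands in for what the framework does between the two stages.

```python
from itertools import groupby
from operator import itemgetter


def map_listen(user, song):
    # Re-key each listen by song so one reducer call sees all of a song's users.
    yield song, user


def reduce_song(song, users):
    plays = 0
    unique_users = set()
    for user in users:
        plays += 1
        unique_users.add(user)  # a set ignores repeats, so no membership test
    yield song, {"plays": plays, "reach": len(unique_users)}


# Hypothetical input: (user id, song id) pairs.
listens = [(1, 10), (2, 20), (1, 10), (3, 10), (2, 20)]

# Stand-in for the framework: map, sort by song, then reduce each group.
pairs = sorted(
    (pair for user, song in listens for pair in map_listen(user, song)),
    key=itemgetter(0),
)
results = {}
for song, group in groupby(pairs, key=itemgetter(0)):
    for key, result in reduce_song(song, (user for _, user in group)):
        results[key] = result

print(results)
```

Note that the reducer keeps every distinct user for a song in memory, which is worth bearing in mind for very popular songs.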
What if…?
You often hear that in nearly all cases you
should use a higher-level tool like Pig or Hive to
solve problems.

So what does the Pig script look like for this
problem?
Using Pig
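-- Load the listening data as (user, song) pairs.
subs = LOAD 'submissions.tsv' USING PigStorage()
        AS (user:int, song:int);
-- Collect every listen for a song into one group.
songs = GROUP subs BY song;
songs = FOREACH songs GENERATE group AS song, subs.user;
-- Plays is the count of all listens; reach is the count of distinct users.
songs = FOREACH songs GENERATE
          song, COUNT($1.user), COUNT(Distinct($1.user));
STORE songs INTO 'playsreach.tsv';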
Questions?

Map Reduce: An Example (James Grant at Big Data Brighton)
