NASDAQ            SCOR
Clients           2,000+ worldwide
Employees         1,000+
Headquarters      Reston, VA
Global Coverage   220+ countries under measurement
Local Presence    32 locations in 23 countries
The Challenge

•   Available in 7 countries: USA, Brazil, Britain, Canada, France, Germany, Spain
•   2013: Mexico and India
•   Over 4B ads monthly
•   5M-10M unique new ads monthly
Display Ads
•   Observes advertising creatives as they are encountered by the panelist

Collects Facebook pages
•   Regular and premium ads
•   Extracting all this information (and more)
Production Hadoop Cluster
•   100 nodes
•   2276 total CPUs, 6TB total memory, 1.7PB total disk space, 1GB Ethernet

Daily pipeline (diagram): Hadoop DFS -> Facebook Entity-Stream Extraction -> Facebook Entity-Partitions -> Dictionary-Apply -> Facebook Ads; Facebook News & Profiles

Daily:
•   Facebook Entity-Stream Extraction: 2 Hr / 70G
•   Facebook Entity-Partitions: 15min / 15Gx
•   Dictionary-Apply: 30 min / 15Gx
Data size:
•   Compressed ~ 2 TB
•   Uncompressed ~ 6 TB
•   Total Pages - 320M

Need to process 3,700 pages/sec…
•   Avg size per page: 18 KB…
•   Factor in time to collect, load to HDFS, buffer time for errors, etc…

Hadoop is used to extract entities
•   Each node processes 85 pages/sec
•   Daily Facebook entity extraction completes in ~2 hours
•   Multi-Language Support

(Diagram labels: Client, NameNode, Hadoop-1, Hadoop-2, Hadoop-3, … Hadoop-N, HDFS, Load FB Pages, NTFS)
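The throughput figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that the 320M pages must be processed within one 24-hour window and that all 100 nodes of the production cluster take part; neither assumption is stated on the slide, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_pages_per_sec(total_pages: int, window_seconds: int = SECONDS_PER_DAY) -> float:
    """Pages per second needed to finish total_pages within the window."""
    return total_pages / window_seconds


def avg_page_size_kb(total_bytes: float, total_pages: int) -> float:
    """Average page size in decimal kilobytes (1 KB = 1000 bytes)."""
    return total_bytes / total_pages / 1000


def cluster_pages_per_sec(nodes: int, per_node_rate: float) -> float:
    """Aggregate rate if every node sustains per_node_rate (an assumption)."""
    return nodes * per_node_rate


if __name__ == "__main__":
    required = required_pages_per_sec(320_000_000)      # slide: 3,700 pages/sec
    page_kb = avg_page_size_kb(6e12, 320_000_000)       # slide: 18 KB per page
    capacity = cluster_pages_per_sec(100, 85)           # slide: 85 pages/sec per node
    print(f"required: {required:.0f} pages/sec")
    print(f"average page size: {page_kb:.2f} KB")
    print(f"cluster capacity: {capacity:.0f} pages/sec")
    print(f"headroom: {capacity / required:.2f}x")
```

Under these assumptions the cluster rate sits above the required rate, which fits the slide's note about leaving time to collect, load to HDFS, and recover from errors.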
AdMetrix:
•   Total Ads: 85M
•   Ads per Ad-page: 3.7


Social Essentials:
•   Total news items: 351M
Ad-Volume
•    6M unique new ads monthly

     ?

Advertiser-Space (Product Dictionary)
•    Over 56K companies
•    Over 100K company/brand pairs

Problem
•    correctly
•    quickly
•    inexpensively
OCR based / Image-Recognition based

Pros
•   Potentially applicable to all non-Facebook online ads

Cons
•   Low Accuracy
•   Low Coverage
•   Difficult to scale and maintain for huge daily data-volume
•   Classify ads to cover ~80% of impressions

•   Automated Classification:
       •   Destination URL
       •   Title
       •   Currently classifying 7-20% of new ads

       •   no associated-text for ad
       •   new advertiser
       •   multi-advertiser ads
       •   new brand, movie
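As an illustration of automated classification from the destination URL and title, here is a minimal sketch. It is not comScore's implementation: the two lookup tables, their entries, and the function name `classify_ad` are hypothetical, and a real product dictionary would be far larger (the slides cite over 100K company/brand pairs).

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

CompanyBrand = Tuple[str, str]

# Hypothetical product-dictionary entries, for illustration only.
DOMAIN_TO_PAIR = {"example-cars.com": ("Example Motors", "Example Sedan")}
TITLE_KEYWORDS = {"example sedan": ("Example Motors", "Example Sedan")}


def classify_ad(destination_url: str, title: str) -> Optional[CompanyBrand]:
    """Return a (company, brand) pair, or None if the ad cannot be classified."""
    # 1. Try the destination URL's host name.
    host = (urlparse(destination_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in DOMAIN_TO_PAIR:
        return DOMAIN_TO_PAIR[host]

    # 2. Fall back to keyword matching on the ad title.
    lowered = title.lower()
    for keyword, pair in TITLE_KEYWORDS.items():
        if keyword in lowered:
            return pair

    # 3. No text, new advertiser, new brand, etc.: leave unclassified.
    return None
```

In this sketch an ad with no usable URL or title falls through to `None`, which corresponds to the cases listed above where automated classification has nothing to match on.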
Classify ads (flow diagram):
•   Ads -> Turk-Classification to Product-Names (box repeated three times in the diagram)
•   New Product?
       •   No: Prod Name
       •   Yes: Turk-Identification of Company-Name, URL, Category (box repeated three times in the diagram)
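The flow above can be sketched as a small routing function. The repeated Turk boxes suggest that several workers handle each ad; combining their answers by majority vote is an assumption of this sketch, not something the slides state. The names `majority_answer` and `route_ad`, the `min_votes` threshold, and the retry branch are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def majority_answer(answers: Iterable[str], min_votes: int = 2) -> Optional[str]:
    """Return the normalized answer given by at least min_votes workers."""
    counts = Counter(a.strip().lower() for a in answers if a.strip())
    if not counts:
        return None
    answer, votes = counts.most_common(1)[0]
    return answer if votes >= min_votes else None


def route_ad(answers: Iterable[str], known_products: Set[str]) -> str:
    """Decide the next step for an ad from the workers' product-name answers."""
    product = majority_answer(answers)
    if product is None:
        # No agreement: assumed here to go back for another classification round.
        return "retry-classification"
    if product in known_products:
        # "New Product? No": use the existing product name.
        return "assign-product:" + product
    # "New Product? Yes": ask workers for company name, URL, category.
    return "identify-company:" + product
```

For example, `route_ad(["Acme Cola", "acme cola", "Other"], {"acme cola"})` yields `"assign-product:acme cola"`, while a product name missing from the dictionary is routed to the identification step.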
Visit www.comscoredatamine.com or follow @datagems for the latest gems.
Michael Brown
CTO
comScore, Inc.


mbrown@comscore.com
We are sincerely eager to hear your feedback on this presentation and on re:Invent.

Please fill out an evaluation form when you have a chance.

BDT102 Algorithms, Machines, and Crowdsourcing - AWS re: Invent 2012
