Big Data at Tube
(Events → Insights → Actions)
27th April 2016
@John Trenkle (Chief Scientist)
@Murtaza Doctor (Director of Engineering, RTB)
©2016 TubeMogul Inc. All rights reserved.
Outline
• Where do we fit?
• What do we do?
• Life of a video Ad
• RTB Architecture
• Events Architecture
• ML Perspective: Transactional → User-Oriented
• Data → Models
• Models → Action
Busy Ad-Tech Landscape
Where does TubeMogul fit?
An enterprise software company for digital branding
Scale:
● Processed over 12.6 trillion ad auctions in 2015
● Handle over 55 billion auctions per day
● Served over 3 billion ad impressions on linear TV via our PTV solution
● Process bids in < 50 ms
● Serve bid responses in < 80 ms (includes network round-trip)
● Serve 5 PB of video traffic per month
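The scale figures are easier to compare as rates. This back-of-the-envelope calculation converts them to average auctions per second, assuming traffic were spread evenly over time (real traffic has daily peaks, so peak rates are higher). The 30 ms headroom figure is likewise an assumption: it simply supposes the bidder uses its full 50 ms inside the 80 ms end-to-end target.

```python
# Back-of-the-envelope rates implied by the figures on the Scale slide.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

auctions_2015 = 12.6e12       # "over 12.6 trillion" auctions in 2015 (a lower bound)
auctions_per_day_now = 55e9   # "over 55 billion" auctions per day (a lower bound)


def per_second(count: float, seconds: float) -> float:
    """Average rate, ignoring daily peaks and troughs."""
    return count / seconds


avg_2015_per_day = auctions_2015 / 365
avg_2015_per_sec = per_second(auctions_2015, 365 * SECONDS_PER_DAY)
avg_now_per_sec = per_second(auctions_per_day_now, SECONDS_PER_DAY)

# Assumption: if bid processing uses its whole 50 ms, what is left of the
# 80 ms end-to-end target goes to the network round-trip and serialization.
network_budget_ms = 80 - 50

if __name__ == "__main__":
    print(f"2015 average: {avg_2015_per_day / 1e9:.1f} billion auctions/day")
    print(f"2015 average: {avg_2015_per_sec:,.0f} auctions/s")
    print(f"Current average: {avg_now_per_sec:,.0f} auctions/s")
    print(f"Network/serialization headroom: {network_budget_ms} ms")
```

Even as averages, these rates make clear why the bidding path is built around small packets and tight latency budgets.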
Example: Life of a Video Ad
Technical Overview
[Architecture diagram]
• Bidding Layer: high volumes, low latency, small packets
• Ad Serving: large data sets, low latency, fast processing, large caches
• Low-latency user database for user targeting and frequency capping
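Frequency capping limits how often one user sees a given campaign, so every bid decision needs a fast per-user lookup. The following is a minimal in-memory sketch, assuming a hypothetical `FrequencyCapper` class with a sliding time window and monotonically increasing timestamps. A production system would keep this state in the low-latency user database rather than in process memory.

```python
from collections import defaultdict, deque


class FrequencyCapper:
    """Hypothetical in-memory frequency cap: at most `max_impressions` per
    (user, campaign) pair within a sliding window of `window_s` seconds."""

    def __init__(self, max_impressions: int, window_s: float) -> None:
        self.max_impressions = max_impressions
        self.window_s = window_s
        # (user_id, campaign_id) -> impression timestamps, oldest first
        self._seen = defaultdict(deque)

    def _recent(self, key, now: float) -> deque:
        times = self._seen[key]
        # Drop impressions that have aged out of the window.
        while times and now - times[0] >= self.window_s:
            times.popleft()
        return times

    def can_bid(self, user_id: str, campaign_id: str, now: float) -> bool:
        """True if serving one more impression would stay within the cap."""
        return len(self._recent((user_id, campaign_id), now)) < self.max_impressions

    def record_impression(self, user_id: str, campaign_id: str, now: float) -> None:
        """Call on a win event (impression) for this user and campaign."""
        self._recent((user_id, campaign_id), now).append(now)


if __name__ == "__main__":
    capper = FrequencyCapper(max_impressions=3, window_s=24 * 3600)  # 3 per day
    for t in (0, 60, 120):
        capper.record_impression("user-1", "campaign-A", now=t)
    print(capper.can_bid("user-1", "campaign-A", now=180))              # cap reached
    print(capper.can_bid("user-1", "campaign-A", now=24 * 3600 + 121))  # window has passed
```

The check happens on the bid path while the update happens on win events, which is one reason the slide pairs the bidding layer with a separate low-latency user store.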
Events Architecture:
● Auctions (Bids + Non-Bids)
● Win Events (Impressions)
● Columnar format (ORC)
● Data Pipeline?
● Bad data?
● Scaling challenges
● Multiple downstream consumers
Events Architecture
Events Architecture: Takeaways
● Simplify and Unify
● Focus on Data Validation at each step
● Automated recovery
● Leverage the messaging system to signal status or completion
● Metrics & Measurement for SLAs
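The following is a minimal sketch of validating data at each step and signaling completion through a message. It assumes a hypothetical event schema (`auction_id`, `timestamp`, `bid`) and a hypothetical 1% bad-record threshold. Invalid records are quarantined together with their errors instead of being silently dropped. Each step then builds a JSON status message that downstream consumers, or an automated recovery job, could read from the messaging system. The transport itself is left out.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema for an auction event: field name -> required Python type.
REQUIRED_FIELDS = {"auction_id": str, "timestamp": int, "bid": bool}


@dataclass
class StepResult:
    step: str
    good: list = field(default_factory=list)
    bad: list = field(default_factory=list)  # quarantined records plus their errors


def validate(record: dict) -> list:
    """Return a list of validation errors (empty if the record is valid)."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            errors.append(f"missing {name}")
        # Exact type check so that, e.g., True is not accepted as an int timestamp.
        elif type(record[name]) is not expected:
            errors.append(f"{name} should be {expected.__name__}")
    return errors


def run_step(step: str, records) -> StepResult:
    """Validate every record entering a pipeline step."""
    result = StepResult(step)
    for record in records:
        errors = validate(record)
        if errors:
            result.bad.append({"record": record, "errors": errors})
        else:
            result.good.append(record)
    return result


def completion_message(result: StepResult, max_bad_ratio: float = 0.01) -> str:
    """Status message a step could publish so downstream consumers know it finished.
    The counts double as metrics for SLA tracking."""
    total = len(result.good) + len(result.bad)
    bad_ratio = len(result.bad) / total if total else 0.0
    return json.dumps({
        "step": result.step,
        "status": "OK" if bad_ratio <= max_bad_ratio else "FAILED_VALIDATION",
        "good": len(result.good),
        "bad": len(result.bad),
        "bad_ratio": round(bad_ratio, 4),
    })


if __name__ == "__main__":
    events = [
        {"auction_id": "a1", "timestamp": 1461715200, "bid": True},
        {"auction_id": "a2", "timestamp": "not-a-time", "bid": False},
        {"timestamp": 1461715201, "bid": False},
    ]
    print(completion_message(run_step("ingest", events)))
```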
Machine Learning as a Consumer
• Audience Modeling begets user-oriented data
• Pivot RTB / Analytics sources for model-building
• Many sources of Truth that need to be integrated
  • Ad Interaction
• Characterize Users with a robust signature (UU-Code) rather than just an item list
• Facilitate rapid prototyping and model-building
• Maintain enriched information for exploratory analysis and visualization
  • Insights
  • Actionable Intel
Ad Calls to User-Traces in Hive (on path to NoSQL)
[Diagram. Hive: RTB Ad Calls → RTB Digest → User Activity. NoSQL: RTB Ad Calls → User Activity → Elasticsearch.]
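Pivoting transactional ad calls into user traces comes down to a group-by on the user identifier followed by a time sort. The following is a minimal Python sketch that assumes hypothetical field names (`user_id`, `timestamp`, `domain`) for the ad-call records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Trace = List[Tuple[int, str]]  # time-ordered (timestamp, domain) pairs


def to_user_traces(ad_calls: Iterable[dict]) -> Dict[str, Trace]:
    """Pivot per-auction records into per-user, time-ordered traces."""
    traces = defaultdict(list)
    for call in ad_calls:
        user_id = call.get("user_id")
        if not user_id:  # calls without a user ID cannot join a user-oriented view
            continue
        traces[user_id].append((call["timestamp"], call["domain"]))
    return {user_id: sorted(events) for user_id, events in traces.items()}


if __name__ == "__main__":
    calls = [
        {"user_id": "u1", "timestamp": 30, "domain": "news.example"},
        {"user_id": "u2", "timestamp": 10, "domain": "sports.example"},
        {"user_id": "u1", "timestamp": 5, "domain": "video.example"},
        {"user_id": None, "timestamp": 7, "domain": "other.example"},
    ]
    for user, trace in to_user_traces(calls).items():
        print(user, trace)
```

In Hive, the same pivot is a `GROUP BY user_id` with `collect_list`. Note that `collect_list` does not guarantee element order, so traces that must be time-ordered need an explicit sort.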
Token Embedding Models and Spark
http://deepdist.com/
Ref: Dean et al., "Large Scale Distributed Deep Networks" (NIPS 2012): http://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
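The slides do not say how UU-Codes are computed. One common way to turn a variable-length item list into a fixed-length signature is to average the embedding vectors of a user's tokens and then compare users by cosine similarity. The sketch below assumes that approach, with hypothetical two-dimensional toy embeddings standing in for vectors learned by a token embedding model (for example, one trained on Spark). It illustrates the general idea and is not TubeMogul's UU-Code method.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_embedding(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of known tokens; None if no token has an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones would come from a trained model.
    emb = {
        "sports.example": [1.0, 0.0],
        "scores.example": [0.8, 0.2],
        "news.example": [0.0, 1.0],
    }
    a = mean_embedding(["sports.example", "scores.example"], emb)
    b = mean_embedding(["news.example"], emb)
    c = mean_embedding(["scores.example", "unseen.example"], emb)  # unknown token ignored
    print(round(cosine(a, c), 3), round(cosine(a, b), 3))
```

Unlike raw item lists, two users with no items in common can still come out similar when their items have nearby embeddings. This is one plausible reading of "robust signature" on the earlier slide.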
Cascading for Signatures
1. JOIN on tm_client
2. Filter average weight per vertical < 0.5
[Flow diagram of the signature pipeline. Processes: Daily UUCode Creation; Get Truth Users by LAL Segment; Centroid Creation; Segment Creation; Attach SourceID; Aggregated UUCode Creation; Create SourceID Lookup; Wormhole; Segment Filter. Data sets: Daily Users Activities (Prefixed); TM Client Daily Activity3; Daily UUCodes; Daily Truth Users for all LAL segments; LAL Landmarks; User Membership; Unfiltered UUCode Model; TM Daily Converters; Convs LAL segments from Mario; Daily UUCodes with Source ID; TMClientID/SourceID Lookup; UU Code; TM Client Digest3; UDB Team Persistent Users Table. Size annotation: ~650GB.]
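The diagram's Centroid Creation, LAL Landmarks and Segment Creation stages suggest a centroid-based lookalike (LAL) scheme, but the slide gives no details. The following is a minimal sketch under that assumption. It averages the truth users' signatures into a landmark and admits candidate users whose cosine similarity to it clears a hypothetical threshold. The function names and the threshold are illustrative only.

```python
import math
from typing import Dict, List, Set

Vector = List[float]


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of the truth users' signatures (the 'landmark')."""
    if not vectors:
        raise ValueError("need at least one truth-user signature")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_lal_segment(truth_signatures: List[Vector],
                      candidates: Dict[str, Vector],
                      threshold: float) -> Set[str]:
    """Hypothetical segment creation: IDs of candidates close to the landmark."""
    landmark = centroid(truth_signatures)
    return {user_id for user_id, sig in candidates.items()
            if cosine(sig, landmark) >= threshold}


if __name__ == "__main__":
    truth = [[1.0, 0.0], [0.8, 0.2]]  # e.g. signatures of known converters
    pool = {"u1": [0.95, 0.05], "u2": [0.0, 1.0], "u3": [0.7, 0.3]}
    print(sorted(build_lal_segment(truth, pool, threshold=0.9)))
```

Raising the threshold trades segment reach for closeness to the truth users. Using several landmarks per segment, which the plural "Landmarks" might hint at, would capture more than one cluster of truth users.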
Large-Scale Predictive Model Building
[Flow diagram; stages in slide order: get truth users and their signatures; data warehouse of truth users; training data creation; training data for segments (ground truth); for each segment, perform training; check performance and log it in MySQL for tracking purposes; model/weights file for each segment; aggregate and convert to UUCode (UU Code Model, 3 months aggregation); segment information; dashboard UI.]
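The following is a minimal sketch of the per-segment loop: train one model per segment, log its performance for tracking, and keep one weight vector per segment. It assumes logistic regression fitted with plain SGD, since the slides do not name a model type. An in-memory SQLite table stands in for the MySQL tracking table, and the `model_runs` table and toy data are hypothetical.

```python
import math
import random
import sqlite3
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def train_logreg(X, y, epochs: int = 200, lr: float = 0.1, seed: int = 0) -> Model:
    """Logistic regression fitted with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            grad = predict((w, b), X[i]) - y[i]  # d(log-loss)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, X[i])]
            b -= lr * grad
    return w, b


def accuracy(model: Model, X, y) -> float:
    return sum((predict(model, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(y)


def train_all_segments(training_data, conn) -> Dict[str, Model]:
    """Train one model per segment and log its performance for tracking."""
    conn.execute("CREATE TABLE IF NOT EXISTS model_runs "
                 "(segment TEXT, n_examples INTEGER, accuracy REAL)")
    models = {}
    for segment, (X, y) in training_data.items():
        model = train_logreg(X, y)
        # Training-set accuracy keeps the sketch short; use a held-out split in practice.
        conn.execute("INSERT INTO model_runs VALUES (?, ?, ?)",
                     (segment, len(y), accuracy(model, X, y)))
        models[segment] = model  # would be written out as the segment's weights file
    conn.commit()
    return models


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the MySQL tracking database
    data = {"sports_lal": ([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], [1, 1, 0, 0])}
    train_all_segments(data, conn)
    print(conn.execute("SELECT * FROM model_runs").fetchall())
```

Because each segment trains independently, the loop parallelizes naturally across segments on a cluster. The logged table gives the dashboard a per-segment performance history.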
Partners that have Contributed to Our Ecosystem
• Qubole
  • Long-time partners
  • Great for ad hoc queries and scheduled ETL
  • Dynamic scaling
• Snowflake
  • Data Warehouse – facilitates Fraud Analysis
• SpotInst
  • Cost-effective Spot Instances in EMR
  • Robust provisioning
  • Dynamic scaling
• Driven
  • Monitor, optimize, and debug Hadoop flows
Since Hive has been our primary datastore for a while…
• Tips and tricks
  • ORC
  • MAPJOIN
  • Sorted, Bucketed JOINs
  • TRANSFORM
  • HAVING
  • Hadoop Streaming
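Hive's TRANSFORM clause, like Hadoop Streaming, pipes rows to an external script as tab-separated lines on stdin and reads tab-separated lines back from stdout. The following is a minimal Python reducer that counts distinct domains per user. It assumes hypothetical `user_id<TAB>domain` input that has already been clustered by user.

```python
#!/usr/bin/env python3
"""Hypothetical streaming reducer for Hive TRANSFORM / Hadoop Streaming."""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def count_distinct_domains(lines: Iterable[str]) -> Iterator[str]:
    """Input: `user_id<TAB>domain` lines, with each user's rows contiguous.
    Output: `user_id<TAB>n_distinct_domains` lines."""
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for user_id, group in groupby(rows, key=lambda r: r[0]):
        domains = {r[1] for r in group if len(r) > 1}
        yield f"{user_id}\t{len(domains)}"


if __name__ == "__main__":
    for out_line in count_distinct_domains(sys.stdin):
        print(out_line)
```

It could be wired in with something like `ADD FILE distinct_domains.py;` followed by `FROM (SELECT user_id, domain FROM ad_calls CLUSTER BY user_id) t SELECT TRANSFORM(t.user_id, t.domain) USING 'python3 distinct_domains.py' AS (user_id, n_domains);`. The table and file names are hypothetical. `CLUSTER BY` sends all of a user's rows to one reducer in sorted order, and the `groupby` call relies on that ordering.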
Models → Action
• Optimization
  • Surrogate measures of engagement: Clicks, Completions, Conversions
• Audience Building for Targeting
  • Demographic
  • Behavioral
• Fraud Detection
• Cross-Device Syncing
• Profiling / Data Mining / Actionable Intel
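Optimizing toward surrogate measures such as clicks, completions and conversions requires estimating their rates, but many placements have too few impressions for a raw ratio to be trusted. One common remedy is to shrink each raw rate toward a prior using a Beta prior. This is offered as an assumption, not as the method used in the talk. The 0.2% prior mean and the prior strength of 1,000 are hypothetical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta(alpha, beta) prior over an event rate (e.g. click-through rate)."""
    alpha: float
    beta: float

    @classmethod
    def from_mean(cls, mean: float, strength: float) -> "BetaPrior":
        # `strength` acts like a number of pseudo-impressions backing the prior mean.
        return cls(alpha=mean * strength, beta=(1.0 - mean) * strength)


def smoothed_rate(successes: int, trials: int, prior: BetaPrior) -> float:
    """Posterior mean of the rate: (successes + alpha) / (trials + alpha + beta)."""
    return (successes + prior.alpha) / (trials + prior.alpha + prior.beta)


if __name__ == "__main__":
    ctr_prior = BetaPrior.from_mean(mean=0.002, strength=1_000)  # hypothetical prior
    # Sparse placement: the raw CTR would be 10%, which is almost surely noise.
    print(round(smoothed_rate(1, 10, ctr_prior), 5))
    # Well-measured placement: the data dominate the prior.
    print(round(smoothed_rate(200, 100_000, ctr_prior), 5))
```

The same estimator applies to completion and conversion rates, each with its own prior. Placements with little data stay near the prior, while well-measured ones converge to their observed rate.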