How did you know this ad would be relevant for me?
Proprietary & Confidential. Copyright © 2014.
Behavioral Targeting @ Scale
How did we know that this ad was relevant for you?
Savin Goyal
Sivasankaran Chandrasekar
When an advertiser works with Rocket Fuel, it has access to:
200+ RTB advertising supply partners
50M+ websites
50B+ daily impressions
3B consumers worldwide
100,000+ devices
Display Advertising Ecosystem
[Figure: agencies, data partners, ad exchanges, and the Rocket Fuel Platform (auto optimization, real-time bidding)]
Programmatic Buying
[Figure: request flow between the web browser, publishers, exchange partners, data partners, and the Rocket Fuel Platform (smart ad servers, response prediction models, campaign & audience data, user data)]
Page request from the web browser to the publisher
Ad request to the exchange
Bid request to Rocket Fuel
Qualify campaigns
Calculate propensity score
Bid on the ad
Winning ad returned
Ad served to the user
User engages with the ad
User engagement recorded, used to refresh learning
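The flow above ends with Rocket Fuel calculating a propensity score and bidding. A minimal sketch of that last step, assuming the bid is the predicted response probability times the advertiser's value for a response (a common expected-value rule; the slides do not give Rocket Fuel's actual bidding formula). `Campaign`, `qualifies`, `expected_value_bid`, `choose_bid`, the site names, and all numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Campaign:
    name: str
    value_per_response: float  # hypothetical: advertiser's value for one response, in $
    allowed_sites: frozenset = field(default_factory=frozenset)  # hypothetical targeting rule


def qualifies(campaign: Campaign, site: str) -> bool:
    """'Qualify campaign': keep only campaigns whose targeting admits this site."""
    return site in campaign.allowed_sites


def expected_value_bid(propensity: float, campaign: Campaign) -> float:
    """Bid the expected value of the impression: P(response) * value of a response."""
    return propensity * campaign.value_per_response


def choose_bid(
    site: str,
    campaigns: Iterable[Campaign],
    propensity_of: Callable[[Campaign], float],
) -> Optional[Tuple[str, float]]:
    """Return (campaign name, bid) for the highest-value qualifying campaign, or None."""
    best: Optional[Tuple[str, float]] = None
    for campaign in campaigns:
        if not qualifies(campaign, site):
            continue
        bid = expected_value_bid(propensity_of(campaign), campaign)
        if best is None or bid > best[1]:
            best = (campaign.name, bid)
    return best


campaigns = [
    Campaign("luxury-cars", 50.0, frozenset({"news.example"})),
    Campaign("pizza", 4.0, frozenset({"news.example", "food.example"})),
]
# Stand-in for the response prediction models.
propensity = {"luxury-cars": 0.002, "pizza": 0.03}
print(choose_bid("news.example", campaigns, lambda c: propensity[c.name]))
```

Here the pizza campaign wins ($0.12 vs. $0.10) despite its lower value per response, because its predicted response rate is much higher.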
Real Time Auction
[Figure: each impression is priced in real time from site/page, geo/weather, time of day, brand affinity, and user signals]
Real Time Auction
Each impression gets one scorecard per campaign goal:
Factor           Leads & sales   Coupon downloads   Brand awareness
Demo             +100            +40                +10
Brand Affinity   +40             -70                -10
Time of Day      -20             -20                -20
Geo/Weather      +20             +10                +20
Site/Page        +15             +15                +10
Ad Position      +10             -25                -35
In-Market        +40             -40                -25
Behavior         +35             -18                +10
Response         +9.7%           +0.7%              +1.4%
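To make the impression scorecard concrete, the sketch below treats it as additive: an impression's fit for a goal is the sum of its factor points. That is an assumption; the slide does not say how the points map to the Response figure. The point values are read column by column from the scorecard slide (leads & sales, coupon downloads, brand awareness); `SCORECARDS`, `total_score`, and `rank_goals` are hypothetical names.

```python
# Factor points for each campaign goal, read from the Impression Scorecard
# slide. Summing them is a hypothetical simplification.
SCORECARDS = {
    "Leads & sales": {
        "Demo": 100, "Brand Affinity": 40, "Time of Day": -20, "Geo/Weather": 20,
        "Site/Page": 15, "Ad Position": 10, "In-Market": 40, "Behavior": 35,
    },
    "Coupon downloads": {
        "Demo": 40, "Brand Affinity": -70, "Time of Day": -20, "Geo/Weather": 10,
        "Site/Page": 15, "Ad Position": -25, "In-Market": -40, "Behavior": -18,
    },
    "Brand awareness": {
        "Demo": 10, "Brand Affinity": -10, "Time of Day": -20, "Geo/Weather": 20,
        "Site/Page": 10, "Ad Position": -35, "In-Market": -25, "Behavior": 10,
    },
}


def total_score(card: dict) -> int:
    """Additive scorecard: the impression's score is the sum of its factor points."""
    return sum(card.values())


def rank_goals(scorecards: dict) -> list:
    """Order campaign goals from the best to the worst fit for this impression."""
    return sorted(scorecards, key=lambda goal: total_score(scorecards[goal]), reverse=True)


for goal in rank_goals(SCORECARDS):
    print(goal, total_score(SCORECARDS[goal]))
```

With these numbers the totals are 240, -108, and -40, which rank the goals in the same order as the Response figures (+9.7%, +0.7%, +1.4%); that is consistent with, but does not confirm, an additive model.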
Scalable Predictive Models
[Figure: grid of candidate features, each marked as positive lift, marginal impact, or negative lift. Features: age/gender, occupation, income, ethnicity, purchase intent, online purchases, offline purchases, browsing behavior, site actions, zip code, city/DMA, search sites, search categories, recency, search keywords, web site/page, referral URL, site category, bizographics, social, interests, lifestyle]
Automated feature selection: an effectively unlimited number of models
Determine the right model size: balance fit to past data against generalization to future data
Learn-Test-Refine: automatically learn from each response
Cross-validation and A/B testing infrastructure
Training pipeline
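The legend (positive lift, marginal impact, negative lift) suggests a simple way to picture automated feature selection. A minimal sketch, assuming lift is the response rate among impressions with a feature divided by the overall response rate, and that features within a hypothetical `margin` of 1.0 count as marginal. This illustrates lift-based filtering in general, not Rocket Fuel's actual selection method; the feature names and data are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def feature_lift(examples: Iterable[Tuple[Set[str], bool]]) -> Dict[str, float]:
    """Lift(f) = P(response | f present) / P(response) over labeled impressions."""
    total = responses = 0
    seen: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for features, responded in examples:
        total += 1
        responses += int(responded)
        for f in features:
            seen[f] += 1
            hits[f] += int(responded)
    if responses == 0:
        raise ValueError("need at least one response to compute lift")
    base_rate = responses / total
    return {f: (hits[f] / seen[f]) / base_rate for f in seen}


def select_features(lifts: Dict[str, float], margin: float = 0.1) -> Tuple[List[str], List[str]]:
    """Keep features with clear positive or negative lift; drop marginal ones."""
    positive = sorted(f for f, lift in lifts.items() if lift >= 1 + margin)
    negative = sorted(f for f, lift in lifts.items() if lift <= 1 - margin)
    return positive, negative


# Toy labeled impressions: (features present, did the user respond?)
examples = [
    ({"in_market_auto", "site:cars", "time:evening"}, True),
    ({"in_market_auto"}, True),
    ({"site:cars", "time:evening"}, False),
    ({"site:news", "time:evening"}, False),
    ({"site:news", "in_market_auto"}, False),
    ({"site:news"}, False),
]
lifts = feature_lift(examples)
print(select_features(lifts))
```

In this toy data `in_market_auto` (lift 2.0) and `site:cars` (lift 1.5) show positive lift, `site:news` (lift 0) shows negative lift, and `time:evening` (lift 1.0) is dropped as marginal.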
Throughput (requests per day)
Facebook likes: 5B
Searches on Google: 6B
Events processed by Rocket Fuel: 50B
Rocket Fuel Scale
34,474 CPU processor cores
2,655 servers
187.4 teraflops of computing
188 terabytes of memory: 13X the memory of Jeopardy-winning IBM Watson
42 petabytes of storage: 106X the data volume of the entire Library of Congress
Behavioral Targeting
Leverage users' online activity to learn their:
Long-term interests: the user is interested in luxury cars
Short-term interests: the user is looking for a pizza right now
Expand the user set beyond retargeting
Explore vs. exploit: identify relevant users even if they have never been targeted before
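Explore vs. exploit is the classic bandit trade-off. One simple way to express it is epsilon-greedy: most of the time, spend on the best-scored known user; with a small probability epsilon, spend on a user who has never been targeted so the model can learn about them. Epsilon-greedy is a swapped-in illustration, not necessarily the policy Rocket Fuel uses; `pick_user` and the user IDs are hypothetical.

```python
import random
from typing import List, Tuple


def pick_user(
    known: List[Tuple[str, float]],
    unexplored: List[str],
    epsilon: float,
    rng: random.Random,
) -> Tuple[str, str]:
    """Epsilon-greedy choice between exploiting the best-scored known user
    and exploring a user who has never been targeted.

    known: (user_id, predicted score) pairs for users with history.
    unexplored: user_ids with no targeting history yet.
    Returns ("exploit" or "explore", user_id).
    """
    if unexplored and rng.random() < epsilon:
        return "explore", rng.choice(unexplored)
    best_user, _ = max(known, key=lambda pair: pair[1])
    return "exploit", best_user


rng = random.Random(42)
known = [("u1", 0.02), ("u2", 0.07)]
unexplored = ["u9", "u10"]
decisions = [pick_user(known, unexplored, epsilon=0.1, rng=rng)[0] for _ in range(1000)]
print(decisions.count("explore"))  # about 10% of the 1000 choices explore
```

A larger epsilon widens the audience faster but spends more on users with no evidence of interest.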
Behavioral Targeting @ Rocket Fuel
[Figure: BT pipeline. Modeling: label data, train model, back test, calibrate. Inputs: training events, pixel stream, and ad logs feed feature generation, which stores BT features in HBase. Scoring: profile generation and scoring produce score profiles for the ad serving data centers.]
Hadoop/HBase @ Rocket Fuel
Cluster highlights:
650+ slaves (64 GB RAM + 12 × 3 TB disks each)
20 PB storage
HA NameNode setup
9k map slots + 5.5k reduce slots
Co-located to run HBase for offline processing
HBase 0.94.15
5-node ZooKeeper quorum
Monitoring with OpenTSDB
Dual master setup
Behavioral Targeting @ Rocket Fuel
Example events within a 30-minute span: bmw.com (Cars) at 11:23, pizzahut.com (Food) at 11:26, honda.com (Cars) at 11:27
Read events of the last N days and aggregate them into a Behavioral Targeting Profile with recency, frequency, and other features, e.g.:
honda.com, last seen 11:27: recent 6 hours: 5; between 6 and 12 hours: 3; between 12 hours and …
Food, last seen 11:26: recent 6 hours: 2; between 6 and 12 hours: 7; between 12 hours and …
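The profile above combines recency (when a site or category was last seen) with frequency (how many events fall into each age bucket). A minimal sketch of that aggregation over the last N days. The 6-hour and 12-hour edges come from the slide; because the later buckets are elided there, this sketch lumps everything older than 12 hours into a single hypothetical `12h+` bucket. `build_profile` and `bucket_label` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Bucket edges in hours. 6 and 12 are from the slide; the single "12h+"
# bucket for older events is a hypothetical simplification.
BUCKET_EDGES_HOURS = (6, 12)


def bucket_label(age: timedelta) -> str:
    """Name of the age bucket an event falls into, e.g. '0-6h'."""
    hours = age.total_seconds() / 3600
    lower = 0
    for edge in BUCKET_EDGES_HOURS:
        if hours < edge:
            return f"{lower}-{edge}h"
        lower = edge
    return f"{lower}h+"


def build_profile(
    events: Iterable[Tuple[str, datetime]], now: datetime, n_days: int
) -> Dict[str, dict]:
    """Recency (last_seen) and frequency (counts per age bucket) for each key
    (a site or a category), using only events from the last n_days."""
    window_start = now - timedelta(days=n_days)
    profile: Dict[str, dict] = {}
    for key, ts in events:
        if ts < window_start:
            continue  # outside the "last N days" read window
        entry = profile.setdefault(key, {"last_seen": ts, "counts": Counter()})
        entry["last_seen"] = max(entry["last_seen"], ts)
        entry["counts"][bucket_label(now - ts)] += 1
    return profile


now = datetime(2014, 6, 4, 12, 0)
events = [
    ("Food", now - timedelta(hours=1)),
    ("Food", now - timedelta(hours=3)),
    ("Food", now - timedelta(hours=7)),
    ("Food", now - timedelta(days=3)),  # older than n_days: ignored
    ("honda.com", now - timedelta(minutes=30)),
]
profile = build_profile(events, now, n_days=2)
print(profile["Food"])
```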
HBase Data Model
row_key: user_id
Single column family “u”
Column qualifier: <date><hour>:<type>:<value>, e.g., 2014060416:site:bmw.com or 2014060416:category:food
Cell value: [Protobuf] most recent timestamp, plus event details relative to that timestamp
• Efficient lookup for a given user
• Access a range of events by event date, hour, and type
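Because the column qualifier starts with `<date><hour>`, a user's qualifiers sort by hour first, then by event type and value, so one read of the user's row plus a qualifier prefix or range selects events by hour and type. The sketch below builds and filters qualifiers in plain Python to show that ordering; in the HBase Java API the server-side equivalent would be a filter such as `ColumnPrefixFilter` or `ColumnRangeFilter`. The function names are hypothetical, and the protobuf cell value is left out.

```python
from datetime import datetime
from typing import Iterable, List, Optional


def column_qualifier(event_time: datetime, event_type: str, value: str) -> bytes:
    """Build '<date><hour>:<type>:<value>', e.g. b'2014060416:site:bmw.com'."""
    return f"{event_time:%Y%m%d%H}:{event_type}:{value}".encode("utf-8")


def qualifier_prefix(hour: datetime, event_type: Optional[str] = None) -> bytes:
    """Prefix matching one hour's events, optionally restricted to one event type."""
    prefix = f"{hour:%Y%m%d%H}:"
    if event_type is not None:
        prefix += f"{event_type}:"
    return prefix.encode("utf-8")


def qualifiers_in_hours(qualifiers: Iterable[bytes], start: datetime, end: datetime) -> List[bytes]:
    """Qualifiers whose date-hour falls in [start, end). This works because the
    date-hour comes first, so byte order matches time order."""
    lo = f"{start:%Y%m%d%H}".encode("utf-8")
    hi = f"{end:%Y%m%d%H}".encode("utf-8")
    return sorted(q for q in qualifiers if lo <= q < hi)


qs = [
    column_qualifier(datetime(2014, 6, 4, 16, 5), "site", "bmw.com"),
    column_qualifier(datetime(2014, 6, 4, 16, 9), "category", "food"),
    column_qualifier(datetime(2014, 6, 4, 18, 0), "site", "honda.com"),
]
print(qualifiers_in_hours(qs, datetime(2014, 6, 4, 16), datetime(2014, 6, 4, 17)))
```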
User Profile Freshness
Strict latency requirements
Recent activity is a much better predictor
Solutions:
Staggered pipelines
Real-time behavioral targeting
Staggered Pipelines
[Figure: several copies of the Extract → Score → Filter → Upload pipeline run over the same source data, each started at a staggered offset]
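The benefit of staggering shows up with a little arithmetic. Assuming K identical pipelines that each take T minutes and are started T/K minutes apart (an assumption about how the stagger is arranged), a fresh profile is uploaded every T/K minutes instead of every T. `staggered_upload_times` and `max_gap` are hypothetical names, and the 10-hour run time is made up.

```python
from typing import List


def staggered_upload_times(num_pipelines: int, run_minutes: int, horizon_minutes: int) -> List[int]:
    """Completion (upload) times when num_pipelines identical pipelines each run
    back to back for run_minutes, with start times offset evenly."""
    if run_minutes % num_pipelines:
        raise ValueError("run_minutes must be divisible by num_pipelines")
    offset = run_minutes // num_pipelines
    uploads = []
    for i in range(num_pipelines):
        t = i * offset + run_minutes  # first completion of pipeline i
        while t <= horizon_minutes:
            uploads.append(t)
            t += run_minutes
    return sorted(uploads)


def max_gap(times: List[int]) -> int:
    """Longest wait between two consecutive profile uploads."""
    return max(later - earlier for earlier, later in zip(times, times[1:]))


print(max_gap(staggered_upload_times(1, 600, 1800)))  # one 10-hour pipeline -> 600
print(max_gap(staggered_upload_times(5, 600, 1800)))  # five staggered copies -> 120
```

Staggering does not make any single run faster; it trades extra cluster capacity for more frequent profile refreshes.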
Real Time Behavioral Targeting
Blackbird: an HBase instance tuned for 2 ms latencies
Batched profile: produced from logs by the offline BT pipeline, refreshed every N hours
Online profile: ad servers record events for users in real time
On each request, ad servers merge the batched and online profiles before sending the response
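Merging the two profiles has to avoid counting an event twice when it appears both in the batched profile and in the real-time record. A minimal sketch, assuming the batched profile carries the time up to which it aggregated events (a hypothetical `batch_cutoff`) and that profiles are plain per-key event counts; the real profiles are protobufs with richer fields, and `merge_profiles` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple


def merge_profiles(
    batched: Counter, batch_cutoff: int, online_events: Iterable[Tuple[str, int]]
) -> Counter:
    """Combine a batched profile with raw real-time events.

    batched: per-key event counts aggregated offline up to batch_cutoff
        (epoch seconds).
    online_events: (key, epoch_seconds) raw events recorded by the ad servers.
    Only online events newer than the cutoff are added, so an event that is
    already in the batched profile is not counted twice.
    """
    merged = Counter(batched)  # copy; leave the batched profile untouched
    for key, ts in online_events:
        if ts > batch_cutoff:
            merged[key] += 1
    return merged


batched = Counter({"Cars": 3, "Food": 1})
online = [("Cars", 90), ("Food", 110), ("Cars", 120)]
print(merge_profiles(batched, 100, online))
```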
Batched Updates vs. Real Time Updates
Aspect             Batched Profile                       Real Time Profile
Event granularity  Aggregated over several hours/days    Raw recorded events appended for the recent N hours
Processing load    Requires minimal CPU processing       Needs aggregation on the fly
Disk footprint     Compact; captures several days        Strict limits to keep read times acceptable
Coverage           All interactions                      Only interactions at a data center
Freshness          Refreshed every N hours               Updated in milliseconds
Scaling Issues
3X growth per year in events processed, from sources such as:
First-party data
App interactions
Geo-location data
…
Case studies:
HBase region hot-spotting
Network bandwidth troubles
HBase Region Hot-spotting
High write load concentrated on a single HBase region
Region split (painful!): still problematic because of the non-uniform distribution!
Some users are more active than others
No control over the user IDs generated
HBase Region Hot-spotting
Uneven write-load distribution
Non-uniform row key distribution
Salt row keys to ensure uniform distribution:
Fixed-length, Murmur hash-based prefix + original user ID
Uniform pre-splits
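A minimal sketch of the salting scheme above: a fixed-length 4-byte prefix taken from a 32-bit Murmur hash of the user ID, followed by the original ID, plus evenly spaced split points over the prefix space for pre-splitting the table. Using MurmurHash3 (x86, 32-bit) and a 4-byte prefix are assumptions; the slide says only "Murmur hash based prefix". The function names are hypothetical. In the HBase 0.94 client, split keys like these can be passed to `HBaseAdmin.createTable(desc, splitKeys)`.

```python
from collections import Counter
from typing import List

PREFIX_LEN = 4  # fixed-length salt: the 4 bytes of a 32-bit hash
MASK = 0xFFFFFFFF


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3, x86 32-bit variant."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h1 = seed & MASK
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        k1 = int.from_bytes(data[4 * i : 4 * i + 4], "little")
        k1 = (k1 * c1) & MASK
        k1 = ((k1 << 15) | (k1 >> 17)) & MASK
        k1 = (k1 * c2) & MASK
        h1 ^= k1
        h1 = ((h1 << 13) | (h1 >> 19)) & MASK
        h1 = (h1 * 5 + 0xE6546B64) & MASK
    tail = data[4 * n_blocks :]
    if tail:
        k1 = 0
        for i, byte in enumerate(tail):
            k1 |= byte << (8 * i)
        k1 = (k1 * c1) & MASK
        k1 = ((k1 << 15) | (k1 >> 17)) & MASK
        k1 = (k1 * c2) & MASK
        h1 ^= k1
    # Finalization mix: force all bits of the hash to avalanche.
    h1 ^= len(data) & MASK
    h1 ^= h1 >> 16
    h1 = (h1 * 0x85EBCA6B) & MASK
    h1 ^= h1 >> 13
    h1 = (h1 * 0xC2B2AE35) & MASK
    h1 ^= h1 >> 16
    return h1


def salted_row_key(user_id: str) -> bytes:
    """Fixed-length hashed prefix + original user ID."""
    raw = user_id.encode("utf-8")
    return murmur3_32(raw).to_bytes(PREFIX_LEN, "big") + raw


def uniform_split_keys(num_regions: int) -> List[bytes]:
    """Evenly spaced split points over the 4-byte prefix space."""
    step = 2**32 // num_regions
    return [(i * step).to_bytes(PREFIX_LEN, "big") for i in range(1, num_regions)]


# Sequential IDs would sit next to each other unsalted; salted, they spread out.
keys = [salted_row_key(f"user{i:07d}") for i in range(10_000)]
per_region = Counter(key[0] // 64 for key in keys)  # 4 regions = top 2 bits of byte 0
print(sorted(per_region.items()))
```

Because the prefix is derived from the ID itself, a reader can recompute it, so point lookups by user ID still go straight to one region.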
HBase Region Hot-spotting
Don’t stop at salting: configure map input splits for region boundaries
[Figure: a key partitioner turns 'k' input splits of raw user IDs (e.g., 1234557, ZZAHT654) into 'm' splits for 'm' regions. Each output split holds the salted keys that fall inside one region, e.g., keys \x01\x85\x1E\xB811ZKL1 through \x03\x85\x1E\xB8ZZZKL1 for Region 1, whose boundary is \x03\x85\x1E\xB8ZZZZZZ; Region m's boundary is \xFF\xAE\x14\xE1Z28ZZ.]
HBase Key Partitioner
As many splits as regions, to maximize parallelism
Key Partitioner (MapReduce):
Reads the region boundaries of the HBase table
Salts and sorts row keys accordingly
Multiple Output Format to optimize the reduce phase
Each generated split file corresponds to a single region
Drastically reduces read latencies
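The key partitioner's core step is mapping each salted row key to the region that will hold it. A minimal sketch, assuming the table's region start keys have already been read (in the HBase 0.94 client, `HTable.getStartKeys()` returns them) and are sorted, with the first region's empty start key first. `region_index` and `partition_by_region` are hypothetical names; a real job would do this inside a MapReduce partitioner rather than in memory.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List


def region_index(row_key: bytes, region_start_keys: List[bytes]) -> int:
    """Index of the region holding row_key. region_start_keys must be sorted and
    start with b"" (HBase's first region has an empty start key)."""
    return bisect.bisect_right(region_start_keys, row_key) - 1


def partition_by_region(
    row_keys: Iterable[bytes], region_start_keys: List[bytes]
) -> Dict[int, List[bytes]]:
    """Group row keys by region and sort each group, so each group can be
    written out as one split file per region."""
    groups: Dict[int, List[bytes]] = defaultdict(list)
    for key in row_keys:
        groups[region_index(key, region_start_keys)].append(key)
    return {idx: sorted(keys) for idx, keys in sorted(groups.items())}


starts = [b"", b"\x40", b"\x80", b"\xc0"]
keys = [b"\x85\x1eB8user2", b"\x01\x86user1", b"\x85\x00user3", b"\xfe\x86user4"]
print(partition_by_region(keys, starts))
```

Since every output file covers exactly one region, a later bulk load or map task can work on one region at a time without touching the others.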
Network Bandwidth Constraints
Uploads consistently overshot the bandwidth limit
All sorts of delays (Redis, MySQL, Blackbird…)
Bidding hampered
Solutions
Intelligent storage – protobufs everywhere
Throttle writes
Geo-splitting
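Throttling writes can be done with a token bucket, a standard rate limiter: tokens accrue at a fixed rate up to a burst capacity, and each upload spends tokens equal to its size. The slides do not say how Rocket Fuel throttles, so this is a swapped-in illustration; `TokenBucket` and the byte rates are hypothetical.

```python
class TokenBucket:
    """Token-bucket throttle: sustained `rate` bytes/s, bursts up to `capacity` bytes.

    Time is passed in explicitly (seconds) to keep the sketch testable; a real
    uploader would use time.monotonic().
    """

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, allowing an initial burst
        self.last = now

    def _refill(self, now: float) -> None:
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, nbytes: float, now: float) -> bool:
        """Spend tokens for nbytes if available; otherwise the caller should wait."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

    def wait_seconds(self, nbytes: float, now: float) -> float:
        """Time until nbytes can be sent (assumes nbytes <= capacity)."""
        self._refill(now)
        return max(0.0, (nbytes - self.tokens) / self.rate)


bucket = TokenBucket(rate=100.0, capacity=200.0)
print(bucket.try_send(150, now=0.0), bucket.try_send(100, now=0.0))
```

Capping the upload rate below the link's limit leaves headroom for bidding traffic and for the Redis, MySQL, and Blackbird calls that were being delayed.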
Geo-splitting
Tag each user's location history and predict future data center visits:
f(dc, geo_history, bt_profile)
A separate workflow periodically generates geo-split rules. It:
Clusters users and analyzes migration patterns
Ensures maximal look-up coverage of profiles
Minimizes the total number of profiles stored
Ensures efficient use of resources, with minimal impact on performance
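The slide gives the rule only as f(dc, geo_history, bt_profile). A minimal sketch of the trade-off it describes, assuming a rule that copies a user's profile to every data center that served at least a hypothetical `min_share` of that user's past requests, plus the most-used one. It ignores `bt_profile` and the user clustering step; the data-center names and function names are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def assign_data_centers(geo_history: Iterable[str], min_share: float = 0.2) -> Set[str]:
    """Data centers that should hold this user's profile.

    geo_history: data-center names that served the user's past requests.
    A DC is chosen if it served at least `min_share` of those requests; the
    most-used DC is always chosen. Raising min_share stores fewer profiles but
    lowers look-up coverage.
    """
    counts = Counter(geo_history)
    if not counts:
        return set()
    total = sum(counts.values())
    chosen = {dc for dc, n in counts.items() if n / total >= min_share}
    chosen.add(counts.most_common(1)[0][0])
    return chosen


def lookup_coverage(assignments: Dict[str, Set[str]], future_visits: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (user, dc) requests that find the user's profile at that DC."""
    visits = list(future_visits)
    hits = sum(1 for user, dc in visits if dc in assignments.get(user, set()))
    return hits / len(visits)


assignments = {
    "u1": assign_data_centers(["dc-west"] * 9 + ["dc-east"]),
    "u2": assign_data_centers(["dc-west"] * 8 + ["dc-east"] * 2),
}
future = [("u1", "dc-west"), ("u1", "dc-east"), ("u2", "dc-east"), ("u3", "dc-west")]
print(assignments, lookup_coverage(assignments, future))
```

Here three profiles are stored for two users, and half of the future requests find a profile: u1's rare visit to dc-east misses, as does u3, who has no history.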
Geo-splitting
[Figure: the same BT pipeline (label data, train model, back test, calibrate; feature generation from training events, pixel stream, and ad logs into BT features in HBase; profile generation and scoring for the ad serving data centers), extended with a geo-split stage: cluster users, analyze patterns, generate rules]
Quick Recovery From Failures
Break pipeline into short payloads
Fail fast, recover fast!
Actionable alerts, cut down noise
Quick Recovery From Failures
Materialize data as frequently as possible
Cross-system fault tolerance
Idempotency
Backfill at end of day (EOD) to plug holes if needed
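Idempotency and end-of-day backfill fit together: if each hour's output is written as a whole and replaces any earlier attempt, a job can be rerun safely, and the backfill only needs to find hours that were never materialized. A minimal sketch, with an in-memory dict standing in for hourly output directories; `write_hour` and `missing_hours` are hypothetical names.

```python
from datetime import date, datetime, timedelta
from typing import Dict, Iterable, List


def write_hour(store: Dict[datetime, list], hour: datetime, records: Iterable) -> None:
    """Idempotent write: each hour's output is fully replaced, never appended,
    so re-running a failed or duplicated job cannot double-count records."""
    store[hour] = list(records)


def missing_hours(materialized: Iterable[datetime], day: date) -> List[datetime]:
    """Hours of `day` with no materialized output: the holes to backfill at EOD."""
    done = set(materialized)
    start = datetime(day.year, day.month, day.day)
    hours = [start + timedelta(hours=h) for h in range(24)]
    return [h for h in hours if h not in done]


store: Dict[datetime, list] = {}
write_hour(store, datetime(2014, 6, 4, 0), ["e1", "e2"])
write_hour(store, datetime(2014, 6, 4, 0), ["e1", "e2"])  # retry: same result
write_hour(store, datetime(2014, 6, 4, 1), ["e3"])
holes = missing_hours(store, date(2014, 6, 4))
print(len(holes), holes[0])
```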
Questions ?
Thank You!
Sivasankaran Chandrasekar
chandra@rocketfuel.com
Savin Goyal
savin@rocketfuel.com
We are hiring! (as always)
http://rocketfuel.com/careers
savin@rocketfuel.com
chandra@rocketfuel.com
Editor's Notes
When an advertiser works with Rocket Fuel, it immediately has access to 145 RTB advertising supply partners, 21M sites, 20B ad serving opportunities, and 3B users on 92,000 devices.
Real Time Auction: selecting the right ad for each auction, automatically learning from every response and getting better.
Nobody else is doing this as fast, precisely, or consistently for our customers.
Mention that it runs round the clock, handles upwards of 100 TB per day, stages vary in frequency, dependencies vary in frequency, need to play catch up, bugs.
Mention that we went from
Give props to the open source stack.