Hua Jiang and Kedar Sadekar talked about feature engineering using time rewinding in the context of Netflix Recommendations at an ML Platform meetup at LinkedIn HQ on Jan 24, 2018.
Netflix Recommendations Feature Engineering with Time Travel
1. Hua Jiang & Kedar Sadekar
Feature Engineering with Time Travel
Jan 2018
2. Netflix Scale
▪ Started streaming videos
in 2007
▪ > 117M members
▪ > 190 countries
▪ > 1000 device types
▪ A third of peak US
downstream traffic
3. Create a page of recommendations
where the titles you are
most likely to watch and enjoy are
shown on the most visible parts of
the page
We Want to Help Members Find Great Content
5. Title Ranking
Everything is a Recommendation
Row Selection & Ordering
Recommendations are
driven by machine
learning algorithms
Over 80% of what
members watch comes
from our
recommendations
6. • Try an idea offline using historical data to see if it
would have made better recommendations
• If it would, deploy a live A/B test to see if it performs
well in production
Running Experiments
7. Design Experiment
Collect Label Dataset
Offline Feature
Generation
Model Training
Compute
Validation Metrics
Model Testing
Design a New Experiment to Test Out Different Ideas
Offline
Experiment
Online
System
Online
AB Testing
Running Experiments
8. Offline Feature Generation - Feature Logging
• How to get features for offline training?
▪ Feature logging
▪ Pros: no need to compute offline
▪ Cons: delayed iteration for new features
Feature
Generator
Viewing
History
Service
MyList
Service
Thumbs
Service
Predictor
Facts Features
Log these
Recommendations
9. • How to get features for offline training?
Feature
Generator
Viewing
History
Service
MyList
Service
Thumbs
Service
Predictor
Facts Features
Log these
▪ Fact logging + time travel
Recommendations
Offline Feature Generation - Fact Logging
11. Offline Feature Generation - Time Travel
Training Data
Fact Store Fact Store Fact Store
Feature
Encoders
Prepared
Data
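The time-travel idea on this slide can be sketched as a point-in-time lookup: for each training example, fetch the fact value as it stood at the example's timestamp, then run the feature encoder on it. This is a minimal illustration only. `FactStore`, `as_of`, `encode_num_views`, and `generate_features` are hypothetical names, and an in-memory dictionary stands in for the real fact store.

```python
from bisect import bisect_right
from collections import defaultdict


class FactStore:
    """Hypothetical in-memory store of timestamped fact snapshots per member."""

    def __init__(self):
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def log(self, member_id, timestamp, value):
        # Assumption: facts are logged in increasing timestamp order per member.
        self._times[member_id].append(timestamp)
        self._values[member_id].append(value)

    def as_of(self, member_id, timestamp):
        """Return the latest value logged at or before `timestamp`, or None."""
        idx = bisect_right(self._times[member_id], timestamp)
        return self._values[member_id][idx - 1] if idx else None


def encode_num_views(viewing_history):
    # Hypothetical feature encoder: number of titles in the viewing history.
    return len(viewing_history or [])


def generate_features(labels, fact_store):
    """Join (member_id, timestamp, label) rows with facts as of each timestamp."""
    rows = []
    for member_id, timestamp, label in labels:
        history = fact_store.as_of(member_id, timestamp)
        rows.append({
            "member_id": member_id,
            "num_views": encode_num_views(history),
            "label": label,
        })
    return rows


store = FactStore()
store.log("m1", 100, ["title_a"])
store.log("m1", 200, ["title_a", "title_b"])

# The example at t=150 sees only the first snapshot; t=250 sees the second.
training_rows = generate_features([("m1", 150, 1), ("m1", 250, 0)], store)
```

Looking up facts as of the label timestamp keeps later activity from leaking into earlier training examples.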
12. S3
Snapshot
DeLorean: Offline
Feature Generation
Online Ranking /
Scoring Service
Model Training /
Validation / Testing
Offline Experiment
Online System
Viewing
History
Service
My List
Service
Ratings
Service
Online Feature
Generation
Deploy
models
Shared Feature
Encoders
Feature Generation - Online/Offline Parity
13. Feature Generation - Online/Offline Parity
Required
Data
Shared Feature
Encoders
Viewing History
Service
My List Service
Ratings
Service
Fact
Stores
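One way to picture online/offline parity is a single encoder function that both paths call, differing only in where the facts come from. The sketch below assumes hypothetical names (`encode_features`, `online_features`, `offline_features`) and uses lambdas in place of the live services; it is not the encoder interface from the talk.

```python
def encode_features(facts):
    """Hypothetical shared encoder: the one code path used online and offline."""
    history = facts.get("viewing_history", [])
    my_list = facts.get("my_list", [])
    return {
        "num_views": len(history),
        "my_list_size": len(my_list),
        "watched_from_my_list": len(set(history) & set(my_list)),
    }


def online_features(live_services, member_id):
    # Online: facts are fetched from the live services at request time.
    facts = {name: fetch(member_id) for name, fetch in live_services.items()}
    return encode_features(facts)


def offline_features(logged_facts):
    # Offline: facts are read back from a logged snapshot.
    return encode_features(logged_facts)


# Stand-ins for the live services (assumed for illustration).
live = {
    "viewing_history": lambda member_id: ["title_a", "title_b"],
    "my_list": lambda member_id: ["title_b", "title_c"],
}
snapshot = {
    "viewing_history": ["title_a", "title_b"],
    "my_list": ["title_b", "title_c"],
}

online = online_features(live, "m1")
offline = offline_features(snapshot)
```

Because both paths share `encode_features`, identical facts yield identical features, which is the parity property the slide names.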
14. Feature Generation from Logged Facts
S3
Snapshot
Model Training
Features
Training Data
Feature Model
Feature Encoders
Required
Features Data
Features
Fact Data
Required Data
15. Offline Feature Generation - Fact Logging
• Fact
▪ Input data for feature encoders
▪ Examples: a member's viewing history, a member's My List
• Temporal
▪ Facts are temporal, i.e., they change over time
▪ Each online scoring service uses the latest value of a fact
16. Offline Feature Generation
▪ Pull based: Fact logging + offline feature generation
▪ Pros:
▪ Easy to add new features
▪ Cons:
▪ Not temporally accurate data
▪ Additional load on micro-services serving facts
• How to obtain features for offline training?
17. S3
User
Sets
Data
Snapshots Runs once
a day
S3
Snapshot
Viewing
History
Service
MyList
Service
Thumbs
Rating
Service
Snapshot data for
each user
Parquet
Feature Generation: Fact Logging - Pull based
Pull
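To see why a once-a-day pull is "not temporally accurate", consider how old the snapshot can be when a recommendation request arrives. The sketch below assumes, purely for illustration, that the snapshot is taken at midnight; `snapshot_time_for` and `staleness` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def snapshot_time_for(event_time):
    """Assumed schedule: one snapshot per day at 00:00; pick the latest before the event."""
    return event_time.replace(hour=0, minute=0, second=0, microsecond=0)


def staleness(event_time):
    """How out of date the snapshot is relative to the event it is joined with."""
    return event_time - snapshot_time_for(event_time)


# A request in the evening is paired with facts captured at midnight.
event = datetime(2018, 1, 24, 18, 30)
lag = staleness(event)
```

Anything the member watched or rated during that lag is missing from the snapshot, so offline features can differ from what the online system actually saw.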
18. Offline Feature Generation
▪ Push based: Fact logging + offline feature generation
▪ Pros:
▪ Easy to add new features.
▪ Temporally accurate facts
▪ Algorithm controls who to log
▪ Cons:
▪ Scale challenges
• How to obtain features for offline training?
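In the push-based approach, the scoring service logs the facts it used at request time and decides which members to log. A minimal sketch follows, assuming a hash-based sampling policy; `should_log`, `FactLogger`, and the `sample_rate` parameter are hypothetical and not taken from the talk.

```python
import hashlib


def should_log(member_id, sample_rate):
    """Deterministically select roughly `sample_rate` of members (assumed policy)."""
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < sample_rate


class FactLogger:
    """Hypothetical logger called by the scoring service on each request."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.records = []

    def on_request(self, member_id, timestamp, facts):
        # Push the exact facts used for this request, stamped with request time.
        if should_log(member_id, self.sample_rate):
            self.records.append(
                {"member_id": member_id, "timestamp": timestamp, "facts": facts}
            )


logger = FactLogger(sample_rate=1.0)
logger.on_request("m1", 100, {"viewing_history": ["title_a"]})

nobody = FactLogger(sample_rate=0.0)
nobody.on_request("m1", 100, {"viewing_history": ["title_a"]})
```

Logging at request time makes the facts temporally accurate, and the sampling rate is one lever against the scale challenge listed under Cons.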
22. Offline Feature Generation
▪ Read characteristics and performance
▪ 20-30 times more data being logged. S3 suited for storage
▪ Training accesses only 1-5% of the logged data; suited for point-query lookups
▪ Solution
▪ Hot cache for read performance
▪ Spark API that's seamless irrespective of access from cache or
from S3
• Push based Fact logging: Challenges
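The "hot cache plus seamless API" solution can be pictured as a read-through cache: callers use one method and do not need to know whether the fact came from the cache or from cold storage. The sketch below uses plain Python dictionaries rather than Spark or S3; `FactReader` and its `cold_reads` counter are hypothetical.

```python
class FactReader:
    """Hypothetical reader with the same call for hot-cache and cold-store hits."""

    def __init__(self, cold_store):
        self._cold = cold_store  # stands in for S3
        self._hot = {}           # stands in for the hot cache
        self.cold_reads = 0

    def get(self, key):
        # Serve from the hot cache when possible.
        if key in self._hot:
            return self._hot[key]
        # Otherwise fall back to cold storage and populate the cache.
        self.cold_reads += 1
        value = self._cold[key]
        self._hot[key] = value
        return value


cold = {("m1", 100): {"viewing_history": ["title_a"]}}
reader = FactReader(cold)

first = reader.get(("m1", 100))   # miss: read from cold storage
second = reader.get(("m1", 100))  # hit: served from the hot cache
```

Keying by member and timestamp matches the point-query access pattern described on the slide.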
24. Conclusions
● Offline feature generation
○ Feature logging vs. fact logging
○ Time travel to retrieve fact
○ Online / offline parity
25. Conclusions (continued)
● Fact logging enabled
○ Time travel
○ Algorithm service controls when / who to log
○ Temporally accurate data
○ Log request level data used in computation
○ Scale to a very large number of members