Online Advertising and 
Large-scale model fitting 
Wush Wu 
2014-10-24
Outline 
● Introduction of Online Advertising 
● Handling Real Data 
– Data Engineering 
– Model Matrix 
– Enhance Computation Speed of R 
● Fitting Model to Large Scale Data 
– Batch Algorithm – Parallelizing Existing Algorithms 
– Online Algorithm – SGD, FTPRL and Learning Rate Schemes 
● Display Advertising Challenge
Ad Formats – Pre Roll Video Ads
Ad Formats – Banner/ Display Ads
Adwords Search Ads
Related Content Ads
Online Advertising is Growing 
Rapidly
Why Is Online Advertising Growing? 
● Wide reach 
● Target oriented 
● Quick conversion 
● Highly informative 
● Cost-effective 
● Easy to use 
Measurable 
Half the money I spend on advertising is wasted; the trouble is I 
don't know which half.
How do we measure the online ad? 
● The user behavior on the internet is trackable. 
– We know who watches the ad. 
– We know who buys the product. 
● We collect data for measurement.
How do we collect the data?
Performance-based advertising 
● Pricing Model 
– Cost-Per-Mille (CPM) 
– Cost-Per-Click (CPC) 
– Cost-Per-Action (CPA) or Cost-Per-Order (CPO)
To Improve Profit 
● Display the ad with high Click-Through Rate(CTR) * CPC, or 
Conversion Rate (CVR) * CPO 
● Estimation of the probability of click (conversion) is the central 
problem 
– Rule Based 
– Statistical Modeling (Machine Learning) 
System 
Website Ad Request 
Recommendation 
Website Ad Delivering 
Log Server 
Batch 
Model Fitting 
Online
Rule Based 
● Let the advertiser select the target group
Statistical Modeling 
● We log the display and collect the response 
● Features 
– Ad 
– Channel 
– User
Features of Ad 
● Ad type 
– Text 
– Figure 
– Video 
● Ad Content 
– Fashion 
– Health 
– Game
Features of Channel 
● Visibility
Features of User 
● Sex 
● Age 
● Location 
● Behavior
Real Features 
Zhang, Weinan and Yuan, Shuai and Wang, Jun and Shen, Xuehua. Real-Time Bidding Benchmarking 
with iPinYou Dataset
Know How vs. Know Why 
● We usually do not study the reasons behind a high CTR 
● A small improvement in accuracy implies a large improvement in profit 
● Predictive Analysis
Data 
● School 
– Static 
– Cleaned 
– Public 
● Commercial 
– Dynamic 
– Error 
– Private
Data Engineering 
Impression 
Click 
CLICK_TIME CLIENT_IP CLICKED ADID 
2014/05/17 ... 2.17.x.x 133594 
2014/05/17 ... 140.112.x.x 134811 
+
Data Engineering with R 
http://wush978.github.io/REngineering/ 
● Automation of R Jobs 
– Convert R script to command line application 
– Learn modern tools such as jenkins 
● Connections between multiple machines 
– Learn ssh 
● Logging 
– Linux tools: bash redirection, tee 
– R package: logging 
● R Error Handling 
– try, tryCatch
Characteristics of the Data 
● Rare Event 
● Large Amount of Categorical Features 
– Binning Numerical Features 
● Features are highly correlated 
● Some features occur frequently, some occur rarely
Common Statistical Model for CTR 
● Logistic Regression 
● Gradient Boosted Regression Tree 
– Check xgboost
Logistic Regression 
$P(\text{Click} \mid x) = \frac{1}{1 + e^{-w^T x}} = \sigma(w^T x)$ 
● Linear relationship with features 
– Fast prediction 
– (Relatively) fast fitting 
● Usually fit the model with L2 regularization
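As a small illustration of the formula above, this R sketch evaluates σ(w^T x) for a single instance. The weights and the feature layout are made-up values for illustration, not numbers from the talk.

```r
# Logistic (sigmoid) function: sigma(z) = 1 / (1 + exp(-z))
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients: an intercept and two binary features
w <- c(-2.0, 0.5, 1.5)

# One instance: intercept on, first feature off, second feature on
x <- c(1, 0, 1)

# Predicted click probability sigma(w^T x); here w^T x = -0.5
p_click <- sigmoid(sum(w * x))
print(p_click)
```

Prediction is fast because it needs only one inner product per instance.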
How large is the data? 
● Instances: 10^9 
● Binary features: 10^5
Subsampling 
● Sampling is useful for: 
– Data exploration 
– Code testing 
● Sampling might harm the accuracy (profit) 
– Rare event 
– Some features occur frequently and some occur rarely 
● We do not subsample data so far
Sampling 
● Olivier Chapelle, et al. Simple and scalable response prediction for display advertising.
Computation 
$P(\text{Click} \mid x) = \frac{1}{1 + e^{-w^T x}}$ 
Key computation: $w^T x$
Model Matrix 
head(model.matrix(Species ~ ., iris))
Dense Matrix 
● 10^9 instances 
● 10^5 binary features 
● 10^14 elements for model matrix 
● Size: 4 * 10^14 bytes 
– 400 TB 
● In-memory access is about 10^3 times faster than disk access
R and Large Scale Data 
● R cannot handle large scale data 
● R consumes lots of memory
Sparse Matrix
Sparse Matrix 
$A \in \mathbb{R}^{m \times n}$ with $k$ nonzero elements 
Dense Matrix: 
[1 0 1 0 
0 0 0 0 
0 0 0 0 
0 0 1 0] requires 4mn bytes 
List: 
(1, 1, 1), (1, 3, 1), (4, 3, 1) requires 12k bytes 
Compressed List: 
Row-compressed: i: {1, 3, 3}, p: {2, 0, 0, 1}, x: {1, 1, 1} requires 8k + 4m bytes 
Column-compressed: j: {1, 1, 4}, p: {1, 0, 2, 0}, x: {1, 1, 1} requires 8k + 4n bytes
Sparse Matrix 
● The number of nonzero elements can be estimated from the number of categorical variables 
m ∼ 10^9 
n ∼ 10^5 
k ∼ 10^1 × 10^9 = 10^10 
Dense Matrix: 4mn ≈ 4 × 10^14 bytes 
List: 12k ≈ 12 × 10^10 bytes 
Compressed: 8k + 4m ≈ 8.4 × 10^10 bytes or 8k + 4n ≈ 8 × 10^10 bytes
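The storage formulas above can be checked with a few lines of R. This is only a back-of-the-envelope sketch: it takes k = 10^1 × 10^9 as stated, and assumes 4-byte values and 4-byte indices; the variable names are illustrative.

```r
m <- 1e9              # instances
n <- 1e5              # binary features
n_categorical <- 10   # assumed number of categorical variables per instance
k <- n_categorical * m   # estimated number of nonzero elements

bytes <- c(
  dense          = 4 * m * n,      # every element stored
  list           = 12 * k,         # (row, column, value) triplets
  row_compressed = 8 * k + 4 * m,  # index + value per nonzero, one counter per row
  col_compressed = 8 * k + 4 * n   # index + value per nonzero, one counter per column
)

# Sizes in gigabytes (10^9 bytes)
print(round(bytes / 1e9, 1))
```

Under these assumptions the sparse formats need on the order of 100 GB instead of 400 TB.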
Sparse Matrix 
● Sparse matrix is useful for: 
– Large amount of categorical data 
– Text Analysis 
– Tag Analysis
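For categorical data, the Matrix package can build the model matrix directly in sparse form with `sparse.model.matrix`. The sketch below uses simulated factors; the column names `ad` and `channel` and the sizes are made up for illustration.

```r
library(Matrix)

set.seed(1)
# Simulated categorical data: 2000 impressions, many ad ids, fewer channels
d <- data.frame(
  ad      = factor(sample(1:500, 2000, replace = TRUE)),
  channel = factor(sample(1:50, 2000, replace = TRUE))
)

# Dense model matrix (base R) versus sparse model matrix (Matrix package)
X_dense  <- model.matrix(~ ad + channel, d)
X_sparse <- sparse.model.matrix(~ ad + channel, d)

print(dim(X_sparse))
print(object.size(X_dense), units = "Kb")
print(object.size(X_sparse), units = "Kb")
```

Both matrices hold the same values; only the storage differs.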
R package: Matrix 
m1 <- matrix(0, 5, 5);m1[1, 4] <- 1 
m1 
library(Matrix) 
m2 <- Matrix(0, 5, 5, sparse=TRUE) 
m2[1,4] <- 1 
m2
Computation Speed 
m1 <- matrix(0, 5, 5);m1[1, 4] <- 1 
library(Matrix) 
m2 <- Matrix(0, 5, 5, sparse=TRUE) 
m2[1,4] <- 1
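One way to compare the speed of the two representations is to time a matrix-vector product on a larger matrix. The following is a rough sketch only: the sizes, the density and the number of repetitions are arbitrary choices, and the timings depend on the machine.

```r
library(Matrix)

set.seed(1)
n_row <- 4000
n_col <- 1000
nnz   <- 40000   # about 1% of the entries are nonzero

# Random sparse matrix; duplicated (i, j) pairs are summed by sparseMatrix
X_sparse <- sparseMatrix(
  i = sample(n_row, nnz, replace = TRUE),
  j = sample(n_col, nnz, replace = TRUE),
  x = rep(1, nnz),
  dims = c(n_row, n_col)
)
X_dense <- as.matrix(X_sparse)   # same values, dense storage
v <- rnorm(n_col)

# Time 50 matrix-vector products for each representation
t_dense  <- system.time(for (r in 1:50) y_dense  <- X_dense  %*% v)
t_sparse <- system.time(for (r in 1:50) y_sparse <- X_sparse %*% v)

print(c(dense = t_dense[["elapsed"]], sparse = t_sparse[["elapsed"]]))
```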
Advanced tips: package Rcpp 
● C/C++ uses memory more efficiently 
● Rcpp provides an easy interface between R and C/C++ 
#include <Rcpp.h> 
using namespace Rcpp; 
// [[Rcpp::export]] 
SEXP XTv(S4 m, NumericVector v, NumericVector& retval) { 
//... 
}
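The body of `XTv` is elided on the slide. As a guess at what such a function computes (the product X^T v for a sparse X), here is a plain-R sketch that walks the `i`, `p` and `x` slots of a `dgCMatrix`. It is not the author's C++ code; it only shows the access pattern that a C++ loop would follow.

```r
library(Matrix)

# X^T v for a column-compressed sparse matrix (class dgCMatrix).
# Slot i: 0-based row index of each nonzero; slot p: column pointers; slot x: values.
XTv <- function(X, v) {
  stopifnot(is(X, "dgCMatrix"), length(v) == nrow(X))
  out <- numeric(ncol(X))
  for (j in seq_len(ncol(X))) {
    if (X@p[j + 1] > X@p[j]) {                    # column j has nonzeros
      idx <- (X@p[j] + 1):X@p[j + 1]              # positions of its nonzeros
      out[j] <- sum(X@x[idx] * v[X@i[idx] + 1])   # dot product with v
    }
  }
  out
}

# The 4 x 4 example matrix with nonzeros at (1,1), (1,3) and (4,3)
X <- sparseMatrix(i = c(1, 1, 4), j = c(1, 3, 3), x = c(1, 1, 1), dims = c(4, 4))
v <- c(1, 2, 3, 4)
print(XTv(X, v))
```

The loop touches only the k nonzero elements, which is why the same logic is cheap when written in C++.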
Two approaches to fitting logistic 
regression to large-scale data 
● Batch Algorithm 
– Optimize the log-likelihood globally 
● Online Algorithm 
– Optimize the loss function instance by instance
Batch Algorithm 
Negative Loglikelihood: 
$f(w \mid (x_1, y_1), \dots, (x_m, y_m)) = \sum_{t=1}^{m} \left[ -y_t \log \sigma(w^T x_t) - (1 - y_t) \log(1 - \sigma(w^T x_t)) \right]$ 
Gradient Descent: 
$w_{t+1} = w_t - \eta \nabla f(w_t)$ 
Each update requires scanning all data
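A minimal R sketch of the batch update above, on simulated data. The data, the true coefficients, the step size `eta = 1 / m` and the number of iterations are all assumptions chosen so that the loop is stable; none of them come from the talk.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Negative log-likelihood and its gradient over ALL instances
nll <- function(w, X, y) {
  p <- sigmoid(drop(X %*% w))
  -sum(y * log(p) + (1 - y) * log(1 - p))
}
nll_grad <- function(w, X, y) {
  drop(crossprod(X, sigmoid(drop(X %*% w)) - y))   # X^T (sigma(Xw) - y)
}

# Simulated data: intercept plus three binary features
set.seed(1)
m <- 1000
X <- cbind(1, matrix(rbinom(m * 3, 1, 0.3), m, 3))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 2, -1, 1.5))))

w <- numeric(ncol(X))
eta <- 1 / m                       # assumed step size
for (iter in 1:500) {
  w <- w - eta * nll_grad(w, X, y)  # each step scans the whole data set
}
print(round(w, 3))
print(nll(w, X, y))
```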
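Parallelizing Existing Batch Algorithms 
Row-wise Partition 
$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} v = \begin{pmatrix} X_1 v \\ X_2 v \end{pmatrix}$ 
$\begin{pmatrix} v_1 & v_2 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = v_1 X_1 + v_2 X_2$ 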
● We could split data by instances to several machines 
● The matrix-vector multiplication could be parallelized
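The two identities above can be checked on a single machine by treating blocks of rows as if they lived on different nodes. In this sketch the two "machines" are simply list elements; the sizes are arbitrary.

```r
set.seed(1)
X <- matrix(rnorm(6 * 3), 6, 3)
v <- rnorm(3)   # vector multiplied from the right
u <- rnorm(6)   # vector multiplied from the left

# Rows 1-3 go to "machine 1", rows 4-6 to "machine 2"
parts <- list(1:3, 4:6)

# X v: each machine computes its own block, the blocks are stacked
Xv <- do.call(rbind, lapply(parts, function(rows) X[rows, , drop = FALSE] %*% v))

# u X: each machine computes a partial result, the results are summed
uX <- Reduce(`+`, lapply(parts, function(rows) u[rows] %*% X[rows, , drop = FALSE]))

print(all.equal(Xv, X %*% v))
print(all.equal(uX, u %*% X))
```

The first product needs no communication until the results are collected; the second needs one sum across machines.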
Framework of Parallelization 
● Hadoop 
– Slow for iterative algorithms 
– Fault tolerant 
– Good for many machines 
● MPI 
– If the data fit in memory, fast for iterative algorithms 
– Not fault tolerant 
– Good for several machines
R Package: pbdMPI
R Package: pbdMPI 
● Easy to install (on ubuntu) 
– sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev 
– install.packages("pbdMPI") 
● Easy to develop (compared to Rmpi)
R Package: pbdMPI 
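# Run with, e.g., mpiexec -np 4 Rscript script.R (script.R is a placeholder name) 
library(pbdMPI) 
.rank <- comm.rank()                  # id of this MPI process (0, 1, 2, ...) 
filename <- sprintf("%d.csv", .rank)  # each process reads its own partition 
data <- read.csv(filename) 
# add up the per-process partial sums; by default the result lands on rank 0 
target <- reduce(sum(data$value), op="sum") 
finalize()                            # shut down MPI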
Parallelize Algorithm with pbdMPI 
● Implement functions required for optimization with pbdMPI 
– optim requires f and g (gradient of f) 
– nlminb requires f, g, and H(hessian of f) 
– tron requires f, g, and Hs(H multiply a given vector s)
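As a single-machine sketch of the three ingredients listed above, the code below writes f, g and Hs for L2-regularized logistic regression and passes f and g to `optim`. The simulated data and `lambda = 1` are assumptions. In a pbdMPI version each sum over instances would be computed on the local partition and combined across ranks (for example with `allreduce(..., op = "sum")`); that part is not shown here.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(1)
m <- 500
X <- cbind(1, matrix(rbinom(m * 3, 1, 0.3), m, 3))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 2, -1, 1.5))))
lambda <- 1   # assumed L2 penalty weight

# f: negative log-likelihood plus L2 penalty.
# log(1 + exp(z)) - y * z equals -y log(sigma(z)) - (1 - y) log(1 - sigma(z)).
f <- function(w) {
  z <- drop(X %*% w)
  sum(log1p(exp(z)) - y * z) + lambda / 2 * sum(w^2)
}

# g: gradient of f
g <- function(w) {
  drop(crossprod(X, sigmoid(drop(X %*% w)) - y)) + lambda * w
}

# Hs: Hessian of f times a vector s, without forming the Hessian
Hs <- function(w, s) {
  p <- sigmoid(drop(X %*% w))
  drop(crossprod(X, p * (1 - p) * drop(X %*% s))) + lambda * s
}

fit <- optim(numeric(ncol(X)), fn = f, gr = g, method = "L-BFGS-B")
print(round(fit$par, 3))
```

Every function is a sum over instances followed by a small amount of arithmetic, which is what makes the row-wise partition work.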
Some Tips of Optimization 
● Take care of stopping criteria 
– A relative threshold might be enough 
● Save the coefficients during iteration with the operator <<- and print the values of f and 
g 
– You can stop the iteration anytime 
– Monitor the convergence
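The tips above could look like this with `optim`. This is a sketch on simulated data: the names `n_eval` and `w_last`, the printing interval and `reltol = 1e-6` are illustrative choices.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(1)
m <- 500
X <- cbind(1, matrix(rbinom(m * 3, 1, 0.3), m, 3))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 2, -1, 1.5))))

f <- function(w) { z <- drop(X %*% w); sum(log1p(exp(z)) - y * z) }
g <- function(w) drop(crossprod(X, sigmoid(drop(X %*% w)) - y))

n_eval <- 0      # number of evaluations of f so far
w_last <- NULL   # most recent coefficients, kept outside the optimizer

f_traced <- function(w) {
  val <- f(w)
  n_eval <<- n_eval + 1   # <<- writes to the variables defined above
  w_last <<- w            # still available if the run is interrupted
  if (n_eval %% 5 == 1) {
    cat(sprintf("eval %3d  f = %.6f  max|g| = %.2e\n",
                n_eval, val, max(abs(g(w)))))
  }
  val
}

# reltol is a relative stopping threshold on the decrease of f
fit <- optim(numeric(ncol(X)), fn = f_traced, gr = g,
             method = "BFGS", control = list(reltol = 1e-6))
print(round(w_last, 3))
```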
Overview
LinkedIn Way 
Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013 
● Too much data to fit on a single machine 
– Billions of observations, millions of features 
● A Naive Approach 
– Partition the data and run logistic regression on each partition 
– Take the mean of the learned coefficients 
– Problem: not guaranteed to converge to the model fitted on a single machine! 
● Alternating Direction Method of Multipliers (ADMM) 
– Boyd et al. 2011 (based on earlier work from the 70s)
ADMM 
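On each node, the data and coefficients are different 
$\min \sum_{k=1}^{K} f_k(w_k) + \frac{\lambda}{2} \|w\|_2^2$ subject to $w_k = w \;\forall k$ 
$w_k^{t+1} = \arg\min_{w_k} f_k(w_k) + \frac{\rho}{2} \|w_k - w^t + u_k^t\|_2^2$ 
$w^{t+1} = \arg\min_{w} \frac{\lambda}{2} \|w\|_2^2 + \frac{\rho}{2} \sum_{k=1}^{K} \|w_k^{t+1} - w + u_k^t\|_2^2$ 
$u_k^{t+1} = u_k^t + w_k^{t+1} - w^{t+1}$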
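The three updates above can be sketched in R on one machine, with the K nodes simulated by a loop. The data, the partitioning, `lambda = 1`, `rho = 20` and the iteration count are assumptions; the local problems are solved with `optim`, which is one choice among many. The closed form of the w update follows from setting the gradient of its objective to zero.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(1)
m <- 2000
X <- cbind(1, matrix(rbinom(m * 3, 1, 0.3), m, 3))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 2, -1, 1.5))))
d <- ncol(X)

K <- 4
part <- rep(seq_len(K), length.out = m)   # hypothetical assignment of rows to nodes
lambda <- 1
rho <- 20                                 # assumed penalty parameter; needs tuning

# f_k and its gradient on the rows held by node k
f_k <- function(w, k) {
  z <- drop(X[part == k, , drop = FALSE] %*% w)
  sum(log1p(exp(z)) - y[part == k] * z)
}
g_k <- function(w, k) {
  Xk <- X[part == k, , drop = FALSE]
  drop(crossprod(Xk, sigmoid(drop(Xk %*% w)) - y[part == k]))
}

w <- numeric(d)        # consensus coefficients
W <- matrix(0, d, K)   # column k holds w_k
U <- matrix(0, d, K)   # column k holds u_k

for (iter in 1:200) {
  for (k in seq_len(K)) {   # these K local fits could run on K machines
    W[, k] <- optim(
      W[, k],
      fn = function(wk) f_k(wk, k) + rho / 2 * sum((wk - w + U[, k])^2),
      gr = function(wk) g_k(wk, k) + rho * (wk - w + U[, k]),
      method = "BFGS"
    )$par
  }
  w <- rho * rowSums(W + U) / (lambda + K * rho)   # closed-form w update
  U <- U + W - w                                   # u update, column by column
}
print(round(w, 3))
```

Only the d-dimensional vectors w_k, u_k and w travel between nodes in each round; the data never move.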
Update Coefficient 
Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013 
BIG DATA 
Partition 1, Partition 2, Partition 3, ..., Partition K 
Logistic Regression (one per partition) 
Consensus Computation
Update Regularization 
Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013 
BIG DATA 
Partition 1, Partition 2, Partition 3, ..., Partition K 
Logistic Regression (one per partition) 
Consensus Computation
Our Remarks on ADMM 
● ADMM reduces the communication between the nodes 
● In our environment, the overhead of communication is 
affordable 
– ADMM does not enhance the performance of our system
Online Algorithm 
Stochastic Gradient Descent (SGD): 
$f(w \mid y_t, x_t) = -y_t \log \sigma(w^T x_t) - (1 - y_t) \log(1 - \sigma(w^T x_t))$ 
$w_{t+1} = w_t - \eta \nabla f(w_t \mid y_t, x_t)$ 
● Choose an initial value and learning rate 
● Randomly shuffle the instances in the training set 
● Scan the data and update the coefficients 
– Repeat until an approximate minimum is obtained
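The steps above map onto a short loop in R. This is a sketch on simulated data; the constant learning rate `eta = 0.05` and the five passes are arbitrary choices.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(1)
m <- 2000
X <- cbind(1, matrix(rbinom(m * 3, 1, 0.3), m, 3))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 2, -1, 1.5))))

w <- numeric(ncol(X))   # initial value
eta <- 0.05             # assumed constant learning rate

for (pass in 1:5) {
  for (t in sample(m)) {   # randomly shuffled order of the instances
    x <- X[t, ]
    # gradient of the loss on one instance: (sigma(w^T x) - y) x
    w <- w - eta * (sigmoid(sum(w * x)) - y[t]) * x
  }
}
print(round(w, 3))
```

Each update touches a single instance, so the coefficients start improving long before the first pass is finished.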
SGD to Follow The Proximal 
Regularized Leader 
H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent: 
Equivalence Theorems and L1 Regularization. AISTATS 2011 
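$w_{t+1} = w_t - \eta_t \nabla f(w_t \mid y_t, x_t) = \arg\min_{w} \nabla f(w_t \mid y_t, x_t)^T w + \frac{1}{2\eta_t} (w - w_t)^T (w - w_t)$ 
Let $g_t = \nabla f(w_t \mid y_t, x_t)$ and $g_{1:t} = \sum_{i=1}^{t} \nabla f(w_i \mid y_i, x_i)$ 
$w_{t+1} = \arg\min_{w} g_{1:t}^T w + t \lambda_1 \|w\|_1 + \frac{\lambda_2}{2} \sum_{i=1}^{t} \|w - w_i\|_2^2$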
Regret of SGD 
H. Brendan McMahan and Matthew Streeter. Adaptive Bound Optimization for 
Online Convex Optimization. COLT 2010 
$\text{Regret} := \sum_{t=1}^{T} f_t(w_t) - \min_{w} \sum_{t=1}^{T} f_t(w)$ 
A global learning rate achieves the regret bound $O(D M \sqrt{T})$ 
D is the L2 diameter of the feasible set 
M is a bound on the L2 norm of the gradient g
Regret of SGD 
H. Brendan McMahan and Matthew Streeter. Adaptive Bound Optimization for 
Online Convex Optimization. COLT 2010 
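Per-coordinate Learning Rate: 
$\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^2}}$ 
achieves regret bound $O(\sqrt{T}\, n^{1 - \gamma/2})$ 
n is the dimension of w. If $w \in [-0.5, 0.5]^n$, $D = \sqrt{n}$ 
$P(x_{t,i} = 1) \sim i^{-\gamma}$ for some $\gamma \in [1, 2)$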
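The per-coordinate learning rate can be added to SGD by keeping one running sum of squared gradients per coefficient. This sketch uses simulated features with very different frequencies; the frequencies, `alpha = 0.5` and `beta = 1` are made-up values.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(1)
m <- 5000
freq <- c(1, 0.5, 0.05, 0.005)   # column 1 acts as the intercept
X <- sapply(freq, function(p) rbinom(m, 1, p))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 1, 1.5, 2))))

alpha <- 0.5
beta <- 1
w <- numeric(4)
g2_sum <- numeric(4)   # running sum of squared gradients, one per coordinate

for (t in seq_len(m)) {
  x <- X[t, ]
  g <- (sigmoid(sum(w * x)) - y[t]) * x     # gradient on instance t
  g2_sum <- g2_sum + g^2
  eta <- alpha / (beta + sqrt(g2_sum))      # eta_{t,i}
  w <- w - eta * g
}
print(round(eta, 4))
```

Coordinates of rare features accumulate little squared gradient, so they keep a larger learning rate than the frequent ones.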
Comparison of Learning Rate Schemes 
Xinran He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. 
ADKDD 2014.
Google KDD 2013, FTPRL 
H. Brendan McMahan, et al. Ad Click Prediction: a View from the Trenches. KDD 2013.
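The per-coordinate FTRL-Proximal update described in the paper cited above can be written as a plain loop. The following R sketch follows that published algorithm as I understand it; the function names, the simulated data and the default hyperparameters are illustrative, and it is not the author's BridgewellML implementation.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Closed-form weights from the accumulated state (z, n)
ftrl_weights <- function(z, n, alpha, beta, lambda1, lambda2) {
  ifelse(abs(z) <= lambda1, 0,
         -(z - sign(z) * lambda1) / ((beta + sqrt(n)) / alpha + lambda2))
}

ftrl_fit <- function(X, y, alpha = 0.1, beta = 1, lambda1 = 1, lambda2 = 1) {
  z <- numeric(ncol(X))   # adjusted gradient sums
  n <- numeric(ncol(X))   # sums of squared gradients
  for (t in seq_len(nrow(X))) {
    I <- which(X[t, ] != 0)   # only the active features are touched
    x <- X[t, I]
    w <- ftrl_weights(z[I], n[I], alpha, beta, lambda1, lambda2)
    g <- (sigmoid(sum(w * x)) - y[t]) * x
    sigma <- (sqrt(n[I] + g^2) - sqrt(n[I])) / alpha
    z[I] <- z[I] + g - sigma * w
    n[I] <- n[I] + g^2
  }
  ftrl_weights(z, n, alpha, beta, lambda1, lambda2)
}

set.seed(1)
m <- 2000
X <- cbind(1, matrix(rbinom(m * 3, 1, 0.3), m, 3))
y <- rbinom(m, 1, sigmoid(drop(X %*% c(-2, 2, -1, 1.5))))

print(round(ftrl_fit(X, y, lambda1 = 0), 3))   # no L1 penalty
print(round(ftrl_fit(X, y, lambda1 = 5), 3))   # stronger L1 penalty
```

A coefficient is exactly zero whenever its accumulated |z| does not exceed `lambda1`, which is how the L1 term produces sparsity.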
Some Remarks on FTPRL 
● FTPRL is a general optimization framework. 
– We used it successfully to fit neural networks 
● The per-coordinate learning rate greatly improves the 
convergence on our data 
– SGD works with a per-coordinate learning rate 
● The “Proximal” part decreases the accuracy but introduces 
sparsity
Implementation of FTPRL in R 
● I am not aware of any implementation of online optimization in 
R 
● The algorithm is simple. Just write it with a for loop. 
● The overhead of a loop is much smaller in C/C++ than in R 
● I implemented the algorithm in 
https://github.com/wush978/BridgewellML/tree/r-pkg 
– Looking for users 
– Contact me if you want to try
FTPRL vs. TRON
Batch vs. Online 
Olivier Chapelle, et al. Simple and scalable response prediction for display 
advertising. 
● Batch Algorithm 
– Optimizes the likelihood function to a high accuracy once it is in a good neighborhood of the optimal solution 
– Quite slow in reaching the solution 
– Straightforward to generalize to a distributed environment 
● Online Algorithm (mini-batch) 
– Optimizes the likelihood to a rough precision quite fast 
– Needs only a handful of passes over the data 
– Tricky to parallelize
Criteo Inc. Hybrid of Online and Batch 
● On each node, make one online pass over its local data 
using adaptive gradient updates. 
● Average the local weights and use the result as the initial value for L-BFGS.
Facebook 
Xinran He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. 
ADKDD 2014. 
● Decision Tree (Batch) for Feature Transforms 
● Logistic Regression (Online)
Data Size and Accuracy
Experiment Designs
Experiment Results
Experiment Analysis
Improving 
New Models 
New Algorithms 
New Features 
Experiments 
Analysis
Display Advertising Challenge 
● https://www.kaggle.com/c/criteo-display-ad-challenge 
● 7 * 10^7 instances 
● 13 integer features and 26 categorical features with about 3 * 10^7 levels 
● We ranked 9th out of 718 teams 
– We fit a neural network (2-layer logistic regression) to the 
data with FTPRL and dropout
Dropout in SGD 
Geoffrey E. Hinton, et al. Improving neural networks by preventing co-adaptation 
of feature detectors. CoRR 2012
Tools of Large-scale Model Fitting 
● Almost all of the top 10 competitors implemented their algorithms themselves 
– There is no dominant tool for large-scale model fitting 
● The winner used only 20GB of memory. See 
https://github.com/guestwalk/kaggle-2014-criteo 
● For a single machine, there are some good machine learning libraries 
– LIBLINEAR for linear models (a student from its lab ranked no. 1) 
– xgboost for gradient boosted regression trees (its author ranked no. 12) 
– Vowpal Wabbit
Thank you for listening

 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Silpa
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 

Último (20)

Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.POGONATUM : morphology, anatomy, reproduction etc.
POGONATUM : morphology, anatomy, reproduction etc.
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 

Online advertising and large scale model fitting

  • 1. Online Advertising and Large-scale model fitting Wush Wu 2014-10-24
  • 2. Outline ● Introduction to Online Advertising ● Handling Real Data – Data Engineering – Model Matrix – Enhancing the Computation Speed of R ● Fitting Models to Large-Scale Data – Batch Algorithm – Parallelizing Existing Algorithms – Online Algorithm – SGD, FTPRL and Learning Rate Schemes ● Display Advertising Challenge
  • 3. Ad Formats – Pre Roll Video Ads
  • 4. Ad Formats – Banner/ Display Ads
  • 7. Online Advertising is Growing Rapidly
  • 8. Why Is Online Advertising Growing? ● Wide reach ● Target oriented ● Quick conversion ● Highly informative ● Cost-effective ● Easy to use ● Measurable – "Half the money I spend on advertising is wasted; the trouble is I don't know which half."
  • 9. How do we measure the online ad? ● The user behavior on the internet is trackable. – We know who watches the ad. – We know who buys the product. ● We collect data for measurement.
  • 10. How do we collect the data?
  • 11. Performance-based advertising ● Pricing Model – Cost-Per-Mille (CPM) – Cost-Per-Click (CPC) – Cost-Per-Action (CPA) or Cost-Per-Order (CPO)
  • 12. To Improve Profit ● Display the ad with a high Click-Through Rate (CTR) * CPC, or Conversion Rate (CVR) * CPO ● Estimating the probability of a click (conversion) is the central problem – Rule Based – Statistical Modeling (Machine Learning)
  • 13. System Website Ad Request Recommendation Website Ad Delivering Log Server Batch Model Fitting Online
  • 14. Rule Based ● Let the advertiser select the target group
  • 15. Statistical Modeling ● We log the display and collect the response ● Features – Ad – Channel – User
  • 16. Features of Ad ● Ad type – Text – Figure – Video ● Ad Content – Fashion – Health – Game
  • 17. Features of Channel ● Visibility
  • 18. Features of User ● Sex ● Age ● Location ● Behavior
  • 19. Real Features Zhang, Weinan and Yuan, Shuai and Wang, Jun and Shen, Xuehua. Real-Time Bidding Benchmarking with iPinYou Dataset
  • 20. Know How vs. Know Why ● We usually do not study the reason for a high CTR ● A small improvement in accuracy implies a large improvement in profit ● Predictive Analysis
  • 21. Data ● School – Static – Cleaned – Public ● Commercial – Dynamic – Error – Private
  • 22. Data Engineering ● Impression + Click ● Click log columns: CLICK_TIME | CLIENT_IP | CLICKED ADID ● 2014/05/17 ... | 2.17.x.x | 133594 ● 2014/05/17 ... | 140.112.x.x | 134811
  • 23. Data Engineering with R http://wush978.github.io/REngineering/ ● Automation of R Jobs – Convert R scripts to command line applications – Learn modern tools such as Jenkins ● Connections between multiple machines – Learn ssh ● Logging – Linux tools: bash redirection, tee – R package: logging ● R Error Handling – try, tryCatch
  • 24. Characteristics of Data ● Rare Event ● Large Amount of Categorical Features – Binning Numerical Features ● Features are highly correlated ● Some features occur frequently, some occur rarely
  • 25. Common Statistical Model for CTR ● Logistic Regression ● Gradient Boosted Regression Tree – Check xgboost
  • 26. Logistic Regression $P(\text{Click} \mid x) = \frac{1}{1 + e^{-w^T x}} = \sigma(w^T x)$ ● Linear relationship with features – Fast prediction – (Relatively) fast fitting ● Usually fit the model with L2 regularization
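A minimal Python sketch of the prediction step on slide 26, assuming binary features so that $w^T x$ reduces to a sum over the indices of the active features. The coefficient values and the name `predict_click` are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function sigma(z) = 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_click(w, active):
    """P(Click | x) for a binary x, given as the list of indices where x is 1."""
    return sigmoid(sum(w[i] for i in active))

w = [0.0, -2.0, 0.5, 1.5]      # hypothetical coefficients
p = predict_click(w, [1, 3])   # w^T x = -2.0 + 1.5 = -0.5
print(p)
```

Because only the active indices are touched, the cost of one prediction depends on the number of nonzero features rather than on the full dimension of $w$.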
  • 27. How large is the data? ● Instances: 10^9 ● Binary features: 10^5
  • 28. Subsampling ● Sampling is useful for: – Data exploration – Code testing ● Sampling might harm the accuracy (profit) – Rare event – Some features occur frequently and some occur rarely ● We do not subsample data so far
  • 29. Sampling ● Olivier Chapelle, et al. Simple and scalable response prediction for display advertising.
  • 30. Computation $P(\text{Click} \mid x) = \frac{1}{1 + e^{-w^T x}}$, where the term to compute is $w^T x$
  • 32. Dense Matrix ● 10^9 instances ● 10^5 binary features ● 10^14 elements in the model matrix ● Size: 4 * 10^14 bytes – 400 TB ● In-memory access is about 10^3 times faster than on-disk access
  • 33. R and Large Scale Data ● R cannot handle large scale data ● R consumes lots of memory
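  • 35. Sparse Matrix $A \in \mathbb{R}^{m \times n}$ with $k$ nonzero elements ● Dense matrix: $\begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$ requires $4mn$ bytes ● List: $(1, 1, 1), (1, 3, 1), (4, 3, 1)$ requires $12k$ bytes ● Compressed by row: $i: \{1, 3, 3\}$, $p: \{2, 0, 0, 1\}$, $x: \{1, 1, 1\}$ requires $8k + 4m$ bytes ● Compressed by column: $j: \{1, 1, 4\}$, $p: \{1, 0, 2, 0\}$, $x: \{1, 1, 1\}$ requires $8k + 4n$ bytes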
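The conversion from the list form to the row-compressed form on slide 35 can be sketched as follows. This follows the slide's layout, where `p` holds the number of nonzeros in each row; libraries such as the R package Matrix store cumulative offsets instead. The function name is hypothetical.

```python
def triplets_to_compressed_rows(triplets, m):
    """Convert 1-based (row, col, value) triplets to the slide's row-compressed layout.

    Returns (i, p, x): column index of each nonzero, number of nonzeros
    per row, and the nonzero values, all ordered by row and then column.
    """
    ordered = sorted(triplets)            # sort by row, then by column
    i = [col for _, col, _ in ordered]    # column index of each nonzero
    x = [val for _, _, val in ordered]    # value of each nonzero
    p = [0] * m                           # nonzero count for each of the m rows
    for row, _, _ in ordered:
        p[row - 1] += 1
    return i, p, x

i, p, x = triplets_to_compressed_rows([(1, 1, 1), (1, 3, 1), (4, 3, 1)], m=4)
print(i, p, x)
```

The row indices are no longer stored per element, which is where the saving from $12k$ to $8k + 4m$ comes from.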
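  • 36. Sparse Matrix ● The number of nonzero elements can be estimated from the number of categorical variables ● $m \sim 10^9$, $n \sim 10^5$, $k \sim 10^1 \times 10^9$ ● Dense matrix: $4 \times 10^{14}$ ● List: $12 \times 10^9$ ● Compressed: $12 \times 10^9$ or $8 \times 10^9 + 4 \times 10^5$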
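The storage formulas from slides 35 and 36 can be checked with a few lines of arithmetic. The sketch assumes 4-byte values and indices, as in the dense-matrix estimate, and uses $k = 10^9$ (one nonzero per instance), which is the value that reproduces the sizes listed on the slide. With about ten categorical variables per instance, $k$ and the sparse sizes grow roughly tenfold.

```python
def storage_bytes(m, n, k):
    """Approximate storage in bytes for an m x n matrix with k nonzeros."""
    return {
        "dense": 4 * m * n,                    # one 4-byte value per cell
        "list": 12 * k,                        # (row, col, value) per nonzero
        "compressed_by_row": 8 * k + 4 * m,    # (col, value) per nonzero + per-row count
        "compressed_by_column": 8 * k + 4 * n, # (row, value) per nonzero + per-column count
    }

sizes = storage_bytes(m=10**9, n=10**5, k=10**9)
for name, size in sizes.items():
    print(f"{name}: {size:.3e} bytes")
```

Under these assumptions the sparse representations are about four orders of magnitude smaller than the dense one.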
  • 37. Sparse Matrix ● Sparse matrix is useful for: – Large amount of categorical data – Text Analysis – Tag Analysis
  • 38. R package: Matrix
      m1 <- matrix(0, 5, 5); m1[1, 4] <- 1
      m1
      library(Matrix)
      m2 <- Matrix(0, 5, 5, sparse = TRUE)
      m2[1, 4] <- 1
      m2
  • 39. Computation Speed
      m1 <- matrix(0, 5, 5); m1[1, 4] <- 1
      library(Matrix)
      m2 <- Matrix(0, 5, 5, sparse = TRUE)
      m2[1, 4] <- 1
  • 40. Advanced tips: package Rcpp ● C/C++ uses memory more efficiently ● Rcpp provides an easy interface between R and C/C++
      #include <Rcpp.h>
      using namespace Rcpp;
      // [[Rcpp::export]]
      SEXP XTv(S4 m, NumericVector v, NumericVector& retval) {
        //...
      }
  • 41. Two approaches to fitting logistic regression on large-scale data ● Batch Algorithm – Optimize the log likelihood globally ● Online Algorithm – Optimize the loss function instance by instance
  • 42. Batch Algorithm ● Negative log-likelihood: $f(w \mid (x_1, y_1), \cdots, (x_m, y_m)) = \sum_{t=1}^{m} -y_t \log(\sigma(w^T x_t)) - (1 - y_t) \log(1 - \sigma(w^T x_t))$ ● Gradient descent: $w_{t+1} = w_t - \eta \nabla f(w_t)$ ● Each update requires scanning all data
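A small Python sketch of the batch update on slide 42, using plain gradient descent on the negative log-likelihood. The toy data, learning rate and iteration count are assumptions for illustration. Each call to `gradient` scans every instance, which is the cost the slide points out.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def neg_log_likelihood(w, X, y):
    """f(w) = sum_t -y_t log(sigma(w.x_t)) - (1 - y_t) log(1 - sigma(w.x_t))."""
    total = 0.0
    for xt, yt in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, xt)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        total += -yt * math.log(p) - (1 - yt) * math.log(1.0 - p)
    return total

def gradient(w, X, y):
    """Gradient of f: sum_t (sigma(w.x_t) - y_t) * x_t. Scans all data."""
    g = [0.0] * len(w)
    for xt, yt in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, xt)))
        for j, xj in enumerate(xt):
            g[j] += (p - yt) * xj
    return g

def gradient_descent(X, y, eta=0.1, iterations=500):
    w = [0.0] * len(X[0])
    for _ in range(iterations):
        g = gradient(w, X, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy data: an intercept column plus one binary feature.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [0, 1, 0, 1, 1, 0]
w = gradient_descent(X, y)
print(w, neg_log_likelihood(w, X, y))
```

On this toy data the click rates are 1/3 without the feature and 2/3 with it, so the fitted intercept should approach $\log(1/2)$ and the feature coefficient $2 \log 2$.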
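  • 43. Parallelizing an Existing Batch Algorithm ● Row-wise partition: $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} v = \begin{pmatrix} X_1 v \\ X_2 v \end{pmatrix}$ and $\begin{pmatrix} v_1 & v_2 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = v_1 X_1 + v_2 X_2$ ● We could split data by instances across several machines ● The matrix-vector multiplication could be parallelized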
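The two identities on slide 43 can be checked on a toy matrix. In this sketch the two row blocks stand in for data held on two machines. The first product concatenates the local results, while the second sums them, which corresponds to a reduce operation. The matrix and vectors are made up for illustration.

```python
def matvec(X, v):
    """X v for a dense row-major matrix X."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def vecmat(u, X):
    """u X, a row vector times a matrix."""
    out = [0.0] * len(X[0])
    for ui, row in zip(u, X):
        for j, a in enumerate(row):
            out[j] += ui * a
    return out

X = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
v = [1.0, 2.0, 3.0]
u = [1.0, -1.0, 0.5, 2.0]

# Row-wise partition: each block could live on a different machine.
X1, X2 = X[:2], X[2:]
u1, u2 = u[:2], u[2:]

Xv_parts = matvec(X1, v) + matvec(X2, v)                            # concatenate
uX_parts = [a + b for a, b in zip(vecmat(u1, X1), vecmat(u2, X2))]  # sum (reduce)

print(Xv_parts, uX_parts)
```

Both products appear in the gradient of the logistic loss, so splitting the data by instances lets each machine compute its share locally.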
  • 44. Frameworks for Parallelization ● Hadoop – Slow for iterative algorithms – Fault tolerance – Good for many machines ● MPI – If in memory, fast for iterative algorithms – No fault tolerance – Good for several machines
  • 46. R Package: pbdMPI ● Easy to install (on Ubuntu) – sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev – install.packages("pbdMPI") ● Easy to develop (compared to Rmpi)
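  • 47. R Package: pbdMPI
      library(pbdMPI)
      init()                                         # start MPI before using the communicator
      .rank <- comm.rank()                           # rank of this process
      filename <- sprintf("%d.csv", .rank)           # each rank reads its own file
      data <- read.csv(filename)
      target <- reduce(sum(data$value), op = "sum")  # global sum of the local sums
      finalize()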
  • 48. Parallelizing an Algorithm with pbdMPI ● Implement the functions required for optimization with pbdMPI – optim requires f and g (gradient of f) – nlminb requires f, g, and H (Hessian of f) – tron requires f, g, and Hs (the product of H and a given vector s)
  • 49. Some Tips for Optimization ● Take care with the stopping criteria – A relative threshold might be enough ● Save the coefficients during iteration and print the values of f and g, using the operator <<- – You can stop the iteration at any time – Monitor the convergence
  • 51. LinkedIn Way Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013 ● Too much data to fit on a single machine – Billions of observations, millions of features ● A Naive Approach – Partition the data and run logistic regression on each partition – Take the mean of the learned coefficients – Problem: Not guaranteed to converge to the model from a single machine! ● Alternating Direction Method of Multipliers (ADMM) – Boyd et al. 2011 (based on earlier work from the 70s)
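  • 52. ADMM ● For each node, the data and coefficients are different ● Problem: minimize $\sum_{k=1}^{K} f_k(w_k) + \frac{\lambda}{2} \|w\|_2^2$ subject to $w_k = w \ \forall k$ ● Updates: $w_k^{t+1} = \arg\min_{w_k} f_k(w_k) + \frac{\rho}{2} \|w_k - w^t + u_k^t\|_2^2$; $w^{t+1} = \arg\min_{w} \frac{\lambda}{2} \|w\|_2^2 + \frac{\rho}{2} \sum_{k=1}^{K} \|w_k^{t+1} - w + u_k^t\|_2^2$; $u_k^{t+1} = u_k^t + w_k^{t+1} - w^{t+1}$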
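To make the three ADMM updates on slide 52 concrete, here is a toy Python sketch in which each node holds a scalar quadratic loss $f_k(w) = \frac{1}{2}(w - a_k)^2$ instead of a logistic regression. This is an assumption made so that every argmin has a closed form; the values in `a` are hypothetical. In the real setting the local step is a regularized logistic regression on each partition.

```python
def consensus_admm(a, lam=1.0, rho=1.0, iterations=100):
    """Consensus ADMM where node k holds f_k(w) = 0.5 * (w - a[k])**2."""
    K = len(a)
    w = 0.0          # consensus coefficient
    u = [0.0] * K    # scaled dual variable of each node
    for _ in range(iterations):
        # Local step: argmin_{w_k} f_k(w_k) + rho/2 * (w_k - w + u_k)^2
        w_local = [(a[k] + rho * (w - u[k])) / (1.0 + rho) for k in range(K)]
        # Consensus step: argmin_w lam/2 * w^2 + rho/2 * sum_k (w_k - w + u_k)^2
        w = rho * sum(w_local[k] + u[k] for k in range(K)) / (lam + K * rho)
        # Dual step: u_k <- u_k + w_k - w
        u = [u[k] + w_local[k] - w for k in range(K)]
    return w

a = [1.0, 2.0, 3.0, 6.0]
w_admm = consensus_admm(a)
w_exact = sum(a) / (len(a) + 1.0)   # minimizer of sum_k f_k(w) + 0.5 * w^2 (lam = 1)
print(w_admm, w_exact)
```

Only the local coefficients and dual variables need to be exchanged in each round, while the data never leaves its node.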
  • 53. Update Coefficient Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013 (Diagram: BIG DATA is split into Partition 1, Partition 2, Partition 3, ..., Partition K; a Logistic Regression runs on each partition, followed by a Consensus Computation)
  • 54. Update Regularization Deepak Agarwal. Computational Advertising: The LinkedIn Way. CIKM 2013 (Diagram: BIG DATA is split into Partition 1, Partition 2, Partition 3, ..., Partition K; the Consensus Computation feeds back into the Logistic Regression on each partition)
  • 55. Our Remarks on ADMM ● ADMM reduces the communication between the nodes ● In our environment, the communication overhead is affordable – ADMM does not improve the performance of our system
  • 56. Online Algorithm ● Stochastic Gradient Descent (SGD): $f(w \mid y_t, x_t) = -y_t \log(\sigma(w^T x_t)) - (1 - y_t) \log(1 - \sigma(w^T x_t))$, $w_{t+1} = w_t - \eta \nabla f(w_t \mid y_t, x_t)$ ● Choose an initial value and a learning rate ● Randomly shuffle the instances in the training set ● Scan the data and update the coefficients – Repeat until an approximate minimum is obtained
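The SGD steps listed on slide 56 can be sketched in Python as follows. The toy data, the constant learning rate `eta`, the number of passes and the seed are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def neg_log_likelihood(w, X, y):
    total = 0.0
    for xt, yt in zip(X, y):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, xt)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard against log(0)
        total += -yt * math.log(p) - (1 - yt) * math.log(1.0 - p)
    return total

def sgd(X, y, eta=0.02, passes=1000, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(X[0])            # choose an initial value
    order = list(range(len(X)))
    for _ in range(passes):
        rng.shuffle(order)           # randomly shuffle the instances
        for t in order:              # scan the data, one instance per update
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, X[t])))
            for j, xj in enumerate(X[t]):
                w[j] -= eta * (p - y[t]) * xj   # gradient of the per-instance loss
    return w

X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [0, 1, 0, 1, 1, 0]
w = sgd(X, y)
print(w, neg_log_likelihood(w, X, y))
```

Unlike the batch update, each step touches a single instance, so the coefficients move long before a full pass over the data is complete.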
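  • 57. From SGD to Follow The Proximally Regularized Leader (FTPRL) H. Brendan McMahan. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization. AISTATS 2011 ● SGD as a minimization: $w_{t+1} = w_t - \eta_t \nabla f(w_t \mid y_t, x_t) = \arg\min_{w} \nabla f(w_t \mid y_t, x_t)^T w + \frac{1}{2 \eta_t} (w - w_t)^T (w - w_t)$ ● Let $g_t = \nabla f(w_t \mid y_t, x_t)$ and $g_{1:t} = \sum_{i=1}^{t} \nabla f(w_i \mid y_i, x_i)$ ● FTPRL: $w_{t+1} = \arg\min_{w} g_{1:t}^T w + t \lambda_1 \|w\|_1 + \frac{\lambda_2}{2} \sum_{i=1}^{t} \|w - w_i\|_2^2$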
  • 58. Regret of SGD H. Brendan McMahan and Matthew Streeter. Adaptive Bound Optimization for Online Convex Optimization. COLT 2010 ● $\text{Regret} := \sum_{t=1}^{T} f_t(w_t) - \min_{w} \sum_{t=1}^{T} f_t(w)$ ● A global learning rate achieves the regret bound $O(D M \sqrt{T})$ ● $D$ is the L2 diameter of the feasible set ● $M$ is the L2 bound of $g$
  • 59. Regret of SGD H. Brendan McMahan and Matthew Streeter. Adaptive Bound Optimization for Online Convex Optimization. COLT 2010 ● Per-coordinate learning rate: $\eta_{t,i} = \frac{\alpha}{\beta + \sqrt{\sum_{s=1}^{t} g_{s,i}^2}}$ achieves the regret bound $O(\sqrt{T} \, n^{1 - \frac{\gamma}{2}})$ ● $n$ is the dimension of $w$ ● If $w \in [-0.5, 0.5]^n$, then $D = \sqrt{n}$ ● $P(x_{t,i} = 1) \sim i^{-\gamma}$ for some $\gamma \in [1, 2)$
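A Python sketch of SGD with the per-coordinate learning rate from slide 59, $\eta_{t,i} = \alpha / (\beta + \sqrt{\sum_s g_{s,i}^2})$. The toy data (an always-on intercept and one rare binary feature) and the values of `alpha` and `beta` are assumptions for illustration.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sgd_per_coordinate(X, y, alpha=0.5, beta=1.0, passes=50, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    w = [0.0] * n
    sum_sq = [0.0] * n                 # running sum of squared gradients per coordinate
    order = list(range(len(X)))
    for _ in range(passes):
        rng.shuffle(order)
        for t in order:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, X[t])))
            for j, xj in enumerate(X[t]):
                if xj == 0:
                    continue           # zero gradient: this coordinate keeps its rate
                g = (p - y[t]) * xj
                sum_sq[j] += g * g
                w[j] -= alpha / (beta + math.sqrt(sum_sq[j])) * g
    rates = [alpha / (beta + math.sqrt(s)) for s in sum_sq]
    return w, rates

# Coordinate 0 is an intercept (always 1); coordinate 1 is a rare feature.
X = [[1, 0]] * 8 + [[1, 1]] * 2
y = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
w, rates = sgd_per_coordinate(X, y)
print(w, rates)
```

The rare feature accumulates fewer squared gradients, so its learning rate decays more slowly than that of the frequent one. This is the behaviour that helps when some features occur often and others rarely.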
  • 60. Comparison of Learning Rate Schemes Xinran He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. ADKDD 2014.
  • 61. Google KDD 2013, FTPRL H. Brendan McMahan, et al. Ad Click Prediction: a View from the Trenches. KDD 2013.
  • 62. Some Remarks on FTPRL ● FTPRL is a general optimization framework. – We used it successfully to fit a neural network ● The per-coordinate learning rate greatly improves the convergence on our data – SGD also works with a per-coordinate learning rate ● The "proximal" part decreases the accuracy, but introduces sparsity
  • 63. Implementation of FTPRL in R ● I am not aware of any implementation of online optimization in R ● The algorithm is simple. Just write it with a for loop. ● The overhead of a loop is small in C/C++ compared to R ● I implemented the algorithm in https://github.com/wush978/BridgewellML/tree/r-pkg – Call for users – Contact me if you want to try it
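As slide 63 says, the algorithm fits in a simple loop. The Python sketch below follows the per-coordinate FTRL-Proximal update for logistic regression as described in the KDD 2013 paper cited on slide 61, assuming binary features given as lists of active indices. It is an illustration only, not the BridgewellML implementation. The class name and the hyperparameter values are hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

class FTRLProximal:
    def __init__(self, n, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n    # adjusted sum of gradients per coordinate
        self.n = [0.0] * n    # sum of squared gradients per coordinate

    def weight(self, i):
        """Closed-form coefficient; exactly zero while |z_i| <= l1 (sparsity)."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        scale = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2
        return -(z - sign * self.l1) / scale

    def predict(self, active):
        """P(Click | x), with x given as distinct indices of its 1-entries."""
        return sigmoid(sum(self.weight(i) for i in active))

    def update(self, active, y):
        p = self.predict(active)
        g = p - y             # gradient of the log loss for each active feature
        for i in active:
            w_i = self.weight(i)
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w_i
            self.n[i] += g * g
        return p

dense = FTRLProximal(n=2, l1=0.0)
for _ in range(50):
    dense.update([0], 1)      # feature 0 is always followed by a click

sparse = FTRLProximal(n=2, l1=10.0)
for _ in range(5):
    sparse.update([0], 1)     # too little evidence to overcome the L1 threshold

print(dense.predict([0]), sparse.weight(0))
```

Coefficients are never stored directly. They are recomputed from `z` and `n` on demand, which is what lets the L1 threshold keep rarely informative coordinates at exactly zero.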
  • 65. Batch vs. Online Olivier Chapelle, et al. Simple and scalable response prediction for display advertising. ● Batch Algorithm – Optimizes the likelihood function to a high accuracy once it is in a good neighborhood of the optimal solution. – Quite slow in reaching the solution – Straightforward to generalize batch learning to a distributed environment ● Online Algorithm (mini-batch) – Optimizes the likelihood to a rough precision quite fast – A handful of passes over the data. – Tricky to parallelize
  • 66. Criteo Inc. Hybrid of Online and Batch ● On each node, make one online pass over its local data using adaptive gradient updates. ● Average these local weights and use them as the initial value for L-BFGS.
  • 67. Facebook Xinran He, et al. Practical Lessons from Predicting Clicks on Ads at Facebook. ADKDD 2014. ● Decision Tree (Batch) for Feature Transforms ● Logistic Regression (Online)
  • 68. Data Size and Accuracy
  • 72. Improving New Models New Algorithms New Features Experiments Analysis
  • 73. Display Advertising Challenge ● https://www.kaggle.com/c/criteo-display-ad-challenge ● 7 * 10^7 instances ● 13 integer features and 26 categorical features with about 3 * 10^7 levels ● We placed 9th out of 718 teams – We fit a neural network (2-layer logistic regression) to the data with FTPRL and dropout
  • 74. Dropout in SGD Geoffrey E. Hinton, et al. Improving neural networks by preventing co-adaptation of feature detectors. CoRR 2012
  • 75. Tools for Large-scale Model Fitting ● Almost all of the top 10 competitors implemented their algorithms themselves – There is no dominant tool for large-scale model fitting ● The winner used only 20 GB of memory. See https://github.com/guestwalk/kaggle-2014-criteo ● For a single machine, there are some good machine learning libraries – LIBLINEAR for linear models (a student from the lab placed 1st) – xgboost for gradient boosted regression trees (its author placed 12th) – Vowpal Wabbit
  • 76. Thanks for listening