All the data and still not enough!
Claudia Perlich Chief Scientist
@claudia_perlich
Predictive Modeling:
Algorithms that Learn Functions
Income    Age   Buy
123,000   30    yes
51,100    40    yes
68,000    55    no
74,000    46    no
23,000    47    yes
100,000   49    no
Data for Predictive Modeling (Target / Examples / Features)
Target (?): yes, yes, no, no, yes, no
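To make "algorithms that learn functions" concrete, here is a minimal sketch on the six rows above. It uses a decision stump (a one-split decision tree), chosen only because it fits in a few lines; the slides do not name a learning algorithm, and `fit_stump` and `predict` are hypothetical helper names.

```python
# A minimal "algorithm that learns a function": a decision stump that picks the
# single feature and threshold that best separate Buy = yes from Buy = no.
from typing import List, Tuple

# Toy table from the slide: (Income, Age) -> Buy
FEATURES = ["Income", "Age"]
X: List[Tuple[float, float]] = [
    (123_000, 30), (51_100, 40), (68_000, 55),
    (74_000, 46), (23_000, 47), (100_000, 49),
]
y: List[bool] = [True, True, False, False, True, False]


def fit_stump(X, y):
    """Return (errors, feature_index, threshold, yes_if_below) with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for yes_if_below in (True, False):
                preds = [(row[j] <= threshold) == yes_if_below for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, threshold, yes_if_below)
    return best


def predict(stump, row) -> bool:
    """Apply the learned rule to a new (Income, Age) example."""
    _, j, threshold, yes_if_below = stump
    return (row[j] <= threshold) == yes_if_below


stump = fit_stump(X, y)
errors, j, threshold, yes_if_below = stump
print(f"rule: {FEATURES[j]} <= {threshold} -> {'yes' if yes_if_below else 'no'}; "
      f"training errors: {errors}")
```

On this table no single threshold separates yes from no, so the stump settles for one training error; with only six examples, the "large enough" rule is clearly not met.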
Rules for Predictive Modeling (Target / Examples / Features)
Data should be:
- Large enough
- Independent and identically distributed (IID)
Paradox of Big Data:
“You never have the data you want”
The art of making do with second best
IBM: Sales Force Optimization
Wallet is NEVER observed
[Figure labels: Company Revenue (D&B); Wallet/Opportunity; IBM Sales to this Company]
We observe IBM sales to this company in the data, but we do not observe its wallet.
How can we make this a predictive modeling problem?
Data for Wallet Estimation? (Target / Examples / Features)
Wallet: 10, 5, 31, 17, 39, 4
REALISTIC Wallets as quantiles
- Motivation
- Imagine 100 identical firms with identical IT needs
- Consider the distribution of IBM sales to these firms
- The bottom firms should spend as much as the top ones
- Define wallet as a high percentile of spending, conditional on the customer attributes
[Figure labels: Frequency, IBM Sales, Wallet Estimate, Revenue]
Data for Wallet Estimation (Target / Examples / Features)
Target: 10, 5, 31, 17, 39, 4
Quantile Regression optimizing weighted absolute loss
[Figure, shown in several build steps: IBM Revenue vs. Firm/Company Sales for two companies, C1 and C2, with the Opportunity for C1 and the Opportunity for C2 marked]
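The weighted absolute loss is usually called the quantile or pinball loss: errors above the estimate cost τ per unit and errors below it cost 1 − τ. As a minimal sketch, the code fits only a constant, which matches the "100 identical firms" thought experiment: when all attributes are identical, the conditional quantile reduces to an ordinary quantile of the firms' sales. It is not the regression model used in the IBM work, and the six sales values are reused from the wallet slides purely for illustration.

```python
# Quantile ("pinball") loss: absolute error weighted by tau above the estimate
# and by (1 - tau) below it. Minimizing it yields the tau-quantile, not the mean.
from typing import Sequence


def pinball_loss(y: Sequence[float], q: float, tau: float) -> float:
    """Total weighted absolute loss of predicting the constant q for targets y."""
    total = 0.0
    for value in y:
        if value >= q:
            total += tau * (value - q)        # under-prediction costs tau per unit
        else:
            total += (1 - tau) * (q - value)  # over-prediction costs (1 - tau) per unit
    return total


def best_constant(y: Sequence[float], tau: float) -> float:
    """Constant minimizing the pinball loss; the loss is piecewise linear with
    breakpoints at the data, so an optimum always lies at one of the data points."""
    return min(y, key=lambda q: pinball_loss(y, q, tau))


# Illustrative spending values (reusing the numbers from the wallet slides).
sales = [10, 5, 31, 17, 39, 4]
print(best_constant(sales, tau=0.5))  # a median
print(best_constant(sales, tau=0.8))  # a high percentile: the "wallet"
```

Replacing the constant with a function of firm sales, q(x) = a + b·x, and minimizing the same loss over a and b gives linear quantile regression, which is commonly solved as a linear program.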
Medical Diagnosis: Breast Cancer
© IBM Corporation 2008
Siemens: Computer-Aided Detection of Breast Cancer in Mammograms
1712 patients, 6816 images, 105,000 candidates
Each candidate is described by an image feature vector [x1, x2, …, x117]; label: Malignant?
(MLO and CC views: MLO, CC, MLO, CC)
Siemens Medical: fMRI breast cancer data
- 245 patients: 36% cancer
- 414 patients: 1% cancer
- 1027 patients: 0% cancer
- 18 patients: 85% cancer
[Figure: model score vs. log of patient ID; every point is a candidate]
In essence, the most predictive variable is the patient ID.
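A back-of-the-envelope check shows why the source can dominate. The sketch scores every patient by nothing more than the cancer rate of its source, using the four group sizes and rates above, and computes the AUC of that score. It treats each patient as one example (the slide's plot is per candidate) and keeps fractional counts because the percentages are rounded; it is an illustration, not the original analysis.

```python
# How predictive is the data source alone? Score every patient by the cancer
# rate of the source it came from and compute the AUC of that score.
from typing import List, Tuple

# (patients, cancer rate) per source, as listed on the slide.
SOURCES: List[Tuple[int, float]] = [(245, 0.36), (414, 0.01), (1027, 0.00), (18, 0.85)]


def source_only_auc(sources: List[Tuple[int, float]]) -> float:
    """AUC of a score equal to the per-source rate (tied scores count one half).

    Positives and negatives are kept as fractional counts (n * rate) instead of
    being rounded to whole patients.
    """
    merged = {}  # rate -> (positives, negatives); equal rates form one tied group
    for n, rate in sources:
        pos, neg = merged.get(rate, (0.0, 0.0))
        merged[rate] = (pos + n * rate, neg + n * (1 - rate))
    total_pos = sum(p for p, _ in merged.values())
    total_neg = sum(q for _, q in merged.values())
    wins, neg_below = 0.0, 0.0  # neg_below: negatives with a strictly lower score
    for rate in sorted(merged):
        pos, neg = merged[rate]
        wins += pos * (neg_below + 0.5 * neg)
        neg_below += neg
    return wins / (total_pos * total_neg)


print(f"AUC using only the source: {source_only_auc(SOURCES):.3f}")
```

The result comes out at roughly 0.95 without looking at a single image, which is why identifiers such as patient ID, and anything correlated with the data source, deserve suspicion when they turn out to be highly predictive.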
Data for Diagnosis from Multiple Sources (Target / Examples / Features)
Cancer: yes, no, no, no, no, no
Modeling the Sources… (Target / Examples / Features)
Source  Cancer
1       yes
2       no
1       no
1       no
4       no
3       no
Digital Advertising
Online Display Advertising
Do people buy stuff after seeing an ad?
Data collection for post-view purchase conversion
[Figure: a cohort of random prospects tracked over time; outcome: ?]
Data for Advertising (Target / Examples / Features)
PV (post-view) Buy: no, no, no, no, yes, yes
Multi-Armed Bandit:
Exploration vs. exploitation
- Show some random ads to learn a good model
- Tradeoff between learning and using
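Epsilon-greedy is one of the simplest ways to trade exploration against exploitation; the slides do not say which bandit strategy was used. In this sketch each arm is an ad and a reward is a conversion; the class name, the conversion rates, and the simulation loop are all hypothetical.

```python
# Epsilon-greedy: with probability epsilon show a random ad (explore),
# otherwise show the ad with the best observed conversion rate so far (exploit).
import random
from typing import List


class EpsilonGreedy:
    def __init__(self, n_ads: int, epsilon: float, seed: int = 0):
        self.epsilon = epsilon
        self.shows: List[int] = [0] * n_ads
        self.conversions: List[int] = [0] * n_ads
        self.rng = random.Random(seed)

    def rate(self, ad: int) -> float:
        """Observed conversion rate; unseen ads count as 0."""
        return self.conversions[ad] / self.shows[ad] if self.shows[ad] else 0.0

    def choose(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))  # explore: learn about any ad
        return max(range(len(self.shows)), key=self.rate)  # exploit: best estimate

    def update(self, ad: int, converted: bool) -> None:
        self.shows[ad] += 1
        self.conversions[ad] += int(converted)


# Simulated ads with hypothetical true conversion rates (not from the talk).
true_rates = [0.01, 0.03, 0.02]
bandit = EpsilonGreedy(n_ads=3, epsilon=0.1, seed=42)
env = random.Random(7)
for _ in range(20_000):
    ad = bandit.choose()
    bandit.update(ad, env.random() < true_rates[ad])
print("impressions per ad:", bandit.shows)
```

Raising `epsilon` buys a better estimate of every ad's conversion rate at the price of more impressions spent on ads that already look worse.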
Size of the Training Sample? (Target / Examples / Features)
Buy: no, no, no, no, yes, yes
Very few luxury cars are bought online
Maserati: $128,000
Reality of Online Purchases (Target / Examples / Features)
Buy: no, no, no, no, no, yes
Online Display Advertising
Proxy for purchase? How about click?
Click?: yes, yes, no, no, yes, no
Optimizing Clicks in Advertising?
Click Optimization: Fumbling in the Dark
[Figure: Top 10 Apps by CTR]
How Big Data and Optimization are killing Metrics
- 90% of clicks are ‘accidental/non-intentional’
- 10% are meaningful, and changes in them can be measured
- Optimization can find structure in the other 90%
- You will end up with only non-intentional …
Online Display Advertising
Who cares about the ad anyway?
Predict other indicators: search or brand site visit/schedule test drive (Target / Examples / Features)
Site Visit: no, no, no, yes, yes, yes
Advertising Fraud
Is there really a person on the other end wanting to see the site?
Data for Fraud Detection (Target / Examples / Features)
Human?: yes, no, no, yes, yes, no
Telling the difference between an algorithm and a human
Turing test, CAPTCHA
Bot traffic networks
Online Display Advertising
Who should you really advertise to???
Data for Advertising Impact (Target / Examples / Features)
Impact: 1, 0.3, 0.5, 0, 0, 0.1
Alternative Histories (Counterfactual)
Fundamentally Impossible! (Target / Examples / Features)
Impact: 1, 0.3, 0.5, 0, 0, 0.1
Build two separate models and calculate impact as the difference
Examples 1 (seen ad), Site Visit: yes, no, no, yes, no, no
Examples 2 (not seen ad), Site Visit: yes, no, no, yes, no, no
Expected Impact: p(SV | ad) - p(SV | no ad)
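A minimal sketch of the two-model approach, assuming each "model" is simply the observed site-visit rate within a segment (a stand-in for any probabilistic classifier) and that `segment` is a hypothetical feature. Model 1 is trained only on people who saw the ad, model 2 only on people who did not, and the impact estimate is the difference of their predictions.

```python
# Two-model estimate of ad impact per segment:
#   impact(segment) = p(SV | ad, segment) - p(SV | no ad, segment)
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, bool]  # (segment, site_visit); "segment" is a hypothetical feature


def fit_rate_model(rows: List[Row]) -> Dict[str, float]:
    """A deliberately simple 'model': the observed site-visit rate per segment."""
    visits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for segment, visited in rows:
        totals[segment] += 1
        visits[segment] += int(visited)
    return {s: visits[s] / totals[s] for s in totals}


def impact_by_segment(seen_ad: List[Row], no_ad: List[Row]) -> Dict[str, float]:
    """Expected impact for every segment present in both groups."""
    model_ad = fit_rate_model(seen_ad)  # model 1: trained on Examples 1 (seen ad)
    model_no = fit_rate_model(no_ad)    # model 2: trained on Examples 2 (not seen ad)
    return {s: model_ad[s] - model_no[s] for s in model_ad.keys() & model_no.keys()}


# Made-up illustrative data, not from the talk.
seen_ad = ([("sports", True)] * 3 + [("sports", False)] * 7
           + [("news", True)] * 1 + [("news", False)] * 9)
no_ad = ([("sports", True)] * 1 + [("sports", False)] * 9
         + [("news", True)] * 1 + [("news", False)] * 9)
print(impact_by_segment(seen_ad, no_ad))
```

No one is ever observed both with and without the ad, so the difference of two predictions stands in for the impossible counterfactual; this is why the approach needs the negative (wrong ad) and positive (A/B) tests on the following slide.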
Use predictive models to measure impact
- Negative Test: wrong ad
- Positive Test: A/B comparison
Relationship of organic conversion rate and causal impact
[Figure: additive causal impact (y-axis, -0.001 to 0.006) vs. organic conversion propensity (x-axis, 0.40% to 1.40%)]
Audiences in Video Advertising
Pleasing the advertising oracle…
- Audience reports from matched populations in Facebook
- 68% of the ads were shown to females
- Makeup for 32% of ads
The Oracle
Data for Audience Optimization (Target / Examples / Features)
Gender: male, female, female, male, male, female
Weighted Logistic Regression on aggregated data (Target / Examples / Features)
Weight  Gender
0.32    male
0.68    female
0.32    male
0.68    female
0.73    male
0.27    female
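One way to implement this: when the oracle reports only that, for example, 68% of an audience was female, each example is entered twice, once as female with weight 0.68 and once as male with weight 0.32, and a logistic regression is fitted with those sample weights. The sketch uses plain gradient descent; the bias-plus-placement features and all function names are hypothetical, since the slides do not list the features.

```python
# Weighted logistic regression when only aggregate labels are known: each example
# is duplicated once per label, weighted by the reported share of that label.
import math
from typing import List, Tuple


def expand(examples: List[Tuple[List[float], float]]):
    """(features, female_share) -> weighted rows (features, y, weight)."""
    rows = []
    for x, female_share in examples:
        rows.append((x, 1.0, female_share))        # y = 1: female
        rows.append((x, 0.0, 1.0 - female_share))  # y = 0: male
    return rows


def predict_female(w: List[float], x: List[float]) -> float:
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))


def fit_weighted_logreg(rows, lr: float = 1.0, steps: int = 5000) -> List[float]:
    """Gradient descent on the weighted mean log-loss."""
    dim = len(rows[0][0])
    w = [0.0] * dim
    total_weight = sum(weight for _, _, weight in rows)
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y, weight in rows:
            p = predict_female(w, x)
            for k in range(dim):
                grad[k] += weight * (p - y) * x[k]
        w = [wi - lr * g / total_weight for wi, g in zip(w, grad)]
    return w


# Features: [bias, placement_b]; "placement_b" is a hypothetical binary feature.
examples = [([1.0, 0.0], 0.68), ([1.0, 0.0], 0.68), ([1.0, 1.0], 0.27)]
w = fit_weighted_logreg(expand(examples))
print(round(predict_female(w, [1.0, 0.0]), 3), round(predict_female(w, [1.0, 1.0]), 3))
```

With a bias and one binary feature the model can represent each group exactly, so the fitted probabilities land close to the reported shares of 0.68 and 0.27; with richer features, the same weighting lets a model be trained even though no individual gender label was ever observed.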
Hyperlocal Targeting?
- Foursquare locations: very noisy…
Data for Location Reliability in Auction (Target / Examples / Features)
Reliable?: yes, no, no, yes, yes, no
30% of smartphone users travel faster than the speed of sound…
Catalan traditions pop up everywhere…
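A simple plausibility filter compares consecutive location reports from the same device: if the implied speed exceeds the speed of sound, at least one of the reports must be wrong. This is a minimal sketch; the `(timestamp, lat, lon)` ping format and the 343 m/s threshold (speed of sound in air at about 20 °C) are assumptions, not details from the talk.

```python
# Flag location reports that imply impossible travel speeds between two
# consecutive pings from the same device.
import math

EARTH_RADIUS_M = 6_371_000.0
SPEED_OF_SOUND_M_S = 343.0  # in dry air at about 20 °C


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def implausible(ping1, ping2, max_speed: float = SPEED_OF_SOUND_M_S) -> bool:
    """Each ping is (unix_seconds, lat, lon). True if the implied speed exceeds max_speed."""
    (t1, lat1, lon1), (t2, lat2, lon2) = ping1, ping2
    dt = abs(t2 - t1)
    dist = haversine_m(lat1, lon1, lat2, lon2)
    if dt == 0:
        return dist > 0  # two different places at the same instant
    return dist / dt > max_speed


print(implausible((0, 41.39, 2.17), (60, 40.42, -3.70)))  # Barcelona to Madrid in one minute -> True
```

Pings flagged this way could serve as noisy negative labels for the Reliable? target.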
Data for Location Reliability in Auction (Target / Examples / Features)
Reliable?: maybe, no, no, maybe, maybe, no
Paradox of Big Data:
“You never have the data you want”
The art of making do with second best
All a matter of how creative you are at cheating…

Claudia Perlich, Chief Scientist, Dstillery at MLconf NYC