The Big Data Challenges of
Computational Market Research
Frank Smadja
frank.smadja@toluna.com (@FrankieMbaye)
EVP Engineering
Toluna
April 2014
Table of Contents
1. What is a Market Research Study?
2. The Main Challenge: Targeting
3. Machine Learning Problem and Model
4. Some Experiments
5. Current and Future Work
What is a Market Research Study?
Market Research Goal:
Answering Questions for Brands
Customer/Employee Satisfaction:
• Are my customers happy?
• What can I do better for them?
• Am I getting better or worse?
Concept testing:
• Would dog owners buy my organic dog food?
• What should be my target market?
Ad testing:
• Is my advertising campaign effective?
Brand positioning:
• How is my brand doing compared to the competition?
• What are my perceived strong features?
• Where should I invest more?
And many more types of questions
Output Example: ‘Positioning survey’ for Hilton Garden Inn.
Example: Positioning survey for Beyoncé
Market Research Main Challenge: Targeting
Select a segment of respondents (a sample) that is:
• Relevant to the question (dog owners who have one big dog and one small dog, smokers who are trying to stop, etc.)
• Representative and balanced (not biased).
The more restrictive the targeting, the more expensive the study.
The Targeting Pipeline and Incidence Rate
Targeting process, from Start to Complete: Demographics, then Behavioral, then Study.
• Demographics: select the right population based on simple demographic attributes: age, gender, region, ethnicity, income, etc. This is a fixed set of attributes known beforehand.
• Behavioral: further select based on behavioral and custom attributes: flies more than 5 times a year, uses aspirin on a daily basis, etc. These are free-style attributes, usually unknown.
Incidence Rate:
IR = Completes / Starts
Cost grows as IR falls: the lower the incidence rate, the more starts are needed for each complete.
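As a rough illustration of the incidence-rate definition, the sketch below computes IR from counts and estimates how many respondents must start a study to reach a target number of completes. The function names are hypothetical and the counts are made up for the example.

```python
import math


def incidence_rate(completes: int, starts: int) -> float:
    """IR = Completes / Starts."""
    if starts <= 0:
        raise ValueError("starts must be positive")
    return completes / starts


def starts_needed(target_completes: int, ir: float) -> int:
    """Expected number of starts required to collect target_completes."""
    if not 0 < ir <= 1:
        raise ValueError("ir must be in (0, 1]")
    return math.ceil(target_completes / ir)


# Made-up example: 250 completes out of 1,000 starts.
ir = incidence_rate(250, 1000)
print(ir, starts_needed(1000, ir))
```

Halving the incidence rate doubles the number of starts, which is why restrictive targeting drives cost up.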
Why is targeting hard?
Looking for 1,000 people in the UK who “smoke,” “tried to stop in the past,” “live around London,” and are “age 24-50.”
Data on the UK population:
• 18% of UK adults smoke
• 40% of smokers have tried to stop
• 15% of the population lives in the London area
• 30% are aged 24-50
Incidence rate:
0.18 × 0.40 × 0.15 × 0.30 ≈ 0.3%
Sample size: 1,000 / 0.003 ≈ 333,333
Funnel: UK > London > Adults > Smokers > Tried to stop
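The arithmetic on this slide can be reproduced with a short sketch. It multiplies the quoted rates, which implicitly assumes the four criteria are statistically independent; that assumption and the dictionary keys are illustrative, not stated in the deck.

```python
import math

# Rates quoted on the slide for the UK example.
uk_rates = {
    "smokes": 0.18,
    "tried_to_stop": 0.40,
    "london_area": 0.15,
    "age_24_50": 0.30,
}


def combined_incidence(rates) -> float:
    """Multiply per-criterion rates, assuming the criteria are independent."""
    return math.prod(rates)


ir = combined_incidence(uk_rates.values())
starts = math.ceil(1000 / ir)
print(f"incidence rate: {ir:.3%}")
print(f"starts for 1,000 completes: {starts:,}")
```

With the unrounded rate of 0.324% the estimate is about 309,000 starts; the slide's 333,333 comes from rounding the rate down to 0.3% first.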
State of the Art: Use Known Demographic Features
• Basic demographics are known: 100% incidence.
  o Age and London
• Smokers: 18%
• Tried to stop: 40%
Incidence rate:
1 × 0.18 × 0.40 = 7.2%
Sample size: 1,000 / 0.072 ≈ 13,900
Funnel: Adults in the London area > Smokers > Tried to stop
New Direction: Use Known Features and Predict Unknown Features
• Basic demographics are known: 100% incidence.
  o Age and London
• Suppose we could predict smokers with 85% accuracy, so that about 85% of the people predicted to be smokers actually smoke.
• Tried to stop is still unknown: 40%
Incidence rate:
1 × 0.85 × 0.40 = 34%
Sample size: 1,000 / 0.34 ≈ 2,900
Funnel: Adults in the London area who are predicted to be smokers > Smokers > Tried to stop
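The 34% figure treats "85% accuracy" as the share of predicted smokers who really smoke (the classifier's precision). The sketch below, which is an illustration rather than anything from the deck, shows how precision follows from Bayes' rule if 85% instead described sensitivity and specificity. The function names and that alternative reading are assumptions.

```python
def precision(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Share of predicted positives that are true positives (Bayes' rule)."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def incidence_after_prediction(prec: float, downstream_rate: float) -> float:
    """Incidence among predicted smokers when a later criterion is still unknown."""
    return prec * downstream_rate


# Slide's reading: 85% of predicted smokers smoke, 40% of smokers tried to stop.
slide_ir = incidence_after_prediction(0.85, 0.40)
print(f"slide reading: {slide_ir:.0%}")

# Alternative reading (assumption): 85% sensitivity and 85% specificity,
# combined with the 18% base rate of smokers quoted earlier.
p = precision(base_rate=0.18, sensitivity=0.85, specificity=0.85)
alt_ir = incidence_after_prediction(p, 0.40)
print(f"precision: {p:.1%}, incidence: {alt_ir:.1%}")
```

Under the alternative reading the precision is noticeably lower than 85% because non-smokers outnumber smokers, so the gain from prediction depends on which metric the 85% refers to.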
How to Predict Features?
The Space Model
A sparse matrix containing all the attributes (integer answers to questions) we have ever asked.
• Rows: users (User 1, User 2, User 3, User 4, ...), 10^9 users
• Columns: features, 10^7 features
  o Demographic attributes: Sex, Age, Region, etc.
  o Behavioral attributes: Smokes? (Yes / No), Shirt color (Red / Blue), etc.
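One minimal way to picture such a sparse users-by-features matrix is a dictionary of dictionaries that stores only the questions a user has actually answered. The class below is a hypothetical sketch for illustration; the integer codes for answers are assumptions, and a production system at this scale would need a distributed store.

```python
from collections import defaultdict


class SparseAnswers:
    """Sparse user x feature store: only answered questions take space."""

    def __init__(self):
        self._rows = defaultdict(dict)  # user_id -> {feature: int answer}

    def set(self, user_id, feature, answer: int):
        self._rows[user_id][feature] = answer

    def get(self, user_id, feature, default=None):
        return self._rows.get(user_id, {}).get(feature, default)

    def column(self, feature):
        """All known answers for one feature, keyed by user."""
        return {u: row[feature] for u, row in self._rows.items() if feature in row}

    def density(self, n_features: int) -> float:
        """Fraction of the full matrix that is actually filled."""
        filled = sum(len(row) for row in self._rows.values())
        return filled / (len(self._rows) * n_features) if self._rows else 0.0


m = SparseAnswers()
m.set("user1", "smokes", 1)        # assumed coding: 1 = yes, 0 = no
m.set("user1", "shirt_color", 0)   # assumed coding: 0 = red, 1 = blue
m.set("user2", "smokes", 0)
print(m.get("user2", "shirt_color"))  # None: this question was never asked
print(m.column("smokes"))
print(m.density(n_features=2))
```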
The Learning Task - The Model
Try to predict the answer to the “Smokes?” attribute based on other attributes.
Example attributes: Smokes? Dog owner? Jogger? Overweight?
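The deck does not name a specific model for this task. As one possible baseline, the sketch below uses a small naive Bayes classifier that predicts "smokes" from whichever other attributes a respondent has answered. The choice of naive Bayes, the function names, and the toy rows are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train(rows, target):
    """Collect the counts a naive Bayes classifier needs, skipping unlabeled rows."""
    class_counts = defaultdict(int)                      # y -> rows with that label
    answered = defaultdict(lambda: defaultdict(int))     # y -> feature -> rows answering it
    pair_counts = defaultdict(lambda: defaultdict(int))  # y -> (feature, value) -> rows
    for row in rows:
        if target not in row:
            continue
        y = row[target]
        class_counts[y] += 1
        for feat, val in row.items():
            if feat != target:
                answered[y][feat] += 1
                pair_counts[y][(feat, val)] += 1
    return class_counts, answered, pair_counts


def predict(model, row, target):
    """Return the most probable target value; unanswered features are ignored."""
    class_counts, answered, pair_counts = model
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)
        for feat, val in row.items():
            if feat != target:
                # Add-one smoothing, assuming binary (0/1) answers.
                score += math.log((pair_counts[y][(feat, val)] + 1) / (answered[y][feat] + 2))
        scores[y] = score
    return max(scores, key=scores.get)


# Made-up toy respondents; 1 = yes, 0 = no, missing key = not asked.
rows = [
    {"jogger": 1, "dog_owner": 1, "overweight": 0, "smokes": 0},
    {"jogger": 1, "dog_owner": 0, "overweight": 0, "smokes": 0},
    {"jogger": 0, "dog_owner": 1, "overweight": 1, "smokes": 1},
    {"jogger": 0, "overweight": 1, "smokes": 1},
    {"jogger": 1, "overweight": 0, "smokes": 0},
]
model = train(rows, "smokes")
print(predict(model, {"jogger": 0, "overweight": 1}, "smokes"))
print(predict(model, {"jogger": 1}, "smokes"))
```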
The Learning Task - Collaborative Filtering
User correlation or feature correlation
User correlation: high-level features [William Cohen]
• If Josie and Bob both have feature X, and Josie has feature Y, then Bob is more likely to have feature Y as well.
• Dog owners
• Political inclination, taste, lifestyle
Feature correlation:
• If Josie has feature X, Josie is more likely to also have feature Y.
• Joggers (yes) and smokers (no)
• Favorite sports and race/ethnicity
• Income level and education level
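The user-correlation idea can be sketched as a small neighborhood-based collaborative filter: find the users most similar to Bob on shared features and let them vote on the feature Bob has not answered. This is a generic textbook variant, not the deck's implementation; the +1/-1 answer coding, the function names, and the toy users are assumptions.

```python
import math


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity over the features both users have answered."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = math.sqrt(sum(a[f] ** 2 for f in shared))
    norm_b = math.sqrt(sum(b[f] ** 2 for f in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_user_based(users, target_user, feature, k=2):
    """Similarity-weighted vote of the k most similar users who answered `feature`."""
    me = {f: v for f, v in users[target_user].items() if f != feature}
    neighbours = []
    for name, answers in users.items():
        if name == target_user or feature not in answers:
            continue
        others = {f: v for f, v in answers.items() if f != feature}
        sim = cosine(me, others)
        if sim > 0:  # keep only positively correlated users
            neighbours.append((sim, answers[feature]))
    top = sorted(neighbours, reverse=True)[:k]
    score = sum(sim * value for sim, value in top)
    if score == 0:
        return None  # no usable neighbours, or a tie
    return 1 if score > 0 else -1


# Made-up answers coded as +1 = yes, -1 = no.
users = {
    "josie": {"dog_owner": 1, "jogger": 1, "smokes": -1},
    "bob": {"dog_owner": 1, "jogger": 1},
    "ann": {"dog_owner": -1, "jogger": -1, "smokes": 1},
}
print(predict_user_based(users, "bob", "smokes"))
```

Bob matches Josie on every shared feature, so her answer dominates the vote. A feature-correlation variant would transpose the roles and compare feature columns instead of user rows.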
Smaller Task: Complete missing data on a single survey for a single customer.
Example: On a specific survey, some respondents skip some of the questions on income, and others skip the income level question. Use answers provided by other respondents to impute the missing data.
Imputation: Complete missing data with substituted values, with more or less sophistication: mean, nearest neighbor, multiple imputation, etc. [Andridge & Little 2011], [Rubin 1987], ...
Implementation: IBM SPSS Missing Values module. Uses iterative Markov Chain Monte Carlo (MCMC) and multiple imputation.
Used by the US Census Bureau.
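To make the imputation options concrete, here is a deliberately simplified sketch: mean imputation, plus a random hot-deck version of multiple imputation that fills each gap with a randomly drawn observed answer and pools the estimates across several completed copies. This is not the MCMC procedure used by the SPSS module; the function names and the toy income bands are assumptions.

```python
import random
import statistics


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


def hot_deck_draws(values, m=5, seed=0):
    """Create m completed copies, filling each gap with a random observed value."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    return [
        [rng.choice(observed) if v is None else v for v in values]
        for _ in range(m)
    ]


def pooled_mean(completed_sets):
    """Average the per-copy means: the point-estimate step of multiple imputation."""
    return statistics.mean([statistics.mean(s) for s in completed_sets])


# Made-up income bands; None marks a skipped question.
income_band = [2, None, 3, 3, None, 1]
print(impute_mean(income_band))
copies = hot_deck_draws(income_band, m=5)
print(pooled_mean(copies))
```

Keeping several completed copies, rather than one, is what lets multiple imputation reflect the uncertainty introduced by the missing answers.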
First Experiments with Multiple Imputation
Some Results
Where it does not work:
• Too much missing data (over 10%)
• Too many possible answers (what are the names of your children? what is your home city? etc.)
• Not enough data overall (less than 1,000)
Examples of features that work well:
Dog owners, smokers, income level, age (3 bands), etc.
Accuracy: 85% in blind tests.
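The deck does not describe how the blind tests were run. One common way to set up such a test is to hide a fraction of the known answers, impute them, and count exact matches. The sketch below follows that pattern with a most-common-answer baseline; the function names, the 10% hide fraction, and the toy data are assumptions.

```python
import random
from collections import Counter


def blind_test_accuracy(answers, impute, hide_fraction=0.1, seed=0):
    """Hide a fraction of known answers, impute them, and score exact matches."""
    rng = random.Random(seed)
    known = [i for i, a in enumerate(answers) if a is not None]
    hidden = rng.sample(known, max(1, int(len(known) * hide_fraction)))
    masked = list(answers)
    for i in hidden:
        masked[i] = None
    completed = impute(masked)
    hits = sum(completed[i] == answers[i] for i in hidden)
    return hits / len(hidden)


def impute_mode(values):
    """Baseline imputer: fill every gap with the most common observed answer."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Made-up binary answers: 80 "no" (0) and 20 "yes" (1).
answers = [0] * 80 + [1] * 20
print(blind_test_accuracy(answers, impute_mode))
```

Any imputer with the same list-in, list-out shape can be passed in, which makes it easy to compare methods on the same hidden answers.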
Current Work
Currently working on the storage component on AWS, using HBase, Elasticsearch, and Hadoop.
Some queries:
• Find people who smoke, have a red shirt, and are between 22 and 34 years old.
• Compute and store the similarity or correlation between any pair of users.
• Compute and store the similarity between features.
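The shape of these queries can be shown with an in-memory stand-in: an inverted index from (feature, value) pairs to sets of users, intersected to answer the first query, and a Jaccard overlap as one possible feature-similarity measure. This sketch does not use the HBase or Elasticsearch APIs; every name and record in it is hypothetical.

```python
from collections import defaultdict


def build_index(people):
    """Inverted index: (feature, value) -> set of user ids."""
    index = defaultdict(set)
    for user_id, attrs in people.items():
        for feature, value in attrs.items():
            index[(feature, value)].add(user_id)
    return index


def find(index, people, exact, age_range=None):
    """Intersect the posting sets for exact matches, then apply the age range."""
    sets = [index.get(pair, set()) for pair in exact.items()]
    result = set.intersection(*sets) if sets else set(people)
    if age_range:
        low, high = age_range
        result = {u for u in result if low <= people[u].get("age", -1) <= high}
    return result


def jaccard(index, pair_a, pair_b):
    """Similarity of two feature values: overlap of the users who have them."""
    a, b = index.get(pair_a, set()), index.get(pair_b, set())
    return len(a & b) / len(a | b) if a | b else 0.0


# Made-up panel members.
people = {
    "u1": {"smokes": "yes", "shirt": "red", "age": 25},
    "u2": {"smokes": "yes", "shirt": "blue", "age": 30},
    "u3": {"smokes": "yes", "shirt": "red", "age": 40},
    "u4": {"smokes": "no", "shirt": "red", "age": 28},
}
index = build_index(people)
print(find(index, people, {"smokes": "yes", "shirt": "red"}, age_range=(22, 34)))
print(jaccard(index, ("smokes", "yes"), ("shirt", "red")))
```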
Future Work
• Define the model: binary features (smokes), integers (number of children, income), strings (city, diseases, etc.).
• Experiment on a large scale with collaborative filtering algorithms and others.
• Experiment with user-based and feature-based filtering (blend? Slope One?)
• Integrate this into the targeting methodology.
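Since Slope One is mentioned as a candidate, here is a minimal sketch of the weighted Slope One scheme applied to integer-valued features: it learns the average difference between each pair of features and uses those differences to estimate a missing answer. Applying it to survey answers rather than ratings is an assumption, as are the feature names and the toy values.

```python
from collections import defaultdict


def slope_one_fit(table):
    """Average difference diffs[j][i] between features j and i over users with both."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for answers in table.values():
        for j, vj in answers.items():
            for i, vi in answers.items():
                if i != j:
                    diffs[j][i] += vj - vi
                    counts[j][i] += 1
    for j in diffs:
        for i in diffs[j]:
            diffs[j][i] /= counts[j][i]
    return diffs, counts


def slope_one_predict(model, answers, target):
    """Weighted Slope One estimate of `target` from a user's known answers."""
    diffs, counts = model
    num = den = 0.0
    for i, vi in answers.items():
        n = counts.get(target, {}).get(i, 0)  # users who answered both features
        if i == target or n == 0:
            continue
        num += (diffs[target][i] + vi) * n
        den += n
    return num / den if den else None


# Made-up integer answers on a 1-5 scale.
table = {
    "john": {"income_band": 5, "education": 3, "travel": 2},
    "mark": {"income_band": 3, "education": 4},
    "lucy": {"education": 2, "travel": 5},
}
model = slope_one_fit(table)
estimate = slope_one_predict(model, table["lucy"], "income_band")
print(round(estimate, 2))
```

Slope One only needs pairwise averages and counts, which is part of its appeal for large, sparse data; whether it suits binary or string features is an open question the slide itself leaves as future work.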
Q&A
Suggestions?
Ideas?
Comments?
Questions?
