Beyond Collaborative Filtering: ML & Recommendations at Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola
My Background
● Machine Learning Engineer
● At Meetup since May 2012
● BS Computer Science
○ Information Retrieval
○ Data Mining
○ Math
■ Linear Algebra
■ Graph Theory
Meetup, what are you?
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and >125 million RSVPs by >17 million members
● ~3 million RSVPs in the last 30 days
○ >1/second
Tools at Meetup
● Hive - SQL on Hadoop
● Spark - Distributed Scala on Hadoop cluster
● Scala - Recommendations service
● R - Data analysis, Model building
● Python - Scripting, Data organizing
● Java - Backend of our web stack
“Everything is a recommendation”
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
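A minimal sketch of "users who like this also like this", assuming group membership is the signal. The toy data and the function names `co_occurrence` and `also_like` are illustrative, not taken from the talk.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_occurrence(memberships):
    """For each group, count how often every other group shares a member with it."""
    counts = defaultdict(Counter)
    for groups in memberships.values():
        for a, b in permutations(sorted(set(groups)), 2):
            counts[a][b] += 1
    return counts


def also_like(counts, group, k=3):
    """Return up to k groups most often joined alongside `group`."""
    return [g for g, _ in counts[group].most_common(k)]


# Hypothetical toy data: member id -> groups joined.
memberships = {
    "m1": ["hiking", "photography"],
    "m2": ["hiking", "photography", "scala"],
    "m3": ["hiking", "scala"],
    "m4": ["photography", "hiking"],
}
counts = co_occurrence(memberships)
print(also_like(counts, "hiking"))
```

Raw co-occurrence counts favor popular groups, which is one reason the weaknesses on the next slide matter.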
Weaknesses of CF
● Sparsity
● Cold Start
● Coverage
● Diversity
Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Groups/person
○ Membership: 0.001%
○ Compare to Netflix: 1%
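One way to read the two percentages is as the fraction of filled cells in the member-by-item matrix. That reading is an assumption, and the counts below are made up to reproduce the ratios, not Meetup or Netflix figures.

```python
def density(observed_pairs, n_users, n_items):
    """Fraction of the user-by-item matrix that has an observed interaction."""
    return observed_pairs / (n_users * n_items)


# Hypothetical sizes chosen only to match the percentages on the slide.
meetup_like = density(observed_pairs=2, n_users=1000, n_items=200)
netflix_like = density(observed_pairs=2000, n_users=1000, n_items=200)

print(f"{meetup_like:.3%} vs {netflix_like:.0%}")
print(f"roughly {netflix_like / meetup_like:.0f}x sparser")
```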
Cleaning data
● Schenectady
● Beverly Hills
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
Real data is gross
● Preprocessing is critical!
○ missing data
○ outliers
○ log scale
○ bucketing
○ sampling bias
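A small sketch of four of the steps above (missing data, outliers, log scale, bucketing) applied to a single numeric feature. The cap of 10 and the bucket edges are assumed values for illustration; sampling bias is not covered here.

```python
import bisect
import math


def clean_guest_count(raw, cap=10):
    """Treat a missing value as 0 and clip outliers to `cap` (assumed threshold)."""
    if raw is None:
        return 0
    return max(0, min(raw, cap))


def log_scale(x):
    """Compress a long-tailed non-negative value; log1p keeps 0 at 0."""
    return math.log1p(x)


def bucket(x, edges=(0, 1, 5, 20)):
    """Index of the last bucket edge that is <= x (0 for anything below the first edge)."""
    return max(0, bisect.bisect_right(edges, x) - 1)


raw_guests = [None, 0, 3, 100]  # 100 mimics a fake "+100 guests" RSVP
cleaned = [clean_guest_count(g) for g in raw_guests]
print(cleaned, [bucket(g) for g in cleaned], [round(log_scale(g), 2) for g in cleaned])
```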
Supervised Learning/Classification
● “Inferring a function from labeled training data”
● Joined Meetup group/Didn’t join Meetup group
Ranking
● Membership << expected error rate
○ Sample to 50/50 join/no-join
● Model output label no longer explicitly true
● Use a classifier that gives you a useful output
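The 50/50 sampling step can be sketched as below, assuming each training example is a dict with a boolean `joined` field (a hypothetical schema). After rebalancing, predicted probabilities no longer match the true join rate, but their ordering can still be used to rank candidates.

```python
import random


def balance(examples, seed=0):
    """Downsample the larger class so joins and non-joins are 50/50."""
    rng = random.Random(seed)
    pos = [e for e in examples if e["joined"]]
    neg = [e for e in examples if not e["joined"]]
    small, large = sorted((pos, neg), key=len)
    balanced = small + rng.sample(large, len(small))
    rng.shuffle(balanced)
    return balanced


# Hypothetical data: 5 joins among 1000 member/group pairs.
examples = [{"id": i, "joined": i < 5} for i in range(1000)]
balanced = balance(examples)
print(len(balanced), sum(e["joined"] for e in balanced))
```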
Topic Match
State Match
Ensemble Learning
“... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”
Ensemble Learning
● Collaborative Filtering on Topics
● Other simple features
Logistic Regression Output
● RsvpScore 0.02
● FbFriends 2.02
● 2ndFbFriends 0.09
● AgeUnmatch -2.40
● GenUnmatch -3.37
● Distance -0.04
● StateMatch 0.54
● CountyMatch 0.41
● ZipScore 0.06
● TopicScore 4.14
● ExtendedTS 0.47
● RelatedTS 0.66
● FbLikeTS 0.78
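To show how weights like these turn into a ranking score, here is a sketch that plugs them into the logistic function. The intercept (not shown on the slide) is assumed to be 0, and the feature values and scales for the two candidate groups are invented, so the scores are only illustrative.

```python
import math

# Weights copied from the slide above.
WEIGHTS = {
    "RsvpScore": 0.02, "FbFriends": 2.02, "2ndFbFriends": 0.09,
    "AgeUnmatch": -2.40, "GenUnmatch": -3.37, "Distance": -0.04,
    "StateMatch": 0.54, "CountyMatch": 0.41, "ZipScore": 0.06,
    "TopicScore": 4.14, "ExtendedTS": 0.47, "RelatedTS": 0.66,
    "FbLikeTS": 0.78,
}


def score(features, intercept=0.0):
    """Logistic score for one member/group pair; intercept is an assumption."""
    z = intercept + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical candidates; feature scales are guesses.
good_match = {"TopicScore": 0.8, "StateMatch": 1, "Distance": 5}
poor_match = {"TopicScore": 0.1, "StateMatch": 1, "Distance": 5, "AgeUnmatch": 1}

ranked = sorted([("good", good_match), ("poor", poor_match)],
                key=lambda pair: score(pair[1]), reverse=True)
print([name for name, _ in ranked])
```

Because the feature scales are unknown, the magnitudes of the weights cannot be compared directly with one another.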
Facebook Likes
● Lots of information, but how to use?
● Map to topics, let training the model take care of the rest!
● Bonus: Recommendations server knows topics, generated topics can be passed in by request
Mapping FB Likes to Meetup Topics
● Text based?
○ Go(game) vs Go(lang)?
○ Burton?
● Data approach!
○ Grab most popular topics across all members with the same like
Normalization
● Top topics for Burton-Likers
○ Meeting New People, Coffee, bla bla
○ Most popular still dominates
● Normalize based on expected topic occurrence in sample
Normalization
● For members with a given Like
● Compare percent with each topic to expected among total population
● Burton:
○ 20% “Meeting New People”
○ 9% “Snowboarding”
● Total population
○ 20% “Meeting New People”
○ 2% “Snowboarding”
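Using the percentages on this slide, the comparison can be worked through as a simple ratio of topic share among Burton likers to topic share in the total population. Taking a ratio is an assumption; the slide only says the two are compared.

```python
def topic_lift(like_share, population_share):
    """Ratio of a topic's share among likers to its share in the whole population."""
    return {
        topic: like_share[topic] / population_share[topic]
        for topic in like_share
        if population_share.get(topic)
    }


# Shares from the slide.
burton = {"Meeting New People": 0.20, "Snowboarding": 0.09}
population = {"Meeting New People": 0.20, "Snowboarding": 0.02}

lift = topic_lift(burton, population)
ranked = sorted(lift, key=lift.get, reverse=True)
print(ranked)  # Snowboarding (about 4.5x) ahead of Meeting New People (1.0x)
```

A ratio near 1 means the topic is no more common among Burton likers than among everyone else, so "Meeting New People" drops out despite being the most popular topic.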
Processing
● Load FB Like connections, topics into Hadoop
● Process with Hive to generate top topics for each like
● Join with member likes to generate top topics per member
● Add feature to model using FB-Like-Generated-Topics crossover with groups...
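The join step can be sketched in plain Python as a stand-in for the Hive job, which the slides do not show. The like names, topic weights, and the choice to sum weights across a member's likes are all assumptions for illustration.

```python
from collections import Counter


def topics_per_member(member_likes, like_topics, k=2):
    """Join each member's likes to per-like topic weights and keep the top k topics."""
    result = {}
    for member, likes in member_likes.items():
        totals = Counter()
        for like in likes:
            for topic, weight in like_topics.get(like, {}).items():
                totals[topic] += weight  # assumed aggregation: sum of weights
        result[member] = [topic for topic, _ in totals.most_common(k)]
    return result


# Hypothetical inputs: like -> normalized topic weights, member -> likes.
like_topics = {
    "Burton": {"Snowboarding": 4.5, "Skiing": 3.0},
    "Camping": {"Hiking": 4.0, "Skiing": 2.0},
}
member_likes = {"m1": ["Burton", "Camping"], "m2": ["UnmappedLike"]}

generated = topics_per_member(member_likes, like_topics)
print(generated)
```

Members whose likes map to no topics get an empty list, so the resulting feature would simply be absent for them.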
Results
● Positive weight
○ Very good sign
○ Captured information about member identity, not just behavior
● Deploy/Split test
○ 1.5% lift in conversion overall
○ (Only have Facebook data for ~10% of members)
Thanks!
Smart people come work with me.
http://www.meetup.com/jobs/
