Likes and Locations: Adventure in Social Data Mining
  Gene Chuang – Exec Dir of Social Eng, ATTi
  Masahji Stewart – Founder, Synctree
  Q CTO Dinner, 4/6/11 – Lawry’s Beverly Hills, CA
Dedication
Background
Social Local Mobile Loco
Why Mine Social and Local Data?
  Signals to improve user experience
  Timely and “Placely” Engagement
  Provide value – save time, save money
  Opt In, Privacy
YP.com Infrastructure
  Ruby on Rails for Web, Login and API
  Solr/Lucene for Search
  Hadoop for Data pipeline
  Hive for Ad Hoc queries on Hadoop
  Ruby ETL scripts
OAuth 2
  OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead they hand out tokens.
  Think Valet Key
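The "hand out tokens, not passwords" idea starts with sending the user to the provider to approve access. The sketch below builds such an authorization URL in Ruby. The parameter names follow the OAuth 2 authorization-code flow; the endpoint, client id, scope and state values are hypothetical placeholders, not anything taken from the YP.com setup.

```ruby
require "uri"

# Builds the URL a user is redirected to in order to grant access.
# The endpoint and all values passed in below are hypothetical.
def authorization_url(endpoint, client_id:, redirect_uri:, scope:, state:)
  query = URI.encode_www_form(
    response_type: "code",      # ask for an authorization code, never a password
    client_id: client_id,       # identifies the requesting application
    redirect_uri: redirect_uri, # where the provider sends the user back
    scope: scope,               # limits what the eventual token may access
    state: state                # opaque value echoed back, guards against CSRF
  )
  "#{endpoint}?#{query}"
end

puts authorization_url(
  "https://provider.example/oauth/authorize",
  client_id: "example-client-id",
  redirect_uri: "https://app.example/callback",
  scope: "likes",
  state: "abc123"
)
```

The application later exchanges the returned code for an access token, which plays the role of the valet key: it opens only what the scope allows and can be revoked without changing the password.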
YP.com Login/Registration
Login Layer A
OAuth 2 Dance
Semi-Social Search
Social Mining - Extract
  Extract scripts pull data out of a database (like Oracle), Hive, files, Facebook, or any other source and output JSON data to STDOUT.
  For example, to get the count of total users signed up by day:
    $ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
    {"day":"2011-02-14","count":891,"total":1328636}
    {"day":"2011-02-15","count":1088,"total":1329724}
    {"day":"2011-02-16","count":1016,"total":1330740}
    {"day":"2011-02-17","count":1359,"total":1332099}
    {"day":"2011-02-18","count":1143,"total":1333242}
    {"day":"2011-02-19","count":660,"total":1333902}
    {"day":"2011-02-20","count":597,"total":1334499}
    {"day":"2011-02-21","count":874,"total":1335373}
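The internals of `sdm extract` are not shown in the slides, so the following is only a minimal sketch of what an extract step of this shape might look like: it turns per-day signup counts into one JSON object per line with a running total. The `DAILY_SIGNUPS` hash stands in for a real database query, and the helper names and the starting total (derived from the first output row above) are assumptions.

```ruby
require "json"

# Hypothetical stand-in for a database query: signups per day,
# using the first three counts from the sample output above.
DAILY_SIGNUPS = {
  "2011-02-14" => 891,
  "2011-02-15" => 1088,
  "2011-02-16" => 1016
}.freeze

# Builds one record per day, in date order, carrying a running total.
def total_users_by_day(daily_counts, starting_total)
  total = starting_total
  daily_counts.sort.map do |day, count|
    total += count
    { "day" => day, "count" => count, "total" => total }
  end
end

# Writes one JSON object per line so the next stage can stream it.
def emit(records, out = $stdout)
  records.each { |record| out.puts(JSON.generate(record)) }
end

# 1_327_745 is the total implied for the day before 2011-02-14.
emit(total_users_by_day(DAILY_SIGNUPS, 1_327_745))
```

Emitting line-delimited JSON rather than one large array keeps each stage streamable, which is what makes the Unix-pipe composition on the following slides work.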
Social Mining - Transform
  Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT.
  For example, to add ypids to existing Facebook likes, then filter down to the location and ypid matching fields:
    $ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
    {"name":"SnuggleBunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
    {"name":"AssociateConstruction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
    {"name":"PHBistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
    {"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
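A filter-fields style transform can be sketched in a few lines of Ruby. This is not the actual `sdm` code, which the slides do not show; `filter_fields` and `transform` are hypothetical helper names, and the demo reads from a `StringIO` holding one record from the sample output above instead of real STDIN.

```ruby
require "json"
require "stringio"

# Keeps only the named top-level fields of one record.
def filter_fields(record, fields)
  record.select { |key, _value| fields.include?(key) }
end

# Reads JSON lines from `input` and writes filtered JSON lines to `output`.
def transform(input, output, fields)
  input.each_line do |line|
    line = line.strip
    next if line.empty?
    output.puts(JSON.generate(filter_fields(JSON.parse(line), fields)))
  end
end

# Demo input: a single record copied from the sample output above.
sample = StringIO.new(
  '{"name":"PHBistro","id":"261032274490",' \
  '"ypid_best_match":"1120570","phone":"(937)-743-0069"}' + "\n"
)
transform(sample, $stdout, %w[name ypid_best_match])
# prints {"name":"PHBistro","ypid_best_match":"1120570"}
```

Calling `transform($stdin, $stdout, ARGV)` instead would turn the same function into a pipeline stage that takes its field list from the command line.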
Social Mining - Load
  Load scripts read data in from STDIN and load it into another system (for example, a dashboard).
  For example, loading total Facebook accounts by day into the web dashboard:
    $ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
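The load step can be sketched the same way. The slides do not describe the dashboard's interface, so the `Dashboard` class below is a hypothetical in-memory stand-in, and `load_dashboard` is an assumed helper that mirrors the three command-line arguments (series name, x field, y field). The demo rows are copied from the extract slide's sample output rather than from the Facebook accounts series.

```ruby
require "json"
require "stringio"

# Hypothetical in-memory stand-in for the web dashboard.
class Dashboard
  attr_reader :series

  def initialize
    @series = Hash.new { |hash, name| hash[name] = [] }
  end

  def add_point(name, x, y)
    @series[name] << [x, y]
  end
end

# Reads JSON lines and stores one [x, y] point per record.
def load_dashboard(input, dashboard, series_name, x_field, y_field)
  input.each_line do |line|
    next if line.strip.empty?
    record = JSON.parse(line)
    # fetch raises KeyError if a record lacks a required field.
    dashboard.add_point(series_name, record.fetch(x_field), record.fetch(y_field))
  end
  dashboard
end

rows = StringIO.new(<<~JSONL)
  {"day":"2011-02-14","count":891,"total":1328636}
  {"day":"2011-02-15","count":1088,"total":1329724}
JSONL

dashboard = load_dashboard(rows, Dashboard.new, "total_users", "day", "total")
p dashboard.series["total_users"]
# prints [["2011-02-14", 1328636], ["2011-02-15", 1329724]]
```

Failing loudly on a missing field is a deliberate choice here: a silent gap in a dashboard series is harder to notice than a crashed load job.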
Location Real-Time Fuzzy Matcher
  FP0 (exact match)
    Append LISTING_NAME + ADDRESS + CITY + PHONE
    Tokenize, normalize, strip punctuation, and stem
    Append tokens
  FP3 (fuzzy match)
    Append LISTING_NAME + ADDRESS + CITY + PHONE
    Tokenize, normalize, strip punctuation, and stem
    Remove tokens that are less than 2 chars long
    Remove upper-case short tokens (e.g., MD, CPA, DDS)
    Remove non-phone, short, numerical tokens
    Remove stopwords based on the top 170 most frequently occurring listing_name tokens
    Order tokens alphabetically
    Append tokens
  Example:
    Vijay K. Sammy CPA, LLC
    153 Orchard St
    Elmwood Park NJ - 07407
    (201) 218-0710

    FP Method  Value
    FP0        vijaiksammicpallc153orchardstelmwoodpark2012180710
    FP3        0710201218elmwoodorchardparksammistvijai
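The two fingerprints can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the production matcher: `toy_stem` is a one-rule stand-in for a real stemmer (trailing "y" becomes "i"), "short" is assumed to mean at most 3 characters, the upper-case check is applied to the raw token before lowercasing, and the stopword list is left empty because the top-170 list is not given in the slides.

```ruby
# A token plus whether it came from the phone field.
Token = Struct.new(:raw, :from_phone)

# Toy stand-in for a real stemmer: trailing "y" -> "i" when the rest has a vowel.
def toy_stem(token)
  token.sub(/\A(.*[aeiou].*)y\z/, '\1i')
end

def normalize(raw)
  toy_stem(raw.downcase)
end

# Splits each field on non-alphanumerics, which also strips punctuation.
def tokens_for(listing)
  %i[name address city phone].flat_map do |field|
    listing.fetch(field).scan(/[[:alnum:]]+/).map { |raw| Token.new(raw, field == :phone) }
  end
end

# FP0: every normalized token, in field order.
def fp0(listing)
  tokens_for(listing).map { |t| normalize(t.raw) }.join
end

# FP3: drop noisy tokens, then sort so word order no longer matters.
def fp3(listing, stopwords = [])
  kept = tokens_for(listing).reject do |t|
    t.raw.length < 2 ||                                            # e.g. initials
      (t.raw.length <= 3 && t.raw.match?(/\A[A-Z]+\z/)) ||         # e.g. CPA, LLC
      (!t.from_phone && t.raw.length <= 3 && t.raw.match?(/\A\d+\z/)) || # e.g. 153
      stopwords.include?(normalize(t.raw))
  end
  kept.map { |t| normalize(t.raw) }.sort.join
end

LISTING = {
  name: "Vijay K. Sammy CPA, LLC",
  address: "153 Orchard St",
  city: "Elmwood Park",
  phone: "(201) 218-0710"
}.freeze

puts fp0(LISTING) # vijaiksammicpallc153orchardstelmwoodpark2012180710
puts fp3(LISTING) # 0710201218elmwoodorchardparksammistvijai
```

With these assumptions the sketch reproduces both values from the example table. Sorting the surviving tokens is what makes FP3 tolerant of reordered names and dropped suffixes, while FP0 only matches listings whose fields agree token for token.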
Social Data
  Valid Facebook Access Tokens: 14K
  Total Unique Likes: 300K
  % Likes with Locations and/or Phones: 19%
  % Likes mapped to YPID: 38%
  Total Check-Ins: 530
Social Mining Mother Lode
  Social Search
  Local Recommendation Engine
  Discovery Wall
  Top 10 List
  Social e-Commerce
  Online Presence Management – Social CRM
Questions?
  genechuang@gmail.com
  http://www.twitter.com/genechuang
  http://www.quora.com/Gene-Chuang
  http://www.linkedin.com/in/genechuang
