Jobin Wilson
                         jobin.wilson@flytxt.com


Copyright © 2011 Flytxt B.V. All rights reserved.   9/13/2011
Who am I?
  • Architect @ Flytxt (Big Data Analytics & Automation)

  • Passionate about data, distributed computing, machine learning

  • Previously

       • Virtualization & Cloud Lifecycle Management (BMC)

             • Designed and implemented the Cloud Life Cycle Management interface at BMC

       • Large Scale Data Centre Automation (AOL)

             • Implemented a centralized data center management framework for AOL

       • Workflow Systems & Automation (Accenture)

             • Implemented a Service Management Suite for various customers
Session Agenda!

• Recommendation Engines – What's the big deal?

• Conceptual Overview

• Collaborative Filtering

• Engineering Challenges

• Apache Mahout

• Getting your recommender to production

• Q&A




What's the big deal?
Ooh Ads too!
Big deal?
 [Diagram. Labels: Users, Content, Ads, Content Publishers, Ad Network, Advertisers, "Recommend Best Ads"; ML Algorithms, User Behavior Modelling, Maximization Criteria]
BTW, what was the challenge?
User base: 2 billion+ users worldwide

Content base: 12.51 billion+ indexed pages

Advertiser base: millions of active advertisers

Real-time nature: responses in < 200 ms

Multi-objective optimization problem

Noisy data
Recommendation Engines: Overview
 A specific type of information filtering system
 technique that attempts to recommend information
 items or social elements that are likely to be of interest
 to the user.

 Technologies that can help us sift through all the
 available information to predict products or services
 that could be interesting to us.

 Applying knowledge discovery techniques to the
 problem of making personalized recommendations for
 information, products or services, usually during a live
 interaction.
Do we need a crystal ball to predict?
  We all have opinions and tastes, which we express as likes or dislikes.

  Our tastes follow some patterns.

  We tend to like things that are similar to things we already
  like (e.g. songs).

  We tend to like things that are liked by people who are similar to
  us (e.g. movies).

  From fancy research to mainstream
Collaborative Filtering
 Problem: We have U users and I items in the system. A user Uk needs to
 be recommended a set of m items that he or she has not yet picked but
 might be interested in picking up.

 Solution:

 Maintain a database of users' ratings of a variety of items.

 For a given user, find other similar users whose ratings strongly
 correlate with the current user's ratings (the user neighborhood).

 Recommend items rated highly by these similar users but not yet rated by
 the current user.

 E.g. Amazon, Flipkart, etc.
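
 The three solution steps above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the sample ratings, the function names `pearson` and `recommend`, the neighborhood size `k`, and the choice to weight neighbor ratings by their correlation are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings database: user -> {item: rating}.
RATINGS = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "e": 5},
    "dave":  {"a": 4, "b": 2, "c": 3, "d": 4, "f": 2},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0

def recommend(ratings, user, k=2, m=3):
    """Return up to m items the user has not rated, ranked via the k nearest users."""
    sims = [(pearson(ratings[user], r), other)
            for other, r in ratings.items() if other != user]
    # User neighborhood: the k most correlated users, positive correlation only.
    neighbours = [(s, o) for s, o in sorted(sims, reverse=True)[:k] if s > 0]
    scores, weights = {}, {}
    for s, other in neighbours:
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + s * rating
                weights[item] = weights.get(item, 0.0) + s
    # Similarity-weighted average rating per candidate item, best first.
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:m]]

print(recommend(RATINGS, "alice", k=2, m=2))
```

 In this toy data "bob" and "dave" rate like "alice" while "carol" rates the opposite way, so only items picked by the first two are candidates.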
Utility Matrix
 A matrix of values representing each user's level of affinity to each item.
 The matrix is sparse.

 The recommendation engine needs to predict the values for the empty cells
 based on the available cell values.

 The denser the matrix, the better the quality of the recommendations.

 User \ Item | i1  | i2  | i3  | i4  | i5
 u1          |     | r12 |     | r14 | r15
 u2          | r21 | r22 |     |     | r25
 u3          |     | r32 |     | r34 |
 u4          |     |     | r43 |     | r45
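
 One way to hold such a matrix in code is a dictionary of dictionaries that stores only the filled cells. The sketch below mirrors the shape of the table above; the numeric ratings are made-up placeholders for r12, r14 and so on, and `density` and `empty_cells` are hypothetical helper names.

```python
USERS = ["u1", "u2", "u3", "u4"]
ITEMS = ["i1", "i2", "i3", "i4", "i5"]

# Sparse utility matrix: only known ratings are stored.
# The values are invented stand-ins for the symbolic r_ui entries.
UTILITY = {
    "u1": {"i2": 4.0, "i4": 3.0, "i5": 5.0},
    "u2": {"i1": 2.0, "i2": 5.0, "i5": 1.0},
    "u3": {"i2": 3.0, "i4": 4.0},
    "u4": {"i3": 5.0, "i5": 2.0},
}

def density(utility, users, items):
    """Fraction of user/item cells that hold a known rating."""
    filled = sum(len(utility.get(u, {})) for u in users)
    return filled / (len(users) * len(items))

def empty_cells(utility, user, items):
    """Items whose rating the engine would have to predict for this user."""
    return [i for i in items if i not in utility.get(user, {})]

print(density(UTILITY, USERS, ITEMS))
print(empty_cells(UTILITY, "u3", ITEMS))
```

 Real utility matrices are usually far sparser than this toy one, which is why storing only the filled cells matters.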
Engineering Challenges
 Massive data volume: how do I deal with TBs of raw data to build my
 recommendations?

 Hadoop and Map-Reduce shine!


 How can I make it work in 'real time'?

 Batch pre-computing the recommendations and storing them in HBase could help!



 Will my solution scale? Soon my user base is going to double!

 Sure, you can make it scale!
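
 The "batch pre-compute, then look up" idea can be sketched as follows. A plain Python dict stands in for the HBase table and overall popularity stands in for whatever scoring the batch job really uses; both are assumptions for illustration, as are the names `batch_precompute` and `serve`.

```python
from collections import Counter

# Hypothetical interaction data: user -> set of items already picked.
INTERACTIONS = {
    "u1": {"a", "b"},
    "u2": {"b", "c"},
    "u3": {"b", "c", "d"},
}

def batch_precompute(interactions, top_n=2):
    """Offline job: build user -> top-N list of items the user has not picked.

    Popularity is a placeholder scorer; any recommender could be used here.
    """
    popularity = Counter(i for items in interactions.values() for i in items)
    store = {}
    for user, picked in interactions.items():
        candidates = [i for i in popularity if i not in picked]
        # Most popular first; ties broken alphabetically for determinism.
        candidates.sort(key=lambda i: (-popularity[i], i))
        store[user] = candidates[:top_n]
    return store

def serve(store, user):
    """Online path: a single key lookup, no model computation at request time."""
    return store.get(user, [])

STORE = batch_precompute(INTERACTIONS)  # dict standing in for a key-value table
print(serve(STORE, "u1"))
```

 The request path only does a key lookup, which is what makes a tight latency budget plausible; the cost is that recommendations are only as fresh as the last batch run.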
Engineering Challenges

 Do I need a cloud-based infrastructure?

 It depends!


 Is there a Hadoop-compatible machine learning library?

 Mahout would help!


 How can I represent or transform my input data appropriately?

 Pig or Hive might help! If not, Map-Reduce is always there!
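
 As an illustration of the kind of transformation meant here, the sketch below turns raw event-log lines into `user,item,preference` triples, a common input shape for collaborative filtering. The log layout, the event names and the weights in `EVENT_WEIGHTS` are hypothetical; at scale the same group-by would be written as a Pig or Hive job rather than in plain Python.

```python
import csv

# Assumed mapping from event type to an implicit preference strength.
EVENT_WEIGHTS = {"view": 1.0, "cart": 3.0, "purchase": 5.0}

# Hypothetical raw log lines: timestamp,user,event,item
RAW_LOG = [
    "2011-09-13T10:00:00,u1,view,i2",
    "2011-09-13T10:05:00,u1,purchase,i2",
    "2011-09-13T10:06:00,u2,view,i1",
    "2011-09-13T10:07:00,u2,hover,i3",
]

def to_preferences(raw_lines):
    """Collapse events into one 'user,item,preference' line per user/item pair."""
    prefs = {}
    for _timestamp, user, event, item in csv.reader(raw_lines):
        weight = EVENT_WEIGHTS.get(event)
        if weight is None:
            continue  # unknown event types are treated as noise and skipped
        key = (user, item)
        # Keep the strongest signal seen for each user/item pair.
        prefs[key] = max(prefs.get(key, 0.0), weight)
    return [f"{u},{i},{p}" for (u, i), p in sorted(prefs.items())]

for line in to_preferences(RAW_LOG):
    print(line)
```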
Apache Mahout Overview
 Scalable machine learning library

 Core algorithms for clustering, classification and batch-based
 collaborative filtering, implemented over Hadoop

 A few popular algorithms: K-Means, fuzzy K-Means, canopy clustering, LDA,
 etc.

 Vibrant community support

 Used by: Adobe, Yahoo!, Amazon, AOL, Flytxt... (the list goes on)

 mahout-dev-subscribe@apache.org
Taking Recommendation Engines to production

 Analyzing the input data: what kind of information can I collect from users?

 Selecting the appropriate recommender (e.g. user-based, item-based)

 Strategy for recommending to anonymous users (or first-time users)

 Strategy for distributed computing: modeling the problem as Map-Reduce

 Choosing the deployment model

 Monitoring the system
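
 To make "modeling the problem as Map-Reduce" concrete, here is a single-process sketch of one typical building block of an item-based recommender: counting how often two items are picked by the same user. The map, shuffle and reduce steps are ordinary Python functions that mimic the phases of a Hadoop job; the data and function names are assumptions, and this is not how any particular library implements it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: user -> set of items picked.
BASKETS = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c"},
}

def map_phase(user, items):
    """Mapper: emit ((item_x, item_y), 1) for every item pair one user picked."""
    for pair in combinations(sorted(items), 2):
        yield pair, 1

def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum the counts for one item pair."""
    return key, sum(values)

def cooccurrence(baskets):
    mapped = (kv for user, items in baskets.items()
              for kv in map_phase(user, items))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

print(cooccurrence(BASKETS))
```

 Because each mapper call looks at one user only and each reducer call at one item pair only, both phases can be spread across machines without coordination.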
Conclusion

 A very popular field of research and implementation

 More and more products and services are leveraging the concept

 From fancy research to live production systems at scale

 Making people's lives easier by assisting them in making decisions
Some more concepts...

 Concept of similarity: distance measures, etc.

 Pearson correlation

 User neighborhood computation
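
 For reference, one common way to write these concepts down is given below. The notation is an assumption introduced here, not taken from the slides: r_{u,i} is user u's rating of item i, I_{uv} is the set of items rated by both u and v, and the means are taken over I_{uv}. Other variants (for example, means over all of a user's ratings) are also in use.

```latex
% Pearson correlation between users u and v over their co-rated items I_{uv}
w_{u,v} =
  \frac{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
       {\sqrt{\sum_{i \in I_{uv}} (r_{u,i} - \bar{r}_u)^2}\;
        \sqrt{\sum_{i \in I_{uv}} (r_{v,i} - \bar{r}_v)^2}},
\qquad -1 \le w_{u,v} \le 1

% A similarity can be turned into a distance-like measure
d_{u,v} = 1 - w_{u,v}

% User neighborhood of size k: the k users most correlated with u
N_k(u) = \operatorname*{arg\,top\text{-}k}_{v \ne u} \; w_{u,v}
```

 A value of w close to 1 means the two users rate their shared items in the same pattern, a value close to -1 means opposite patterns, and the neighborhood simply keeps the k best-correlated users.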
THANK YOU
  Contact : jobin.wilson@flytxt.com
http://www.flytxt.com/community/





Recommendation engines matching items to users

  • 1. Jobin Wilson jobin.wilson@flytxt.com Copyright © 2011 Flytxt B.V. All rights reserved. 9/13/2011
  • 2. Who am I? • Architect @ Flytxt (Big Data Analytics & Automation) • Passionate about data, distributed computing, machine learning • Previously • Virtualization & Cloud Lifecycle Management (BMC) • Designed and implemented Cloud Life Cycle Management Interface @ BMC • Large Scale Data Centre Automation (AOL) • Implemented Centralized Data Center Management Framework for AOL • Workflow Systems & Automation (Accenture) • Implemented Service Management Suite for various customers
  • 3. Session Agenda! • Recommendation Engines – What's the big deal? • Conceptual Overview • Collaborative Filtering • Engineering Challenges • Apache Mahout • Getting your recommender to production • Q&A
  • 6. Big deal? [Diagram labels: Users, Content, Ads, Content Publishers, Ad Network, Advertisers, Recommend Best Ads; ML Algorithms, User Behavior Modelling, Maximization Criteria]
  • 7. BTW, what was the challenge? • User base: 2 billion+ users worldwide • Content base: 12.51 billion+ indexed pages • Advertiser base: millions of active advertisers • Real-time nature: responses in < 200 ms • Multi-objective optimization problem • Noisy data
  • 8. Recommendation Engines: Overview • A specific type of information filtering technique that attempts to recommend information items or social elements that are likely to be of interest to the user. • Technologies that help us sift through all the available information to predict products or services that could interest us. • Applying knowledge discovery techniques to the problem of making personalized recommendations for information, products or services, usually during a live interaction.
  • 9. Do we need a crystal ball to predict? • We all have opinions and tastes, which we express as likes or dislikes. • Our tastes follow some patterns. • We tend to like things that are similar to things we already like (e.g. songs). • We tend to like things that are liked by people who are similar to us (e.g. movies). • From fancy research to mainstream.
  • 10. Collaborative Filtering • Problem: we have U users and I items in the system; a user Uk needs to be recommended a set of m items that they have not yet picked and might be interested in. • Solution: maintain a database of users' ratings of a variety of items. For a given user, find other similar users whose ratings strongly correlate with the current user's (the user neighborhood). Recommend items rated highly by these similar users but not yet rated by the current user. E.g. Amazon, Flipkart etc.
  • 11. Utility Matrix • A matrix of values representing each user's level of affinity to each item. • It is a sparse matrix. • The recommendation engine needs to predict the values for the empty cells based on the available cell values. • The denser the matrix, the better the quality of recommendations.

        User | Item   i1    i2    i3    i4    i5
        u1            -     r12   -     r14   r15
        u2            r21   r22   -     -     r25
        u3            -     r32   -     r34   -
        u4            -     -     r43   -     r45
  • 12. Engineering Challenges • Massive data volume: how do I deal with TBs of raw data to build my recommendations? Hadoop and MapReduce shine! • How can I make it work in 'real time'? Batch pre-computing the recommendations and storing them in HBase could help! • Will my solution scale? Soon my user base is going to double! Sure, you can make it scale!
  • 13. Engineering Challenges • Do I need a cloud-based infrastructure? Depends! • Is there a Hadoop-compatible machine learning library? Mahout would help! • How can I represent or transform my input data appropriately? Pig or Hive might help! If not, MapReduce is always there!
  • 14. Apache Mahout Overview • Scalable machine learning library. • Core algorithms for clustering, classification and batch-based collaborative filtering, implemented over Hadoop. • A few popular algorithms: K-Means, fuzzy K-Means, Canopy clustering, LDA etc. • Vibrant community support. • Used by Adobe, Yahoo!, Amazon, AOL, Flytxt… (the list goes on). • mahout-dev-subscribe@apache.org
  • 15. Taking Recommendation Engines to production • Analyzing the input data: what kind of information can I collect from users? • Selecting the appropriate recommender (e.g. user-based, item-based). • Strategy for recommending to anonymous (or first-time) users. • Strategy for distributed computing: modeling the problem as MapReduce. • Choosing the deployment model. • Monitoring the system.
  • 16. Conclusion • A very popular field of research and implementation. • More and more products and services are leveraging the concept. • From fancy research to live production systems at scale. • Making people's lives easier by assisting in decision making.
  • 17. Some more concepts… • Concept of similarity (distance measures etc.) • Pearson correlation • User neighborhood computation
  • 18. THANK YOU Contact : jobin.wilson@flytxt.com http://www.flytxt.com/community/ Copyright © 2011 Flytxt B.V. All rights reserved. 9/13/2011
  • 19. http://www.flytxt.com/community/ Copyright © 2011 Flytxt B.V. All rights reserved. 9/13/2011
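
The collaborative filtering recipe from slides 10, 11 and 17 (utility matrix, Pearson correlation, user neighborhood, recommending unrated items) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the Mahout API and not necessarily how the speaker implemented it: the user names, item names, ratings, the neighborhood size `k`, and the helper names `pearson`, `neighborhood` and `recommend` are all assumptions made up for the example.

```python
from math import sqrt

# Hypothetical sparse utility matrix: user -> {item: rating}.
# A missing key is an empty cell that the recommender tries to predict.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 3, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1, "e": 4},
    "dave":  {"a": 5, "b": 2, "e": 2},
}


def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to measure correlation
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(vx * vy)


def neighborhood(ratings, user, k=2):
    """The k most similar users with positive correlation to `user`."""
    sims = [(pearson(ratings[user], ratings[other]), other)
            for other in ratings if other != user]
    positive = [(s, other) for s, other in sims if s > 0]
    return sorted(positive, reverse=True)[:k]


def recommend(ratings, user, m=2, k=2):
    """Top-m unrated items, scored by similarity-weighted neighbor ratings."""
    totals, weights = {}, {}
    for sim, other in neighborhood(ratings, user, k):
        for item, rating in ratings[other].items():
            if item in ratings[user]:
                continue  # only predict the user's empty cells
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:m]


if __name__ == "__main__":
    print(recommend(ratings, "alice"))
```

In this toy data, "bob" and "dave" correlate positively with "alice" on their co-rated items while "carol" correlates negatively, so only the first two contribute to the predictions. At the data volumes the talk describes, the same computation would be run as a batch job (for example over Hadoop) rather than in memory.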