SlideShare una empresa de Scribd logo
1 de 37
Descargar para leer sin conexión
Coen Stevens
Lead Recommendation Engineer
How to build a recommender system?
           Wakoopa use case
Mission:
Discover software & games
Software tracker
Windows         Mac          Linux
Your profile
Updates
Software pages
Recommendations
Building a recommender system
       Approach and challenges
Data
                      what do we have?

Usage (implicit)                       Ratings (explicit)
                             vs.


•                                  •
    Noisy                              Accurate


•                                  •
    Only positive feedback             Positive and negative
                                       feedback

•                                  •
    Easy to collect                    Hard to collect
Data
                       what do we use?


•   Active users (Tracker activity in the past month): ~9.000

•   Actively used software items (in the past month): ~10.000

•   We calculate recommendations for each OS together with
    Web applications separately
Recommender system methods
Collaborative recommendations: The user will be
recommended items that people with similar tastes and
preferences liked (used) in the past

•   Item-based collaborative filtering

•   User-based collaborative filtering (we only use for
    calculating user similarities to find people like you)

•   Combining both methods
Item-Based Collaborative Filtering
           User software usage matrix
                     Software items




             220   90         180          22

             280   12    42           80

   Users     175 210          210          45

             165         35   195     13   25

                   100   50   185          35   190

                   60         65                185
User software usage matrix [0, 1]
                      Software items




              1   1      0     1       0   1   0

              1   1      1     0       1   0   0

Users         1   1      0     1       0   1   0

              1   0      1     1       1   1   0

              0   1      1     1       0   1   1

              0   1      0     1       0   0   1
How do we predict the probability that I would like to use GMail?
                              Software items




                      1   1      0     1       0   1   0

                      1   1      1     0       1   0   0

                                 ?
         Users        1   1            1       0   1   0

                      1   0      1     1       1   1   0

                      0   1      1     1       0   1   1

                      0   1      0     1       0   0   1
Calculate the similarities between Gmail and the other software items.
                                            Software items




                                1       1       0       1    0   1   0

                                1       1       1       0    1   0   0

            Users               1       1       0       1    0   1   0

                                1       0       1       1    1   1   0

                                0       1       1       1    0   1   1

                                0       1       0       1    0   0   1


                    Cosine Similarity(Firefox, Gmail)
Calculate the similarities between Gmail and the other software items.
                                            Software items




                                1       1       0       1    0   1   0

                                1       1       1       0    1   0   0

            Users               1       1       0       1    0   1   0

                                1       0       1       1    1   1   0

                                0       1       1       1    0   1   1

                                0       1       0       1    0   0   1


                    Cosine Similarity(Firefox, Gmail)
Calculate the similarities between Gmail and the other software items.
                                            Software items




                                1       1       0       1    0   1   0

                                1       1       1       0    1   0   0
                                                                         Popularity correction,
            Users               1       1       0       1    0   1   0
                                                                            we put less trust
                                1       0       1       1    1   1   0
                                                                          in popular software
                                0       1       1       1    0   1   1

                                0       1       0       1    0   0   1


                    Cosine Similarity(Firefox, Gmail)
Item-item correlation matrix



    1    0.1   0.6   0.1   0.1   0.1   0.7

   0.2   1     0.8   0.5   0.8   0.1   0.9

   0.1   0.6   1     0.5   0.7   0.2   0.3

   0.2   0.6   0.4   1     0.8   0.2   0.3

   0.5   0.4   0.4   0.4   1     0.1   0.2

   0.5   0.5   0.3   0.5   0.3   1     0.3

   0.2   0.6   0.3   0.8   0.7   0.7   1
Item-item correlation matrix
Gmail similarities




          0.6            1    0.1   0.6   0.1   0.1   0.1   0.7

          0.8           0.2   1     0.8   0.5   0.8   0.1   0.9

          0.4           0.1   0.6   1     0.5   0.7   0.2   0.3

          0.4           0.2   0.6   0.4   1     0.8   0.2   0.3

          0.3           0.5   0.4   0.4   0.4   1     0.1   0.2

          0.3           0.5   0.5   0.3   0.5   0.3   1     0.3

                        0.2   0.6   0.3   0.8   0.7   0.7   1
K-nearest neighbor approach
Gmail similarities


                     •   Performance vs quality
          0.6
                     •   We take only the ‘K’ most similar items (say 4)
          0.8

                     •   Space complexity: O(m + Kn)
          0.4


                     •
          0.4
                         Computational complexity: O(m + n²)
          0.3

          0.3
Calculate the predicted value for Gmail
Gmail similarities   User usage




                            1
          0.6

                            1
          0.8

                            1
          0.4

          0.4

                            1
Calculate the predicted value for Gmail
Gmail similarities   User usage




                           0.9
          0.6
                                        Usage correction,
                           0.8
          0.8
                                       more usage results
                                     in a higher score [0,1]
                           0.6
          0.4

          0.4

                           0.2
Calculate the predicted value for Gmail
Gmail similarities   User usage




                           0.9
          0.6

                           0.8
          0.8

                           0.6
          0.4

          0.4

                           0.2

                                          (0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6)
                                                                                    = 0.82
                                                  0.6 + 0.8 + 0.4 + 0.4
Calculate the predicted value for Gmail

                                       • User feedback
Gmail similarities   User usage


                                       • Contacts usage
                           0.9
          0.6
                                       • Commercial vs Free
                           0.8
          0.8

                           0.6
          0.4

          0.4

                           0.2

                                          (0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6)
                                                                                    = 0.82
                                                  0.6 + 0.8 + 0.4 + 0.4
Calculate all unknown values and
show the Top-N recommendations to each user
                    Software items




                       ?             ?
                     ?
            1   1            1           1

                  ?1??
            1 1 1

                ?1?1?
Users       1 1

              ?1111?
            1

            ?111?11
            ?1?1??1
Explainability
             Why did I get this recommendation?


•   Overlap between the item’s (K) neighbors and your usage
User-Based Collaborative Filtering
                                 Finding people like you



                                  1   1   0   1   0   1    0

                                  1   1   1   0   1   0    0

                                  1   1   0   1   0   1    0

                                  1   1   1   1   1   1    0
Cosine Similarity(Coen, Menno)


                                  0   1   1   1   0   1    1

                                  0   1   0   1   0   0    1
Applying inverse user frequency

        log(n/ni): ni is the number of users that uses item i and n is
                  the total number of users in the database


                                    0.1   0.2   0     0.4   0     0.4   0

                                    0.1   0.2   0.6   0     0.8   0     0

                                    0.1   0.2   0     0.4   0     0.4   0

                                    0.1   0.2   0.6   0.4   0.8   0.4   0
Cosine Similarity(Coen, Menno)


                                    0     0.2   0.6   0.4   0     0.4   0.2

                                    0     0.2   0     0.4   0     0     0.2

        The fact that you both use Textmate tells you more than
                       when you both use firefox
0.1   0.2   0     0.4   0     0.4   0

                                 0.1   0.2   0.6   0     0.8   0     0

                                 0.1   0.2   0     0.4   0     0.4   0

                                 0.1   0.2   0.6   0.4   0.8   0.4   0
Cosine Similarity(Coen, Menno)


                                 0     0.2   0.6   0.4   0     0.4   0.2

                                 0     0.2   0     0.4   0     0     0.2
User-user correlation matrix



     1     0.8   0.6   0.5   0.7   0.2

     0.8   1     0.4   0.7   0.5   0.5

     0.6   0.4   1     0.4   0.9   0.1

     0.5   0.8   0.4   1     0.6   0.4

     0.8   0.5   0.9   0.6   1     0.2

     0.2   0.5   0.1   0.4   0.2   1
Performance
                 measure for success

•   Cross-validation: Train-Test split (80-20)

•   Precision and Recall:
    - precision = size(hit set) / size(total given recs)
    - recall = size(hit set) / size(test set)

•   Root mean squared error (RMSE)
Implementation

•   Ruby Enterprise Edition (garbage collection)

•   MySQL database

•   Built our own c-libraries

•   Amazon EC2:
    - Low cost
    - Flexibility
    - Ease of use

•   Open source
Future challenges


•   What is the best algorithm for Wakoopa? (or you)

•   Reducing space-time complexity (scalability):
    - Parallelization (Clojure)
    - Distributed computing (Hadoop)
1 evening, 3 speakers, 100 developers
           www.recked.org

Más contenido relacionado

La actualidad más candente

Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
Chris Johnson
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
Georgian Micsa
 

La actualidad más candente (20)

Recent advances in deep recommender systems
Recent advances in deep recommender systemsRecent advances in deep recommender systems
Recent advances in deep recommender systems
 
An introduction to Recommender Systems
An introduction to Recommender SystemsAn introduction to Recommender Systems
An introduction to Recommender Systems
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filteringRecommender systems: Content-based and collaborative filtering
Recommender systems: Content-based and collaborative filtering
 
Introduction to Recommendation Systems
Introduction to Recommendation SystemsIntroduction to Recommendation Systems
Introduction to Recommendation Systems
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Content based filtering
Content based filteringContent based filtering
Content based filtering
 
Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial Deep Learning for Recommender Systems RecSys2017 Tutorial
Deep Learning for Recommender Systems RecSys2017 Tutorial
 
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
 
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019Tutorial on Deep Learning in Recommender System, Lars summer school 2019
Tutorial on Deep Learning in Recommender System, Lars summer school 2019
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Recommendation Systems - Why How and Real Life Applications
Recommendation Systems - Why How and Real Life ApplicationsRecommendation Systems - Why How and Real Life Applications
Recommendation Systems - Why How and Real Life Applications
 
Past, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry PerspectivePast, Present & Future of Recommender Systems: An Industry Perspective
Past, Present & Future of Recommender Systems: An Industry Perspective
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
 
Personalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep LearningPersonalizing "The Netflix Experience" with Deep Learning
Personalizing "The Netflix Experience" with Deep Learning
 

Destacado

Design of recommender systems
Design of recommender systemsDesign of recommender systems
Design of recommender systems
Rashmi Sinha
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
NYC Predictive Analytics
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Lior Rokach
 
Impact of web 2.0 on evaluation and select solutions
Impact of web 2.0 on evaluation and select solutionsImpact of web 2.0 on evaluation and select solutions
Impact of web 2.0 on evaluation and select solutions
sarvenaz arianfar
 
Recommender Systems in E-Commerce
Recommender Systems in E-CommerceRecommender Systems in E-Commerce
Recommender Systems in E-Commerce
Roger Chen
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
Sri Ambati
 

Destacado (20)

Design of recommender systems
Design of recommender systemsDesign of recommender systems
Design of recommender systems
 
How to build a Recommender System
How to build a Recommender SystemHow to build a Recommender System
How to build a Recommender System
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
Past, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspectivePast, present, and future of Recommender Systems: an industry perspective
Past, present, and future of Recommender Systems: an industry perspective
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Recommender Engines Seminar Paper
Recommender Engines Seminar PaperRecommender Engines Seminar Paper
Recommender Engines Seminar Paper
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
How to Build Recommender System with Content based Filtering
How to Build Recommender System with Content based FilteringHow to Build Recommender System with Content based Filtering
How to Build Recommender System with Content based Filtering
 
genetic algorithm based music recommender system
genetic algorithm based music recommender systemgenetic algorithm based music recommender system
genetic algorithm based music recommender system
 
Buidling large scale recommendation engine
Buidling large scale recommendation engineBuidling large scale recommendation engine
Buidling large scale recommendation engine
 
How to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on SparkHow to Build a Recommendation Engine on Spark
How to Build a Recommendation Engine on Spark
 
Wakoopa Recommendation Engine on AWS
Wakoopa Recommendation Engine on AWSWakoopa Recommendation Engine on AWS
Wakoopa Recommendation Engine on AWS
 
Impact of web 2.0 on evaluation and select solutions
Impact of web 2.0 on evaluation and select solutionsImpact of web 2.0 on evaluation and select solutions
Impact of web 2.0 on evaluation and select solutions
 
Interaction Design Patterns in Recommender Systems
Interaction Design Patterns in Recommender SystemsInteraction Design Patterns in Recommender Systems
Interaction Design Patterns in Recommender Systems
 
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
Anyone Can Build A Recommendation Engine With Solr: Presented by Doug Turnbul...
 
Recommender Systems in E-Commerce
Recommender Systems in E-CommerceRecommender Systems in E-Commerce
Recommender Systems in E-Commerce
 
recommender_systems
recommender_systemsrecommender_systems
recommender_systems
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 

Similar a How to build a recommender system?

Oslo Schibsted Performance Gathering
Oslo Schibsted Performance GatheringOslo Schibsted Performance Gathering
Oslo Schibsted Performance Gathering
Almudena Vivanco
 
Presentation at Hong Kong Start-Up Association Event
Presentation at Hong Kong Start-Up Association EventPresentation at Hong Kong Start-Up Association Event
Presentation at Hong Kong Start-Up Association Event
Ben Cheng
 

Similar a How to build a recommender system? (20)

Open Social Tech Talk Beijing
Open Social Tech Talk   BeijingOpen Social Tech Talk   Beijing
Open Social Tech Talk Beijing
 
IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...
IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...
IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...
 
DevOps goes Mobile (daho.am)
DevOps goes Mobile (daho.am)DevOps goes Mobile (daho.am)
DevOps goes Mobile (daho.am)
 
Wakoopa Recommendations Engine on AWS
Wakoopa Recommendations Engine on AWSWakoopa Recommendations Engine on AWS
Wakoopa Recommendations Engine on AWS
 
Oslo Schibsted Performance Gathering
Oslo Schibsted Performance GatheringOslo Schibsted Performance Gathering
Oslo Schibsted Performance Gathering
 
Spil games konrad
Spil games konradSpil games konrad
Spil games konrad
 
Building native apps with web components
Building native apps with web componentsBuilding native apps with web components
Building native apps with web components
 
IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...
IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...
IoT Building Blocks: From Edge Devices to Analytics in the Cloud - SRV204 - A...
 
Keeping Swift Apps Small
Keeping Swift Apps SmallKeeping Swift Apps Small
Keeping Swift Apps Small
 
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
IoT Analytics Workshop (IOT314-R1) - AWS re:Invent 2018
 
Customer Showcase for AWS IoT Analytics (IOT219) - AWS re:Invent 2018
Customer Showcase for AWS IoT Analytics (IOT219) - AWS re:Invent 2018Customer Showcase for AWS IoT Analytics (IOT219) - AWS re:Invent 2018
Customer Showcase for AWS IoT Analytics (IOT219) - AWS re:Invent 2018
 
Our Favorite Admin Features in Cognos Analytics 11.1
Our Favorite Admin Features in Cognos Analytics 11.1Our Favorite Admin Features in Cognos Analytics 11.1
Our Favorite Admin Features in Cognos Analytics 11.1
 
PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...PredictionIO - Building Applications That Predict User Behavior Through Big D...
PredictionIO - Building Applications That Predict User Behavior Through Big D...
 
UCLA HACKU'11
UCLA HACKU'11UCLA HACKU'11
UCLA HACKU'11
 
Introduction to GluonNLP
Introduction to GluonNLPIntroduction to GluonNLP
Introduction to GluonNLP
 
Windows10TipsandTricksBooklet
Windows10TipsandTricksBookletWindows10TipsandTricksBooklet
Windows10TipsandTricksBooklet
 
systemd and configuration management
systemd and configuration managementsystemd and configuration management
systemd and configuration management
 
Presentation at Hong Kong Start-Up Association Event
Presentation at Hong Kong Start-Up Association EventPresentation at Hong Kong Start-Up Association Event
Presentation at Hong Kong Start-Up Association Event
 
Feature Based Opinion Mining from Amazon Reviews
Feature Based Opinion Mining from Amazon ReviewsFeature Based Opinion Mining from Amazon Reviews
Feature Based Opinion Mining from Amazon Reviews
 
IoT State of the Union
IoT State of the UnionIoT State of the Union
IoT State of the Union
 

Más de blueace (8)

Research & Tracking via een Social Network
Research & Tracking via een Social NetworkResearch & Tracking via een Social Network
Research & Tracking via een Social Network
 
Enhanced research via software & web tracking
Enhanced research via software & web trackingEnhanced research via software & web tracking
Enhanced research via software & web tracking
 
(Dutch) CSN: social network succes
(Dutch) CSN: social network succes(Dutch) CSN: social network succes
(Dutch) CSN: social network succes
 
Recommendations 101
Recommendations 101Recommendations 101
Recommendations 101
 
(Dutch) Web 2.0 Succesfactoren @ Overheid 2.0
(Dutch) Web 2.0 Succesfactoren @ Overheid 2.0(Dutch) Web 2.0 Succesfactoren @ Overheid 2.0
(Dutch) Web 2.0 Succesfactoren @ Overheid 2.0
 
Roomware - The operating system for interactive spaces
Roomware - The operating system for interactive spacesRoomware - The operating system for interactive spaces
Roomware - The operating system for interactive spaces
 
Wakoopa at The Next Web 2008
Wakoopa at The Next Web 2008Wakoopa at The Next Web 2008
Wakoopa at The Next Web 2008
 
How we did RoR in Wakoopa
How we did RoR in WakoopaHow we did RoR in Wakoopa
How we did RoR in Wakoopa
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

How to build a recommender system?

  • 1.
  • 3. How to build a recommender system? Wakoopa use case
  • 10. Building a recommender system Approach and challenges
  • 11. Data what do we have? Usage (implicit) Ratings (explicit) vs. • • Noisy Accurate • • Only positive feedback Positive and negative feedback • • Easy to collect Hard to collect
  • 12. Data what do we use? • Active users (Tracker activity in the past month): ~9.000 • Actively used software items (in the past month): ~10.000 • We calculate recommendations for each OS together with Web applications separately
  • 13. Recommender system methods Collaborative recommendations: The user will be recommended items that people with similar tastes and preferences liked (used) in the past • Item-based collaborative filtering • User-based collaborative filtering (we only use for calculating user similarities to find people like you) • Combining both methods
  • 14. Item-Based Collaborative Filtering User software usage matrix Software items 220 90 180 22 280 12 42 80 Users 175 210 210 45 165 35 195 13 25 100 50 185 35 190 60 65 185
  • 15. User software usage matrix [0, 1] Software items 1 1 0 1 0 1 0 1 1 1 0 1 0 0 Users 1 1 0 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1
  • 16. How do we predict the probability that I would like to use GMail? Software items 1 1 0 1 0 1 0 1 1 1 0 1 0 0 ? Users 1 1 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1
  • 17. Calculate the similarities between Gmail and the other software items. Software items 1 1 0 1 0 1 0 1 1 1 0 1 0 0 Users 1 1 0 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 Cosine Similarity(Firefox, Gmail)
  • 18. Calculate the similarities between Gmail and the other software items. Software items 1 1 0 1 0 1 0 1 1 1 0 1 0 0 Users 1 1 0 1 0 1 0 1 0 1 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 0 0 1 Cosine Similarity(Firefox, Gmail)
  • 19. Calculate the similarities between Gmail and the other software items. Software items 1 1 0 1 0 1 0 1 1 1 0 1 0 0 Popularity correction, Users 1 1 0 1 0 1 0 we put less trust 1 0 1 1 1 1 0 in popular software 0 1 1 1 0 1 1 0 1 0 1 0 0 1 Cosine Similarity(Firefox, Gmail)
  • 20. Item-item correlation matrix 1 0.1 0.6 0.1 0.1 0.1 0.7 0.2 1 0.8 0.5 0.8 0.1 0.9 0.1 0.6 1 0.5 0.7 0.2 0.3 0.2 0.6 0.4 1 0.8 0.2 0.3 0.5 0.4 0.4 0.4 1 0.1 0.2 0.5 0.5 0.3 0.5 0.3 1 0.3 0.2 0.6 0.3 0.8 0.7 0.7 1
  • 21. Item-item correlation matrix Gmail similarities 0.6 1 0.1 0.6 0.1 0.1 0.1 0.7 0.8 0.2 1 0.8 0.5 0.8 0.1 0.9 0.4 0.1 0.6 1 0.5 0.7 0.2 0.3 0.4 0.2 0.6 0.4 1 0.8 0.2 0.3 0.3 0.5 0.4 0.4 0.4 1 0.1 0.2 0.3 0.5 0.5 0.3 0.5 0.3 1 0.3 0.2 0.6 0.3 0.8 0.7 0.7 1
  • 22. K-nearest neighbor approach Gmail similarities • Performance vs quality 0.6 • We take only the ‘K’ most similar items (say 4) 0.8 • Space complexity: O(m + Kn) 0.4 • 0.4 Computational complexity: O(m + n²) 0.3 0.3
  • 23. Calculate the predicted value for Gmail Gmail similarities User usage 1 0.6 1 0.8 1 0.4 0.4 1
  • 24. Calculate the predicted value for Gmail Gmail similarities User usage 0.9 0.6 Usage correction, 0.8 0.8 more usage results in a higher score [0,1] 0.6 0.4 0.4 0.2
  • 25. Calculate the predicted value for Gmail Gmail similarities User usage 0.9 0.6 0.8 0.8 0.6 0.4 0.4 0.2 (0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6) = 0.82 0.6 + 0.8 + 0.4 + 0.4
  • 26. Calculate the predicted value for Gmail • User feedback Gmail similarities User usage • Contacts usage 0.9 0.6 • Commercial vs Free 0.8 0.8 0.6 0.4 0.4 0.2 (0.6 * 0.9) + (0.8 * 0.8) + (0.4 * 0.6) = 0.82 0.6 + 0.8 + 0.4 + 0.4
  • 27. Calculate all unknown values and show the Top-N recommendations to each user Software items ? ? ? 1 1 1 1 ?1?? 1 1 1 ?1?1? Users 1 1 ?1111? 1 ?111?11 ?1?1??1
  • 28. Explainability Why did I get this recommendation? • Overlap between the item’s (K) neighbors and your usage
  • 29. User-Based Collaborative Filtering Finding people like you 1 1 0 1 0 1 0 1 1 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 1 1 0 Cosine Similarity(Coen, Menno) 0 1 1 1 0 1 1 0 1 0 1 0 0 1
  • 30. Applying inverse user frequency log(n/ni): ni is the number of users that uses item i and n is the total number of users in the database 0.1 0.2 0 0.4 0 0.4 0 0.1 0.2 0.6 0 0.8 0 0 0.1 0.2 0 0.4 0 0.4 0 0.1 0.2 0.6 0.4 0.8 0.4 0 Cosine Similarity(Coen, Menno) 0 0.2 0.6 0.4 0 0.4 0.2 0 0.2 0 0.4 0 0 0.2 The fact that you both use Textmate tells you more than when you both use firefox
  • 31. 0.1 0.2 0 0.4 0 0.4 0 0.1 0.2 0.6 0 0.8 0 0 0.1 0.2 0 0.4 0 0.4 0 0.1 0.2 0.6 0.4 0.8 0.4 0 Cosine Similarity(Coen, Menno) 0 0.2 0.6 0.4 0 0.4 0.2 0 0.2 0 0.4 0 0 0.2
  • 32. User-user correlation matrix 1 0.8 0.6 0.5 0.7 0.2 0.8 1 0.4 0.7 0.5 0.5 0.6 0.4 1 0.4 0.9 0.1 0.5 0.8 0.4 1 0.6 0.4 0.8 0.5 0.9 0.6 1 0.2 0.2 0.5 0.1 0.4 0.2 1
  • 33. Performance measure for success • Cross-validation: Train-Test split (80-20) • Precision and Recall: - precision = size(hit set) / size(total given recs) - recall = size(hit set) / size(test set) • Root mean squared error (RMSE)
  • 34. Implementation • Ruby Enterprise Edition (garbage collection) • MySQL database • Built our own c-libraries • Amazon EC2: - Low cost - Flexibility - Ease of use • Open source
  • 35. Future challenges • What is the best algorithm for Wakoopa? (or you) • Reducing space-time complexity (scalability): - Parallelization (Clojure) - Distributed computing (Hadoop)
  • 36.
  • 37. 1 evening, 3 speakers, 100 developers www.recked.org