Mendeley Suggest:
       Engineering a
  Personalised Article
Recommender System




          Kris Jack, PhD
         Chief Data Scientist
   https://twitter.com/_krisjack
Overview

➔
    What's Mendeley?

➔
    What's Mendeley Suggest?

➔
    Computation Layer

➔
    Serving Layer
    ➔
      Architecture
    ➔
      Technologies
    ➔
      Deployment

➔
    Conclusions
What's Mendeley?
➔
    Mendeley is a platform that connects
    researchers, research data and apps




                         Mendeley Open API
➔
    Startup company with ~20 R&D engineers
What's Mendeley
       Suggest?
Use Case
➔
    Good researchers are on top of their game
➔
    Difficult given the amount of research being produced

➔
    There must be a technology that can help




➔
    Help researchers by recommending relevant research
Mendeley Suggest
Computation
     Layer
Mendeley Suggest
Running on Amazon's Elastic MapReduce

                On-demand use, with costs that are easy to estimate
Computation Layer: Mahout's Performance (1.5M Users, 50M Articles)
[Figure: scatter plot. x-axis: No. Good Recommendations/10 (0.5 to 3). y-axis: Normalised Amazon Hours (0 to 7K). Quadrants: Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good.]
➔
    Orig. item-based: 6.5K hours, 1.5 good recommendations/10
➔
    Cust. item-based: 2.4K hours, 1.5 good recommendations/10
    ➔
        -4.1K hours (63%) vs. orig. item-based
    ➔
        Partitioners, MR allocation
➔
    Orig. user-based: 1K hours, 2.5 good recommendations/10
    ➔
        -1.4K hours (58%) and +1 good recommendation (67%) vs. cust. item-based
➔
    Cust. user-based: 0.3K hours, 2.5 good recommendations/10
    ➔
        -0.7K hours (70%) vs. orig. user-based
➔
    Orig. item-based to cust. user-based overall: -6.2K hours (95%), +1 good recommendation (67%)
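The percentages quoted on the chart can be rechecked from the plotted points. The following is a minimal sketch, assuming the costs are in thousands of normalised Amazon hours as read off the chart; the class and method names are illustrative and do not come from the talk.

```java
// Illustrative helper (not from the talk): recomputes the savings quoted on the chart.
final class MahoutSavings {

    private MahoutSavings() {}

    // Size of the change from `before` to `after`, as a rounded percentage of `before`.
    static long percentChange(double before, double after) {
        return Math.round(Math.abs(after - before) / before * 100.0);
    }

    public static void main(String[] args) {
        // Cost in thousands of normalised Amazon hours, as plotted.
        double origItem = 6.5;
        double custItem = 2.4;
        double origUser = 1.0;
        double custUser = 0.3;
        // Good recommendations out of 10, as plotted.
        double itemQuality = 1.5;
        double userQuality = 2.5;

        System.out.println("Item-based tuning saves " + percentChange(origItem, custItem) + "% of hours");
        System.out.println("Switching to user-based saves " + percentChange(custItem, origUser) + "% of hours");
        System.out.println("User-based tuning saves " + percentChange(origUser, custUser) + "% of hours");
        System.out.println("Overall saving is " + percentChange(origItem, custUser) + "% of hours");
        System.out.println("Quality gain is " + percentChange(itemQuality, userQuality) + "%");
    }
}
```

Each step is measured against the point immediately before it, which is why the individual percentages do not add up to the overall figure.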
Mahout as the Computation
Layer
➔
    Out of the box, didn't work so well for us
➔
    Needed to understand Hadoop better
➔
    Contributed patch back to community (user-user)

➔
    Next step, the serving layer...
Serving Layer
Architecture
[Diagram: Computation Layer. Components: User Libraries, Cascading, Mendeley Hadoop Cluster.]
Architecture
[Diagram: Computation Layer plus Serving Layer. Computation Layer components: User Libraries, Map Reduce, Mendeley Hadoop Cluster. Serving Layer components, hosted on AWS: DynamoDB, Elastic Beanstalk (drawn as several instances).]
Technologies

➔
    Spring dependency injection framework
    ➔
        Context-wide integration testing is easy, including pre-loading
        of test data
    ➔
        Allows other Spring features (cache, security, messaging)
➔
    Spring MVC 3.2.M1
    ➔
        Annotated controllers, type conversion 'for free'
    ➔
        Asynchronous Servlet 3.0 supports thread 'parking'
➔
    AlternatorDB
    ➔
        In-memory DynamoDB implementation for testing
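The idea behind testing against an in-memory store can be sketched in plain Java. This sketch does not use AlternatorDB or the AWS SDK, and AlternatorDB itself stands in for DynamoDB at the API level rather than through an application interface. The `RecommendationStore` interface and every other name below are hypothetical, chosen only to show how tests can run without a remote database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical storage abstraction; the talk does not show the real one.
interface RecommendationStore {
    void put(String userId, List<String> articleIds);

    List<String> get(String userId);
}

// In-memory implementation for tests. Production code would back the
// same interface with a DynamoDB client instead of a HashMap.
final class InMemoryRecommendationStore implements RecommendationStore {
    private final Map<String, List<String>> rows = new HashMap<>();

    @Override
    public void put(String userId, List<String> articleIds) {
        // Copy the list so later changes by the caller do not leak into the store.
        rows.put(userId, new ArrayList<>(articleIds));
    }

    @Override
    public List<String> get(String userId) {
        List<String> found = rows.get(userId);
        if (found == null) {
            return Collections.emptyList();
        }
        return Collections.unmodifiableList(found);
    }
}

final class StoreDemo {
    public static void main(String[] args) {
        RecommendationStore store = new InMemoryRecommendationStore();
        // Pre-load test data, then read it back.
        store.put("user-1", Arrays.asList("article-a", "article-b"));
        System.out.println(store.get("user-1"));
        System.out.println(store.get("unknown-user"));
    }
}
```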
Technologies


                                   Recommendation<K>




              LongRecommendation                         UuidRecommendation



GroupRecommendation       PersonRecommendation         DocumentRecommendation




➔
    Build once, employ in several use cases
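The class diagram above can be sketched as a small generic hierarchy. Only the six class names come from the slide. The `score` field, the `best` helper, and the choice of which leaf class extends which key type (numeric ids for groups and people, UUIDs for documents) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Base type from the slide: a recommendation keyed by an identifier of type K.
// The score field is an assumption added for illustration.
abstract class Recommendation<K> {
    private final K itemId;
    private final double score;

    protected Recommendation(K itemId, double score) {
        this.itemId = itemId;
        this.score = score;
    }

    K getItemId() { return itemId; }

    double getScore() { return score; }
}

class LongRecommendation extends Recommendation<Long> {
    LongRecommendation(Long itemId, double score) { super(itemId, score); }
}

class UuidRecommendation extends Recommendation<UUID> {
    UuidRecommendation(UUID itemId, double score) { super(itemId, score); }
}

// Assumption: groups and people use numeric ids, documents use UUIDs.
final class GroupRecommendation extends LongRecommendation {
    GroupRecommendation(Long groupId, double score) { super(groupId, score); }
}

final class PersonRecommendation extends LongRecommendation {
    PersonRecommendation(Long personId, double score) { super(personId, score); }
}

final class DocumentRecommendation extends UuidRecommendation {
    DocumentRecommendation(UUID documentId, double score) { super(documentId, score); }
}

// Shared logic written once against the base type and reused for every use case.
final class Recommendations {
    private Recommendations() {}

    // Returns the highest-scoring recommendation, or null for an empty list.
    static <R extends Recommendation<?>> R best(List<R> candidates) {
        R top = null;
        for (R candidate : candidates) {
            if (top == null || candidate.getScore() > top.getScore()) {
                top = candidate;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<PersonRecommendation> people = Arrays.asList(
                new PersonRecommendation(7L, 0.4),
                new PersonRecommendation(9L, 0.9));
        List<DocumentRecommendation> documents = Arrays.asList(
                new DocumentRecommendation(UUID.randomUUID(), 0.2));
        System.out.println(best(people).getItemId());
        System.out.println(best(documents).getItemId());
    }
}
```

The same `best` method serves people, groups and documents without any per-type code, which is one way to read "build once, employ in several use cases".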
Deployment

➔
    AWS ElasticBeanstalk
    ➔
        Managed, auto-scaling, health-checking .war container
➔
    Jenkins continuous integration (CI) server
➔
    Maven build tool (useful dependency management)
➔
    beanstalk-maven-plugin (push a button to deploy)
    ➔
        Deploys to ElasticBeanstalk
    ➔
        Replaces existing application version if required
    ➔
        'Zero downtime' updates (tested at ~300ms)
    ➔
        Triggered by Jenkins
Putting it all together... $$$
➔
    Real-time article recommendations for 2 million users
➔
    20 requests per second
➔
    $65.84/month
    ➔
        $34.24 ElasticBeanstalk
    ➔
        $28.17 DynamoDB
    ➔
        $2.76 bandwidth
➔
    $30 to update the computation layer periodically
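The figures above can be turned into a cost per request. The sketch below assumes a constant 20 requests per second over a 30-day month, which the slide does not state; the class and method names are illustrative. Amounts are held in cents to avoid floating-point rounding when summing. The three itemised lines add up to $65.17, which is $0.67 less than the quoted $65.84, and the slide does not say what accounts for the difference.

```java
// Illustrative arithmetic on the figures quoted above (names are not from the talk).
final class ServingCost {
    static final long BEANSTALK_CENTS = 3424;    // $34.24 ElasticBeanstalk
    static final long DYNAMODB_CENTS = 2817;     // $28.17 DynamoDB
    static final long BANDWIDTH_CENTS = 276;     // $2.76 bandwidth
    static final long QUOTED_TOTAL_CENTS = 6584; // $65.84/month as quoted

    private ServingCost() {}

    // Sum of the three itemised lines.
    static long itemisedTotalCents() {
        return BEANSTALK_CENTS + DYNAMODB_CENTS + BANDWIDTH_CENTS;
    }

    // Requests served in `days` days at a constant rate (an assumption).
    static long requestsPerMonth(long perSecond, int days) {
        return perSecond * 60L * 60L * 24L * days;
    }

    // Dollars spent per million requests.
    static double dollarsPerMillionRequests(long totalCents, long requests) {
        return (totalCents / 100.0) / (requests / 1_000_000.0);
    }

    public static void main(String[] args) {
        long requests = requestsPerMonth(20, 30);
        System.out.println("Itemised total (cents): " + itemisedTotalCents());
        System.out.println("Requests per month: " + requests);
        System.out.println("Dollars per million requests: "
                + dollarsPerMillionRequests(QUOTED_TOTAL_CENTS, requests));
    }
}
```

Under these assumptions the serving layer handles about 51.8 million requests a month, at roughly $1.27 per million requests.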
Conclusions
➔
    Mendeley Suggest is a personalised article recommender
➔
    Built by small team for big data
➔
    Uses Mahout as computation layer
    ➔
        Needs some love out of the box
➔
    Serves from AWS
    ➔
        Reduces maintenance costs and is reliable
➔
    Intend to release Mendeley Suggest to all users this year
We're Hiring!
➔
    Data Scientist
    ➔
        apply recommender technologies to Mendeley's data
    ➔
        work on improving the quality of Mendeley's research catalogue
    ➔
        starting in first quarter of 2013
    ➔
        6 month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7
        TEAM project (http://team-project.tugraz.at/)
➔
    http://www.mendeley.com/careers/
www.mendeley.com

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Mendeley Suggest: Engineering a Personalised Article Recommender System

  • 1. Mendeley Suggest: Engineering a Personalised Article Recommender System Kris Jack, PhD Chief Data Scientist https://twitter.com/_krisjack
  • 2. Overview ➔ What's Mendeley? ➔ What's Mendeley Suggest? ➔ Computation Layer ➔ Serving Layer ➔ Architecture ➔ Technologies ➔ Deployment ➔ Conclusions
  • 4. Mendeley is a platform that connects researchers, research data and apps Mendeley Open API
  • 5. Mendeley is a platform that connects researchers, research data and apps Mendeley Open API ➔ Startup company with ~20 R&D engineers
  • 6. What's Mendeley Suggest?
  • 7. Use Case ➔ Good researchers are on top of their game ➔ Difficult with the amount being produced ➔ There must be a technology that can help ➔ Help researchers by recommending relevant research
  • 9. Computation Layer
  • 13. Running on Amazon's Elastic Map Reduce ➔ On-demand use and easy to cost
  • 14-18. Computation Layer: Mahout's Performance (1.5M Users, 50M Articles). [Chart: Normalised Amazon Hours, 0 to 7K (y axis), against No. Good Recommendations/10, 0 to 3 (x axis); quadrants labelled Costly & Bad, Costly & Good, Cheap & Bad, Cheap & Good]
  • 19. ➔ Orig. item-based: 6.5K hours, 1.5 good recommendations
  • 20-22. ➔ Cust. item-based: 2.4K hours, 1.5 good recommendations; -4.1K hours (63%) against orig. item-based, annotated "Partitioners" and "MR allocation"
  • 23-24. ➔ Orig. user-based: 1K hours, 2.5 good recommendations; -1.4K hours (58%) and +1 good recommendation (67%) against cust. item-based
  • 25-26. ➔ Cust. user-based: 0.3K hours, 2.5 good recommendations; -0.7K hours (70%) against orig. user-based
  • 27. ➔ Overall, orig. item-based to cust. user-based: -6.2K hours (95%) and +1 good recommendation (67%)
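The percentages annotated on the chart follow directly from the four plotted points. As a quick check, the sketch below recomputes them in Python. The dictionary keys and the `compare` helper are illustrative names, not anything from the slides; the numbers are the ones shown on the chart.

```python
# (normalised Amazon hours, good recommendations out of 10), as read off the chart
RESULTS = {
    "orig_item": (6500, 1.5),
    "cust_item": (2400, 1.5),
    "orig_user": (1000, 2.5),
    "cust_user": (300, 2.5),
}


def compare(before, after):
    """Return (hours saved, % hours saved, recs gained, % recs gained)."""
    h0, r0 = RESULTS[before]
    h1, r1 = RESULTS[after]
    hours_saved = h0 - h1
    recs_gained = r1 - r0
    return (
        hours_saved,
        round(100 * hours_saved / h0),
        recs_gained,
        round(100 * recs_gained / r0),
    )


if __name__ == "__main__":
    steps = [
        ("orig_item", "cust_item"),
        ("cust_item", "orig_user"),
        ("orig_user", "cust_user"),
        ("orig_item", "cust_user"),
    ]
    for before, after in steps:
        print(before, "->", after, compare(before, after))
```

The cost percentages are relative to the "before" configuration in each step, which is why the last step (orig. item-based to cust. user-based) is not the sum of the earlier ones.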
  • 28. Mahout as the Computation Layer ➔ Out of the box, didn't work so well for us ➔ Needed to understand Hadoop better ➔ Contributed patch back to community (user-user) ➔ Next step, the serving layer...
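For readers unfamiliar with the user-based approach the slides compare against item-based, here is a toy, single-machine sketch in Python. It is not Mahout's implementation and not Mendeley's customised job: the Tanimoto (Jaccard) similarity, the neighbourhood size `k`, and the sample libraries are all assumptions chosen for illustration.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Overlap between two sets of article ids: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(libraries, user, k=2, n=10):
    """Suggest up to n articles held by the k users most similar to `user`."""
    mine = libraries[user]
    scored = [
        (tanimoto(mine, lib), other)
        for other, lib in libraries.items()
        if other != user
    ]
    # Keep only neighbours that share at least one article with the user.
    neighbours = sorted((p for p in scored if p[0] > 0), reverse=True)[:k]
    scores = defaultdict(float)
    for similarity, other in neighbours:
        for article in libraries[other] - mine:
            scores[article] += similarity
    # Highest score first; ties broken by article id for a stable order.
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


# Hypothetical user libraries (user id -> set of article ids).
libraries = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a2", "a3", "a5", "a6"},
    "dave": {"a7"},
}

if __name__ == "__main__":
    print(recommend(libraries, "alice"))
```

At the scale quoted on the slides (1.5M users, 50M articles) the pairwise comparison above would not be practical on one machine, which is where a distributed job on Hadoop comes in.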
  • 30. Architecture [diagram] ➔ Computation Layer: Mendeley Hadoop Cluster, Cascading, User Libraries
  • 31. Architecture [diagram] ➔ Serving Layer (AWS): Elastic Beanstalk instances, DynamoDB ➔ Computation Layer: Mendeley Hadoop Cluster, Map Reduce, User Libraries
  • 32. Technologies ➔ Spring dependency injection framework ➔ Context-wide integration testing is easy, including pre-loading of test data ➔ Allows other Spring features (cache, security, messaging) ➔ Spring MVC 3.2.M1 ➔ Annotated controllers, type conversion 'for free' ➔ Asynchronous Servlet 3.0 supports thread 'parking' ➔ AlternatorDB ➔ In-memory DynamoDB implementation for testing
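The testing idea behind AlternatorDB (swap the real DynamoDB for an in-memory implementation during tests) can be sketched independently of the Java and Spring stack named on the slide. The Python below is only an illustration of that pattern: `RecommendationStore`, `InMemoryStore` and `serve` are hypothetical names, and the assumption that the serving layer looks up precomputed article ids by user id is an inference from the architecture slides, not something they state.

```python
from typing import Dict, List, Protocol


class RecommendationStore(Protocol):
    """What the serving code needs from its key-value store (hypothetical)."""

    def put(self, user_id: str, article_ids: List[str]) -> None: ...

    def get(self, user_id: str) -> List[str]: ...


class InMemoryStore:
    """Test stand-in that keeps rows in a dict instead of a remote table."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = {}

    def put(self, user_id: str, article_ids: List[str]) -> None:
        self._rows[user_id] = list(article_ids)

    def get(self, user_id: str) -> List[str]:
        # Unknown users get an empty list rather than an error.
        return list(self._rows.get(user_id, []))


def serve(store: RecommendationStore, user_id: str, limit: int = 10) -> List[str]:
    """Return at most `limit` precomputed recommendations for a user."""
    return store.get(user_id)[:limit]


if __name__ == "__main__":
    store = InMemoryStore()
    store.put("user-1", ["doc-a", "doc-b", "doc-c"])  # pre-loaded test data
    print(serve(store, "user-1", limit=2))
```

Because `serve` depends only on the protocol, the same function can run against a real table in production and against `InMemoryStore` in an integration test.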
  • 33. Technologies Recommendation<K> LongRecommendation UuidRecommendation GroupRecommendation PersonRecommendation DocumentRecommendation ➔ Build once, employ in several use cases
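The slide lists a generic `Recommendation<K>` with several specialisations but does not show how they relate. The Python sketch below mirrors the class names from the slide; everything else is an assumption made for illustration, including the `item_id` and `score` fields, the `top` helper, and which concrete class uses a numeric key and which a UUID key.

```python
from dataclasses import dataclass
from typing import Generic, List, Sequence, TypeVar
from uuid import UUID

K = TypeVar("K")


@dataclass(frozen=True)
class Recommendation(Generic[K]):
    """A recommended item identified by a key of type K (fields are assumed)."""

    item_id: K
    score: float


class LongRecommendation(Recommendation[int]):
    """Recommendation keyed by an integer id."""


class UuidRecommendation(Recommendation[UUID]):
    """Recommendation keyed by a UUID."""


# Assumed placement of the use-case classes; the slide does not specify it.
class DocumentRecommendation(UuidRecommendation):
    pass


class PersonRecommendation(LongRecommendation):
    pass


class GroupRecommendation(LongRecommendation):
    pass


def top(recs: Sequence[Recommendation[K]], n: int) -> List[Recommendation[K]]:
    """Shared logic written once and reused for every recommendation type."""
    return sorted(recs, key=lambda r: r.score, reverse=True)[:n]


if __name__ == "__main__":
    docs = [
        DocumentRecommendation(UUID(int=1), 0.2),
        DocumentRecommendation(UUID(int=2), 0.9),
    ]
    people = [PersonRecommendation(7, 0.5), PersonRecommendation(8, 0.1)]
    print(top(docs, 1), top(people, 1))
```

This is one way to read "build once, employ in several use cases": ranking, serialisation and similar code is written against the generic base, and each use case only supplies its key type.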
  • 34. Deployment ➔ AWS ElasticBeanstalk ➔ Managed, auto-scaling, health-checking .war container ➔ Jenkins continuous integration (CI) server ➔ Maven build tool (useful dependency management) ➔ beanstalk-maven-plugin (push a button to deploy) ➔ Deploys to ElasticBeanstalk ➔ Replaces existing application version if required ➔ 'Zero downtime' updates (tested at ~300ms) ➔ Triggered by Jenkins
  • 35. Putting it all together... $$$ ➔ Real-time article recommendations for 2 million users ➔ 20 requests per second ➔ $65.84/month ➔ $34.24 ElasticBeanstalk ➔ $28.17 DynamoDB ➔ $2.76 bandwidth ➔ $30 to update the computation layer periodically
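To put the serving figures in per-user and per-request terms, the sketch below works through the arithmetic in Python. Two assumptions are mine rather than the slide's: that 20 requests per second is sustained around the clock, and that a month is 30 days. Note also that the three itemised lines sum to $65.17, slightly under the stated $65.84; the derived figures use the stated total.

```python
from decimal import Decimal

STATED_TOTAL = Decimal("65.84")  # USD per month, as stated on the slide
ITEMS = {
    "ElasticBeanstalk": Decimal("34.24"),
    "DynamoDB": Decimal("28.17"),
    "bandwidth": Decimal("2.76"),
}
USERS = 2_000_000
REQUESTS_PER_SECOND = 20
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month

# Sum of the itemised lines (65.17), shown next to the stated total.
itemised = sum(ITEMS.values(), Decimal("0"))

# Assumption: the request rate is sustained for the whole month.
requests_per_month = REQUESTS_PER_SECOND * SECONDS_PER_MONTH

cost_per_user = STATED_TOTAL / USERS
cost_per_million_requests = STATED_TOTAL / requests_per_month * 1_000_000

if __name__ == "__main__":
    print("itemised sum:", itemised)
    print("requests per month:", requests_per_month)
    print("USD per user per month:", cost_per_user)
    print("USD per million requests:", round(cost_per_million_requests, 2))
```

If the 20 requests per second is a peak rather than an average, the cost per million requests would be correspondingly higher.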
  • 37. Conclusions ➔ Mendeley Suggest is a personalised article recommender ➔ Built by small team for big data ➔ Uses Mahout as computation layer ➔ Needs some love out of the box ➔ Serves from AWS ➔ Reduces maintenance costs and is reliable ➔ Intend to release Mendeley Suggest to all users this year
  • 38. We're Hiring! ➔ Data Scientist ➔ apply recommender technologies to Mendeley's data ➔ work on improving the quality of Mendeley's research catalogue ➔ starting in first quarter of 2013 ➔ 6 month secondment in KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/) ➔ http://www.mendeley.com/careers/