Mendeley Suggest is a personalised article recommender system built by Mendeley to help researchers stay on top of new research. It uses Mahout as the computation layer to generate recommendations, running on Amazon's Elastic MapReduce, and serves them using AWS services such as Elastic Beanstalk and DynamoDB. Mahout's out-of-the-box performance was improved by customising the algorithms and partitioning the data, giving better performance at lower cost. The system can now serve recommendations for 2 million users for $65.84 per month.
7. Use Case
➔ Good researchers are on top of their game
➔ Difficult, given the amount of research being produced
➔ There must be a technology that can help
➔ Help researchers by recommending relevant research
28. Mahout as the Computation Layer
➔ Out of the box, didn't work so well for us
➔ Needed to understand Hadoop better
➔ Contributed patch back to community (user-user)
➔ Next step, the serving layer...
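The user-user approach mentioned on this slide can be illustrated with a toy sketch. This is plain Java, not Mahout's API and not the contributed patch: the class name `UserUserSketch`, the choice of Jaccard similarity, and the sample libraries are all assumptions made for illustration. It runs in memory on a handful of users, whereas the real computation layer runs on Hadoop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy user-user recommender over binary "user has document in library" data (hypothetical). */
final class UserUserSketch {

    /** Jaccard similarity: |A intersect B| / |A union B|; 0 when both sets are empty. */
    static double jaccard(Set<Long> a, Set<Long> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    /** Scores each unseen document by summing the similarity of the other users who hold it. */
    static List<Long> recommend(Map<String, Set<Long>> libraries, String user, int howMany) {
        Set<Long> own = libraries.getOrDefault(user, Set.of());
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<Long>> other : libraries.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double sim = jaccard(own, other.getValue());
            if (sim == 0.0) continue; // dissimilar users contribute nothing
            for (Long doc : other.getValue()) {
                if (!own.contains(doc)) scores.merge(doc, sim, Double::sum);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by document id so the order is deterministic.
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : Long.compare(x, y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> libraries = new HashMap<>();
        libraries.put("alice", Set.of(1L, 2L, 3L));
        libraries.put("bob", Set.of(1L, 2L, 4L));
        libraries.put("carol", Set.of(3L, 5L));
        System.out.println(recommend(libraries, "alice", 2)); // prints [4, 5]
    }
}
```

In the sample data, bob shares two of alice's three documents (similarity 0.5) and carol shares one (similarity 0.25), so bob's unseen document 4 ranks above carol's document 5.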
32. Technologies
➔ Spring dependency injection framework
➔ Context-wide integration testing is easy, including pre-loading of test data
➔ Allows other Spring features (cache, security, messaging)
➔ Spring MVC 3.2.M1
➔ Annotated controllers, type conversion 'for free'
➔ Asynchronous Servlet 3.0 supports thread 'parking'
➔ AlternatorDB
➔ In-memory DynamoDB implementation for testing
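The idea of testing against an in-memory store with pre-loaded data can be sketched in plain Java. This does not use AlternatorDB, DynamoDB, or Spring; the interface `RecommendationStore` and its in-memory implementation are hypothetical names, shown only to illustrate swapping a remote store for a local stand-in during tests.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage abstraction; a production implementation would talk to DynamoDB. */
interface RecommendationStore {
    void put(String userId, List<Long> documentIds);
    List<Long> get(String userId);
}

/** In-memory stand-in for tests, so no network access or AWS credentials are needed. */
final class InMemoryRecommendationStore implements RecommendationStore {
    private final Map<String, List<Long>> rows = new ConcurrentHashMap<>();

    @Override
    public void put(String userId, List<Long> documentIds) {
        rows.put(userId, List.copyOf(documentIds)); // defensive, immutable copy
    }

    @Override
    public List<Long> get(String userId) {
        return rows.getOrDefault(userId, List.of()); // unknown user: empty list
    }
}

final class StoreDemo {
    public static void main(String[] args) {
        RecommendationStore store = new InMemoryRecommendationStore();
        store.put("user-1", List.of(42L, 7L)); // pre-load test data
        System.out.println(store.get("user-1")); // prints [42, 7]
        System.out.println(store.get("nobody")); // prints []
    }
}
```

Code written against the interface does not need to know which implementation it receives, which is the property a dependency injection framework relies on when it wires in a test double.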
33. Technologies
[Class diagram: Recommendation<K>; LongRecommendation, UuidRecommendation; GroupRecommendation, PersonRecommendation, DocumentRecommendation]
➔ Build once, employ in several use cases
34. Deployment
➔ AWS ElasticBeanstalk
➔ Managed, auto-scaling, health-checking .war container
➔ Jenkins continuous integration (CI) server
➔ Maven build tool (useful dependency management)
➔ beanstalk-maven-plugin (push a button to deploy)
➔ Deploys to ElasticBeanstalk
➔ Replaces existing application version if required
➔ 'Zero downtime' updates (tested at ~300ms)
➔ Triggered by Jenkins
35. Putting it all together... $$$
➔ Real-time article recommendations for 2 million users
➔ 20 requests per second
➔ $65.84/month
➔ $34.24 ElasticBeanstalk
➔ $28.17 DynamoDB
➔ $2.76 bandwidth
➔ $30 to update the computation layer periodically
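To put the monthly figure in perspective, the arithmetic below derives a cost per user and a cost per million requests. It assumes a 30-day month and that 20 requests per second is a sustained rate; neither is stated on the slide, and the class name `CostSketch` is hypothetical. The three itemised figures add up to $65.17, slightly under the stated total, so the sketch takes the $65.84 headline as given.

```java
import java.util.Locale;

/** Back-of-the-envelope figures derived from the slide's serving cost (assumptions noted). */
final class CostSketch {
    static final double MONTHLY_SERVING_COST_USD = 65.84; // from the slide
    static final long USERS = 2_000_000L;                 // from the slide
    static final long REQUESTS_PER_SECOND = 20L;          // from the slide, assumed sustained
    static final long DAYS_PER_MONTH = 30L;               // assumption

    /** Monthly serving cost divided evenly across all users. */
    static double costPerUser() {
        return MONTHLY_SERVING_COST_USD / USERS;
    }

    /** Requests served in one month at the sustained rate. */
    static long requestsPerMonth() {
        return REQUESTS_PER_SECOND * 60 * 60 * 24 * DAYS_PER_MONTH;
    }

    /** Monthly serving cost per one million requests. */
    static double costPerMillionRequests() {
        return MONTHLY_SERVING_COST_USD / (requestsPerMonth() / 1_000_000.0);
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
                "requests/month: %d", requestsPerMonth()));
        System.out.println(String.format(Locale.ROOT,
                "USD per user per month: %.8f", costPerUser()));
        System.out.println(String.format(Locale.ROOT,
                "USD per million requests: %.2f", costPerMillionRequests()));
    }
}
```

Under these assumptions the system handles 51,840,000 requests a month, at roughly $0.00003 per user and about $1.27 per million requests.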
37. Conclusions
➔ Mendeley Suggest is a personalised article recommender
➔ Built by small team for big data
➔ Uses Mahout as computation layer
➔ Needs some love out of the box
➔ Serves from AWS
➔ Reduces maintenance costs and is reliable
➔ Intend to release Mendeley Suggest to all users this year
38. We're Hiring!
➔ Data Scientist
➔ apply recommender technologies to Mendeley's data
➔ work on improving the quality of Mendeley's research catalogue
➔ starting in first quarter of 2013
➔ 6-month secondment at KNOW Center, TU Graz, Austria as part of the EC FP7 TEAM project (http://team-project.tugraz.at/)
➔ http://www.mendeley.com/careers/