Amazon Web Services at Mendeley
Dan Harvey
Data Architect



twitter: @danharvey
dan.harvey@mendeley.com
Overview
• What do we do?
• System design
• AWS details
• Future plans
• Summary
Mendeley helps researchers work smarter
1) Install Mendeley Desktop
2) Manage your research papers
 –Automatic data extraction
 –External database integration
 –Automatic bibliography generation
 –Tagging and annotation
3) Mendeley aggregates research data in the cloud
By doing this, Mendeley makes science more
collaborative and transparent
Mendeley in numbers
• 1 million users

• 130 million research articles
• 40 million unique articles

• 14 million unique files uploaded
• 13 TB in total
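For a sense of scale, these totals imply an average stored file of just under 1 MB. The arithmetic below makes two assumptions the slide does not state: decimal units (1 TB = 10^12 bytes), and that the 13 TB total is spread across the 14 million unique uploaded files.

```python
# Back-of-envelope check on the storage figures from the slide.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes), and the
# 13 TB total covers the 14 million unique uploaded files.
TOTAL_BYTES = 13 * 10**12
UNIQUE_FILES = 14 * 10**6

average_bytes = TOTAL_BYTES / UNIQUE_FILES
average_mb = average_bytes / 10**6

print(f"average file size: {average_mb:.2f} MB")
```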
System Overview
[Figure: system overview diagram showing web servers (syncing, browsing), docs, three MySQL databases, usage logs, data services (MapReduce, HBase, HDFS) and Amazon Web Services (S3, EC2, EMR)]
File Storage
• Sync to and from clients
 –Backed by S3

• How do we render 13 TB of PDFs?
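One common way to sync files between clients and an object store such as S3 is content addressing. The client hashes each file and uploads only the hashes the server has never seen, so a count like "14 million unique files" falls out naturally. The sketch below assumes SHA-1 as the content hash and uses an in-memory dict in place of the bucket. `ContentStore`, `missing_hashes` and `sync_client_files` are hypothetical names, not Mendeley's actual sync protocol.

```python
import hashlib
from typing import Dict, List


class ContentStore:
    """In-memory stand-in for an S3 bucket whose keys are content hashes (hypothetical)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def missing_hashes(self, hashes: List[str]) -> List[str]:
        # Server side: report which hashes have never been uploaded by anyone.
        return [h for h in hashes if h not in self._objects]

    def upload(self, data: bytes) -> str:
        # Key by content, so identical files from different users share one object.
        key = hashlib.sha1(data).hexdigest()
        self._objects.setdefault(key, data)
        return key

    def __len__(self) -> int:
        return len(self._objects)


def sync_client_files(store: ContentStore, files: List[bytes]) -> int:
    """Upload only the files the store lacks; return the number of files sent."""
    by_hash = {hashlib.sha1(data).hexdigest(): data for data in files}
    to_send = store.missing_hashes(list(by_hash))
    for h in to_send:
        store.upload(by_hash[h])
    return len(to_send)
```

With this scheme, two users who sync the same paper store it only once, and only the first upload pays the transfer cost.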
PDF Previews
• Elastic Beanstalk
• Java servlet
 –Load & render
 –Store into S3
• Quick to prototype
 –Fast iterations
 –No infrastructure to set up
 –Developers in control
 –No up-front hardware cost
• No dependency on the rest of our infrastructure
© Elastic Beanstalk, Matt Wood, AWS, 2011
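The slide describes a servlet that loads a PDF, renders it, and stores the result in S3. Below is a minimal Python sketch of that get-or-render pattern. A dict stands in for the S3 bucket, and a caller-supplied `render_page` function stands in for the real PDF renderer. The key layout `previews/<file hash>/<page>.png` and all function names are hypothetical.

```python
from typing import Callable, MutableMapping

# Hypothetical renderer signature: (file_hash, page_number) -> image bytes
RenderFn = Callable[[str, int], bytes]


def preview_key(file_hash: str, page: int) -> str:
    # Hypothetical key layout: one object per rendered page of each unique file.
    return f"previews/{file_hash}/{page}.png"


def get_preview(
    file_hash: str,
    page: int,
    bucket: MutableMapping[str, bytes],
    render_page: RenderFn,
) -> bytes:
    """Return a stored preview, rendering and storing it on the first request."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    key = preview_key(file_hash, page)
    cached = bucket.get(key)
    if cached is not None:
        return cached
    image = render_page(file_hash, page)  # the expensive step: load and render the PDF
    bucket[key] = image                   # store into the (stand-in) bucket
    return image
```

Because each page is rendered once and then served from storage, the renderer needs no state of its own. That keeps it independent of the rest of the infrastructure, as the slide notes.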
Adapt to take advantage
• Improve delivery
 –CloudFront
 –Faster worldwide

• Reworking for cost savings
 –SQS
 –Spot instances
 –Render when it’s cheapest!
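Moving rendering onto SQS and Spot Instances turns it into a queue of jobs that workers drain whenever capacity is cheap. If a Spot Instance is reclaimed mid-job, its undeleted message becomes visible again and another worker retries it. The sketch below assumes boto3-style `sqs` and `s3` clients; `receive_message`, `delete_message` and `put_object` are real boto3 methods. The queue URL, bucket name, JSON message format and `render_page` callback are hypothetical.

```python
import json
from typing import Any, Callable

RenderFn = Callable[[str, int], bytes]


def drain_render_queue(sqs: Any, s3: Any, queue_url: str, bucket: str,
                       render_page: RenderFn, max_batches: int = 10) -> int:
    """Process render jobs from SQS; return the number of pages rendered.

    Assumed (hypothetical) message body: {"file_hash": "...", "page": 1}
    """
    rendered = 0
    for _ in range(max_batches):
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 messages per call
            WaitTimeSeconds=20,      # long polling: wait for work instead of spinning
        )
        messages = resp.get("Messages", [])
        if not messages:
            break  # queue is empty, so this worker (or Spot Instance) can stop
        for msg in messages:
            job = json.loads(msg["Body"])
            image = render_page(job["file_hash"], job["page"])
            s3.put_object(
                Bucket=bucket,
                Key=f"previews/{job['file_hash']}/{job['page']}.png",
                Body=image,
                ContentType="image/png",
            )
            # Delete only after the result is stored: if the instance is reclaimed
            # before this line, the message reappears after its visibility timeout.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
            rendered += 1
    return rendered
```

Deleting the message only after the upload gives at-least-once processing. A page may occasionally be rendered twice, which is harmless because the second write overwrites the first with the same image.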
Article Search
• 40 million papers
• Giving a 40 GB Solr index

• Variable load
[Chart: twofold variance in traffic over a week]

• Moved to EC2
 –Elastic Load Balancer
 –Auto-scale instances
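With a roughly twofold swing in weekly traffic, the number of search instances behind the load balancer can follow demand instead of being sized for the peak. The function below sketches the target-capacity arithmetic an auto-scaling policy encodes. The per-instance throughput, the min/max group bounds and the example request rates are hypothetical parameters, not figures from the talk.

```python
import math


def desired_instances(requests_per_sec: float, capacity_per_instance: float,
                      min_instances: int = 2, max_instances: int = 20) -> int:
    """Instances needed to serve the load, clamped to the group's min/max size."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical numbers: a weekly trough of 100 req/s and a peak of twice that,
# with each search instance assumed to handle 40 req/s.
print(desired_instances(100, 40), desired_instances(200, 40))
```

The lower bound keeps some redundancy during quiet periods. The upper bound caps spend if traffic spikes unexpectedly.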
Solr Instance Layout
• Master
 –Single instance
 –Matched to indexing load
 –Backed by EBS
• Slaves
 –HTTP sync to master
 –Pre-built AMI images
 –EC2 auto scaling
[Diagram: one Solr master, three Solr slaves and an Elastic Load Balancer]
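Solr's ReplicationHandler provides the master/slave sync over HTTP. A slave checks the master's index version and, when it differs from its own, pulls the changed index files. Normally each slave is simply configured with `masterUrl` and `pollInterval`. The same check can also be driven externally with the `indexversion` and `fetchindex` replication commands, as sketched below. The sketch assumes both nodes answer `indexversion` with a JSON `indexversion` field. The host names are hypothetical, and `fetch_json` is injected so the logic can run without a live Solr.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlencode

FetchJson = Callable[[str], Dict[str, Any]]


def replication_url(core_url: str, command: str) -> str:
    # e.g. http://solr-master:8983/solr/core/replication?command=indexversion&wt=json
    query = urlencode({"command": command, "wt": "json"})
    return f"{core_url.rstrip('/')}/replication?{query}"


def sync_slave_if_stale(master_url: str, slave_url: str, fetch_json: FetchJson) -> bool:
    """Ask the slave to fetch the index when its version differs from the master's.

    Returns True if a fetch was requested.
    """
    master = fetch_json(replication_url(master_url, "indexversion"))
    slave = fetch_json(replication_url(slave_url, "indexversion"))
    if master["indexversion"] == slave["indexversion"]:
        return False
    # The slave then pulls the changed index files from its configured master.
    fetch_json(replication_url(slave_url, "fetchindex"))
    return True
```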
Desktop Client
• Client Downloads
 –From S3
 –Adding CloudFront


• Crash Reports
 –Stack traces into S3
 –Analytic reports on top
 –More focused bug fixing
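Once stack traces land in S3, one simple way to get more focused bug fixing out of them is to group reports by a signature built from the top few frames, then rank the groups by count. The sketch below assumes each report is plain text, with the exception line first and one frame per following line. The format, the choice of three frames and the function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, List, Tuple


def crash_signature(trace: str, top_frames: int = 3) -> str:
    """Fingerprint a stack trace by its exception type and its top frames."""
    lines = [line.strip() for line in trace.strip().splitlines() if line.strip()]
    if not lines:
        return "empty"
    exception_type = lines[0].split(":", 1)[0]  # drop the message, which often varies
    frames = lines[1 : 1 + top_frames]
    digest = hashlib.sha1("\n".join([exception_type, *frames]).encode()).hexdigest()
    return digest[:12]


def top_crashes(traces: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent crash signatures with their report counts."""
    return Counter(crash_signature(t) for t in traces).most_common(n)
```

Ignoring the exception message and the deeper frames lets reports of the same bug collapse into one group, even when they come from different call paths or data.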
The future
• Aim to buy no more hardware

• More Java on Elastic Beanstalk
• SQS - replace our existing queues

• EMR - log analysis
• SimpleDB & S3 for data stores
Problems Faced
• Accounting for usage
 –Mix of users on one account
 –Start early with this!
 –IAM helps

• Orchestration
 –CloudFormation
 –Elastic Beanstalk
 –Finding we need more
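On a shared account, usage is easiest to account for when each team or service acts through its own IAM user or role, because the principal then identifies who incurred each charge. The sketch below rolls usage records up to teams. The record shape, the ARN-to-team mapping and the cost figures are all hypothetical. Anything without a mapping is reported as unattributed rather than silently dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical usage record: (IAM principal ARN, cost in USD)
UsageRecord = Tuple[str, float]


def cost_by_team(records: Iterable[UsageRecord],
                 principal_to_team: Mapping[str, str]) -> Dict[str, float]:
    """Sum cost per team; principals with no team mapping go to 'unattributed'."""
    totals: Dict[str, float] = defaultdict(float)
    for principal_arn, cost in records:
        team = principal_to_team.get(principal_arn, "unattributed")
        totals[team] += cost
    return dict(totals)
```

A large "unattributed" bucket is the symptom of setting this up late. Creating per-team principals from the start keeps that bucket small.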
Summary
• Not all-or-nothing

• Focus on your problem
       not “Undifferentiated heavy lifting”
                                  - Werner Vogels


• Learn the building blocks provided
• Modular system design helps
Mendeley Binary Battle
• $10,001 prize + $1,000 AWS vouchers
• Collaboration with PLoS
• Prizes for the best use of the API

• Judging panel includes
 –Werner Vogels
 –Tim O'Reilly
We’re hiring
     http://mendeley.com/careers/

             or chat to me after

• Lead Mobile Developer, iOS
• Web Developer, PHP/MySQL
• Software Engineer, Java
