
AWS for HPC in Drug Discovery

Drug discovery at 2x speed. Faster, more comprehensive testing and approval processes. Identifying gene targets in massive sequencing data sets. These goals are ambitious yet attainable, but not without increasing the computational capabilities of today's researchers. While everyone agrees that simply deploying more infrastructure is not the answer, running that work in the cloud is not without challenges. In this talk we will discuss and illustrate elements of those workloads that Cycle Computing's customers have run on AWS, generating vastly better results than would have been attained on traditional infrastructure. We will cover some common problems they encountered and how they resolved them using Amazon EC2, S3, Glacier, and Cycle's software.

Presenters: Dougal Ballantyne, Business Development, AWS; Rob Futrick, CTO, Cycle Computing

Published in: Technology

  1. CONFIDENTIAL © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Why AWS for HPC?
     • Low cost with flexible pricing
     • Efficient clusters
     • Unlimited infrastructure
     • Faster time to results
     • Concurrent clusters on-demand
     • Increased collaboration
  3. Schrodinger Material Sciences Tools: an estimated $68M for a cluster purchase, or 200 years on an on-premise machine, vs. a 50,000-core analytics job run on the AWS cloud, completed in 18 hours using 1.21 petaflops of computing capacity at peak, for a total of $33K.
  4. Novartis: an estimated 50,000 cores and $40M to experiment internally vs. 10,600 Spot Instances (~87,000 compute cores) running 39 years of computational chemistry in 9 hours, for a total of $4,232.
  5. But cloud provides more than scale
     • Compliance
     • Data management
       – Secure
       – Integrated Lifecycle Management
     • Collaboration
       – Real time desktop sharing
       – Controlled sharing of data
  6. AWS Global Infrastructure: Application Services, Networking, Deployment & Administration, Database, Storage, Compute
  7. Cloud offers compliance, collaboration, and scale, with higher performance and lower cost than on-premise HPC.
  8. HPC ♥ Cloud
  9. BETTER ANSWERS, FASTER. Rob Futrick, rob.futrick@cyclecomputing.com
  10. BigData + BigCompute: Fraud Detection, Risk Modeling, Drug Design, Genomics, Modeling & Simulation, Customer Analysis, Data Lakes
  11. Utility computing should be ubiquitous and easily accessible.
  12. Well, after he’s at daycare...
  13. Great, so… what’s the problem?
  14. Recognize this?
  15. The Problem: Fixed Capacity
  16. Can cloud help?
  17. Cloud helped…
     • Arkema: Comp Chem
     • Tute Genomics: NGS
     • J&J: PK/PD, clinical trial simulation
     • Novartis Institutes for Biomedical Research: Drug Discovery
     • Large BioTech: Petabyte+ genome data archiving
     • J&J: Statistical modeling, data archival, and computation
  18. Patterns
     • For users, focus should be on science not IT.
     • Easy access to compute changes everything.
     • Accelerating compute accelerates people.
     • Data wants to be stored and processed.
  19. Arkema: Comp chemistry
     The Problem in 2015:
     • Need to run applications such as Gromacs, LAMMPS, & Quantum Espresso
     • No internal option to procure or support a cluster
     • Small amount of compute
     Solution: AWS & Create fully functional compute clusters - of a few nodes - on demand
     [Diagram labels: Data Workflow Cloud Orchestration Analytics Modeling Compute Workflow]
  20. Tute Genomics: NGS
     The Problem in 2015:
     • Need to run an in-house genome sequencing and analysis pipeline
     • No internal option to procure or support a cluster
     • Small initial compute needs
     Solution: AWS & Create fully functional compute clusters on demand.
  21. Users are focused on science, not cluster management.
  22. J&J: Clinical Trial Simulations
     The Problem:
     • Need to run multiple versions of apps like NONMEM in qualified and validated environments.
     • Environments must be maintained for years!
     • Need to replace EOLed infrastructure.
     Solution: AWS & Create qualified and validated compute environments on demand in AWS.
  23. Your compute environment, at the push of a button, changes everything.
  24. Expected Impact
     • Current process (in hours): Computing 720, Analysis 720, Computing 720, Analysis 720. Total: 2880 hours / 120 days to results.
     • Anticipated benefit (in hours): Computing 8, Analysis 720, Computing 8, Analysis 720. Total: 1456 hours / 60.6 days to results.
  25. Benefit: 2-3X faster time to results
     • Current process (in hours): Computing 720, Analysis 720, Computing 720, Analysis 720. Total: 2880 hours / 120 days to results.
     • Post adoption (agile design process): Computing & Analysis. Higher quality output, iterative analysis, less context switching.
  26. Accelerating compute accelerates people. Enable them to ask the right questions.
  27. Transform Healthcare/Life Sciences
     The Problem in 2013:
     • Cancer research needed 50,000 cores, not available in-house
     The options they didn’t choose:
     • Buy infrastructure: Spend $2M, wait 6 months
     • Write software for 9-12 months for this one app
     Solution:
     • Created a 10,600-server cluster
     • 39.5 years of computing in 8 hours
     • Found 3 potential drug candidates!
     • Total infrastructure bill: $4,372
  28. What about data?
  29. Data wants to be stored properly
  30. San Diego BioTech
     • 1+ Petabytes of data
     • DirectConnect
     • Uses DataMan to fully utilize bandwidth
       – Encryption keys managed internally
       – Scheduled and just in time transfers, easy for users
     [Diagram labels: Internal File System, 1 Petabyte, Firewall, Amazon S3, Amazon Glacier, HTTPS, Command Lines/Scheduled Transfers]
  31. Data wants to be processed
  32. S3 -> Lustre -> Processing -> Glacier
     [Diagram labels: Amazon S3, Amazon Glacier, Data Workflow, Analytics, Modeling, Cloud Compute]
  33. So we’ve covered…
  34. Patterns
     • For users, focus should be on science not IT.
     • Easy access to compute changes everything.
     • Accelerating compute accelerates people.
     • Data wants to be stored and processed.
  35. So how do I get there?
  36. Technical Computing Access Problems
     Technical Problems:
     • Migration: Cloud presents new APIs, interfaces
     • Workflow: Data, optimization
     • Security: Audit, compliance, and encryption
     • Performance: Error-handling, cost
     • Reporting: Chargeback, utilization, planning
     Business Problems:
     • Starting instances is just the beginning
     • LOBs access new, highly dynamic scale
     • Scale vs. cost equation
     • The workflow is more important than any tool
     [Diagram labels: Users, Cluster, Workload]
  37. Cycle powers cloud BigData & BigCompute
     Software required to drive cloud analytics & simulation:
     • Easy access
     • Highly automated
     • Cost Optimized
     [Diagram labels: Data Workflow Cloud Orchestration Analytics Modeling Internal Compute Burst]
  38. Cycle Makes BigCompute Easy: Burst, Data Workflow, & Orchestration
     • Handles errors, reliability
     • Automatic Spot Bidding
     • Schedules data movement
     • Secures, encrypts and audits
     • Provides reporting and chargeback
     • Validation, compliance
     • Supports production operations
     • Scales from 100s to 100,000s of cores
  39. Utility computing should be ubiquitous and easily accessible.
  40. We can solve the Problem!
  41. Help your organization ask the right questions
  42. Not only the ones that fit in fixed internal infrastructure
  43. Think outside the boxes you own
  44. Reap the benefits of boxes you borrow
     [Diagram labels: Analytics, Modeling, Simulation, On Premise Data]
  45. BETTER ANSWERS, FASTER. Questions? @rfutrick @cyclecomputing
  47. The Challenge for the Scientist
     • Dr. Mark Thompson
     • “Solar energy has the potential to replace some of our dependence on fossil fuels, but only if the solar panels can be made very inexpensively and have reasonable to high efficiencies. Organic solar cells have this potential.”
  48. Challenge: 205,000 compounds totaling 2,312,959 core-hours, or 264 core-years
  49. 16,788 Spot Instances, 156,314 cores! 205,000 molecules, 264 years of computing.
  50. 156,314 cores = 1.21 PetaFLOPS (Rpeak), equivalent to Top500 Jun2013 #29. 205,000 molecules, 264 years of computing.
  51. Done in 18 hours, on a $68M system, for $33k. 205,000 molecules, 264 years of computing.
  52. BETTER ANSWERS, FASTER. Thank you.
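
The figures quoted for the 205,000-compound run (slides 48 to 51) and the timeline on slide 24 can be cross-checked with a few lines of arithmetic. The sketch below is not from the talk. The function names are hypothetical, a 365-day year is assumed, the "$33k" bill is treated as exactly $33,000, and the two 8-hour computing phases are a reading of the slide 24 chart labels, not a figure the slides state in prose.

```python
"""Back-of-the-envelope checks on numbers quoted in the slides."""
from typing import Sequence

HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # assumption: a 365-day year


def core_years(core_hours: float) -> float:
    """Convert core-hours to core-years."""
    return core_hours / HOURS_PER_YEAR


def cost_per_core_hour(total_cost: float, core_hours: float) -> float:
    """Average price paid per core-hour of work."""
    return total_cost / core_hours


def days_to_results(phase_hours: Sequence[float]) -> float:
    """Elapsed days for phases that run one after another."""
    return sum(phase_hours) / HOURS_PER_DAY


# Slides 48-51: the 205,000-compound run.
CORE_HOURS = 2_312_959
BILL_USD = 33_000        # assumption: "$33k" taken as exactly $33,000
WALL_CLOCK_HOURS = 18

print(f"core-years:         {core_years(CORE_HOURS):.1f}")
print(f"USD per core-hour:  {cost_per_core_hour(BILL_USD, CORE_HOURS):.4f}")
print(f"average busy cores: {CORE_HOURS / WALL_CLOCK_HOURS:,.0f}")

# Slide 24: alternating computing and analysis phases, in hours.
current = [720, 720, 720, 720]
anticipated = [8, 720, 8, 720]  # assumption: 8-hour computing phases

print(f"current:     {sum(current)} h = {days_to_results(current):.1f} days")
print(f"anticipated: {sum(anticipated)} h = {days_to_results(anticipated):.1f} days")
print(f"speedup:     {sum(current) / sum(anticipated):.2f}x")
```

Under these assumptions the core-hour total works out to about 264 core-years, matching slide 48, and the two timelines sum to 2880 and 1456 hours, matching slide 24. The resulting ratio is just under 2x, in line with the "2x speed" in the abstract.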
