Scientific computing on the cloud lured scientists at H3 Biomedicine in Cambridge, Massachusetts, with the promise of the near-limitless compute capacity of Amazon EC2. Today, scientists run a wide array of applications in the cloud that help integrate human cancer genomics with chemistry and biology to discover a library of specialty cancer treatment drugs.
In this webinar, you'll hear how the organization built cloud infrastructure that reduces latency, gives it storage flexibility, saves money, and supports its business strategy. The H3 Biomedicine story is complemented by a look at the cloud technology and AWS services that enabled application migration to the cloud in a hybrid IT environment.
Scientific Computing in the Cloud: Speeding Access for Drug Discovery
1. Scientific Computing in the Cloud
Speeding Access for Drug Discovery
Jacob Feala, Bioinformatics Platform Lead, H3 Biomedicine
Bret Martin, Principal Research Computing Architect, H3 Biomedicine
Scott Jeschonek, Director of Product Management, Cloud Solutions, Avere Systems
Sabina Joseph, Global Ecosystem Segment Leader, Partnerships & Alliances, Amazon Web Services
September 22, 2015
2. Today’s Speakers
Jacob Feala, Bioinformatics Platform Lead
Bret Martin, Principal Research Computing Architect
Scott Jeschonek, Director of Product Management, Cloud
Sabina Joseph, Global Ecosystem Segment Leader, Partnerships & Alliances
6. Cancer cells hijack the mRNA splicing machinery
• Spliceosome genes have recurrent mutations in some types of cancer
• These “hotspot” mutations have been shown to alter splicing in hundreds of mRNA transcripts
[Diagram: DNA → pre-mRNA → splicing → mRNA; spliceosome mutations produce altered rather than normal splicing]
7. RNA-seq is the driver for our mRNA splicing research
• H3’s chemistry platform generates compounds that can modulate splicing
• H3 biologists can genetically engineer the spliceosome mutations in cell lines
• RNA-seq allows us to detect novel splicing caused by drugs or mutations
[Diagram: mutations or drugs act on DNA → pre-mRNA → splicing → mRNA; mRNA sequencing detects the altered splicing]
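One common way to quantify a splicing change from RNA-seq is percent spliced in (PSI): the share of junction-spanning reads that support inclusion of an exon. The sketch below compares PSI between two conditions. It is an illustration under simplifying assumptions, not H3's pipeline; the function names and read counts are hypothetical.

```python
from __future__ import annotations


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """Fraction of junction reads supporting inclusion of a cassette exon.

    inclusion_reads: reads spanning a junction that includes the exon
    exclusion_reads: reads spanning the junction that skips the exon
    Simplified: production tools also normalize for junction count and
    effective length, which this sketch ignores.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no junction reads cover this event")
    return inclusion_reads / total


def delta_psi(control: tuple[int, int], treated: tuple[int, int]) -> float:
    """Change in PSI (treated minus control); a large |dPSI| flags altered splicing."""
    return percent_spliced_in(*treated) - percent_spliced_in(*control)


if __name__ == "__main__":
    # Hypothetical junction counts (inclusion, exclusion) for one exon.
    control = (90, 10)  # e.g. untreated or wild-type cells
    treated = (40, 60)  # e.g. drug-treated or spliceosome-mutant cells
    print(f"control PSI = {percent_spliced_in(*control):.2f}")
    print(f"treated PSI = {percent_spliced_in(*treated):.2f}")
    print(f"delta PSI   = {delta_psi(control, treated):+.2f}")
```

With these made-up counts, PSI falls from 0.90 to 0.40, the kind of shift a drug or a hotspot mutation could produce at a single exon.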
8. When in doubt, just sequence it!
Plummeting sequencing costs + Powerful insights = Mounds of data
9. Early days on AWS: StarCluster and Amazon S3
• Bash scripts submitted to the Sun Grid Engine scheduler
• All data downloaded from and uploaded to Amazon S3
• Completely disjoint from the local development environment
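The early pattern amounted to a job script that copies inputs down from S3, runs a tool, and copies results back. A minimal sketch that generates such a script, assuming the AWS CLI is installed on the cluster nodes; the bucket paths, job name, and `run_alignment.sh` command are hypothetical placeholders, since the webinar does not show H3's scripts.

```python
def make_sge_job(job_name: str, s3_input: str, s3_output: str, command: str) -> str:
    """Return the text of a Bash job script suitable for qsub.

    All arguments are hypothetical placeholders for illustration.
    """
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",  # SGE directive: job name
        "#$ -cwd",            # SGE directive: run in the submission directory
        "set -euo pipefail",
        "mkdir -p input output",
        # Stage input from S3 onto the node's local disk.
        f"aws s3 cp {s3_input} ./input/ --recursive",
        # Run the analysis step on the local copy.
        command,
        # Push results back to S3; nothing is kept on the cluster.
        f"aws s3 cp ./output/ {s3_output} --recursive",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_sge_job(
        job_name="rnaseq_align",
        s3_input="s3://example-bucket/raw/sample1/",
        s3_output="s3://example-bucket/results/sample1/",
        command="run_alignment.sh ./input ./output",
    ))
```

Saving the output as `job.sh` and running `qsub job.sh` would queue it on Sun Grid Engine. Because every job begins and ends in S3, the cluster shares nothing with the local development environment, which is the disconnect the slide describes.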
10. Evolution of H3 NGS workflows: Luigi, Docker, and Avere
• Atomic, idempotent workflow scheduling with Luigi
• Environment management with Docker
• Unified local/cloud filesystem with Avere
• Seamless transitions from interactive work to pipeline development to scale-out
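Luigi treats a task as complete when its output target exists, so rerunning a workflow skips finished steps, and its local file targets are written to a temporary file that is moved into place. The sketch below imitates that behavior with the Python standard library only; it is not Luigi's API, and the `Align` and `CountJunctions` steps are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


class Task:
    """Base class: subclasses define output(), produce(), and optionally requires()."""

    def __init__(self, workdir: Path):
        self.workdir = workdir

    def requires(self) -> list[Task]:
        return []

    def output(self) -> Path:
        raise NotImplementedError

    def produce(self) -> str:
        raise NotImplementedError

    def complete(self) -> bool:
        # "Done" simply means the output already exists.
        return self.output().exists()

    def run(self) -> None:
        out = self.output()
        out.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then rename it into
        # place, so the output path never holds a half-written file.
        fd, tmp = tempfile.mkstemp(dir=out.parent)
        with os.fdopen(fd, "w") as fh:
            fh.write(self.produce())
        os.replace(tmp, out)


def build(task: Task, ran: list[str]) -> None:
    """Run `task` and any missing dependencies; finished work is skipped."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, ran)
    task.run()
    ran.append(type(task).__name__)


class Align(Task):  # hypothetical step
    def output(self) -> Path:
        return self.workdir / "sample1.aligned.txt"

    def produce(self) -> str:
        return "aligned reads\n"


class CountJunctions(Task):  # hypothetical step that depends on Align
    def requires(self) -> list[Task]:
        return [Align(self.workdir)]

    def output(self) -> Path:
        return self.workdir / "sample1.junctions.txt"

    def produce(self) -> str:
        aligned = self.requires()[0].output().read_text()
        return f"junctions from: {aligned}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ran: list[str] = []
        build(CountJunctions(Path(d)), ran)
        build(CountJunctions(Path(d)), ran)  # second call finds everything done
        print(ran)
```

The second `build` call runs nothing because the final output already exists; that property is what makes it safe to resubmit a whole pipeline after a partial failure.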
12. Challenges
• Management
– Native AWS features (especially VPC and IAM)
– Configuration management with Ansible
• Compute challenges
– Scalable compute infrastructure on Amazon EC2
– Improving user experience with CycleCloud
• Storage challenges
– Pace of growth
– Want to focus on science, not IT and storage management
– Protect investment in on-premises storage in the medium term
– Present traditional file system to “bridge” users to the cloud
13. H3 Biomedicine Architecture Overview
• 2 VPCs
• 1 Gbps Direct Connect with VPN backup
• AWS services accessed over Direct Connect
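A 1 Gbps Direct Connect link sets a ceiling on bulk data movement between the data center and AWS. The back-of-envelope sketch below converts a data volume into transfer time; the 80% efficiency default is an assumption, not a measured figure for H3's link, and units are decimal (1 TB = 10^12 bytes).

```python
def transfer_hours(terabytes: float, link_gbps: float = 1.0,
                   efficiency: float = 0.8) -> float:
    """Hours needed to move `terabytes` over a link of `link_gbps`.

    `efficiency` is the assumed fraction of line rate actually sustained.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    for tb in (1, 4, 10):
        print(f"{tb:>3} TB: {transfer_hours(tb):5.1f} h at 80% of 1 Gbps")
```

At full line rate, 1 TB takes 8 × 10^12 bits / 10^9 bit/s = 8,000 s, or about 2.2 hours.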
16. What is Avere vFXT?
• Virtual deployment of AvereOS using Amazon EC2 instances
– 3 x r3.2xlarge
– 4 TB encrypted Amazon EBS (10 x 400 GB volumes)
• Deployed with AWS CloudFormation
• Managed using web interface and SSH
• Metrics available for hot files, hot clients, IOPS, and more
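The slide's storage figure works out as 10 × 400 GB = 4,000 GB, or about 4 TB. A minimal sketch of how encrypted volumes like these could be declared in a CloudFormation template, generated from Python; the availability zone and the `gp2` volume type are assumptions, attaching the volumes to the vFXT instances is omitted, and this is not Avere's actual template. CloudFormation's `Size` property is in GiB, so 10 × 400 GiB is slightly more than 4 decimal TB.

```python
import json


def ebs_volume_resources(count: int = 10, size_gib: int = 400,
                         availability_zone: str = "us-east-1a",
                         volume_type: str = "gp2") -> dict:
    """Return a CloudFormation template declaring `count` encrypted EBS volumes.

    The AZ and volume type are placeholder assumptions; the webinar does
    not state them.
    """
    resources = {
        f"VfxtDataVolume{i}": {
            "Type": "AWS::EC2::Volume",
            "Properties": {
                "AvailabilityZone": availability_zone,
                "Size": size_gib,   # CloudFormation sizes are in GiB
                "Encrypted": True,  # encryption at rest
                "VolumeType": volume_type,
            },
        }
        for i in range(count)
    }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    template = ebs_volume_resources()
    total_gib = sum(r["Properties"]["Size"] for r in template["Resources"].values())
    print(f"{len(template['Resources'])} volumes, {total_gib} GiB in total")
    print(json.dumps(template, indent=2))
```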
17. Avere vFXT Enables H3 Biomedicine in the Cloud
• Avere vFXT makes our on-premises EMC Isilon NAS storage usable at AWS
• Latency reduced from ~15 ms to under 1 ms for hot data
• Same shared filesystems are available at AWS as “at home”
– Home directories
– Group directories (including many software tools)
– Lab data directories
– Scratch space
• Provides a familiar environment to scientific staff, but with the benefits of the cloud
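Average latency for a cached file system depends on how often requests hit the hot tier. A worked sketch of the weighted average, using the slide's ~15 ms and under-1 ms figures (1 ms is treated as an upper bound for hot data) and hypothetical hit ratios:

```python
def effective_latency_ms(hit_ratio: float, hot_ms: float = 1.0,
                         cold_ms: float = 15.0) -> float:
    """Mean access latency when `hit_ratio` of requests are served from the hot tier."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hot_ms + (1.0 - hit_ratio) * cold_ms


if __name__ == "__main__":
    for h in (0.5, 0.9, 0.99):  # hypothetical hit ratios
        print(f"hit ratio {h:.0%}: {effective_latency_ms(h):.2f} ms")
```

At a 90% hit ratio the average is 0.9 × 1 + 0.1 × 15 = 2.4 ms, so the benefit depends heavily on keeping the working set hot.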
18. Avere vFXT Tips
• Setup will take under an hour, if you’re ready with
– AWS account
– VPC with connectivity to core filer
– Security groups
• Proper EC2/VPC security groups and firewall rules are essential
– Allow traffic to your core filer from the vFXT
– Allow traffic from your core filer to the vFXT
• vFXT is not magic
– You need permanent, reliable connectivity to your core filer
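The slide's two firewall rules can be expressed as a CloudFormation security group resource. A minimal sketch, generated from Python, assuming the core filer sits in a single on-premises CIDR reachable from the VPC; the ports (111 and 2049, the usual portmapper and NFS ports), the VPC ID, and the CIDR are illustrative assumptions, not the vendor's documented requirements.

```python
from __future__ import annotations


def vfxt_security_group(vpc_id: str, core_filer_cidr: str,
                        ports: tuple[int, ...] = (111, 2049)) -> dict:
    """CloudFormation AWS::EC2::SecurityGroup resource for vFXT <-> core filer.

    Ports 111 (portmapper/rpcbind) and 2049 (NFS) are illustrative
    assumptions, not a complete list of what a vFXT requires.
    """
    rules = [
        {"IpProtocol": "tcp", "FromPort": p, "ToPort": p, "CidrIp": core_filer_cidr}
        for p in ports
    ]
    return {
        "Type": "AWS::EC2::SecurityGroup",
        "Properties": {
            "GroupDescription": "vFXT cluster to on-premises core filer",
            "VpcId": vpc_id,
            # Traffic from the core filer's network to the vFXT nodes.
            "SecurityGroupIngress": rules,
            # Traffic from the vFXT nodes to the core filer's network.
            "SecurityGroupEgress": [dict(rule) for rule in rules],
        },
    }


if __name__ == "__main__":
    import json
    # Hypothetical VPC ID and on-premises CIDR.
    print(json.dumps(vfxt_security_group("vpc-0example", "10.20.0.0/16"), indent=2))
```

Specifying any egress rule removes a VPC security group's default allow-all egress rule, so a real deployment would also need rules for every other destination the nodes reach. Matching rules on the on-premises firewall are outside this sketch.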
19. H3 Biomedicine Future Storage Plans
• Replace some on-premises NAS capacity with physical Avere FXT Series filers, providing an NFS front end with S3 as the backing store
• Use Avere FlashMove migration capabilities to transparently move data from on-premises NAS to S3 without disrupting user access
• Use vFXT for CIFS clients at AWS (Amazon WorkSpaces, regular EC2 Windows instances)
• Replace tape with S3 using AWS Storage Gateway
21. Enterprise-Grade NAS for the Cloud
Avere combines enterprise-grade NAS with a complete feature set for the cloud.
[Chart: axes “Enterprise-grade NAS” and “100% Cloud Enabled”, comparing Avere with cloud gateway vendors and traditional NAS vendors]
• NFS & SMB/CIFS access
• Clustering for performance and capacity scaling
• Efficient use of EBS and ECS optimizes performance and cost
• Cloud snapshots
22. Benefits to Meet Scientific Computing Challenges
Industry Challenges
• Add compute resources at peak times
• Need for 1-3 months, no long-term commitment
• Do NOT want to rewrite applications
• Do NOT want to move data to the cloud
Avere Benefits
• Virtual FXT: scalable NAS for compute cloud
• Hide latency to on-prem NAS and object storage
• Easy setup, easy teardown
• Pay only for what is used
• Future: move data to the cloud for better economics
23. Avere/AWS Cloud Platform Use Cases
[Diagram: use cases labeled NAS Optimization, Cloud NAS, and a virtual compute farm; components include on-prem compute and storage, on-prem NAS, a physical FXT, a virtual FXT, Amazon EC2, and Amazon S3 buckets 1 through n]
24. Avere Edge-Core Architecture
[Diagram: client workstations and a compute farm use the Avere FXT Edge Filer for low-latency read, write, and metadata operations; edge filers add performance and capacity and connect over the WAN to core filers (legacy NAS, local and remote; private object storage; Amazon S3), where data is placed]
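The edge-core idea is that hot data is served from a nearby cache while the core filer remains the system of record. A toy sketch of a least-recently-used (LRU) read cache standing in for the edge filer; the eviction policy, class name, and file paths are assumptions for illustration, since the webinar does not describe Avere's caching algorithms, and writes and metadata operations are omitted.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Toy LRU read cache standing in for an edge filer in front of a core filer."""

    def __init__(self, capacity: int, fetch_from_core: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_from_core = fetch_from_core  # slow path, e.g. across the WAN
        self._hot = OrderedDict()  # path -> bytes, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._hot:
            self.hits += 1
            self._hot.move_to_end(path)  # now the most recently used entry
            return self._hot[path]
        self.misses += 1
        data = self.fetch_from_core(path)
        self._hot[path] = data
        if len(self._hot) > self.capacity:
            self._hot.popitem(last=False)  # evict the least recently used entry
        return data


if __name__ == "__main__":
    # Hypothetical core filer contents.
    core = {"/lab/a.fastq": b"A", "/lab/b.fastq": b"B", "/lab/c.fastq": b"C"}
    cache = EdgeCache(capacity=2, fetch_from_core=core.__getitem__)
    for p in ["/lab/a.fastq", "/lab/a.fastq", "/lab/b.fastq", "/lab/c.fastq", "/lab/a.fastq"]:
        cache.read(p)
    print(f"hits={cache.hits} misses={cache.misses}")
```

With capacity 2, this access sequence gives 1 hit and 4 misses: reading `c` evicts `a`, so the final read of `a` goes back to the core filer.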
25. Avere Hybrid Cloud Feature Set
Customer need: Avere feature
• Low-latency file access: Edge-Core Architecture
• Scalable performance and availability: Scale-out Cluster with HA
• File system access whether the storage is file or object: FlashCloud™ File System for object-based storage
• Familiar file system interfaces: NFS and SMB client support
• A single storage topology: Avere Global Name Space
• Data protection: Cloud Snapshots, FlashMirror
• Security: AES-256-bit encryption; integration with AWS STS; integration with KMIP servers
28. What sets AWS apart?
• Experience: Building and managing cloud since 2006
• Service Breadth & Depth: 50+ services to support any cloud workload
• Pace of Innovation: History of rapid, customer-driven releases
• Global Footprint: 11 regions, 30 availability zones, 53 edge locations
• Pricing Philosophy: 49 proactive price reductions to date
• Ecosystem: Thousands of partners; 2,100+ Marketplace products
*as of July 31, 2014
29. [Diagram: the AWS platform, grouped into Technical & Business Support, AWS Marketplace, Management Tools, Enterprise Apps, Hybrid Cloud Management, Security & Management, Infrastructure Services, and Platform Services]
30. Fundamental Storage Options: Elastic Block Store, Elastic File System, S3 and Glacier
Simple Storage Service (S3): managed object storage
• Highly scalable object storage
• 1 byte to 5 TB per object
• 99.999999999% durability
Elastic Block Store (EBS): disks for instances
• High-performance block storage device
• 1 GB to 16 TB per volume
• Mount as drives to instances, with snapshot/cloning functionality
Glacier: managed archival system
• Long-term object archive
• Extremely low cost per gigabyte
• 99.999999999% durability
• Slow, rare access
Elastic File System (EFS): managed file storage
• File system for EC2
• SSD-based, NFSv4
• Fully managed
• Multi-AZ redundancy
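Eleven nines of durability is easier to interpret as expected losses. A sketch, assuming 99.999999999% is an annual per-object figure and that losses are independent (both simplifications); the object count is hypothetical.

```python
def expected_annual_losses(num_objects: int, nines: int = 11) -> float:
    """Expected objects lost per year for a durability of `nines` nines.

    99.999999999% is eleven nines, i.e. an annual loss probability of
    10**-11 per object, assuming losses are independent.
    """
    annual_loss_probability = 10.0 ** -nines
    return num_objects * annual_loss_probability


if __name__ == "__main__":
    n = 10_000_000  # hypothetical object count
    losses = expected_annual_losses(n)
    print(f"{n:,} objects: {losses:.1e} expected losses/year, "
          f"about one loss every {1 / losses:,.0f} years")
```

For 10,000,000 objects the expected loss is 10^7 × 10^-11 = 10^-4 objects per year, or one object every 10,000 years on average. Durability says nothing about availability or about Glacier's slower retrieval.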
31. Global Footprint
Over 1 million active customers across
190 countries
1,700 government agencies
4,500 educational institutions
11 regions
30 availability zones
53 edge locations
Every day, AWS adds enough new server capacity to support Amazon.com when it was a $7 billion global enterprise.
[Map: AWS regions and edge locations]
32. Security: A Shared Responsibility
• You retain ownership of your IP and content; AWS does not have access
• You control the region(s) where your data is stored
• You can build end-to-end compliance, including HIPAA compliance
• AWS data centers are always “on”, with robust connectivity and bandwidth
• Ongoing audit and assurance program
• Industry certifications
AWS secures the infrastructure... so you can secure your patient data.
33. Enabling Compliance
Build HIPAA-compliant applications that store, process, and transmit PHI
Business Associate Agreement (BAA) addendum available
HIPAA-eligible services:
• Amazon EC2: Compute
• Amazon S3: Object Storage
• Amazon Glacier: Data Archiving
• Amazon Redshift: Data Warehousing
• Amazon EBS: Block Storage Volumes
• Elastic Load Balancing: Traffic Distribution
• Amazon EMR: Managed Big Data
• Amazon RDS*: Relational Database
• DynamoDB: NoSQL Database
*MySQL and Oracle engines only
34. Josh Siegel, Systems Architect
“Amazon's ability to sign a business associate agreement with us as well as their track record of security and compliance really helped us make our customers feel comfortable.”
35. “We completed the equivalent of thirty-nine years of computational chemistry in just under 9 hours for a cost of around $4200.”
Steve Litster, Global Head of Scientific Computing, Novartis
Novartis: Acceleration of pre-clinical R&D
• No existing infrastructure was available to screen 10 million compounds in a computational model
• New infrastructure would have cost approximately $40 million to build
Novartis used AWS for HPC computational chemistry
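The Novartis quote packs in enough numbers for a quick sanity check. A back-of-envelope sketch, assuming "thirty-nine years of computational chemistry" means 39 years of serial, single-core compute time (the quote does not say) and using 9 hours and $4,200 from the slide; the function name is hypothetical.

```python
from __future__ import annotations

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def burst_summary(serial_years: float, wall_clock_hours: float,
                  cost_usd: float) -> dict[str, float]:
    """Back-of-envelope figures for a burst HPC run.

    Assumes "years of computation" means serial, single-core compute time.
    """
    compute_hours = serial_years * HOURS_PER_YEAR
    return {
        "compute_hours": compute_hours,
        # Average number of cores busy at once over the wall-clock window.
        "avg_parallelism": compute_hours / wall_clock_hours,
        "cost_per_core_hour": cost_usd / compute_hours,
    }


if __name__ == "__main__":
    # Figures quoted on the slide: 39 years, just under 9 hours, about $4,200.
    summary = burst_summary(serial_years=39, wall_clock_hours=9, cost_usd=4200)
    for key, value in summary.items():
        print(f"{key}: {value:,.4f}")
```

Under that assumption the run covers 39 × 8,760 = 341,640 core-hours, implies roughly 38,000 cores busy on average over 9 hours, and costs about $0.012 per core-hour ($4,200 / 341,640).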
36. Claritas Genomics: Cost-effective IT for Diagnostic Labs
• Clinical lab based in Massachusetts; spin-off of Boston Children’s Hospital
• As a result of the spin-off, Claritas lost access to data centers at Boston Children’s and Harvard Medical School
• Needed cost-effective infrastructure to support the lab’s business
• Result:
– Clinical lab running in production on AWS
– Rapid (4-6 week) turnaround time for results
“[AWS] allowed me to not have to build out a data center which was going to be such a high cost... at least $5 million dollars; we’ve also decreased our turnaround time for clinical [tests]... some companies are looking at 4-6 months, we’re looking at 4-6 weeks.”
Elizabeth Boudreau, IT – Senior Manager, Claritas Genomics
39. Alex Dickinson, SVP, Strategic Initiatives
“Working with AWS lets us focus on what we’re good at, which is doing sequencing.”
40. Next Steps
• Enter your questions
• Please download resources
• Please rate this webinar
• How to reach our panelists
41. Contact Us
Jacob Feala, Bioinformatics Platform Lead: Jacob_Feala@h3biomedicine.com
Bret Martin, Principal Research Computing Architect: Bret_Martin@h3biomedicine.com, Twitter: @bret_martin
Scott Jeschonek, Director of Product Management, Cloud: sjeschonek@averesystems.com
Sabina Joseph, Global Ecosystem Segment Leader: sabinaj@amazon.com
Averesystems.com, aws.amazon.com, H3biomedicine.com