© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Nick Whalen
Principal Engineer
Novartis Institutes for BioMedical Research
Gene Ting
Solution Architect
Amazon Web Services
Analyzing Slide Images and Processing
Phenotypic Assays at Scale on AWS
CMP358
Agenda
AWS services walkthrough
HCSIA overview
Compute environment design
Post-processing workflow
Architecting for resilience
Final thoughts
AWS Batch
Managed
No software to install or servers to manage. AWS Batch provisions, manages, and scales your infrastructure.
Integrated with AWS
AWS Batch is natively integrated with AWS products and services, so jobs can easily and securely interact with services such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and Amazon Rekognition.
Cost-optimized resource provisioning
AWS Batch automatically provisions compute resources tailored to the needs of your jobs using Amazon Elastic Compute Cloud (Amazon EC2) and Spot Instances.
AWS Batch key concepts
• Jobs
• Job definition
• Job queue
• Scheduler
• Compute environments
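As a rough illustration of how these concepts relate, the sketch below builds the parameters a job submission might carry. The key names follow the AWS Batch SubmitJob API; the queue name, job definition name, command, and plate identifiers are hypothetical and are not taken from the HCSIA implementation.

```python
from typing import Any, Dict, List


def build_submit_job_request(plate_id: str, well_ids: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for an AWS Batch SubmitJob call.

    Queue and job definition names are placeholders (assumptions).
    """
    return {
        # Job: one unit of work, named so it can be tracked.
        "jobName": f"analyze-{plate_id}",
        # Job queue: where the job waits until the scheduler places it
        # on a compute environment attached to the queue.
        "jobQueue": "hcsia-analysis-queue",  # hypothetical name
        # Job definition: the registered template (image, vCPUs, memory).
        "jobDefinition": "hcsia-image-analysis",  # hypothetical name
        # Per-job override of the container command.
        "containerOverrides": {
            "command": ["analyze", "--plate", plate_id, "--wells", ",".join(well_ids)],
        },
        # Allow up to three attempts before the job is marked FAILED.
        "retryStrategy": {"attempts": 3},
    }


request = build_submit_job_request("plate-001", ["A01", "A02"])
print(request["jobName"])
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("batch").submit_job(**request)`.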
AWS Step Functions
Write less code
Step Functions manages the logic of your application, removing extra code that might otherwise be repeated in your microservices and functions.
Improve resiliency and scale
Step Functions manages state, checkpoints, and restarts to make sure your application executes in order and as expected.
Build and update apps quickly
Step Functions makes it easy to connect and coordinate distributed components and microservices.
Amazon EC2 Spot Instances
• Low cost
• Faster results
• Easy access
• Resource flexibility
Spare EC2 capacity that AWS can reclaim with two minutes of notice
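One way a worker could react to the two-minute notice is to poll the instance metadata service and work out how much time remains. The sketch below only parses a notice body; the polling loop, the sample timestamp, and the idea that HCSIA workers do this at all are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Instance metadata path where EC2 publishes the interruption notice.
# It returns HTTP 404 until a notice has been issued.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"


def seconds_until_reclaim(notice_body: str, now: datetime) -> Optional[float]:
    """Return seconds left before interruption, or None when there is no notice."""
    if not notice_body:
        return None
    # Example body: {"action": "terminate", "time": "2018-11-28T08:22:00Z"}
    notice = json.loads(notice_body)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical notice and clock reading, for illustration only.
sample = '{"action": "terminate", "time": "2018-11-28T08:22:00Z"}'
now = datetime(2018, 11, 28, 8, 20, 30, tzinfo=timezone.utc)
remaining = seconds_until_reclaim(sample, now)
print(remaining)  # 90.0
```

A worker could use the remaining time to decide whether to finish the current image or stop and let the job be resubmitted.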
About NIBR
• The Novartis Institutes for BioMedical Research (NIBR) is the innovation
engine of Novartis
• We collaborate across scientific and organizational boundaries, with a
focus on new technologies that have the potential to help produce
therapeutic breakthroughs for patients
• Six research campuses across the globe
• 200+ projects in clinical pipeline
• 500+ clinical trials in progress
High-content screening image analysis
• Scientists need a user-friendly means to manage batch image analyses
• HCSIA empowers scientists to directly run image analyses without
depending on cluster experts or custom scripts
• HCSIA provides for faster assay development and execution with
more focus on the science rather than tools
[Figure: plates and wells]
Architecture overview
[Diagram labels: On-premise; HCSIA VPC; Users; HPC cluster; HCS images; UI and web services; Profiler workers; Amazon Aurora job tracker*; Job completion queue; Image analytics results; Post process and merge workflow; Update job status; Check job status; Check job scheduler; Notify scientists]
Acquire compute resources at scale
Compute environment   Instance type   Min vCPUs   Desired vCPUs   Max vCPUs
CE1                   Optimal         0           0               30000
CE2                   m4.16xlarge     0           0               25984
CE3                   r4.8xlarge      0           0               25984
CE4                   m4.10xlarge     0           0               26000
[Diagram: Spot Fleet with CE1, CE2, CE3, CE4]
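The vCPU ceilings on this slide translate into instance counts once divided by each type's vCPU count. The sketch below does that arithmetic; pairing each limit with an instance type follows the order of the slide, which is an assumption, and the "Optimal" environment is left out of the per-type count because Batch chooses its instance types.

```python
# vCPUs per instance for the three instance types named on the slide.
VCPUS_PER_INSTANCE = {"m4.16xlarge": 64, "r4.8xlarge": 32, "m4.10xlarge": 40}

# Max vCPUs per compute environment as listed on the slide
# (pairing with instance types assumed from slide order).
MAX_VCPUS = {"m4.16xlarge": 25984, "r4.8xlarge": 25984, "m4.10xlarge": 26000}
OPTIMAL_MAX_VCPUS = 30000

# Largest number of instances each environment could scale to.
max_instances = {t: MAX_VCPUS[t] // VCPUS_PER_INSTANCE[t] for t in MAX_VCPUS}

# Combined ceiling across all four compute environments.
total_max_vcpus = OPTIMAL_MAX_VCPUS + sum(MAX_VCPUS.values())

print(max_instances)    # {'m4.16xlarge': 406, 'r4.8xlarge': 812, 'm4.10xlarge': 650}
print(total_max_vcpus)  # 107968
```

Each listed maximum is an exact multiple of its instance type's vCPU count, and a minimum of 0 lets every environment scale down to nothing when idle.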
Analytics results post process and merge
Merge only/merge and post process
Architecture overview – job tracking
[Diagram labels: On-premise; HCSIA VPC; Users; HPC cluster; HCS images; UI endpoint; Profiler workers; Aurora job tracker*; Job completion queue; Job completion tracking; Image analytics results; Post process and merge workflow; Update job status; Check job status; Check job scheduler; Notify scientists]
Architecting against job failures
• Application failures
  • Bad data
  • Incorrect resource requirements
  • Application bugs
• Infrastructure failures
  • Disk failures
  • Instance failures
  • Spot reclamation
Failure handling
[Diagram labels: Task failure event; Failure analyzer; Exception queue; Job resubmit worker; Fatal exception notification]
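A failure analyzer of the kind named in this diagram might route each task failure either to a resubmit worker or to a fatal notification. The sketch below is a guess at that decision, not the HCSIA logic: the failure labels, the resubmit cap, and the rule that only infrastructure failures are retried are all assumptions.

```python
from enum import Enum


class Action(Enum):
    RESUBMIT = "resubmit"          # hand the job to the resubmit worker
    NOTIFY_FATAL = "notify_fatal"  # send a fatal exception notification


# Hypothetical failure labels, modeled on the categories listed in the talk.
INFRASTRUCTURE_FAILURES = {"disk_failure", "instance_failure", "spot_reclamation"}
APPLICATION_FAILURES = {"bad_data", "incorrect_resources", "application_bug"}

MAX_RESUBMITS = 3  # assumed cap on resubmissions


def analyze_failure(kind: str, attempts: int) -> Action:
    """Decide what to do with a failed task.

    Infrastructure failures are treated as transient and resubmitted until
    the cap is reached; everything else is escalated.
    """
    if kind in INFRASTRUCTURE_FAILURES and attempts < MAX_RESUBMITS:
        return Action.RESUBMIT
    return Action.NOTIFY_FATAL


print(analyze_failure("spot_reclamation", attempts=1))
print(analyze_failure("bad_data", attempts=0))
```

Capping resubmissions keeps a job that fails on every instance from cycling through the exception queue indefinitely.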
Final thoughts – Step Functions
• Be confident with state machines; use a minimal number of steps
• Use Amazon S3 to persist and iterate over large data sets, and pass object keys between steps
• Separate distinct business functionality into its own state machine
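The advice to pass object keys can be sketched as follows: the large list is written once as a manifest object, and only a small pointer travels through the state machine. A plain dictionary stands in for the S3 bucket so the example runs without AWS access; the bucket name, key layout, and run identifier are hypothetical.

```python
import json
from typing import Any, Dict, List

# In-memory stand-in for an S3 bucket (assumption, for a runnable sketch).
fake_bucket: Dict[str, bytes] = {}


def stage_manifest(run_id: str, result_keys: List[str]) -> Dict[str, Any]:
    """Store the full key list as one object and return a small pointer to it."""
    manifest_key = f"manifests/{run_id}.json"
    fake_bucket[manifest_key] = json.dumps(result_keys).encode("utf-8")
    return {
        "bucket": "hcsia-results",  # hypothetical bucket name
        "manifestKey": manifest_key,
        "itemCount": len(result_keys),
    }


# 10,000 hypothetical per-well result files.
keys = [f"results/plate-001/well-{i:04d}.csv" for i in range(10000)]
execution_input = stage_manifest("run-42", keys)
payload = json.dumps(execution_input)

print(len(payload) < 200)  # the state input stays small regardless of list size
```

Each step can then read the manifest from storage when it needs the list, so the state passed between steps does not grow with the data set.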
Final thoughts – Error handling
• Understand possible exceptions in each step
• On any given step, determine if the state machine should stop
execution or continue
• Use exponential back-off and retry
• Catch exceptions
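In Amazon States Language, back-off and catch behavior are declared on the state itself. The sketch below shows one possible Task state as a Python dictionary and computes the waits its retrier would produce; the ARN, the state names, and the numeric settings are placeholders rather than values from the talk.

```python
import json
from typing import Any, Dict, List

task_state: Dict[str, Any] = {
    "Type": "Task",
    # Placeholder ARN for illustration.
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:merge-results",
    "Retry": [
        {
            "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
            "IntervalSeconds": 2,  # wait before the first retry
            "MaxAttempts": 3,      # number of retries
            "BackoffRate": 2.0,    # multiplier applied to each later wait
        }
    ],
    "Catch": [
        # Anything still failing after the retries goes to a handler state.
        {"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifyFailure"}
    ],
    "Next": "PostProcess",
}


def retry_delays(retrier: Dict[str, Any]) -> List[float]:
    """Seconds waited before each retry under exponential back-off."""
    return [
        retrier["IntervalSeconds"] * retrier["BackoffRate"] ** n
        for n in range(retrier["MaxAttempts"])
    ]


delays = retry_delays(task_state["Retry"][0])
print(delays)  # [2.0, 4.0, 8.0]
print(json.dumps(task_state["Catch"]))
```

Whether a caught error should end the execution or route to a recovery state is the per-step decision the second bullet describes.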
Final thoughts – Know the limits
• Ensure sufficient EBS IOPS for Docker hosts
• Iterate through large data sets using a for loop or iterator
• Avoid exceeding the maximum number of history events in an execution (25,000)
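A common way to loop inside a state machine is a small iterator task whose output a Choice state inspects. The sketch below shows such an iterator and a rough budget for how many iterations fit under the history limit; the events-per-iteration figure and the headroom are assumptions that would need to be measured for a real workflow.

```python
from typing import Any, Dict

HISTORY_EVENT_LIMIT = 25000  # per-execution limit cited on the slide


def iterate(event: Dict[str, Any]) -> Dict[str, Any]:
    """Advance a loop counter; a Choice state can branch on `continue`."""
    it = event["iterator"]
    index = it["index"] + it["step"]
    return {
        "index": index,
        "step": it["step"],
        "count": it["count"],
        "continue": index < it["count"],
    }


def safe_iterations(events_per_iteration: int, headroom: int = 1000) -> int:
    """Iterations that fit in one execution while leaving some headroom."""
    return (HISTORY_EVENT_LIMIT - headroom) // events_per_iteration


# Simulate the loop locally for five items.
state = {"index": 0, "step": 1, "count": 5, "continue": True}
iterations = 0
while state["continue"]:
    state = iterate({"iterator": state})
    iterations += 1

print(iterations)           # 5
print(safe_iterations(12))  # 2000, assuming 12 history events per iteration
```

When a data set needs more iterations than the budget allows, one option is to have the workflow start a fresh execution and carry the current index forward.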
Related breakouts
Wednesday, Nov. 28
High Performance Computing on AWS: Driving Innovation without Infrastructure Constraints
3:15 p.m.–4:15 p.m. | Aria East, Plaza Level, Orovada 2
Wednesday, Nov. 28
Optimizing Risk Analysis with Grid Computing on AWS
1:00 p.m.–2:00 p.m. | Venetian, Level 4, Lando 4305
Wednesday, Nov. 28
Setting Up Your First HPC Cluster on AWS
11:30 a.m.–12:30 p.m. | Mirage, Grand Ballroom B, Table 5
Thank you!
Gene Ting
geneting@amazon.com
Analyze Slide Images and Process Phenotypic Assays at Scale on AWS (CMP358) - AWS re:Invent 2018
