Medical researchers are constantly looking for ways to conduct more experiments, innovate faster and derive meaningful research outcomes more quickly. One of the major barriers is long processing times caused by very large datasets. A combined industry and research partnership, large-scale on-demand compute and the cloud have been key to making inroads into this very common challenge.
DiUS and the Walter and Eliza Hall Institute of Medical Research (WEHI) have been working on approaches to accelerate the capture, processing and analysis of bioimagery and microscopy data used in the research labs at WEHI. In this talk, Pavi and Lachlan will share a case study, starting with a background on microscope development and a synopsis of state-of-the-art microscopy techniques that require large-scale compute. The session will then move into a discussion of scaling complex image analysis using Fiji, a bioscience image analysis package, and of dealing with ever-growing bioimaging datasets.
You will learn about the development of tailored high-performance compute (HPC) platforms on AWS to enable this kind of research, as well as the 'convention-over-configuration' framework developed by DiUS as a repeatable solution. Lower-level technical considerations around network integration, efficient data movement and cluster compute approaches using CfnCluster on AWS will also be discussed in detail.
Speakers: Lachlan Whitehead, PhD, BioImage Analyst and Microscopy, Walter and Eliza Hall Institute of Medical Research & Pavi De Alwis, Snr. Software Engineer, DiUS
Faster Time to Science - Scaling BioMedical Research in the Cloud with SciOps - Session Sponsored by DiUS
1. Faster Time to Science
Scaling BioMedical Research in the Cloud with SciOps
Pavi De Alwis: @paviOO, pda@dius.com.au
Lachlan Whitehead: @DrLachie, whitehead@wehi.edu.au
2. Talk Outline
Who?
§ Who are we and what are we doing here?
Why?
§ Microscopy and Image Analysis
How?
§ How we utilised AWS to speed up the science
What?
§ Our solution (WIA framework)
3. DiUS
DiUS is an Australian-based technology
services crew with a DNA that's cloud-
first, human-powered, ‘small-a’ agile, lean
and outcome focused.
We use AWS to help our customers
transform the way they develop and
deliver digital products; to experiment
better, move faster, enter new markets,
compete and win.
4. About Me: Pavi De Alwis
§ Software Engineer at DiUS
§ Many hats across all SDLC activities
§ Experience across different domains, languages and tools
§ Applying Software Engineering and DevOps to Scientific Computing
5. Walter and Eliza Hall Institute
§ Oldest medical research institute in
Australia
§ Discovery, Translation and Education
§ Cancer
§ Inflammation
§ Infection and Immunity
§ Stem Cells
§ Personalised Medicine
§ Etc.
6. About Me: Lachlan Whitehead
§ PhD in Physics from the University of
Melbourne in ARC Centre of Excellence for
Coherent X-ray science
§ BioImage Analyst at the Walter and Eliza Hall
Institute in the imaging laboratory
§ What do I do:
§ Pretty pictures are no longer good enough
§ This is a quickly developing field and heavy on
computation – something most biologists have
no experience in
8. What is a Microscope Image?
From a raw data perspective?
§ XY (+ Intensity)
§ Color Channel
§ Z (depth)
§ Time
Microscope companies keep inventing more:
Position, Plate number, Block, Wavelength etc.
9. Cutting Edge Microscopy
Pros:
§ Very fast
§ Very gentle
§ Very high resolution
Cons:
§ Very fast
§ Very gentle
§ Very high resolution
Chen et al., Science 2014
10. The Problem – Data Size
§ Uncompressed 8-bit or 16-bit files
§ A 3 channel, 15 slice image with 200 time points is nearly 30GB
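The size quoted above can be sanity-checked with a few lines of arithmetic. This is only a sketch: the slide does not give the frame dimensions, so the 1280 x 1280 pixel, 16-bit frame used here is an assumption chosen for illustration.

```python
def image_size_bytes(width, height, channels, slices, timepoints, bits_per_pixel):
    """Uncompressed size of a multi-dimensional image stack, in bytes."""
    planes = channels * slices * timepoints          # one 2D plane per C/Z/T combination
    bytes_per_plane = width * height * (bits_per_pixel // 8)
    return planes * bytes_per_plane


# Assumed frame size: 1280 x 1280 pixels at 16-bit (not stated on the slide).
size = image_size_bytes(1280, 1280, channels=3, slices=15,
                        timepoints=200, bits_per_pixel=16)
print(f"{size / 1e9:.1f} GB")  # prints "29.5 GB"
```

Under that assumption, 3 x 15 x 200 = 9000 planes of about 3.3MB each come to roughly 29.5GB, in line with the "nearly 30GB" figure.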
11. The Problem - Compute
§ It can be hard to just move files that size around
§ Generally a whole image loaded into RAM
§ Large numbers of small files similarly problematic
12. The Problem - Variety of Experiments
Walter and Eliza Hall Institute Art of Science Competition
13. What can we analyse?
Object counts – cell death / proliferation
Intensity – Protein / gene expression
Morphology – Size / shape / location of
tumours
Motion – Cell behaviour over time,
speed and direction of migration
Image Analysis
Analysis ‘arms race’
Tools of the trade
Microscope companies also provide their
own (limited) tools for dealing with data.
Many tools are open-source, others are
extremely expensive
Image analysis is embarrassingly parallel
14. Aside - Embarrassingly Parallel
“an embarrassingly parallel workload or problem (also called perfectly
parallel or pleasingly parallel) is one where little or no effort is needed to
separate the problem into a number of parallel tasks”
-Wikipedia
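As a rough illustration of the definition, the sketch below analyses each image as its own task, with no task depending on another. The `mean_intensity` function and the tiny synthetic images are hypothetical stand-ins for a real analysis. A thread pool keeps the example self-contained; for CPU-bound work in Python a process pool or separate machines would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_intensity(image):
    """Analyse one image independently of all the others."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def analyse_all(images, workers=4):
    # Each image is a separate task; no task needs another task's result,
    # so the work can be handed to any number of workers in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mean_intensity, images))


# Three tiny synthetic "images" (2 x 2 grids of intensities).
images = [[[0, 0], [0, 0]], [[1, 3], [5, 7]], [[10, 10], [10, 10]]]
results = analyse_all(images)
print(results)  # prints [0.0, 4.0, 10.0]
```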
15. The Brief
From:
§ Locking up my desktop and running
analysis for hours at a time
To:
§ Running parallelised analysis on the AWS
cloud
Requirements:
§ Must be simple
§ Must be efficient
§ Must be reusable
16. SciOps [sic]
§ Software Engineering and DevOps techniques
§ Pairing with Scientists and Labs
§ Conventions, light-weight frameworks and expose configuration
§ Seamless - dev on desktop, workload on cluster
§ Data from Instrument-to-cloud
§ Ad-hoc custom compute
19. Cluster Setup
cfnCluster
CLI tool to build and
manage HPC clusters
Provide configuration
Press enter and wait a
couple of minutes
Custom spec cluster
§ CloudFormation
§ IAM
§ SNS
§ SQS
§ EC2
§ AutoScaling
§ EBS
§ CloudWatch
§ S3
§ DynamoDB
User
Defined
21. What’s My Parallel Processing Model?
1. Get the directory or file
2. List the files or ‘dimensions’ in a file
3. Run the same analysis across files / dimensions
4. Display steps live on screen interactively
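Steps 1 to 3 of this model could be sketched as follows. The function names, the `*.tif` pattern and the round-robin split are assumptions made for illustration, not the macros used in the talk.

```python
import tempfile
from pathlib import Path


def list_inputs(directory, pattern="*.tif"):
    """Steps 1-2: take a directory and list the files to analyse."""
    return sorted(p.name for p in Path(directory).glob(pattern))


def split_into_subsets(items, n_jobs):
    """Step 3: round-robin split so every job runs the same analysis
    on a near-equal share of the files (or dimensions)."""
    return [items[i::n_jobs] for i in range(n_jobs)]


# Demo on a temporary directory holding ten empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for t in range(10):
        (Path(tmp) / f"t{t:03d}.tif").touch()
    files = list_inputs(tmp)

subsets = split_into_subsets(files, n_jobs=4)
for job, subset in enumerate(subsets):
    print(job, subset)
```

The same split applies to 'dimensions' inside a single file, for example a list of time-point indices instead of file names.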
22. What’s My Runtime Model in AWS?
1. Run a headless EC2 instance
2. Start Fiji with macro and a configuration file
3. Configuration file contains ‘subset’ to analyse (i.e. files or
dimensions)
4. Write results to disk
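A minimal sketch of steps 2 and 3, assuming a JSON configuration file and the Linux Fiji launcher name `ImageJ-linux64`. The config keys, file names and macro path are hypothetical; `--headless` and `-macro` are options of the ImageJ/Fiji launcher, and the string after the macro path is passed to the macro as its argument.

```python
import json
import tempfile
from pathlib import Path


def write_job_config(config_path, input_dir, output_dir, subset):
    """Write the (hypothetical) per-job configuration file."""
    config = {"input_dir": input_dir, "output_dir": output_dir, "subset": subset}
    Path(config_path).write_text(json.dumps(config, indent=2))
    return config


def fiji_command(fiji_executable, macro_path, config_path):
    """Command line for one headless Fiji run; the macro is expected to
    read the config file path from its argument string."""
    return [str(fiji_executable), "--headless", "-macro",
            str(macro_path), str(config_path)]


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "job_0.json"
    write_job_config(config_path, "input", "output", ["t000.tif", "t004.tif"])
    saved = json.loads(config_path.read_text())
    command = fiji_command("ImageJ-linux64", "src/analysis.ijm", config_path)

print(saved["subset"])
print(command[:4])
```

On a node with Fiji installed, the resulting list could be handed to `subprocess.run`. Writing results to disk (step 4) is left to the macro itself.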
23. Fiji and AWS
1. Custom AMI with Fiji pre-installed
2. Modify analysis macros to run online
3. Some Fiji plugins can’t run headless
4. Multiple instances of Fiji on EC2 cause all sorts of
problems - RMI
24. Project Lifecycle
§ New project on the LAN/local machine
§ Sync - to AWS
§ Kick-off image processing workloads to HPC cluster
§ Multiple jobs queued per node
§ Sync - from AWS
Choices to make
§ Size of machines
§ Level of parallelisation
§ Time costs / benefits
Considerations
§ Costs of cluster
§ How long cluster will be up
§ Data transfer isn’t instant
25. Cost
S3 (Storage)
~3c per gigabyte per month.
EC2 (Compute)
Scales with machine type.
Machine Name   Specs                              Price/hr
t2.micro       1 CPU, 500MB RAM, cloud storage    2c
m4.2xlarge     8 CPU, 32GB RAM, SSD storage       67.3c
r3.8xlarge     32 CPU, 244GB RAM, SSD storage     $3.192
26. Cost
m4.large + 4x m4.2xlarge = $3.03 / hour
Only need 4 m4.2xlarges for pretty
large image data.
What if we had lots of small images?
t2.large + 10x t2.large = $1.93 / hour
Cost scales more or less linearly
with the number of machines.
So does computation time!
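One way to read the last two points together: if the workload parallelises perfectly, adding machines raises the hourly rate but shortens the run by the same factor, so the total bill stays roughly flat while results arrive sooner. The sketch below is an idealised estimate using the m4.2xlarge price quoted on the cost slide. It ignores the master node, cluster start-up, data transfer and billing granularity, and the 8-hour single-machine runtime is an assumed figure.

```python
PRICE_PER_HOUR = 0.673  # m4.2xlarge, as quoted on the cost slide


def run_estimate(single_machine_hours, n_nodes, price_per_hour=PRICE_PER_HOUR):
    """Idealised estimate assuming perfect speed-up and no overheads."""
    wall_hours = single_machine_hours / n_nodes   # time falls with more nodes
    hourly_rate = n_nodes * price_per_hour        # rate rises with more nodes
    return wall_hours, hourly_rate, wall_hours * hourly_rate


# Assumed workload: 8 hours on a single m4.2xlarge.
estimates = {n: run_estimate(8.0, n) for n in (1, 2, 4, 8)}
for n, (wall, rate, total) in estimates.items():
    print(f"{n} nodes: {wall:.1f} h at ${rate:.3f}/h = ${total:.2f}")
```

In practice the overheads listed under "Considerations" mean the total does creep up as nodes are added, so this is best treated as a lower bound.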
[Diagram: master node (m4.large), four compute nodes (m4.2xlarge) and an S3 storage ‘bucket’]
28. WIA (Imaging on AWS)
§ Fully documented SciOps framework
§ Contains cli tools:
§ Create new project structure
§ Generate config files
§ Sync data into S3 and back
§ Create AMI with customised Fiji
§ Submit and manage jobs on HPC queues
§ Also contains:
§ cfnCluster config file and instructions
§ Generic Fiji macro launcher
projectName/
├── bin
├── input
├── output
└── src
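The project layout above could be created with a helper along these lines. This is an illustrative sketch only, not the WIA cli tool itself; `create_project` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Sub-directories taken from the project structure shown on the slide.
SUBDIRS = ("bin", "input", "output", "src")


def create_project(root, name):
    """Create name/ under root with the conventional sub-directories."""
    project = Path(root) / name
    for sub in SUBDIRS:
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(tmp, "projectName")
    created = sorted(p.name for p in project.iterdir())

print(created)  # prints ['bin', 'input', 'output', 'src']
```

Fixing the layout by convention is what lets the sync, config and job tools find inputs and outputs without extra per-project settings.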
29. Conventions
Established by
§ Analytical need
§ Software tools
§ Varies by problem
§ Custom compute frameworks
§ Experiment, build, automate
§ Repeatable templates
§ Short lived clusters
§ HPC on demand
via SciOps
30. The Future
§ Our lab is building and expanding
§ Many labs don’t have access to local cluster
compute
§ Faster development and turnaround from
acquisition to analysis to publication
§ If this had been around when I was a PhD student I would
have completed sooner
31. Stay in Touch
Pavi De Alwis: @paviOO, pda@dius.com.au
Lachlan Whitehead: @DrLachie, whitehead@wehi.edu.au
Acknowledgments
AWS: Adrian White
DiUS: Paula Ngov
DiUS: Voon Wong
WEHI: Kelly Rogers
WEHI: Andrew Webb
Find us @ location G1
Right next to the AWS Booth
32. Resources – From DiUS and AWS
Read the case study:
§ Proving High-Performance Cloud Computing Can Support
Disease Prevention
Check out our technical blogs:
§ Scientific image processing in the cloud with Fiji/ImageJ
§ Building an auto-scaling R cluster using CfnCluster
Read the AWS Blog:
§ High Performance Cloud Computing Supports Disease
Prevention