Presented at the technology meeting of the Association of Independent Research Institutes (http://airi.org): an overview of recent Scientific Computing activities at Fred Hutch, Seattle
5.
• Git(hub): Manage code and config
• Containers: Encapsulate and version software
• Object Storage: Cheap, resilient, scalable, like S3
• Cloud APIs: Secret sauce, but it works
… or cloud native computing?
8. 8
HPC with AWS Batch
Opportunities
Multitenancy: not yet designed for many different users launching jobs in a single AWS account
No accounting
Custom tools had to be written:
- a wrapper to mitigate the accounting issues
- a tool to facilitate the use of named pipes for streaming
- storing Batch events in a database (otherwise they disappear after 24h)
- a dashboard
Successes
Great for scaling jobs that use Docker containers and can make use of S3
Successful projects (see the submission sketch below):
- multi-step array job (picard, kallisto, pizzly)
- microbiome pipeline
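For flavor, a minimal sketch of submitting an array job with boto3; the queue and job definition names are placeholders, not our actual setup.

import boto3

batch = boto3.client("batch")
response = batch.submit_job(
    jobName="kallisto-array-demo",
    jobQueue="my-job-queue",          # hypothetical queue name
    jobDefinition="my-job-def:1",     # hypothetical job definition (container image + command)
    arrayProperties={"size": 100},    # one child job per sample, indexed via AWS_BATCH_JOB_ARRAY_INDEX
)
print("submitted", response["jobId"])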
9. 9
End users don’t have AWS console access, so we built a custom Batch console…
11. 11
Globus & S3
SaaS solution
Tag filtering
Integrated workflows
S3 credentials need to be kept server-side
SSO using Okta
12. 12
HPC – Native, Hybrid and multi-cloud
First try AWS Batch with containers and object store; if that does not work for everything, fall back to traditional HPC workloads.
Opportunities
Select workloads based on “mountstats”: low-IO jobs stay on traditional HPC (see the sketch below)
Did we select a stack that can work in multiple clouds?
AWS Batch – Containers & S3
Traditional HPC in Hybrid Cloud
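How exactly a workload gets classified is not spelled out here; as a rough sketch of the idea, NFS read/write byte counters can be pulled from /proc/self/mountstats and summed per mount (column positions follow the kernel's mountstats format):

# Rough sketch: sum NFS read/write bytes per mount from /proc/self/mountstats
nfs_bytes = {}
current = None
with open("/proc/self/mountstats") as fh:
    for line in fh:
        if line.startswith("device") and " with fstype nfs" in line:
            current = line.split(" mounted on ")[1].split()[0]   # mount point
        elif current and line.strip().startswith("bytes:"):
            vals = [int(v) for v in line.split()[1:]]
            nfs_bytes[current] = vals[0] + vals[1]               # normal read + write bytes
            current = None

for mount, total in sorted(nfs_bytes.items(), key=lambda kv: -kv[1]):
    print(f"{mount}: {total / 1e9:.1f} GB read+written")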
14. 14
Traditional HPC in the Cloud
Opportunities
Perpetuates legacy workflows and postpones the need for change
Layer 2 VPNs for high IO?
Slurm 17.11 cluster federation will increase resource utilization
Spot market and scheduler efficiency
Path to HIPAA readiness
Successes
“sbatch -M beagle”: extremely simple way to “be in the cloud”
VPN with custom Fortinet kernel: 150 MB/s NFS to on-prem vs. 50 MB/s before
Consistent workflows on-prem and multi-cloud, with data & code access
Manual cloud bursting
Using the Slurm power-saving API to automatically shut down idle nodes (see the sketch below)
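The power-saving mechanism works by pointing SuspendProgram / ResumeProgram in slurm.conf at site-provided scripts; a minimal, hypothetical SuspendProgram might look like this (the actual cloud shutdown call is site-specific):

#!/usr/bin/env python3
# Hypothetical SuspendProgram for Slurm power saving; Slurm invokes it with a
# hostlist expression such as "beagle-node[001-020]" as the only argument.
import subprocess
import sys

hostlist = sys.argv[1]
nodes = subprocess.run(["scontrol", "show", "hostnames", hostlist],
                       capture_output=True, text=True, check=True).stdout.split()
for node in nodes:
    # Replace with the cloud provider's API call that powers off the matching instance.
    print(f"shutting down idle node {node}")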
15. 15
2 projects published on pypi.org, with ongoing support from Fred Hutch staff, both in production
Slurm Limiter
Dynamically adjusts account limits to increase responsiveness and utilization (see the sketch below)
LDAP Caching
Fast LDAP / idmap; replicates AD, replaces Centrify / SSSD
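The sketch below is not the published Slurm Limiter code, only an illustration of the idea: widen or tighten a per-account CPU limit depending on how many CPUs are idle (account name and thresholds are made up).

import subprocess

ACCOUNT = "mylab"                    # hypothetical account
IDLE_THRESHOLD = 500                 # hypothetical: cluster considered quiet above this
QUIET_LIMIT, BUSY_LIMIT = 800, 200   # hypothetical CPU limits

# sinfo -o %C reports allocated/idle/other/total CPU counts for the cluster
out = subprocess.run(["sinfo", "-h", "-o", "%C"],
                     capture_output=True, text=True, check=True).stdout
allocated, idle, other, total = (int(x) for x in out.strip().split("/"))

limit = QUIET_LIMIT if idle > IDLE_THRESHOLD else BUSY_LIMIT
subprocess.run(["sacctmgr", "-i", "modify", "account", f"name={ACCOUNT}",
                "set", f"GrpTRES=cpu={limit}"], check=True)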
16. 16
Scientific Pipelines
We have many, and too many are homegrown (shell scripts, make, scons)
They lack cloud compatibility, error checking, etc.
Must pipelines be written in a language people know? If yes: Python
Do they need to be CWL compatible? http://www.commonwl.org/ says:
Tools to be tested at Fred Hutch: Luigi/SciLuigi (see the sketch below), Airflow, Snakemake
Are tools that originated in research outfits sustainable? Toil, Cromwell?
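As a taste of what one of these candidates looks like, a minimal Luigi task (paths, index and sample names are purely illustrative):

import subprocess
import luigi

class Quantify(luigi.Task):
    """Run kallisto quant for one sample; Luigi skips it if the output already exists."""
    sample = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"results/{self.sample}/abundance.tsv")

    def run(self):
        self.output().makedirs()
        subprocess.run(["kallisto", "quant", "-i", "transcripts.idx",
                        "-o", f"results/{self.sample}",
                        f"fastq/{self.sample}_R1.fastq.gz",
                        f"fastq/{self.sample}_R2.fastq.gz"], check=True)

if __name__ == "__main__":
    luigi.run()   # e.g.: python pipeline.py Quantify --sample S01 --local-scheduler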
17. 17
HPC – a word about GPUs and machine learning
Gaming GPUs have a 10x price/performance advantage for TensorFlow:
https://github.com/tobigithub/tensorflow-deep-learning/wiki/tf-benchmarks
But Nvidia does not want you to use them:
https://www.theregister.co.uk/2018/01/03/nvidia_server_gpus/
Will we see homegrown GeForce racks in lab spaces?
18. 18
HPC Future – Combining Kubernetes and Slurm?
Containers on bare metal: virtualization without a performance penalty
Run Docker-based and HPC workflows on the same infrastructure
Risks
Nobody has really done it
Opportunities
Dynamically share infrastructure
Compatibility with cloud-based container services
No need to use bridge technology (Singularity)
Diagram: Nodes 1–999 run LXC/LXD on bare metal, hosting both Docker/Kubernetes and Slurm workloads side by side.
20. 20
Collaborate on building faster & reproducible Scientific Software
Why? Run rbench using R-3.3.0:
179 secs: R compiled on Ubuntu 14.04
83 secs (54% faster): EasyBuild foss-2016a R
91 secs (49% faster): EasyBuild intel-2016a R
86 secs (52% faster): Microsoft R (yes, on Linux)
21. 21
So, how does it work?
The sysadmin clones a git repository and builds the software:
> eb R-3.4.3-foss-2016b-fh2.eb --robot
The user runs:
> ml R/3.4.3-foss-2016b-fh2
> which R
/app/easybuild/software/R/3.4.3-foss-2016b-fh2/bin/R
> R --version
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
22. 22
Shouldn’t you be using Docker, doh?
Of course you can do this in your Dockerfile:
RUN apt-get update && apt-get install -y r-base
RUN Rscript -e "install.packages('yhatr')"
RUN Rscript -e "install.packages('ggplot2')"
RUN Rscript -e "install.packages('plyr')"
RUN Rscript -e "install.packages('reshape2')"
And then publish the Docker container for reproducibility.
23. 23
What if they want to use a specific version of R, or a specific version of an R package? Compile it themselves? Treat it as a black box?
24. 24
What do others say?
James Cuff *):
“So what about back to rolling your own? … To be clear this is R on top of python on top of Tensorflow. It’s a deep stack.”
On AI tools: “Significantly more research will be needed to go … techniques aren’t documented clearly with example code that is also fully reproducible.”
*) former Assistant Dean and Distinguished Engineer for Research Computing at Harvard
https://www.nextplatform.com/2018/04/09/deep-learning-in-r-documentation-drives-algorithms/
https://www.nextplatform.com/2018/04/18/containing-the-complexity-of-the-long-tail/
25. 25
EB = reducing the chaos
Recipes are Python code (e.g. lists) with strict versioning of libraries and packages for reproducibility (a stripped-down example follows).
Use easy_update.py (https://github.com/FredHutch/easybuild-life-sciences/tree/master/scripts) to update R/Python packages to the latest versions and then freeze them,
or see https://github.com/FredHutch/easybuild-life-sciences/
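For flavor, a heavily stripped-down, illustrative easyconfig; the real -fh recipes in the repository above pin hundreds of extensions:

# Illustrative only; not an actual Fred Hutch easyconfig
name = 'R'
version = '3.4.3'
versionsuffix = '-fh2'
homepage = 'https://www.r-project.org'
description = "R with a site-specific, version-pinned set of packages"
toolchain = {'name': 'foss', 'version': '2016b'}
source_urls = ['https://cran.r-project.org/src/base/R-3/']
sources = ['R-%(version)s.tar.gz']
exts_list = [
    ('ggplot2', '2.2.1'),    # every extension pinned to an exact version
    ('reshape2', '1.4.3'),
]
moduleclass = 'lang'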
26. 26
EB @ Fred Hutch
We want to:
Build on multiple OS versions (Ubuntu 14.04/16.04/18.04)
Use the same process for building in cloud-native and in traditional environments
Use Docker / Singularity as well as the /app folder
We can share: https://github.com/FredHutch/ls2
27. 27
LS2 on Docker Hub
Giant R container: https://hub.docker.com/r/fredhutch/ls2_r/
Python container: https://hub.docker.com/r/fredhutch/ls2_python
28. 28
EB @ Fred Hutch
Other tools, sometimes complementary:
See https://fosdem.org/2018/schedule/event/installing_software_for_scientists/
29. 29
DB4Sci – lightweight DBaaS
Motivation
Users requested too many different database setups (databases, instances, versions, performance, etc.)
The enterprise setup with iSCSI storage had performance issues with HPC
Better backup to the cloud was needed
30. 30
DB4Sci – lightweight DBaaS
Architecture
“Difference of opinion”: can databases run in containers?
Stack: NVMe, ZFS, Docker, Flask, AD
Install from GitHub
31. 31
DB4Sci – lightweight DBaaS
Features
Encryption at rest
MySQL & Postgres
Cloud backup
AD integration
HPC bullet-proof
In production
Next: self-service restore into a new DB
Download: http://db4sci.org
Get involved: https://github.com/FredHutch/DB4SCI
32. 32
GPU databases – performance at a new level
technology                              runtime (sec)   factor slower
MapD & 8 x Nvidia Pascal Titan X        0.109           1
AWS Redshift on 6 x ds2.8xlarge         1.905           17
Google BigQuery                         2               18
Postgres & 1 x 4-core, 16GB, SSD        205             1881
Source: http://tech.marksblogg.com (runs the NY Taxi dataset on many technologies; benchmarks are documented and fully reproducible)
Leaders: MapD.com, Brytlyt.com, Continuum Analytics (GPU data frame)
35. 35
What is Prometheus?
Prometheus is a popular open-source time series database, monitoring and alerting system from the Cloud Native Computing Foundation (CNCF).
Very rich ecosystem of add-ons, exporters and SDKs.
Projects adopting the “Cloud Native” approach have built-in support for exposing metrics to Prometheus (minimal example below).
Uses a “pull” model by default, but a “push” gateway is available when required.
Very high performance (v2.0+); a single powerful server can ingest up to 800,000 samples per second.
Multi-dimensional data model, with time series identified by metric name and key/value pairs.
Flexible query language: PromQL
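As an example of the “expose metrics” side of the pull model, a minimal exporter using the official Python client (port and metric name chosen arbitrarily):

import random
import time
from prometheus_client import Gauge, start_http_server

queue_depth = Gauge("demo_queue_depth", "Jobs waiting in a demo queue")

start_http_server(9200)            # Prometheus scrapes http://<host>:9200/metrics
while True:
    queue_depth.set(random.randint(0, 50))   # stand-in for a real measurement
    time.sleep(15)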
37. 37
Self-Service Metrics Gathering and Dashboards
Workflow (diagram: GitHub, Prometheus, Grafana, Active Directory, monitored systems):
1. Install exporters on the systems to be monitored
2. Commit a new or updated config file (*.yml) to GitHub
3. Prometheus incorporates the new targets
4. Prometheus collects metrics from the systems
5. Authorized users (Active Directory authentication & authorization, Grafana_Editors group) log in and create dashboards, panels or alerts in Grafana
6. Grafana pulls metrics from Prometheus
7. View-only users consume the dashboards
Example target definition:
- targets:
    - srv1.fhcrc.org:9100
    - srv2.fhcrc.org:9100
    - srv3.fhcrc.org:9182
  labels:
    app: abc
    owner: mydept
38. 38
SAS Metering Agent
https://github.com/FredHutch/sas-metering-client
Single binary with no dependencies
Runs as a Windows service
Every minute, checks whether SAS Desktop is running
POSTs the result (0 or 1) to the Prometheus push gateway
The push gateway is open to the Internet so workstations can report in from anywhere
Deployed to 135 SAS workstations via SCCM
Light resource footprint
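The production agent is the dependency-free binary described above; the equivalent push looks roughly like this in Python (gateway address, job name and process-name match are made up):

import psutil
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
running = Gauge("sas_desktop_running",
                "1 if SAS Desktop is running on this workstation", registry=registry)

is_running = any((p.info["name"] or "").lower().startswith("sas")
                 for p in psutil.process_iter(["name"]))
running.set(1 if is_running else 0)

# POST the sample to the push gateway, which Prometheus then scrapes
push_to_gateway("pushgw.example.org:9091", job="sas_metering", registry=registry)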
40. 40
Mirroring 1.4PB of data to S3 with rsync and ObjectiveFS
The initial mirror took 60 days, but was throttled to prevent overwhelming our firewall.
Timeline chart milestones: Started Mirroring; First Pass Mirroring Complete; S3 Standard to IA Migration
43. 43
HPC – CPU cores per lab (Gizmo + Beagle clusters combined)
44. 44
EC2 Instance Types: using 32 different instance types!
Chart annotation: transitioning from C4 to C5 instances
45. 45
Keeping an eye on shared (abused) interactive nodes
Dashboard panels: System out of RAM? Who’s hogging the CPU? Who’s hogging the RAM?
Interactive nodes: 3 systems, 56 CPU cores, 376GB RAM
47. 47
MaxQuant Proteomics Pipeline
https://github.com/FredHutch/maxquant-pipeline
Workflow
The user prepares data and the job configuration locally
Uses the command-line “mqsubmit” tool to submit jobs to the pipeline
When complete, the user receives an email with a link to download the results
Example job submission:
$ mqsubmit --mqconfig mqpar.xml --jobname job01 --department scicomp --email me@fhcrc.org
Notes
Proteomics mostly uses Windows platforms
Not so easy to integrate in a Linux shop, but through a cloud API it gets better
Custom pipeline runs Windows jobs from a familiar command-line interface on Linux
48. 48
MaxQuant Proteomics Pipeline
“I am again extremely impressed at the speed of your cloud setup! Less than 12 hours to complete the search, and it would have easily been more than 4 days with our old setup!”
-- MaxQuant Pipeline User
91% utilization of 128 CPU cores
49. 49
Shiny Web Application Pipelines
Pipeline diagram: a researcher commits a Shiny app to Git/GitHub; the push triggers a CircleCI build and test; the resulting image is pushed to Docker Hub; Rancher (container orchestration) pulls the image and deploys it; users access the running Shiny app.
Good
After the initial setup, researchers can update and re-deploy applications themselves. They only need to contact us if the pipeline breaks.
Bad
Building R packages and dependencies can take a long time to compile, as long as 40 minutes in some cases.
Solution: use a base image with all or most of the dependencies cooked in.
50. 50
Statistics
420 repositories (268 private, 152 public), 148 users, 81 teams
https://github.com/FredHutch
Git & GitHub skills are now essential for developers, IT professionals and even researchers.
53. 53
Archiving?
Is there a demand?
Are users collecting useful metadata?
How many PB do you need before you should be concerned?
54. Ingredients (also used for storage chargebacks)
pwalk - https://github.com/fizwit/filesystem-reporting-tools
  a multi-threaded file system metadata collector
slurm-pwalk - https://github.com/FredHutch/slurm-pwalk
  parallel pwalk crawler execution and import into PostgreSQL
storage-hotspots - https://github.com/FredHutch/storage-hotspots
  a Flask app that helps find large folders in the database and triggers an archiving workflow
DB4SCI - http://db4sci.org (optional)
  high-performance database platform
OR better, just use
55. 55
Data Storage News – Backup / DR in the cloud
ObjectiveFS backup and DR moved into production in Oct 2017, moving 1PB/month
POSIX FS & Linux scale-out NAS on S3 (think cloud Gluster)
Parallel rsync: ls -a1 | tail -n +3 | parallel -u --progress rsync -avzP {} /target
Cautions
Staff needs familiarity with rsync and with monitoring / logging
Avoid more than 500TB per folder
Only 90% in S3-IA (Infrequent Access), 10% in S3 Standard
Limit your retention period for backups
Opportunities
One solution for backup / DR
$8 per TB / month at 50% compression
300+ MB/s per node
Re-using the backup / DR copy for fast read-only data access
57. 57
Systems are presented uniformly via DFS/AutoFS
SMB / CIFS namespace
NFS / POSIX namespace
This does not work well with POSIX symlinks: Windows wonders, “where is /folder?”
58. 58
We need a samba server to make all symlinks work
59. 59
Well, if we don’t even fully use the NAS… can we go all the way and use OSS cheaply?
60. 60
A word about BeeGFS
In production as our scratch FS since 2014
100% uptime (with some cheating)
Currently ca. 400TB capacity
1000 small files/sec (NetApp: 800, Isilon X: 280, Avere: 300)
Infinitely scalable through its distributed metadata infrastructure
Open source, HA and change logs
Risks
BeeGFS defaults to XFS rather than ZFS, and ZFS is less widely tested
No vendor phone-home system
Links
Our configuration is published: https://github.com/FredHutch/chromium-zfs-beegfs
scratch-dna benchmark in Python, C and Go (see the sketch below): https://github.com/FredHutch/sc-benchmark
A Samba server joined to AD, in production: https://github.com/FredHutch/sc-howto/
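Not the actual sc-benchmark code, just a minimal sketch in the same spirit: write many tiny files of random “DNA” and report files per second.

import os
import random
import time

target_dir = "/mnt/scratch/bench"     # hypothetical scratch path
num_files, file_size = 1000, 4096

os.makedirs(target_dir, exist_ok=True)
start = time.time()
for i in range(num_files):
    data = "".join(random.choice("ACGT") for _ in range(file_size))
    with open(os.path.join(target_dir, f"dna_{i}.txt"), "w") as out:
        out.write(data)
print(f"{num_files / (time.time() - start):.0f} small files/sec")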
61. 61
StorOne
Local SSD/disk used by KVM VMs
High performance
Made by storage industry veterans
NyRiad
Uses GPUs to create a large erasure-coded pool
A Linux kernel module creates one large block device
Built for the largest radio telescope
Fun question: is RAID still possible with 14TB drives? And what if you think the answer is no?
62. 62
Thank you!
Dirk Petersen: petersen at fredhutch.org
Robert McDermott: rmcdermo at fredhutch.org