Presented at the technology meeting of the Association of Independent Research Institutes (http://airi.org): an overview of recent Scientific Computing activities at Fred Hutch, Seattle
5.
• Git(hub): Manage code and config
• Containers: Encapsulate and version software
• Object Storage: Cheap, resilient, scalable, like S3
• Cloud APIs: Secret sauce, but it works
… or cloud native computing?
8. 8
HPC with AWS Batch
Opportunities
Multitenancy: not yet designed for many different users launching jobs in a single AWS account
No accounting
Custom tools had to be written:
- a wrapper to mitigate the accounting issues
- a tool to facilitate the use of named pipes for streaming
- storing Batch events in a database (otherwise they disappear after 24h)
- a dashboard
Successes
Great for scaling jobs that use Docker containers and can make use of S3
Successful projects (see the submission sketch below):
- multi-step array job (picard, kallisto, pizzly)
- microbiome pipeline
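For flavor, a minimal sketch of submitting an array job with boto3; the queue and job definition names are placeholders, not our actual setup.

import boto3

batch = boto3.client("batch")
response = batch.submit_job(
    jobName="kallisto-array-demo",
    jobQueue="my-job-queue",          # hypothetical queue name
    jobDefinition="my-job-def:1",     # hypothetical job definition (container image + command)
    arrayProperties={"size": 100},    # one child job per sample, indexed via AWS_BATCH_JOB_ARRAY_INDEX
)
print("submitted", response["jobId"])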
9. 9
End users don’t have AWS console access, so we built a custom Batch console…
11. 11
Globus & S3
SaaS solution
Tag filtering
Integrated workflows
S3 credentials need to be kept server-side
SSO using Okta
12. 12
HPC – Native, Hybrid and multi-cloud
First try AWS Batch with containers and object store; if that does not work for everything, fall back to traditional HPC workloads.
Opportunities
Select workloads based on “mountstats”: low-IO jobs stay on traditional HPC (see the sketch below)
Did we select a stack that can work in multiple clouds?
AWS Batch – Containers & S3
Traditional HPC in Hybrid Cloud
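How exactly a workload gets classified is not spelled out here; as a rough sketch of the idea, NFS read/write byte counters can be pulled from /proc/self/mountstats and summed per mount (column positions follow the kernel's mountstats format):

# Rough sketch: sum NFS read/write bytes per mount from /proc/self/mountstats
nfs_bytes = {}
current = None
with open("/proc/self/mountstats") as fh:
    for line in fh:
        if line.startswith("device") and " with fstype nfs" in line:
            current = line.split(" mounted on ")[1].split()[0]   # mount point
        elif current and line.strip().startswith("bytes:"):
            vals = [int(v) for v in line.split()[1:]]
            nfs_bytes[current] = vals[0] + vals[1]               # normal read + write bytes
            current = None

for mount, total in sorted(nfs_bytes.items(), key=lambda kv: -kv[1]):
    print(f"{mount}: {total / 1e9:.1f} GB read+written")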
14. 14
Traditional HPC in the Cloud
Opportunities
Perpetuates legacy workflows and postpones the need for change
Layer 2 VPNs for high IO?
Slurm 17.11 cluster federation will increase resource utilization
Spot market and scheduler efficiency
Path to HIPAA readiness
Successes
“sbatch -M beagle”: extremely simple way to “be in the cloud”
VPN with custom Fortinet kernel: 150 MB/s NFS to on-prem vs. 50 MB/s before
Consistent workflows on-prem and multi-cloud, with data & code access
Manual cloud bursting
Using the Slurm power-saving API to automatically shut down idle nodes (see the sketch below)
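The power-saving mechanism works by pointing SuspendProgram / ResumeProgram in slurm.conf at site-provided scripts; a minimal, hypothetical SuspendProgram might look like this (the actual cloud shutdown call is site-specific):

#!/usr/bin/env python3
# Hypothetical SuspendProgram for Slurm power saving; Slurm invokes it with a
# hostlist expression such as "beagle-node[001-020]" as the only argument.
import subprocess
import sys

hostlist = sys.argv[1]
nodes = subprocess.run(["scontrol", "show", "hostnames", hostlist],
                       capture_output=True, text=True, check=True).stdout.split()
for node in nodes:
    # Replace with the cloud provider's API call that powers off the matching instance.
    print(f"shutting down idle node {node}")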
15. 15
2 projects published on pypi.org, with ongoing support from Fred Hutch staff, both in production
Slurm Limiter
Dynamically adjusts account limits to increase responsiveness and utilization (see the sketch below)
LDAP Caching
Fast LDAP / idmap; replicates AD, replaces Centrify / SSSD
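The sketch below is not the published Slurm Limiter code, only an illustration of the idea: widen or tighten a per-account CPU limit depending on how many CPUs are idle (account name and thresholds are made up).

import subprocess

ACCOUNT = "mylab"                    # hypothetical account
IDLE_THRESHOLD = 500                 # hypothetical: cluster considered quiet above this
QUIET_LIMIT, BUSY_LIMIT = 800, 200   # hypothetical CPU limits

# sinfo -o %C reports allocated/idle/other/total CPU counts for the cluster
out = subprocess.run(["sinfo", "-h", "-o", "%C"],
                     capture_output=True, text=True, check=True).stdout
allocated, idle, other, total = (int(x) for x in out.strip().split("/"))

limit = QUIET_LIMIT if idle > IDLE_THRESHOLD else BUSY_LIMIT
subprocess.run(["sacctmgr", "-i", "modify", "account", f"name={ACCOUNT}",
                "set", f"GrpTRES=cpu={limit}"], check=True)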
16. 16
Scientific Pipelines
We have many, and too many are homegrown (shell scripts, make, scons)
They lack cloud compatibility, error checking, etc.
Must pipelines be written in a language people know? If yes: Python
Do they need to be CWL compatible? http://www.commonwl.org/ says:
Tools to be tested at Fred Hutch: Luigi/SciLuigi (see the sketch below), Airflow, Snakemake
Are tools that originated in research outfits sustainable? Toil, Cromwell?
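As a taste of what one of these candidates looks like, a minimal Luigi task (paths, index and sample names are purely illustrative):

import subprocess
import luigi

class Quantify(luigi.Task):
    """Run kallisto quant for one sample; Luigi skips it if the output already exists."""
    sample = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"results/{self.sample}/abundance.tsv")

    def run(self):
        self.output().makedirs()
        subprocess.run(["kallisto", "quant", "-i", "transcripts.idx",
                        "-o", f"results/{self.sample}",
                        f"fastq/{self.sample}_R1.fastq.gz",
                        f"fastq/{self.sample}_R2.fastq.gz"], check=True)

if __name__ == "__main__":
    luigi.run()   # e.g.: python pipeline.py Quantify --sample S01 --local-scheduler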
17. 17
HPC – a word about GPUs and machine learning
Gaming GPUs have a 10x price/performance advantage for TensorFlow:
https://github.com/tobigithub/tensorflow-deep-learning/wiki/tf-benchmarks
But Nvidia does not want you to use them:
https://www.theregister.co.uk/2018/01/03/nvidia_server_gpus/
Will we see homegrown GeForce racks in lab spaces?
18. 18
HPC Future – Combining Kubernetes and Slurm?
Containers on bare metal: virtualization without a performance penalty
Run Docker-based and HPC workflows on the same infrastructure
Risks
Nobody has really done it
Opportunities
Dynamically share infrastructure
Compatibility with cloud-based container services
No need to use bridge technology (Singularity)
Diagram: Nodes 1–999 run LXC/LXD on bare metal, hosting both Docker/Kubernetes and Slurm workloads side by side.
20. 20
Collaborate on building faster & reproducible Scientific Software
Why? Run rbench using R-3.3.0:
179 secs: R compiled on Ubuntu 14.04
83 secs (54% faster): EasyBuild foss-2016a R
91 secs (49% faster): EasyBuild intel-2016a R
86 secs (52% faster): Microsoft R (yes, on Linux)
21. 21
So, how does it work?
The sysadmin clones a git repository and builds the software:
> eb R-3.4.3-foss-2016b-fh2.eb --robot
The user runs:
> ml R/3.4.3-foss-2016b-fh2
> which R
/app/easybuild/software/R/3.4.3-foss-2016b-fh2/bin/R
> R --version
R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
22. 22
Shouldn’t you be using Docker, doh?
Of course you can do this in your Dockerfile:
RUN apt-get update && apt-get install -y r-base
RUN Rscript -e "install.packages('yhatr')"
RUN Rscript -e "install.packages('ggplot2')"
RUN Rscript -e "install.packages('plyr')"
RUN Rscript -e "install.packages('reshape2')"
And then publish the Docker container for reproducibility.
23. 23
What if they want to use a specific version of R, or a specific version of an R package? Compile it themselves? Treat it as a black box?
24. 24
What do others say?
James Cuff *):
“So what about back to rolling your own? … To be clear this is R on top of python on top of Tensorflow. It’s a deep stack.”
On AI tools: “Significantly more research will be needed to go … techniques aren’t documented clearly with example code that is also fully reproducible.”
*) former Assistant Dean and Distinguished Engineer for Research Computing at Harvard
https://www.nextplatform.com/2018/04/09/deep-learning-in-r-documentation-drives-algorithms/
https://www.nextplatform.com/2018/04/18/containing-the-complexity-of-the-long-tail/
25. 25
EB = reducing the chaos
Recipes are Python code (e.g. lists) with strict versioning of libraries and packages for reproducibility (a stripped-down example follows).
Use easy_update.py (https://github.com/FredHutch/easybuild-life-sciences/tree/master/scripts) to update R/Python packages to the latest versions and then freeze them,
or see https://github.com/FredHutch/easybuild-life-sciences/
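For flavor, a heavily stripped-down, illustrative easyconfig; the real -fh recipes in the repository above pin hundreds of extensions:

# Illustrative only; not an actual Fred Hutch easyconfig
name = 'R'
version = '3.4.3'
versionsuffix = '-fh2'
homepage = 'https://www.r-project.org'
description = "R with a site-specific, version-pinned set of packages"
toolchain = {'name': 'foss', 'version': '2016b'}
source_urls = ['https://cran.r-project.org/src/base/R-3/']
sources = ['R-%(version)s.tar.gz']
exts_list = [
    ('ggplot2', '2.2.1'),    # every extension pinned to an exact version
    ('reshape2', '1.4.3'),
]
moduleclass = 'lang'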
26. 26
EB @ Fred Hutch
We want to:
Build on multiple OS versions (Ubuntu 14.04/16.04/18.04)
Use the same process for building in cloud-native and in traditional environments
Use Docker / Singularity as well as the /app folder
We can share: https://github.com/FredHutch/ls2
27. 27
LS2 on Docker Hub
Giant R container: https://hub.docker.com/r/fredhutch/ls2_r/
Python container: https://hub.docker.com/r/fredhutch/ls2_python
28. 28
EB @ Fred Hutch
Other tools, sometimes complementary:
See https://fosdem.org/2018/schedule/event/installing_software_for_scientists/
29. 29
DB4Sci – lightweight DBaaS
Motivation
Users requested too many different database setups (databases, instances, versions, performance, etc.)
The enterprise setup with iSCSI storage had performance issues with HPC
Better backup to the cloud was needed
30. 30
DB4Sci – lightweight DBaaS
Architecture
“Difference of opinion”: can databases run in containers?
Stack: NVMe, ZFS, Docker, Flask, AD
Install from GitHub
31. 31
DB4Sci – lightweight DBaaS
Features
Encryption at rest
MySQL & Postgres
Cloud backup
AD integration
HPC bullet-proof
In production
Next: self-service restore into a new DB
Download: http://db4sci.org
Get involved: https://github.com/FredHutch/DB4SCI
32. 32
GPU databases – performance at a new level
technology                              runtime (sec)   factor slower
MapD & 8 x Nvidia Pascal Titan X        0.109           1
AWS Redshift on 6 x ds2.8xlarge         1.905           17
Google BigQuery                         2               18
Postgres & 1 x 4-core, 16GB, SSD        205             1881
Source: http://tech.marksblogg.com (runs the NY Taxi dataset on many technologies; benchmarks are documented and fully reproducible)
Leaders: MapD.com, Brytlyt.com, Continuum Analytics (GPU data frame)
35. 35
What is Prometheus?
Prometheus is a popular open-source time series database, monitoring and alerting system from the Cloud Native Computing Foundation (CNCF).
Very rich ecosystem of add-ons, exporters and SDKs.
Projects adopting the “Cloud Native” approach have built-in support for exposing metrics to Prometheus (minimal example below).
Uses a “pull” model by default, but a “push” gateway is available when required.
Very high performance (v2.0+); a single powerful server can ingest up to 800,000 samples per second.
Multi-dimensional data model, with time series identified by metric name and key/value pairs.
Flexible query language: PromQL
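As an example of the “expose metrics” side of the pull model, a minimal exporter using the official Python client (port and metric name chosen arbitrarily):

import random
import time
from prometheus_client import Gauge, start_http_server

queue_depth = Gauge("demo_queue_depth", "Jobs waiting in a demo queue")

start_http_server(9200)            # Prometheus scrapes http://<host>:9200/metrics
while True:
    queue_depth.set(random.randint(0, 50))   # stand-in for a real measurement
    time.sleep(15)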
37. 37
Self-Service Metrics Gathering and Dashboards
Workflow (diagram: GitHub, Prometheus, Grafana, Active Directory, monitored systems):
1. Install exporters on the systems to be monitored
2. Commit a new or updated config file (*.yml) to GitHub
3. Prometheus incorporates the new targets
4. Prometheus collects metrics from the systems
5. Authorized users (Active Directory authentication & authorization, Grafana_Editors group) log in and create dashboards, panels or alerts in Grafana
6. Grafana pulls metrics from Prometheus
7. View-only users consume the dashboards
Example target definition:
- targets:
    - srv1.fhcrc.org:9100
    - srv2.fhcrc.org:9100
    - srv3.fhcrc.org:9182
  labels:
    app: abc
    owner: mydept
38. 38
SAS Metering Agent
https://github.com/FredHutch/sas-metering-client
Single binary with no dependencies
Runs as a Windows service
Every minute, checks whether SAS Desktop is running
POSTs the result (0 or 1) to the Prometheus push gateway
The push gateway is open to the Internet so workstations can report in from anywhere
Deployed to 135 SAS workstations via SCCM
Light resource footprint
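The production agent is the dependency-free binary described above; the equivalent push looks roughly like this in Python (gateway address, job name and process-name match are made up):

import psutil
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
running = Gauge("sas_desktop_running",
                "1 if SAS Desktop is running on this workstation", registry=registry)

is_running = any((p.info["name"] or "").lower().startswith("sas")
                 for p in psutil.process_iter(["name"]))
running.set(1 if is_running else 0)

# POST the sample to the push gateway, which Prometheus then scrapes
push_to_gateway("pushgw.example.org:9091", job="sas_metering", registry=registry)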
40. 40
Mirroring 1.4PB of data to S3 with rsync and ObjectiveFS
The initial mirror took 60 days, but was throttled to prevent overwhelming our firewall.
Timeline chart milestones: Started Mirroring; First Pass Mirroring Complete; S3 Standard to IA Migration
43. 43
HPC – CPU cores per lab (Gizmo + Beagle clusters combined)
44. 44
EC2 Instance Types: using 32 different instance types!
Chart annotation: transitioning from C4 to C5 instances
45. 45
Keeping an eye on shared (abused) interactive nodes
Dashboard panels: System out of RAM? Who’s hogging the CPU? Who’s hogging the RAM?
Interactive nodes: 3 systems, 56 CPU cores, 376GB RAM
47. 47
MaxQuant Proteomics Pipeline
https://github.com/FredHutch/maxquant-pipeline
Workflow
The user prepares data and the job configuration locally
Uses the command-line “mqsubmit” tool to submit jobs to the pipeline
When complete, the user receives an email with a link to download the results
Example job submission:
$ mqsubmit --mqconfig mqpar.xml --jobname job01 --department scicomp --email me@fhcrc.org
Notes
Proteomics mostly uses Windows platforms
Not so easy to integrate in a Linux shop, but through a cloud API it gets better
Custom pipeline runs Windows jobs from a familiar command-line interface on Linux
48. 48
MaxQuant Proteomics Pipeline
“I am again extremely impressed at the speed of your cloud setup! Less than 12 hours to complete the search, and it would have easily been more than 4 days with our old setup!”
-- MaxQuant Pipeline User
91% utilization of 128 CPU cores
49. 49
Shiny Web Application Pipelines
Pipeline diagram: a researcher commits a Shiny app to Git/GitHub; the push triggers a CircleCI build and test; the resulting image is pushed to Docker Hub; Rancher (container orchestration) pulls the image and deploys it; users access the running Shiny app.
Good
After the initial setup, researchers can update and re-deploy applications themselves. They only need to contact us if the pipeline breaks.
Bad
Building R packages and dependencies can take a long time to compile, as long as 40 minutes in some cases.
Solution: use a base image with all or most of the dependencies cooked in.
50. 50
Statistics
420 repositories (268 private, 152 public), 148 users, 81 teams
https://github.com/FredHutch
Git & GitHub skills are now essential for developers, IT professionals and even researchers.
53. 53
Archiving?
Is there a demand?
Are users collecting useful metadata?
How many PB do you need before you should be concerned?
54. Ingredients (also used for storage chargebacks)
pwalk - https://github.com/fizwit/filesystem-reporting-tools
  a multi-threaded file system metadata collector
slurm-pwalk - https://github.com/FredHutch/slurm-pwalk
  parallel pwalk crawler execution and import into PostgreSQL
storage-hotspots - https://github.com/FredHutch/storage-hotspots
  a Flask app that helps find large folders in the database and triggers an archiving workflow
DB4SCI - http://db4sci.org (optional)
  high-performance database platform
OR better, just use
55. 55
Data Storage News – Backup / DR in the cloud
ObjectiveFS backup and DR moved into production in Oct 2017, moving 1PB/month
POSIX FS & Linux scale-out NAS on S3 (think cloud Gluster)
Parallel rsync: ls -a1 | tail -n +3 | parallel -u --progress rsync -avzP {} /target
Cautions
Staff needs familiarity with rsync and with monitoring / logging
Avoid more than 500TB per folder
Only 90% in S3-IA (Infrequent Access), 10% in S3 Standard
Limit your retention period for backups
Opportunities
One solution for backup / DR
$8 per TB / month at 50% compression
300+ MB/s per node
Re-using the backup / DR copy for fast read-only data access
57. 57
Systems are presented uniformly via DFS/AutoFS
SMB / CIFS namespace
NFS / POSIX namespace
This does not work well with POSIX symlinks: Windows wonders, “where is /folder?”
58. 58
We need a samba server to make all symlinks work
59. 59
Well, if we don’t even fully use the NAS… can we go all the way and use OSS cheaply?
60. 60
A word about BeeGFS
In production as our scratch FS since 2014
100% uptime (with some cheating)
Currently ca. 400TB capacity
1000 small files/sec (NetApp: 800, Isilon X: 280, Avere: 300)
Infinitely scalable through its distributed metadata infrastructure
Open source, HA and change logs
Risks
BeeGFS defaults to XFS rather than ZFS, and ZFS is less widely tested
No vendor phone-home system
Links
Our configuration is published: https://github.com/FredHutch/chromium-zfs-beegfs
scratch-dna benchmark in Python, C and Go (see the sketch below): https://github.com/FredHutch/sc-benchmark
A Samba server joined to AD, in production: https://github.com/FredHutch/sc-howto/
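Not the actual sc-benchmark code, just a minimal sketch in the same spirit: write many tiny files of random “DNA” and report files per second.

import os
import random
import time

target_dir = "/mnt/scratch/bench"     # hypothetical scratch path
num_files, file_size = 1000, 4096

os.makedirs(target_dir, exist_ok=True)
start = time.time()
for i in range(num_files):
    data = "".join(random.choice("ACGT") for _ in range(file_size))
    with open(os.path.join(target_dir, f"dna_{i}.txt"), "w") as out:
        out.write(data)
print(f"{num_files / (time.time() - start):.0f} small files/sec")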
61. 61
StorOne
Local SSD/disk used by KVM VMs
High performance
Made by storage industry veterans
NyRiad
Uses GPUs to create a large erasure-coded pool
A Linux kernel module creates one large block device
Built for the largest radio telescope
Fun question: is RAID still possible with 14TB drives? And what if you think the answer is no?
62. 62
Thank you!
Dirk Petersen: petersen at fredhutch.org
Robert McDermott: rmcdermo at fredhutch.org