Cloud Transformation 2.0
Embracing the Multi-Cloud Future
March 12, 2019: Bio-IT World West
Chris Dwan (chris@dwan.org)
https://dwan.org
Conclusions
In 2019, we’re all “cool with the cloud”
Premature optimization is still terrible
Make it work, make it fast, make it cheap
Experimentation and engineering are very
different practices
Great policy makes great systems
This continues to be an amazing time to be an
infrastructure / data nerd in health care / life
science
Geek Cred: My First Petabyte (NASA, 2008)
NIH circa 2008
The evolution of data transfer …
Data Explosion
Genomic Data Production in Context
I did research computing at
Broad from 2014 - 2017
Geek Cred: My First Exabyte, 2014
Note that this exabyte is
empty. Broad’s data is
nowhere near Exascale
Cloud Definitions
Public cloud: AWS, Azure, GCP, plus a bunch of wannabes
Private cloud: Cloud services on gear you own, which may
be hosted at a nice data center somewhere
Fog computing: On premises equipment used for cloud
stuff. It’s fog because that’s a cloud that’s close to earth.
Get it?
Hybrid cloud: Bursting to a public cloud for extra capacity.
Multi cloud: Azure for business, AWS for burst / scalability,
Google for that one weird trick.
Enterprise cloud: IT trying desperately to align with a cloud
strategy by changing the labels on the PowerPoint.
“On premises,” or “legacy,” carrot
cake still has a place, even in homes
with a cake-as-a-service strategy.
Hype-o-meter Impact-o-meter
The Cloud Is a Big Place
Global IaaS Providers
Comparing the Big Three
AWS:
Uncontested heavyweight
champion in terms of scale,
maturity of services, and adoption.
Services based on the market.
Default offerings may not be a good
fit for odd-shaped research
computing problems.
Market dominance means little
incentive to provide discounts or
customization.
Google:
Focused on value-add platforms.
Enthusiastic partner and sponsor in
areas of interest to $GOOG
Potential conflicts of interest in areas
of interest to $GOOG
Like something out of Greek
mythology, consumes ecosystem
partners whole.
Microsoft (Azure):
Your CIO already has a regular meeting
with the Microsoft enterprise sales rep.
Microsoft is already a qualified vendor
in your purchasing systems.
Decades of experience with regulatory
compliance and governance.
Already provides your identity,
authorization, and (probably) office
productivity.
Strategic purchases in HPC / ML / AI
Specific Advice on The Big Three
Public cloud is an agility play, not a cost play.
AWS, GCP, and Azure have very similar capabilities and
pricing, even at scale.
Pick one and get good at it.
Don’t be afraid of running experiments.
Avoid second-tier cloud providers unless there is an
unambiguous business or capability reason to use
them.
Track spending, even when it’s “free.”
$$ !!
The Cloud Is a Big Place
Global IaaS Providers
Domain Specific PaaS
Your CIO is not thinking of
HPC or research computing
when articulating their
cloud strategy.
The Cloud Is a Big Place
Global IaaS Providers
Analytics Framework
Domain Specific PaaS
Analysis platforms deserve
their own slide deck.
Metaphor: Pizza as a Service
Homemade: On-Premises (legacy!)
Take and Bake: Infrastructure as a Service (IaaS)
Delivery: Platform as a Service (PaaS)
Restaurant: Software as a Service (SaaS)
Layers in every column: Cheese, Tomato Sauce, Pizza Dough, Fire, Oven, Electricity / Gas, Drinks, Table
Legend: You Manage / Vendor Manages
Credit: Everybody on the Internet.
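The pizza metaphor can be sketched as a small lookup: for each service model, which layers the vendor manages and which stay with you. The layer names come from the slide; the split per model is an assumption taken from the widely shared version of this metaphor, and the function name is hypothetical.

```python
# Layers from the slide, ordered from the pizza itself down to the dining room.
LAYERS = [
    "Cheese", "Tomato Sauce", "Pizza Dough", "Fire",
    "Oven", "Electricity / Gas", "Drinks", "Table",
]

# ASSUMPTION: how many layers (counted from the top of LAYERS) the vendor
# manages in each model, following the commonly circulated version of the metaphor.
VENDOR_MANAGED = {
    "Homemade / On-Premises": 0,
    "Take and Bake / IaaS": 3,
    "Delivery / PaaS": 6,
    "Restaurant / SaaS": 8,
}


def responsibilities(model):
    """Return the layers managed by the vendor and by you for one model."""
    n = VENDOR_MANAGED[model]
    return {"vendor": LAYERS[:n], "you": LAYERS[n:]}


if __name__ == "__main__":
    for model in VENDOR_MANAGED:
        split = responsibilities(model)
        print(f"{model}: you manage {len(split['you'])} of {len(LAYERS)} layers")
```

The point of the table form is that moving right across the columns only ever shifts layers from "you" to "vendor"; nothing moves back.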
The Cloud Is a Big Place
Broad Firecloud
Data Platform
Global IaaS Providers
Analytics Framework
Domain Specific PaaS
Data platforms are where
it’s at right now.
One common thread: “Why Not Do Both?”
UC Health System Data Warehouse
• Shared data warehouse
• AND local instances at hospitals
NIH:
• World class dedicated HPC / networks
• AND negotiated discounts with public
cloud providers
GenePattern Networks:
• Free autoscaling environment on AWS
• AND support workstation / local HPC
The Policies you Need
Appropriate usage
Human readable: Expectations of privacy and standards of
behavior.
Data Classification
Governance: Defines the major categories of data (corporate
sensitive, clinical, …) and standards for handling of each.
Written Information Security Policy (WISP)
Technical: Defines how systems will be configured to protect
sensitive data and operations.
Vendor Qualification
Business SOP to establish practices around vendor access and
management.
Real world policy impact: Because bicycle lanes are “traffic lanes,”
the argument about snow plowing is simple, which saves lives.
Practical advice on Cloud Systems
Make it work
– Use on-demand instances (full price) until you’re sure the software works
– Don’t overthink it: Increase RAM and local disk to overcome crashing
– Tear down / rebuild the entire infrastructure from time to time, even in dev.
– All systems (yes, even cloud systems) have limits. Stop whining and learn them.
– Any time you increase throughput by an order of magnitude, your system will break.
Then make it fast
– Profiling tools are your friend; automation is not.
– Benchmark on real data. Imputed and synthetic data just echo your own assumptions back to you.
Then make it cheap
– Now you get to turn on spot instances.
– This is the first time I ever want to hear about Glacier or Infrequent Access tiers of data.
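For the "then make it fast" step, a minimal profiling sketch using Python's standard cProfile and pstats modules. The workload functions here are hypothetical stand-ins; per the advice above, the real measurement should run the actual pipeline on real data.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Hypothetical stand-in for one expensive pipeline step."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline(n):
    """Hypothetical stand-in for the whole job: five calls to the slow step."""
    return [slow_sum_of_squares(n) for _ in range(5)]


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the top few entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile_call(pipeline, 100_000)
    print(report)
```

The report names the functions where the time actually goes, which is the evidence to collect before any tuning or any move to cheaper instance types.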
Practice does not make perfect.
Practice makes permanent.
Attributed to Yo-Yo Ma
Engineering is different than experimentation
Diagram labels: Application Repo, Infrastructure Repo, Build, Test, Production
• Development can rely on production
• Production cannot rely on development
• Reference datasets are a prod resource.
• No manual intervention in either test or prod.
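The dependency rules above can be checked mechanically. A minimal sketch, assuming each component is tagged with an environment and its dependencies are listed explicitly; the component names and the two dictionaries are hypothetical.

```python
# ASSUMPTION: a hand-maintained inventory; every name below is hypothetical.
ENVIRONMENT = {
    "variant-caller-dev": "dev",
    "variant-caller-prod": "prod",
    "reference-genome-bucket": "prod",  # reference datasets are a prod resource
    "scratch-bucket": "dev",
}

DEPENDS_ON = {
    "variant-caller-dev": ["reference-genome-bucket", "scratch-bucket"],
    "variant-caller-prod": ["reference-genome-bucket"],
}


def violations(environment, depends_on):
    """List (component, dependency) pairs where prod relies on non-prod."""
    bad = []
    for component, deps in depends_on.items():
        if environment[component] != "prod":
            continue  # development is allowed to rely on production
        for dep in deps:
            if environment[dep] != "prod":
                bad.append((component, dep))
    return bad


if __name__ == "__main__":
    print(violations(ENVIRONMENT, DEPENDS_ON))
```

A check like this could run in the build step so that a production component pointing at a development resource fails before deployment rather than after.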
Many Experiments, Few Projects
Diagram: three stages (Feasibility, Development, Operations), each with an INBOX and an Active queue
No ability to predict turnaround times.
“When there is too much to do, there is a strong tendency to engage in local reprioritization, meaning that
each person in the process looks at the pile she is facing, determines which items are the most important, and
then works on those tasks first. …
local reprioritization creates variability. If a task happens to be prioritized by everyone, it gets done quickly.
But, that means another task has been moved to the bottom of several “to do” lists and it might take weeks or
months to get done.”
FAIR Data (within the enterprise)
Findable
• NoSQL database of metadata and checksums
• It’s plenty for a good long time.
Accessible
• Federated identity management
• Architecture of S3 buckets and production “roles”
Interoperable
• Data standards, ontologies, strong policy framework,
including electronic consents for human subjects data
Reusable
• “It’s much easier to go FAR than to go FAIR”
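The "Findable" item can be sketched as a catalog of metadata documents keyed by checksum. This is a minimal illustration, not a description of any particular system: an in-memory dict stands in for the NoSQL database, and the field names (project, assay) are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def catalog_entry(path, **metadata):
    """Build one metadata document: location, size, checksum, free-form fields."""
    return {
        "path": path,
        "bytes": os.path.getsize(path),
        "sha256": sha256_of(path),
        **metadata,
    }


def find(catalog, **criteria):
    """Return every document whose fields match all of the given criteria."""
    return [
        doc for doc in catalog.values()
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "sample.txt")
        with open(path, "wb") as handle:
            handle.write(b"ACGT\n")
        entry = catalog_entry(path, project="demo", assay="wgs")
        catalog = {entry["sha256"]: entry}  # stand-in for a document store
        print(find(catalog, project="demo"))
```

Keying on the checksum means the same file registered twice collapses to one record, and a later integrity check only needs to recompute the hash.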
Catered
Lunch
Sense of well-being and
contentment arising from
realistic expectations
Data Lake
Open Bar
Incredible opportunities
here, and rapidly
developing data silos
The Clinical Data Ecosystem
There is an incredible wealth of
data available to support both
clinical care and research
Unfortunately, it is carved up
and isolated in technical and
social silos.
There are both good and bad
reasons for this segmentation,
and it is holding us back.
Patient Journals
Consumer products
Longitudinal Data from
other providers …
Electronic
Medical Records
Possibility of a self-normal
(N of 1) over time
Diagnostic
Imaging
Natural language processing
has strong potential
Clinical Notes
Innovations in the basics of
clinical observation
Hospital Telemetry
Pressure to avoid incidental
findings prevent bias
Primary Lab Data
A Personal Story
I use a commercial service that combines
labwork with wearable data
They provide insights and coaching
I have, personally, found this
transformational in how I approach my
health.
Conclusions
In 2019, we’re all “cool with the cloud”
Premature optimization is still terrible
Make it work, make it fast, make it cheap
Strong distinction between experimentation and
engineering
Great policy makes great platforms
This continues to be an amazing time to be an
infrastructure / data nerd in health care / life
science
The future is already here – it’s just
not very well distributed
William Gibson

More Related Content

What's hot

Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Josh Patterson
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudChris Dagdigian
 
2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZChris Dagdigian
 
IBM Big Data References
IBM Big Data ReferencesIBM Big Data References
IBM Big Data ReferencesRob Thomas
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogiesmark madsen
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamGreg Goltsov
 
Random Decision Forests at Scale
Random Decision Forests at ScaleRandom Decision Forests at Scale
Random Decision Forests at ScaleCloudera, Inc.
 
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...CloudxLab
 
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your lifeTop 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your lifeIBM Analytics
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the CloudRob Thomas
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataHaluan Irsad
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stackFlytxt
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionmark madsen
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteMark van Rijmenam
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyNati Shalom
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingMinhazul Arefin
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Sciencesarith divakar
 

What's hot (20)

Big Data: an introduction
Big Data: an introductionBig Data: an introduction
Big Data: an introduction
 
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
Chattanooga Hadoop Meetup - Hadoop 101 - November 2014
 
Demystifying ML & AI
Demystifying ML & AIDemystifying ML & AI
Demystifying ML & AI
 
Mapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the CloudMapping Life Science Informatics to the Cloud
Mapping Life Science Informatics to the Cloud
 
2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ2015 CDC Workshop on ScienceDMZ
2015 CDC Workshop on ScienceDMZ
 
IBM Big Data References
IBM Big Data ReferencesIBM Big Data References
IBM Big Data References
 
Big Data and Bad Analogies
Big Data and Bad AnalogiesBig Data and Bad Analogies
Big Data and Bad Analogies
 
Full-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data TeamFull-Stack Data Science: How to be a One-person Data Team
Full-Stack Data Science: How to be a One-person Data Team
 
Random Decision Forests at Scale
Random Decision Forests at ScaleRandom Decision Forests at Scale
Random Decision Forests at Scale
 
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
Introduction to Big data with Hadoop & Spark | Big Data Hadoop Spark Tutorial...
 
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your lifeTop 10 ways BigInsights BigIntegrate and BigQuality will improve your life
Top 10 ways BigInsights BigIntegrate and BigQuality will improve your life
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the Cloud
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 
Briefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collectionBriefing room: An alternative for streaming data collection
Briefing room: An alternative for streaming data collection
 
Hadoop Big Data Lakes Keynote
Hadoop Big Data Lakes KeynoteHadoop Big Data Lakes Keynote
Hadoop Big Data Lakes Keynote
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
The rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computingThe rise of “Big Data” on cloud computing
The rise of “Big Data” on cloud computing
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 

Similar to 2019 BioIt World - Post cloud legacy edition

Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Chris Jang
 
Moving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalMoving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalVMware Tanzu Korea
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017Jeremy Maranitch
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSPhilip Filleul
 
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Amazon Web Services
 
Managing Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceManaging Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceSense Corp
 
The Storage Side of Private Clouds
The Storage Side of Private CloudsThe Storage Side of Private Clouds
The Storage Side of Private CloudsDataCore Software
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationCloudera, Inc.
 
Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)Joshua Bloom
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaCloudera, Inc.
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureObjectRocket
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorialrustd
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 

Similar to 2019 BioIt World - Post cloud legacy edition (20)

Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
 
Moving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from PivotalMoving data to the cloud BY CESAR ROJAS from Pivotal
Moving data to the cloud BY CESAR ROJAS from Pivotal
 
GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017GigaOm-sector-roadmap-cloud-analytic-databases-2017
GigaOm-sector-roadmap-cloud-analytic-databases-2017
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Bitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FSBitkom Cray presentation - on HPC affecting big data analytics in FS
Bitkom Cray presentation - on HPC affecting big data analytics in FS
 
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
Real-world Cloud HPC at Scale, for Production Workloads (BDT212) | AWS re:Inv...
 
Managing Large Amounts of Data with Salesforce
Managing Large Amounts of Data with SalesforceManaging Large Amounts of Data with Salesforce
Managing Large Amounts of Data with Salesforce
 
The Storage Side of Private Clouds
The Storage Side of Private CloudsThe Storage Side of Private Clouds
The Storage Side of Private Clouds
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)Industrial Machine Learning (at GE)
Industrial Machine Learning (at GE)
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
Your data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the futureYour data layer - Choosing the right database solutions for the future
Your data layer - Choosing the right database solutions for the future
 
Big Data on Azure Tutorial
Big Data on Azure TutorialBig Data on Azure Tutorial
Big Data on Azure Tutorial
 
Big Data: Myths and Realities
Big Data: Myths and RealitiesBig Data: Myths and Realities
Big Data: Myths and Realities
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 

More from Chris Dwan

Somerville Police Staffing Final Report.pdf
Somerville Police Staffing Final Report.pdfSomerville Police Staffing Final Report.pdf
Somerville Police Staffing Final Report.pdfChris Dwan
 
2023 Ward 2 community meeting.pdf
2023 Ward 2 community meeting.pdf2023 Ward 2 community meeting.pdf
2023 Ward 2 community meeting.pdfChris Dwan
 
One Size Does Not Fit All
One Size Does Not Fit AllOne Size Does Not Fit All
One Size Does Not Fit AllChris Dwan
 
Somerville FY23 Proposed Budget
Somerville FY23 Proposed BudgetSomerville FY23 Proposed Budget
Somerville FY23 Proposed BudgetChris Dwan
 
Production Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionProduction Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionChris Dwan
 
#Defund thepolice
#Defund thepolice#Defund thepolice
#Defund thepoliceChris Dwan
 
2009 cluster user training
2009 cluster user training2009 cluster user training
2009 cluster user trainingChris Dwan
 
No Free Lunch: Metadata in the life sciences
No Free Lunch:  Metadata in the life sciencesNo Free Lunch:  Metadata in the life sciences
No Free Lunch: Metadata in the life sciencesChris Dwan
 
Somerville ufc memo tree hearing
Somerville ufc memo   tree hearingSomerville ufc memo   tree hearing
Somerville ufc memo tree hearingChris Dwan
 
2011 career-fair
2011 career-fair2011 career-fair
2011 career-fairChris Dwan
 
Advocacy in the Enterprise (what works, what doesn't)
Advocacy in the Enterprise (what works, what doesn't)Advocacy in the Enterprise (what works, what doesn't)
Advocacy in the Enterprise (what works, what doesn't)Chris Dwan
 
"The Cutting Edge Can Hurt You"
"The Cutting Edge Can Hurt You""The Cutting Edge Can Hurt You"
"The Cutting Edge Can Hurt You"Chris Dwan
 
Introduction to HPC
Introduction to HPCIntroduction to HPC
Introduction to HPCChris Dwan
 
Intro bioinformatics
Intro bioinformaticsIntro bioinformatics
Intro bioinformaticsChris Dwan
 
Proposed tree protection ordinance
Proposed tree protection ordinanceProposed tree protection ordinance
Proposed tree protection ordinanceChris Dwan
 
Tree Ordinance Change Matrix
Tree Ordinance Change MatrixTree Ordinance Change Matrix
Tree Ordinance Change MatrixChris Dwan
 
Tree protection overhaul
Tree protection overhaulTree protection overhaul
Tree protection overhaulChris Dwan
 
Response from newport
Response from newportResponse from newport
Response from newportChris Dwan
 
Sacramento underpass bid_docs
Sacramento underpass bid_docsSacramento underpass bid_docs
Sacramento underpass bid_docsChris Dwan
 
Somerville tree stat 2019 02 12
Somerville tree stat 2019 02 12Somerville tree stat 2019 02 12
Somerville tree stat 2019 02 12Chris Dwan
 

More from Chris Dwan (20)

Somerville Police Staffing Final Report.pdf
Somerville Police Staffing Final Report.pdfSomerville Police Staffing Final Report.pdf
Somerville Police Staffing Final Report.pdf
 
2023 Ward 2 community meeting.pdf
2023 Ward 2 community meeting.pdf2023 Ward 2 community meeting.pdf
2023 Ward 2 community meeting.pdf
 
One Size Does Not Fit All
2019 BioIt World - Post cloud legacy edition

  • 1. Cloud Transformation 2.0: Embracing the Multi-Cloud Future. March 12, 2019: Bio-IT World West. Chris Dwan (chris@dwan.org), https://dwan.org
  • 2. Conclusions: In 2019, we’re all “cool with the cloud.” Premature optimization is still terrible. Make it work, make it fast, make it cheap. Experimentation and engineering are very different practices. Great policy makes great systems. This continues to be an amazing time to be an infrastructure / data nerd in health care / life science.
  • 3.
  • 4.
  • 5.
  • 6. Geek Cred. My first Petabyte: NASA, 2008
  • 7. Geek Cred. My first Petabyte: NASA, 2008
  • 9. The evolution of data transfer …
  • 10. Genomic Data Production in Context: Data Explosion. I did research computing at Broad from 2014 to 2017.
  • 11. Geek Cred. My first Exabyte: 2014. Note that this exabyte is empty. Broad’s data is nowhere near exascale.
  • 12. Cloud Definitions. Public cloud: AWS, Azure, GCS, plus a bunch of wannabes. Private cloud: Cloud services on gear you own, which may be hosted at a nice data center somewhere. Fog computing: On-premises equipment used for cloud stuff. It’s fog because that’s a cloud that’s close to earth. Get it? Hybrid cloud: Bursting to a public cloud for extra capacity. Multi cloud: Azure for business, AWS for burst / scalability, Google for that one weird trick. Enterprise cloud: IT trying desperately to align with a cloud strategy by changing the labels on the PowerPoint. “On premises,” or “legacy,” carrot cake still has a place, even in homes with a cake-as-a-service strategy. Graphic labels: Hype-o-meter, Impact-o-meter.
  • 13. The Cloud Is a Big Place: Global IaaS Providers
  • 14. Comparing the Big Three: Uncontested heavyweight champion in terms of scale, maturity of services, and adoption. Services based on the market. Default offerings may not be a good fit for odd-shaped research computing problems. Market dominance means little incentive to provide discounts or customization.
  • 15. Comparing the Big Three: Focused on value-add platforms. Enthusiastic partner and sponsor in areas of interest to $GOOG. Potential conflicts of interest in areas of interest to $GOOG. Like something out of Greek mythology, consumes ecosystem partners whole.
  • 16. Comparing the Big Three: Your CIO already has a regular meeting with the Microsoft enterprise sales rep. Microsoft is already a qualified vendor in your purchasing systems. Decades of experience with regulatory compliance and governance. Already provides your identity, authorization, and (probably) office productivity. Strategic purchases in HPC / ML / AI.
  • 18. Specific Advice on the Big Three: Public cloud is an agility play, not a cost play. AWS, GCS, and Azure have very similar capabilities and pricing, even at scale. Pick one and get good at it. Don’t be afraid of running experiments. Avoid second-tier cloud providers unless there is an unambiguous business or capability reason to use them. Track spending, even when it’s “free.”
  • 19. The Cloud Is a Big Place: Global IaaS Providers, Domain Specific PaaS.
  • 20. The Cloud Is a Big Place: Global IaaS Providers, Domain Specific PaaS. Your CIO is not thinking of HPC or research computing when articulating their cloud strategy.
  • 21. The Cloud Is a Big Place: Global IaaS Providers, Analytics Framework, Domain Specific PaaS. Analysis platforms deserve their own slide deck.
  • 22. Metaphor: Pizza as a Service. Four columns, each listing the same items (Cheese, Tomato Sauce, Pizza Dough, Fire, Oven, Electricity / Gas, Drinks, Table), split between “You Manage” and “Vendor Manages”: Homemade = On-Premises (legacy!); Take and Bake = Infrastructure as a Service (IaaS); Delivery = Platform as a Service (PaaS); Restaurant = Software as a Service (SaaS). Credit: Everybody on the Internet.
  • 24. The Cloud Is a Big Place: Broad Firecloud Data Platform, Global IaaS Providers, Analytics Framework, Domain Specific PaaS. Data platforms are where it’s at right now.
  • 25. One common thread: “Why Not Do Both?” UC Health System Data Warehouse: • Shared data warehouse • AND local instances at hospitals. NIH: • World class dedicated HPC / networks • AND negotiated discounts with public cloud providers. GenePattern Networks: • Free autoscaling environment on AWS • AND support workstation / local HPC.
  • 26. The Policies You Need. Appropriate usage (human readable): Expectations of privacy and standards of behavior. Data Classification (governance): Defines the major categories of data (corporate sensitive, clinical, …) and standards for handling of each. Written Information Security Policy, WISP (technical): Defines how systems will be configured to protect sensitive data and operations. Vendor Qualification (business): SOP to establish practices around vendor access and management. Real world policy impact: Because bicycle lanes are “traffic lanes,” the argument about snow plowing is simple, which saves lives.
  • 27. Practical advice on Cloud Systems. Make it work: – Use dedicated instances (full price) until you’re sure the software works. – Don’t overthink it: Increase RAM and local disk to overcome crashing. – Tear down / rebuild the entire infrastructure from time to time, even in dev. – All systems (yes, even cloud systems) have limits. Stop whining and learn them. – Any time you increase throughput by an order of magnitude, your system will break. Then make it fast: – Profiling tools are your friend, automation is not. – Benchmark on real data. Imputed and synthetic data just echo your own assumptions back to you. Then make it cheap: – Now you get to turn on spot instances. – This is the first time I ever want to hear about Glacier or Infrequently Accessed tiers of data.
  • 28. Engineering is different than experimentation. “Practice does not make perfect. Practice makes permanent.” (attributed to Yo-Yo Ma). Diagram labels: Application Repo, Production, Infrastructure Repo, Build, Test. • Development can rely on production. • Production cannot rely on development. • Reference datasets are a prod resource. • No manual intervention in either test or prod.
  • 29. Many Experiments, Few Projects. Diagram: three stages (Feasibility, Development, Operations), each with an INBOX and an Active queue. No ability to predict turnaround times.
  • 30. Many Experiments, Few Projects. Diagram: three stages (Feasibility, Development, Operations), each with an INBOX and an Active queue. “When there is too much to do, there is a strong tendency to engage in local reprioritization, meaning that each person in the process looks at the pile she is facing, determines which items are the most important, and then works on those tasks first. Local reprioritization creates variability. If a task happens to be prioritized by everyone, it gets done quickly. But, that means another task has been moved to the bottom of several ‘to do’ lists and it might take weeks or months to get done.” No ability to predict turnaround times.
  • 31. FAIR Data (within the enterprise). Findable: • NoSQL database of metadata and checksums • It’s plenty for a good long time. Accessible: • Federated identity management • Architecture of S3 buckets and production “roles.” Interoperable: • Data standards, ontologies, strong policy framework, including electronic consents for human subjects data. Reusable: • “It’s much easier to go FAR than to go FAIR.” Graphic labels: Catered Lunch; Sense of well-being and contentment arising from realistic expectations; Data Lake; Open Bar.
  • 32. The Clinical Data Ecosystem. There is an incredible wealth of data available to support both clinical care and research. Unfortunately, it is carved up and isolated in technical and social silos. There are both good and bad reasons for this segmentation, and it is holding us back. Diagram labels: Incredible opportunities here, and rapidly developing data silos; Patient Journals; Consumer products; Longitudinal Data from other providers …; Electronic Medical Records; Possibility of a self-normal (N of 1) over time; Diagnostic Imaging; Natural language processing has strong potential; Clinical Notes; Innovations in the basics of clinical observation; Hospital Telemetry; Pressure to avoid incidental findings prevent bias; Primary Lab Data.
  • 33. A Personal Story: I use a commercial service that combines labwork with wearable data. They provide insights and coaching. I have, personally, found this transformational in how I approach my health.
  • 38. Conclusions: In 2019, we’re all “cool with the cloud.” Premature optimization is still terrible. Make it work, make it fast, make it cheap. Strong distinction between experimentation and engineering. Great policy makes great platforms. This continues to be an amazing time to be an infrastructure / data nerd in health care / life science.
  • 39. “The future is already here – it’s just not very well distributed.” (William Gibson)
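
Slide 31 describes the "Findable" layer as a NoSQL database of metadata and checksums. A minimal sketch of that idea follows, assuming a plain in-memory dict as a stand-in for the NoSQL store. The function names (`sha256_of`, `catalog_file`, `find_by_checksum`) and the record fields are hypothetical illustrations, not anything shown in the talk.

```python
import hashlib
import os
import tempfile
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large datasets never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def catalog_file(catalog: Dict[str, dict], path: str, **metadata: str) -> dict:
    """Store one document per file, keyed by path (hypothetical schema)."""
    record = {
        "path": path,
        "size_bytes": os.path.getsize(path),
        "sha256": sha256_of(path),
        "metadata": dict(metadata),
    }
    catalog[path] = record
    return record


def find_by_checksum(catalog: Dict[str, dict], checksum: str) -> List[str]:
    """Return every cataloged path whose content matches the checksum."""
    return sorted(p for p, rec in catalog.items() if rec["sha256"] == checksum)


if __name__ == "__main__":
    catalog: Dict[str, dict] = {}
    with tempfile.TemporaryDirectory() as workdir:
        first = os.path.join(workdir, "sample_a.txt")
        copy = os.path.join(workdir, "sample_a_copy.txt")
        for name in (first, copy):
            with open(name, "wb") as handle:
                handle.write(b"ACGT" * 1000)
        # The metadata keys here are made up for the example.
        catalog_file(catalog, first, project="demo", consent="research-only")
        catalog_file(catalog, copy, project="demo", consent="research-only")
        duplicates = find_by_checksum(catalog, catalog[first]["sha256"])
        print(len(duplicates), "paths share one checksum")
```

Keying the lookup on the checksum rather than the file name is what lets the catalog answer "do we already have this data somewhere?" A real deployment would presumably swap the dict for a document store and add the policy fields (consent, data classification) that the deck argues for.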