Big Data, Bioscience and the Cloud
Dan Sullivan
June 25, 2015
BioCatalyst: Cloud Computing in Bioscience
Oregon Bioscience Association
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
My Background
• Data Architect / Engineer
• NoSQL and relational data modeler
• Big data
• Analytics, machine learning and text mining
• Cloud computing
• Computational Biologist
• Author
• NoSQL for Mere Mortals
• Contributor to TechTarget
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
Big Data Challenges in Bioscience
Volume
Velocity
Variety
Integration
Varieties of Big Data in Bioscience
Subcellular – Genetics and Proteomics
Cellular – Metabolic and Signaling Pathways
Organism – Disease, Medicine, Insurance
Populations – Epidemiology, Social Networks
Genetics and Proteomics
• Genetic Sequencing
• Order of nucleotides in DNA
• Most DNA is common across species
• Many genes code proteins
• Some variants associated with disease
• Which ones?
• Proteomics
• Structure and function of proteins
• Variation in protein sequence and structure associated with disease
• Which ones? In what context?
Images: http://www.masimo.it/hemoglobin/anemia.htm, https://en.wikipedia.org/wiki/DNA
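The "which variants?" question starts with finding where a sample sequence differs from a reference. A minimal sketch, assuming two short, already-aligned toy sequences of equal length (the sequences and the `find_variants` helper are illustrative, not taken from the talk; real pipelines align reads first and handle insertions and deletions):

```python
def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based. Assumes the two sequences are already aligned
    and have the same length (an assumption made for this sketch).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy sequences: the sample differs from the reference at one position.
reference = "ATGGTGCATCTGACTCCTGAGGAG"
sample = "ATGGTGCATCTGACTCCTGTGGAG"

variants = find_variants(reference, sample)
for position, ref_base, sample_base in variants:
    print(f"position {position}: {ref_base} -> {sample_base}")
```

Finding the difference is the easy part; deciding whether a given variant is associated with disease requires comparing many individuals, which is where the data volume comes from.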
Pathways
• Metabolic Pathways
• Series of chemical reactions
• Coordinated to produce
reactants
• Choreography of molecules
• Signaling Pathways
• Molecules on cell surface detect changes in environment
• Cascade of reactions to change state of cell
• Choreography of molecules
• How do they interact?
Disease - Atherosclerosis
• Early 1950s: Korean War autopsies
• 1985-1998: Pathology studies - Pathobiological Determinants of Atherosclerosis in Youth (PDAY) study
• 2012-2016: Genomic and proteomic studies
Healthcare
• Genetics and Disease
• Post-Approval Drug Efficacy
• Discovering and Retrieving Medical Information
• Comparative Quality
Populations
• Infectious Disease Spread
• How fast will disease spread?
• What countermeasures are effective?
• What is the morbidity and mortality?
• Simulation
– Synthetic population
– Model interactions
– Probabilistic
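The simulation approach listed above (synthetic population, modeled interactions, probabilistic outcomes) can be sketched in a few lines. This is a toy susceptible/infected/recovered model, not the method used in any particular study; the function name and every parameter value (population size, contacts per day, transmission and recovery probabilities) are assumptions chosen only for illustration:

```python
import random


def simulate_outbreak(population_size=200, contacts_per_day=4,
                      p_transmit=0.05, p_recover=0.1, days=60, seed=42):
    """Toy stochastic outbreak over a synthetic population.

    Each person is "S" (susceptible), "I" (infected) or "R" (recovered).
    Returns a list of (S, I, R) counts, one tuple per simulated day.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    state = ["S"] * population_size    # synthetic population
    state[0] = "I"                     # one initial case
    history = []
    for day in range(days):
        infected = [i for i, s in enumerate(state) if s == "I"]
        newly_infected = set()
        # Model interactions: each infected person meets random contacts.
        for person in infected:
            for _contact in range(contacts_per_day):
                other = rng.randrange(population_size)
                # Probabilistic transmission to susceptible contacts only.
                if state[other] == "S" and rng.random() < p_transmit:
                    newly_infected.add(other)
        # Probabilistic recovery for those infected at the start of the day.
        for person in infected:
            if rng.random() < p_recover:
                state[person] = "R"
        for person in newly_infected:
            state[person] = "I"
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


history = simulate_outbreak()
susceptible, infected, recovered = history[-1]
print(f"after {len(history)} days: S={susceptible} I={infected} R={recovered}")
```

Because outcomes are random, a realistic study would run many such simulations with different seeds and compare the distributions under different countermeasures, which is one reason this kind of work benefits from elastic compute.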
Why Cloud for Big Data in Bioscience?
• Scalability
• Access to compute- and memory-optimized virtual machines
• Virtually unlimited storage
• Speed
• Many bioscience computations are highly parallel
• Minimize time to analyze, lower IT overhead
• Cost
• AWS Spot Instances
• Google Preemptible VMs
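To illustrate what "highly parallel" means here: many analyses apply the same function independently to each sequence or sample, so the work can be split across as many workers as are available. A minimal sketch, assuming a toy per-sequence computation (GC content) and made-up sequences; it uses a local thread pool only to show the pattern, whereas CPU-bound work at scale would more likely use separate processes or a cluster of cloud VMs:

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(sequence):
    """Fraction of bases in the sequence that are G or C."""
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return gc_count / len(sequence)


# Toy inputs: each sequence can be processed independently of the others.
sequences = ["ATGC", "GGCC", "ATAT", "GCGCAT"]

# pool.map applies gc_content to every sequence and preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(gc_content, sequences))

for sequence, fraction in zip(sequences, results):
    print(f"{sequence}: {fraction:.2f}")
```

Because no task depends on another, interrupted tasks can simply be rerun, which is what makes low-cost interruptible instances a reasonable fit for this style of workload.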
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
Continuous Learning
• Coursera
• Cloud Computing Concepts
• Bioinformatics: Life Sciences on Your Computer
• edX
• Introduction to Statistics
• Introduction to Biology
• Principles of Biochemistry
• Rackspace CloudU
• YouTube
• Big Data Vendors
• MapR
• Cloudera
• Hortonworks
• DataStax
• Databricks
• Trade Publications
– TechTarget
• SearchAWS
• SearchCloudComputing
• SearchCloudSecurity
– Health Data Management
– Harvard Business Review
Overview
• Background
• Varieties of Big Data in Bioscience
• Continuous learning about Big Data & Cloud
• Making Connections
LinkedIn Groups
Final Thoughts
• Great time to get into Biosciences and Big Data
• Don’t be intimidated if it’s been a while since you’ve studied biology – we are all constantly learning in this field
• Network online and in person
• Take advantage of free resources
• Courses
• Cloud
• AWS Free Tier
• MapR Hadoop On-Demand Training
• Connect with me on LinkedIn
• https://www.linkedin.com/in/dansullivanpdx
• Join me at a Meetup
• Dan.sullivan@cambiahealth.com

More Related Content

What's hot

Science Distributed's Chain Event: Distributed Science Pilot - Lauren Long
Science Distributed's Chain Event: Distributed Science Pilot - Lauren LongScience Distributed's Chain Event: Distributed Science Pilot - Lauren Long
Science Distributed's Chain Event: Distributed Science Pilot - Lauren LongSean Manion PhD
 
Validating microbiome claims – including the latest DNA techniques
Validating microbiome claims – including the latest DNA techniquesValidating microbiome claims – including the latest DNA techniques
Validating microbiome claims – including the latest DNA techniquesEagle Genomics
 
Expert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational ResearchExpert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational ResearchEagle Genomics
 
Smartness in Today’s healthcare applications
Smartness in Today’s healthcare applicationsSmartness in Today’s healthcare applications
Smartness in Today’s healthcare applicationsDr. Shivananda Koteshwar
 
PerkinElmer Informatics Overview
PerkinElmer Informatics OverviewPerkinElmer Informatics Overview
PerkinElmer Informatics OverviewPerkinElmer, Inc.
 
Berzinski Writing Sample7-091023
Berzinski Writing Sample7-091023Berzinski Writing Sample7-091023
Berzinski Writing Sample7-091023pberzins
 
Data Con LA 2018 Keynote - Better Collaborative Data Science by Megan Risdal
Data Con LA 2018 Keynote - Better Collaborative Data  Science by Megan RisdalData Con LA 2018 Keynote - Better Collaborative Data  Science by Megan Risdal
Data Con LA 2018 Keynote - Better Collaborative Data Science by Megan RisdalData Con LA
 
Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13DataDryad
 
Beacon: A Protocol for Federated Discovery and Sharing of Genomic Data
Beacon: A Protocol for Federated Discovery and Sharing of Genomic DataBeacon: A Protocol for Federated Discovery and Sharing of Genomic Data
Beacon: A Protocol for Federated Discovery and Sharing of Genomic DataMiro Cupak
 
Beacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data SharingBeacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data SharingMiro Cupak
 
Beacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data SharingBeacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data SharingMiro Cupak
 
Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...Bryan Beecher
 
Caris Life Sciences
Caris Life SciencesCaris Life Sciences
Caris Life SciencesKim Kozlik
 
GENOME DATA ANALYSIS
GENOME DATA ANALYSISGENOME DATA ANALYSIS
GENOME DATA ANALYSISAmeldaAkoijam
 

What's hot (15)

Science Distributed's Chain Event: Distributed Science Pilot - Lauren Long
Science Distributed's Chain Event: Distributed Science Pilot - Lauren LongScience Distributed's Chain Event: Distributed Science Pilot - Lauren Long
Science Distributed's Chain Event: Distributed Science Pilot - Lauren Long
 
Validating microbiome claims – including the latest DNA techniques
Validating microbiome claims – including the latest DNA techniquesValidating microbiome claims – including the latest DNA techniques
Validating microbiome claims – including the latest DNA techniques
 
Expert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational ResearchExpert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational Research
 
Smartness in Today’s healthcare applications
Smartness in Today’s healthcare applicationsSmartness in Today’s healthcare applications
Smartness in Today’s healthcare applications
 
PerkinElmer Informatics Overview
PerkinElmer Informatics OverviewPerkinElmer Informatics Overview
PerkinElmer Informatics Overview
 
Berzinski Writing Sample7-091023
Berzinski Writing Sample7-091023Berzinski Writing Sample7-091023
Berzinski Writing Sample7-091023
 
Data Con LA 2018 Keynote - Better Collaborative Data Science by Megan Risdal
Data Con LA 2018 Keynote - Better Collaborative Data  Science by Megan RisdalData Con LA 2018 Keynote - Better Collaborative Data  Science by Megan Risdal
Data Con LA 2018 Keynote - Better Collaborative Data Science by Megan Risdal
 
Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13Irving-TeraData: data and science driven big industry-nfdp13
Irving-TeraData: data and science driven big industry-nfdp13
 
Beacon: A Protocol for Federated Discovery and Sharing of Genomic Data
Beacon: A Protocol for Federated Discovery and Sharing of Genomic DataBeacon: A Protocol for Federated Discovery and Sharing of Genomic Data
Beacon: A Protocol for Federated Discovery and Sharing of Genomic Data
 
Beacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data SharingBeacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data Sharing
 
Beacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data SharingBeacon Network: A System for Global Genomic Data Sharing
Beacon Network: A System for Global Genomic Data Sharing
 
Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...Exploring New Methods for Protecting and Distributing Confidential Research ...
Exploring New Methods for Protecting and Distributing Confidential Research ...
 
Providing support for JC Bradleys vision of open science using RSC cheminform...
Providing support for JC Bradleys vision of open science using RSC cheminform...Providing support for JC Bradleys vision of open science using RSC cheminform...
Providing support for JC Bradleys vision of open science using RSC cheminform...
 
Caris Life Sciences
Caris Life SciencesCaris Life Sciences
Caris Life Sciences
 
GENOME DATA ANALYSIS
GENOME DATA ANALYSISGENOME DATA ANALYSIS
GENOME DATA ANALYSIS
 

Similar to Big data, bioscience and the cloud biocatalyst june 2015 sullivan

Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Database technologies in bioinformatics
Database technologies in bioinformaticsDatabase technologies in bioinformatics
Database technologies in bioinformaticsGleb Sklyr
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forumChris Dwan
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingDenodo
 
E.Gombocz: Semantics in a Box (SemTech 2013-04-30)
E.Gombocz: Semantics in a Box (SemTech 2013-04-30)E.Gombocz: Semantics in a Box (SemTech 2013-04-30)
E.Gombocz: Semantics in a Box (SemTech 2013-04-30)Erich Gombocz
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Amazon Web Services
 
SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...
SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...
SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...Warren Kibbe
 
Next Gen Sequencing and Associated Big Data / AI problem
Next Gen Sequencing and Associated Big Data / AI problemNext Gen Sequencing and Associated Big Data / AI problem
Next Gen Sequencing and Associated Big Data / AI problemSubhendu Dey
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleEnis Afgan
 
Information Technology and Radiology: challenges and future perspectives
Information Technology and Radiology: challenges and future perspectivesInformation Technology and Radiology: challenges and future perspectives
Information Technology and Radiology: challenges and future perspectivesErik R. Ranschaert, MD, PhD
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
Vph2012 20 sept12_shublaq_final
Vph2012 20 sept12_shublaq_finalVph2012 20 sept12_shublaq_final
Vph2012 20 sept12_shublaq_finalNour Shublaq
 
MedChemica BigData What Is That All About?
MedChemica BigData What Is That All About?MedChemica BigData What Is That All About?
MedChemica BigData What Is That All About?Al Dossetter
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsPerficient, Inc.
 

Similar to Big data, bioscience and the cloud biocatalyst june 2015 sullivan (20)

Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Database technologies in bioinformatics
Database technologies in bioinformaticsDatabase technologies in bioinformatics
Database technologies in bioinformatics
 
2016 09 cxo forum
2016 09 cxo forum2016 09 cxo forum
2016 09 cxo forum
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes Biobanking
 
Big data challenges associated with building a national data repository for c...
Big data challenges associated with building a national data repository for c...Big data challenges associated with building a national data repository for c...
Big data challenges associated with building a national data repository for c...
 
E.Gombocz: Semantics in a Box (SemTech 2013-04-30)
E.Gombocz: Semantics in a Box (SemTech 2013-04-30)E.Gombocz: Semantics in a Box (SemTech 2013-04-30)
E.Gombocz: Semantics in a Box (SemTech 2013-04-30)
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
Life Technologies' Journey to the Cloud (ENT208) | AWS re:Invent 2013
 
SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...
SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...
SAMSI Precision Medicine Keynote, August 2018: Data: where Precision Oncology...
 
Next Gen Sequencing and Associated Big Data / AI problem
Next Gen Sequencing and Associated Big Data / AI problemNext Gen Sequencing and Associated Big Data / AI problem
Next Gen Sequencing and Associated Big Data / AI problem
 
Hadoop Enabled Healthcare
Hadoop Enabled HealthcareHadoop Enabled Healthcare
Hadoop Enabled Healthcare
 
The pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an exampleThe pulse of cloud computing with bioinformatics as an example
The pulse of cloud computing with bioinformatics as an example
 
Information Technology and Radiology: challenges and future perspectives
Information Technology and Radiology: challenges and future perspectivesInformation Technology and Radiology: challenges and future perspectives
Information Technology and Radiology: challenges and future perspectives
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
Big data analystics
Big data analysticsBig data analystics
Big data analystics
 
Vph2012 20 sept12_shublaq_final
Vph2012 20 sept12_shublaq_finalVph2012 20 sept12_shublaq_final
Vph2012 20 sept12_shublaq_final
 
MedChemica BigData What Is That All About?
MedChemica BigData What Is That All About?MedChemica BigData What Is That All About?
MedChemica BigData What Is That All About?
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
How to Architect Smarter Systems for Healthcare
How to Architect Smarter Systems for HealthcareHow to Architect Smarter Systems for Healthcare
How to Architect Smarter Systems for Healthcare
 

More from Dan Sullivan, Ph.D.

How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryDan Sullivan, Ph.D.
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?Dan Sullivan, Ph.D.
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery MLDan Sullivan, Ph.D.
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningDan Sullivan, Ph.D.
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured dataDan Sullivan, Ph.D.
 
A first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupA first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupDan Sullivan, Ph.D.
 
ACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyDan Sullivan, Ph.D.
 
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyDan Sullivan, Ph.D.
 
Modeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key PatternsModeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key PatternsDan Sullivan, Ph.D.
 
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Dan Sullivan, Ph.D.
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesDan Sullivan, Ph.D.
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsDan Sullivan, Ph.D.
 

More from Dan Sullivan, Ph.D. (13)

How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQuery
 
With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?With Automated ML, is Everyone an ML Engineer?
With Automated ML, is Everyone an ML Engineer?
 
Getting Started with BigQuery ML
Getting Started with BigQuery MLGetting Started with BigQuery ML
Getting Started with BigQuery ML
 
Google Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine LearningGoogle Cloud Certifications & Machine Learning
Google Cloud Certifications & Machine Learning
 
Unstructured text to structured data
Unstructured text to structured dataUnstructured text to structured data
Unstructured text to structured data
 
A first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetupA first look at tf idf-pdx data science meetup
A first look at tf idf-pdx data science meetup
 
Text mining meets neural nets
Text mining meets neural netsText mining meets neural nets
Text mining meets neural nets
 
ACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False DichotomyACID vs BASE in NoSQL: Another False Dichotomy
ACID vs BASE in NoSQL: Another False Dichotomy
 
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual PropertyTools and Techniques for Analyzing Texts: Tweets to Intellectual Property
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property
 
Modeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key PatternsModeling with Document Database: 5 Key Patterns
Modeling with Document Database: 5 Key Patterns
 
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
Sullivan GBCB Seminar Fall 2014 - Limits of RDMS for Bioinformatics v2
 
Text Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious Diseases
 
Limits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in BioinformaticsLimits of RDBMS and Need for NoSQL in Bioinformatics
Limits of RDBMS and Need for NoSQL in Bioinformatics
 

Recently uploaded

Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...nirzagarg
 

Recently uploaded (20)

Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Big data, bioscience and the cloud biocatalyst june 2015 sullivan

  • 1. Big Data, Bioscience and the Cloud Dan Sullivan June 25, 2015 BioCatalyst: Cloud Computing in Bioscience Oregon Bioscience Association
  • 2. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 3. My Background • Data Architect / Engineer • NoSQL and relational data modeler • Big data • Analytics, machine learning and text mining • Cloud computing • Computational Biologist • Author • NoSQL for Mere Mortals • Contributor to TechTarget
  • 4. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 5. Big Data Challenges in Bioscience Volume Velocity Variety Integration
  • 6. Varieties of Big Data in Bioscience Subcellular – Genetics and Proteomics Cellular – Metabolic and Signaling Pathways Organism – Disease, Medicine, Insurance Populations – Epidemiology, Social Networks
  • 7. Genetics and Proteomics • Genetic Sequencing • Order of nucleotides in DNA • Most DNA is common across species • Many genes code proteins • Some variants associated with disease • Which ones? • Proteomics • Structure and function of proteins • Variation in protein sequence and structure associated with disease • Which ones? In what context? Images: http://www.masimo.it/hemoglobin/anemia.htm, https://en.wikipedia.org/wiki/DNA
  • 8. Pathways • Metabolic Pathways • Series of chemical reactions • Coordinated to produce reactants • Choreography of molecules • Signaling Pathways • Molecules on cell surface detect changes in environment • Cascade of reactions to change state of cell • Choreography of molecules • How do they interact?
  • 9. • Early 1950s Korean War autopsies 2012-2016 Genomic and Proteomic Studies 1985-1998 Pathology Studies - Pathodeterminants of Atherosclerosis in Youth (PDAY) study Disease - Atherosclerosis
  • 10. Healthcare • Genetics and Disease • Post-Approval Drug Efficacy • Discovering and Retrieving Medical Information • Comparative Quality
  • 11. Populations • Infectious Disease Spread • How fast will disease spread? • What countermeasures are effective? • What is the morbidity and mortality? • Simulation – Synthetic population – Model interactions – Probabilistic
  • 12. Why Cloud for Big Data in BioScience? • Scalability • Access to compute and memory optimized virtual machines • Virtually unlimited storage • Speed • Many bioscience computations highly parallel • Minimize time to analyze, lower IT overhead • Cost • AWS Spot Instances • Google Pre-emptible VMs
  • 13. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 14. Continuous Learning • Coursera • Cloud Computing Concepts • Bioinformatics: Life Sciences on Your Computer • edX • Introduction to Statistics • Introduction to Biology • Principles of Biochemistry • Rackspace CloudU • YouTube • Big Data Vendors • MapR • Cloudera • Hortonworks • DataStax • Databricks • Trade Publications – TechTarget • SearchAWS • SearchCloudComputing • SearchCloudSecurity – Health Data Management – Harvard Business Review
  • 15. Overview • Background • Varieties of Big Data in Bioscience • Continuous learning about Big Data & Cloud • Making Connections
  • 17.
  • 18. Final Thoughts • Great time to get into Biosciences and Big Data • Don’t be intimidated if it’s been a while since you’ve studied biology – we are all constantly learning in this field • Network online and in person • Take advantage of free resources • Courses • Cloud • AWS Free Tier • MapR Hadoop On Demand Training • Connect with me on LinkedIn • https://www.linkedin.com/in/dansullivanpdx • Join me at a Meetup • Dan.sullivan@cambiahealth.com
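
The simulation approach listed on slide 11 (synthetic population, modeled interactions, probabilistic outcomes) can be illustrated with a toy example. The sketch below is not from the talk: the function name `simulate_outbreak`, the susceptible/infected/recovered states, and every parameter value are assumptions chosen only to show the general shape of such a model.

```python
import random


def simulate_outbreak(population_size=200, contacts_per_day=4,
                      transmission_prob=0.05, recovery_days=7,
                      days=60, seed=42):
    """Toy disease-spread simulation; all parameter values are illustrative."""
    rng = random.Random(seed)

    # Synthetic population: each person is Susceptible, Infected or Recovered.
    state = ["S"] * population_size
    days_infected = [0] * population_size
    state[0] = "I"  # assumed single index case

    history = []
    for _ in range(days):
        newly_infected = set()

        # Model interactions: each infected person meets random contacts.
        for person in range(population_size):
            if state[person] != "I":
                continue
            for contact in rng.sample(range(population_size), contacts_per_day):
                # Probabilistic transmission to susceptible contacts only.
                if state[contact] == "S" and rng.random() < transmission_prob:
                    newly_infected.add(contact)

        # People infected before today move one day closer to recovery.
        for person in range(population_size):
            if state[person] == "I":
                days_infected[person] += 1
                if days_infected[person] >= recovery_days:
                    state[person] = "R"

        for person in newly_infected:
            state[person] = "I"

        # Record daily (susceptible, infected, recovered) counts.
        history.append((state.count("S"), state.count("I"), state.count("R")))
    return history


if __name__ == "__main__":
    for day, (s, i, r) in enumerate(simulate_outbreak(), start=1):
        if day % 10 == 0:
            print(f"day {day}: S={s} I={i} R={r}")
```

Rerunning with a different `transmission_prob` or `contacts_per_day` is a crude stand-in for asking which countermeasures are effective. Because each run is independent, many runs can be spread across cloud virtual machines, which is the kind of highly parallel workload slide 12 describes.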

Editor's Notes

  1. Projects with any two of these (volume, velocity, variety) can probably be handled well by an RDBMS. When all three are encountered in one project, NoSQL can often provide better performance, with different levels of support for consistency, availability and network partition tolerance (CAP theorem).
  2. Autopsies performed during the Korean War found evidence of early-onset atherosclerosis. There was not enough time for lifestyle factors, such as a high-fat diet, smoking and inactivity, to be the sole cause of plaque. Hypothesis: a genetic factor influences atherosclerosis. PDAY confirmed and expanded on the earlier findings. A large collaboration of pathologists collected samples from young people who died of non-cardiovascular causes: 3,000 autopsies of 15-34 year olds. Aorta and LAD samples were preserved in formalin-fixed, paraffin-embedded (FFPE) blocks. Liver samples were also collected. GPAA: use the liver samples to sequence genomes. Proteomics collaborators have developed techniques for extracting proteins from old FFPE blocks, which makes genomic and proteomic analysis possible today.