Analysing biomedical data
Paul Agapow / Translational Bioinformatics DSI-ICL / October 2017
Biomedical science is now data science
The 4-headed beast
● The 4 heads
○ Acquisition
○ Storage
○ Analysis
○ Sharing
● Big data 4Vs
○ Velocity
○ Volume
○ Variety
○ Veracity
The problems of biomedical data
Many ...
● Types
● Formats
● Silos
● Gaps
● Interactions
Difficult analysis
● The curse of dimensionality
● Multiple hypothesis testing & false discovery
● Batch effects
● Life history
● Biased sampling
● Need for integrative analysis
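One way to see the curse of dimensionality is to watch distances lose contrast as the number of features grows. The sketch below is only an illustration on uniform random data; the function name `distance_contrast` and the point counts are assumptions, not something from the talk.

```python
import math
import random

def distance_contrast(n_points, n_dims, seed=0):
    """Return (farthest - nearest) / nearest distance from one random query point.

    Values near zero mean every point is roughly equally far away, which makes
    nearest-neighbour style reasoning unreliable.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = [rng.random() for _ in range(n_dims)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as dimensions are added, with the sample size held fixed.
for dims in (2, 20, 2000):
    print(dims, round(distance_contrast(200, dims), 2))
```

With a fixed number of samples, adding measured features (genes, probes, voxels) makes the samples look increasingly alike by distance, which is one reason high-dimensional omics data is hard to cluster or classify.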
Practical issues
● Unstructured data
● Managing big data
● Security
● Legal & privacy
Future medicine
A mix of promise & peril
● More data
○ Genomic medicine
○ Other “omic” medicine
○ Wearables
○ EHR & digital health
● P4 medicine
○ Stratification
○ Analysis at the bedside
○ Patient participation
● Translational medicine
○ Leveraging health data for research
Scientific data doubles every 18 months
A new paper is published every 30 seconds
Most papers are never cited or even read
No new principle will declare itself from below a heap of facts. (Peter Medawar)
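Taking the slide's two figures at face value (a doubling time of 18 months and one paper every 30 seconds), the implied scale can be worked out directly. This is arithmetic on the quoted numbers, not an independent estimate; the function names are made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def papers_per_year(seconds_per_paper=30):
    """Papers published in a year if one appears every `seconds_per_paper` seconds."""
    return SECONDS_PER_YEAR // seconds_per_paper

def growth_factor(years, doubling_months=18):
    """How many times larger the data becomes after `years` at the given doubling time."""
    return 2 ** (years * 12 / doubling_months)

print(papers_per_year())         # 1051200
print(round(growth_factor(3)))   # 4
print(round(growth_factor(15)))  # 1024
```

So the quoted rates imply roughly a million papers a year and a thousand-fold growth in data over 15 years.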
The analytical challenges
Liberating health data
● Enabling EHR for research
● Text extraction
● Unstructured to structured data
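As a minimal sketch of turning unstructured text into structured data, the example below pulls blood pressure readings out of a free-text clinical note with a regular expression. The note text, the pattern and the field names are all invented for illustration; real EHR text extraction generally needs far more robust methods than a single regex.

```python
import re

# Matches readings such as "BP 140/90" or "blood pressure: 132 / 84 mmHg".
BP_PATTERN = re.compile(
    r"\b(?:BP|blood pressure)\b\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})",
    re.IGNORECASE,
)

def extract_blood_pressure(note):
    """Return a list of {'systolic', 'diastolic'} dicts found in a free-text note."""
    return [
        {"systolic": int(systolic), "diastolic": int(diastolic)}
        for systolic, diastolic in BP_PATTERN.findall(note)
    ]

# Hypothetical note, written for this example.
note = "Pt seen in clinic. BP 140/90, repeat blood pressure: 132 / 84 mmHg after rest."
print(extract_blood_pressure(note))
```

Each match becomes a record with named numeric fields, which is the step that makes the note queryable alongside structured data.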
Computationally intensive approaches
● Deep learning
● Concurrent computation
● Which to use? Which works?
● Implementation
● Interpretation
● Assisted & auto-discovery
Integrative analysis
● The genome is not enough
● Complex interactions
● Statistical power
● Which is best?
● Interpretation
Building knowledge bases
● Extracting structured information from unstructured input
● Veracity
● Exploring / querying
Reproducibility
The “solutions”
Standards
● Clinical descriptions
● Measurements:
○ Blood pressure
○ White cells
● Cross-study
Yes!
● Allows combining & comparing studies
● CDISC
● HPO
But!
● A lot of work
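Part of the "lot of work" is harmonising units before studies can be combined. The sketch below converts the two measurements named above into one unit each. The function names and the accepted unit spellings are assumptions for the example; they are not taken from CDISC or any other standard.

```python
MMHG_PER_KPA = 7.50062  # 1 kPa is approximately 7.50062 mmHg

def pressure_to_mmhg(value, unit):
    """Convert a blood pressure reading to mmHg."""
    unit = unit.strip().lower()
    if unit == "mmhg":
        return float(value)
    if unit == "kpa":
        return value * MMHG_PER_KPA
    raise ValueError(f"unrecognised pressure unit: {unit!r}")

def white_cells_to_1e9_per_litre(value, unit):
    """Convert a white cell count to units of 10^9 cells per litre."""
    unit = unit.strip().lower()
    if unit == "10^9/l":
        return float(value)
    if unit == "cells/ul":
        # 1000 cells per microlitre equals 1 x 10^9 cells per litre.
        return value / 1000.0
    raise ValueError(f"unrecognised cell count unit: {unit!r}")

print(pressure_to_mmhg(16, "kPa"))
print(white_cells_to_1e9_per_litre(7000, "cells/uL"))  # 7.0
```

Raising an error on an unknown unit, rather than guessing, keeps silent mis-conversions out of a combined dataset.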
Data formats & storage
Yes!
● Plain text
● Open formats
● Structured formats
● Advantages:
○ Human & machine readable
○ Unambiguous
○ WYSIWYG
● Examples:
○ Open bio formats
○ CSV, TSV
No!
● Homebrew formats
● Proprietary / closed formats
● Binary formats
● Excel
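A small sketch of the plain-text route: writing and reading a TSV table with Python's standard `csv` module. The sample rows are invented. Gene symbols such as SEPT2 and MARCH1 are used because spreadsheet software with default settings is known to reinterpret names like these as dates, whereas a plain-text round trip leaves them untouched.

```python
import csv
import io

# Invented example records; every value is kept as text.
rows = [
    {"sample_id": "S001", "gene": "SEPT2", "expression": "12.5"},
    {"sample_id": "S002", "gene": "MARCH1", "expression": "3.1"},
]

def to_tsv(records):
    """Serialise a list of dicts to tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def from_tsv(text):
    """Parse tab-separated text back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="\t")]

text = to_tsv(rows)
print(text)
```

The file is readable by eye and by any tool, and what is written is what is read back.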
Workflow systems & notebooks
● Analysis as:
○ An executable recipe
○ A document or commentary
● Many candidates:
○ Workflows:
■ Snakemake
■ Nextflow
■ CWL etc.
○ Computational notebooks:
■ Jupyter / IPython
■ RMarkdown
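The idea of "analysis as an executable recipe" can be sketched in plain Python: declare each step and what it depends on, then let the machine work out the order. This is not how Snakemake, Nextflow or CWL are implemented or written; it only illustrates the underlying idea with the standard library's `graphlib` (Python 3.9+). Step names and data are made up.

```python
from graphlib import TopologicalSorter

# Each step lists the steps it depends on (hypothetical three-step analysis).
DEPENDENCIES = {
    "load": set(),
    "clean": {"load"},
    "summarise": {"clean"},
}

def load(results):
    # Stand-in for reading raw measurements from a file.
    return [" 5.1", "4.9 ", "", "6.0"]

def clean(results):
    # Drop empty entries and convert the rest to numbers.
    return [float(x) for x in results["load"] if x.strip()]

def summarise(results):
    values = results["clean"]
    return sum(values) / len(values)

STEPS = {"load": load, "clean": clean, "summarise": summarise}

def run_workflow():
    """Run every step in dependency order and return all intermediate results."""
    results = {}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        results[name] = STEPS[name](results)
    return results

print(run_workflow()["summarise"])
```

Real workflow systems add what this sketch leaves out, such as file-based targets, caching of completed steps and execution on clusters.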
Deep learning / machine learning
How do you know a biologist is using deep learning in their research?
Don’t worry, they’ll tell you.
● “Just” optimization and search techniques
● Takes a set of features and produces a model that performs a classification or a regression
● A series of layers that assemble features into higher level features
● Several high-quality toolkits
● Some need for specialised hardware (GPU)
● Interpretability
● Ground truths
● Needs lots of data
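To make "layers that assemble features into higher level features" concrete, here is a toy two-layer network written in plain Python. The weights are picked by hand so that it computes XOR; there is no training, and nothing here reflects any particular toolkit. The hidden layer builds two intermediate features ("how many inputs are on" and "are both on"), and the output layer combines them.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . inputs + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def xor_network(x1, x2):
    # Hidden layer (hand-chosen weights): h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1).
    hidden = dense([x1, x2], [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu)
    # Output layer combines the two hidden features: h1 - 2 * h2.
    output = dense(hidden, [[1.0, -2.0]], [0.0], identity)
    return output[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))
```

No single layer of this form can compute XOR on its own; stacking two does, which is the point of composing features. In practice the weights are found by optimisation rather than chosen by hand.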
The pitfalls
Batch effects
● Technical sources of variation
○ Reagents
○ Technician
○ Platform
○ ...
● Solutions:
○ Plot data
○ Don’t batch
○ ComBat etc. (but loss of information)
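A minimal sketch of batch correction, assuming the crudest possible approach: subtracting each batch's mean. This is deliberately not ComBat, which fits a more careful empirical Bayes model; the function name and the numbers are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

def centre_by_batch(values, batches):
    """Subtract each batch's mean from the measurements made in that batch."""
    grouped = defaultdict(list)
    for value, batch in zip(values, batches):
        grouped[batch].append(value)
    batch_means = {batch: mean(vals) for batch, vals in grouped.items()}
    return [value - batch_means[batch] for value, batch in zip(values, batches)]

# Invented measurements: batch B reads about 10 units higher than batch A.
values = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
batches = ["A", "A", "A", "B", "B", "B"]
print(centre_by_batch(values, batches))  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

The example also shows the loss of information. If batch A held all the controls and batch B all the cases, the 10-unit difference would be removed whether it was technical or biological, which is why not confounding batch with the groups of interest matters more than any correction applied afterwards.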
Omnigenics
What if every gene affected every other gene?
● Boyle, Li & Pritchard 2017
● FOAF (friend of a friend) / six degrees of separation effect
● Implicated genes are a few drivers and an enormous number of “related” loci
● Context?
The garden of forking paths
Multiple hypothesis testing
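Two short calculations show why multiple testing matters: the chance of at least one false positive across many independent tests, and the Benjamini-Hochberg procedure for controlling the false discovery rate. The p-values below are invented for the example, and the code is a plain-Python sketch rather than a replacement for a statistics library.

```python
def family_wise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive in n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value is <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for index in order[:cutoff_rank]:
        rejected[index] = True
    return rejected

# Invented p-values from eight hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(round(family_wise_error_rate(20), 3))
print(sum(p < 0.05 for p in p_values))    # 5 "significant" without correction
print(sum(benjamini_hochberg(p_values)))  # 2 after Benjamini-Hochberg
```

The garden of forking paths makes this worse: each analysis choice that could have been made differently acts as another implicit test, even when only one result is reported.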
Conclusion
Taming the 4-headed beast
Acquiring: interpret EHR
Storing: data formats & systems
Analysing: statistics, correct for batch effects, integrative analysis, deep learning
Sharing: standards, data formats, workflow systems
