Automated Hypothesis Testing with
Large Scale Scientific Workflows
Yolanda Gil
Daniel Garijo
Rajiv Mayani
Varun Ratnakar
Information Sciences Institute
& Department of Computer Science
University of Southern California
http://www.isi.edu
Parag Mallick
Ravali Adusumilli
Hunter Boyce
Stanford School of Medicine
Canary Center for Early Cancer Detection
Stanford University
http://mallicklab.stanford.edu
http://www.disk-project.org
Talk Outline
๏ Motivation
๏ Research Challenges
1. Representing Hypotheses
2. Representing Lines of Inquiry
3. Meta-analysis to review workflow results
๏ DISK Scenario walkthrough
๏ Results in cancer multi-omics
๏ Related work
๏ Contributions and Future Work
Scientific Data Analysis Today:
Inefficient, Incomplete, Irreproducible
๏ Data analysis is time consuming
๏ Not systematic
๏ Not updated when new data/methods
become available
๏ Hard/impractical to reproduce prior
work
๏ Overall process is manually done:
inefficient and error-prone
๏ Analytic knowledge is
compartmentalised
Diagram: New hypothesis → Formulate line of inquiry (data + method) → Retrieve data → Run workflows (methods) → Meta-analysis of results
Our Focus: Cancer Multi-Omics
๏ Data Availability and Complexity:
• The multi-omic domain is filled with multiple levels of heterogeneous data that are
regularly expanding in volume and complexity through projects like The Cancer
Genome Atlas (TCGA) and the associated Clinical Proteomic Tumor Analysis
Consortium (CPTAC)
Our Focus: Cancer Multi-Omics
๏ Analytic Complexity:
• Multi-omic analysis requires the use of dozens of interconnected tools, each of which may require substantial domain knowledge.
• Tools shown: MAQ, BWA, BWA-SW (SE only), PERM, SOAPv2, MOSAIK, NOVOALIGN, SAMTOOLS, PICARD, GATK, IGVtools
• Domain knowledge is isolated
Our Focus: Cancer Multi-Omics
๏ Multiple types and complexities
of hypotheses:
• Hypotheses span the range from
single-gene/single dataset to
multi-gene/multi-ome/multi-
dataset
• Is this protein found in this sample?
• Is this gene found in this sample?
• Is this protein associated with a certain cancer?
• Which proteins are associated with a certain cancer?
• ...
Talk Outline
๏ Motivation
๏ Our Approach & Research Challenges
1. Representing Hypotheses
2. Representing Lines of Inquiry
3. Meta-analysis to review workflow results
๏ DISK Scenario walkthrough
๏ Results in cancer multi-omics
๏ Related work
๏ Contributions and Future Work
Our Approach: Hypothesis-Driven Discovery
๏ Represent scientist
hypotheses
๏ Formulate lines of inquiry
that express how a type of
hypothesis can be pursued by
data analysis workflows
๏ Design a meta-analysis that
examines the results of lines of
inquiry and either validates or
revises the original hypotheses
๏ Develop an intelligent agent
that can report and explain
new findings to the scientist
Diagram: Hypothesis → Lines of Inquiry (specify relevant analytic methods (workflows), type of data needed, and how to combine results) → Query to retrieve Data → Data Analysis Workflows (Workflow Bindings) → Meta-Workflows (Confidence Estimation, Benchmarking) → Revised hypothesis & interesting findings
Representing Hypotheses
Requirements from Omics
๏ Graph-based hypothesis
representation
• Entities are nodes
• Relationships are links
๏ Annotations on graphs
• Represent qualifications of hypotheses:
confidence and evidence
๏ Representing hypothesis evolution
• Graph versioning
Graph representation in RDF
๏ Standard semantic web language
๏ Scalable reasoners available
๏ Qualifications and provenance
through triple reification
๏ Versioning through multiple
named graphs
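The two RDF mechanisms named above (reification for qualifications, named graphs for versions) can be sketched without an RDF library. This is a minimal illustration, not the DISK implementation: the `Quad` type, the `reify` and `graph_version` helpers, the `hyp:hasConfidenceValue` prefix, and the confidence value 0.5 are assumptions made for the example. Only the `rdf:Statement`, `rdf:subject`, `rdf:predicate`, and `rdf:object` terms come from the standard RDF reification vocabulary.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the named graph (hypothesis version) it belongs to."""
    graph: str
    subject: str
    predicate: str
    obj: str


def reify(statement_id: str, quad: Quad, annotations: dict) -> list:
    """Describe `quad` as a resource so that annotations can point at it."""
    described = [
        Quad(quad.graph, statement_id, "rdf:type", "rdf:Statement"),
        Quad(quad.graph, statement_id, "rdf:subject", quad.subject),
        Quad(quad.graph, statement_id, "rdf:predicate", quad.predicate),
        Quad(quad.graph, statement_id, "rdf:object", quad.obj),
    ]
    # Each annotation becomes one more quad about the statement resource.
    described += [
        Quad(quad.graph, statement_id, predicate, str(value))
        for predicate, value in annotations.items()
    ]
    return described


def graph_version(store: list, graph: str) -> list:
    """Return every quad stored in one named graph."""
    return [q for q in store if q.graph == graph]


# Hypothetical example: one hypothesis statement with an illustrative confidence.
hy1 = Quad("Hy1", "bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")
store = [hy1] + reify("Hy1-S1", hy1, {"hyp:hasConfidenceValue": 0.5})
```

Keeping the graph name on every quad means a revised hypothesis can be added as a new graph without touching the quads of the earlier version.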
Representing Hypotheses
Diagram: two hypothesis graphs built from three vocabularies
• bio: Biology ontology; hyp: Hypothesis ontology; user: User data definitions
• Graph Hy1: bio:PRKCDBP hyp:expressedIn user:TCGA-AA-3561-01A-22
• Graph Hy2: bio:PRKCDBP hyp:associatedWith bio:ColonCancer
Lifecycle of a hypothesis
1. Initial Hypothesis, Data & Workflows
Hypothesis Statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22
Workflows Available: Proteomics, Proteogenomics
Data Available: XX_3561Proteome_VU.zip (MassSpecData), AA_3561_EX2 (Experiment), TCGA-AA-3561-01A-22 (Sample), TCGA-AA-3561 (Patient)
Links between data items: producedData, experimentedOn, collectedFrom
2. Running workflows on Data
Hypothesis Statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22
Workflows Available: Proteomics, Proteogenomics
Data Available: XX_3561Proteome_VU.zip (MassSpecData), AA_3561_EX2 (Experiment), TCGA-AA-3561-01A-22 (Sample), TCGA-AA-3561 (Patient)
Links between data items: producedData, experimentedOn, collectedFrom
Workflow Execution W1 (links: hasWorkflowTemplate, used)
Qualifications of Hy1'; Provenance of Hy1'
3. Meta reasoning about workflow results
Hypothesis Statement Hy1: PRKCDBP expressedIn TCGA-AA-3561-01A-22
Workflows Available: Proteomics, Proteogenomics
Data Available: XX_3561Proteome_VU.zip (MassSpecData), AA_3561_EX2 (Experiment), TCGA-AA-3561-01A-22 (Sample), TCGA-AA-3561 (Patient)
Links between data items: producedData, experimentedOn, collectedFrom
Workflow Execution W1 (links: hasWorkflowTemplate, used, produced)
Meta-Workflow Execution MW1 (links: used, produced)
Revised Hypothesis Statement Hy1' (revisionOf Hy1): PRKCDBP expressedIn TCGA-AA-3561-01A-22
Statement Hy1'-S1: hasConfidenceValue 0, hasProvenance
4. New Data becomes available
Hypothesis Statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22
Workflows Available: Proteomics, Proteogenomics
Data Available: XX_3561Proteome_VU.zip (MassSpecData), XX_3561_DD.zip (RNASeqData), AA_3561_EX1 (Experiment), AA_3561_EX2 (Experiment), TCGA-AA-3561-01A-22 (Sample), TCGA-AA-3561 (Patient)
Links between data items: producedData (one per data file), experimentedOn (one per experiment), collectedFrom
5. New Multi-Workflows are also run
Hypothesis Statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22
Workflows Available: Proteomics, Proteogenomics
Data Available: XX_3561Proteome_VU.zip (MassSpecData), XX_3561_DD.zip (RNASeqData), AA_3561_EX1 (Experiment), AA_3561_EX2 (Experiment), TCGA-AA-3561-01A-22 (Sample), TCGA-AA-3561 (Patient)
Links between data items: producedData, experimentedOn, collectedFrom
Workflow Execution W1 (link: used)
Workflow Execution W2 (link: used)
Qualifications of Ha1' (link: hasProvenance); Provenance of Ha1'
6. Hypothesis Revision
Hypothesis Statement Ha1: PRKCDBP expressedIn TCGA-AA-3561-01A-22
Workflows Available: Proteomics, Proteogenomics
Data Available: XX_3561Proteome_VU.zip (MassSpecData), XX_3561_DD.zip (RNASeqData), AA_3561_EX1 (Experiment), AA_3561_EX2 (Experiment), TCGA-AA-3561-01A-22 (Sample), TCGA-AA-3561 (Patient)
Links between data items: producedData, experimentedOn, collectedFrom
Workflow Execution W1 (links: used, produced)
Workflow Execution W2 (links: used, produced)
Meta-Workflow Execution MW2 (links: used, produced)
Revised Hypothesis Statement Ha1' (revisionOf Ha1): PRKCDBP (Mutated) expressedIn TCGA-AA-3561-01A-22
Statement Ha1'-S1: hasConfidenceValue 0.98
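The lifecycle above ends with a revised statement that points back to the original and carries a confidence value and provenance. A minimal sketch of that bookkeeping follows. The `Hypothesis` dataclass and the `revise` helper are hypothetical names, not part of DISK. The statement, the confidence 0.98, and the execution names W1, W2, and MW2 are taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """One immutable version of a hypothesis statement (illustrative type)."""
    name: str
    statement: Tuple[str, str, str]       # (subject, predicate, object)
    confidence: Optional[float] = None    # set only on revised versions
    provenance: Tuple[str, ...] = ()      # executions that support this version
    revision_of: Optional[str] = None     # name of the version being revised


def revise(original, new_name, statement, confidence, provenance):
    """Create a new version instead of mutating the original one."""
    return Hypothesis(
        name=new_name,
        statement=statement,
        confidence=confidence,
        provenance=tuple(provenance),
        revision_of=original.name,
    )


ha1 = Hypothesis("Ha1", ("PRKCDBP", "expressedIn", "TCGA-AA-3561-01A-22"))
ha1_revised = revise(
    ha1,
    "Ha1'",
    ("PRKCDBP (Mutated)", "expressedIn", "TCGA-AA-3561-01A-22"),
    confidence=0.98,
    provenance=["W1", "W2", "MW2"],
)
```

Because each version is frozen, the original statement stays available for comparison after the revision.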
Representing Lines of Inquiry & Data analysis workflows
Lines of Inquiry
๏ Capture how to set up potential analyses that can be pursued to test a certain type of hypothesis
• Hypothesis Pattern: bio:Protein ?p hyp:expressedIn bio:Sample ?s
• Data Query Pattern: DataFile ?d, Experiment ?e, Sample ?s, Patient ?p, linked by producedData, experimentedOn, collectedFrom
• Data Analytic Workflows: Proteomics, Proteogenomics (input: DataFile ?d)
• Meta-workflows: Comparison, Confidence estimation, Benchmarking
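A line of inquiry first matches a hypothesis against its pattern and then uses the resulting bindings to query for data. The sketch below illustrates that two-step idea with plain tuples and dictionaries. The `match_pattern` and `retrieve` functions and the catalog layout are assumptions made for the example. The file names and data types come from the slides, while `other_sample.zip` and `user:OTHER-SAMPLE` are invented to show a non-matching entry.

```python
def match_pattern(pattern, triple):
    """Bind ?variables in `pattern` to the terms of `triple`, or return None."""
    bindings = {}
    for pattern_term, term in zip(pattern, triple):
        if pattern_term.startswith("?"):
            # A variable must bind to the same term every time it appears.
            if bindings.setdefault(pattern_term, term) != term:
                return None
        elif pattern_term != term:
            return None
    return bindings


def retrieve(catalog, bindings, wanted_types):
    """Select data files of the wanted types recorded for the bound sample."""
    return [
        entry["file"]
        for entry in catalog
        if entry["sample"] == bindings["?s"] and entry["type"] in wanted_types
    ]


hypothesis_pattern = ("?p", "hyp:expressedIn", "?s")
hypothesis = ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")

# Illustrative catalog; a real system would query a data repository instead.
catalog = [
    {"file": "XX_3561Proteome_VU.zip", "type": "MassSpecData",
     "sample": "user:TCGA-AA-3561-01A-22"},
    {"file": "XX_3561_DD.zip", "type": "RNASeqData",
     "sample": "user:TCGA-AA-3561-01A-22"},
    {"file": "other_sample.zip", "type": "MassSpecData",
     "sample": "user:OTHER-SAMPLE"},
]

bindings = match_pattern(hypothesis_pattern, hypothesis)
files = retrieve(catalog, bindings, {"MassSpecData", "RNASeqData"})
```

A hypothesis with a different predicate, such as hyp:associatedWith, does not match this pattern and would need its own line of inquiry.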
Example Multi-omics Workflow (Zhang et al. replication)
Automated Workflow Generation in WINGS by Reasoning about
Semantic Constraints
Example: all input data must be from the human species, i.e. must have HS in its metadata
The workflow system uses this constraint to select only datasets that have HS in their metadata, so that the selected inputs are valid
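The constraint in this example amounts to filtering candidate datasets by their metadata before binding them to a workflow input. The following sketch shows that filter in isolation. It does not use the WINGS API: the `satisfies` function, the `species` metadata key, and the dataset names are hypothetical.

```python
def satisfies(metadata: dict, constraints: dict) -> bool:
    """True when every constrained metadata field has the required value."""
    return all(metadata.get(key) == value for key, value in constraints.items())


# Hypothetical datasets; only the HS code for human comes from the slide.
datasets = {
    "sample_a.fastq": {"species": "HS"},
    "sample_b.fastq": {"species": "MM"},
    "sample_c.fastq": {},  # missing metadata cannot satisfy the constraint
}

input_constraints = {"species": "HS"}

valid_inputs = sorted(
    name for name, metadata in datasets.items()
    if satisfies(metadata, input_constraints)
)
```

Treating missing metadata as a failed constraint is a design choice of this sketch; it keeps unannotated data out of the workflow.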
Meta-workflows:
1) Comparison Meta-Workflows
Diagram: Variant Detection produces a Custom Protein DB. Protein Identification runs once against the Custom DB and once against the Reference DB. Both sets of Protein IDs feed the Comparison Meta-Workflow, which outputs a Similarity Score.
Data Dependent:
• Peptide Level
• Protein Level
• Scan Level
๏ Goals:
• Compare results amongst multiple workflows
• Measure the global similarity amongst multiple workflows
• Provide users with explanation of workflow-dependent
differences in results
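One simple way to score the similarity of two workflows' results, and to report where they differ, is a set comparison of the identifiers each one produced. The sketch below uses the Jaccard index, which is an assumption: the slides do not say which similarity score DISK computes. The function names and the protein identifiers P1 to P4 are hypothetical.

```python
def jaccard(ids_a, ids_b) -> float:
    """Jaccard index: shared identifiers divided by all identifiers seen."""
    a, b = set(ids_a), set(ids_b)
    if not a and not b:
        return 1.0  # two empty results are treated as identical
    return len(a & b) / len(a | b)


def differences(ids_a, ids_b) -> dict:
    """List the identifiers that only one of the two workflows reported."""
    a, b = set(ids_a), set(ids_b)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a)}


# Hypothetical protein IDs from the custom-DB and reference-DB runs.
custom_db_ids = ["P1", "P2", "P3"]
reference_db_ids = ["P2", "P3", "P4"]

score = jaccard(custom_db_ids, reference_db_ids)
report = differences(custom_db_ids, reference_db_ids)
```

The same two functions apply unchanged at the peptide or scan level, as long as each level provides comparable identifiers.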
Meta-workflows:
2) Benchmark Meta-Workflows
๏ Goals:
• Evaluation of workflow performance
• Training of confidence estimation models (probabilistic)
Diagram elements: Benchmark Meta-Workflow; ROC, True/False Positive Rate; Probabilistic Models
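The true and false positive rates mentioned on this slide can be computed from workflow scores on benchmark data with known answers. The following is a minimal sketch of that calculation, not the benchmark meta-workflow itself. The function names, scores, and labels are made up for illustration, and the code assumes the benchmark contains at least one positive and one negative case.

```python
def confusion_rates(scores, labels, threshold):
    """Return (true positive rate, false positive rate) at one threshold."""
    positives = sum(1 for label in labels if label)
    negatives = len(labels) - positives
    true_pos = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
    false_pos = sum(1 for s, l in zip(scores, labels) if not l and s >= threshold)
    return true_pos / positives, false_pos / negatives


def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) points."""
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step accepts more and more results.
    for threshold in sorted(set(scores), reverse=True):
        tpr, fpr = confusion_rates(scores, labels, threshold)
        points.append((fpr, tpr))
    return points


# Hypothetical benchmark: workflow scores and whether each call was correct.
benchmark_scores = [0.9, 0.8, 0.3, 0.1]
benchmark_labels = [True, False, True, False]

roc = roc_points(benchmark_scores, benchmark_labels)
```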
Meta-workflows:
3) Confidence estimation Meta-Workflows
๏ Goals:
• Composite results from multiple workflows
• Estimate confidence of the workflow result
• Use estimated confidence to update hypothesis
Diagram: Protein Identification runs against the Custom DB and against the Reference DB. Both sets of Protein IDs, together with the Benchmark Meta-Workflow, feed a Probabilistic Model that is used to Estimate Confidence and Update Hypothesis.
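To make the goal of compositing results concrete, the sketch below combines detections from several workflows into one confidence value with a naive Bayes update. This is an assumed model, chosen for simplicity: the slides only say "probabilistic model". It treats the workflows as conditionally independent and takes each workflow's true and false positive rates as given, for instance from a benchmark. The function name and all numbers are illustrative.

```python
def posterior_confidence(prior, evidence):
    """Naive Bayes posterior that the hypothesis is true.

    `evidence` is a list of (detected, tpr, fpr) tuples, one per workflow:
    whether it reported the protein, and its benchmarked rates.
    """
    weight_true = prior
    weight_false = 1.0 - prior
    for detected, tpr, fpr in evidence:
        # Likelihood of this observation if the hypothesis is true or false.
        weight_true *= tpr if detected else (1.0 - tpr)
        weight_false *= fpr if detected else (1.0 - fpr)
    return weight_true / (weight_true + weight_false)


# Hypothetical rates for two workflows; both report the protein.
both_agree = posterior_confidence(0.5, [(True, 0.8, 0.2), (True, 0.8, 0.2)])

# The same workflows disagreeing cancel each other out.
disagree = posterior_confidence(0.5, [(True, 0.8, 0.2), (False, 0.8, 0.2)])
```

Under these assumptions, agreement between workflows raises the confidence above what either workflow supports alone, while disagreement returns it to the prior.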
DISK Walkthrough: Initial Hypothesis
๏ Initial hypothesis is provided by the user
• PRKCDBP protein is expressed in a patient sample
DISK Walkthrough: Lines of Inquiry
๏ The line of inquiry suggests finding data from different experiments done with the patient’s sample, then running multi-omic workflows, and then combining the evidence into a confidence score
General hypothesis pattern
Data query pattern: search for different experiments that produced omics data (e.g. of type RNASeq and MassSpecData)
Data analysis workflows to run on genomics and
proteomics data (more omics in the future)
Meta-workflows to assess confidence on the
hypothesis based on workflow results
DISK Walkthrough: Data & Workflows
To test a hypothesis that a protein is present in a patient’s sample:
๏ Retrieve mass spec and RNASeq data
๏ Use workflows
• Wf1: Proteome only
• Wf2: ProteoGenomic
DISK Walkthrough: Meta-Workflows
๏ After running the workflows, meta-workflows analyse the results and generate a confidence value
DISK Walkthrough: Revised Hypothesis
๏ The hypothesis is revised and given a confidence value:
• A mutation of the protein PRKCDBP is expressed in the patient’s sample
TCGA-AA-3561-01A-22, with a confidence of 0.9887
DISK Walkthrough: Provenance Details
๏ Hypothesis provenance stores information about workflows run and the data used
• Workflow execution provenance is published by WINGS using the W3C PROV standard.
DISK: Automated DIscovery of Scientific Knowledge
Architecture diagram components:
• Hypotheses → Hypothesis Evaluation → Revised hypotheses & interesting findings
• Interactive Discovery Agent
• Lines of Inquiry: Formulate Lines of Inquiry, Meta-Analysis of Results
• Data Retrieval (Data Repository), Workflow Binding, Analytic Workflows, Meta-Workflows (Confidence Estimation, Benchmarking)
• WINGS Intelligent Workflow System: Workflow Constraints, Workflow Reasoning, Workflow Provenance, Open Publication of Results as Linked Data
Our Initial Focus: Reproduce Seminal Omics Analysis
[Zhang et al 2014]
๏ Replicated the [Zhang et al 2014] proteogenomic analysis of colorectal cancer
๏ Successfully reproduced paper findings, comparing results at multiple levels (final figure,
supplementary tables, etc.)
๏ Took months and direct conversations with authors to replicate paper figures and
supplemental figures
๏ Application of analysis approach to new cancer type now takes minutes
• Useful when TCGA is integrated
๏ Expanded analysis to
• compare how sensitive findings were to workflow details
Figure: two density plots of Spearman correlation (x-axis: spearman correlation, y-axis: density). Panel 1: Correlation between mRNA−protein abundance (within samples). Panel 2: Correlation between mRNA−protein variation (across samples).
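The quantity plotted in these panels, a Spearman correlation between mRNA and protein measurements, is the Pearson correlation of the ranks of the two series. The sketch below computes it with the standard library only. It is not the code used in the replication, the function names are illustrative, and the abundance values are invented.

```python
import math


def ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if a series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances for four genes within one sample.
mrna_abundance = [5.1, 2.3, 8.7, 3.3]
protein_abundance = [40.0, 12.0, 95.0, 20.0]

rho = spearman(mrna_abundance, protein_abundance)
```

Repeating this per sample gives a within-sample distribution, and repeating it per gene across samples gives an across-sample distribution.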
Impact on Cancer Multi-Omics
Related Work
1) Discovery Systems
๏ [Lenat 1976]
๏ [Lindsay et al 1980]
๏ [Langley 1981]
๏ [Falkenhainer 1985]
๏ [Kulkarni and Simon 1988]
๏ [Cheeseman et al 1989]
๏ [Zytkow et al 1990]
๏ [Simon 1996]
๏ [Valdes-Perez 1997]
๏ [Todorovski et al 2000]
๏ [Schmidt and Lipson 2009]
Related Work:
2) Hypothesis Representation as Graphs
๏ Existing vocabularies are related but need to be extended to represent hypotheses in
DISK
• SWAN [Gao et al 2006]
• EXPO [Soldatova and King 2006]
• Nanopublications [Groth et al 2010]
• Ovopublications [Callahan and Dumontier 2013]
• Micropublications [Clark et al 2014]
• LSC
• BEL
Contributions
๏ Represent scientist hypotheses
• Hypothesis ontology includes revisions & provenance
๏ Formulate lines of inquiry that express how a type of hypothesis can be
pursued with a data analysis workflow
• Lines of inquiry outline what type of data and workflows to use, and customize
them to the hypotheses at hand
๏ Design a meta-analysis to assess the results of lines of inquiry and revise the
original hypotheses
• Meta-analysis workflows assess diverse evidence
Ongoing & Future Work
๏ Ongoing work:
• Interactive Discovery Agent that explains interesting findings
• Continuous analysis of data (TCGA/CPTAC) as it grows
• Extending and generalizing meta-workflows
• Using DISK in geosciences: Subsurface water resource modeling
๏ Future challenges:
• More complex hypotheses about several entities
• Incorporate evidence over time
• Designing domain-independent meta-workflows
• Resource-bound hypothesis exploration
Thank you

 

Automated Hypothesis Testing with Large Scale Scientific Workflows

  • 1. Automated Hypothesis Testing with Large Scale Scientific Workflows Yolanda Gil, Daniel Garijo, Rajiv Mayani, Varun Ratnakar (Information Sciences Institute & Department of Computer Science, University of Southern California, http://www.isi.edu); Parag Mallick, Ravali Adusumilli, Hunter Boyce (Stanford School of Medicine, Canary Center for Early Cancer Detection, Stanford University, http://mallicklab.stanford.edu); http://www.disk-project.org
  • 2. Talk Outline ๏ Motivation ๏ Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 3. Scientific Data Analysis Today: Inefficient, Incomplete, Irreproducible ๏ Data analysis is time-consuming ๏ Not systematic ๏ Not updated when new data/methods become available ๏ Hard/impractical to reproduce prior work ๏ Overall process is manually done: inefficient and error-prone ๏ Analytic knowledge is compartmentalised Diagram steps: new hypothesis, formulate line of inquiry (data + method), retrieve data, run workflows (methods), meta-analysis of results
  • 4. Our Focus: Cancer Multi-Omics ๏ Data Availability and Complexity: • The multi-omic domain is filled with multiple levels of heterogeneous data that is regularly expanding in volume and complexity through projects like The Cancer Genome Atlas (TCGA) and the associated Clinical Proteomic Tumor Analysis Consortium (CPTAC)
  • 5. Our Focus: Cancer Multi-Omics ๏ Analytic Complexity: • Multi-omic analysis requires the use of dozens of interconnected tools each of which may require substantial domain knowledge. MAQ BWA BWA-SW (SE only) PERM SOAPv2 MOSAIK NOVOALIGN SAMTOOLS PICARD GATK PICARD SAMTOOLS IGVtools Domain Knowledge is isolated
  • 6. Our Focus: Cancer Multi-Omics ๏ Multiple types and complexities of hypotheses: • Hypotheses span the range from single-gene/single-dataset to multi-gene/multi-ome/multi-dataset • Is this protein found in this sample? • Is this gene found in this sample? • Is this protein associated with a certain cancer? • Which proteins are associated with a certain cancer? • ...
  • 7. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 8. Our Approach: Hypotheses-Driven Discovery ๏ Represent scientist hypotheses ๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued by data analysis workflows ๏ Design a meta-analysis that examines the results of lines of inquiry and either validates or revises the original hypotheses ๏ Develop an intelligent agent that can report and explain new findings to the scientist Hypothesis Lines of Inquiry Specify relevant analytic methods (workflows), type of data needed, and how to combine results Query to retrieve Data Data Analysis Workflows Workflow Bindings Meta-Workflows Confidence Estimation Benchmarking Revised hypothesis & interesting findings
  • 9. Representing Hypotheses Hypothesis Lines of Inquiry Specify relevant analytic methods (workflows), type of data needed, and how to combine results Query to retrieve Data Data Analysis Workflows Workflow Bindings Meta-Workflows Confidence Estimation Benchmarking Revised hypothesis & interesting findings Representing Hypotheses
  • 10. Requirements from Omics ๏ Graph-based hypothesis representation • Entities are nodes • Relationships are links ๏ Annotations on graphs • Represent qualifications of hypotheses: confidence and evidence ๏ Representing hypothesis evolution • Graph versioning Graph representation in RDF ๏ Standard semantic web language ๏ Scalable reasoners available ๏ Qualifications and provenance through triple reification ๏ Versioning through multiple named graphs Representing Hypotheses
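The requirements on slide 10 (nodes and links, qualifications through reification, versioning through named graphs) can be illustrated with a minimal sketch in plain Python. This is not the DISK implementation: the `NamedGraph` class and the `reify` helper are hypothetical, and the `hyp:` prefix on `hasConfidenceValue` is an assumption. Only the `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms come from the standard RDF reification vocabulary.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class NamedGraph:
    """One version of a hypothesis, stored as a named set of triples (hypothetical class)."""
    name: str
    triples: List[Triple] = field(default_factory=list)
    revision_of: Optional[str] = None  # name of the graph that this graph revises


def reify(statement_id: str, triple: Triple, confidence: float) -> List[Triple]:
    """Describe `triple` as a resource so that a confidence value can be attached to it."""
    subject, predicate, obj = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
        # Assumed property name, taken from the label shown on the later slides.
        (statement_id, "hyp:hasConfidenceValue", str(confidence)),
    ]


# Hypothesis statement used in the talk's running example.
claim = ("bio:PRKCDBP", "hyp:expressedIn", "user:TCGA-AA-3561-01A-22")

# Initial hypothesis graph, then a revision that carries a qualification.
hy1 = NamedGraph("Hy1", [claim])
hy1_revised = NamedGraph(
    "Hy1'",
    [claim] + reify("Hy1'-S1", claim, 0.0),
    revision_of=hy1.name,
)
```

Keeping each version in its own named graph leaves earlier versions untouched, which is one plausible reading of "versioning through multiple named graphs".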
  • 12. Lifecycle of a hypothesis Biology ontology Hypothesis ontology hyp:expressedIn user:TCGA-AA-3561-01A-22 User data definitions hyp:associatedWith bio:ColonCancer Graph Hy1 Graph Hy2 bio:PRKCDBP bio:PRKCDBP
  • 13. 1. Initial Hypothesis, Data & Workflows Data Available Workflows Available Proteomics Proteogenomics XX_3561Proteome_VU.zip (MassSpecData) producedData TCGA-AA-3561 (Patient) collectedFrom TCGA-AA-3561-01A-22 (Sample) AA_3561_EX2 (Experiment) experimentedOn Hypothesis Statement Hy1 PRKCDBP expressedIn TCGA-AA-3561-01A-22
  • 14. 2. Running workflows on Data Data Available Workflows Available Proteomics Proteogenomics XX_3561Proteome_VU.zip (MassSpecData) producedData TCGA-AA-3561 (Patient) collectedFrom TCGA-AA-3561-01A-22 (Sample) AA_3561_EX2 (Experiment) experimentedOn Workflow Execution W1 hasWorkflowTemplate used Hypothesis Statement Hy1 PRKCDBP expressedIn TCGA-AA-3561-01A-22
  • 15. 3. Meta reasoning about workflow results Hypothesis Statement Hy1 PRKCDBP expressedIn TCGA-AA-3561-01A-22 Data Available Workflows Available Proteomics Proteogenomics XX_3561Proteome_VU.zip (MassSpecData) producedData TCGA-AA-3561 (Patient) collectedFrom TCGA-AA-3561-01A-22 (Sample) AA_3561_EX2 (Experiment) experimentedOn Workflow Execution W1 hasWorkflowTemplate used Meta-Workflow Execution MW1 used Revised Hypothesis Statement Hy1' PRKCDBP expressedIn TCGA-AA-3561-01A-22 hasConfidenceValue 0 Statement Hy1'-S1 hasProvenance produced used produced revisionOf Qualifications of Hy1' Provenance of Hy1'
  • 16. 4. New Data becomes available Workflows Available Proteomics Proteogenomics Hypothesis Statement Ha1 PRKCDBP expressedIn TCGA-AA-3561-01A-22 Data Available XX_3561Proteome_VU.zip (MassSpecData) producedData producedData experimentedOn experimentedOn TCGA-AA-3561 (Patient) collectedFrom TCGA-AA-3561-01A-22 (Sample) AA_3561_EX1 (Experiment) AA_3561_EX2 (Experiment) XX_3561_DD.zip (RNASeqData)
  • 17. 5. New Multi-Workflows are also run Workflows Available Proteomics Proteogenomics used Data Available XX_3561Proteome_VU.zip (MassSpecData) producedData producedData experimentedOn experimentedOn TCGA-AA-3561 (Patient) collectedFrom TCGA-AA-3561-01A-22 (Sample) AA_3561_EX1 (Experiment) AA_3561_EX2 (Experiment) Workflow Execution W2 XX_3561_DD.zip (RNASeqData) Workflow Execution W1 used Hypothesis Statement Ha1 PRKCDBP expressedIn TCGA-AA-3561-01A-22
  • 18. 6. Hypothesis Revision Workflows Available Proteomics Proteogenomics used used Revised Hypothesis Statement Ha1' PRKCDBP Mutated expressedIn TCGA-AA-3561-01A-22 hasConfidenceValue 0.98 Statement Ha1'-S1 produced used Data Available XX_3561Proteome_VU.zip (MassSpecData) producedData producedData experimentedOn experimentedOn TCGA-AA-3561 (Patient) collectedFrom TCGA-AA-3561-01A-22 (Sample) AA_3561_EX1 (Experiment) AA_3561_EX2 (Experiment) Workflow Execution W2 XX_3561_DD.zip (RNASeqData) Workflow Execution W1 used used produced Meta-Workflow Execution MW2 Hypothesis Statement Ha1 PRKCDBP expressedIn TCGA-AA-3561-01A-22 revisionOf Qualifications of Ha1' hasProvenance Provenance of Ha1'
  • 19. Representing Lines of Inquiry & Data analysis workflows Hypothesis Lines of Inquiry Specify relevant analytic methods (workflows), type of data needed, and how to combine results Query to retrieve Data Data Analysis Workflows Workflow Bindings Meta-Workflows Confidence Estimation Benchmarking Revised hypothesis & interesting findings
  • 20. Lines of Inquiry ๏ Capture how to set up potential analyses that can be pursued to test a certain type of hypothesis Hypothesis Pattern: bio:Protein ?p hyp:expressedIn bio:Sample ?s Data Query Pattern: nodes DataFile ?d, Experiment ?e, Sample ?s, Patient ?p; links producedData, experimentedOn, collectedFrom Data Analytic Workflows: Proteomics, Proteogenomics (input DataFile ?d) Meta-workflows: Comparison, Confidence estimation, Benchmarking
  • 21. Example Multi-omics Workflow (Zhang et al. replication)
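A line of inquiry pairs a hypothesis pattern with a data query pattern. The sketch below shows one way the two could fit together: matching a hypothesis against a pattern yields variable bindings, and the bound sample is then used to look up data files. The function names, the in-memory fact list, the link directions (experiment to sample, experiment to file) and the assignment of files to experiments are all assumptions made for illustration, not details confirmed by the slides.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def match_pattern(pattern: Triple, statement: Triple) -> Optional[Dict[str, str]]:
    """Bind the ?variables of `pattern` to the terms of `statement`; None if they differ."""
    bindings: Dict[str, str] = {}
    for pattern_term, statement_term in zip(pattern, statement):
        if pattern_term.startswith("?"):
            # A variable that appears twice must bind to the same term both times.
            if bindings.setdefault(pattern_term, statement_term) != statement_term:
                return None
        elif pattern_term != statement_term:
            return None
    return bindings


def find_data_files(sample: str, facts: List[Triple]) -> List[str]:
    """Return files produced by any experiment that was run on `sample`."""
    experiments = {s for s, p, o in facts if p == "experimentedOn" and o == sample}
    return sorted(o for s, p, o in facts if p == "producedData" and s in experiments)


# Illustrative data catalogue; identifiers come from the slides, the links are assumed.
facts: List[Triple] = [
    ("AA_3561_EX1", "experimentedOn", "TCGA-AA-3561-01A-22"),
    ("AA_3561_EX1", "producedData", "XX_3561_DD.zip"),
    ("AA_3561_EX2", "experimentedOn", "TCGA-AA-3561-01A-22"),
    ("AA_3561_EX2", "producedData", "XX_3561Proteome_VU.zip"),
]

hypothesis_pattern = ("?p", "hyp:expressedIn", "?s")
hypothesis = ("bio:PRKCDBP", "hyp:expressedIn", "TCGA-AA-3561-01A-22")

bindings = match_pattern(hypothesis_pattern, hypothesis)
files = find_data_files(bindings["?s"], facts) if bindings else []
```

In a real deployment the lookup would presumably be a query against a data repository rather than a list scan; the list keeps the sketch self-contained.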
  • 22. Automated Workflow Generation in WINGS by Reasoning about Semantic Constraints Example: all input data must be from human species, i.e. must have HS in metadata Workflow system uses this constraint to select datasets that have HS in their metadata so they are valid
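The human-species constraint on slide 22 amounts to filtering candidate datasets on their metadata. The following is a minimal sketch of that idea only; it does not use or reproduce the WINGS API, and the `select_valid_inputs` function, the `species` metadata key and the dataset names are hypothetical.

```python
from typing import Dict, List


def select_valid_inputs(datasets: List[Dict[str, str]], required_species: str = "HS") -> List[str]:
    """Keep only datasets whose metadata declares the required species code."""
    return [d["name"] for d in datasets if d.get("species") == required_species]


# Hypothetical catalogue: one human dataset, one non-human, one with no species metadata.
catalog = [
    {"name": "sample_a.zip", "species": "HS"},
    {"name": "sample_b.zip", "species": "MM"},
    {"name": "sample_c.zip"},
]

valid = select_valid_inputs(catalog)
```

A dataset with missing metadata is rejected here rather than accepted by default, which is the conservative choice when the constraint says inputs must have HS in their metadata.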
  • 23. Representing Hypotheses Hypothesis Lines of Inquiry Specify relevant analytic methods (workflows), type of data needed, and how to combine results Query to retrieve Data Data Analysis Workflows Workflow Bindings Meta-Workflows Confidence Estimation Benchmarking Revised hypothesis & interesting findings
  • 24. Meta-workflows: 1) Comparison Meta-Workflows ๏ Goals: • Compare results amongst multiple workflows • Measure the global similarity amongst multiple workflows • Provide users with explanation of workflow-dependent differences in results Diagram: Variant Detection, Custom Protein DB, Protein Identification (Custom DB), Protein Identification (Reference DB), Protein IDs, Similarity Score, Comparison Meta-Workflow Data Dependent: • Peptide Level • Protein Level • Scan Level
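The slide does not say how the similarity score is computed. As one possible illustration, the sketch below compares two sets of protein IDs with the Jaccard index and reports which IDs appear in only one result, which is a simple form of explaining workflow-dependent differences. The choice of Jaccard, the function name and the protein IDs are assumptions.

```python
from typing import Dict, Set


def compare_protein_ids(custom_db_ids: Set[str], reference_db_ids: Set[str]) -> Dict[str, object]:
    """Compare two protein ID sets: Jaccard similarity plus the IDs unique to each side."""
    shared = custom_db_ids & reference_db_ids
    union = custom_db_ids | reference_db_ids
    # Two empty results are treated as identical.
    similarity = len(shared) / len(union) if union else 1.0
    return {
        "similarity": similarity,
        "only_custom": sorted(custom_db_ids - reference_db_ids),
        "only_reference": sorted(reference_db_ids - custom_db_ids),
    }


# Made-up identifiers, for illustration only.
report = compare_protein_ids({"P1", "P2", "P3"}, {"P2", "P3", "P4"})
```

The same function could be applied at the peptide or scan level by passing peptide or scan identifiers instead of protein IDs.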
  • 25. Meta-workflows: 2) Benchmark Meta-Workflows ๏ Goals: • Evaluation of workflow performance • Training of confidence estimation models (probabilistic) Probabilistic Models Benchmark Meta-Workflow ROC, True/False Positive Rate
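The ROC and true/false positive rates mentioned on slide 25 can be computed from workflow scores once ground-truth labels are available. The sketch below computes one ROC point at a chosen threshold; the function name, scores and labels are made up for illustration and do not come from the talk.

```python
from typing import Sequence, Tuple


def rates_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) when calling score >= threshold positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical benchmark: identification scores with known ground truth (1 = truly present).
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
tpr, fpr = rates_at_threshold(scores, labels, threshold=0.5)
```

Sweeping the threshold over the observed scores and plotting the resulting pairs gives the ROC curve.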
  • 26. Meta-workflows: 3) Confidence estimation Meta-Workflows ๏ Goals: • Composite results from multiple workflows • Estimate confidence of the workflow result • Use estimated confidence to update hypothesis Protein Identification Protein Identification Custom DB Reference DB Protein IDs Protein IDs Probabilistic Model Estimate Confidence Update Hypothesis Benchmark Meta-Workflow
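The three goals on slide 26 (composite results, estimate confidence, update the hypothesis) can be sketched as follows. The talk does not specify the probabilistic model, so this example swaps in a simple noisy-OR combination that treats the workflows as independent sources of evidence. The function name, the per-workflow probabilities and the dictionary layout are assumptions.

```python
from typing import Dict, Iterable


def combine_noisy_or(probabilities: Iterable[float]) -> float:
    """Combine independent per-workflow probabilities that the hypothesis holds."""
    remaining_doubt = 1.0
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        remaining_doubt *= 1.0 - p
    return 1.0 - remaining_doubt


# Made-up probabilities for the two workflows named in the walkthrough (Wf1, Wf2).
workflow_probabilities: Dict[str, float] = {"Wf1": 0.7, "Wf2": 0.6}

hypothesis = {
    "statement": "PRKCDBP expressedIn TCGA-AA-3561-01A-22",
    "confidence": None,
}

# Update the hypothesis with the combined confidence.
hypothesis["confidence"] = combine_noisy_or(workflow_probabilities.values())
```

Noisy-OR only ever increases confidence as workflows are added, so it cannot represent contradicting evidence; a model trained by a benchmark meta-workflow would be needed for that.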
  • 27. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 28. DISK Walkthrough: Initial Hypothesis ๏ Initial hypothesis is provided by the user • PRKCDBP protein is expressed in a patient sample
  • 29. DISK Walkthrough: Lines of Inquiry ๏ The line of inquiry suggests finding data from different experiments done with the patient’s sample, then running multi-omic workflows, and then combining the evidence into a confidence score General hypothesis pattern Data query pattern: search for different experiments that produced omics data (e.g. of type RNASeq and MassSpecData) Data analysis workflows to run on genomics and proteomics data (more omics in the future) Meta-workflows to assess confidence in the hypothesis based on workflow results
  • 30. DISK Walkthrough: Data & Workflows To test a hypothesis that a protein is present in a patient’s sample: ๏ Retrieve mass spec and RNASeq data ๏ Use workflows • Wf1: Proteome only • Wf2: ProteoGenomic
  • 31. DISK Walkthrough: Meta-Workflows ๏ After running the workflows, meta-workflows analyse the results and generate a confidence value
  • 32. DISK Walkthrough: Revised Hypothesis ๏ The hypothesis is revised and given a confidence value: • A mutation of the protein PRKCDBP is expressed in the patient’s sample TCGA-AA-3561-01A-22 with a confidence of 0.9887
  • 33. DISK Walkthrough: Provenance Details ๏ Hypothesis provenance stores information about the workflows run and the data used • Workflow execution provenance is published by WINGS in the PROV standard.
  • 34. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 35. DISK: Automated DIscovery of Scientific Knowledge Workflow Constraints Workflow Reasoning Open Publication of Results as Linked Data Workflow Provenance WINGS Intelligent Workflow System Lines of Inquiry Interactive Discovery Agent Hypothesis Evaluation Hypotheses Revised hypotheses & interesting findings Analytic Workflows Data Retrieval Workflow Binding Meta-Workflows Confidence Estimation Benchmarking Formulate Lines of Inquiry Meta-Analysis of Results Data Repository
  • 36. Our Initial Focus: Reproduce Seminal Omics Analysis [Zhang et al 2014]
  • 37. Impact on Cancer Multi-Omics ๏ Replicated the [Zhang et al 2014] proteogenomic analysis of colorectal cancer ๏ Successfully reproduced paper findings, comparing results at multiple levels (final figure, supplementary tables, etc.) ๏ Took months and direct conversations with authors to replicate paper figures and supplemental figures ๏ Application of the analysis approach to a new cancer type now takes minutes • Useful when TCGA is integrated ๏ Expanded analysis to • compare how sensitive findings were to workflow details [Plots: density of Spearman correlation between mRNA−protein abundance (within samples); density of Spearman correlation between mRNA−protein variation (across samples)]
  • 38. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 39. Related Work 1) Discovery Systems ๏ [Lenat 1976] ๏ [Lindsay et al 1980] ๏ [Langley 1981] ๏ [Falkenhainer 1985] ๏ [Kulkarni and Simon 1988] ๏ [Cheeseman et al 1989] ๏ [Zytkow et al 1990] ๏ [Simon 1996] ๏ [Valdes-Perez 1997] ๏ [Todorovski et al 2000] ๏ [Schmidt and Lipson 2009]
  • 40. Related Work: 2) Hypothesis Representation as Graphs ๏ Existing vocabularies are related but need to be extended to represent hypotheses in DISK • SWAN [Gao et al 2006] • EXPO [Soldatova and King 2006] • Nanopublications [Groth et al 2010] • Ovopublications [Callahan and Dumontier 2013] • Micropublications [Clark et al 2014] • LSC • BEL
  • 41. Talk Outline ๏ Motivation ๏ Our Approach & Research Challenges 1. Representing Hypotheses 2. Representing Lines of Inquiry 3. Meta-analysis to review workflow results ๏ DISK Scenario walkthrough ๏ Results in cancer multi-omics ๏ Related work ๏ Contributions and Future Work
  • 42. Contributions ๏ Represent scientist hypotheses • Hypothesis ontology includes revisions & provenance ๏ Formulate lines of inquiry that express how a type of hypothesis can be pursued with a data analysis workflow • Lines of inquiry outline what type of data and workflows to use, and customize them to the hypotheses at hand ๏ Design a meta-analysis to assess the results of lines of inquiry and revise the original hypotheses • Meta-analysis workflows assess diverse evidence
  • 43. Ongoing & Future Work ๏ Ongoing work: • Interactive Discovery Agent that explains interesting findings • Continuous analysis of data (TCGA/CPTAC) as it grows • Extending and generalizing meta-workflows • Using DISK in geosciences: Subsurface water resource modeling ๏ Future challenges: • More complex hypotheses about several entities • Incorporate evidence over time • Designing domain-independent meta-workflows • Resource-bound hypothesis exploration