NETTAB 2013

Alejandra Gonzalez-Beltran, Data Management Team Lead, Software Engineering Group
Bio-GraphIIn: a graph-based,
integrative and semantically enabled
repository for life science
experimental data
Alejandra González-Beltrán, PhD
Oxford e-Research Centre, University of Oxford
alejandra.gonzalezbeltran@oerc.ox.ac.uk
@alegonbel

NETTAB 2013

October 16-18, 2013

Venice Lido, Italy
Experimental workflow
[Figure: a cycle around the Data Scientist with the stages Planning, Perform new experiment, Use existing data, Data Collection, Data Management, Analysis, Visualization and Publication. Successive builds of the slide add the labels "data + metadata", "Science Reproducibility" and, with a second copy of the cycle, "Data Reusability".]
Outline

• Motivation for an integrative and semantically enabled metadata repository in life sciences
  • retrospective data submissions
  • heterogeneous experimental data
  • fragmentation of formats and databases
  • semantic queries leading to integrative analysis
• Context: the ISA infrastructure
• Bio-GraphIIn requirements
• Bio-GraphIIn design & architecture
• Bio-GraphIIn graph queries
• Bio-GraphIIn prototype
• Summary and future work
Motivation 1/4

retrospective data submissions
[Figure: the experimental workflow cycle with a single "metadata" label and the annotation "retrospective".]
Metadata edits to repositories are not straightforward, often requiring deleting the submission and re-submitting the data

prospective data submissions
[Figure: the same cycle under the heading "prospective", with "metadata" labels placed around the cycle.]
Support incremental data deposition + metadata edits
Motivation 2/4

heterogeneous experimental data
[Figure: Data Collection stage of the workflow cycle.]
Motivation 3/4

fragmentation of formats and databases
[Figure: Publication stage of the workflow cycle.]
Motivation 4/4

semantic queries leading to integrative analysis
[Figure: Visualization and Analysis stages of the workflow cycle, linked to a "life science experiments repo".]

• support for a rich and uniform query interface across studies, enabling integrative data analysis to provide new insights at the systems biology level
• e.g. find all data files associated with samples from a particular organism (e.g. Homo sapiens) and a particular tissue type (e.g. liver)
• allow selecting a set of samples/data files through browsing and semantic filtering
• provide links to analysis and visualisation platforms
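
The example query on this slide (all data files for samples of a given organism and tissue type) can be sketched over plain in-memory records. This is only an illustration: the `Sample` class, its field names and the sample records are hypothetical and are not taken from Bio-GraphIIn.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical sample record; the field names are assumptions."""
    name: str
    organism: str
    tissue: str
    data_files: tuple


def find_data_files(samples, organism, tissue):
    """Return the data files of all samples matching organism and tissue."""
    return sorted(
        data_file
        for sample in samples
        if sample.organism == organism and sample.tissue == tissue
        for data_file in sample.data_files
    )


# Made-up records spanning two studies.
SAMPLES = [
    Sample("study1.s1", "Homo sapiens", "liver", ("s1-a.cel",)),
    Sample("study1.s2", "Homo sapiens", "kidney", ("s2-a.cel",)),
    Sample("study2.s1", "Homo sapiens", "liver", ("x1.raw", "x2.raw")),
    Sample("study2.s2", "Mus musculus", "liver", ("m1.raw",)),
]

print(find_data_files(SAMPLES, "Homo sapiens", "liver"))
```

The point of the sketch is that one filter works across studies only when organism and tissue are described uniformly, which is what the repository is meant to provide.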
The Investigation/Study/Assay (ISA) infrastructure

generic format for experimental description and data exchange

community engagement

open source software tools

investigation
high-level concept to link related studies

study
the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays

assay
test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)

assay(s)
pointers to data file names/location

data
external files in native or other formats

• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• communities working to build a library of cellular signatures
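
The investigation/study/assay hierarchy can be sketched as nested records. This is a minimal illustration, not the ISA tools' data model: the class and field names below are assumptions chosen to mirror the definitions on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """A test producing measurements; holds pointers to data files only."""
    measurement: str
    technology: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    """Central unit: the subject, its characteristics, and its assays."""
    identifier: str
    subject_characteristics: Dict[str, str]
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept linking related studies."""
    identifier: str
    studies: List[Study] = field(default_factory=list)


def all_data_files(investigation):
    """Collect every data file pointer reachable from an investigation."""
    return [
        data_file
        for study in investigation.studies
        for assay in study.assays
        for data_file in assay.data_files
    ]


# Hypothetical example values.
assay = Assay("transcription profiling", "DNA microarray", ["h1-s1.cel", "h1-s2.cel"])
study = Study("S1", {"organism": "Homo sapiens"}, [assay])
investigation = Investigation("I1", [study])

print(all_data_files(investigation))
```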
Experimental workflow - graph representation
[Figure: directed graph from sources to data files; "..." marks steps elided on the slide.]
H1 (H. sapiens, 35 Years) → H1.sample1 → Labeling → H1.sample1.labeled → ... → Scanning → h1-s1.cel
H1 (H. sapiens, 35 Years) → H1.sample2 → ... → Scanning → h1-s2.cel
H2 (H. sapiens, 33 Years) → H2.sample1 → Labeling → H2.sample1.labeled → ... → Scanning → h2-s1.cel

Spreadsheets for end-users
...
H1 | H. sapiens | 35 | Years | H1.sample1 | Labeling | H1.sample1.labeled | Scanning | h1-s1.cel
H1 | H. sapiens | 35 | Years | H1.sample2 | ... | ... | Scanning | h1-s2.cel
H2 | H. sapiens | 33 | Years | H2.sample1 | Labeling | H2.sample1.labeled | Scanning | h2-s1.cel

vocabulary for the description of the experimental workflow
syntactic interoperability across biological experiments of different types
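
The relation between the spreadsheet view and the graph view can be sketched as follows. This is an illustration under assumptions: the row layout is simplified to six cells, the steps elided on the slide ("...") are collapsed, and the labelled extract of H1.sample2 is named as on the later "Graph + Semantics" slide. The function names are hypothetical.

```python
from collections import defaultdict

# One row per path from a source to a data file:
# source, sample, labeling protocol, labeled extract, scanning protocol, data file
ROWS = [
    ["H1", "H1.sample1", "Labeling", "H1.sample1.labeled", "Scanning", "h1-s1.cel"],
    ["H1", "H1.sample2", "Labeling", "H1.sample2.labeled", "Scanning", "h1-s2.cel"],
    ["H2", "H2.sample1", "Labeling", "H2.sample1.labeled", "Scanning", "h2-s1.cel"],
]


def build_graph(rows):
    """Map each node to the set of (protocol, next_node) edges leaving it."""
    graph = defaultdict(set)
    for source, sample, labeling, labeled, scanning, data_file in rows:
        graph[source].add((None, sample))  # sampling step is not named on the slide
        graph[sample].add((labeling, labeled))
        graph[labeled].add((scanning, data_file))
    return graph


def data_files_from(graph, node):
    """Depth-first walk collecting the leaf nodes reachable from node."""
    successors = graph.get(node)
    if not successors:
        return {node}
    found = set()
    for _protocol, next_node in successors:
        found |= data_files_from(graph, next_node)
    return found


graph = build_graph(ROWS)
print(sorted(data_files_from(graph, "H1")))
```

Repeated cells in the spreadsheet (H1 appears in two rows) collapse into a single node with two outgoing edges, which is why the graph is the more natural structure for queries.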
Machine-readable representation
Graph + Semantics
[Figure: the workflow graph annotated with ontology terms.
Nodes: H1, H1.sample1, H1.sample2, labeling1, labeling2, H1.sample1.labeled, H1.sample2.labeled, scanning1, scanning2, h1-s1.cel, h1-s2.cel, labeling protocol.
Classes: obi:material entity, obi:material sample, tax:homo sapiens, obi:material processing, obi:processed material, obi:planned process, obi:protocol, isa:raw data file.
Relations: obi:is_specified_input_of, obi:is_specified_output_of, bfo:derives_from, isa:executes.]

semantic interoperability
across biological experiments of different types
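
A sketch of how one path of this graph might look as subject/predicate/object triples. The node, class and relation names come from the slide, but which class or relation attaches to which node is an assumption here, since the figure's layout is not recoverable from the text. Plain tuples are used instead of an RDF library, and `outputs_of` is a hypothetical helper.

```python
# Assumed pairing of nodes, classes and relations for the H1.sample1 path.
TRIPLES = {
    ("H1.sample1", "rdf:type", "obi:material sample"),
    ("H1.sample1", "bfo:derives_from", "H1"),
    ("H1.sample1", "obi:is_specified_input_of", "labeling1"),
    ("labeling1", "rdf:type", "obi:material processing"),
    ("labeling1", "isa:executes", "labeling protocol"),
    ("H1.sample1.labeled", "obi:is_specified_output_of", "labeling1"),
    ("H1.sample1.labeled", "obi:is_specified_input_of", "scanning1"),
    ("h1-s1.cel", "obi:is_specified_output_of", "scanning1"),
    ("h1-s1.cel", "rdf:type", "isa:raw data file"),
}


def outputs_of(triples, material):
    """Find what is produced by the processes that take `material` as input."""
    processes = {
        o for s, p, o in triples
        if s == material and p == "obi:is_specified_input_of"
    }
    return sorted(
        s for s, p, o in triples
        if p == "obi:is_specified_output_of" and o in processes
    )


print(outputs_of(TRIPLES, "H1.sample1"))
print(outputs_of(TRIPLES, "H1.sample1.labeled"))
```

Because every experiment type uses the same input/output relations, the same helper works whether the process is a labeling, a scanning or any other step.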
Architecture
[Figure: components: ISA-TAB parser, graph analysis, isa2owl mapping parser, configuration file.]

mappings between the ISA-TAB syntax and ontologies

Resource Description Framework (RDF)
ISA-OBI mapping
Ontology for Biomedical Investigations
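
The idea of a configurable mapping between ISA-TAB syntax and ontology terms can be sketched as a lookup table driving triple generation. This is not the isa2owl implementation: the `MAPPING` entries and `row_to_triples` are hypothetical. Only the ISA-TAB column headers and the class labels are real names; pairing them this way is an assumption.

```python
# Hypothetical mapping configuration: ISA-TAB column header -> ontology class.
MAPPING = {
    "Source Name": "obi:material entity",
    "Sample Name": "obi:material sample",
    "Raw Data File": "isa:raw data file",
}


def row_to_triples(row, mapping):
    """Emit one rdf:type triple per non-empty cell whose column is mapped."""
    return [
        (value, "rdf:type", mapping[column])
        for column, value in row.items()
        if column in mapping and value
    ]


row = {
    "Source Name": "H1",
    "Sample Name": "H1.sample1",
    "Protocol REF": "Labeling",  # unmapped in this sketch, so it is skipped
    "Raw Data File": "h1-s1.cel",
}

for triple in row_to_triples(row, MAPPING):
    print(triple)
```

Keeping the mapping in a configuration file rather than in the parser means a different ontology can be targeted without changing code.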
Bio-GraphIIn Requirements
Bio-GraphIIn (pronounced “bio-graphene”) stands for Biological Graph Investigation Index

BioInvestigation Index (BII)
Bio-GraphIIn Requirements

• support prospective annotation of experiments
  • support Create Read Update Delete (CRUD) operations
• manage heterogeneous biological and biomedical metadata
  • relying on ISA-TAB
• support data integration & semantic queries
  • relying on ISA2OWL
• links to analysis and visualisation platforms
• take advantage of experimental design information, improving metadata such as including study groups
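
Prospective annotation with CRUD operations can be illustrated with a toy in-memory store: metadata is created at planning time and updated as the experiment progresses, with no delete-and-resubmit cycle. The `MetadataStore` class and its method names are hypothetical and do not describe the Bio-GraphIIn API.

```python
class MetadataStore:
    """Toy in-memory store illustrating Create/Read/Update/Delete."""

    def __init__(self):
        self._records = {}

    def create(self, study_id, metadata):
        if study_id in self._records:
            raise KeyError(f"{study_id} already exists")
        self._records[study_id] = dict(metadata)

    def read(self, study_id):
        return dict(self._records[study_id])  # return a copy

    def update(self, study_id, **changes):
        self._records[study_id].update(changes)

    def delete(self, study_id):
        del self._records[study_id]


store = MetadataStore()
store.create("S1", {"organism": "Homo sapiens"})  # at planning time
store.update("S1", data_files=["h1-s1.cel"])      # after data collection
store.update("S1", organism="H. sapiens")         # in-place metadata edit
print(store.read("S1"))
```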
Functionality provided by existing repositories & Bio-GraphIIn requirements

Repository | Data Types | Format | Browsing/Searching | Programmatic submission | Programmatic access | CRUD operations | Community curation | RDF
BioSample DB | sample info | SampleTAB | browse/search | X (email submission) | REST API | X | X | YES
ArrayExpress/GEO | sequencing | MAGETAB | browse/filter/search/advanced search | MAGE-TAB spreadsheet/MIAMExpress | REST API | X | X | X*
SRA/ENA | next generation sequencing | SRA-XML | browse/text/sequence/advanced search | Webin, REST | REST API | X | X | X
PRIDE | mass spectrometry | PRIDE-ML | PRIDE inspector/PRIDE Biomart | X (FTP upload) | Java API | X | X | X
BII | All | ISA-TAB | browse/text search/filtering | X | SOAP web services | X | X | X
Bio-GraphIIn | All | ISA-TAB | browse/filter/search/advanced search | YES (upload, REST) | REST API | YES | YES | YES

*We are referring to the ArrayExpress repository, not to the Expression Atlas, which is available in RDF
[Second build of the same table: cells in the Bio-GraphIIn row are stamped "prototype", and its Browsing/Searching cell is stamped "browsing prototype".]
semantic representation of the graph,
rich queries over common
semantic framework enabling
integration with other repositories
independence from underlying
graph technology
property graphs
http://www.tinkerpop.com/

R SPARQL package
http://www.r-bloggers.com/sparql-with-r-in-less-than-5-minutes/

http://refinery-platform.org/

Django-based analysis and visualisation
platform, relies on ISA-TAB metadata
SPARQL queries

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX bfo: <http://purl.obolibrary.org/obo/BFO_>
PREFIX iao: <http://purl.obolibrary.org/obo/IAO_>
PREFIX obi: <http://purl.obolibrary.org/obo/OBI_>
PREFIX tax: <http://purl.obolibrary.org/obo/NCBITaxon_>
PREFIX isa: <http://purl.org/isa-tools/ISA_>
PREFIX ro: <http://purl.obolibrary.org/obo/RO_>

SELECT DISTINCT ?i_id ?s_id ?s_title ?organism
WHERE {
  ?study rdf:type obi:0000066 .              # obi:investigation
  ?study rdfs:label ?s_id .
  ?s_title_iri rdf:type obi:0001622 .        # obi:investigation_title
  ?s_title_iri iao:0000219 ?study .          # iao:denotes
  ?s_title_iri isa:00000089 ?s_title .
  ?source rdf:type bfo:0000040 .             # bfo:material_entity
  ?source obi:0000295 ?study .               # obi:is_specified_input_of
  OPTIONAL {
    ?study bfo:0000050 ?investigation .      # bfo:part_of
    ?investigation rdf:type obi:0000011 .    # obi:planned_process
    ?investigation rdfs:label ?i_id .
  }
  OPTIONAL {
    ?source rdf:type ?organism_iri .
    ?organism_iri rdf:type obi:0100026 .     # obi:organism
    ?organism_iri rdfs:label ?organism .
  }
  OPTIONAL {
    ?source bfo:0000053 ?characteristic .
    ?characteristic rdf:type bfo:0000005 .   # bfo:dependent continuant
    ?characteristic rdfs:comment ?comment .
    ?characteristic rdfs:label ?organism .
    FILTER regex(str(?comment), "organism")
  }
}

Considering theoretical results on SPARQL to improve query performance, such as AND-OPT well-designed graph patterns
Pérez et al. Semantics and complexity of SPARQL. ACM Trans. Database Syst. 2009.
Letelier et al. Static analysis and optimization of semantic web queries. PODS 2012.
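
The OPTIONAL blocks in the query behave like a left outer join over solution mappings: a required solution is kept even when no optional match exists. A minimal sketch of that semantics, ignoring FILTER handling, with hypothetical function names and made-up bindings:

```python
def compatible(a, b):
    """Two solution mappings are compatible if shared variables agree."""
    return all(b[var] == value for var, value in a.items() if var in b)


def left_join(required, optional):
    """Extend each required solution with every compatible optional one;
    keep it unextended when none is compatible."""
    results = []
    for solution in required:
        extended = [
            {**solution, **extra}
            for extra in optional
            if compatible(solution, extra)
        ]
        results.extend(extended or [solution])
    return results


# Made-up bindings: only study s1 is part of an investigation.
required = [
    {"study": "s1", "s_id": "Study 1"},
    {"study": "s2", "s_id": "Study 2"},
]
optional = [{"study": "s1", "i_id": "Investigation A"}]

for solution in left_join(required, optional):
    print(solution)
```

Here the second study still appears in the results, just without an `i_id` binding, which is the behaviour the query relies on for studies that belong to no investigation.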
[Prototype screenshots; visible labels: investigation, studies, assays, measurement, technology.]
http://bii.oerc.ox.ac.uk
Summary and future work

• Bio-GraphIIn - the new integrative and semantically enabled repository for the ISA infrastructure: motivation, requirements, design & architecture, prototype
• Support for data integration, uniform semantic queries across experiments enabled by a common semantic framework (ISA2OWL)
• More work required on
  • Querying: performance analysis, support for ad hoc queries
  • Extension/improvement of prototype
  • Interfaces to services (e.g. BioPortal) and analysis/visualisation platforms (e.g. R/Bioconductor & Refinery)
funders
Thanks for your attention!
Questions?
You can email us...
isatools@googlegroups.com
View our website
http://www.isa-tools.org
View our Git repo & contribute
http://github.com/ISA-tools
View our blog
http://isatools.wordpress.com
Follow us on Twitter
@isatools