Use of Open Linked Data in Bioinformatics Space: A Case Study on Pathway-Gene-Disease Relationships
Remzi Çelebi
Department of Computer Engineering,
Ege University
İzmir, Turkey
remzi.celebi@ege.edu.tr
Özgür Gümüş
Department of Computer Engineering,
Ege University
İzmir, Turkey
ozgur.gumus@ege.edu.tr
Yeşim Aydın Son
Department of Health Informatics,
Middle East Technical University
Ankara, Turkey
yesim@metu.edu.tr
Outline
● Semantic Web – very brief intro
  – RDF, SPARQL, Linked Data
● Use Case Scenarios
● Conclusion
● Future work

Semantic Web
● The Semantic Web, often described as the next generation of the web, extends the current web and provides a framework for integrating data from heterogeneous resources.
● The Semantic Web enables machines to perform more of the tedious work involved in finding, combining and extracting information on the web.

Semantic Web – Open Linked Data
● Open linked data is an approach that uses Semantic Web technology to publish, integrate and analyze open data on the web.
● It proposes that data on the web should be linked and open for use by practical applications.
● It provides two advantages: the ability to search multiple datasets through a single framework, and the ability to search relationships, and paths of relationships, that span different datasets.

Semantic Web Technologies - RDF
● The Resource Description Framework (RDF) is the fundamental model for describing resources and the relationships between them on the Semantic Web.
● An RDF triple is a statement about a resource in the form of a subject-predicate-object expression.
● RDF can be serialized in a variety of formats, including XML-based and JSON-based syntaxes.

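As a minimal sketch of the subject-predicate-object form, the Python snippet below holds one triple as a named tuple and writes it out in N-Triples, one of the simplest RDF serializations. The triple itself is an illustrative assumption: the predicate and pathway URI are borrowed from Query-1 later in this deck, and the gene URI is a hypothetical placeholder.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; all three terms are URIs in this sketch."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line: <s> <p> <o> ."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# Hypothetical gene URI; predicate and pathway URI are taken from Query-1 in this deck.
example = Triple(
    subject="http://bio2rdf.org/geneid:EXAMPLE",
    predicate="http://bio2rdf.org/ctd_vocabulary:pathway",
    obj="http://bio2rdf.org/kegg:04520",
)
print(to_ntriples(example))
```

Literal objects (strings, numbers) would need quoting and escaping, which this sketch deliberately leaves out.
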
Uniform Resource Identifier - URI
● "The generic set of all names/addresses that are short strings that refer to resources"
  – URLs (Uniform Resource Locators) are a particular type of URI, used for resources that can be accessed on the WWW (e.g., web pages).
● In RDF, URIs typically look like “normal” URLs, often with fragment identifiers to point at specific parts of a document.
  Example: with the prefix declared below, the shorthand notation gene:BRCA1 expands to http://www.bio2rdf.org/gene:BRCA1
● The PREFIX keyword declares the short form of a resource URI:
  PREFIX gene: <http://www.bio2rdf.org/gene:>

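The prefix mechanism amounts to string substitution. The following Python sketch, offered as an illustration rather than as part of any SPARQL library, expands a prefixed name using a lookup table; the function name `expand` and the table `PREFIXES` are assumptions introduced here.

```python
def expand(prefixed_name: str, prefixes: dict) -> str:
    """Expand a prefixed name such as 'gene:BRCA1' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return prefixes[prefix] + local


# Lookup table mirroring: PREFIX gene: <http://www.bio2rdf.org/gene:>
PREFIXES = {"gene": "http://www.bio2rdf.org/gene:"}

print(expand("gene:BRCA1", PREFIXES))
```
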
SPARQL
● SPARQL is a query language for retrieving and manipulating data stored in RDF format.
● A SPARQL endpoint is a service that provides a SPARQL-queryable interface to a set of RDF statements stored in a triple store.
● SPARQL searches for all subgraphs that match the graph pattern described by the triples in the query:
  SELECT * WHERE { ?subject ?predicate ?object . }

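To make the idea of subgraph matching concrete, here is a small Python sketch of how a list of triple patterns can be matched against an in-memory list of triples. It is a naive illustration under stated assumptions, not how a real triple store evaluates SPARQL; the `ex:` identifiers are hypothetical toy data, and terms starting with `?` are treated as variables.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def query(patterns: List[Triple], graph: List[Triple]) -> Iterator[Binding]:
    """Yield every variable binding under which all patterns occur in the graph."""
    def solve(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(patterns):
            yield binding
            return
        for triple in graph:
            extended = match_triple(patterns[i], triple, binding)
            if extended is not None:
                yield from solve(i + 1, extended)
    yield from solve(0, {})


# Toy graph with hypothetical identifiers (not real Bio2RDF data).
GRAPH = [
    ("ex:gene1", "ex:pathway", "ex:pathwayA"),
    ("ex:gene2", "ex:pathway", "ex:pathwayA"),
    ("ex:gene1", "ex:phenotype", "ex:disease1"),
]

# Which genes in pathwayA also have a phenotype?
results = list(query(
    [("?gene", "ex:pathway", "ex:pathwayA"), ("?gene", "ex:phenotype", "?pheno")],
    GRAPH,
))
print(results)
```

The single pattern `("?subject", "?predicate", "?object")` corresponds to the SELECT * query on the slide and returns one binding per triple in the graph.
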
Semantic Web for Health Care and Bioinformatics
● A large body of data about genes, proteins, gene networks, protein-protein interactions, genetic variations, chemical compounds, diseases and drugs is available in diverse formats.
● Much of the complexity of the life sciences comes from integrating and analyzing the enormous amount of data that research produces across this variety of domains.

Bio2RDF Project
● Goal: create a knowledge space of RDF documents linked together with normalized URIs and sharing a common ontology.
● Documents from public bioinformatics databases such as KEGG, PDB, MGI, HGNC and several of NCBI’s databases are available in RDF format through a unique URL of the form http://bio2rdf.org/namespace:id.
● Bio2RDF has created an RDF warehouse that serves over 70 million triples describing the human and mouse genomes.

Bio2RDF
● Bio2RDF differs in several ways from previous efforts that have provided linked data for the life sciences, such as Neurocommons, LinkedLifeData, W3C HCLS, Chem2Bio2RDF and BioLOD:
  – First, Bio2RDF provides a unique linked data vocabulary and topology.
  – Second, Bio2RDF produces syntactically interoperable linked data across all datasets by defining a set of basic guidelines.
  – Third, the community can benefit from the Bio2RDF infrastructure: an expandable global network of mirrors that host Bio2RDF datasets and a federated network of SPARQL endpoints.
  – Finally, Bio2RDF is open source and freely available to use, modify or redistribute.

Use Case Scenario
● As a case study to demonstrate the capabilities and benefits of the Bio2RDF project, we defined the following question:
  For a given pathway, what are the diseases associated with the individual genes in the pathway?
● Answering this question requires a set of data sources: CTD, OMIM and NCBI Gene. These datasets can be queried on the web as part of the Bio2RDF project.

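Querying such a dataset over the web typically means sending the query text to a SPARQL endpoint over HTTP. The sketch below builds, but does not send, a GET request in the style of the SPARQL protocol using only the Python standard library. The endpoint URL is the CTD one that appears in the federated query in this deck and may no longer be online; the helper name `build_request` is an assumption introduced here.

```python
import urllib.parse
import urllib.request

# Endpoint URL as it appears in the federated query in this deck; it may no longer be online.
CTD_ENDPOINT = "http://cu.ctd.bio2rdf.org/sparql"

QUERY = """
PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
SELECT ?geneID
WHERE { ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> . }
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = build_request(CTD_ENDPOINT, QUERY)
print(request.full_url)
# To execute the query, pass `request` to urllib.request.urlopen() and
# parse the response body as JSON.
```
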
Use Case Scenario
a) Query-1: CTD for gene-pathway information

PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
SELECT ?geneID
WHERE {
  ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> .
}

b) Query-2: OMIM for gene-disease association

PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?gene ?pheno
WHERE {
  ?gene omim_vocabulary:phenotype ?pheno .
  ?pheno rdf:type omim_vocabulary:Phenotype .
  ?gene rdf:type omim_vocabulary:Gene .
}

c) Query-3: NCBI Gene for conversion of geneid to ENSEMBL id

PREFIX geneid_vocabulary: <http://bio2rdf.org/geneid_vocabulary:>
SELECT ?geneID ?ensemblID
WHERE {
  ?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID .
}

Merged Query
PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
PREFIX geneid_vocabulary: <http://bio2rdf.org/geneid_vocabulary:>

SELECT ?geneID ?pheno
WHERE {
  ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> .
  ?gene omim_vocabulary:xref ?geneID .
  ?gene omim_vocabulary:phenotype ?pheno .
  ?pheno rdf:type omim_vocabulary:Phenotype .
  ?gene rdf:type omim_vocabulary:Gene .
  ?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID .
}

Federated Query
PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
PREFIX geneid_vocabulary: <http://bio2rdf.org/geneid_vocabulary:>

SELECT ?ensemblID ?pheno
WHERE {
  SERVICE <http://cu.ctd.bio2rdf.org/sparql> {
    ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> .
  }
  SERVICE <http://cu.omim.bio2rdf.org/sparql> {
    ?gene omim_vocabulary:xref ?geneID .
    ?gene omim_vocabulary:phenotype ?pheno .
    ?pheno rdf:type omim_vocabulary:Phenotype .
    ?gene rdf:type omim_vocabulary:Gene .
  }
  SERVICE <http://cu.gene.bio2rdf.org/sparql> {
    ?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID .
  }
}

Figure 2: a) Merged Query and b) Federated Query for the question defined

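One way to picture what the federated query asks for: each SERVICE block yields its own table of variable bindings, and rows are combined when they agree on the variables they share (here ?geneID). The Python sketch below illustrates that join on toy tables; it is not how any particular SPARQL engine implements federation, and every identifier in the tables is hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def join(left: List[Row], right: List[Row]) -> List[Row]:
    """Natural join: merge row pairs that agree on every shared variable."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[v] == r_row[v] for v in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical per-service results (stand-ins for CTD, OMIM and NCBI Gene).
ctd_rows = [{"geneID": "geneid:G1"}, {"geneID": "geneid:G2"}]
omim_rows = [
    {"gene": "omim:M1", "geneID": "geneid:G1", "pheno": "omim:P1"},
    {"gene": "omim:M3", "geneID": "geneid:G3", "pheno": "omim:P3"},
]
gene_rows = [
    {"geneID": "geneid:G1", "ensemblID": "ensembl:E1"},
    {"geneID": "geneid:G2", "ensemblID": "ensembl:E2"},
]

rows = join(join(ctd_rows, omim_rows), gene_rows)
# Project onto the SELECT variables of the federated query.
answers = [(row["ensemblID"], row["pheno"]) for row in rows]
print(answers)
```

Only the gene that appears in all three toy tables survives the join, which mirrors why a gene without an OMIM phenotype or an ENSEMBL identifier would drop out of the query result.
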
Results
● The results from BioMart (after providing KEGG gene IDs) and Bio2RDF (all in one step) were compared for gene ID-OMIM ID matches:
  – Bio2RDF matched 27 unique ENSEMBL gene IDs from the KEGG 04520 pathway with 59 OMIM IDs, whereas the BioMart results included only 50 of these OMIM IDs for the same query, with no additional matches.
  – The difference between the result sets is likely due to the versions of OMIM searched by the two services. The validity of all results was confirmed against the current build of the OMIM database.

More Use Cases
Finding important genes (hub genes) through the pathways related to a disease

PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?symbol (COUNT(DISTINCT ?pathway) AS ?indirect_num)
WHERE {
  <http://bio2rdf.org/mesh:D006349> ctd_vocabulary:pathway ?pathway .
  ?geneid ctd_vocabulary:pathway ?pathway .
  ?geneid rdf:type ctd_vocabulary:Gene .
  ?geneid ctd_vocabulary:gene-symbol ?symbol .
}
GROUP BY ?symbol
ORDER BY DESC(?indirect_num)

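The aggregation in this query can be read as: for each gene symbol, count the distinct pathways it shares with the disease, then rank genes by that count. The following Python sketch reproduces that logic on hypothetical toy data; the function name `rank_hub_genes` and the alphabetical tie-break are assumptions added for a deterministic result, not something the SPARQL query specifies.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_hub_genes(
    disease_pathways: Set[str],
    gene_pathways: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int]]:
    """Count, per gene symbol, the distinct disease pathways it takes part in,
    then sort by that count in descending order (ties broken by symbol)."""
    shared: Dict[str, Set[str]] = defaultdict(set)
    for symbol, pathway in gene_pathways:
        if pathway in disease_pathways:
            shared[symbol].add(pathway)
    return sorted(
        ((symbol, len(paths)) for symbol, paths in shared.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical toy data: pathways linked to one disease, and (gene symbol, pathway) pairs.
disease_pathways = {"pathwayA", "pathwayB", "pathwayC"}
gene_pathways = [
    ("GENE1", "pathwayA"), ("GENE1", "pathwayB"), ("GENE1", "pathwayB"),
    ("GENE2", "pathwayA"), ("GENE3", "pathwayZ"),
]
ranking = rank_hub_genes(disease_pathways, gene_pathways)
print(ranking)
```
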
More Use Cases
Finding diseases related to a given SNP (by rsid) through gene association

PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
PREFIX pharmgkb_vocabulary: <http://bio2rdf.org/pharmgkb_vocabulary:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?disease_label
WHERE {
  ?assoc rdf:type pharmgkb_vocabulary:Disease-Gene-Association .
  ?assoc pharmgkb_vocabulary:disease ?disease .
  ?disease rdfs:label ?disease_label .
  ?assoc pharmgkb_vocabulary:gene ?gene .
  ?rsid pharmgkb_vocabulary:gene ?gene .
  FILTER regex(str(?rsid), "rs1801253") .
}

Conclusion
● Through the pathway-gene-disease use case built here, we have shown that with Bio2RDF datasets, queries can be flexibly built, merged and run in a federated fashion to retrieve the data correctly in a single run, which is not possible with any other single database or service.
● In this paper, we defined a use case that queries multiple distributed data sources made semantically available through Bio2RDF. The results were also compared with, and validated by, traditional search techniques.

Future work
● This work will continue in two directions:
  – The first is to develop a web interface that helps researchers query multiple data sources through visual query templates, without requiring RDF or SPARQL knowledge.
  – The second is to develop a monitoring system that keeps researchers aware of updates, across multiple data sources, to data related to their research.

Use of Open Linked Data in Bioinformatics Space: A Case Study on Pathway-Gene-Disease Relationships

  • 1. Use of Open Linked Data in Bioinformatics Space: A Case Study Remzi Çelebi Department of Computer Engineering, Ege University İzmir, Turkey remzi.celebi@ege.edu.tr Özgür Gümüş Department of Computer Engineering, Ege University İzmir, Turkey ozgur.gumus@ege.edu.tr Yeşim Aydın Son Department of Health Informatics, Middle East Technical University Ankara, Turkey yesim@metu.edu.tr
  • 2. Outline: Semantic Web, a very brief intro (RDF, SPARQL, Linked Data); Use Case Scenarios; Conclusion; Future work.
  • 3. Semantic Web: The Semantic Web, the next generation of the web, is considered an extension of the current web and provides a framework for integrating data from heterogeneous resources. The Semantic Web enables machines to perform more of the tedious work involved in finding, combining and extracting information on the web.
  • 4. Semantic Web: Open Linked Data. Open linked data is a new approach that uses Semantic Web technology to publish, integrate and analyze open data on the web. Open linked data suggests that data on the web should be linked and open for use by practical applications. It provides two kinds of advantages: the ability to search multiple datasets through a single framework, and the ability to search relationships, and paths of relationships, that go across different datasets.
  • 5. Semantic Web Technologies: RDF. The Resource Description Framework (RDF) is the most fundamental way of describing resources and the relationships between them in the Semantic Web. An RDF triple is a statement about a resource in the form of a subject-predicate-object expression. RDF can be represented in a variety of formats, including XML and JSON.
  • 6. Uniform Resource Identifier (URI): "The generic set of all names/addresses that are short strings that refer to resources". URLs (Uniform Resource Locators) are a particular type of URI, used for resources that can be accessed on the WWW (e.g., web pages). In RDF, URIs typically look like "normal" URLs, often with fragment identifiers to point at specific parts of a document. Example of shorthand notation: gene:BRCA1. The PREFIX keyword declares the short form of a resource namespace: PREFIX gene: <http://www.bio2rdf.org/gene:>
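The shorthand notation on this slide can be illustrated with a small Python sketch that expands a prefixed name into a full URI. This is not part of the original slides; the `PREFIXES` table and the `expand` helper are hypothetical names introduced here, and only the `gene` prefix is taken from the slide.

```python
# Hypothetical prefix table; only the "gene" entry comes from the slide.
PREFIXES = {"gene": "http://www.bio2rdf.org/gene:"}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'gene:BRCA1' into a full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    # The namespace URI and the local part are simply concatenated.
    return prefixes[prefix] + local


print(expand("gene:BRCA1"))
```

A SPARQL engine performs the same substitution when it reads a PREFIX declaration, so the short and long forms name the same resource.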
  • 7. SPARQL: SPARQL is a query language for retrieving and manipulating data in RDF format. A SPARQL endpoint is a service that provides a SPARQL-queryable interface to a set of RDF statements stored in a triple store. SPARQL searches for all subgraphs that match the graph described by the triples in the query. Example: SELECT * WHERE { ?subject ?predicate ?object . }
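To make the idea of subgraph matching concrete, the following Python sketch matches a list of triple patterns against an in-memory list of triples. It is a toy illustration under stated assumptions, not how a real triple store works: the `match_pattern` and `query` functions, the `voc:` prefix and the `gene:1` to `gene:3` identifiers are all made up for this example.

```python
def match_pattern(pattern, triple, binding):
    """Return an extended copy of `binding` if `pattern` matches `triple`, else None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant must match exactly.
            return None
    return new


def query(patterns, triples):
    """Find every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Made-up triples; the identifiers are placeholders, not real Bio2RDF data.
TRIPLES = [
    ("gene:1", "rdf:type", "voc:Gene"),
    ("gene:1", "voc:pathway", "kegg:04520"),
    ("gene:2", "voc:pathway", "kegg:04520"),
    ("gene:3", "voc:pathway", "kegg:00010"),
]

print(query([("?g", "voc:pathway", "kegg:04520")], TRIPLES))
print(query([("?g", "voc:pathway", "kegg:04520"),
             ("?g", "rdf:type", "voc:Gene")], TRIPLES))
```

Each additional pattern narrows the set of bindings, which is the behavior the merged queries later in the slides rely on when they share a variable such as ?geneID.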
  • 8. Semantic Web for Health Care and Bioinformatics: There is a big data cloud containing information about genes, proteins, gene networks, protein-protein interactions, genetic variations, chemical compounds, diseases and drugs in diverse formats. The complexity of the life sciences comes from integrating and analyzing the enormous amount of data obtained by research in this variety of domains.
  • 9. Bio2RDF Project: The project creates a knowledge space of RDF documents linked together with normalized URIs and sharing a common ontology. Documents from public bioinformatics databases such as KEGG, PDB, MGI, HGNC and several of NCBI's databases are available in RDF format through a unique URL of the form http://bio2rdf.org/namespace:id. Bio2RDF has created an RDF warehouse that serves over 70 million triples describing the human and mouse genomes.
  • 10. Bio2RDF: Bio2RDF differs in several ways from previous efforts to provision the life sciences with linked data, such as Neurocommons, LinkedLifeData, W3C HCLS, Chem2Bio2RDF and BioLOD. First, Bio2RDF provides a unique linked data vocabulary and topology. Second, Bio2RDF produces syntactically interoperable linked data across all datasets by defining a set of basic guidelines. Third, the community can benefit from the Bio2RDF infrastructure, an expandable global network of mirrors that host Bio2RDF datasets and a federated network of SPARQL endpoints. Finally, Bio2RDF is open source and freely available to use, modify or redistribute.
  • 11. Use Case Scenario: As a case study to reveal the capabilities and benefits of the Bio2RDF project, we defined the following question: for a given pathway, what are the diseases associated with the individual genes in the pathway? Answering this question requires a set of data sources: CTD, OMIM and NCBI Gene. These datasets can be queried on the web as part of the Bio2RDF project.
  • 12. Use Case Scenario
      a) Query-1: CTD for gene-pathway information
      PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
      SELECT ?geneID WHERE {
        ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> .
      }
      b) Query-2: OMIM for gene-disease association
      PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT ?gene ?pheno WHERE {
        ?gene omim_vocabulary:phenotype ?pheno .
        ?pheno rdf:type omim_vocabulary:Phenotype .
        ?gene rdf:type omim_vocabulary:Gene .
      }
      c) Query-3: NCBI Gene for conversion of geneid to ENSEMBL id
      PREFIX geneid_vocabulary: <http://bio2rdf.org/geneid_vocabulary:>
      SELECT ?geneID ?ensemblID WHERE {
        ?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID .
      }
  • 13. Figure 2: a) Merged Query and b) Federated Query for the question defined
      a) Merged Query
      PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
      PREFIX geneid_vocabulary: <http://bio2rdf.org/geneid_vocabulary:>
      SELECT ?geneID ?pheno WHERE {
        ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> .
        ?gene omim_vocabulary:xref ?geneID .
        ?gene omim_vocabulary:phenotype ?pheno .
        ?pheno rdf:type omim_vocabulary:Phenotype .
        ?gene rdf:type omim_vocabulary:Gene .
        ?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID .
      }
      b) Federated Query
      PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
      PREFIX geneid_vocabulary: <http://bio2rdf.org/geneid_vocabulary:>
      SELECT ?ensemblID ?pheno WHERE {
        SERVICE <http://cu.ctd.bio2rdf.org/sparql> {
          ?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520> .
        }
        SERVICE <http://cu.omim.bio2rdf.org/sparql> {
          ?gene omim_vocabulary:xref ?geneID .
          ?gene omim_vocabulary:phenotype ?pheno .
          ?pheno rdf:type omim_vocabulary:Phenotype .
          ?gene rdf:type omim_vocabulary:Gene .
        }
        SERVICE <http://cu.gene.bio2rdf.org/sparql> {
          ?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID .
        }
      }
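The structure of the federated query, one SERVICE block per endpoint joined on shared variables, can be sketched as a small Python helper that assembles the query text. This is only an illustration: the `federated_query` function is a hypothetical helper, not a Bio2RDF tool, and the endpoint URLs are copied from the slide and may no longer be reachable.

```python
def federated_query(select_vars, prefixes, services):
    """Assemble a SPARQL 1.1 federated query with one SERVICE block per endpoint."""
    lines = [f"PREFIX {name}: <{uri}>" for name, uri in prefixes.items()]
    lines.append("SELECT " + " ".join(select_vars))
    lines.append("WHERE {")
    for endpoint, patterns in services:
        lines.append(f"  SERVICE <{endpoint}> {{")
        lines.extend(f"    {pattern} ." for pattern in patterns)
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines)


# Prefixes, endpoints and triple patterns as they appear on the slide.
PREFIXES = {
    "omim_vocabulary": "http://bio2rdf.org/omim_vocabulary:",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ctd_vocabulary": "http://bio2rdf.org/ctd_vocabulary:",
    "geneid_vocabulary": "http://bio2rdf.org/geneid_vocabulary:",
}
SERVICES = [
    ("http://cu.ctd.bio2rdf.org/sparql",
     ["?geneID ctd_vocabulary:pathway <http://bio2rdf.org/kegg:04520>"]),
    ("http://cu.omim.bio2rdf.org/sparql",
     ["?gene omim_vocabulary:xref ?geneID",
      "?gene omim_vocabulary:phenotype ?pheno",
      "?pheno rdf:type omim_vocabulary:Phenotype",
      "?gene rdf:type omim_vocabulary:Gene"]),
    ("http://cu.gene.bio2rdf.org/sparql",
     ["?geneID geneid_vocabulary:has_ensembl_gene_identifier ?ensemblID"]),
]

QUERY_TEXT = federated_query(["?ensemblID", "?pheno"], PREFIXES, SERVICES)
print(QUERY_TEXT)
```

Keeping the patterns grouped by endpoint makes the join variables visible: ?geneID links the CTD block to the OMIM and NCBI Gene blocks, which is what allows the question to be answered in a single run.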
  • 14. Results: The results of the BioMart search (after providing KEGG gene IDs) and the Bio2RDF search (all in one step) were compared for gene ID-OMIM ID matches. Bio2RDF matched 27 unique ENSEMBL gene IDs from the KEGG 04520 pathway with 59 OMIM IDs, whereas the BioMart results included only 50 of these OMIM IDs for the same query, without any additional matches. The difference between the result sets is likely due to the version of OMIM searched by each service. The validity of all results was confirmed against the current build of the OMIM database.
  • 15. More Use Cases: Finding important genes (hub genes) through the pathways related to a disease
      PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
      PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      SELECT ?symbol (COUNT(DISTINCT ?pathway) AS ?indirect_num)
      WHERE {
        <http://bio2rdf.org/mesh:D006349> ctd_vocabulary:pathway ?pathway .
        ?geneid ctd_vocabulary:pathway ?pathway .
        ?geneid rdf:type ctd_vocabulary:Gene .
        ?geneid ctd_vocabulary:gene-symbol ?symbol .
      }
      GROUP BY ?symbol
      ORDER BY DESC(?indirect_num)
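The aggregation in this query (count the distinct pathways per gene symbol, then sort in descending order) can be mirrored in plain Python. The sketch below uses made-up placeholder rows rather than real CTD results, and `rank_hub_genes` is a hypothetical helper; the alphabetical tie-break is an assumption added here, since the SPARQL query leaves the order of ties unspecified.

```python
from collections import defaultdict


def rank_hub_genes(rows):
    """Count distinct pathways per gene symbol and sort by descending count."""
    pathways_by_symbol = defaultdict(set)
    for symbol, pathway in rows:
        # A set drops repeated (symbol, pathway) rows, like COUNT(DISTINCT ...).
        pathways_by_symbol[symbol].add(pathway)
    counts = [(symbol, len(pathways)) for symbol, pathways in pathways_by_symbol.items()]
    # Descending by count; ties broken alphabetically (an assumption of this sketch).
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Placeholder rows standing in for (?symbol, ?pathway) query solutions.
ROWS = [
    ("GENE_A", "pathway:1"),
    ("GENE_A", "pathway:2"),
    ("GENE_A", "pathway:1"),  # duplicate row, counted once
    ("GENE_B", "pathway:1"),
    ("GENE_C", "pathway:1"),
    ("GENE_C", "pathway:2"),
    ("GENE_C", "pathway:3"),
]

print(rank_hub_genes(ROWS))
```

Genes at the top of the list take part in the most pathways shared with the disease, which is the sense in which the slide calls them hub genes.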
  • 16. More Use Cases: Finding the diseases related to a given SNP (by rsID) through gene association
      PREFIX ctd_vocabulary: <http://bio2rdf.org/ctd_vocabulary:>
      PREFIX omim_vocabulary: <http://bio2rdf.org/omim_vocabulary:>
      PREFIX pharmgkb_vocabulary: <http://bio2rdf.org/pharmgkb_vocabulary:>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      SELECT DISTINCT ?disease_label WHERE {
        ?assoc rdf:type pharmgkb_vocabulary:Disease-Gene-Association .
        ?assoc pharmgkb_vocabulary:disease ?disease .
        ?disease rdfs:label ?disease_label .
        ?assoc pharmgkb_vocabulary:gene ?gene .
        ?rsid pharmgkb_vocabulary:gene ?gene .
        FILTER regex(str(?rsid), "rs1801253")
      }
  • 17. Conclusion: Through the use case (pathway-gene-disease) built here, we have shown that with Bio2RDF datasets, different queries can be flexibly built, merged and run in a federated fashion to correctly retrieve, in a single run, data that cannot be obtained from any other single database or service. In this paper, we defined a use case that queries multiple distant data sources made semantically available through Bio2RDF. The results were also compared with, and validated against, traditional search techniques.
  • 18. Future work: This work will continue in two directions. The first is to develop a web interface that helps researchers query multiple data sources through visual query templates, without requiring knowledge of RDF or SPARQL. The second is to develop a monitoring system that keeps researchers aware of updates, across multiple data sources, to the data related to their research.