Bio2RDF


 Providing named entity based search with a
common biological database naming scheme

                                        BioSearch08

                                        Peter Ansell




a university for the real world®                       CRICOS No. 00213J
Introduction
• Bio2RDF is a set of query services and RDF versions
  of biological databases. Queries are resolved by URI,
  and the URIs follow a common format, so a reference
  to a given database can always be recognised from
  the URI alone
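
To illustrate how a database can be recognised from a URI alone, here is a minimal sketch. It assumes record URIs take the form http://bio2rdf.org/namespace:identifier, which is inferred from the URL patterns shown on the later slides rather than stated here; the function name `parse_bio2rdf_uri` is hypothetical.

```python
BIO2RDF_PREFIX = "http://bio2rdf.org/"  # base URL as used throughout the slides


def parse_bio2rdf_uri(uri):
    """Split an assumed Bio2RDF record URI into (namespace, identifier)."""
    if not uri.startswith(BIO2RDF_PREFIX):
        raise ValueError(f"not a Bio2RDF URI: {uri}")
    local_part = uri[len(BIO2RDF_PREFIX):]
    # Split on the first colon only, so identifiers may contain colons.
    namespace, separator, identifier = local_part.partition(":")
    if not separator or not namespace or not identifier or "/" in namespace:
        raise ValueError(f"expected namespace:identifier, got: {local_part}")
    return namespace, identifier


print(parse_bio2rdf_uri("http://bio2rdf.org/geneid:12345"))
# ('geneid', '12345')
```

Service URLs such as the /links/ form are rejected by this sketch, because the part before the colon would contain a slash.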




Entity based link detection

• Reverse links
    – http://bio2rdf.org/links/namespace:identifier
    – Example: http://bio2rdf.org/links/geneid:12345
    – Finds all of the items that link back to the
      Entrez Geneid record for “capping protein (actin
      filament) muscle Z-line, beta”
• Namespace specific reverse links
    – http://bio2rdf.org/linksns/targetNamespace/namespace:identifier
    – Example: http://bio2rdf.org/linksns/uniprot/geneid:12345
    – Only finds items linked from the UniProt database
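
The two URL patterns above can be assembled programmatically. The following is a sketch only: the function names are hypothetical, and percent-encoding the identifier is an assumption about how the service expects unusual characters to be passed.

```python
from urllib.parse import quote

BASE_URL = "http://bio2rdf.org"  # base URL taken from the slide


def reverse_links_url(namespace, identifier):
    """URL listing every item that links back to namespace:identifier."""
    # Percent-encoding the identifier is an assumption, not a documented rule.
    return f"{BASE_URL}/links/{namespace}:{quote(identifier, safe='')}"


def namespace_reverse_links_url(target_namespace, namespace, identifier):
    """As above, but restricted to items from target_namespace."""
    encoded = quote(identifier, safe="")
    return f"{BASE_URL}/linksns/{target_namespace}/{namespace}:{encoded}"


print(reverse_links_url("geneid", "12345"))
# http://bio2rdf.org/links/geneid:12345
print(namespace_reverse_links_url("uniprot", "geneid", "12345"))
# http://bio2rdf.org/linksns/uniprot/geneid:12345
```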



Complete full text search
• Overall RDF database search
    – http://bio2rdf.org/search/searchTerm
• Provides efficient full-text search across
  multiple databases




Namespace specific search
• Namespace specific RDF database search
    – http://bio2rdf.org/searchns/namespace:searchTerm
• Live search, converted to RDF using
  Bio2RDF URIs
    – This method is preferred to RDF database search
      for a small number of very large databases, such
      as Swoogle and PubMed, which have their own
      search engines
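
As a rough sketch, both search URL forms (the overall /search/ form and the namespace specific /searchns/ form) could be built as follows. The function names are hypothetical, and percent-encoding the search term (for example, spaces as %20) is an assumption; the slides do not say how multi-word terms are passed.

```python
from urllib.parse import quote

BASE_URL = "http://bio2rdf.org"  # base URL taken from the slides


def search_url(search_term):
    """Full-text search across all of the RDF databases."""
    # Encoding the term is an assumption about the service's expectations.
    return f"{BASE_URL}/search/{quote(search_term, safe='')}"


def namespace_search_url(namespace, search_term):
    """Full-text search restricted to a single namespace."""
    return f"{BASE_URL}/searchns/{namespace}:{quote(search_term, safe='')}"


print(search_url("capping protein"))
# http://bio2rdf.org/search/capping%20protein
print(namespace_search_url("pubmed", "capping protein"))
# http://bio2rdf.org/searchns/pubmed:capping%20protein
```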



Integration with text mining
• The live search option could serve as an interchange
  point between text mining tools and the biological
  databases provided by Bio2RDF
• Results from text mining recognition tools can be
  provided in RDF form, or converted to RDF ("rdfised")
  so that they contain Bio2RDF URIs that link to the
  rest of the Bio2RDF databases
• Alternatively, some basic text mining can be
  performed using full-text search
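
One possible way to "rdfise" text mining output is sketched below: each recognised entity is emitted as an N-Triples statement linking the source document to a Bio2RDF URI. Everything here is illustrative. The `mentions` predicate and the document URI are hypothetical placeholders under example.org, and the record URI form http://bio2rdf.org/namespace:identifier is assumed from the URL patterns on the earlier slides.

```python
BIO2RDF_PREFIX = "http://bio2rdf.org/"
# Hypothetical predicate, used only for illustration.
MENTIONS_PREDICATE = "http://example.org/vocab#mentions"


def bio2rdf_uri(namespace, identifier):
    """Assumed record URI form: http://bio2rdf.org/namespace:identifier."""
    return f"{BIO2RDF_PREFIX}{namespace}:{identifier}"


def mentions_to_ntriples(document_uri, entities):
    """Render (namespace, identifier) pairs as N-Triples, one per line."""
    lines = []
    for namespace, identifier in entities:
        target = bio2rdf_uri(namespace, identifier)
        lines.append(f"<{document_uri}> <{MENTIONS_PREDICATE}> <{target}> .")
    return "\n".join(lines)


# A text mining tool is assumed to have recognised Entrez Geneid 12345
# in a document identified by a placeholder URI.
print(mentions_to_ntriples("http://example.org/doc/1", [("geneid", "12345")]))
```

Because the objects are Bio2RDF URIs, statements like these could then be followed into the rest of the Bio2RDF databases.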




Cross-database queries
• Cross-database queries with SPARQL
  currently require all of the databases involved
  to be loaded into the same SPARQL endpoint
• This is not available on the public
  endpoints, but a user can set up their own
  database relatively quickly, load the
  databases they need, and set up a new query
  type that executes on that endpoint only



Example cross database query
• An example of this might be resolving the
  PubMed articles relating to a GO term, using the
  endpoint http://localhost:8890/sparql loaded
  with PubMed, Entrez Geneid, and GO
• If abstracts were loaded into the endpoint
  they could also be used
• SPARQL = CONSTRUCT ... WHERE ...
    ?geneid geneid:xGo ?myGoTerm .
    ?geneid geneid:xPubMed ?pubmed .
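
The elided query might be filled in along the following lines. This is a sketch under stated assumptions, not the query from the talk: the `geneid:` prefix URI is a hypothetical placeholder (the slide does not give it), the CONSTRUCT template simply copies the two matched patterns, and the GO term URI form is assumed. The request URL uses the standard SPARQL Protocol `query` parameter; no request is actually sent.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:8890/sparql"  # local endpoint named on the slide
# Hypothetical placeholder: the real geneid predicate namespace is not given.
GENEID_PREFIX = "http://example.org/geneid-vocab#"


def build_go_to_pubmed_query(go_term_uri):
    """CONSTRUCT query linking a GO term to PubMed articles via Entrez genes."""
    return (
        f"PREFIX geneid: <{GENEID_PREFIX}>\n"
        "CONSTRUCT {\n"
        f"  ?geneid geneid:xGo <{go_term_uri}> .\n"
        "  ?geneid geneid:xPubMed ?pubmed .\n"
        "}\n"
        "WHERE {\n"
        f"  ?geneid geneid:xGo <{go_term_uri}> .\n"
        "  ?geneid geneid:xPubMed ?pubmed .\n"
        "}\n"
    )


def build_request_url(query):
    """GET URL for the endpoint, using the SPARQL Protocol 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


# The GO term URI below is an arbitrary illustration of the assumed URI form.
query = build_go_to_pubmed_query("http://bio2rdf.org/go:0005515")
print(query)
print(build_request_url(query))
```

Both triple patterns share the `?geneid` variable, which is why all three databases need to sit in the same endpoint for the join to succeed.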

