Phyloinformatics and the Semantic Web Rutger Vos
Outline
What is phyloinformatics and why should you care?
How we got here and where we are now
How the semantic web can help
Projects that apply the semantic web to phyloinformatics
Examples of linked data
Where to next
What is Phyloinformatics? Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.” Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
Why should you care? Firstly,  “Nothing in evolution makes sense except in the light of phylogeny” Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile? But if that doesn’t convince you…
As a consumer of phylogenetic data The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009) Presumably, this will involve retrieving and classifying.
As a consumer of phylogenetic data Or maybe for you phylogeny is simply a nuisance: Functional prediction Comparative analysis Ortholog finding Etc. But it would still be nice to have that out of the way painlessly…
As a producer of phylogenetic data Many journals require proper storage of data described in a manuscript. Funding agencies require dissemination and sharing of research results.
The Past Everything was closed: Idiosyncratic, private data  “pay-walls” Closed source software No accessible publishing medium
The Present Science is opening up: Open data Open access publishing Open source software Publishing is now accessible to everyone, online
Our current nightmare Documents,  documents everywhere
The current web makes sense to us
But not to a machine
What was informatics again? “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
This is too hard O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
Let’s delegate that
Instead of linked documents
A web of linked concepts
Concepts connected by statements
Concepts are defined in ontologies “An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”
Expressing concepts in data syntax
Concepts are linked Linked by statements called “triples”. A triple is a statement of the form: subject, predicate, object. Any part of a triple may have to be uniquely identifiable. For this we use URLs.
An applied example
Triple 1
	Subject: <http://example.org/data/tree1>
	Predicate: <http://example.org/terms/hasLikelihood>
	Object: 2342.323
	i.e. -lnL(tree1) = 2342.323
Triple 2
	Subject: <http://example.org/data/tree2>
	Predicate: <http://example.org/terms/hasLikelihood>
	Object: 2341.184
	i.e. -lnL(tree2) = 2341.184
What’s the better tree? The ontology defines what a likelihood is and how to compare negative log likelihoods. Hence, automated reasoning can conclude that tree2 is the better tree.
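As a minimal sketch of the reasoning step above, the two triples can be held as plain Python tuples and compared. The rule "a smaller negative log likelihood is better" is hard-coded here as a stand-in for what the ontology would define. The `Triple` type and the `better_tree` function are hypothetical names used only for illustration; the URLs are the example.org placeholders from the slide.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: Union[str, float]


HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

triples = [
    Triple("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    Triple("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def better_tree(statements):
    """Return the subject whose negative log likelihood is smallest.

    Assumption: the objects are -lnL values, so lower means a better fit.
    """
    scored = [t for t in statements if t.predicate == HAS_LIKELIHOOD]
    return min(scored, key=lambda t: t.obj).subject


best = better_tree(triples)
print(best)  # http://example.org/data/tree2
```

A real reasoner would read the comparison rule from the ontology instead of having it written into the function.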
URLs for phylogenetics PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.
The EvoInfo “stack”
TreeBASE
External links Study Taxon variant Taxon
A simple example TreeBASE maps  to uBio using skos:closeMatch... …and uBio to ToL  using gla:mapping
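The chain of mappings above could be followed roughly as in this sketch. The resource URLs are hypothetical example.org placeholders, not real TreeBASE, uBio or ToL identifiers. The `gla:mapping` predicate is kept as a prefixed name because the slides do not give its namespace, and `follow` is an illustrative helper, not part of any of these services.

```python
# The SKOS namespace is the published W3C one.
SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"
# Left unexpanded: the slides do not state the gla namespace.
GLA_MAPPING = "gla:mapping"

# Hypothetical identifiers, for illustration only.
triples = [
    ("http://example.org/treebase/taxon/1", SKOS_CLOSE_MATCH,
     "http://example.org/ubio/namebank/1"),
    ("http://example.org/ubio/namebank/1", GLA_MAPPING,
     "http://example.org/tol/node/1"),
]


def follow(subject, predicates, statements):
    """Walk from subject along each predicate in turn.

    Returns the resource reached at the end, or None if a link is missing.
    """
    current = subject
    for predicate in predicates:
        matches = [o for s, p, o in statements
                   if s == current and p == predicate]
        if not matches:
            return None
        current = matches[0]
    return current


tol_node = follow("http://example.org/treebase/taxon/1",
                  [SKOS_CLOSE_MATCH, GLA_MAPPING], triples)
print(tol_node)  # http://example.org/tol/node/1
```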
Another Example, UniProt sequences TreeBASE stores NCBI taxonomy identifiers. Standard tools can rewrite these linkout URLs. The result is a corresponding list of UniProt records.
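The URL rewriting step might look like the following sketch. Both URL patterns are hypothetical example.org placeholders, since the slides do not show the actual linkout or UniProt URL formats. The only real value is 9606, the NCBI taxonomy identifier for Homo sapiens.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical linkout carrying an NCBI taxonomy identifier.
TAXON_LINKOUT = "http://example.org/ncbi/taxonomy?id=9606"
# Hypothetical search endpoint standing in for a UniProt query URL.
UNIPROT_TEMPLATE = "http://example.org/uniprot/search"


def taxon_id_from_linkout(url):
    """Extract the integer taxon identifier from the 'id' query parameter."""
    query = parse_qs(urlparse(url).query)
    return int(query["id"][0])


def uniprot_url_for(url):
    """Rewrite a taxonomy linkout into a sequence-search URL for the same taxon."""
    taxon_id = taxon_id_from_linkout(url)
    return UNIPROT_TEMPLATE + "?" + urlencode({"taxon": taxon_id})


rewritten = uniprot_url_for(TAXON_LINKOUT)
print(rewritten)  # http://example.org/uniprot/search?taxon=9606
```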
Another Example, Geocoding TreeBASE uses DarwinCore for lat/lon annotations
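A latitude/longitude annotation of this kind could be read back as in the sketch below. `decimalLatitude` and `decimalLongitude` are existing Darwin Core terms, but that TreeBASE uses exactly these two terms is an assumption, and the subject URL and coordinate values are made up for illustration.

```python
# Darwin Core terms namespace.
DWC = "http://rs.tdwg.org/dwc/terms/"

# Hypothetical annotations on a hypothetical resource.
annotations = [
    ("http://example.org/treebase/taxonvariant/1",
     DWC + "decimalLatitude", 52.16),
    ("http://example.org/treebase/taxonvariant/1",
     DWC + "decimalLongitude", 4.49),
]


def coordinates(subject, statements):
    """Return (lat, lon) for subject, or None if either value is missing."""
    values = {p: o for s, p, o in statements if s == subject}
    lat = values.get(DWC + "decimalLatitude")
    lon = values.get(DWC + "decimalLongitude")
    if lat is None or lon is None:
        return None
    # Reject values outside the valid range for decimal degrees.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return (lat, lon)


point = coordinates("http://example.org/treebase/taxonvariant/1", annotations)
print(point)  # (52.16, 4.49)
```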
Many online data repositories
Challenges Fragile: many services offline in Japan Data gets bigger and bigger Many concepts not yet in ontologies Many data still “locked in” in publications
The Future
The cloud Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo) Data will be stored in the cloud (Big Table, FreeBase)
Interpreting locked in knowledge Text and images meant for humans are being processed by machines. Examples: Taxon name mining (BHL) Gene name and function mining Tree figure processing Automated annotation
Summary Phyloinformatics is moving from closed to open to linked data Concepts and syntax are increasingly formalized and machine readable Automated queries across integrated resources will enable synthetic research Still lots to do to deploy these technologies and unlock legacy data
Acknowledgements Thank you for your attention! Also, many thanks to: 	The Pagel lab at UoR 	The EvoInfo group 	Val Tannen 	Wayne Maddison 	William Piel 	Hilmar Lapp 	Arlin Stoltzfus

Más contenido relacionado

La actualidad más candente

Knowledge Sharing - aCCCeso
Knowledge Sharing - aCCCesoKnowledge Sharing - aCCCeso
Knowledge Sharing - aCCCesoKaitlin Thaney
 
Mozilla Science Lab 101
Mozilla Science Lab 101Mozilla Science Lab 101
Mozilla Science Lab 101Kaitlin Thaney
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsDuncan Hull
 
Big data from small data: A deep survey of the neuroscience landscape data via
Big data from small data:  A deep survey of the neuroscience landscape data viaBig data from small data:  A deep survey of the neuroscience landscape data via
Big data from small data: A deep survey of the neuroscience landscape data viaNeuroscience Information Framework
 
Data Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWCData Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWCKaitlin Thaney
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...Maryann Martone
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformaticsc.titus.brown
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesAnita de Waard
 
Making the web work for science - eResearch nz
Making the web work for science - eResearch nzMaking the web work for science - eResearch nz
Making the web work for science - eResearch nzKaitlin Thaney
 
Week 8 DRP sem 2 09
Week 8 DRP sem 2 09Week 8 DRP sem 2 09
Week 8 DRP sem 2 09guest992d811
 
E Research Chapter 1
E Research Chapter 1E Research Chapter 1
E Research Chapter 1guest2426e1d
 
Principle Violations: Revisiting the Dublin Core 1:1 Principle
Principle Violations:  Revisiting the Dublin Core 1:1 PrinciplePrinciple Violations:  Revisiting the Dublin Core 1:1 Principle
Principle Violations: Revisiting the Dublin Core 1:1 PrincipleRichard Urban
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Mathieu d'Aquin
 
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"hypertext2007
 
The Future of Open Science
The Future of Open ScienceThe Future of Open Science
The Future of Open SciencePhilip Bourne
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsCarole Goble
 
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...Mathieu d'Aquin
 
Knowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago ProjectKnowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago ProjectUgo Eccli
 

La actualidad más candente (20)

Knowledge Sharing - aCCCeso
Knowledge Sharing - aCCCesoKnowledge Sharing - aCCCeso
Knowledge Sharing - aCCCeso
 
Mozilla Science Lab 101
Mozilla Science Lab 101Mozilla Science Lab 101
Mozilla Science Lab 101
 
The Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of BioinformaticsThe Seven Deadly Sins of Bioinformatics
The Seven Deadly Sins of Bioinformatics
 
Big data from small data: A deep survey of the neuroscience landscape data via
Big data from small data:  A deep survey of the neuroscience landscape data viaBig data from small data:  A deep survey of the neuroscience landscape data via
Big data from small data: A deep survey of the neuroscience landscape data via
 
Data Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWCData Sharing: Social and Normative - ISWC
Data Sharing: Social and Normative - ISWC
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...
 
Navigating the Neuroscience Data Landscape
Navigating the Neuroscience Data LandscapeNavigating the Neuroscience Data Landscape
Navigating the Neuroscience Data Landscape
 
2015 balti-and-bioinformatics
2015 balti-and-bioinformatics2015 balti-and-bioinformatics
2015 balti-and-bioinformatics
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Making the web work for science - eResearch nz
Making the web work for science - eResearch nzMaking the web work for science - eResearch nz
Making the web work for science - eResearch nz
 
Week 8 DRP sem 2 09
Week 8 DRP sem 2 09Week 8 DRP sem 2 09
Week 8 DRP sem 2 09
 
E Research Chapter 1
E Research Chapter 1E Research Chapter 1
E Research Chapter 1
 
Principle Violations: Revisiting the Dublin Core 1:1 Principle
Principle Violations:  Revisiting the Dublin Core 1:1 PrinciplePrinciple Violations:  Revisiting the Dublin Core 1:1 Principle
Principle Violations: Revisiting the Dublin Core 1:1 Principle
 
Data Landscapes - Addiction
Data Landscapes - AddictionData Landscapes - Addiction
Data Landscapes - Addiction
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...
 
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
Hypertext2007 Carole Goble Keynote - "The Return of the Prodigal Web"
 
The Future of Open Science
The Future of Open ScienceThe Future of Open Science
The Future of Open Science
 
Building the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of ScientistsBuilding the FAIR Research Commons: A Data Driven Society of Scientists
Building the FAIR Research Commons: A Data Driven Society of Scientists
 
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
Exposing Humanities Data for Reuse and Linking - RED, linked data and the sem...
 
Knowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago ProjectKnowledge Isles in an Open Archipelago. The Open Archipelago Project
Knowledge Isles in an Open Archipelago. The Open Archipelago Project
 

Destacado

IPR Training Program
IPR Training ProgramIPR Training Program
IPR Training ProgramIleague
 
Retail Saa S 2011 1
Retail Saa S 2011 1Retail Saa S 2011 1
Retail Saa S 2011 1tgeyskens
 
Computer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจComputer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจthanapat yeekhaday
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveRutger Vos
 
Application of Computer in Government
Application of Computer in GovernmentApplication of Computer in Government
Application of Computer in Governmentthanapat yeekhaday
 

Destacado (7)

IPR Training Program
IPR Training ProgramIPR Training Program
IPR Training Program
 
Retail Saa S 2011 1
Retail Saa S 2011 1Retail Saa S 2011 1
Retail Saa S 2011 1
 
Introduction to Computer
Introduction to ComputerIntroduction to Computer
Introduction to Computer
 
Computer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจComputer กับกระบวนการทางธุรกิจ
Computer กับกระบวนการทางธุรกิจ
 
Modeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspectiveModeling the biosphere: the natural historian's perspective
Modeling the biosphere: the natural historian's perspective
 
Application of Computer in Government
Application of Computer in GovernmentApplication of Computer in Government
Application of Computer in Government
 
Biomechatronics
BiomechatronicsBiomechatronics
Biomechatronics
 

Similar a Phyloinformatics and the Semantic Web: Unlocking Evolutionary Data

The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...Maryann Martone
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Sciencedrnigam
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBryan Heidorn
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Amit Sheth
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)Duncan Hull
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Mark Wilkinson
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsmikaelhuss
 
Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence IJORCS
 
Metadata in the age of data curation and linked data
Metadata in the age of data curation and linked dataMetadata in the age of data curation and linked data
Metadata in the age of data curation and linked dataRyan Johnson
 
Accomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsAccomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsDereck Downing
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960mare34
 
Searching for patterns in crowdsourced information
Searching for patterns in crowdsourced informationSearching for patterns in crowdsourced information
Searching for patterns in crowdsourced informationSilvia Puglisi
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainAngelo Salatino
 
Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebValentina Presutti
 
Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks Access Innovations, Inc.
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 

Similar a Phyloinformatics and the Semantic Web: Unlocking Evolutionary Data (20)

Presentationonline
PresentationonlinePresentationonline
Presentationonline
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
How Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open ScienceHow Bio Ontologies Enable Open Science
How Bio Ontologies Enable Open Science
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary Challenge
 
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
Semantics for Bioinformatics: What, Why and How of Search, Integration and An...
 
A01-Openness in knowledge-based systems
A01-Openness in knowledge-based systemsA01-Openness in knowledge-based systems
A01-Openness in knowledge-based systems
 
The Future of Research (Science and Technology)
The Future of Research (Science and Technology)The Future of Research (Science and Technology)
The Future of Research (Science and Technology)
 
Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014Presentation to the J. Craig Venter Institute, Dec. 2014
Presentation to the J. Craig Venter Institute, Dec. 2014
 
The Uniform Resource Layer
The Uniform Resource LayerThe Uniform Resource Layer
The Uniform Resource Layer
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence Ontology Based Information Extraction for Disease Intelligence
Ontology Based Information Extraction for Disease Intelligence
 
Metadata in the age of data curation and linked data
Metadata in the age of data curation and linked dataMetadata in the age of data curation and linked data
Metadata in the age of data curation and linked data
 
Accomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsAccomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In Bioinformatics
 
The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960The seven-deadly-sins-of-bioinformatics3960
The seven-deadly-sins-of-bioinformatics3960
 
Searching for patterns in crowdsourced information
Searching for patterns in crowdsourced informationSearching for patterns in crowdsourced information
Searching for patterns in crowdsourced information
 
Applying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domainApplying machine learning techniques to big data in the scholarly domain
Applying machine learning techniques to big data in the scholarly domain
 
Looking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic WebLooking for Commonsense in the Semantic Web
Looking for Commonsense in the Semantic Web
 
020610
020610020610
020610
 
Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks Using Taxonomies to Create People Directories and Author Networks
Using Taxonomies to Create People Directories and Author Networks
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 

Más de Rutger Vos

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Rutger Vos
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over EvolutieRutger Vos
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course BiodiversiteitRutger Vos
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data scienceRutger Vos
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionRutger Vos
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Rutger Vos
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterflyRutger Vos
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningRutger Vos
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Rutger Vos
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataRutger Vos
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Rutger Vos
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenRutger Vos
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationRutger Vos
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline introRutger Vos
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsRutger Vos
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Rutger Vos
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentRutger Vos
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRutger Vos
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLRutger Vos
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB NaturalisRutger Vos
 

Más de Rutger Vos (20)

Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?Anna Karenina on hooves - what makes an animal fit for domestication?
Anna Karenina on hooves - what makes an animal fit for domestication?
 
10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie10 Misverstanden Over Evolutie
10 Misverstanden Over Evolutie
 
Crash Course Biodiversiteit
Crash Course BiodiversiteitCrash Course Biodiversiteit
Crash Course Biodiversiteit
 
Natural history research as a replicable data science
Natural history research as a replicable data scienceNatural history research as a replicable data science
Natural history research as a replicable data science
 
Species delimitation - species limits and character evolution
Species delimitation - species limits and character evolutionSpecies delimitation - species limits and character evolution
Species delimitation - species limits and character evolution
 
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
Onderzoek bio-informatica Naturalis. Raad voor Cultuur 2017.
 
Robot eye for the butterfly
Robot eye for the butterflyRobot eye for the butterfly
Robot eye for the butterfly
 
Taxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learningTaxonomic classification of digitized specimens using machine learning
Taxonomic classification of digitized specimens using machine learning
 
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
Self-Updating Platform for the Estimation of Rates of Speciation, Migration A...
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
 
Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?Hoe leer je een robot soorten te herkennen?
Hoe leer je een robot soorten te herkennen?
 
Kunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proevenKunnen we een tomaat van 400 jaar oud proeven
Kunnen we een tomaat van 400 jaar oud proeven
 
PhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integrationPhyloTastic: names-based phyloinformatic data integration
PhyloTastic: names-based phyloinformatic data integration
 
SUPERSMART pipeline intro
SUPERSMART pipeline introSUPERSMART pipeline intro
SUPERSMART pipeline intro
 
Reconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomicsReconstructing paleoenvironments using metagenomics
Reconstructing paleoenvironments using metagenomics
 
Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...Synthesising disparate data resources to obtain composite estimates of geophy...
Synthesising disparate data resources to obtain composite estimates of geophy...
 
The Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environmentThe Galaxy bioinformatics workflow environment
The Galaxy bioinformatics workflow environment
 
Retrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collectionsRetrieving useful information from connected specimen- and data collections
Retrieving useful information from connected specimen- and data collections
 
NeXML - phylogenetic data as XML
NeXML - phylogenetic data as XMLNeXML - phylogenetic data as XML
NeXML - phylogenetic data as XML
 
Vos at NCB Naturalis
Vos at NCB NaturalisVos at NCB Naturalis
Vos at NCB Naturalis
 

Phyloinformatics and the Semantic Web: Unlocking Evolutionary Data

  • 1. Phyloinformatics and the Semantic Web Rutger Vos
  • 2. Outline What is phyloinformatics and why should you care? How we got here and where we are now How the semantic web can help Projects that apply the semantic web to phyloinformatics Examples of linked data Where to next
  • 3. What is Phyloinformatics? Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.” Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
  • 4. Why should you care? Firstly, “Nothing in evolution makes sense except in the light of phylogeny” Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile? But if that doesn’t convince you…
  • 5. As a consumer of phylogenetic data The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009) Presumably, this will involve retrieving and classifying.
  • 6. As a consumer of phylogenetic data Or maybe for you phylogeny is simply a nuisance: Functional prediction Comparative analysis Ortholog finding Etc. But it would still be nice to have that out of the way painlessly…
  • 7. As a producer of phylogenetic data Many journals require proper storage of data described in a manuscript. Funding agencies require dissemination and sharing of research results.
  • 8. The Past Everything was closed: Idiosyncratic, private data “pay-walls” Closed source software No accessible publishing medium
  • 9. The Present Science is opening up: Open data Open access publishing Open source software Publishing is now accessible to everyone, online
  • 10. Our current nightmare Documents, documents everywhere
  • 11. The current web makes sense to us
  • 12. But not to a machine
  • 13. What was informatics again? “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
  • 15. This is too hard O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
  • 17. Instead of linked documents
  • 18. A web of linked concepts
  • 19. Concepts connected by statements
  • 20. Concepts are defined in ontologies “An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”
  • 21. Expressing concepts in data syntax
  • 22. Concepts are linked Linked by statements called “triples” A triple is a statement: subject, predicate, object. Any part of a triple may have to be uniquely identifiable. For this we use URLs.
  • 23. An applied example Triple 1 Subject: <http://example.org/data/tree1> Predicate: <http://example.org/terms/hasLikelihood> Object: 2342.323 i.e. -lnL(tree1) = 2342.323 Triple 2 Subject: <http://example.org/data/tree2> Predicate: <http://example.org/terms/hasLikelihood> Object: 2341.184 i.e. -lnL(tree2) = 2341.184
  • 24. What’s the better tree? The ontology defines what a likelihood is and how to compare negative log likelihoods. Hence, automated reasoning can conclude that tree2 is the better tree.
  • 25. URLs for phylogenetics PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.
  • 28. External links Study Taxon variant Taxon
  • 29. A simple example TreeBASE maps to uBio using skos:closeMatch... …and uBio to ToL using gla:mapping
  • 30. Another Example, UniProt sequences Standard tools can rewrite these linkout URLs Result is a corresponding list of UniProt records TreeBASE stores NCBI taxonomy identifiers
  • 31. Another Example, Geocoding TreeBASE uses DarwinCore for lat/lon annotations
  • 32. Many online data repositories
  • 33. Challenges Fragile: many services offline in Japan Data gets bigger and bigger Many concepts not yet in ontologies Many data still “locked in” in publications
  • 35. The cloud Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo) Data will be stored in the cloud (Big Table, FreeBase)
  • 36. Interpreting locked in knowledge Text and images meant for humans are being processed by machines. Examples: Taxon name mining (BHL) Gene name and function mining Tree figure processing Automated annotation
  • 37. Summary Phyloinformatics is moving from closed to open to linked data Concepts and syntax are increasingly formalized and machine readable Automated queries across integrated resources will enable synthetic research Still lots to do to deploy these technologies and unlock legacy data
  • 38. Acknowledgements Thank you for your attention! Also, many thanks to: The Pagel lab at UoR The EvoInfo group Val Tannen Wayne Maddison William Piel Hilmar Lapp Arlin Stoltzfus
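
The triple example on slides 22 to 24 can be sketched in a few lines of Python. This is only an illustration, assuming triples are held as plain (subject, predicate, object) tuples and that the "reasoning" is reduced to one hard-coded rule: the lower negative log likelihood wins. The function name `better_tree` is hypothetical; the URLs and values are the ones from slide 23. A real system would take the comparison rule from the ontology rather than from code.

```python
# Hypothetical sketch: triples as (subject, predicate, object) tuples.
# The URLs and likelihood values are the example.org ones from slide 23.
HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

triples = [
    ("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    ("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def better_tree(statements, predicate=HAS_LIKELIHOOD):
    """Return the subject whose negative log likelihood is smallest.

    Assumption: the object of each matching triple is -lnL, so a lower
    value means a higher likelihood and therefore a better tree.
    """
    scored = [(obj, subj) for subj, pred, obj in statements if pred == predicate]
    if not scored:
        raise ValueError("no likelihood statements found")
    return min(scored)[1]


print(better_tree(triples))
```

Here the rule is fixed in the function; the point of slide 24 is that an ontology makes the same rule available to any generic reasoner.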

Editor's notes

  1. Thanks for the invitation; thanks for showing up given the other lecture; introduce self; talk title
  2. Mention the figure on the right
  3. Mention Dobzhansky
  4. Here’s an example that uses the Yahoo! Pipes tool to turn the list of NCBI taxon identifiers that TreeBASE stores for a given study into a list of all UniProt sequence records for those taxa.
  5. This example shows that with a minimal amount of JavaScript coding a Google map can be added to a web page (first code block), and the taxa for a given study can be mapped onto it using the DarwinCore coordinate annotations that TreeBASE stores.
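
The linkout rewriting described in note 4 (slide 30) might look roughly like the following Python sketch. It is not the Yahoo! Pipes workflow itself: the slides do not give the actual URL patterns, so both URL shapes below are hypothetical placeholders on example.org, and `rewrite_linkouts` is an illustrative name. The idea shown is only that a stored taxon identifier can be extracted from one URL and substituted into another.

```python
import re

# Assumed, hypothetical URL shapes; the real linkout patterns are not
# given in the slides and would need to be substituted here.
SOURCE_PATTERN = re.compile(r"^http://example\.org/ncbi/taxonomy/(\d+)$")
TARGET_TEMPLATE = "http://example.org/uniprot/taxonomy/{taxon_id}"


def rewrite_linkouts(urls):
    """Rewrite taxonomy linkout URLs into sequence-record URLs.

    URLs that do not match the assumed source pattern are skipped.
    """
    rewritten = []
    for url in urls:
        match = SOURCE_PATTERN.match(url)
        if match:
            rewritten.append(TARGET_TEMPLATE.format(taxon_id=match.group(1)))
    return rewritten


linkouts = [
    "http://example.org/ncbi/taxonomy/9606",
    "http://example.org/unrelated/page",
]
print(rewrite_linkouts(linkouts))
```

Because the identifier is the only part carried over, the same approach would apply to any pair of resources that key their records on NCBI taxonomy identifiers.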