Phyloinformatics and the Semantic Web
Rutger Vos
Outline
  • What is phyloinformatics and why should you care?
  • How we got here and where we are now
  • How the semantic web can help
  • Projects that apply the semantic web to phyloinformatics
  • Examples of linked data
  • Where to next
What is Phyloinformatics?
  • Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.”
  • Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
Why should you care?
  • Firstly, “Nothing in evolution makes sense except in the light of phylogeny”
  • Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile?
  • But if that doesn’t convince you…
As a consumer of phylogenetic data
  • The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009)
  • Presumably, this will involve retrieving and classifying.
As a consumer of phylogenetic data
  • Or maybe for you phylogeny is simply a nuisance:
      • Functional prediction
      • Comparative analysis
      • Ortholog finding
      • Etc.
  • But it would still be nice to have that out of the way painlessly…
As a producer of phylogenetic data
  • Many journals require proper storage of data described in a manuscript.
  • Funding agencies require dissemination and sharing of research results.
The Past
  • Everything was closed:
      • Idiosyncratic, private data
      • Paywalls
      • Closed source software
      • No accessible publishing medium
The Present
  • Science is opening up:
      • Open data
      • Open access publishing
      • Open source software
      • Publishing is now accessible to everyone, online
Our current nightmare
  • Documents, documents everywhere
The current web makes sense to us
But not to a machine
What was informatics again?
  • “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
This is too hard
  • O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
Let’s delegate that
Instead of linked documents
A web of linked concepts
Concepts connected by statements
Concepts are defined in ontologies
  • “An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”
Expressing concepts in data syntax
Concepts are linked
  • Linked by statements called “triples”
  • A triple is a statement: subject, predicate, object
  • Any part of a triple may have to be uniquely identifiable. For this we use URLs.
An applied example
  • Triple 1
      Subject: <http://example.org/data/tree1>
      Predicate: <http://example.org/terms/hasLikelihood>
      Object: 2342.323
      i.e. -lnL(tree1) = 2342.323
  • Triple 2
      Subject: <http://example.org/data/tree2>
      Predicate: <http://example.org/terms/hasLikelihood>
      Object: 2341.184
      i.e. -lnL(tree2) = 2341.184
What’s the better tree?
  • The ontology defines what a likelihood is and how to compare negative log likelihoods.
  • Hence, automated reasoning can conclude that tree2 is the better tree.
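The comparison of the two triples can be sketched in a few lines of Python. This is only an illustration: the triples are held as plain tuples, and the rule that a lower negative log likelihood is better is hard-coded here, whereas on the semantic web that rule would come from the ontology. The names Triple and better_tree are made up for this sketch and do not appear in the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: float


HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

# The two statements from the applied example (objects are -lnL values).
triples = [
    Triple("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    Triple("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def better_tree(statements):
    """Return the subject whose negative log likelihood is lowest.

    Assumption: lower -lnL is better. A reasoner would take this rule
    from the ontology instead of having it written into the code.
    """
    scored = [t for t in statements if t.predicate == HAS_LIKELIHOOD]
    return min(scored, key=lambda t: t.obj).subject


best = better_tree(triples)
print(best)
```

Because both subjects are URLs, the answer is itself a link that another program can follow.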
URLs for phylogenetics
  • PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.
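The two roles of such URLs, identifying a record and searching for records, can be sketched as follows. Everything specific in this sketch is an assumption rather than something stated in the slides: the base URL, the path layout, the study identifier, and the query syntax are placeholders and should be checked against the service's own documentation.

```python
from urllib.parse import quote

# Assumed base URL for a PhyloWS-style service (not taken from the slides).
BASE = "http://purl.org/phylo/treebase/phylows"


def resource_url(kind: str, identifier: str) -> str:
    """Build a URL that acts as the anchor for one record."""
    return f"{BASE}/{kind}/{quote(identifier, safe=':')}"


def search_url(kind: str, query: str) -> str:
    """Build a URL that searches records of one kind."""
    return f"{BASE}/{kind}/find?query={quote(query)}"


# Hypothetical identifier and query, for illustration only.
study = resource_url("study", "TB2:S1234")
hits = search_url("study", 'dcterms.title="mammals"')

print(study)
print(hits)
```

The point of the design is that the same URL scheme serves both as a stable name for a record and as an entry point for retrieval.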
The EvoInfo “stack”
TreeBASE
External links
  • Study
  • Taxon variant
  • Taxon
A simple example
  • TreeBASE maps to uBio using skos:closeMatch...
  • ...and uBio to ToL using gla:mapping
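Following the two mappings in sequence amounts to chaining two triple lookups. The sketch below assumes a tiny in-memory list of triples; the predicate names come from the slide, but every identifier is hypothetical and no real TreeBASE, uBio, or ToL record is implied.

```python
SKOS_CLOSE_MATCH = "skos:closeMatch"
GLA_MAPPING = "gla:mapping"

# Hypothetical identifiers, for illustration only.
triples = [
    ("treebase:taxon/1", SKOS_CLOSE_MATCH, "ubio:namebank/100"),
    ("ubio:namebank/100", GLA_MAPPING, "tol:node/42"),
    ("treebase:taxon/2", SKOS_CLOSE_MATCH, "ubio:namebank/200"),
]


def objects(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def treebase_to_tol(taxon):
    """Follow TreeBASE -> uBio -> ToL by chaining the two mappings."""
    return [
        tol
        for ubio in objects(taxon, SKOS_CLOSE_MATCH)
        for tol in objects(ubio, GLA_MAPPING)
    ]


linked = treebase_to_tol("treebase:taxon/1")
unlinked = treebase_to_tol("treebase:taxon/2")
print(linked, unlinked)
```

The second taxon has a uBio match but no onward mapping, so the chain returns an empty list rather than failing.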
Another Example, UniProt sequences
  • TreeBASE stores NCBI taxonomy identifiers
  • Standard tools can rewrite these linkout URLs
  • Result is a corresponding list of UniProt records
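The rewriting step can be sketched as plain URL manipulation: pull the NCBI taxonomy identifier out of a linkout URL and place it in a UniProt query URL. This is a sketch under assumptions. The slides do not show the URL formats, so the input URLs, the UniProt endpoint, and the organism_id query field used here are assumed for illustration, and the code only builds URLs without fetching anything.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed UniProt search endpoint; verify against UniProt's documentation.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def taxon_id(linkout: str) -> str:
    """Extract the NCBI taxonomy identifier from the 'id' query parameter."""
    return parse_qs(urlparse(linkout).query)["id"][0]


def uniprot_query(linkout: str) -> str:
    """Rewrite a taxonomy linkout URL into a UniProt search URL."""
    params = urlencode({"query": "organism_id:" + taxon_id(linkout)})
    return UNIPROT_SEARCH + "?" + params


# Example linkout URLs in the NCBI taxonomy browser style (illustrative).
linkouts = [
    "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606",
    "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090",
]

queries = [uniprot_query(url) for url in linkouts]
for q in queries:
    print(q)
```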
Another Example, Geocoding
  • TreeBASE uses DarwinCore for lat/lon annotations
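Once taxa carry coordinates, putting them on a map is mostly a matter of reshaping the annotations. The sketch below assumes the annotations use the Darwin Core terms decimalLatitude and decimalLongitude and converts them to GeoJSON points, which most mapping libraries accept. The taxon records are invented for illustration and are not TreeBASE data.

```python
import json

# Hypothetical taxon records with Darwin Core style coordinate annotations.
taxa = [
    {"label": "Taxon A", "dwc:decimalLatitude": "51.44", "dwc:decimalLongitude": "-0.94"},
    {"label": "Taxon B", "dwc:decimalLatitude": "49.26", "dwc:decimalLongitude": "-123.25"},
    {"label": "Taxon C"},  # no coordinates recorded
]


def to_feature(taxon):
    """Turn one annotated taxon into a GeoJSON point, or None if unannotated."""
    try:
        lat = float(taxon["dwc:decimalLatitude"])
        lon = float(taxon["dwc:decimalLongitude"])
    except KeyError:
        return None
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"label": taxon["label"]},
    }


features = [f for f in map(to_feature, taxa) if f is not None]
collection = {"type": "FeatureCollection", "features": features}
print(json.dumps(collection, indent=2))
```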
Many online data repositories
Challenges
  • Fragile: many services offline in Japan
  • Data gets bigger and bigger
  • Many concepts not yet in ontologies
  • Much data still “locked in” publications
The Future
The cloud
  • Software will be run on a number of “virtual” platforms (Amazon, Google Apps, Yahoo)
  • Data will be stored in the cloud (Bigtable, Freebase)
Interpreting locked-in knowledge
  • Text and images meant for humans are being processed by machines. Examples:
      • Taxon name mining (BHL)
      • Gene name and function mining
      • Tree figure processing
      • Automated annotation
Summary
  • Phyloinformatics is moving from closed to open to linked data
  • Concepts and syntax are increasingly formalized and machine readable
  • Automated queries across integrated resources will enable synthetic research
  • Still lots to do to deploy these technologies and unlock legacy data
Acknowledgements
  • Thank you for your attention!
  • Also, many thanks to:
      • The Pagel lab at UoR
      • The EvoInfo group
      • Val Tannen
      • Wayne Maddison
      • William Piel
      • Hilmar Lapp
      • Arlin Stoltzfus

Editor's notes

  1. Thank for the invitation; thank the audience for showing up given the other lecture; introduce self; talk title
  2. Mention the figure on the right
  3. Mention Dobzhansky
  4. Here’s an example that uses the Yahoo! Pipes tool to turn the list of NCBI taxon identifiers that TreeBASE stores for a given study into a list of all UniProt sequence records for those taxa.
  5. This example shows that with a minimal amount of JavaScript coding a Google map can be added to a web page (first code block), and the taxa for a given study can be mapped onto it using the DarwinCore coordinate annotations that TreeBASE stores.