Phyloinformatics and the Semantic Web

Departmental seminar to the University of Bath, 21 March 2011.

  1. Phyloinformatics and the Semantic Web
     Rutger Vos
  2. Outline
     • What is phyloinformatics and why should you care?
     • How we got here and where we are now
     • How the semantic web can help
     • Projects that apply the semantic web to phyloinformatics
     • Examples of linked data
     • Where to next
  3. What is Phyloinformatics?
     • Phylogenetics: “The systematic study of organism relationships based on evolutionary similarities and differences.”
     • Informatics: “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
  4. Why should you care?
     Firstly, “Nothing in evolution makes sense except in the light of phylogeny”.
     Surely, “gathering, manipulating, storing, retrieving and classifying” such information is worthwhile?
     But if that doesn’t convince you…
  5. As a consumer of phylogenetic data
     The “New Biology” is coming: “Major advances will take place via integration and synthesis, rather than decomposition and reduction” (Committee on a New Biology for the 21st Century, 2009).
     Presumably, this will involve retrieving and classifying.
  6. As a consumer of phylogenetic data
     Or maybe for you phylogeny is simply a nuisance:
     • Functional prediction
     • Comparative analysis
     • Ortholog finding
     • Etc.
     But it would still be nice to have that out of the way painlessly…
  7. As a producer of phylogenetic data
     • Many journals require proper storage of data described in a manuscript.
     • Funding agencies require dissemination and sharing of research results.
  8. The Past
     Everything was closed:
     • Idiosyncratic, private data
     • “pay-walls”
     • Closed source software
     • No accessible publishing medium
  9. The Present
     Science is opening up:
     • Open data
     • Open access publishing
     • Open source software
     • Publishing is now accessible to everyone, online
  10. Our current nightmare
      Documents, documents everywhere
  11. The current web makes sense to us
  12. But not to a machine
  13. What was informatics again?
      “The sciences concerned with gathering, manipulating, storing, retrieving, and classifying recorded information.”
  14. [No transcribed text]
  15. This is too hard
      O. R. P. Bininda-Emonds, M. Cardillo, K. E. Jones, R. D. E. MacPhee, R. M. D. Beck, R. Grenyer, S. A. Price, R. A. Vos, J. L. Gittleman and A. Purvis, 2007. The delayed rise of present-day mammals. Nature 446: 507-512.
  16. Let’s delegate that
  17. Instead of linked documents
  18. A web of linked concepts
  19. Concepts connected by statements
  20. Concepts are defined in ontologies
      “An ontology is a formal representation of the knowledge by a set of concepts within a domain and the relationships between those concepts. It is used to reason about the properties of that domain, and may be used to describe the domain.”
  21. Expressing concepts in data syntax
  22. Concepts are linked
      Concepts are linked by statements called “triples”. A triple is a statement with three parts:
      • subject
      • predicate
      • object
      Any part of a triple may have to be uniquely identifiable. For this we use URLs.
  23. An applied example
      Triple 1
        Subject: <http://example.org/data/tree1>
        Predicate: <http://example.org/terms/hasLikelihood>
        Object: 2342.323
      i.e. -lnL(tree1) = 2342.323
      Triple 2
        Subject: <http://example.org/data/tree2>
        Predicate: <http://example.org/terms/hasLikelihood>
        Object: 2341.184
      i.e. -lnL(tree2) = 2341.184
  24. What’s the better tree?
      The ontology defines what a likelihood is and how to compare negative log likelihoods.
      Hence, automated reasoning can conclude that tree2, which has the lower negative log likelihood, is the better tree.
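The two example triples and the comparison between them can be sketched in plain Python. This is only a minimal illustration, not the reasoning an ontology-aware engine would perform: the `Triple` type and the `better_tree` helper are hypothetical names, the example.org URLs are the placeholders from the slide, and the rule "a smaller negative log likelihood is better" is hard-coded here rather than derived from an ontology.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """One statement: subject and predicate are URLs, object is a URL or a literal."""
    subject: str
    predicate: str
    obj: Union[str, float]


# Placeholder term from the slide, not a real ontology term.
HAS_LIKELIHOOD = "http://example.org/terms/hasLikelihood"

triples = [
    Triple("http://example.org/data/tree1", HAS_LIKELIHOOD, 2342.323),
    Triple("http://example.org/data/tree2", HAS_LIKELIHOOD, 2341.184),
]


def better_tree(statements):
    """Return the subject with the smallest -lnL, i.e. the highest likelihood.

    Assumption: the objects are negative log likelihoods, as on the slide,
    so a lower value means a better tree.
    """
    scored = [t for t in statements if t.predicate == HAS_LIKELIHOOD]
    return min(scored, key=lambda t: t.obj).subject


print(better_tree(triples))  # prints http://example.org/data/tree2
```

In a real deployment the comparison rule would come from the ontology that defines the likelihood term, so that software with no built-in knowledge of phylogenetics could still rank the trees.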
  25. URLs for phylogenetics
      PhyloWS doesn’t just provide an anchor to identify phylogenetic data, it also enables searching and retrieval.
  26. The EvoInfo “stack”
  27. TreeBASE
  28. External links
      • Study
      • Taxon variant
      • Taxon
  29. A simple example
      TreeBASE maps to uBio using skos:closeMatch, and uBio to ToL using gla:mapping.
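Once each mapping is expressed as a triple, the chain from TreeBASE through uBio to ToL can be followed mechanically. The sketch below is an illustration under stated assumptions: the three record URLs are hypothetical example.org placeholders, not real TreeBASE, uBio, or ToL identifiers, and `follow_mappings` is a hypothetical helper. Only the predicate names `skos:closeMatch` and `gla:mapping` come from the slide.

```python
# Hypothetical identifiers standing in for a TreeBASE taxon, a uBio record and a ToL node.
TB_TAXON = "http://example.org/treebase/taxon/1"
UBIO_REC = "http://example.org/ubio/namebank/1"
TOL_NODE = "http://example.org/tol/node/1"

# Each link is a (subject, predicate, object) triple, using the predicates named on the slide.
triples = [
    (TB_TAXON, "skos:closeMatch", UBIO_REC),
    (UBIO_REC, "gla:mapping", TOL_NODE),
]

MAPPING_PREDICATES = {"skos:closeMatch", "gla:mapping"}


def follow_mappings(start, statements):
    """Return every resource reachable from `start` via the mapping predicates."""
    reached, frontier = [], [start]
    while frontier:
        current = frontier.pop()
        for s, p, o in statements:
            is_new = o not in reached and o != start
            if s == current and p in MAPPING_PREDICATES and is_new:
                reached.append(o)
                frontier.append(o)
    return reached


print(follow_mappings(TB_TAXON, triples))
```

Starting from the TreeBASE placeholder, the helper returns the uBio record first and the ToL node second, which is the two-step link the slide describes.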
  30. Another Example, UniProt sequences
      • TreeBASE stores NCBI taxonomy identifiers
      • Standard tools can rewrite these linkout URLs
      • Result is a corresponding list of UniProt records
  31. Another Example, Geocoding
      TreeBASE uses DarwinCore for lat/lon annotations
  32. Many online data repositories
  33. Challenges
      • Fragile: many services offline in Japan
      • Data gets bigger and bigger
      • Many concepts not yet in ontologies
      • Many data still “locked in” in publications
  34. The Future
  35. The cloud
      • Software will be run on a number of “virtual” platforms (Amazon, Google apps, Yahoo)
      • Data will be stored in the cloud (Big Table, FreeBase)
  36. Interpreting locked in knowledge
      Text and images meant for humans are being processed by machines. Examples:
      • Taxon name mining (BHL)
      • Gene name and function mining
      • Tree figure processing
      • Automated annotation
  37. Summary
      • Phyloinformatics is moving from closed to open to linked data
      • Concepts and syntax are increasingly formalized and machine readable
      • Automated queries across integrated resources will enable synthetic research
      • Still lots to do to deploy these technologies and unlock legacy data
  38. Acknowledgements
      Thank you for your attention!
      Also, many thanks to:
      • The Pagel lab at UoR
      • The EvoInfo group
      • Val Tannen
      • Wayne Maddison
      • William Piel
      • Hilmar Lapp
      • Arlin Stoltzfus
