Presented at the Biodiversity Information Standards (Taxonomic Databases Working Group) 2013 meeting in Florence, Italy, on 31 October 2013. Essentially, an introduction to aspects of the back end of the new trait repository of the Encyclopedia of Life (EOL).
2. The road to TraitBank
In the second year of a two-year project:
Marine
Expert Audience
Conservation science
Virtuoso triple store
<EOL taxon id> <hasAvgBodyMass in g> <value>
<EOL taxon id> <preysOn> <scientific name>
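As a minimal sketch of the two triple patterns above, the Python below writes them out as N-Triples using only the standard library. The taxon id (12345), the predicate URIs under http://eol.org/schema/terms/ and the prey name are hypothetical placeholders, not actual TraitBank terms.

```python
# Hypothetical URIs for illustration only; real TraitBank terms may differ.
EOL_PAGE = "http://eol.org/pages/"
TERMS = "http://eol.org/schema/terms/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def iri(uri: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{uri}>"


def literal(value, datatype=None) -> str:
    """Render a literal, escaping backslashes and double quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"^^<{datatype}>' if datatype else f'"{text}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) strings, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


taxon = iri(EOL_PAGE + "12345")  # hypothetical EOL taxon id
triples = [
    # <EOL taxon id> <hasAvgBodyMass in g> <value>
    (taxon, iri(TERMS + "hasAvgBodyMassInGrams"), literal(250.0, XSD_DOUBLE)),
    # <EOL taxon id> <preysOn> <scientific name>
    (taxon, iri(TERMS + "preysOn"), literal("Gadus morhua")),
]
print(to_ntriples(triples))
```

Keeping the unit in the predicate name mirrors the slide's "hasAvgBodyMass in g"; a separate unit term is the other common design.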
Beta testing NOW, for public launch in early 2014
21 datasets with 2.8 million data records for 520,000 taxa
Harvest, display, curate, search, download
MOST DATA NOT BORN SEMANTIC
From text mining
From literature tables
From data papers
From databases
3. Term URIs from existing ontologies
e.g. those registered in bioportal.bioontology.org
• Statistics from Semanticscience Integrated Ontology
• Units Ontology
• Environment Ontology (EnvO)
• Gene Ontology
• ETHAN (Natural history, with Joel Sachs)
• Vertebrate Trait Ontology
• Plant Trait Ontology
• Where necessary: request new terms from the managers of existing ontologies
• Last resort: create provisional terms under http://eol.org/schema/terms/xxxx
• Of course, we also use unique EOL taxon identifiers, which we’ve mapped to the identifiers of other projects
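The order of preference on this slide (existing ontology term, then a term request, then a provisional EOL term as a last resort) can be sketched as a small lookup. The label table, the two Units Ontology IDs and the camelCase naming rule for provisional terms are illustrative assumptions, not EOL's actual curation tooling.

```python
import re

# Illustrative table of labels already mapped to URIs in existing ontologies.
# The two Units Ontology IDs are examples; check any ID in BioPortal before use.
KNOWN_TERMS = {
    "gram": "http://purl.obolibrary.org/obo/UO_0000021",
    "kilogram": "http://purl.obolibrary.org/obo/UO_0000009",
}
PROVISIONAL_BASE = "http://eol.org/schema/terms/"


def provisional_uri(label: str) -> str:
    """Mint a provisional EOL term URI (hypothetical camelCase naming rule)."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    if not words:
        raise ValueError(f"cannot build a term name from {label!r}")
    name = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    return PROVISIONAL_BASE + name


def resolve_term(label: str, pending_requests: list) -> tuple:
    """Prefer an existing ontology URI; otherwise queue a term request and
    fall back to a provisional EOL URI as the last resort."""
    key = label.strip().lower()
    if key in KNOWN_TERMS:
        return KNOWN_TERMS[key], "existing"
    pending_requests.append(label)  # to be proposed to the ontology's managers
    return provisional_uri(label), "provisional"


pending = []
print(resolve_term("gram", pending))
print(resolve_term("average body mass", pending))
print(pending)
```

Recording every fallback in a request queue keeps provisional terms visible, so they can later be replaced once an ontology adopts them.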
4. Known URIs tool
Only light reasoning so far: just inferring inverse relationships such as “eats” and “is eaten by”
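Inverse inference of this kind is usually declared with owl:inverseOf and left to the triple store; the sketch below instead materializes the inverse triples in plain Python to show what the inference produces. The predicate names are hypothetical: "eats"/"isEatenBy" follows this slide and "hasHost"/"hosts" follows the GLoBI terms.

```python
# Hypothetical predicate pairs; real TraitBank URIs would differ.
INVERSES = {"eats": "isEatenBy", "hasHost": "hosts"}
# Make the table symmetric so either direction can be inverted.
INVERSES.update({inverse: p for p, inverse in INVERSES.items()})


def infer_inverses(triples):
    """Return the input triples plus the inverse of every triple whose
    predicate has a declared inverse."""
    facts = set(triples)
    for subject, predicate, obj in list(facts):  # copy: we add while looping
        inverse = INVERSES.get(predicate)
        if inverse is not None:
            facts.add((obj, inverse, subject))
    return facts


observed = [("Phoca vitulina", "eats", "Gadus morhua")]
for triple in sorted(infer_inverses(observed)):
    print(triple)
```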
5. GLoBI http://globalbioticinteractions.wordpress.com/
Jorrit Poelen, Chris Mungall, James Simon, GoMexSI
14 datasets with 25k taxa, 422k interactions, for 3k locations
alpha version of ingestion, normalization, aggregation
alpha version of web API
alpha version of data exports
6. GLoBI ontology work
https://github.com/jhpoelen/eol-globi-data/tree/master/eol-globi-ontology
Interaction processes from Gene Ontology
Relations from OBO Relations Ontology
Life cycle stages and body parts from UBERON
Observation and specimen terms from various ontologies
Behaviors from NeuroBehaviorOntology
Habitat keywords from Environment Ontology
New terms:
/eats, /interactsWith, /preysUpon, /hasHost, /hosts, /parasitizes
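The normalization and aggregation steps could look roughly like the sketch below, which maps free-text interaction labels from source datasets onto the new terms listed above and counts repeated interactions. The label table and the fallback to /interactsWith are hypothetical illustrations, not GLoBI's actual mapping.

```python
from collections import Counter

# Hypothetical mapping from raw interaction labels to the GLoBI terms above.
RAW_TO_TERM = {
    "eats": "/eats",
    "feeds on": "/eats",
    "predator of": "/preysUpon",
    "host of": "/hosts",
    "parasite of": "/parasitizes",
}
FALLBACK = "/interactsWith"  # most general term when nothing more specific fits


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace, then look up the term."""
    return RAW_TO_TERM.get(" ".join(label.lower().split()), FALLBACK)


def aggregate(records):
    """Count normalized (source taxon, term, target taxon) interactions."""
    return Counter((src, normalize(label), tgt) for src, label, tgt in records)


records = [
    ("Phoca vitulina", "Feeds  on", "Gadus morhua"),
    ("Phoca vitulina", "eats", "Gadus morhua"),
    ("Ixodes ricinus", "associated with", "Capreolus capreolus"),
]
for key, count in aggregate(records).items():
    print(key, count)
```

Falling back to the most general term keeps unmatched records instead of dropping them, at the cost of less specific semantics.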
8. To do
• Term evaluation and recommendations
• Map similar terms
• Map terms to an upper ontology like the Species Profile Model
• Leverage reasoning for data validation
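One possible shape for data validation is sketched below: each record is checked against what its predicate expects. These are plain rule checks standing in for reasoning, and the predicate names, unit table and rules are all hypothetical.

```python
# Hypothetical rules: which unit dimension each measured predicate expects.
EXPECTED_DIMENSION = {"hasAvgBodyMass": "mass", "hasBodyLength": "length"}
UNIT_DIMENSION = {"g": "mass", "kg": "mass", "cm": "length", "m": "length"}


def validate(record):
    """Return a list of problems for one (taxon, predicate, value, unit) record."""
    taxon, predicate, value, unit = record
    problems = []
    expected = EXPECTED_DIMENSION.get(predicate)
    if expected is None:
        problems.append(f"{taxon}: unknown predicate {predicate!r}")
    elif UNIT_DIMENSION.get(unit) != expected:
        problems.append(f"{taxon}: {predicate} needs a {expected} unit, got {unit!r}")
    if not isinstance(value, (int, float)) or value <= 0:
        problems.append(f"{taxon}: {predicate} must be a positive number, got {value!r}")
    return problems


print(validate(("12345", "hasAvgBodyMass", 250.0, "g")))
print(validate(("12345", "hasAvgBodyMass", 250.0, "cm")))
```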
To access the Beta test, happening NOW, send your EOL login to:
@cydparr parrc@si.edu
Speaker notes
EOL's TraitBank™ aggregates and manages attribute (trait) data across the tree of life in a Virtuoso triple store. Attributes of organisms include morphological descriptors, life history characteristics, habitat preferences, and interactions with other organisms. In this talk we focus on how we add and refine semantics for both data and metadata in order to improve interoperability across the domains of morphology, ecology, and genomics. At least initially, most data aggregated by TraitBank will not have been "born semantic." Wherever possible, for each dataset, staff will select Uniform Resource Identifiers (URIs) for terms in existing ontologies (e.g. those registered in bioportal.bioontology.org) to anchor the type of the attribute (e.g. habitat from the Environment Ontology). We also use terms from ontologies or other controlled vocabularies for the values of attributes (e.g. a particular type of habitat), as well as for most metadata describing the context of the measurement (e.g. life stage, geographic scope). As large datasets are ingested, we will propose new terms to the managers of existing ontologies where needed. A customized interface lets us ensure, and share, good definitions and labels for terms that don't yet have them. We also use this interface to promote good practice when others choose URIs for directly added data. However, we will remain flexible and allow new community-generated terms. We anticipate iterative processes to relate new terms to each other and to existing ontologies. Our use of semantic reasoning will initially be quite light, limited to units conversion and inverse relationships; eventually it could be expanded to infer values based on phylogeny. A prime example of reusing ontologies is the Global Biotic Interactions group (GLoBI, http://globalbioticinteractions.wordpress.com/), which reuses and extends classes and relations from existing biomedical and genomic ontologies.
In particular, Globi.owl draws interaction processes from the Gene Ontology, taxonomic ranks from the Open Biomedical Ontology (OBO) taxrank ontology, relations from the OBO Relations Ontology, life cycle stages and body parts from UBERON, observation and specimen terms from various ontologies, behaviors from NeuroBehaviorOntology, and habitat keywords from the Environment Ontology. GLoBI standardizes the data and then passes it on to EOL. Though challenges remain, the ultimate goal is to expose semantically annotated, contextualized data so that it can contribute to 1) phylogenetic analyses aimed at understanding evolutionary responses and evolutionary history, 2) facilitating new species discovery, 3) metagenomic analyses aimed at an integrated understanding of ecosystem processes, and 4) global biotic models.
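Units conversion, one of the two light reasoning tasks named in these notes, can be sketched with a table that maps each unit to a canonical unit of its dimension. The unit labels and the choice of grams and metres as canonical units are assumptions for illustration; TraitBank would presumably key this on unit URIs instead.

```python
# Hypothetical conversion table: factor that converts each unit to its
# dimension's canonical unit (grams for mass, metres for length).
TO_CANONICAL = {
    "g": ("mass", 1.0),
    "kg": ("mass", 1000.0),
    "mg": ("mass", 0.001),
    "m": ("length", 1.0),
    "cm": ("length", 0.01),
    "mm": ("length", 0.001),
}


def convert(value: float, unit: str, target: str) -> float:
    """Convert a value between two units of the same dimension."""
    dim_from, f_from = TO_CANONICAL[unit]
    dim_to, f_to = TO_CANONICAL[target]
    if dim_from != dim_to:
        raise ValueError(f"cannot convert {dim_from} ({unit}) to {dim_to} ({target})")
    return value * f_from / f_to


print(convert(2.5, "kg", "g"))
```

Going through one canonical unit per dimension needs only one factor per unit, rather than one per pair of units.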
Starting with marine data.
In the simplest view, we’ll be storing triples.
These data will be organized on a data tab, sorted into the 35 or so “topics” for which we currently have text chapters, and we will also provide powerful downloading and searching capabilities.
Finally, we’ll be setting up ways for other applications to grab the data and do interesting things with it. We already have a tool for making field guides.
The approach here builds on our innovations for EOL and adds a proven technology called the “semantic web” to our domain. The next step takes this chain of innovation even further.