Text Mining and Environmental 
Metadata Suggestion 
Evangelos Pafilis 
Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC) 
Hellenic Centre for Marine Research (HCMR), Heraklio Crete, Greece 
pafilis@hcmr.gr, http://epafilis.info 
ENA – 1st Dec 2014 – EBI, UK
Species – Environments 
Comparative Analysis
• Location 
• Environment 
• Time Period 
? 
Coral Reefs 
Image from http://theresilientearth.com/
Not Trivial 
Slide by Dr. P. Yilmaz, http://www.arb-silva.de/projects/contextual-data/
Essential Context Information 
Metadata 
Meta- = Μετά (“after”) 
=> data “after” data 
=> data describing data 
a clear definition, that can be interpreted 
in many, sometimes conflicting, ways 
Essential Context Information 
Community Standards 
• Standards (such as MiXS, MIMARKS) 
see http://gensc.org/gc_wiki/index.php/GSC_Publications 
for a comprehensive list of publications 
• capture genomic/metagenomic and other types of sequence contextual information
• including detailed guidelines on how to annotate a sample
(e.g. Yilmaz P et al. (2011) The ISME journal 5: 1565–1567) 
http://gensc.org/
P. Yilmaz et al., Nat Biotech 29, 415–420 (2011)
source: http://wiki.gensc.org/index.php?title=MIMARKS
http://www.tomorrowstarted.com/2013/01/how-a-key-works/.html 
• Project descriptions 
• Scientific-content web pages 
• Full text scientific articles 
• Literature abstracts 
• In-house documents 
Microbes are key players in both healthy and 
degraded coral reefs. A combination of 
metagenomics, microscopy, culturing, and 
water chemistry were used to characterize 
microbial communities on four coral atolls in 
the Northern Line Islands, central Pacific. 
Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 
(“Project Description”) 
Looking up terms: 
Intensive, learning curve 
Literature Mining 
processing text 
to extract facts of interest 
ENVIRONMENTS 
ENVIRONMENTS: ENVO term identification in text 
terrestrial, aquatic, 
marine, lagoon, coral reef, 
sediment, freshwater, soil 
ENVIRONMENTS: ENVO term identification in text 
ID: ENVO:00000150 
Name: coral reef 
Microbes are key players in both healthy and 
degraded coral reefs. A combination of 
metagenomics, microscopy, culturing, and 
water chemistry were used to characterize 
microbial communities on four coral atolls in 
the Northern Line Islands, central Pacific. 
Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 
(“Project Description”) 
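The identification step on this slide can be sketched as a dictionary lookup over the project description. This is a minimal illustration, not the ENVIRONMENTS tagger itself: only the coral reef ID and name come from the slide, while the plural synonym, the `tag` function and the inclusive end offsets are assumptions made for the example.

```python
import re

# Toy dictionary. Only the "coral reef" ID and name come from the slide;
# the plural entry is an assumed synonym added for illustration.
DICTIONARY = {
    "coral reef": "ENVO:00000150",
    "coral reefs": "ENVO:00000150",
}


def tag(text, dictionary=DICTIONARY):
    """Return (start, end, matched_text, envo_id) tuples; end is inclusive."""
    # Longer names first, so "coral reefs" is preferred over "coral reef".
    names = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    hits = []
    for m in pattern.finditer(text):
        envo_id = dictionary[m.group(0).lower()]
        hits.append((m.start(), m.end() - 1, m.group(0), envo_id))
    return hits


TEXT = (
    "Microbes are key players in both healthy and degraded coral reefs. "
    "A combination of metagenomics, microscopy, culturing, and water chemistry "
    "were used to characterize microbial communities on four coral atolls in "
    "the Northern Line Islands, central Pacific."
)

for hit in tag(TEXT):
    print(hit)
```

Note that "coral atolls" is not tagged here, simply because the toy dictionary has no entry for it.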
ENVIRONMENTS 
http://environments.hcmr.gr 
http://environments-eol.blogspot.gr/ 
● Dictionary based
● Open source
● Environment Ontology
● Fast performance
● 4000 PubMed abstracts / second *
● Based on the SPECIES name recognition tagger (Pafilis et al., PLOS ONE)
● E600 gold standard: ENVO-based corpus of EOL Species pages
● Recognition Accuracy – Mention Level:
- F1: 82.0%
- 87.1% of the TPs: exact id among predicted ones
● Submitted preprint: http://biorxiv.org/content/early/2014/11/13/011403
Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8(6): e65390
*: based on a single-thread run on an Intel 2.27 GHz machine with 24 GB RAM, processing a set of 536,052 abstracts
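For readers less familiar with the figures on this slide: F1 is the harmonic mean of precision and recall, and the footnote's throughput implies a run time of a little over two minutes for the full abstract set. The sketch below works through both. The TP/FP/FN counts are invented for illustration (the slide reports only the resulting F1 of 82.0%, not the underlying counts), and `f1_score` is a hypothetical helper name.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from mention-level counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented counts, NOT the evaluation data behind the slide.
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(f"example F1 = {example_f1:.3f}")

# Throughput arithmetic from the footnote: 536,052 abstracts at
# roughly 4000 abstracts per second on a single thread.
abstracts = 536_052
abstracts_per_second = 4000
seconds = abstracts / abstracts_per_second
print(f"approx. run time = {seconds:.0f} s")
```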
ENVO: source of environment descriptor 
names and synonyms 
http://environmentontology.org 
~1600 terms, June 2013 
[Diagram: ENVO hierarchies: biome, environmental feature, environmental material, environmental condition, habitat, …]
Based on slides by Dr. Pier Luigi Buttigieg, AWI, Bremerhaven, Germany
ENVIRONMENTS – Improving Accuracy 
● Increasing matches in text
● Orthographic variation supported, e.g. freshwater, fresh water, and fresh-water
● Case-insensitive matching
● Synonym generation to reflect the way environment descriptive terms are mentioned in text (both generic and ENVO specific)
Action (with example):
- Add a variant in which non-informative words have been removed: epipelagic zone → epipelagic, estuarine biome → estuarine
- Plural form addition: sediment → sediments
- Adjective form addition: lagoon → lagoonal
● Preventing overmatching (i.e. avoiding increased FP)
● “Stopword list” (e.g. spring, well, range)
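The dictionary expansion steps listed on this slide can be sketched as follows. This is a rough illustration under stated assumptions, not the rules the tagger actually uses: the word lists contain only the examples from the slide, the plural rule is a naive "append s", the adjective forms are hand-listed, and all function names are hypothetical.

```python
# Assumed lists, filled only with the examples given on the slide.
NON_INFORMATIVE = {"zone", "biome"}
STOPWORDS = {"spring", "well", "range"}
ADJECTIVE_FORMS = {"lagoon": "lagoonal"}  # hand-listed, no general rule implied


def orthographic_variants(name):
    """Spaced, hyphenated and joined spellings of a multi-word name."""
    parts = name.lower().replace("-", " ").split()
    if len(parts) < 2:
        return {name.lower()}
    return {" ".join(parts), "-".join(parts), "".join(parts)}


def expand(name):
    """Return the set of dictionary variants for one environment name."""
    base = name.lower().replace("-", " ")
    if base in STOPWORDS:
        return set()  # blocked to prevent overmatching
    words = base.split()
    variants = orthographic_variants(base)
    # Variant with non-informative words removed (epipelagic zone -> epipelagic).
    kept = [w for w in words if w not in NON_INFORMATIVE]
    if kept and len(kept) < len(words):
        variants.add(" ".join(kept))
    # Naive plural for single-word names (sediment -> sediments).
    if len(words) == 1 and not base.endswith("s"):
        variants.add(base + "s")
    # Adjective form, where one is listed (lagoon -> lagoonal).
    if base in ADJECTIVE_FORMS:
        variants.add(ADJECTIVE_FORMS[base])
    return variants


for term in ["fresh water", "epipelagic zone", "sediment", "lagoon", "well"]:
    print(term, "->", sorted(expand(term)))
```

Joining every multi-word name (for example "epipelagiczone") overgenerates; a real dictionary would likely restrict that step to names where the joined spelling is attested.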
Scope 
ENVO parts not included:
species 
tissues 
foods 
Limitations – Known Issues 
negation not supported 
conflicts with anatomy terms 
(e.g. mouth, blowhole) 
ENVIRONMENTS – Sample Output
Columns: File Name, Start coord, End coord, Match text, ENVO ID
eol_documents_ascii_nonHTML.txt 346289845 346289853 mud flats ENVO:00000192
eol_documents_ascii_nonHTML.txt 346289845 346289853 mud flats ENVO:00002297
eol_documents_ascii_nonHTML.txt 346289845 346289853 mud flats ENVO:00000043
eol_documents_ascii_nonHTML.txt 346289845 346289853 mud flats ENVO:00000000
eol_documents_ascii_nonHTML.txt 346289845 346289853 mud flats ENVO:00000012
eol_documents_ascii_nonHTML.txt 346289871 346289873 mud ENVO:01000001
eol_documents_ascii_nonHTML.txt 346289871 346289873 mud ENVO:00010483
eol_documents_ascii_nonHTML.txt 346289905 346289910 mounds ENVO:00000180
eol_documents_ascii_nonHTML.txt 346289905 346289910 mounds ENVO:00000191
eol_documents_ascii_nonHTML.txt 346289905 346289910 mounds ENVO:00002297
eol_documents_ascii_nonHTML.txt 346289905 346289910 mounds ENVO:00000176
eol_documents_ascii_nonHTML.txt 346289905 346289910 mounds ENVO:00000000
eol_documents_ascii_nonHTML.txt 346289905 346289910 mounds ENVO:00000477
Traversing all IS_A, PART_OF relationships in ENVO
Tags corresponding to “Habitat” text data object: http://eol.org/data_objects/31415353
of EOL Taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
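The sample output reports one row per (mention, ENVO ID) pair, so a single mention such as "mud flats" appears several times, apparently once for the matched term and once for each term reached by following IS_A and PART_OF links. The sketch below shows one way to read such output and, separately, how an ancestor traversal produces several IDs from one match. Assumptions: the columns are tab-separated (the slide shows them flattened), `group_by_mention` and `with_ancestors` are hypothetical helpers, and the `TOY:` hierarchy is invented and does not reflect real ENVO relations.

```python
from collections import defaultdict

# A few rows from the slide, written with the assumed tab separator.
SAMPLE = "\n".join([
    "eol_documents_ascii_nonHTML.txt\t346289845\t346289853\tmud flats\tENVO:00000192",
    "eol_documents_ascii_nonHTML.txt\t346289845\t346289853\tmud flats\tENVO:00002297",
    "eol_documents_ascii_nonHTML.txt\t346289871\t346289873\tmud\tENVO:01000001",
    "eol_documents_ascii_nonHTML.txt\t346289871\t346289873\tmud\tENVO:00010483",
])


def group_by_mention(output):
    """Map (file, start, end, text) to the list of ENVO IDs reported for it."""
    mentions = defaultdict(list)
    for line in output.splitlines():
        if not line.strip():
            continue
        file_name, start, end, text, envo_id = line.split("\t")
        mentions[(file_name, int(start), int(end), text)].append(envo_id)
    return dict(mentions)


def with_ancestors(term, parents):
    """Return the term followed by every term reachable via parent links."""
    seen, stack = [], [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(parents.get(current, []))
    return seen


# Invented placeholder hierarchy; these are NOT real ENVO identifiers or links.
TOY_PARENTS = {
    "TOY:mudflat": ["TOY:wetland"],
    "TOY:wetland": ["TOY:feature"],
    "TOY:feature": [],
}

grouped = group_by_mention(SAMPLE)
for (file_name, start, end, text), ids in grouped.items():
    # Coordinates are inclusive: "mud flats" covers end - start + 1 = 9 characters.
    print(text, end - start + 1, ids)

print(with_ancestors("TOY:mudflat", TOY_PARENTS))
```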
Download 
ENVIRONMENTS 
• Home Page: http://environments.hcmr.gr/ 
• Tagger Software: 
http://download.jensenlab.org/environments_tagger.tar.gz
other forms of access 
http://eol.org/info/discover_what
ID: ENVO:00000150 
Name: coral reef 
ENVIRONMENTS 
ACTION ES1103 
Interactive Curation 
http://www.ncbi.nlm.nih.gov/pubmed/18301735
Not only ENVO terms
http://www.ncbi.nlm.nih.gov/pubmed/18301735
What else is being identified? 
ready for you to discover!
Summary 
• Importance of standardized metadata and annotations
• ENVO: standardized, hierarchically organized descriptions of environment types
• Literature, project and other scientific content web pages may describe the environment context of a metagenomics sample
• ENVIRONMENTS:
  • Dictionary-based environment descriptive term identification
  • Ontological community standards, e.g. ENVO: name source
  • Command line application
  • Browser extensions, a user-friendly interface
    • Highly interactive
    • Can be used while browsing the web
    • Extract ENVO from a selected part of a web page
  • Extended for:
    • Organism, disease, and tissue mention identification
Digging-out Information 
http://hartpurylrc.files.wordpress.com
Photo by Dr Chatzinikolaou E
BioCreative: Metagenomics Track 
Critical Assessment of Information Extraction in Biology 
• Preparing a Metagenomics Track as part of the BioCreative 2015 challenge 
• Aim: improve the environmental-context annotation of sequences in major 
metagenomics repositories. 
• Track coordinator: Dr. L. Hirschman, MITRE 
• BioCreative (www.biocreative.org) 
Biodiversity – Genomics 
ENVIRONMENTS-EOL 
http://environments-eol.blogspot.com/ 
Encyclopedia of Life (EOL) http://www.eol.org 
• process EOL taxon pages 
• extract environmental context (ENVO terms) 
• EOL Taxon Page: Quick Facts, Data tab 
• integrated in Traitbank 
• large scale biological questions 
Rubenstein Fellowship 2013 
In collab: Jennifer Hammock, Patrick Leary, Katja 
Schulz, Cyndy Parr 
Hexanchus griseus EOL page, http://eol.org/pages/212027 
SEQenv http://environments.hcmr.gr/seqenv.html 
• annotate microbial sequences with ENVO terms 
• sequence analysis, literature mining, visualization 
• GenBank isolation source, PubMed Abstracts 
• sample comparison, temporal/spatial pattern analysis 
• extension: proteins, protein families, 3D visualization 
Reused: Analysis of American bird habitats, http://blog.eol.org/
(NoPlaceLikeHome, in collab: Rob Stevenson, Carl Nordman) 
http://jensenlab.org/ 
Santos A et al. (under review), 
preprint: http://biorxiv.org/content/early/2014/11/10/010975 
Frankild S et al. (under review), 
preprint: http://biorxiv.org/content/early/2014/08/25/008425 
Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of 
Taxonomic Names in Text. PLoS ONE 8(6): e65390 
Acknowledgements
Thank You!
HCMR-IMBG: Christos Arvanitidis, Christina Pavloudi, Katerina Vasileiadou, Lucia Fanini, Sarah Faulwetter, Anastasis Oulas
NNF CPR: Lars Juhl Jensen, Sune Frankild
U Mass: Rob Stevenson
Uni Glasgow: Christopher Quince, Umer Ijaz
EOL: Cynthia Parr, Jennifer Hammock, Patrick Leary, Katja Schulz
MM-MPI: J. Schnetzer, AWI: Dr P. Buttigieg, HITS: Dr. S. Berger and more
Funding: EOL Rubenstein Fellowship, LifeWatch Greece, MARBIGEN, NNF-CPR, EOL-BHL NESCent Research Sprint 2014, “SEQenv” Hackathons (COST ES1103)
Amvrakikos Lagoons, May 2011
id: ENVO:00000038
name: lagoon
Tutorial 
• Start Firefox 
• Install the “megx-seqenv-bar.xpi” 
• Drag and Drop
• “Install Now” and “Restart” 
• Visit a couple of PubMed abstracts or article web 
pages of your preference 
• Annotate the complete abstract
• Annotate selected sentences only 
ENA – 1st Dec 2014 – EBI, UK

Más contenido relacionado

La actualidad más candente

La actualidad más candente (9)

Daniel Inserillo CV
Daniel Inserillo CVDaniel Inserillo CV
Daniel Inserillo CV
 
OBIS and Caribbean Marine Atlas
OBIS and Caribbean Marine AtlasOBIS and Caribbean Marine Atlas
OBIS and Caribbean Marine Atlas
 
OBIS, a global biodiversity data-sharing platform for ABNJ
OBIS, a global biodiversity data-sharing platform for ABNJOBIS, a global biodiversity data-sharing platform for ABNJ
OBIS, a global biodiversity data-sharing platform for ABNJ
 
Emad CV May, 2015
Emad CV May, 2015Emad CV May, 2015
Emad CV May, 2015
 
LHobbs_CV_nocontact
LHobbs_CV_nocontactLHobbs_CV_nocontact
LHobbs_CV_nocontact
 
2014 Vianna et al. Acoustic telemetry validates citizen science for monitorin...
2014 Vianna et al. Acoustic telemetry validates citizen science for monitorin...2014 Vianna et al. Acoustic telemetry validates citizen science for monitorin...
2014 Vianna et al. Acoustic telemetry validates citizen science for monitorin...
 
Great Lakes, Multiple Concerns
Great Lakes, Multiple ConcernsGreat Lakes, Multiple Concerns
Great Lakes, Multiple Concerns
 
resume
resumeresume
resume
 
May2013
May2013May2013
May2013
 

Similar a Text Mining and Environmental Metadata Suggestion

2014 10 TDWG - Environments-EOL
2014 10 TDWG - Environments-EOL 2014 10 TDWG - Environments-EOL
2014 10 TDWG - Environments-EOL Evangelos Pafilis
 
UC Davis EVE161 Lecture 17 by @phylogenomics
 UC Davis EVE161 Lecture 17 by @phylogenomics UC Davis EVE161 Lecture 17 by @phylogenomics
UC Davis EVE161 Lecture 17 by @phylogenomicsJonathan Eisen
 
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK Cyndy Parr
 
Joshua Seidman Honors Thesis Rough Draft 2.4 enm
Joshua Seidman Honors Thesis Rough Draft 2.4 enmJoshua Seidman Honors Thesis Rough Draft 2.4 enm
Joshua Seidman Honors Thesis Rough Draft 2.4 enmJoshua Seidman
 
Eli Rose Resume
Eli Rose ResumeEli Rose Resume
Eli Rose ResumeEliTRose
 
Liddell TERN-super sites-phenology-ACEAS
Liddell TERN-super sites-phenology-ACEASLiddell TERN-super sites-phenology-ACEAS
Liddell TERN-super sites-phenology-ACEASaceas13tern
 
2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD
2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD
2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FDAmy Wolfe
 
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...Cyndy Parr
 
Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...
Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...
Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...Premier Publishers
 
Introduction to EOL.org for scientists
Introduction to EOL.org for scientistsIntroduction to EOL.org for scientists
Introduction to EOL.org for scientistsCyndy Parr
 
Abstract_Findabhair_Ni_Fhaolain
Abstract_Findabhair_Ni_FhaolainAbstract_Findabhair_Ni_Fhaolain
Abstract_Findabhair_Ni_FhaolainFindabhair N
 
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...Tom Mens
 
Shane Donaghy (2013) Dissertation (Full Text)
Shane Donaghy (2013) Dissertation (Full Text)Shane Donaghy (2013) Dissertation (Full Text)
Shane Donaghy (2013) Dissertation (Full Text)Shane Donaghy
 
Deep Blue Days - 14>16 Oct. 2014, Brest - programme
Deep Blue Days - 14>16 Oct. 2014, Brest - programmeDeep Blue Days - 14>16 Oct. 2014, Brest - programme
Deep Blue Days - 14>16 Oct. 2014, Brest - programmeRimetz-Planchon Juliette
 

Similar a Text Mining and Environmental Metadata Suggestion (20)

2014 10 TDWG - Environments-EOL
2014 10 TDWG - Environments-EOL 2014 10 TDWG - Environments-EOL
2014 10 TDWG - Environments-EOL
 
UC Davis EVE161 Lecture 17 by @phylogenomics
 UC Davis EVE161 Lecture 17 by @phylogenomics UC Davis EVE161 Lecture 17 by @phylogenomics
UC Davis EVE161 Lecture 17 by @phylogenomics
 
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK
iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK
 
REU2016_FinalPaper
REU2016_FinalPaperREU2016_FinalPaper
REU2016_FinalPaper
 
OBIS and BBNJ
OBIS and BBNJ OBIS and BBNJ
OBIS and BBNJ
 
Joshua Seidman Honors Thesis Rough Draft 2.4 enm
Joshua Seidman Honors Thesis Rough Draft 2.4 enmJoshua Seidman Honors Thesis Rough Draft 2.4 enm
Joshua Seidman Honors Thesis Rough Draft 2.4 enm
 
OBIS, a global biodiversity data-sharing platform for ABNJ
OBIS, a global biodiversity data-sharing platform for ABNJOBIS, a global biodiversity data-sharing platform for ABNJ
OBIS, a global biodiversity data-sharing platform for ABNJ
 
Eli Rose Resume
Eli Rose ResumeEli Rose Resume
Eli Rose Resume
 
Liddell TERN-super sites-phenology-ACEAS
Liddell TERN-super sites-phenology-ACEASLiddell TERN-super sites-phenology-ACEAS
Liddell TERN-super sites-phenology-ACEAS
 
2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD
2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD
2016Sum_LaRC_AppalachianTrailHealthAQ_Presentation_FD
 
CV_RShelley_English
CV_RShelley_EnglishCV_RShelley_English
CV_RShelley_English
 
OBIS introduction-for-i marine
OBIS introduction-for-i marineOBIS introduction-for-i marine
OBIS introduction-for-i marine
 
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...
Encyclopedia of Life: Applying Concepts from Amazon and LEGO to Biodiversity ...
 
Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...
Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...
Investigation of Groundwater Potential and Aquifer Protective Capacity of Par...
 
Introduction to EOL.org for scientists
Introduction to EOL.org for scientistsIntroduction to EOL.org for scientists
Introduction to EOL.org for scientists
 
Abstract_Findabhair_Ni_Fhaolain
Abstract_Findabhair_Ni_FhaolainAbstract_Findabhair_Ni_Fhaolain
Abstract_Findabhair_Ni_Fhaolain
 
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...
ECOS: Ecological Studies of Open Source Software Ecosystems (@ CSMR-WCRE 2014...
 
Shane Donaghy (2013) Dissertation (Full Text)
Shane Donaghy (2013) Dissertation (Full Text)Shane Donaghy (2013) Dissertation (Full Text)
Shane Donaghy (2013) Dissertation (Full Text)
 
Charles Parsons Initiative - Energy and Sustainable Environment
Charles Parsons Initiative - Energy and Sustainable EnvironmentCharles Parsons Initiative - Energy and Sustainable Environment
Charles Parsons Initiative - Energy and Sustainable Environment
 
Deep Blue Days - 14>16 Oct. 2014, Brest - programme
Deep Blue Days - 14>16 Oct. 2014, Brest - programmeDeep Blue Days - 14>16 Oct. 2014, Brest - programme
Deep Blue Days - 14>16 Oct. 2014, Brest - programme
 

Último

Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 

Último (20)

Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 

Text Mining and Environmental Metadata Suggestion

  • 1. Text Mining and Environmental Metadata Suggestion Evangelos Pafilis Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC) Hellenic Centre for Marine Research (HCMR), Heraklio Crete, Greece pafilis@hcmr.gr, http://epafilis.info ENA – 1st Dec 2014 – EBI, UK
  • 2. Species – Environments
  • 3. Comparative Analysis • Location • Environment • Time Period ? Coral Reefs Image from http://theresilientearth.com/
  • 4. Not Trivial
  • 5. Slide by Dr. P. Yilmaz, http://www.arb-silva.de/projects/contextual-data/
  • 6. Essential Context Information Metadata Meta- = Μετά (“after”) => data “after” data => data describing data
  • 7–8. a clear definition that can be interpreted in many, sometimes conflicting, ways. Essential Context Information
  • 9. Community Standards • Standards (such as MIxS, MIMARKS); see http://gensc.org/gc_wiki/index.php/GSC_Publications for a comprehensive list of publications • capture genomic/metagenomic and other types of sequence contextual information • include detailed guidelines on how to annotate a sample (e.g. Yilmaz P et al. (2011) The ISME Journal 5: 1565–1567) http://gensc.org/
  • 10. P. Yilmaz et al., Nat Biotech 29, 415–420 (2011)
  • 13. • Project descriptions • Scientific-content web pages • Full text scientific articles • Literature abstracts • In-house documents
  • 14. Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific. Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 (“Project Description”)
  • 15. Looking up terms: Intensive, learning curve
  • 16. Literature Mining
  • 17. processing text to extract facts of interest
  • 18. ENVIRONMENTS
  • 19. ENVIRONMENTS: ENVO term identification in text: terrestrial, aquatic, marine, lagoon, coral reef, sediment, freshwater, soil
  • 20–22. ENVIRONMENTS: ENVO term identification in text ID: ENVO:00000150 Name: coral reef Microbes are key players in both healthy and degraded coral reefs. A combination of metagenomics, microscopy, culturing, and water chemistry were used to characterize microbial communities on four coral atolls in the Northern Line Islands, central Pacific. Source: http://metagenomics.anl.gov/linkin.cgi?metagenome=4440039.3 (“Project Description”)
  • 23. ENVIRONMENTS http://environments.hcmr.gr http://environments-eol.blogspot.gr/ ● Dictionary based ● Open source ● Environment Ontology ● Fast performance: 4000 PubMed abstracts / second * ● Based on the SPECIES name recognition tagger (Pafilis et al., PLoS ONE) ● E600 gold standard: ENVO-based corpus of EOL species pages ● Recognition accuracy, mention level: F1: 82.0%; for 87.1% of the TPs, the exact ID is among the predicted ones ● Submitted preprint: http://biorxiv.org/content/early/2014/11/13/011403 Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8(6): e65390. *: based on a single-thread run on an Intel 2.27 GHz, 24 GB RAM machine, processing a set of 536,052 abstracts
  • 24. ENVO: source of environment descriptor names and synonyms http://environmentontology.org ~1600 terms, June 2013. Hierarchy shown: biome, environmental feature, environmental material, environmental condition, …, habitat, … Based on slides by Dr. Pier Luigi Buttigieg, AWI, Bremerhaven, Germany
  • 25. ENVIRONMENTS – Improving Accuracy
    ● Increasing matches in text
      ● Orthographic variation supported, e.g. freshwater, fresh water, and fresh-water
      ● Case-insensitive matching
      ● Synonym generation to reflect the way environment descriptive terms are mentioned in text (both generic and ENVO specific)
        Action: add a variant in which non-informative words have been removed. Example: epipelagic zone → epipelagic, estuarine biome → estuarine
        Action: plural form addition. Example: sediment → sediments
        Action: adjective form addition. Example: lagoon → lagoonal
    ● Preventing overmatching (i.e. avoiding increased FP)
      ● “Stopword list” (e.g. spring, well, range)
  • 26. Scope: ENVO parts not included: species, tissues, foods. Limitations – Known Issues: negation not supported; conflicts with anatomy terms (e.g. mouth, blowhole)
  • 27. ENVIRONMENTS – Sample Output
    File Name                        Start coord  End coord  Match text  ENVO ID
    eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000192
    eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00002297
    eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000043
    eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000000
    eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000012
    eol_documents_ascii_nonHTML.txt  346289871    346289873  mud         ENVO:01000001
    eol_documents_ascii_nonHTML.txt  346289871    346289873  mud         ENVO:00010483
    eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000180
    eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000191
    eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00002297
    eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000176
    eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000000
    eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000477
    Tags corresponding to “Habitat” text data object: http://eol.org/data_objects/31415353 of EOL Taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
  • 28. ENVIRONMENTS – Sample Output (same rows as slide 27): Traversing all IS_A, PART_OF relationships in ENVO
  • 29. Download ENVIRONMENTS • Home Page: http://environments.hcmr.gr/ • Tagger Software: http://download.jensenlab.org/environments_tagger.tar.gz
  • 30. other forms of access
  • 31. http://eol.org/info/discover_what
  • 32. Interactive Curation: ENVIRONMENTS ID: ENVO:00000150 Name: coral reef ACTION ES1103 http://www.ncbi.nlm.nih.gov/pubmed/18301735
  • 33–36. Interactive Curation ACTION ES1103 http://www.ncbi.nlm.nih.gov/pubmed/18301735
  • 37. ACTION ES1103 Not only ENVO terms
  • 38. ACTION ES1103 http://www.ncbi.nlm.nih.gov/pubmed/18301735
  • 39. What else is being identified? ACTION ES1103 Ready for you to discover!
  • 40. ACTION ES1103
  • 41. Summary
    • Importance of standardized metadata and annotations
    • ENVO: standardized, hierarchically organized descriptions of environment types
    • Literature, project and other scientific-content web pages may describe the environmental context of a metagenomics sample
    • ENVIRONMENTS:
      • Dictionary-based identification of environment descriptive terms
      • Ontological community standards, e.g. ENVO, as name source
      • Command line application
      • Browser extensions, a user-friendly interface: highly interactive; can be used while browsing the web; extract ENVO terms from a selected part of a web page
      • Extended for organism, disease, and tissue mention identification
  • 42. Digging-out Information http://hartpurylrc.files.wordpress.com Photo by Dr Chatzinikolaou E
  • 43. BioCreative: Metagenomics Track. Critical Assessment of Information Extraction in Biology • Preparing a Metagenomics Track as part of the BioCreative 2015 challenge • Aim: improve the environmental-context annotation of sequences in major metagenomics repositories • Track coordinator: Dr. L. Hirschman, MITRE • BioCreative (www.biocreative.org)
  • 44. Biodiversity – Genomics ACTION ES1103
    ENVIRONMENTS-EOL http://environments-eol.blogspot.com/ Encyclopedia of Life (EOL) http://www.eol.org • process EOL taxon pages • extract environmental context (ENVO terms) • EOL Taxon Page: Quick Facts, Data tab • integrated in TraitBank • large scale biological questions. Rubenstein Fellowship 2013. In collab: Jennifer Hammock, Patrick Leary, Katja Schulz, Cyndy Parr. Hexanchus griseus EOL page, http://eol.org/pages/212027. Reused: Analysis of America bird habitats, http://blog.eol.org/ (NoPlaceLikeHome, in collab: Rob Stevenson, Carl Nordman)
    SEQenv http://environments.hcmr.gr/seqenv.html • annotate microbial sequences with ENVO terms • sequence analysis, literature mining, visualization • GenBank isolation source, PubMed abstracts • sample comparison, temporal/spatial pattern analysis • extension: proteins, protein families, 3D visualization
  • 45. http://jensenlab.org/ Santos A et al. (under review), preprint: http://biorxiv.org/content/early/2014/11/10/010975 Frankild S et al. (under review), preprint: http://biorxiv.org/content/early/2014/08/25/008425 Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8(6): e65390
  • 46–47. Acknowledgements. Thank You! HCMR-IMBG: Christos Arvanitidis, Christina Pavloudi, Katerina Vasileiadou, Lucia Fanini, Sarah Faulwetter, Anastasis Oulas; NNF CPR: Lars Juhl Jensen, Sune Frankild; U Mass: Rob Stevenson; Uni Glasgow: Christopher Quince, Umer Ijaz; EOL: Cynthia Parr, Jennifer Hammock, Patrick Leary, Katja Schulz; MM-MPI: J. Schnetzer; AWI: Dr P. Buttigieg; HITS: Dr. S. Berger; and more. Funding: EOL Rubenstein Fellowship, LifeWatch Greece, MARBIGEN, NNF-CPR, EOL-BHL NESCent Research Sprint 2014, “SEQenv” Hackathons (COST ES1103). Amvrakikos Lagoons, May 2011; id: ENVO:00000038 name: lagoon. ACTION ES1103
  • 48. Tutorial • Start Firefox • Install the “megx-seqenv-bar.xpi” • Drag and Drop • “Install Now” and “Restart” • Visit a couple of PubMed abstracts or article web pages of your preference • Annotate the complete abstract • Annotate selected sentences only
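
As a rough illustration of the dictionary-based tagging described on slides 19 to 28, the sketch below generates a few name variants (orthographic, plural, non-informative word removal), filters stopwords, matches case-insensitively, and expands each hit to its ancestors. It is not the ENVIRONMENTS tagger's code. The function names, the `PARENTS` links, the `EX:` identifiers and the file name `example.txt` are all hypothetical; only the two ENVO IDs come from the slides. The end coordinate is written inclusively because the sample output appears to use inclusive coordinates ("mud" spans 346289871 to 346289873).

```python
import re

# Names and IDs taken from the slides (slide 21 and the acknowledgements slide).
ENVO_NAMES = {
    "ENVO:00000150": "coral reef",
    "ENVO:00000038": "lagoon",
}
# HYPOTHETICAL parent links, for illustration only. Real IS_A / PART_OF
# edges would have to be read from the ENVO ontology itself.
PARENTS = {
    "ENVO:00000150": ["EX:reef"],
    "EX:reef": ["EX:marine_feature"],
}
NON_INFORMATIVE = {"zone", "biome"}      # e.g. "epipelagic zone" -> "epipelagic"
STOPWORDS = {"spring", "well", "range"}  # examples listed on slide 25


def variants(name):
    """Generate surface forms for one dictionary name (naive rules)."""
    words = name.lower().split()
    forms = {" ".join(words)}
    if len(words) > 1:
        if words[-1] in NON_INFORMATIVE:
            forms.add(" ".join(words[:-1]))
        forms.add("-".join(words))  # fresh-water style
        forms.add("".join(words))   # freshwater style
    forms |= {form + "s" for form in forms}  # naive plural, no morphology
    return {form for form in forms if form not in STOPWORDS}


def build_dictionary(names):
    """Map every surface form to the set of IDs it may denote."""
    lookup = {}
    for term_id, name in names.items():
        for form in variants(name):
            lookup.setdefault(form, set()).add(term_id)
    return lookup


def ancestors(term_id, parents):
    """Return term_id plus every term reachable through parent links."""
    seen, stack = set(), [term_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def tag(text, lookup, parents):
    """Return (start, inclusive end, matched text, ID) rows."""
    # Longest forms first so the regex prefers the longest variant.
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )
    rows = []
    for match in pattern.finditer(text):
        ids = set()
        for term_id in lookup[match.group(1).lower()]:
            ids |= ancestors(term_id, parents)
        for term_id in sorted(ids):
            rows.append((match.start(), match.end() - 1, match.group(1), term_id))
    return rows


text = "Microbes are key players in both healthy and degraded coral reefs."
lookup = build_dictionary(ENVO_NAMES)
for start, end, matched, term_id in tag(text, lookup, PARENTS):
    print("example.txt", start, end, matched, term_id, sep="\t")
```

One row is printed per matched mention and per ID, so a single mention produces several rows once ancestors are included, which mirrors the repeated "mud flats" rows on the sample output slides.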