Unlocking knowledge in biodiversity legacy literature through automatic semantic metadata extraction
Riza Batista-Navarro, William Ulate, Jennifer Hammock, Georgios Kontonatsios, Trish Rose-Sandler and Sophia Ananiadou
Structured
Data
? Text
Mining
http://miningbiodiversity.org
The partners
Social Media Lab
10/9/2015 Mining Biodiversity
Mining Biodiversity
• Transform BHL into a next-generation social digital library
• A multi-disciplinary approach
– Text Mining
– Machine learning
– History of Science
– Environmental History & Studies
– Library and Information Science
– Social Media
What do we want to do?
Social Media
Visualisation
Semantic Metadata
Biodiversity Heritage Library
• a consortium of botanical and natural history libraries
• stores digitised legacy literature on biodiversity
• currently holds 160,000 volumes, amounting to millions of pages (PDFs and OCR-generated text)
• open-access
Current features
• supports keyword-based search
• species names annotated and linked to the Encyclopedia of Life
• integrates automatic taxonomic name-finding tools (uBio TaxonFinder)
• data access through export functionalities and Web services
Keyword-based search and browsing
Advanced search (also keyword-based)
What’s wrong with keyword-based search?
• Ambiguity!
– Boxwood: historic place in Alabama? North American term for plants in the Buxaceae family?
– Box: container? Boxwood for other English-speaking countries?
– California bay: hardwood tree? location?
– Drum: musical instrument? fish?
– Emperor: fish? person?
– Scrambled eggs: food? plant?
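The ambiguity problem can be made concrete with a minimal sketch. The three sentences and the `keyword_search` helper below are invented for illustration; they are not BHL content and do not reflect how BHL implements its search.

```python
# Toy corpus: invented sentences, not BHL content.
DOCS = {
    "d1": "The drum is a fish found in coastal waters.",
    "d2": "He played the drum at the village festival.",
    "d3": "Boxwood is a shrub in the Buxaceae family.",
}


def keyword_search(query, docs):
    """Return ids of documents whose text contains the query, ignoring case."""
    q = query.lower()
    return sorted(doc_id for doc_id, text in docs.items() if q in text.lower())


# A plain string match cannot tell the fish from the musical instrument,
# so both documents come back for the same query.
hits = keyword_search("drum", DOCS)
print(hits)
```

A user interested only in the fish has to read every hit to discard the irrelevant sense, which is the gap that semantic metadata is meant to close.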
Semantic metadata generation
• Entity types
– species
– location
– habitat
– anatomical parts
– qualities
– persons
– temporal expressions
• Association types
– observation
– habitation
– nutrition
– trait
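One possible way to represent these entity and association types in code is sketched below. The class names, field names and the example sentence are assumptions made for illustration; the slides do not specify the project's actual annotation schema.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    SPECIES = "species"
    LOCATION = "location"
    HABITAT = "habitat"
    ANATOMICAL_PART = "anatomical part"
    QUALITY = "quality"
    PERSON = "person"
    TEMPORAL_EXPRESSION = "temporal expression"


class AssociationType(Enum):
    OBSERVATION = "observation"
    HABITATION = "habitation"
    NUTRITION = "nutrition"
    TRAIT = "trait"


@dataclass(frozen=True)
class Entity:
    text: str          # surface form as it appears in the document
    type: EntityType
    start: int         # character offset where the mention begins
    end: int           # character offset just past the mention


@dataclass(frozen=True)
class Association:
    type: AssociationType
    arguments: tuple   # the entities that the association links


def annotate_span(sentence, phrase, entity_type):
    """Build an Entity for the first occurrence of phrase in sentence."""
    start = sentence.index(phrase)
    return Entity(phrase, entity_type, start, start + len(phrase))


# Invented example sentence, not taken from BHL.
sentence = "Boxwood grows in shaded woodland."
species = annotate_span(sentence, "Boxwood", EntityType.SPECIES)
habitat = annotate_span(sentence, "shaded woodland", EntityType.HABITAT)
habitation = Association(AssociationType.HABITATION, (species, habitat))
```

Storing character offsets alongside the type keeps each annotation tied to a specific span of text, so the same string can carry different types in different documents.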
Examples of semantic metadata (annotations)
• Observation
• Habitation
• Nutrition
• Trait
How does semantic information help?
• SPECIES: California bay (hardwood tree)
• LOCATION: California bay (location)
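A minimal sketch of how typed annotations could narrow a search follows. The document records and the `semantic_search` function are hypothetical and only illustrate the idea of filtering by entity type.

```python
# Hypothetical annotated documents: each mention carries an entity type.
ANNOTATED_DOCS = [
    {"id": "d1", "mentions": [("California bay", "SPECIES")]},
    {"id": "d2", "mentions": [("California bay", "LOCATION")]},
    {"id": "d3", "mentions": [("drum", "SPECIES"), ("California bay", "LOCATION")]},
]


def semantic_search(term, entity_type, docs):
    """Return ids of documents that mention term with the requested type."""
    wanted = (term.lower(), entity_type)
    return [
        doc["id"]
        for doc in docs
        if any((text.lower(), etype) == wanted for text, etype in doc["mentions"])
    ]


# The same string now yields different result sets depending on the type.
as_species = semantic_search("California bay", "SPECIES", ANNOTATED_DOCS)
as_location = semantic_search("California bay", "LOCATION", ANNOTATED_DOCS)
print(as_species, as_location)
```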
Text mining-based approach
Seed documents
Unlabelled documents
Learn semantics
Annotator/Curator
Validate
Feedback
Annotate
Search index
Store
Annotate
Automatic annotation by text mining (TM)
– Web-based, graphical TM workbench
– conforms to the Unstructured Information Management Architecture (UIMA) standard
– facilitates the straightforward integration of various analytics into workflows
– allows for the validation of annotations
interface
Learning semantics
• Training of models using machine learning
– conditional random fields (CRFs) for sequence labelling
– learning the features of mentions and relations of interest based on labelled documents
• contextual features: surrounding, co-occurring words
• dictionary matches: presence of certain words in controlled vocabularies, e.g., Catalogue of Life, Phenotype and Trait Ontology, Gazetteer
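The two feature families named above can be sketched as a per-token feature function. The tiny word sets stand in for real controlled vocabularies and the feature names are invented for this example; the project's actual feature set is not given in the slides. Feature dictionaries of this general shape are a common input format for CRF toolkits.

```python
# Hypothetical mini-dictionaries standing in for controlled vocabularies.
SPECIES_TERMS = {"boxwood", "drum"}
PLACE_TERMS = {"alabama", "california"}


def token_features(tokens, i, window=1):
    """Build a feature dict for tokens[i] from its context and dictionary hits."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        # Dictionary-match features.
        "in_species_dict": word.lower() in SPECIES_TERMS,
        "in_place_dict": word.lower() in PLACE_TERMS,
    }
    # Contextual features: neighbouring words within the window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:word.lower"] = tokens[j].lower()
        else:
            feats[f"{offset:+d}:word.lower"] = "<PAD>"  # sentence boundary
    return feats


# Invented example sentence, not taken from BHL.
tokens = "Boxwood grows in Alabama".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
```

A sequence labeller would be trained on lists of such dictionaries paired with one label per token taken from the labelled documents.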
interface
Annotation workflow
• Pre-processing
• Dictionary lookup
• Machine learning-based recognition
• Relation extraction
• Saving
Validation interface
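The order of the workflow stages can be illustrated with a toy pipeline. Every function here is a placeholder written for this sketch: the recognition and relation steps use trivial rules instead of trained models, the dictionary is hypothetical, and saving is reduced to JSON serialisation. None of it reflects the components of the actual workbench.

```python
import json
import re

SPECIES_TERMS = {"boxwood"}  # hypothetical dictionary


def preprocess(text):
    """Pre-processing: split text into (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def dictionary_lookup(tokens):
    """Dictionary lookup: tag tokens found in the species dictionary."""
    return [
        {"text": tok, "start": s, "end": e, "type": "species"}
        for tok, s, e in tokens
        if tok.lower() in SPECIES_TERMS
    ]


def ml_recognition(tokens, entities):
    """Stand-in for a trained model: capitalised non-initial tokens become locations."""
    known = {(e["start"], e["end"]) for e in entities}
    found = [
        {"text": tok, "start": s, "end": e, "type": "location"}
        for tok, s, e in tokens
        if s > 0 and tok.istitle() and (s, e) not in known
    ]
    return entities + found


def relation_extraction(entities):
    """Stand-in rule: link every species to every location in the same text."""
    species = [e for e in entities if e["type"] == "species"]
    places = [e for e in entities if e["type"] == "location"]
    return [
        {"type": "observation", "species": s["text"], "location": p["text"]}
        for s in species
        for p in places
    ]


def run_workflow(text):
    """Run the stages in order and return the annotations as a JSON string."""
    tokens = preprocess(text)
    entities = ml_recognition(tokens, dictionary_lookup(tokens))
    relations = relation_extraction(entities)
    return json.dumps({"text": text, "entities": entities, "relations": relations})


# Invented example sentence, not taken from BHL.
saved = run_workflow("Boxwood was collected in Alabama.")
print(saved)
```

Keeping each stage as a separate function mirrors the idea of composing independent analytics into a workflow, where any stage can be swapped without touching the others.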
Enhanced searching of BHL content
• Faceted search
• Automatically generated questions
• Time-sensitive search
Enhanced document viewing
• Page in PDF/image format
• OCR-corrected text with colour-coded annotations
Conclusions
• Literature is a rich source of information but difficult to search
• Keyword-based search is not enough to address ambiguity
• Semantic metadata allows for more accurate searching
• Semantic metadata can be extracted using text mining tools
• The Argo text mining workbench facilitates the construction of custom semantic metadata generation workflows

Más contenido relacionado

La actualidad más candente

We've Got Issues: Issue Tracking and Workflow in the Digital Library
We've Got Issues: Issue Tracking and Workflow in the Digital LibraryWe've Got Issues: Issue Tracking and Workflow in the Digital Library
We've Got Issues: Issue Tracking and Workflow in the Digital LibraryElectronic Resources & Libraries
 
2009 05 20 Cimc Pilsk
2009 05 20 Cimc Pilsk2009 05 20 Cimc Pilsk
2009 05 20 Cimc PilskSCPilsk
 
Bhl knowledge-ecology-rlg-collaboration
Bhl knowledge-ecology-rlg-collaborationBhl knowledge-ecology-rlg-collaboration
Bhl knowledge-ecology-rlg-collaborationtgarnett
 
Building a Global Library of Taxonomic Literature
Building a Global Library of Taxonomic LiteratureBuilding a Global Library of Taxonomic Literature
Building a Global Library of Taxonomic LiteratureMartin Kalfatovic
 
Cybertaxonomy may 31 2011
Cybertaxonomy may 31 2011Cybertaxonomy may 31 2011
Cybertaxonomy may 31 2011tgarnett
 
The Biodiversity Heritage Library: Workflow Overview
The Biodiversity Heritage Library: Workflow OverviewThe Biodiversity Heritage Library: Workflow Overview
The Biodiversity Heritage Library: Workflow OverviewMartin Kalfatovic
 
Smithsonian Libraries Partnering in Research
Smithsonian Libraries Partnering in ResearchSmithsonian Libraries Partnering in Research
Smithsonian Libraries Partnering in ResearchSCPilsk
 
“Yet Another BHL Presentation”: The Biodiversity Heritage Library
“Yet Another BHL Presentation”: The Biodiversity Heritage Library“Yet Another BHL Presentation”: The Biodiversity Heritage Library
“Yet Another BHL Presentation”: The Biodiversity Heritage LibraryMartin Kalfatovic
 
M sc advanced food marketing finding info
M sc advanced food marketing   finding infoM sc advanced food marketing   finding info
M sc advanced food marketing finding infonmjb
 
Stage 2 animal science finding info
Stage 2 animal science   finding infoStage 2 animal science   finding info
Stage 2 animal science finding infonmjb
 
Eol fellow-march2010
Eol fellow-march2010Eol fellow-march2010
Eol fellow-march2010tgarnett
 
Smithsonian Libraries 2.0 and the Biodiversity Heritage Library Project
Smithsonian Libraries 2.0 and the Biodiversity Heritage Library ProjectSmithsonian Libraries 2.0 and the Biodiversity Heritage Library Project
Smithsonian Libraries 2.0 and the Biodiversity Heritage Library ProjectMartin Kalfatovic
 
Digital Services Division & The Biodiversity Heritage Library
Digital Services Division & The Biodiversity Heritage LibraryDigital Services Division & The Biodiversity Heritage Library
Digital Services Division & The Biodiversity Heritage LibraryMartin Kalfatovic
 
3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage Library3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage LibraryMartin Kalfatovic
 
Botany and the BHL: A Botanical Overview of the Biodiversity Heritage Library
Botany and the BHL: A Botanical Overview of the Biodiversity Heritage LibraryBotany and the BHL: A Botanical Overview of the Biodiversity Heritage Library
Botany and the BHL: A Botanical Overview of the Biodiversity Heritage LibraryMartin Kalfatovic
 
Biodiversity Heritage Library: Cornerstone of the Encyclopedia of Life
Biodiversity Heritage Library: Cornerstone of the Encyclopedia of LifeBiodiversity Heritage Library: Cornerstone of the Encyclopedia of Life
Biodiversity Heritage Library: Cornerstone of the Encyclopedia of LifeMartin Kalfatovic
 
The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life
The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of LifeThe Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life
The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of LifeMartin Kalfatovic
 
Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...
Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...
Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...Trish Rose-Sandler
 
The Biodiversity Heritage Library. 10+1 and Beyond: Looking Forward
The Biodiversity Heritage Library. 10+1 and Beyond: Looking ForwardThe Biodiversity Heritage Library. 10+1 and Beyond: Looking Forward
The Biodiversity Heritage Library. 10+1 and Beyond: Looking ForwardMartin Kalfatovic
 
Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment
Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment
Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment ICZN
 

La actualidad más candente (20)

We've Got Issues: Issue Tracking and Workflow in the Digital Library
We've Got Issues: Issue Tracking and Workflow in the Digital LibraryWe've Got Issues: Issue Tracking and Workflow in the Digital Library
We've Got Issues: Issue Tracking and Workflow in the Digital Library
 
2009 05 20 Cimc Pilsk
2009 05 20 Cimc Pilsk2009 05 20 Cimc Pilsk
2009 05 20 Cimc Pilsk
 
Bhl knowledge-ecology-rlg-collaboration
Bhl knowledge-ecology-rlg-collaborationBhl knowledge-ecology-rlg-collaboration
Bhl knowledge-ecology-rlg-collaboration
 
Building a Global Library of Taxonomic Literature
Building a Global Library of Taxonomic LiteratureBuilding a Global Library of Taxonomic Literature
Building a Global Library of Taxonomic Literature
 
Cybertaxonomy may 31 2011
Cybertaxonomy may 31 2011Cybertaxonomy may 31 2011
Cybertaxonomy may 31 2011
 
The Biodiversity Heritage Library: Workflow Overview
The Biodiversity Heritage Library: Workflow OverviewThe Biodiversity Heritage Library: Workflow Overview
The Biodiversity Heritage Library: Workflow Overview
 
Smithsonian Libraries Partnering in Research
Smithsonian Libraries Partnering in ResearchSmithsonian Libraries Partnering in Research
Smithsonian Libraries Partnering in Research
 
“Yet Another BHL Presentation”: The Biodiversity Heritage Library
“Yet Another BHL Presentation”: The Biodiversity Heritage Library“Yet Another BHL Presentation”: The Biodiversity Heritage Library
“Yet Another BHL Presentation”: The Biodiversity Heritage Library
 
M sc advanced food marketing finding info
M sc advanced food marketing   finding infoM sc advanced food marketing   finding info
M sc advanced food marketing finding info
 
Stage 2 animal science finding info
Stage 2 animal science   finding infoStage 2 animal science   finding info
Stage 2 animal science finding info
 
Eol fellow-march2010
Eol fellow-march2010Eol fellow-march2010
Eol fellow-march2010
 
Smithsonian Libraries 2.0 and the Biodiversity Heritage Library Project
Smithsonian Libraries 2.0 and the Biodiversity Heritage Library ProjectSmithsonian Libraries 2.0 and the Biodiversity Heritage Library Project
Smithsonian Libraries 2.0 and the Biodiversity Heritage Library Project
 
Digital Services Division & The Biodiversity Heritage Library
Digital Services Division & The Biodiversity Heritage LibraryDigital Services Division & The Biodiversity Heritage Library
Digital Services Division & The Biodiversity Heritage Library
 
3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage Library3 Years On: The Biodiversity Heritage Library
3 Years On: The Biodiversity Heritage Library
 
Botany and the BHL: A Botanical Overview of the Biodiversity Heritage Library
Botany and the BHL: A Botanical Overview of the Biodiversity Heritage LibraryBotany and the BHL: A Botanical Overview of the Biodiversity Heritage Library
Botany and the BHL: A Botanical Overview of the Biodiversity Heritage Library
 
Biodiversity Heritage Library: Cornerstone of the Encyclopedia of Life
Biodiversity Heritage Library: Cornerstone of the Encyclopedia of LifeBiodiversity Heritage Library: Cornerstone of the Encyclopedia of Life
Biodiversity Heritage Library: Cornerstone of the Encyclopedia of Life
 
The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life
The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of LifeThe Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life
The Biodiversity Heritage Library: A Cornerstone of the Encyclopedia of Life
 
Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...
Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...
Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Li...
 
The Biodiversity Heritage Library. 10+1 and Beyond: Looking Forward
The Biodiversity Heritage Library. 10+1 and Beyond: Looking ForwardThe Biodiversity Heritage Library. 10+1 and Beyond: Looking Forward
The Biodiversity Heritage Library. 10+1 and Beyond: Looking Forward
 
Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment
Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment
Donat Agosti - Copyright, Biopiracy and the Taxonomic Impediment
 

Destacado

Mastering sap business objects 2011
Mastering sap business objects 2011Mastering sap business objects 2011
Mastering sap business objects 2011ldasss
 
Media
MediaMedia
MediaLaura
 
реклама стокгольма
реклама стокгольмареклама стокгольма
реклама стокгольмаguest2adea9
 
Dmd Group West101009
Dmd Group West101009Dmd Group West101009
Dmd Group West101009dmdwest
 
BHL Tech Status Update Tech Director W.Ulate 2015.12.11
BHL Tech Status Update Tech Director W.Ulate 2015.12.11BHL Tech Status Update Tech Director W.Ulate 2015.12.11
BHL Tech Status Update Tech Director W.Ulate 2015.12.11William Ulate
 
A new flora fauna mycota should...
A new flora fauna mycota should...A new flora fauna mycota should...
A new flora fauna mycota should...William Ulate
 
The Biodiversity Heritage Library: an Open Global Resource of Literature for ...
The Biodiversity Heritage Library: an Open Global Resource of Literature for ...The Biodiversity Heritage Library: an Open Global Resource of Literature for ...
The Biodiversity Heritage Library: an Open Global Resource of Literature for ...William Ulate
 

Destacado (7)

Mastering sap business objects 2011
Mastering sap business objects 2011Mastering sap business objects 2011
Mastering sap business objects 2011
 
Media
MediaMedia
Media
 
реклама стокгольма
реклама стокгольмареклама стокгольма
реклама стокгольма
 
Dmd Group West101009
Dmd Group West101009Dmd Group West101009
Dmd Group West101009
 
BHL Tech Status Update Tech Director W.Ulate 2015.12.11
BHL Tech Status Update Tech Director W.Ulate 2015.12.11BHL Tech Status Update Tech Director W.Ulate 2015.12.11
BHL Tech Status Update Tech Director W.Ulate 2015.12.11
 
A new flora fauna mycota should...
A new flora fauna mycota should...A new flora fauna mycota should...
A new flora fauna mycota should...
 
The Biodiversity Heritage Library: an Open Global Resource of Literature for ...
The Biodiversity Heritage Library: an Open Global Resource of Literature for ...The Biodiversity Heritage Library: an Open Global Resource of Literature for ...
The Biodiversity Heritage Library: an Open Global Resource of Literature for ...
 

Similar a Unlocking knowledge in biodiversity legacy literature through automatic semantic metadata extraction

FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?EUDAT
 
Evolving Scholarly Record - implications for rank and reputation assessment
Evolving Scholarly Record - implications for rank and reputation assessmentEvolving Scholarly Record - implications for rank and reputation assessment
Evolving Scholarly Record - implications for rank and reputation assessmentConstance Malpas
 
Transformation of library and information science: Resources, services and pr...
Transformation of library and information science: Resources, services and pr...Transformation of library and information science: Resources, services and pr...
Transformation of library and information science: Resources, services and pr...Nabi Hasan
 
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...Descubrimiento, entrega de información y gestión: tendencias actuales de las ...
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...innovatics
 
Bibliographic References in BHL
Bibliographic References in BHLBibliographic References in BHL
Bibliographic References in BHLWilliam Ulate
 
Νetworking content repositories to provide meaningful services to users
Νetworking content repositories to provide meaningful services to usersΝetworking content repositories to provide meaningful services to users
Νetworking content repositories to provide meaningful services to users Nikos Manouselis
 
Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)Getaneh Alemu
 
Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...
Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...
Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...CALA-MW
 
Emerging trends in librarianship
Emerging trends in librarianshipEmerging trends in librarianship
Emerging trends in librarianshipH Anil Kumar
 
Ontology Web services for Semantic Applications
Ontology Web services for Semantic ApplicationsOntology Web services for Semantic Applications
Ontology Web services for Semantic ApplicationsTrish Whetzel
 
Information Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information CentersInformation Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information CentersEdeama Onwuchekwa
 
Change Management for Libraries
Change Management for LibrariesChange Management for Libraries
Change Management for LibrariesThomas King
 
Ontology-based Tools to Enhance the Curation Workflow
Ontology-based Tools to Enhance the Curation WorkflowOntology-based Tools to Enhance the Curation Workflow
Ontology-based Tools to Enhance the Curation WorkflowTrish Whetzel
 

Similar a Unlocking knowledge in biodiversity legacy literature through automatic semantic metadata extraction (20)

FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?
 
Ontology repositories and case study with OntoPortal
Ontology repositories and case study with OntoPortalOntology repositories and case study with OntoPortal
Ontology repositories and case study with OntoPortal
 
FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?FAIR data requires FAIR ontologies, how do we do?
FAIR data requires FAIR ontologies, how do we do?
 
Presentation FAIRsFAIR workshop (April 2020)
Presentation FAIRsFAIR workshop (April 2020)Presentation FAIRsFAIR workshop (April 2020)
Presentation FAIRsFAIR workshop (April 2020)
 
Evolving Scholarly Record - implications for rank and reputation assessment
Evolving Scholarly Record - implications for rank and reputation assessmentEvolving Scholarly Record - implications for rank and reputation assessment
Evolving Scholarly Record - implications for rank and reputation assessment
 
Semantic standards for the web
Semantic standards for the webSemantic standards for the web
Semantic standards for the web
 
Challenges for ontology repositories and applications to biomedicine and agro...
Challenges for ontology repositories and applications to biomedicine and agro...Challenges for ontology repositories and applications to biomedicine and agro...
Challenges for ontology repositories and applications to biomedicine and agro...
 
Transformation of library and information science: Resources, services and pr...
Transformation of library and information science: Resources, services and pr...Transformation of library and information science: Resources, services and pr...
Transformation of library and information science: Resources, services and pr...
 
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...Descubrimiento, entrega de información y gestión: tendencias actuales de las ...
Descubrimiento, entrega de información y gestión: tendencias actuales de las ...
 
Bibliographic References in BHL
Bibliographic References in BHLBibliographic References in BHL
Bibliographic References in BHL
 
4th Special Track on Metadata and Semantics for Agriculture, Food and Enviro...
4th Special Track on Metadata and Semanticsfor Agriculture, Food and Enviro...4th Special Track on Metadata and Semanticsfor Agriculture, Food and Enviro...
4th Special Track on Metadata and Semantics for Agriculture, Food and Enviro...
 
Νetworking content repositories to provide meaningful services to users
Νetworking content repositories to provide meaningful services to usersΝetworking content repositories to provide meaningful services to users
Νetworking content repositories to provide meaningful services to users
 
Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)Current metadata landscape in the library world (Getaneh Alemu)
Current metadata landscape in the library world (Getaneh Alemu)
 
Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...
Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...
Gaining Weight for Good Reason: Analysis of Fuller Bibliographic Records in S...
 
Emerging trends in librarianship
Emerging trends in librarianshipEmerging trends in librarianship
Emerging trends in librarianship
 
Presentation OntoCommons Workshop March 2021
Presentation OntoCommons Workshop March 2021Presentation OntoCommons Workshop March 2021
Presentation OntoCommons Workshop March 2021
 
Ontology Web services for Semantic Applications
Ontology Web services for Semantic ApplicationsOntology Web services for Semantic Applications
Ontology Web services for Semantic Applications
 
Information Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information CentersInformation Retrieval Methods in Libraries and Information Centers
Information Retrieval Methods in Libraries and Information Centers
 
Change Management for Libraries
Change Management for LibrariesChange Management for Libraries
Change Management for Libraries
 
Ontology-based Tools to Enhance the Curation Workflow
Ontology-based Tools to Enhance the Curation WorkflowOntology-based Tools to Enhance the Curation Workflow
Ontology-based Tools to Enhance the Curation Workflow
 

Más de William Ulate

Enhancing the WFO in support of GSPC.pptx
Enhancing the WFO in support of GSPC.pptxEnhancing the WFO in support of GSPC.pptx
Enhancing the WFO in support of GSPC.pptxWilliam Ulate
 
Finding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital libraryFinding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital libraryWilliam Ulate
 
Botanists and annotations printer friendly
Botanists and annotations   printer friendlyBotanists and annotations   printer friendly
Botanists and annotations printer friendlyWilliam Ulate
 
Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...
Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...
Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...William Ulate
 
BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014William Ulate
 
BHL Markup Efforts and Plans
BHL Markup Efforts and PlansBHL Markup Efforts and Plans
BHL Markup Efforts and PlansWilliam Ulate
 
Purposeful Gaming and BHL
Purposeful Gaming and BHLPurposeful Gaming and BHL
Purposeful Gaming and BHLWilliam Ulate
 
Fourth Global BHL Meeting - Technical Update
Fourth Global BHL Meeting - Technical UpdateFourth Global BHL Meeting - Technical Update
Fourth Global BHL Meeting - Technical UpdateWilliam Ulate
 
BHL Technical Update (May 2013)
BHL Technical Update (May 2013)BHL Technical Update (May 2013)
BHL Technical Update (May 2013)William Ulate
 
Global BHL Update May 2013
Global BHL Update May 2013Global BHL Update May 2013
Global BHL Update May 2013William Ulate
 
The BHL way to content
The BHL way to contentThe BHL way to content
The BHL way to contentWilliam Ulate
 
TDWG 2012 Poster for Art of Life project
TDWG 2012 Poster for Art of Life projectTDWG 2012 Poster for Art of Life project
TDWG 2012 Poster for Art of Life projectWilliam Ulate
 
BHL Technical Projects Updates
BHL Technical Projects UpdatesBHL Technical Projects Updates
BHL Technical Projects UpdatesWilliam Ulate
 
BHL: Toward a Global, Sustainable Resource
BHL: Toward a Global, Sustainable ResourceBHL: Toward a Global, Sustainable Resource
BHL: Toward a Global, Sustainable ResourceWilliam Ulate
 
Global BHL Meeting Action Items
Global BHL Meeting Action ItemsGlobal BHL Meeting Action Items
Global BHL Meeting Action ItemsWilliam Ulate
 

Más de William Ulate (15)

Enhancing the WFO in support of GSPC.pptx
Enhancing the WFO in support of GSPC.pptxEnhancing the WFO in support of GSPC.pptx
Enhancing the WFO in support of GSPC.pptx
 
Finding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital libraryFinding the annotation needs of the botanical community in a digital library
Finding the annotation needs of the botanical community in a digital library
 
Botanists and annotations printer friendly
Botanists and annotations   printer friendlyBotanists and annotations   printer friendly
Botanists and annotations printer friendly
 
Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...
Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...
Digitalización de Literatura de Biodiversidad: an overview of the BHL for CON...
 
BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014BHL Technical Director's Report, Mar. 2014
BHL Technical Director's Report, Mar. 2014
 
BHL Markup Efforts and Plans
BHL Markup Efforts and PlansBHL Markup Efforts and Plans
BHL Markup Efforts and Plans
 
Purposeful Gaming and BHL
Purposeful Gaming and BHLPurposeful Gaming and BHL
Purposeful Gaming and BHL
 
Fourth Global BHL Meeting - Technical Update
Fourth Global BHL Meeting - Technical UpdateFourth Global BHL Meeting - Technical Update
Fourth Global BHL Meeting - Technical Update
 
BHL Technical Update (May 2013)
BHL Technical Update (May 2013)BHL Technical Update (May 2013)
BHL Technical Update (May 2013)
 
Global BHL Update May 2013
Global BHL Update May 2013Global BHL Update May 2013
Global BHL Update May 2013
 
The BHL way to content
The BHL way to contentThe BHL way to content
The BHL way to content
 
TDWG 2012 Poster for Art of Life project
TDWG 2012 Poster for Art of Life projectTDWG 2012 Poster for Art of Life project
TDWG 2012 Poster for Art of Life project
 
BHL Technical Projects Updates
BHL Technical Projects UpdatesBHL Technical Projects Updates
BHL Technical Projects Updates
 
BHL: Toward a Global, Sustainable Resource
BHL: Toward a Global, Sustainable ResourceBHL: Toward a Global, Sustainable Resource
BHL: Toward a Global, Sustainable Resource
 
Global BHL Meeting Action Items
Global BHL Meeting Action ItemsGlobal BHL Meeting Action Items
Global BHL Meeting Action Items
 

Último

Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Carbon Dioxide Capture and Storage (CSS)
Carbon Dioxide Capture and Storage (CSS)Tamer Koksalan, PhD
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Microteaching on terms used in filtration .Pharmaceutical Engineering


Unlocking knowledge in biodiversity legacy literature through automatic semantic metadata extraction

Editor's notes

  1. Most of us in the biodiversity informatics community are reliant on curated databases such as EOL (click) and NCBI Taxonomy (click). Indeed, they are some of the most fundamental sources of structured information that is critical to understanding biodiversity (click). Another rich, albeit less exploited, resource is biodiversity literature (click), which provides possibly even more comprehensive information, considering that any significant findings have most likely been published in one form of writing or another: in reports, articles, books or monographs. However, unlike curated databases, which provide information in a structured, readily computable form, literature collections are characterised by copious textual data expressed in natural language. This unstructured and voluminous nature of literature makes it difficult to find information of interest, thus posing a barrier to knowledge accessibility and discovery (click). As many of you know, the Biodiversity Heritage Library or BHL holds the biggest literature collection on biodiversity. In this talk, I will be describing our work on how we are extracting semantic content from BHL and putting it in a structured form that is a lot easier to access and search (click), and how we’re using text mining as the enabling technology for this (click).
  2. We are doing this work as part of a project funded by the transatlantic Digging Into Data program called Mining Biodiversity.
  3. In a nutshell, as part of the Mining Biodiversity project we have incorporated three elements into BHL: Visualisation, Social Media and Semantic Metadata. The rest of this talk will focus on the extraction of semantic metadata (click).
  4. One might say, “I’m perfectly happy with how I’m searching BHL. What’s wrong with keywords?” The answer is ambiguity! If one searches for “Boxwood”, a keyword-based system wouldn’t know whether the user was referring to a place in Alabama or to the North American term for plants in the Buxaceae family. It will just return all documents pertaining to both. Nor will it know whether the query “Box” refers to a container or to that same plant family, which is what other English-speaking countries call it.
  5. Or “California bay”. A keyword-based system will not know if the user is referring to the hardwood tree or some location. What about “Drum”? Is it a fish or a musical instrument?
  6. “Emperor” too. It wouldn’t know if the user wants the fish or a person. Even “Scrambled eggs”. Is it breakfast or the plant known as such?
  7. To alleviate such issues we are enriching BHL content with semantic metadata. To this end, we are marking up mentions of different entity and association types within text. For entities, we are capturing species, locations, habitats, anatomical parts, qualities, people and temporal expressions. To capture associations, we link up these entities to encapsulate relationships such as observation, habitation and nutrition.
  8. So why does semantic information help? With semantic categorisation of terms, if a user specifies that he/she is looking for California bay in the SPECIES sense of the term, the system knows it should look for documents which contain a species entity of that name. And if the user specifies that he/she is looking for a LOCATION called California bay, then similarly the system knows it should look for documents in which “California bay” has been annotated as the name of a place or location.
  9. In fleshing out the semantics from BHL documents, we took a text mining-based approach, the overall architecture of which is depicted in this figure (click). Firstly, we set aside a seed set of documents which were manually annotated (click). This set was used by our system to learn the semantics, i.e., entities and associations, in the documents (click). The system then applies what it has learned to unlabelled documents (click). The annotations the system produces on these documents are then validated manually by an expert (click). Whatever corrections the expert makes are fed back into the system and are used by the system to learn again, in order to improve itself; this is active learning. When the performance of the system is satisfactory, we run the final version of the system on the whole BHL collection and (click) store all of the generated annotations or semantic metadata in a search index, e.g., Solr. This index is what we’re using to complement the bibliographic metadata in BHL.
  10. This is Argo’s main interface. Argo comes with a library of various text mining components, which you can see on the left panel. Basically, these components can be dragged and dropped to the canvas in the middle which serves as a block diagramming tool. The user can then arrange these components according to the desired order of processing, and interconnect them to form a pipeline or workflow.
  11. What did we mean earlier by “learning semantics”? How does the text mining system or Argo workflow do this?
  12. This is Argo’s main interface. Argo comes with a library of various text mining components, which you can see on the left panel. Basically, these components can be dragged and dropped to the canvas in the middle which serves as a block diagramming tool. The user can then arrange these components according to the desired order of processing, and interconnect them to form a pipeline or workflow.
  13. This is the workflow that we put together using Argo. Without going too much into detail, I will just point out the general types of processing it tries to do: pre-processing (sentence splitting, tokenisation and part-of-speech tagging), matching against dictionaries or controlled vocabularies such as the ENVO and PATO ontologies, machine learning-based recognition of entities, extraction of relations based on the results of dependency parsing, and serialisation of the generated annotations.
  14. Additionally, Argo allows users to validate or correct any of the automatically generated annotations.
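
The typed search described in notes 7 and 8 can be illustrated with a small sketch. This is not the project's implementation (the notes say the real annotations are stored in a search index such as Solr); it is a minimal in-memory stand-in, and every name in it (`Annotation`, `build_index`, `typed_search`, `keyword_search`, the sample document ids) is hypothetical. It only shows why storing an entity type next to each mention lets a query for “California bay” as a SPECIES be separated from the same string as a LOCATION.

```python
from dataclasses import dataclass

# Entity types listed in the talk; the identifier spellings are assumptions.
ENTITY_TYPES = {
    "SPECIES", "LOCATION", "HABITAT", "ANATOMICAL_PART",
    "QUALITY", "PERSON", "TEMPORAL_EXPRESSION",
}


@dataclass(frozen=True)
class Annotation:
    """One annotated mention: which document, what text, which entity type."""
    doc_id: str
    text: str
    entity_type: str


def build_index(annotations):
    """Map (lower-cased mention, entity type) to the set of document ids."""
    index = {}
    for ann in annotations:
        if ann.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {ann.entity_type}")
        key = (ann.text.lower(), ann.entity_type)
        index.setdefault(key, set()).add(ann.doc_id)
    return index


def typed_search(index, term, entity_type):
    """Return documents where `term` was annotated with `entity_type`."""
    return sorted(index.get((term.lower(), entity_type), set()))


def keyword_search(index, term):
    """Return documents mentioning `term` in any sense (the ambiguous case)."""
    docs = set()
    for (text, _entity_type), doc_ids in index.items():
        if text == term.lower():
            docs |= doc_ids
    return sorted(docs)


# Made-up sample annotations, for illustration only.
SAMPLE = [
    Annotation("doc-1", "California bay", "SPECIES"),
    Annotation("doc-2", "California bay", "LOCATION"),
    Annotation("doc-3", "Drum", "SPECIES"),
]
INDEX = build_index(SAMPLE)

print(keyword_search(INDEX, "California bay"))           # both senses
print(typed_search(INDEX, "California bay", "SPECIES"))  # species sense only
```

With the sample data, the keyword query returns both documents, while the typed query returns only the document in which the mention was annotated as a species.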