An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments
Chris Freeland
Technical Director, BHL
Director of Bioinformatics, Missouri Botanical Garden
Goals of BHL
http://www.biodiversitylibrary.org
BHL Institutions
Now Online
Only 290 million to go! See you in 2048!
Scanning Operations
Locations of BHL/IA Scanning Centers
Complexities of distributed, mass scanning
Examples: from NYBG; from Smithsonian
Open Access Data
The snakes of Australia; an illustrated and descriptive catalogue of all the known species. By Gerard Krefft...
Publisher: Sydney, T. Richards, Government Printer, 1869.
Formats: PDF, OCR, XML, JP2
Name Finding via TaxonFinder
Name Finding in action with Taxonomic Intelligence…
Raw image → converted to text via OCR → name finding via TaxonFinder → extract names → submit to NameBank → SOAP response
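The name-finding step in this pipeline scans OCR text for strings that look like scientific names. A minimal sketch of dictionary-based name spotting, assuming two small, hypothetical word lists (`GENERA`, `EPITHETS`) and a hypothetical `find_names` helper; it is not TaxonFinder's actual algorithm or the NameBank interface, only an illustration of why exact-lookup finders miss names that OCR has garbled.

```python
import re

# Hypothetical mini-dictionaries for illustration only; a production
# name finder would rely on large curated name lists.
GENERA = {"Acanthophis", "Pseudechis", "Naja", "Vipera"}
EPITHETS = {"antarcticus", "porphyriacus", "nigra", "berus"}

WORD = re.compile(r"[A-Za-z]+")


def find_names(text: str) -> list[str]:
    """Return candidate names found in `text`.

    A known genus followed by a known epithet is reported as a binomial;
    a known genus on its own is reported as a genus-only name.
    """
    tokens = WORD.findall(text)
    names = []
    i = 0
    while i < len(tokens):
        if tokens[i] in GENERA:
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if nxt in EPITHETS:
                names.append(f"{tokens[i]} {nxt}")
                i += 2  # skip the epithet we just consumed
                continue
            names.append(tokens[i])
        i += 1
    return names


print(find_names("Acanthophis antarcticus and Pseudechis porphyriacus; Pseudechis also"))
# An OCR misreading (h -> ii) defeats the exact dictionary lookup:
print(find_names("Acantiiophis antarcticus"))
```

Because matching is exact, any character-level OCR error in the genus hides the whole name; fuzzy matching against the dictionary is one common way to recover some of these cases.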
Name Finding Stats to date*
*As of 19 October 2008
 
 
APIs & Data Sharing
Name Finding Evaluation
See poster in hall
Characteristics of sample
Number of pages: 392
Average number of words per page: 446.8
Average number of names per page: 7.7
Total number of names: 3,003
Total number of unique names: 2,610 (86.91% of all names)
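The derived figures on this slide follow directly from the raw counts: 7.7 names per page is 3,003 / 392, and 86.91% is the share of unique names among all names (2,610 / 3,003). A short check of that arithmetic:

```python
# Raw counts from the "Characteristics of sample" slide
pages = 392
total_names = 3003
unique_names = 2610

names_per_page = total_names / pages              # about 7.66, shown as 7.7
unique_share = unique_names / total_names * 100   # percent of names that are unique

print(f"{names_per_page:.1f} names/page, {unique_share:.2f}% unique")
```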
OCR error rate for names only: 35.16%
Of the 3,003 names, 1,056 were incorrectly transcribed by OCR.
Top OCR errors
  e->o           14
  h->ii          13
  h->l           12
  u->ii          11
  r->i           10
  l->i            9
  n->v            8
  c->e            7
  i->l            6
  u->n            5
  u->I            4
  e->c            3
  Omit space      2
  Insert space    1
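Error labels such as `e->o`, `h->ii`, or "Omit space" can be derived by aligning each correctly transcribed name with its OCR output. A minimal sketch using the standard library's `difflib.SequenceMatcher`; the `ocr_edits` helper and the sample name pairs are hypothetical, and this is not necessarily how the evaluation tallied its errors.

```python
from collections import Counter
from difflib import SequenceMatcher


def ocr_edits(truth: str, ocr: str) -> list[str]:
    """Label each span where the OCR output differs from the true name.

    Labels follow the slide's notation: 'e->o' for a substitution,
    'Omit space' / 'Insert space' for a dropped or added space.
    """
    edits = []
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        src, dst = truth[i1:i2], ocr[j1:j2]
        if src == " " and dst == "":
            edits.append("Omit space")
        elif src == "" and dst == " ":
            edits.append("Insert space")
        else:
            edits.append(f"{src}->{dst}")
    return edits


# Hypothetical (true name, OCR output) pairs for illustration
samples = [
    ("Pseudechis", "Psoudechis"),
    ("Acanthophis", "Acantiiophis"),
    ("Vipera berus", "Viperaberus"),
    ("Naja nigra", "Naja nigra"),
]

tally = Counter(e for truth, ocr in samples for e in ocr_edits(truth, ocr))
misread = sum(truth != ocr for truth, ocr in samples)
print(tally.most_common())
print(f"{misread}/{len(samples)} sample names misread")

# The slide's rate: 1,056 of 3,003 names mis-transcribed
print(f"{1056 / 3003:.2%}")  # 35.16%
```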
Performance of algorithms
                                   TaxonFinder   FAT
Excluding names with OCR errors
  Precision                        28.20%        40.32%
  Recall                           23.34%        36.62%
  F-score                          25.77%        38.47%
Including names with OCR errors
  Precision                        32.25%        43.77%
  Recall                           17.21%        25.82%
  F-score                          24.73%        34.80%
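Each F-score on this slide equals the arithmetic mean of the precision and recall beside it; for example, (28.20 + 23.34) / 2 = 25.77. The conventional F1 score is the harmonic mean, 2PR / (P + R), which is never higher than the arithmetic mean and is lower whenever precision and recall differ (25.54% rather than 25.77% for the first pair). The sketch below recomputes both; the algorithm and condition labels attached to each value are one reading of the flattened slide layout and should be treated as an assumption.

```python
# (precision, recall, reported F-score) in percent, as printed on the slide.
# Label assignment reflects one reading of the slide layout (assumption).
results = {
    ("TaxonFinder", "excluding OCR errors"): (28.20, 23.34, 25.77),
    ("FAT", "excluding OCR errors"): (40.32, 36.62, 38.47),
    ("TaxonFinder", "including OCR errors"): (32.25, 17.21, 24.73),
    ("FAT", "including OCR errors"): (43.77, 25.82, 34.80),
}


def f1(p: float, r: float) -> float:
    """Conventional F1 score: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for (algo, cond), (p, r, reported) in results.items():
    mean = (p + r) / 2  # what the slide's F-score values correspond to
    print(f"{algo:<11} {cond:<21} reported={reported:5.2f} "
          f"arith_mean={mean:5.2f} F1={f1(p, r):5.2f}")
```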
Considerations
Recommendations
Up next: BHL Article Repository
And if that wasn’t enough…
Contact

Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments

  • 1. An evaluation of taxonomic name finding & next steps in Biodiversity Heritage Library (BHL) developments. Chris Freeland, Technical Director, BHL; Director of Bioinformatics, Missouri Botanical Garden
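  • 2. Goals of BHL (http://www.biodiversitylibrary.org)
  • 3. BHL Institutions
  • 4. Now online. Only 290 million to go! See you in 2048!
  • 5. Scanning Operations: locations of BHL/IA scanning centers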
  • 6. Complexities of distributed, mass scanning (examples from NYBG and from the Smithsonian)
  • 7. Open Access Data. Example record: The snakes of Australia; an illustrated and descriptive catalogue of all the known species. By Gerard Krefft... Publisher: Sydney, T. Richards, Government Printer, 1869. Formats: PDF, OCR, XML, JP2
  • 8. Name Finding via TaxonFinder
  • 9. Name Finding in action with Taxonomic Intelligence… Raw image -> converted to text via OCR -> name finding via TaxonFinder -> extract names -> submit to NameBank -> SOAP response
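  • 10. Name Finding Stats to date (as of 19 October 2008)
  • 13. APIs & Data Sharing
  • 14. Name Finding Evaluation (see poster in hall)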
  • 15. Characteristics of sample
      Number of pages: 392
      Average number of words per page: 446.8
      Average number of names per page: 7.7
      Total number of names: 3,003
      Total number of unique names: 2,610 (86.91% of all names)
  • 16. OCR error rate for names only: of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR.
      Top OCR errors: e->o (14), h->ii (13), h->l (12), u->ii (11), r->i (10), l->i (9), n->v (8), c->e (7), i->l (6), u->n (5), u->I (4), e->c (3), omit space (2), insert space (1)
  • 17. Performances of algorithms (TaxonFinder and FAT, excluding and including names with OCR errors)
      Precision: 28.20% / 40.32%; Recall: 23.34% / 36.62%; F-score: 25.77% / 38.47%
      Precision: 32.25% / 43.77%; Recall: 17.21% / 25.82%; F-score: 24.73% / 34.80%
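Slide 9 walks through the name-finding pipeline: OCR text goes into a name finder, names are extracted, and then they are submitted to NameBank. The slide does not say how TaxonFinder decides what counts as a name, so the sketch below swaps in a toy dictionary lookup and should not be read as TaxonFinder's implementation. `GENERA`, `EPITHETS`, `find_names`, and `extract_unique` are all hypothetical. The OCR step is replaced by a hard-coded string, and the NameBank SOAP submission is not modeled.

```python
import re

# Hypothetical name lists for illustration only; a real finder would
# load large curated dictionaries instead.
GENERA = {"Hoplocephalus", "Pseudechis", "Furina", "Naja"}
EPITHETS = {"curtus", "porphyriacus"}

# A capitalized word, optionally followed by a lowercase word of 3+ letters.
CANDIDATE = re.compile(r"\b([A-Z][a-z]+)(?:\s+([a-z]{3,}))?")


def find_names(text: str) -> list[str]:
    """Toy finder: report candidates whose first word is a known genus."""
    found = []
    for match in CANDIDATE.finditer(text):
        genus, epithet = match.group(1), match.group(2)
        if genus not in GENERA:
            continue
        # Keep the second word only when it is a known species epithet.
        found.append(f"{genus} {epithet}" if epithet in EPITHETS else genus)
    return found


def extract_unique(names: list[str]) -> list[str]:
    """Deduplicate names, preserving the order in which they were first seen."""
    return list(dict.fromkeys(names))


# Stand-in for OCR output; the image-to-text step itself is not modeled here.
ocr_text = ("Hoplocephalus curtus and Pseudechis porphyriacus; "
            "Hoplocephalus cnrtus. Naja was seen.")
print(extract_unique(find_names(ocr_text)))
```

The OCR-damaged "cnrtus" (a u->n error of the kind tallied on slide 16) is not recognized as an epithet, so that occurrence is reported only at genus level. Even a simple lookup-based finder loses information when the OCR text is wrong.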
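The derived figures on slides 15 and 16 can be checked against the raw counts. Names per page is total names divided by pages. The two percentages are unique names and OCR-damaged names expressed as shares of all names. The constant and function names below are illustrative.

```python
# Raw counts as reported on slides 15 and 16.
PAGES = 392
TOTAL_NAMES = 3003
UNIQUE_NAMES = 2610
OCR_DAMAGED_NAMES = 1056


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage rounded to 2 places."""
    return round(100 * part / whole, 2)


names_per_page = round(TOTAL_NAMES / PAGES, 1)
unique_share = percent(UNIQUE_NAMES, TOTAL_NAMES)
ocr_error_rate = percent(OCR_DAMAGED_NAMES, TOTAL_NAMES)

print(names_per_page, unique_share, ocr_error_rate)  # 7.7 86.91 35.16
```

All three recomputed values match the slides, which confirms that the 86.91% on slide 15 is the unique-name share of the 3,003 names.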
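Slide 16 tallies character-level OCR confusions such as e->o, along with omitted or inserted spaces. One way to label such errors automatically, assuming a verified spelling is available for each name, is to align the true name with its OCR transcription using Python's standard `difflib.SequenceMatcher`. This is an illustrative sketch, not the procedure used in the evaluation. The function name `ocr_confusions` is hypothetical.

```python
from difflib import SequenceMatcher


def ocr_confusions(truth: str, ocr: str) -> list[str]:
    """List character-level differences between a verified name and its OCR text.

    Labels follow the slide's notation: "a->b" for substitutions,
    "omit space" / "insert space" for whitespace errors.
    """
    labels = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, truth, ocr).get_opcodes():
        src, dst = truth[i1:i2], ocr[j1:j2]
        if tag == "replace":
            labels.append(f"{src}->{dst}")
        elif tag == "delete":
            # Present in the true name but missing from the OCR text.
            labels.append("omit space" if src == " " else f"omit {src!r}")
        elif tag == "insert":
            # Added by OCR but absent from the true name.
            labels.append("insert space" if dst == " " else f"insert {dst!r}")
    return labels


print(ocr_confusions("Hoplocephalus curtus", "Hoplocephalus cnrtus"))  # ['u->n']
```

Counting these labels across every damaged name would produce a frequency table like the one on slide 16. `SequenceMatcher` finds a reasonable alignment, but not necessarily the minimal edit sequence, so a multi-character confusion such as h->ii may occasionally be split or merged differently than a human would label it.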
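In every precision/recall pair on slide 17, the reported "F-score" matches the arithmetic mean of precision and recall to within rounding. The conventional F1 score is the harmonic mean, 2PR / (P + R). The check below recomputes both from the slide's numbers. The row grouping follows the slide, and the helper names are illustrative.

```python
# (precision, recall, reported F-score) rows as listed on slide 17, in percent.
ROWS = [
    (28.20, 23.34, 25.77),
    (40.32, 36.62, 38.47),
    (32.25, 17.21, 24.73),
    (43.77, 25.82, 34.80),
]


def arithmetic_mean(p: float, r: float) -> float:
    """Simple average of precision and recall."""
    return (p + r) / 2


def f1(p: float, r: float) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for p, r, reported in ROWS:
    print(f"P={p:5.2f} R={r:5.2f} reported={reported:5.2f} "
          f"mean={arithmetic_mean(p, r):5.2f} F1={f1(p, r):5.2f}")
```

The two measures diverge most when precision and recall differ. For the third and fourth rows, the harmonic-mean F1 would be about 22.4% and 32.5% rather than the listed 24.73% and 34.80%.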