BHL Technologies: Review for BHL-Australia


  1. 1. TECHNOLOGY Chris Freeland, Technical Director
  2. 2. Topics Covered
     - Development History
     - Usage
     - Scanning & Content Acquisition
     - Technologies
       - Data Mining
       - Services & APIs
       - CiteBank
     - Global BHL
  3. 3. http://www.biodiversitylibrary.org/item/38659
  4. 4. Tech History
     - Preliminary work: MOBOT’s Botanicus
       - http://www.botanicus.org
     - Funded by Keck Foundation & IMLS
     - Working demonstration of how nomenclators/databases (like Tropicos) can link into digitized scientific literature
     - Codebase reused for BHL, then changed to fit requirements for EOL
  5. 5. Usage
  6. 6. Referrers: 2008 - 2009
  7. 7. Referrers: Jan 1 – Mar 15, 2010
  8. 8. SCANNING & CONTENT ACQUISITION
  9. 9. Workflow: Selection, Preparation, Post Production, (Re)publication, Digitization, Conservation
  10. 11. Complexities of distributed, mass scanning (from NYBG; from Smithsonian)
  11. 12. BHL ScanList http://bhl.nhm-wien.ac.at/scanlist/index.php
  12. 14. http://bhl.nhm-wien.ac.at/scanlist/index.php/Bibs/view/1018
  13. 15. Scanning = human work
  14. 16. Scan & Store: Internet Archive (scanning on Scribes, storage in Petaboxes)
  15. 17. Scanning Derivatives
      - XML
      - JP2
      - PDF
      - JPG
      - TXT
      - DjVu
      Diagram labels: Master, Derivatives, PDF, OCR, XML, JP2
  16. 18. Ingest from other IA Partners
      - Used mixture of subject analysis & other bibliographic metadata to identify content for inclusion in BHL
  17. 19. BHL TECHNOLOGIES
  18. 20. Distributed (Somewhat)
      - Internet Archive: Digitized content / files
      - MOBOT: Database & web application
      - MBL: Redundant cluster
  19. 21. BHL Development Team
  20. 22. http://biodiversitylibrary.org/page/10165550
  21. 23. Name Finding in action with Taxonomic Intelligence…
      - Image from Scanner
      - Converted to text via OCR
      - Name finding via TaxonFinder
      - Extract names
      - Submit to NameBank
      - SOAP response
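The name-finding steps on this slide can be sketched roughly as follows. This is a minimal illustration and not the TaxonFinder or NameBank implementation: the regular expression is a naive stand-in for a real name finder, `name_index` is a hypothetical local lookup table standing in for the remote name service, and the identifier in it is made up.

```python
import re

# Naive stand-in for a name finder: a capitalized word (genus) followed by a
# lowercase word (epithet). Real name finders rely on dictionaries and far
# richer rules, so expect false positives here.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def extract_candidate_names(ocr_text):
    """Return candidate 'Genus epithet' strings found in OCR text."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(ocr_text)]


def verify_names(candidates, name_index):
    """Split candidates into (verified, unverified).

    name_index is a hypothetical dict of name string -> identifier that
    stands in for a call to a remote name service.
    """
    verified = {}
    unverified = []
    for name in candidates:
        if name in name_index:
            verified[name] = name_index[name]
        else:
            unverified.append(name)
    return verified, unverified


ocr_text = "The shrub Petalostigma banksii was collected near the river."
name_index = {"Petalostigma banksii": 1}  # made-up identifier

candidates = extract_candidate_names(ocr_text)
verified, unverified = verify_names(candidates, name_index)
print(verified)
print(unverified)
```

The naive pattern also picks up "The shrub" as a candidate, which is why a verification step against an authoritative name list matters: only strings the lookup recognizes end up in `verified`.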
  22. 24. http://biodiversitylibrary.org/page/10165550
  23. 25. http://biodiversitylibrary.org/name/Petalostigma_banksii
  24. 28. http://eol.org/pages/1153286
  25. 29. Name finding statistics
      - 30 million pages scanned
      - 70 million name strings found
      - 60 million names verified with a NameBankID
      - 1.5 million unique names with a NameBankID
      - 3.5 million unique names *without* a NameBankID
        - This is where the interesting data live!!!
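The figures on this slide imply a few ratios that are easy to compute. The sketch below is plain arithmetic on the slide's rounded numbers and assumes the two unique-name counts are disjoint and together cover all unique names, which the slide does not state explicitly.

```python
# Rounded figures taken from the slide.
pages_scanned = 30_000_000
name_strings_found = 70_000_000
names_verified = 60_000_000
unique_with_id = 1_500_000
unique_without_id = 3_500_000

# Average number of name strings found per scanned page.
names_per_page = name_strings_found / pages_scanned

# Share of found name strings that were verified with a NameBankID.
verified_share = names_verified / name_strings_found

# Share of unique names lacking a NameBankID (assumes the two unique
# counts are disjoint and exhaustive).
unique_total = unique_with_id + unique_without_id
unique_without_id_share = unique_without_id / unique_total

print(f"{names_per_page:.2f} name strings per page")
print(f"{verified_share:.1%} of name strings verified")
print(f"{unique_without_id_share:.0%} of unique names lack a NameBankID")
```

Under that assumption, most name occurrences are verified, yet the majority of distinct name strings are not, which fits the slide's remark about where the interesting data live.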
  26. 30. Services & APIs
      - OpenURL
        - Facilitate links to citations: protologues, articles, references
          - Documentation: http://www.biodiversitylibrary.org/openurlhelp.aspx
        - Useful to Nomenclators, Reference Systems
          - IPNI
          - Tropicos
      - Names Service
        - Return all occurrences of a name throughout BHL digitized corpus
          - Documentation: http://bit.ly/2e6sg9
        - Working out a strategy for obscure species
        - Algorithm improvements to detect nomenclatural & taxonomic acts
      - New API
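To make the Names Service idea concrete ("return all occurrences of a name throughout the corpus"), here is a minimal sketch of an inverted index from name string to page identifiers. It is only an illustration of the data structure and says nothing about how BHL actually stores or serves this data; the function names and the page IDs are hypothetical.

```python
from collections import defaultdict


def build_name_index(page_names):
    """Build a name -> [page_id, ...] index.

    page_names is a hypothetical mapping of page_id -> iterable of name
    strings found on that page.
    """
    index = defaultdict(list)
    for page_id, names in page_names.items():
        for name in set(names):  # count each name once per page
            index[name].append(page_id)
    return index


def occurrences(index, name):
    """Return the sorted page IDs on which the name occurs."""
    return sorted(index.get(name, []))


# Made-up page IDs and name lists, for illustration only.
page_names = {
    101: ["Petalostigma banksii", "Petalostigma pubescens"],
    102: ["Petalostigma banksii"],
    103: ["Acacia aneura"],
}
index = build_name_index(page_names)
print(occurrences(index, "Petalostigma banksii"))
```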
  27. 31. http://www.biodiversitylibrary.org/openurl?pid=title:3934&volume=14&issue=&spage=301&date=1879
      http://www.tropicos.org/Name/1200408
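A citation link like the one on this slide can be assembled programmatically. The sketch below uses only the query parameters visible in the slide's example URL (`pid`, `volume`, `issue`, `spage`, `date`); the helper name `build_openurl` is hypothetical, and the linked OpenURL documentation remains the authority on which parameters the resolver accepts.

```python
from urllib.parse import urlencode

BASE = "http://www.biodiversitylibrary.org/openurl"


def build_openurl(title_id, volume="", issue="", spage="", date=""):
    """Build an OpenURL query using the parameters seen in the slide example.

    Empty values are kept as empty parameters (e.g. 'issue='), matching
    the example URL.
    """
    params = [
        ("pid", f"title:{title_id}"),
        ("volume", volume),
        ("issue", issue),
        ("spage", spage),
        ("date", date),
    ]
    # safe=":" keeps the colon in 'title:3934' unescaped, as in the example.
    return BASE + "?" + urlencode(params, safe=":")


url = build_openurl(3934, volume=14, spage=301, date=1879)
print(url)
```

A nomenclator such as Tropicos could generate links of this shape from the citation fields it already holds (title, volume, start page, year) without needing to know BHL's internal page identifiers.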
  28. 32. Services: OpenURL Disambiguation
      - Looking for:
      - BHL returns:
  29. 33. Services: OpenURL Results
  30. 34. But where are the articles??
      - BHL scans cover to cover for monographs & serials
      - Have tested automated markup and article boundary extraction techniques
        - Variety of typefaces & printing techniques make a wholly automated solution close to impossible
      - So, when in need, crowdsource…
  31. 39. PDF Generation Stats
  32. 40. No, really, where are the articles?
  33. 41. http://www.citebank.org
  34. 42. http://citebank.org/search
  35. 43. http://citebank.org/node/47423
  36. 44. CiteBank boundaries [diagram labels: Scanned Books; Pageturning UI, PDF, OCR, eBook/Kindle; stored *somewhere* & retrievable via HTTP URI; Citation; Bibliography; CiteBank]
  37. 45. TOWARDS A GLOBAL BHL
  38. 46. Opportunities
      - New technologies
        - BHL-Europe: Scan List
      - New use cases & user communities
        - BHL-Europe: Cultural history
      - New initiatives
        - Data mining, markup, text correction
      - Redundancy, localization
      - CONTENT!!
  39. 48. BHL is…
      - A unique software tool
      - Built to serve taxonomists’ & other scientists’ research
      - Enhanced by 250+ years of accumulated knowledge
      - Complementary to physical libraries
      - A shared, global resource
      - An unparalleled opportunity for collaboration
  40. 49. Thanks!
      - Chris Freeland
      - Technical Director, BHL
      - Director, Center for Biodiversity Informatics, Missouri Botanical Garden
        - [email_address]
        - http://twitter.com/chrisfreeland
