The BHL way to content

      William Ulate
     BHL Technical Director
     Global BHL Coordinator
        Leiden, Netherlands
         February 14, 2013
What is BHL?
The Biodiversity Heritage Library is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.”
Extensive
Global…
New Partners and Geographies
Dear Sir / Madam Can i just congratulate you on an absolutely brilliant online resource. I am compiling a report on an invasive hydromedusae and could not believe the ease and efficiency of this web page which genuinely saved me weeks of my life

The freeing of knowledge may lead to new discoveries and changes in the way the natural world is perceived

La plus grande #bibliotheque #botanique & #zoologique online The largest online botanical & zoological #library #BHL

Research that previously took months now takes only a few hours
More Online Content
Pages (millions) and volumes (thousands) included in BHL

Date      Volumes (K)   Pages (M)
Oct-08          22.00         9.2
Oct-09          40.00        16.4
Oct-10          84.86        31.8
Oct-11          94.6         35.4
Oct-12         105.85        38.9
Global Replication & Serving
     Replicated Data Center   Portal Application
> 390,000 views
  in 10 months
> 1200 sets
> 60,000+ images
The Art of Life project: describing and providing access to
natural history illustrations from the Biodiversity Heritage Library (BHL)
Example of illustration described using Art of Life schema

Title:         Stictospiza formosa
Type:          Illustrations
Date:          Publication: 1898
Agent:         Author: Arthur G. Butler (1844-1925)
               Illustrator: F.W. Frohawk (1861-1946)
Description:   A pair of finches with green and yellow bodies resting on reeds
Subjects:      Scientific name: Amandava formosa (Latham, 1790)
               Vernacular Name: Green Avadavat or Green Munia
               Accepted Name: Amandava formosa (Latham, 1790)
               Birds, finches
Inscriptions:  bottom center: Green Amaduvade Waxbill (Stictospiza formosa)
Source:        Butler, Arthur Gardiner. Foreign finches in captivity. Hull and London: Brumby and Clarke, limited, 1889 (2nd edition). This image comes from the Biodiversity Heritage Library, and is available online at biodiversitylibrary.org/page/17195895
Rights:        Public domain




Art of Life schema elements (required elements shown in red on the original slide)

Agents (repeatable: Y)
  Definition: Person or corporate entity involved in the creation, design, production, or publication of a visual resource.
  Example:
    <vra:agent>
      <vra:name type="personal" vocab="LCNAF" refid="89015596">Curtis, John</vra:name>
      <vra:dates type="life">
        <vra:earliestDate>1791</vra:earliestDate>
        <vra:latestDate>1862</vra:latestDate>
      </vra:dates>
      <vra:role vocab="AAT" refid="300025574">publisher</vra:role>
    </vra:agent>

Copyright (repeatable: N)
  Definition: The copyright status of the visual resource.
  Example:
    <vra:rights refid="http://creativecommons.org/licenses/by-nc/2.0/deed.en">Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0)</vra:rights>

Date (repeatable: Y)
  Definition: Date or range of dates associated with the creation or publication of the visual resource.
  Example:
    <vra:date type="creation">
      <vra:earliestDate>1945</vra:earliestDate>
      <vra:latestDate>1955</vra:latestDate>
    </vra:date>

Description (repeatable: Y)
  Definition: A free-text note about the content of the image, including comments, description, or interpretation, that gives additional information not recorded in other categories.
  Example:
    <vra:description>This illustration shows a scale, coloured illustration of Sepsis annulipes (now known as Encita annulipes) beside the Trifolium ochroleucum plant. Several dissections from Sepsis cylindrica Fab. (all these details are provided on the next page of this book and the subsequent page).</vra:description>

Inscriptions (repeatable: Y)
  Definition: All marks, captions, or written words added to the object at the time of production or in its subsequent history, including signatures, dates, dedications, texts, and colophons, as well as marks, such as the stamps of silversmiths, publishers, or printers.
  Example:
    <vra:inscription>
      <vra:position>bottom</vra:position>
      <vra:text>Radula of L. souleyetianum on a more reduced scale</vra:text>
    </vra:inscription>

Source (repeatable: N)
  Definition: A citation for the book, journal or resource that hosts the visual resource.
  Example:
    <vra:source>
      <vra:name type="book">Butler, Arthur Gardiner. Foreign finches in captivity. Hull and London: Brumby and Clarke, limited, 1889 (2nd edition).</vra:name>
      <vra:refid type="URI">http://biodiversitylibrary.org/page/17195895</vra:refid>
    </vra:source>

Subject (repeatable: Y)
  Definition: Terms or phrases that describe, identify, or interpret the visual resource.
  Example:
    <vra:subject><vra:term type="personalName">Carl Linnaeus</vra:term></vra:subject>
    <dwc:scientificName>Plant: Picea abies</dwc:scientificName>
    <dwc:acceptedName>Plant: Picea abies</dwc:acceptedName>
    <dwc:vernacularName>Plant: Norway spruce</dwc:vernacularName>

Title (repeatable: Y)
  Definition: The title or identifying phrase given to an image.
  Example:
    <vra:title xml:lang="la">Sepsis annulipes</vra:title>
    <vra:title type="alternate">Orangutan</vra:title>

We welcome your feedback on the schema! http://tinyurl.com/9hm7nsb
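As an illustration of how an Agents entry like the one in the schema table might be generated programmatically, here is a minimal Python sketch using the standard library. The helper names (vra, build_agent) are hypothetical, and the namespace URI is an assumption rather than something stated on the slide; the attribute values are copied from the schema example.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI for the "vra" prefix; the slide does not state it.
VRA_NS = "http://www.vraweb.org/vracore4.htm"
ET.register_namespace("vra", VRA_NS)


def vra(tag):
    """Return the namespace-qualified tag name for a vra element (hypothetical helper)."""
    return f"{{{VRA_NS}}}{tag}"


def build_agent(name, name_refid, born, died, role, role_refid):
    """Build a vra:agent element shaped like the Agents example in the schema table."""
    agent = ET.Element(vra("agent"))
    name_el = ET.SubElement(
        agent, vra("name"),
        {"type": "personal", "vocab": "LCNAF", "refid": name_refid},
    )
    name_el.text = name
    dates = ET.SubElement(agent, vra("dates"), {"type": "life"})
    ET.SubElement(dates, vra("earliestDate")).text = str(born)
    ET.SubElement(dates, vra("latestDate")).text = str(died)
    role_el = ET.SubElement(agent, vra("role"), {"vocab": "AAT", "refid": role_refid})
    role_el.text = role
    return agent


agent = build_agent("Curtis, John", "89015596", 1791, 1862, "publisher", "300025574")
xml_text = ET.tostring(agent, encoding="unicode")
print(xml_text)
```

Building the element tree in code, rather than concatenating strings, avoids the unbalanced quotes and unclosed tags that are easy to introduce when such records are typed by hand.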
Where are we?
• Scientific Name Extraction
   – Improved algorithm (Thanks uBio!)
• Articles
   – Extended BHL data model to store article metadata
   – Content and Process to harvest data from BioStor in place
• Create user interfaces for adding article metadata and
  associated files
   – Functional requirements defined
   – Process flow for adding article metadata and associated files
   – Implement UI changes
• Change BHL UI to accommodate article search
• Change BHL UI to accommodate article display (TOC)
Scientific Name Extraction
• TaxonFinder algorithm in production since
  2008
  – More than 100 million candidate name strings
  – More than 1.5 million unique, verified names
  – Available through UI, APIs, Data Exports & Internet
    Archive
• New collaboration with Global Names
  – Improved algorithm, better precision & recall
  – More data with TaxonFinder and Neti Neti!
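The distinction between candidate name strings and verified names can be sketched in a few lines of Python. This is not TaxonFinder or Neti Neti; it is a deliberately naive pattern (capitalized word followed by a lowercase word) with a hypothetical lookup set standing in for a names index, meant only to show why candidates greatly outnumber verified names.

```python
import re

# Naive binomial pattern: "Genus species". Purely illustrative; real name
# finders use dictionaries and statistical models, not a single regex.
CANDIDATE_PATTERN = re.compile(r"\b[A-Z][a-z]+ [a-z]{2,}\b")

# Hypothetical stand-in for a verified names index.
KNOWN_NAMES = {"Amandava formosa", "Picea abies", "Zea mays"}


def find_candidates(text):
    """Return every string that looks like a binomial, in order of appearance."""
    return [m.group(0) for m in CANDIDATE_PATTERN.finditer(text)]


def verify(candidates, known_names=KNOWN_NAMES):
    """Keep only candidates that appear in the known names set."""
    return [c for c in candidates if c in known_names]


text = "A pair of Amandava formosa resting near Picea abies. The plant was tall."
candidates = find_candidates(text)
verified = verify(candidates)
print(candidates)  # ['Amandava formosa', 'Picea abies', 'The plant']
print(verified)    # ['Amandava formosa', 'Picea abies']
```

The false candidate "The plant" shows why a verification step against a names source is needed after extraction.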
Taxon Names
BEFORE

Name Instances   101,591,803   101,288,804
Unique Names       7,498,554     7,464,924
Verified Names     1,905,507     1,902,803
EOL Names         63,130,350    62,963,582
EOL Pages         13,579,868    13,532,684

AFTER

Name Instances   151,222,182   150,066,425
Unique Names      29,246,382    29,091,767
Verified Names    10,153,165    10,109,540
EOL Names         87,791,695    87,135,089
EOL Pages         15,466,713    15,342,867
Part-level metadata
• Disambiguating and locating structural
  components in the corpus
• Done by automated and crowdsourced means
  – Thanks Rod Page! Welcome others!
• Greatly increases semantic value of the dataset
• Addressing is important: it makes data addressable
  and thus linkable
Articles in the BHL UI
Images
PDF Generator
Support citation reconciliation
Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753
Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753
Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, ad genera relatas, cum Differentiis Specificis, Nominibus Trivialibus, Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE digestas. 2:971. 1753
L. Sp. Pl. 2: 971. 1753

Zea mays
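One way to see why these differently written citations can be reconciled is that they share the same locator: volume 2, page 971, year 1753. The Python sketch below extracts that triple with a regular expression. The function name and pattern are illustrative assumptions, not BHL's reconciliation service, and the key ignores the title, so it is only one ingredient of a real matcher.

```python
import re

# Matches "vol. 2 p. 971. 1753", "Vol. 2 Page 971. 1753", "2: 971. 1753", "2:971. 1753".
LOCATOR = re.compile(
    r"(?:vol\.?\s*)?(\d+)\s*(?::|,?\s*(?:page|p\.?))\s*(\d+)\.?\s+(\d{4})",
    re.IGNORECASE,
)


def locator_key(citation):
    """Return (volume, page, year) for a citation string, or None if not found."""
    match = LOCATOR.search(citation)
    if match is None:
        return None
    volume, page, year = (int(g) for g in match.groups())
    return (volume, page, year)


citations = [
    "Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753",
    "Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753",
    "Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas. 2:971. 1753",
    "L. Sp. Pl. 2: 971. 1753",
]

for citation in citations:
    print(locator_key(citation), "<-", citation)
```

All four strings yield the same key, (2, 971, 1753), which could then be combined with a normalized title abbreviation to resolve the citation to a single page.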
Citations Providers
What we’d like to do
Challenges framed as games: http://biodivlib.wikispaces.com/BHL+and+Gaming

•   Improve OCR
•   Rekeying Tables of Contents
•   Researching candidate Scientific Names
•   Image identification & extraction
    – http://biodivlib.wikispaces.com/Art+of+Life
    – Currently funded by NEH
2007 Name Finding Study

>35% OCR error rate for names only
Of the 3,003 names, 1,056 (35.16%) were incorrectly transcribed by OCR.

Top OCR errors
 1  Insert Space     8  n->v
 2  Omit Space       9  l->i
 3  e->c            10  r->i
 4  u->I            11  u->ii
 5  u->n            12  h->l
 6  i->l            13  h->ii
 7  c->e            14  e->o

Wei, et al. An Evaluation of Taxonomic Name Recognition (TNR) in the Biodiversity Heritage Library. Proceedings of TDWG. 2008.
http://www.tdwg.org/proceedings/article/view/380
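The headline figure follows directly from the two counts reported for the study. A quick check in Python, using only the numbers on the slide:

```python
# Counts reported on the slide for the 2007 name finding study.
total_names = 3003
incorrectly_transcribed = 1056

error_rate = incorrectly_transcribed / total_names
print(f"{error_rate:.2%}")  # 35.16%
```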
Abbild ungen und Beschreibungen
                   der

               Fische Syriens,
                    nebst
einer neuen Classification und Characteristik
           sämmtlicher Gattungen
                     der
                       i
             JOH. JAKOB HECKEL,
Inipectoi am k. k. Hof-Natur.-iUenkabinete in
    Wien, mehr, yelelirt. UeHtllMeii. MIfglivd.




               STUTTGART.
  E. Schweizerbart' sehe Verlagshandlung,
                   1843.
Older material
• Great deal of material is pre-1923
• Irregular fonts – blackletter
• Multiple languages on same page – English
  text with Latin scientific names
• Changes in geographic names
• Changes in scientific names
*E.xvi�c�piteI von c. cXx.WptdvonfnrWmn
bu�fbe;bcn.5 am cix bIa � S &3rn~ 41X
a�m cv(f b1air�'o�et ert oiensr �; �',
:�hlrfc�c wa ff�4am.diug bist a
6aiw~s ff oJrJtwt nof bL4ecImt& blfafra mem
b t wag `wr 4 cn wiu 4 e8t5m.ed bvUratflb ck
wuo, ma144'*4I bttE5rmbebt =rt3'kn am4ra
tif vrmr Waff C * t6rmnli an `tn�ciblatGteaM
w ?ffoaifrn w4wmeu nu weib e , wpiteI
voE5teiri ct c ober gtUcr cit cm` 91 cLi biar J '
>bSciatl�Oiff ;Bruet wacfttc n qmcx b1a bl:
bt5c lttmtt bb9 lkr w.llr#e iti ncn xoa ff cu :r
trtuft *e t � B Rn "� trv W1Rt' ?Cm c blas
waIwutr Ober �ci ti 1V Ces ' wt
gbtiemwwajfu tpctt, afferain 9 c: b�titbfof
�r f eran m rs bra wlg auig4;f aer�m *mc vrt
blatcabtfm wfru an'deg~m rt blas Iaum
bwWt� run f ncmai b14ianf tJobrrfan
ebrut4net vnber Brwt Ober awawi*m.crriii
btafwfm uww c on$ 'it ttu wttkc 5,10 $ m~C
fca trc* cx u W�e�&mcyfbq4 Mabtt mmw
rc a iiu bc Jcn ncI.end.*, blat s. a u:�rprd3
rw4ftf wm c ii,+ ttCC tn wa frr9fr orfab fcfbt
enb c optiti bt -r9 ceDa ttDcn i34M sn Sem i
Expanding scope
• Manuscripts, field notebooks – mostly
  handwritten, often with drawings
• Global expansion means dealing with
  non-Western script systems and a whole new set
  of OCR problems – Arabic materials from
  Bibliotheca Alexandrina in Egypt
Images
OCR Improvements
• Gaming
• Transcription
OCR Improvements
• Transcription
• Purposeful Gaming
• Crowdsource Markup
Transcribe Bentham
• A collaboration of the University of London Computer
  Centre, UCL Library Services and UCL Learning and Media
  Services with consultation from the UCL Centre for Digital
  Humanities
• Volunteers can log in to the Transcription Desk and
  transcribe previously unstudied and unpublished
  manuscripts from the Bentham Papers collection in
  UCL Library's Special Collections.
• Since launch, volunteers from around the world have
  transcribed several thousand Bentham manuscripts to an
  extremely high standard.
• Results and findings:
  http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
Transcribe Bentham
• Who were the volunteers?




• http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
Transcribe Bentham
• Age ranges




• http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
http://blog.winepresspublishing.com/2011/05/pubtoons-23-angry-books/
Purposeful Gaming



Space        Climate


                       Humanities


             Nature          Biology
Purposeful Gaming
DIGITALKOOT
• Joint project run by the National Library
  of Finland and Microtask to index the
  library's enormous archives so that they
  are searchable on the Internet for easier
  access to the Finnish cultural heritage.
• Launched on Feb 8, 2011; by Nov 29, 2012,
  nearly 110,000 participants had completed
  over 8 million word-fixing tasks
• DigiTalkoot enabled volunteers to
  participate in this fixing work by playing
  games.
OCR Improvements




German text interpreted by the OCR process as:
    “unb auf ben ©elnrgen be6 fublic{)en”
OCR Improvements
     IA OCR            OCR 2          Transcription 1   Transcription 2   Result

1    unb               und            und               und               Ok

2    den               ben            den               den               Ok

3    ©elnrgen          ©ebirgen       Bebirgen          Gebirgen          X

4    be6               des            de5               des               Chk

5    fublic{)en        fublichen      Füdlichen         Südlichen         X

6    £)eittfc{)(anb6   Deutfchlanbs   Deutfchlands      Deutschlands      X


      Different resulting texts from parsing the phrase:
    “und auf den Gebirgen des südlichen Deutschlands”
      (“and on the mountains of southern Germany”)
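The Ok / Chk / X column in the comparison table is consistent with a simple agreement rule across the four readings of each word: three or more identical readings are accepted, exactly two matching readings are flagged for checking, and no agreement is rejected. That rule is inferred from the table, not stated on the slide; a minimal Python sketch under that assumption:

```python
from collections import Counter


def agreement_status(readings):
    """Classify one word by how many of its readings agree (inferred rule)."""
    top_count = Counter(readings).most_common(1)[0][1]
    if top_count >= 3:
        return "Ok"   # clear majority
    if top_count == 2:
        return "Chk"  # partial agreement, needs a human check
    return "X"        # every reading differs


# Rows copied from the table: IA OCR, OCR 2, Transcription 1, Transcription 2.
rows = [
    ["unb", "und", "und", "und"],
    ["den", "ben", "den", "den"],
    ["©elnrgen", "©ebirgen", "Bebirgen", "Gebirgen"],
    ["be6", "des", "de5", "des"],
    ["fublic{)en", "fublichen", "Füdlichen", "Südlichen"],
    ["£)eittfc{)(anb6", "Deutfchlanbs", "Deutfchlands", "Deutschlands"],
]

statuses = [agreement_status(row) for row in rows]
print(statuses)  # ['Ok', 'Ok', 'X', 'Chk', 'X', 'X']
```

Comparison here is exact and case-sensitive, which is why "Füdlichen" and "Südlichen" count as different readings.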
Crowdsource Markup
Display text                              Species Profile Model category

General/summary                           TaxonBiology

Geographic range                          Distribution

Habitat                                   Habitat

Food sources and feeding behavior         TrophicStrategy

Physical description (general)            Description

Physical description (detailed morphology) DiagnosticDescription
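For a markup tool, the mapping above could be held as a plain lookup table, so that volunteers see the display text while the stored tag uses the Species Profile Model category. A minimal Python sketch; the function name is hypothetical and the entries are copied from the table.

```python
# Display text shown to volunteers -> Species Profile Model category.
SPM_CATEGORY = {
    "General/summary": "TaxonBiology",
    "Geographic range": "Distribution",
    "Habitat": "Habitat",
    "Food sources and feeding behavior": "TrophicStrategy",
    "Physical description (general)": "Description",
    "Physical description (detailed morphology)": "DiagnosticDescription",
}


def spm_category(display_text):
    """Return the SPM category for a display label, or None if it is unknown."""
    return SPM_CATEGORY.get(display_text.strip())


print(spm_category("Geographic range"))  # Distribution
```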
Thank you
William Ulate
Global BHL Project Manager / Technical Director
Missouri Botanical Garden
william.ulate@mobot.org
Skype: william_ulate_r

The BHL way to content

  • 1. The BHL way to content William Ulate BHL Technical Director Global BHL Coordinator Leiden, Netherlands February 14, 2013
  • 2. What is BHL? The Biodiversity Heritage Library is a consortium of natural history and botanical libraries that cooperate to digitize and make accessible the legacy literature of biodiversity held in their collections and to make that literature available for open access and responsible use as a part of a global “biodiversity commons.”
  • 5. New Partners and Geographies
  • 6. Dear Sir / Madam Can i just congratulate you on an The freeing of knowledge absolutely brilliant online may lead to new resource. I am compiling a discoveries and changes report on an invasive hydromedusae and could not in the way the natural believe the ease and efficiency world is perceived of this web page which genuinely saved me weeks of my life La plus grande #bibliotheque #botanique & Research that previously #zoologique online The took months now takes largest online botanical & only a few hours zoological #library #BHL
  • 7. More Online Content Pages (Millions) and Volumes (in Thousands) included in BHL 120 105.85 100 94.6 84.86 80 60 40 40.00 38.9 35.4 31.8 20 22.00 Volumes (K) 16.4 9.2 Pages (M) - Oct-08 Oct-09 Oct-10 Oct-11 Oct-12
  • 8. Global Replication & Serving Replicated Data Center Portal Application
  • 9.
  • 10. > 390,000 views in 10 months > 1200 sets > 60,000+ images
  • 11.
  • 12. The Art of Life project: describing and providing access to natural history illustrations from the Biodiversity Heritage Library (BHL)
    Example of illustration described using Art of Life schema
      Title: Stictospiza formosa
      Type: Illustrations
      Date: Publication: 1898
      Agent: Author: Arthur G. Butler (1844-1925); Illustrator: F.W. Frohawk (1861-1946)
      Description: A pair of finches with green and yellow bodies resting on reeds
      Subjects: Scientific name: Amandava formosa (Latham, 1790); Vernacular Name: Green Avadavat or Green Munia; Accepted Name: Amandava formosa (Latham, 1790); Birds, finches
      Inscriptions: bottom center: Green Amaduvade Waxbill (Stictospiza formosa)
      Source: Butler, Arthur Gardiner. Foreign finches in captivity. Hull and London: Brumby and Clarke, limited, 1889 (2nd edition). This image comes from the Biodiversity Heritage Library, and is available online at biodiversitylibrary.org/page/17195895
      Rights: Public domain
    Art of Life schema elements (required elements in red)
      Agents (Repeat: Y)
        Definition: person or corporate entity involved in the creation, design, production, or publication of a visual resource.
        Example: <vra:agent> <vra:name type="personal" vocab="LCNAF" refid="89015596">Curtis, John</vra:name> <vra:dates type="life"> <vra:earliestDate>1791</vra:earliestDate> <vra:latestDate>1862</vra:latestDate> </vra:dates> <vra:role vocab="AAT" refid="300025574">publisher</vra:role> </vra:agent>
      Copyright (Repeat: N)
        Definition: The copyright status of the visual resource.
        Example: <vra:rights refid="http://creativecommons.org/licenses/by-nc/2.0/deed.en">Creative Commons Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0)</vra:rights>
      Date (Repeat: Y)
        Definition: Date or range of dates associated with the creation or publication of the visual resource.
        Example: <vra:date type="creation"> <vra:earliestDate>1945</vra:earliestDate> <vra:latestDate>1955</vra:latestDate> </vra:date>
      Description (Repeat: Y)
        Definition: A free-text note about content of the image, including comments, description, or interpretation, that gives additional information not recorded in other categories.
        Example: <vra:description>This illustration shows a scale, coloured illustration of Sepsis annulipes (now known as Encita annulipes) beside the Trifolium ochroleucum plant. Several dissections from Sepsis cylindrica Fab. (all these details are provided on the next page of this book and the subsequent page).</vra:description>
      Inscriptions (Repeat: Y)
        Definition: All marks, caption, or written words added to the object at the time of production or in its subsequent history, including signatures, dates, dedications, texts, and colophons, as well as marks, such as the stamps of silversmiths, publishers, or printers.
        Example: <vra:inscription> <vra:position>bottom</vra:position> <vra:text>Radula of L. souleyetianum on a more reduced scale</vra:text> </vra:inscription>
      Source (Repeat: N)
        Definition: A citation for the book, journal or resource that hosts the visual resource
        Example: <vra:source> <vra:name type="book">Butler, Arthur Gardiner. Foreign finches in captivity. Hull and London: Brumby and Clarke, limited, 1889 (2nd edition).</vra:name> <vra:refid type="URI">http://biodiversitylibrary.org/page/17195895</vra:refid> </vra:source>
      Subject (Repeat: Y)
        Definition: Terms or phrases that describe, identify, or interpret the visual resource.
        Example: <vra:subject><vra:term type="personalName">Carl Linnaeus</vra:term></vra:subject> <dwc:scientificName>Plant: Picea abies</dwc:scientificName> <dwc:acceptedName>Plant: Picea abies</dwc:acceptedName> <dwc:vernacularName>Plant: Norway spruce</dwc:vernacularName>
      Title (Repeat: Y)
        Definition: The title or identifying phrase given to an Image
        Example: <vra:title xml:lang="la">Sepsis annulipes</vra:title> <vra:title type="alternate">Orangutan</vra:title>
    We welcome your feedback on the schema! http://tinyurl.com/9hm7nsb
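    To illustrate how a record in this schema could be consumed, here is a minimal Python sketch that reads the Agents example from the slide. The namespace URI and the `read_agent` helper are assumptions made for illustration only; the slide does not state which namespace declaration the project uses.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "vra" prefix; verify against the schema in use.
NS = {"vra": "http://www.vraweb.org/vracore4.htm"}

# The Agents example from the slide, wrapped with an (assumed) namespace declaration.
AGENT_XML = """
<vra:agent xmlns:vra="http://www.vraweb.org/vracore4.htm">
  <vra:name type="personal" vocab="LCNAF" refid="89015596">Curtis, John</vra:name>
  <vra:dates type="life">
    <vra:earliestDate>1791</vra:earliestDate>
    <vra:latestDate>1862</vra:latestDate>
  </vra:dates>
  <vra:role vocab="AAT" refid="300025574">publisher</vra:role>
</vra:agent>
"""


def read_agent(xml_text):
    """Hypothetical helper: pull name, role and life dates out of one agent element."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("vra:name", namespaces=NS),
        "role": root.findtext("vra:role", namespaces=NS),
        "born": int(root.findtext("vra:dates/vra:earliestDate", namespaces=NS)),
        "died": int(root.findtext("vra:dates/vra:latestDate", namespaces=NS)),
    }


agent = read_agent(AGENT_XML)
```

    Because Agents is a repeatable element, a full record would hold a list of such dictionaries rather than a single one.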
  • 13.
  • 14.
  • 15.
  • 16. Where are we? • Scientific Name Extraction – Improved algorithm (Thanks uBio!) • Articles – Extended BHL data model to store article metadata – Content and Process to harvest data from BioStor in place • Create user interfaces for adding article metadata and associated files – Functional requirements defined – Process flow for adding article metadata and associated files – Implement UI changes • Change BHL UI to accommodate article search • Change BHL UI to accommodate article display (TOC)
  • 17. Scientific Name Extraction • TaxonFinder algorithm in production since 2008 – More than 100 million candidate name strings – More than 1.5 million unique, verified names – Available through UI, APIs, Data Exports & Internet Archive • New collaboration with Global Names – Improved algorithm, better precision & recall – More data with TaxonFinder and Neti Neti!
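    The distinction between candidate name strings and verified names can be sketched as a two-step process: find strings that look like names, then check them against a reference list. The regular expression and the tiny `KNOWN_NAMES` set below are hypothetical stand-ins, not the actual TaxonFinder or Neti Neti logic.

```python
import re

# Hypothetical stand-in for a reference list of known names (an assumption).
KNOWN_NAMES = {"Zea mays", "Picea abies", "Amandava formosa"}

# Assumed heuristic: a capitalized genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def find_candidates(text):
    """Return every string that merely looks like a Latin binomial."""
    return [f"{genus} {epithet}" for genus, epithet in BINOMIAL.findall(text)]


def verify(candidates, known=KNOWN_NAMES):
    """Split candidates into those found in the reference list and the rest."""
    verified = [c for c in candidates if c in known]
    unverified = [c for c in candidates if c not in known]
    return verified, unverified


page_text = "The maize plant Zea mays grows near Picea abies. Plate shows Zca mays."
candidates = find_candidates(page_text)
verified, unverified = verify(candidates)
```

    In this toy input the unverified list mixes ordinary false positives ("The maize") with an OCR-damaged name ("Zca mays"), which is why the unverified candidates need further research rather than simple discarding.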
  • 18. Taxon Names • BEFORE: Name Instances 101,591,803 / 101,288,804; Unique Names 7,498,554 / 7,464,924; Verified Names 1,905,507 / 1,902,803; EOL Names 63,130,350 / 62,963,582; EOL Pages 13,579,868 / 13,532,684 • AFTER: Name Instances 151,222,182 / 150,066,425; Unique Names 29,246,382 / 29,091,767; Verified Names 10,153,165 / 10,109,540; EOL Names 87,791,695 / 87,135,089; EOL Pages 15,466,713 / 15,342,867
  • 19. Part-level metadata • Disambiguating and locating structural components in the corpus • Done by automated and crowdsourced means – Thanks Rod Page! Welcome others! • Greatly increases semantic value of the dataset • Addressing important – makes data addressable and thus linkable
  • 20. Articles in the BHL UI
  • 23. Support citation reconciliation • Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753 • Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753 • Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE digestas. 2:971. 1753 • L. Sp. Pl. 2: 971. 1753 • Zea mays
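    One way to see why these four strings can be reconciled: despite different author and title spellings, they share the same volume, page and year. The sketch below reduces each citation to that triple. The regular expression and the `citation_key` helper are illustrative assumptions, not the reconciliation service BHL describes.

```python
import re

# Assumed pattern: a volume number, then ":" or "p."/"page", a page number, and a year.
KEY = re.compile(
    r"(?:vol\.?\s*)?(\d+)\s*(?::|,?\s*(?:p\.|page))\s*(\d+)\.?\s+(\d{4})",
    re.IGNORECASE,
)


def citation_key(citation):
    """Return (volume, page, year) for a citation string, or None if not found."""
    match = KEY.search(citation)
    return tuple(int(part) for part in match.groups()) if match else None


citations = [
    "Linneaus, C. Species Plantarum, vol. 2 p. 971. 1753",
    "Linné, Carl von. Sp. Pl. Vol. 2 Page 971. 1753",
    "Caroli Linnaei, Species Plantarum exhibentes plantas rite cognitas, "
    "ad genera relatas, cum Differentis Specificis, Nominibus Trivialibus, "
    "Synonymis Selectis, Locis Natalibus, secundum SYSTEMA SEXUALE digestas. "
    "2:971. 1753",
    "L. Sp. Pl. 2: 971. 1753",
]
keys = {citation_key(c) for c in citations}
```

    A shared key only suggests that the citations point at the same page; a real service would also need to reconcile the abbreviated titles ("Sp. Pl.") against an authoritative title list.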
  • 25. What we’d like to do http://biodivlib.wikispaces.com/BHL+and+Gaming Challenges framed as games • Improve OCR • Rekeying Tables of Contents • Researching candidate Scientific Names • Image identification & extraction – http://biodivlib.wikispaces.com/Art+of+Life – Currently funded by NEH
  • 26. 2007 Name Finding Study • >35% OCR error rate for names only • Of the 3,003 names, 1,056 were incorrectly transcribed by OCR (35.16%) • Top OCR errors: 1 Insert Space; 2 Omit Space; 3 e->c; 4 u->I; 5 u->n; 6 i->l; 7 c->e; 8 n->v; 9 l->i; 10 r->i; 11 u->ii; 12 h->l; 13 h->ii; 14 e->o • Wei, et al. An Evaluation of Taxonomic Name Recognition (TNR) in the Biodiversity Heritage Library. Proceedings of TDWG. 2008. http://www.tdwg.org/proceedings/article/view/380
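    The figures on this slide can be checked directly, and the confusion list suggests a simple repair strategy: undo one known confusion and see whether a known name results. The sketch below is only an illustration. Reading each `a->b` pair as "true character a was output as b" is an assumption, and `repair` is a hypothetical helper, not the method of the cited study.

```python
# Error rate reported on the slide: 1,056 of 3,003 names were mis-transcribed.
error_rate = 1056 / 3003

# A few of the slide's confusions, written as (true_char, ocr_output).
CONFUSIONS = [("e", "c"), ("u", "n"), ("i", "l"), ("c", "e"), ("h", "l")]


def correction_candidates(ocr_word):
    """Undo a single confusion at a single position; return all resulting words."""
    out = set()
    for true_char, seen in CONFUSIONS:
        start = 0
        while (i := ocr_word.find(seen, start)) != -1:
            out.add(ocr_word[:i] + true_char + ocr_word[i + len(seen):])
            start = i + 1
    return out


def repair(ocr_name, known_names):
    """Return known names reachable from ocr_name by one reversed confusion."""
    words = ocr_name.split()
    hits = set()
    for k, word in enumerate(words):
        for cand in correction_candidates(word):
            fixed = " ".join(words[:k] + [cand] + words[k + 1:])
            if fixed in known_names:
                hits.add(fixed)
    return hits


repaired = repair("Zca mays", {"Zea mays"})
```

    Restricting the search to one substitution keeps the candidate set small; names with several OCR errors would need a wider search or a fuzzy matcher.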
  • 27. Abbild ungen und Beschreibungen der Fische Syriens, nebst einer neuen Classification und Characteristik sämmtlicher Gattungen der i JOH. JAKOB HECKEL, Inipectoi am k. k. Hof-Natur.-iUenkabinete in Wien, mehr, yelelirt. UeHtllMeii. MIfglivd. STUTTGART. E. Schweizerbart' sehe Verlagshandlung, 1843.
  • 28. Older material • Great deal of material is pre-1923 • Irregular fonts – blackletter • Multiple languages on same page – English text with Latin scientific names • Changes in geographic names • Changes in scientific names
  • 29. *E.xvi�c�piteI von c. cXx.WptdvonfnrWmn bu�fbe;bcn.5 am cix bIa � S &3rn~ 41X a�m cv(f b1air�'o�et ert oiensr �; �', :�hlrfc�c wa ff�4am.diug bist a 6aiw~s ff oJrJtwt nof bL4ecImt& blfafra mem b t wag `wr 4 cn wiu 4 e8t5m.ed bvUratflb ck wuo, ma144'*4I bttE5rmbebt =rt3'kn am4ra tif vrmr Waff C * t6rmnli an `tn�ciblatGteaM w ?ffoaifrn w4wmeu nu weib e , wpiteI voE5teiri ct c ober gtUcr cit cm` 91 cLi biar J ' >bSciatl�Oiff ;Bruet wacfttc n qmcx b1a bl: bt5c lttmtt bb9 lkr w.llr#e iti ncn xoa ff cu :r trtuft *e t � B Rn "� trv W1Rt' ?Cm c blas waIwutr Ober �ci ti 1V Ces ' wt gbtiemwwajfu tpctt, afferain 9 c: b�titbfof �r f eran m rs bra wlg auig4;f aer�m *mc vrt blatcabtfm wfru an'deg~m rt blas Iaum bwWt� run f ncmai b14ianf tJobrrfan ebrut4net vnber Brwt Ober awawi*m.crriii btafwfm uww c on$ 'it ttu wttkc 5,10 $ m~C fca trc* cx u W�e�&mcyfbq4 Mabtt mmw rc a iiu bc Jcn ncI.end.*, blat s. a u:�rprd3 rw4ftf wm c ii,+ ttCC tn wa frr9fr orfab fcfbt enb c optiti bt -r9 ceDa ttDcn i34M sn Sem i
  • 30. Expanding scope • Manuscripts, field notebooks – mostly handwritten, often with drawings • Global expansion means dealing with non-Western script systems and a whole new set of OCR problems – Arabic materials from Bibliotheca Alexandrina in Egypt
  • 33. OCR Improvements • Transcription • Purposeful Gaming • Crowdsource Markup
  • 34. Transcribe Bentham • A collaboration of the University of London Computer Centre, UCL Library Services and UCL Learning and Media Services with consultation from the UCL Centre for Digital Humanities • Volunteer users can log-in and transcribe previously unstudied and unpublished manuscripts from the Bentham Papers collection in UCL Library's Special Collections in the Transcription Desk. • Since launch, volunteers from around the world have transcribed several thousand Bentham manuscripts to an extremely high standard. • Results and findings: http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
  • 35. Transcribe Bentham • Who were the volunteers? • http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
  • 36. Transcribe Bentham • Age ranges • http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html
  • 38. Purposeful Gaming Space Climate Humanities Nature Biology
  • 39. Purposeful Gaming DIGITALKOOT • Joint project run by the National Library of Finland and Microtask to index the library's enormous archives so that they are searchable on the Internet for easier access to the Finnish cultural heritage. • Launched on Feb 8 2011, nearly 110 000 participants completed over 8 million word fixing tasks by Nov 29 2012 • DigiTalkoot enabled volunteers to participate in this fixing work by playing games.
  • 42. OCR Improvements German text interpreted by the OCR process as: “unb auf ben ©elnrgen be6 fublic{)en”
  • 43. OCR Improvements
         IA OCR            OCR 2          Transcription 1   Transcription 2
      1  unb               und            und               und               Ok
      2  den               ben            den               den               Ok
      3  ©elnrgen          ©ebirgen       Bebirgen          Gebirgen          X
      4  be6               des            de5               des               Chk
      5  fublic{)en        fublichen      Füdlichen         Südlichen         X
      6  £)eittfc{)(anb6   Deutfchlanbs   Deutfchlands      Deutschlands      X
    Different resulting texts from parsing the phrase: “und auf den Gebirgen des südlichen Deutschlands” (“and on the mountains of southern Germany”)
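    The comparison on this slide can be read as word-level voting across independent readings. The sketch below reproduces the slide's last column under an assumed rule, inferred from the six rows rather than stated in the source: "Ok" when at least three readings agree, "Chk" when exactly two agree, and "X" when none agree.

```python
from collections import Counter


def vote(readings):
    """Return (best_reading, status) for one word position.

    The thresholds are an assumption inferred from the slide:
    Ok = three or more sources agree, Chk = two agree, X = no agreement.
    """
    word, count = Counter(readings).most_common(1)[0]
    if count >= 3:
        return word, "Ok"
    if count == 2:
        return word, "Chk"
    return None, "X"


# The six rows from the slide, one list of four readings per word.
rows = [
    ["unb", "und", "und", "und"],
    ["den", "ben", "den", "den"],
    ["©elnrgen", "©ebirgen", "Bebirgen", "Gebirgen"],
    ["be6", "des", "de5", "des"],
    ["fublic{)en", "fublichen", "Füdlichen", "Südlichen"],
    ["£)eittfc{)(anb6", "Deutfchlanbs", "Deutfchlands", "Deutschlands"],
]
results = [vote(row) for row in rows]
```

    Words that end up as "X" or "Chk" are the natural candidates to hand to human transcribers or game players, since automated sources alone cannot settle them.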
  • 44. Crowdsource Markup: Display text -> Species Profile Model category • General/summary -> TaxonBiology • Geographic range -> Distribution • Habitat -> Habitat • Food sources and feeding behavior -> TrophicStrategy • Physical description (general) -> Description • Physical description (detailed morphology) -> DiagnosticDescription
  • 45. Thank you William Ulate Global BHL Project Manager / Technical Director Missouri Botanical Garden william.ulate@mobot.org Skype: william_ulate_r

Editor's notes

  1. For the meeting on Wednesday on legacy literature, we would like to ask you to give a brief (5-10min) outline of what your plans are with BHL, and especially your move into content. This would be helpful for a more informed following discussion.
  2. Extensive: Aiming for a critical mass of biodiversity literature. Global: Originating in the US and UK, BHL now has nodes in Europe, China, Australia, Brazil, Egypt, and Africa. Open: Data is freely available for viewing, downloading, and re-use.
  3. Title: The Art of Life Schema: describing and providing access to natural history illustrations from the Biodiversity Heritage Library (BHL). Authors: William Ulate (Missouri Botanical Garden): William.Ulate@mobot.org; Trish Rose-Sandler (Missouri Botanical Garden): trish.rose-sandler@mobot.org; Gaurav Vaidya (University of Colorado Boulder): gaurav@ggvaidya.com; Robert Guralnick (University of Colorado): robgur@gmail.com
  4. Natural history illustrations from the Biodiversity Heritage Library seem to leap across boundaries while being catalogued, emerging simultaneously as history, science and art. As historic documents, they paint a vibrant picture of the first time European scientists and explorers encountered exotic plants and animals in the 17th and 18th centuries, drawn by some of the finest illustrators of the world. Also, as biodiversity records, they provide valuable documentation of when, where, and who first observed a species, and some of them are our only surviving representations of extinct species. Finally, as aesthetic elements, they communicate human emotions and other values toward nature by exemplifying the mimesis in art and providing a vivid expression of human creativity and imagination. This year, the Missouri Botanical Garden received a grant from the National Endowment for the Humanities (NEH) to support a project called The Art of Life: Data Mining and Crowdsourcing the Identification and Description of Natural History Illustrations from the Biodiversity Heritage Library (BHL).
  5. Initially, software tools will help discover visual resources (illustrations, maps, and other works of art) in BHL’s corpus, and basic metadata will be recorded. These resources will then be shared on multiple image delivery systems, including Flickr and the Wikimedia Commons, where citizen scientists will be able to add further annotations. Because of the wide diversity of information that a citizen scientist can add to any image, a comprehensive yet manageable schema is needed to help standardize inputs and enable synchronization and seamless import back into the BHL databases.
  6. The authors have worked on the development of an effective metadata schema for such natural history illustrations, but instead of developing yet another schema from scratch, they have identified existing schemas that meet the needs of the project and integrated a solution that combines the best in biodiversity informatics and image curation standards and best practices. This schema needs to support three main objectives: (1) to enable the discovery, description and use of the identified images by artists, biologists, humanities scholars, and educators; (2) to make BHL’s metadata and images available to other platforms; and (3) to import crowdsourced metadata generated in other platforms back into BHL. A preliminary schema version will be presented to the TDWG community, explaining how we addressed metadata challenges specific to biodiversity data, in order to obtain feedback on the final version.
  7. [Define functional requirements] Experience with Citebank has resulted in many lessons learned about working with diverse publication types, data formats, and contributors with varying levels of technical competencies. Those lessons were incorporated into a functional requirements document that is being used to inform development of the BHL data model.
     Where are we going?
     - Work with ZooBank & Index Fungorum to integrate BHL’s existing OpenURL resolver
     - Authoritative list of titles in common use for nomenclatural acts & resolution/reconciliation tools (“TL3”)
     - Harvest relevant content from Mendeley using taxonomical intelligence tools
     - Define services and interfaces for “dirty bucket” and “clean bucket” data storage & reuse to accommodate GNUB data model
     - Interoperate with Wilden & Shorthouse work on citation parsing tools & services
  8. Mention Neti Neti
  9. On legacy literature: what are your plans with BHL, and especially your move into content?
  10. We ask the user to provide metadata if they’re generating a chapter or book title
  11. [Diagram of citations reconciliation] In support of this, BHL will provide a key functional component to the GNA: reconciliation services for citations. Once reconciled, citations can be linked either to scanned page images in the BHL, or to PDFs uploaded by users. If neither exists, citations can point to other digital representations online.
  12. [Citebank stats]
      2. Enabled automated importing into Citebank via OAI-PMH. This feature is used on a daily basis to update content from digital libraries (BHL), institutional repositories (Smithsonian & AMNH DSpace), government agencies (SciELO), and publishers (Pensoft).
      3. Enabled batch-loading of content from learned societies and publishers who don't have a publishing platform. Content is now available for the Journal of East Africa Natural History Association, American Mosquito Control Association, and others. A complete list of data providers is available at http://www.citebank.org/about/content_providers
      4. Enabled import and crosslinking of other online digital libraries without APIs, such as Biblioteca Digital del Real Jardín Botánico de Madrid / CSIC and the Organization for Tropical Studies Article Repository (OTS).
      5. Established a collection at Internet Archive for contributed Citebank materials at http://www.archive.org/details/citebank. Each file uploaded to Citebank is copied to IA for serving, plus IA produces a wide variety of derivatives from uploaded files, including OCR, which are then available via open APIs for data mining and other activities. An example: http://www.archive.org/details/cbarchive_138136_projectcoralfishlooksatpalau1846 and its record in Citebank: http://citebank.org/node/138136
      6. Enabled sign-on and upload of bibliographic metadata and associated files for individual contributors
  13. BHL has more than 300,000 pages tagged as a "Table of Contents" (and more that aren't tagged), which list articles, chapters, and other structural boundaries in BHL scanned books. We'd like to have those pages keyed into a select number of fields so that we can index and find BHL content by article title and author of article, and provide a more convenient way of browsing BHL texts online.
      BHL has OCR for each of its scanned pages. We send that OCR to the TaxonFinder algorithm, which identifies strings that "look like" scientific names, then compares them to NameBank, a list of known names, which is incomplete (there does not exist a comprehensive list of all the world's scientific names). We have more than 90 million strings (as of Feb 2012) that have been identified by the algorithm as a possible scientific name, and 76 million of those candidate strings have been matched to a known name. The remaining 14 million candidates are where all the intriguing stuff resides: is it a name that's not in NameBank, is it a misspelling or mis-OCRed string that matches a known name, or is it even a name that has only ever appeared in the published record once and been lost to science ever since?
      Contained within BHL’s digitized texts are millions of visual resources (plates, illustrations, figures, maps, and other images), many of which were produced by the finest botanical and zoological illustrators in the world, including the likes of John James Audubon, Georg Dionysius Ehret, and Pierre Redouté. These images are currently minimally described at a structural page level, enabling citation resolvers and human users to navigate to illustrations by page numbers, but the images lack sufficient descriptive metadata to enable dynamic filtering and inquiry based on factors like image type, color content, subject matter, or even names of the organisms depicted in the images.
  14. You can see from this slide that accuracy goes way down when processing older blackletter-type typefaces.
  15. On legacy literature: what are your plans with BHL, and especially your move into content? Growth; More Global Content; Taxon Names; Article Metadata; Microcitations and COinS; API; ZooBank; OCR improvements through Gaming; Crowdsource Markup; WFO?