Crowdsourcing, Family History,
and Long Tails for Libraries
http://slidesha.re/1qzB8vv
Frederick Zarndt
frederick@frederickzarndt.com
Secretary, IFLA Newspapers Section
Photo held by John Oxley Library, State Library of Queensland. Original from
Courier-mail, Brisbane, Queensland, Australia.
Crowdsourcing is the practice of obtaining
needed services, ideas, or content by
soliciting contributions from a large group of
people, and especially from an online
community, rather than from traditional
employees or suppliers. ... [It] is different
from ordinary outsourcing since it is a task or
problem that is outsourced to an undefined
public rather than a specific, named group.
Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia,
http://en.wikipedia.org/wiki/Crowdsourcing (accessed March 17, 2013)
“crowdsourcing”
was coined by Jeff Howe in “The rise of
crowdsourcing” published in Wired
magazine June 2006.
web trends for
“crowdsourcing”
Jan-2006 to Jun-2014
• On the date of publication of Jeff Howe’s Wired
magazine article, 1-Jun-2006, Wikipedia did not have
an entry (list) of crowdsourcing projects*.
• On 25-Jan-2010 Wikipedia’s list of crowdsourcing
projects had 35 entries*.
• On 17-Mar-2013 Wikipedia’s list of crowdsourcing
projects had 158 entries+.
* From the Internet Archive’s Wayback Machine.
+ Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia,
https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects (accessed March 17, 2013).
Amazon Mechanical Turk was launched Nov 2005
Alexa global / country rank of Amazon Mechanical Turk (June 2014): 6,465 / 2,046
crowdsourcing
crowdsourcing
Each day 200,000,000 reCAPTCHAs are solved by humans around the world
Galaxy Zoo was 1st launched July 2007
Alexa global / country traffic rank of Galaxy Zoo (June 2014): 606,971 / 100,298
citizen science
Kickstarter was 1st launched in 2009
Alexa global / country traffic rank of Kickstarter (June 2014): 782 / 326
60,000+ projects successfully funded with more than USD $1,000,000,000
crowd funding
crowd collaboration
Family Search Indexing was 1st launched (beta) 2004
Alexa global / country traffic rank of FamilySearch (June 2014): 4,385 / 1,321
Project Gutenberg was 1st launched Dec 1971
Alexa global / country traffic rank of Project Gutenberg (June 2014): 6,615 / 4,066
Alexa global / country traffic rank of National Library of Finland
2,535,854 (31-Oct-2012) / 199 (2-Apr-2012)
so what? why should
a library care about
crowdsourcing?
Time Life Pictures
Getty Images
“user engagement refers to the quality of
the user experience that emphasizes the
positive aspects of the interaction with a web
application, and in particular the phenomena
associated with wanting to use that
web application longer and frequently”
Elad Yom-Tov, Mounia Lalmas, Georges Dupret, Ricardo Baeza-Yates, Pinar Donmez, and
Janette Lehmann. 2012. The effect of links on networked user engagement. In Proceedings of the
21st international conference companion on World Wide Web (WWW '12 Companion). ACM, New
York, NY, USA, 641-642. DOI=10.1145/2187980.2188167 http://doi.acm.org/10.1145/2187980.2188167
“in addition to increasing search accuracy or
lowering the costs of document transcription,
crowdsourcing is the single greatest advancement
in getting people using and interacting with library
collections”
Paraphrased from Trevor Owens’s blog http://www.trevorowens.org/2012/03/crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
“While [the National Library of Australia’s]
Trove offers a range of user engagement
features, and use of each of these features
continues to grow, it is Trove’s newspaper
text correction features that have attracted
the highest level of user engagement.”
Marie-Louise Ayres. 2013. ‘Singing for their supper’: Trove, Australian newspapers, and the
crowd. Paper presented at IFLA WLIC 2013, Singapore. Accessed June 2014, IFLA Library,
http://library.ifla.org/id/eprint/245.
Alexa global / country rank of National Library of Australia (June 2014): 10,964 / 249
Trove gets ~78% of all National Library web traffic.
National Library of Australia
• Online since 2008
• More than 13,000,000 / 127,437,967 newspaper
pages / articles (May 2014)
• Top text corrector 2,625,205+ lines (May 2014)
• 2,682,119 lines corrected each month (average for
1st 5 months 2014)
• 129,046,297 lines corrected as of May 2014, up from
66,527,535 lines corrected May 2012
• 129,300 / 8,218 registered / active users (May 2014)
[Chart, 2013 monthly averages (logarithmic axis from 1 to 1,000,000): unique visits and page views by content category: Australian Newspapers; Books; Pictures and photos; Journal Articles; Music, sound and video; Maps; Archived websites; Diaries, letters, archives; People and organisations]
[Chart, 2013 monthly averages (linear axis from 0 to 6,000,000): unique visits and page views for the same content categories]
California Digital
Newspaper Collection
• CDNC began digitizing newspapers in 2005 as
part of NDNP
• Newspapers digitized to article-level as well as
to page-level as required by NDNP
• Hosted on Veridian beginning 2009
• Collection size 61,412 issues, 545,955 pages,
6,364,529 articles (May 2014)
OCR text correction
• OCR text correction added Aug 2011
• Corrections are done line by line
• 2,246 registered / 1,266 active users (Jun 2014)
• 2,656,497+ lines of text corrected (Jun 2014)
• ~2% of the collection corrected, 98% to go!
• Top corrector 717,855 lines > 2x 2nd corrector
Cambridge Public Library
Historic Newspaper Collection
• Cambridge Historic Newspapers online since Jan 2012.
• Cambridge Massachusetts Public Library digitized local
newspapers (http://cambridge.dlconsulting.com/)
• Newspapers digitized to article-level
• Collection size 6,346 issues, 59,070 pages, 669,406
articles (May 2014)
• Collection includes 13,099 obituary cards
[Pie chart, 2013 monthly averages, with slices labeled 0%, 10%, and 90%: Historic Cambridge Newspapers (1846-1923); Cambridge City Directories (1848-1910); Cambridge Chronicle (August 2005 to present)]
why correct text?
here’s why…
Image copyright
Karl R Lilliendahl
Photographer
Deaths. lln»rieff, Esq. of <c .. Qn.
Sunday, the till. greatly Drandrellt, of
Orms4irJi.- ~ ; ;✓ ' • * On ijfr r inn
l j j j i l F i i j ' 1 1 f H a v o d i v y d ,
Carnarvonshire, S ; **" *- ' « ' March
Oxford, F. Tfovmeud, Uerald. » • V .
•On Tncsdav last, Mr. Charles.
IWilinson, this 8 ; had vf thesis#,, a
week ago, which tcrminate<i'iu his
death. . / ' ■ O'i Sunday, dJst nit. at.
AsbtCnvHall, mar Lancaster,
Mr.,Geo. Worn ick, many years
house'steward hit late Once The
Hamilton and Brandon. He locked
himself h»oWn'r«wte<: soon. twelve
o'clock" that dny, and fii»-d a loaded
pistol "through Ins bead, 1 which
instantaneously killed him. Coronet's
Verdict, shot himself in a temporary fit of
Friday week,
raw OCR text
Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3.
newspaper image
user rank   CDNC lines corrected*   NLA lines corrected*
1   646,873   2,455,338
2   236,323   1,822,422
3   111,749   1,448,370
4   100,749   1,265,217
5   99,999   1,174,835
6   87,720   1,069,669
7   82,768   1,058,179
8   63,786   1,020,462
9   57,441   949,694
10   56,458   886,315
*numbers from Mar 2014
User rank   Lines corrected Jun 2014   Lines corrected Oct 2012
1   717,855   242,965
2   271,972   87,515
3   120,220   31,318
4   113,787   24,144
5   109,999   23,184
6   99,999   19,240
7   94,742   18,898
8   65,637   16,875
9   63,786   11,784
10   59,724   9,762
uncorrected OCR accuracy by
newspaper title
Title   OCR character accuracy   ~OCR word accuracy*
PRP Pacific Rural Press 1871 - 1922   92.6%   68.1%
SFC San Francisco Call 1890 - 1913   92.6%   68.1%
LAH Los Angeles Herald 1873 - 1910   88.7%   54.9%
LH Livermore Herald 1877 - 1899   88.6%   54.6%
DAC Daily Alta California 1841 - 1891   88.2%   53.4%
CFJ California Farmer and Journal of Useful Sciences 1855 - 1880   86.5%   48.4%
SN Sausalito News 1885 - 1922   70.4%   17.3%
*Word accuracy assumes average word length is 5 characters
corrected OCR accuracy by
newspaper title
Title   OCR character accuracy   Corrected accuracy
PRP Pacific Rural Press 1871 - 1922   92.6%   99.3%
SFC San Francisco Call 1890 - 1913   92.6%   99.6%
LAH Los Angeles Herald 1873 - 1910   88.7%   99.1%
LH Livermore Herald 1877 - 1899   88.6%   99.9%
DAC Daily Alta California 1841 - 1891   88.2%   99.9%
CFJ California Farmer and Journal of Useful Sciences 1855 - 1880   86.5%   98.3%
SN Sausalito News 1885 - 1922   70.4%   100.0%
Title   OCR character accuracy   ~OCR word accuracy*   Corrected accuracy   ~Corrected word accuracy*
PRP 1871 - 1922   92.6%   68.1%   99.3%   96.5%
SFC 1890 - 1913   92.6%   68.1%   99.6%   98.0%
LAH 1873 - 1910   88.7%   54.9%   99.1%   95.6%
LH 1877 - 1899   88.6%   54.6%   99.9%   99.5%
DAC 1841 - 1891   88.2%   53.4%   99.9%   99.5%
CFJ 1855 - 1880   86.5%   48.4%   98.3%   91.8%
SN 1885 - 1922   70.4%   17.3%   100.0%   100.0%
*Word accuracy assumes average word length is 5 characters
corrected OCR accuracy by
newspaper title
correction accuracy
by user
User   Average OCR accuracy   Correction accuracy
A 70.4% 100.0%
B 87.1% 99.5%
C 95.4% 99.5%
D 86.5% 98.3%
E 95.3% 100.0%
F 91.0% 100.0%
G 91.0% 99.8%
H 90.5% 99.0%
I 96.6% 99.8%
J 94.8% 100.0%
K 86.8% 99.3%
that’s interesting, but
who wants to correct
OCR text? it’s
Graphic from Kaufmann et al. “More than fun and money. Worker
Motivation in Crowdsourcing – A Study on Mechanical Turk.”
Motivation
Motivation
Genealogists and family
historians
• National Library of Australia’s 2012 Trove
status report showed that ~50% of Trove users
are family historians
• National Library of New Zealand survey found
that ~50% of PapersPast users are genealogists
PAPERSPAST
• 72% visit UDN for genealogical research
• 20% visit for various other types of historical research
• 87% find obituaries useful
• Over 60% find the other genealogical article types (birth and
wedding announcements) useful
• Only 7% do not find genealogical articles useful
• Many are writing family histories and consequently also look
for general background information
• Older content is much more highly valued than more recent
content (see more detailed explanation that follows)
• 44% find smaller, rural papers more useful, while only 15%
find larger, metropolitan papers more useful
Motivation
2012 user survey
John Herbert and Randy Olsen. Small town papers: still delivering the news.
WLIC 2012, Helsinki, Finland. http://conference.ifla.org/past-wlic/2012/119-herbert-en.pdf
• CDNC and Cambridge Public Library
published a user survey in Mar 2013
• 604 / 32 responses
• Surveys are (mostly) identical except
for organization name
Motivation
2013 user survey
User demographic
Genealogists and family historians
User demographic
No spring chickens
User demographic
Reasons for use
User demographic
Types of information
• “I enjoy the correction - it’s a great way to learn more
about past history and things of interest whilst doing a
‘service to the community’ by correcting text for the benefit
of others.”
• “I have recently retired from IT and thought that I could be
of some assistance to the project. It benefits me and other
people. It helps with family research.”
Rose Holley. March 2009. Many Hands Make Light Work. National Library of Australia.
Accessed June 2014 http://www.nla.gov.au/ndp/project_details/documents/ANDP_ManyHands.pdf.
Motivation
Trove users’ report
“The ‘typical’ Trove user is a very well educated,
highly paid, English speaking employed woman
aged fifty or over, with a significant or primary
interest in family or local history, who visits the
Trove website very frequently. Users of Trove
newspapers are older than the average Trove
user; only 13% of newspaper users are under 40
years of age.”
Marie-Louise Ayres. ‘Singing for their supper’: Trove, Australian newspapers, and the
crowd. WLIC 2013, Singapore. http://library.ifla.org/245/1/153-ayres-en.pdf.
Motivation
Engaged users: Who are they?
“Many of Trove’s user engagement features are
very popular. More than 100,000 users have
registered to date, and more than 2 million tags
and nearly 60,000 comments had been added…
[Trove] text correction, however, stands head and
shoulders above any other user engagement
features.”
Motivation
Engaged users: What do they do?
Marie-Louise Ayres. ‘Singing for their supper’: Trove, Australian newspapers, and the
crowd. WLIC 2013, Singapore. http://library.ifla.org/245/1/153-ayres-en.pdf.
“when someone transcribes a document, they are
actually better fulfilling the mission of a cultural
heritage organization than someone who simply stops
by to flip through the pages”
Paraphrased from Trevor Owens’s blog http://www.trevorowens.org/2012/03/crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
Motivation
Engaged users
“I am interested in all kinds of history. I have pursued genealogy
as a hobby for many years. I correct text at CDNC because I see
it as a constructive way to contribute to a worthwhile project.
Because I am interested in history, I enjoy it.”
Wesley, California
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
“I only correct the text on articles of local interest - nothing at
state, national or international level, no advertisements, etc. 
The objective is to be able to help researchers to locate local
people, places, organizations and events using the on-line
search at CDNC.  I correct local news & gossip, personal items,
real estate transactions, superior court proceedings, county and
local board of supervisors meetings, obituaries, birth notices,
marriages, yachting news, etc.”
Ann, California
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
“I am correcting text for the Coronado Tent City Program for
1903.  It is important to correct any problems with personal
names and other information so that researchers will be able
to search by keyword and be assured of retrieving desired
results. ... type fonts cause a great deal of difficulty in
digitizing the text and can cause problems for searchers.  Also,
many of the guests' names at Tent City and Hotel Del
Coronado were taken from the registration books and reported
in the Program.  This led to many problems in spelling of last
names and the editors were not careful to be consistent in the
spellings.  This Program is an important resource since it
provides an excellent picture of daily life in Tent City and
captures much of the history of Coronado itself.”
Gene, California
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
“I have always been interested in history, especially the
development of the American West, and nothing brings it alive
better than newspapers of the time. I believe them to be an
invaluable source of knowledge for us and future generations.”
David, United Kingdom
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
CDNC is an excellent source of information matching my
personal interest in such topics as sea history, development
of shipbuilding, clippers and other ships etc. ...
Unfortunately, the quality of text ... is rather poor I’m
afraid. This is why I started to do all corrections necessary
for myself ... and to leave the corrected text for use of
others. .... I am not doing this very regularly as this is just
my hobby and pleasure.
Jerzey, Poland
Personal communications with CDNC text correctors.
Motivation
CDNC users’ report
As an amateur historical researcher my time for research is very
limited.  Making time to travel to archives, libraries, and historical
societies does not happen as often as I would like.  The Cambridge
Public Library’s online newspaper collection has been an invaluable
resource and it is fun.  I am very grateful for all the help I have received
over the years from so many research organizations. Correcting text
has several benefits.  It makes it much more likely that I will find a
story if I decide to search for it in the future.  It is a way of saying
‘thank you’ to the Cambridge Library for having such a great resource
available and maybe I can make the next person’s research a little
easier. It is my own little historical preservation project.
Cambridge Historical Newspapers Text Corrector
Personal communications with CDNC text correctors.
Motivation
Cambridge users’ report
so old, boring, easily
entertained people correct
text. convince me there are
real benefits.
Economic
benefits
Public domain photo courtesy of US Navy
$
Economics
Financial value of outsourced OCR text correction
for newspapers?
The Assumptions
• 25 to 50 characters per line in a newspaper column:
Assume 40 characters per line (CDNC sample average)
• Outsourced text transcription or correction costs USD
$0.35 to $1.20 per 1000 characters: Assume $0.50
per 1000 characters
CDNC: 2,656,497 lines x 40 characters per line x
1/1000 x $0.50 = $53,130
NLA: 129,046,297 lines x 40 characters per line x
1/1000 x $0.50 = $2,580,926
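The estimate above can be reproduced with a short Python sketch. The function and parameter names are illustrative, not from the slides; the defaults are the slide's assumptions of 40 characters per line and USD $0.50 per 1000 characters.

```python
def outsourced_value(lines, chars_per_line=40, usd_per_1000_chars=0.50):
    """Estimated cost (USD) of paying a vendor to correct `lines` lines of OCR text."""
    return lines * chars_per_line / 1000 * usd_per_1000_chars

cdnc_lines = 2_656_497    # CDNC lines corrected, Jun 2014
nla_lines = 129_046_297   # NLA (Trove) lines corrected, May 2014

print(f"CDNC: ${outsourced_value(cdnc_lines):,.0f}")
print(f"NLA:  ${outsourced_value(nla_lines):,.0f}")
```

Changing `usd_per_1000_chars` to $0.35 or $1.20 gives the low and high ends of the quoted price range.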
Economics
Financial value of in-house OCR text
correction?
The Assumptions
• Correction takes 15 seconds per line
• Cost is hourly wage plus benefits of lowest level
employee, $10 for CDNC, $41.88* for Australia
* AUD $40.38 = USD $41.88 is the actual labor value assumed by the National Library of Australia
to calculate avoided costs due to crowdsourced OCR text correction in its 2012 Trove Status
Report.
Economics
CDNC: 2,656,497 lines x 15 seconds per line x 1/3600
hrs per second x $10.00 per hr = $110,687
NLA: 129,046,297 lines x 15 seconds per line x
1/3600 hrs per second x $41.88 per hr =
$22,518,579
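A minimal Python sketch of the in-house estimate, assuming the slide's 15 seconds per line and the two hourly labor costs. The function names are illustrative only.

```python
def correction_hours(lines, seconds_per_line=15):
    """Hours of labor needed to correct `lines` lines of OCR text."""
    return lines * seconds_per_line / 3600

def in_house_value(lines, usd_per_hour, seconds_per_line=15):
    """Estimated cost (USD) of having paid staff correct `lines` lines."""
    return correction_hours(lines, seconds_per_line) * usd_per_hour

# (lines corrected, hourly wage plus benefits in USD), both taken from the slides
projects = {
    "CDNC": (2_656_497, 10.00),
    "NLA": (129_046_297, 41.88),
}

for name, (lines, wage) in projects.items():
    print(f"{name}: {correction_hours(lines):,.0f} hours, "
          f"${in_house_value(lines, wage):,.0f}")
```

The hours figure is worth printing on its own: it is the volunteer effort, independent of the wage assumed.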
Economics
Accuracy
“His Accuracy Depends on Ours!"
Office for Emergency Management. Office of
War Information. Domestic Operations
Branch. Bureau of Special Services. [Photo
held at US National Archives and Records
Administration]
Accuracy
• Edwin Klijn (Koninklijke Bibliotheek, the Netherlands)
reports raw OCR character accuracies of 68% for early 20th
century newspapers
• Rose Holley (National Library of Australia) reports raw
OCR character accuracy varied from 71% to 98% on a
sample of Trove digitized newspapers
Rose Holley. How good can it get? Analysing and improving OCR accuracy in large scale historic
newspaper digitisation programs. D-Lib Magazine. Mar/Apr 2009. Accessed June 2014
http://www.dlib.org/dlib/march09/holley/03holley.html.
Edwin Klijn. The current state-of-art in newspaper digitization. D-Lib Magazine. Jan/Feb 2008. Accessed
June 2014 http://www.dlib.org/dlib/january08/klijn/01klijn.html.
Public domain graphic courtesy of Wikimedia Commons.
Accuracy
MAPPING TEXTS* assesses digitization quality of digital
newspapers by comparing the number of words recognized
to the total number of words scanned
* Mapping texts is a collaboration between the University of North Texas and Stanford University aimed at experimenting
with new methods for finding and analyzing meaningful patterns embedded in massive collections of digital newspapers.
How does low text accuracy affect search recall?
The Facts
• Average uncorrected OCR character accuracy of the
CDNC sample data is ~89%
• Average length of an English word is 5 characters
• Average word accuracy is 89% x 89% x 89% x 89% x 89%
= 55.8% - round up to 60% or 6 out of 10 words correct
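The multiplication above treats each character as an independent chance of error, so word accuracy is character accuracy raised to the word length. A small Python sketch under that assumption (the function name is illustrative):

```python
def word_accuracy(char_accuracy, word_length=5):
    """Chance that every character in a word is correct, assuming independent errors."""
    return char_accuracy ** word_length

# Character accuracies quoted in the slides for the CDNC sample
for label, acc in [("uncorrected", 0.89), ("corrected", 0.994)]:
    print(f"{label}: {acc:.1%} characters -> {word_accuracy(acc):.1%} words")
```

Real OCR errors tend to cluster (a smudged line, a bad font), so this is a rough estimate, not a measurement.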
Accuracy
[Graphic: search recall with no text correction, showing instances of “ARNDT” found and instances of “ARNDT” not found]
Accuracy
The Facts
• Average corrected character accuracy of the CDNC
sample data is ~99.4%
• Average word accuracy of CDNC corrected text is 99.4%
x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%
[Graphic: search recall with text correction, showing instances of “ARNDT” found and instances of “ARNDT” not found]
A search for “Arndt” at Chronicling America gives
10,267 results*
• If Chronicling America text accuracy is 55.8% (same as
uncorrected CDNC sample), then 8,133 instances of
“Arndt” were not found
• If text accuracy is 97.0%, then 317 instances of “Arndt”
were not found
Accuracy
* Search performed 31 Oct 2012
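The missing-result counts follow from treating word accuracy as search recall: if a search finds only a fraction a of the true occurrences, then the true total is found / a and the missed count is found / a minus found. A Python sketch of that reasoning (the function name is illustrative, and equating word accuracy with recall is the slide's simplification):

```python
def missing_hits(found, word_accuracy):
    """Estimated occurrences a search missed, if only `word_accuracy` of them were indexed correctly."""
    actual = found / word_accuracy   # estimated true number of occurrences
    return actual - found

found = 10_267   # "Arndt" results at Chronicling America, 31 Oct 2012

for accuracy in (0.558, 0.970):
    print(f"word accuracy {accuracy:.1%}: about {missing_hits(found, accuracy):,.0f} missed")
```

At 55.8% the estimate is about 8,133 missed instances; at 97.0% it is a little over 317, with the exact figure depending on rounding.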
Accuracy
Suppose the word/name is longer than 5
characters?
The Facts
• Assume that average uncorrected / corrected OCR
character accuracy is ~89% / ~99%, the same as CDNC.
Name   Name length   Raw text accuracy   Corrected text accuracy
Eklund   6   49.7%   94.2%
Kennedy   7   44.2%   93.25%
Espinosa   8   39.4%   92.3%
Bonaparte   9   35.0%   91.4%
Chatterjee   10   31.2%   90.4%
Accuracy
Name   Number of search results   Missing results with raw text accuracy   Missing results with corrected text accuracy
Eklund   2,951   2,987   182
Kennedy   360,723   455,392   26,111
Espinosa   1,918   2,950   160
Bonaparte   44,664   82,947   4,203
Chatterjee   19   42   2
Chronicling America searches done 19-Mar-2013
(6,025,474 pages from 1836 to 1922).
but you left
out long
tails…
Public domain illustration
from "On The Genesis of
Species" by St. George Mivart
the long tail* of crowdsourced
OCR text correction
a probability distribution has a long tail if a larger
share of population rests within its tail than it would
under a normal distribution
the most productive users are a small fraction
of the total user population yet account for ~50% of total
production; said a different way, the far more numerous
but individually less productive users are, taken together,
as important as the most productive users
* The phrase “long tail” was popularized by Chris Anderson in the October 2004 Wired
magazine article The Long Tail and by Clay Shirky’s February 2003 essay “Power laws,
web logs, and inequality”.
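One way to check the ~50% claim against the CDNC figures quoted earlier in this deck (the top ten correctors and the total lines corrected, both Jun 2014) is to count how many top correctors are needed to reach half of all corrected lines. This is a sketch; the function name is hypothetical, and only the top ten users are known, so it can only answer the question when the answer is ten or fewer.

```python
from itertools import accumulate

def users_for_share(lines_per_user, total_lines, share=0.5):
    """Smallest number of top correctors whose combined lines reach `share` of `total_lines`.

    Returns None if the users supplied never reach that share.
    """
    ranked = sorted(lines_per_user, reverse=True)
    for n, running_total in enumerate(accumulate(ranked), start=1):
        if running_total >= share * total_lines:
            return n
    return None

# CDNC top ten correctors and total lines corrected, Jun 2014 (from the slides)
cdnc_top10 = [717_855, 271_972, 120_220, 113_787, 109_999,
              99_999, 94_742, 65_637, 63_786, 59_724]
cdnc_total = 2_656_497

n = users_for_share(cdnc_top10, cdnc_total)
print(f"{n} of 1,266 active users corrected half of all lines")
```

With these numbers a handful of correctors covers the first half; the remaining half comes from everyone else, which is the long tail.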
OCR text correction long tails
[Charts: CDNC lines corrected by text corrector (axis 0 to 300,000; top corrector 242,965) and NLA lines corrected by text corrector (axis 0 to 3,000,000; top corrector 1,456,906); each chart is marked with a 50% / 50% split]
Future considerations
• How to market / advertise
crowdsourcing?
• How to motivate
crowdsourcers?
• Is authentication / identity of
crowdsourcers an issue?
• How to administer
crowdsourced data?
Photo of Aleister Crowley [Public domain] from Wikimedia
Commons
Conclusions
Conclusion of the Sonata for piano #32, opus 111 by
Ludwig van Beethoven
• Lots of crowdsourcing in cultural heritage
organizations and elsewhere
• Benefits are multi-faceted: Economic, data
accuracy, user engagement, increased web traffic
are we
finished now?
Image copyright
Dan Heller
www.danheller.com
Resources
Public domain photo “A useful instruction for young sailors from the Royal
Hospital School, Greenwich” from the National Maritime Museum.
Correct California newspapers at http://cdnc.ucr.edu
Correct Cambridge MA newspapers http://bit.ly/cambridgepublic
Correct Australian newspapers http://trove.nla.gov.au
Correct Virginia newspapers http://virginiachronicle.com
Try crowdsourcing!
Other resources
Mapping Texts at http://mappingtexts.stanford.edu/
Wragge Labs at http://wraggelabs.com/
Wikipedia list of crowdsourcing projects
https://en.wikipedia.org/wiki/
List_of_crowdsourcing_projects
Wikipedia list of digitized newspapers
http://en.wikipedia.org/wiki/
List_of_online_newspaper_archives
?
Photo held by John Oxley Library, State Library of Queensland. Original from
Courier-mail, Brisbane, Queensland, Australia.
Frederick Zarndt
frederick@frederickzarndt.com
Secretary, IFLA Newspapers Section

Más contenido relacionado

Similar a 20140628 crowdsourcing, family history, and long tails for libraries [ala annual las vegas]

Bigdataforesight
BigdataforesightBigdataforesight
Bigdataforesightsuresh sood
 
Ndnp partner meeting
Ndnp partner meetingNdnp partner meeting
Ndnp partner meetingcarriegaxiola
 
The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014
The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014
The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014Andrew Lih
 
20130321 Putting the world's cultural heritage online with crowdsourcing [roo...
20130321 Putting the world's cultural heritage online with crowdsourcing [roo...20130321 Putting the world's cultural heritage online with crowdsourcing [roo...
20130321 Putting the world's cultural heritage online with crowdsourcing [roo...Frederick Zarndt
 
Utilizing citizen science to identify, map and monitor wild brook trout genet...
Utilizing citizen science to identify, map and monitor wild brook trout genet...Utilizing citizen science to identify, map and monitor wild brook trout genet...
Utilizing citizen science to identify, map and monitor wild brook trout genet...Keith G. Tidball
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Duncan Hull
 
Twitter & mobility disruptions
Twitter & mobility disruptionsTwitter & mobility disruptions
Twitter & mobility disruptionsHolly Anne
 
An Overview of Standards for Biodiversity Literature and the State of the BHL
An Overview of Standards for Biodiversity Literature and the State of the BHLAn Overview of Standards for Biodiversity Literature and the State of the BHL
An Overview of Standards for Biodiversity Literature and the State of the BHLMartin Kalfatovic
 
Stories To Tell: The making of our digital nation. Resource list - Projects y...
Stories To Tell: The making of our digital nation. Resource list - Projects y...Stories To Tell: The making of our digital nation. Resource list - Projects y...
Stories To Tell: The making of our digital nation. Resource list - Projects y...Rose Holley
 
2013 ifla satellite zarndt et al [crowdsourcing the world's cultural heritage...
2013 ifla satellite zarndt et al [crowdsourcing the world's cultural heritage...2013 ifla satellite zarndt et al [crowdsourcing the world's cultural heritage...
2013 ifla satellite zarndt et al [crowdsourcing the world's cultural heritage...Frederick Zarndt
 
Newspapers in a Global Microcosm: Updates & Activities from the IFLA Newspape...
Newspapers in a Global Microcosm: Updates & Activities from the IFLA Newspape...Newspapers in a Global Microcosm: Updates & Activities from the IFLA Newspape...
Newspapers in a Global Microcosm: Updates & Activities from the IFLA Newspape...Vermont Digital Newspaper Project
 
Open access for researchers, policy makers and research managers, libraries
Open access for researchers, policy makers and research managers, librariesOpen access for researchers, policy makers and research managers, libraries
Open access for researchers, policy makers and research managers, librariesIryna Kuchma
 
Digital & Discovery @ Smithsonian Libraries 2013
Digital & Discovery @ Smithsonian Libraries 2013Digital & Discovery @ Smithsonian Libraries 2013
Digital & Discovery @ Smithsonian Libraries 2013Martin Kalfatovic
 
Digitization in Support of Services @ Smithsonian Libraries (May)
Digitization in Support of Services @ Smithsonian Libraries (May)Digitization in Support of Services @ Smithsonian Libraries (May)
Digitization in Support of Services @ Smithsonian Libraries (May)Martin Kalfatovic
 
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...Foundations to Actions: Extending Innovations to Digital Libraries in Partner...
Foundations to Actions: Extending Innovations to Digital Libraries in Partner...Trish Rose-Sandler
 
Rethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryRethink research, illuminate history with the British Library
Rethink research, illuminate history with the British LibraryMia
 
Smithsonian Libraries Overview
Smithsonian Libraries OverviewSmithsonian Libraries Overview
Smithsonian Libraries OverviewMartin Kalfatovic
 

Similar a 20140628 crowdsourcing, family history, and long tails for libraries [ala annual las vegas] (20)

OUP Day Journals Brochure_2015 (2)
OUP Day Journals Brochure_2015 (2)OUP Day Journals Brochure_2015 (2)
OUP Day Journals Brochure_2015 (2)
 
Bigdataforesight
BigdataforesightBigdataforesight
Bigdataforesight
 
Finalnews
FinalnewsFinalnews
Finalnews
 
Ndnp partner meeting
Ndnp partner meetingNdnp partner meeting
Ndnp partner meeting
 
The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014
The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014
The Future of Knowledge in the Age of Wikipedia - REMIXNYC 2014
 
20130321 Putting the world's cultural heritage online with crowdsourcing [roo...
20130321 Putting the world's cultural heritage online with crowdsourcing [roo...20130321 Putting the world's cultural heritage online with crowdsourcing [roo...
20130321 Putting the world's cultural heritage online with crowdsourcing [roo...
 
Utilizing citizen science to identify, map and monitor wild brook trout genet...
Utilizing citizen science to identify, map and monitor wild brook trout genet...Utilizing citizen science to identify, map and monitor wild brook trout genet...
Utilizing citizen science to identify, map and monitor wild brook trout genet...
 
Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia Improving the troubled relationship between Scientists and Wikipedia
Improving the troubled relationship between Scientists and Wikipedia
 
Twitter & mobility disruptions
Twitter & mobility disruptionsTwitter & mobility disruptions
Twitter & mobility disruptions
 
Finalnews
FinalnewsFinalnews
Finalnews
 
An Overview of Standards for Biodiversity Literature and the State of the BHL
20140628 crowdsourcing, family history, and long tails for libraries [ala annual las vegas]

  • 1. Crowdsourcing, Family History, and Long Tails for Libraries http://slidesha.re/1qzB8vv Frederick Zarndt frederick@frederickzarndt.com Secretary, IFLA Newspapers Section Photo held by John Oxley Library, State Library of Queensland. Original from Courier-mail, Brisbane, Queensland, Australia.
  • 2. Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers. ... [It] is different from ordinary outsourcing since it is a task or problem that is outsourced to an undefined public rather than a specific, named group. Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Crowdsourcing (accessed March 17, 2013)
  • 3. “crowdsourcing” was coined by Jeff Howe in “The rise of crowdsourcing” published in Wired magazine June 2006.
  • 5. • On the date of publication of Jeff Howe’s Wired magazine article, 1-Jun-2006, Wikipedia did not have an entry (list) of crowdsourcing projects*. • On 25-Jan-2010 Wikipedia’s list of crowdsourcing projects had 35 entries*. • On 17-Mar-2013 Wikipedia’s list of crowdsourcing projects had 158 entries+. * From Internet Archive’s Wayback Machine. + Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects (accessed March 17, 2013).
  • 6. Amazon Mechanical Turk was launched Nov 2005 Alexa global / country rank of Amazon Mechanical Turk (June 2014): 6,465 / 2,046 crowdsourcing
  • 7. crowdsourcing Each day 200,000,000 reCAPTCHAs are solved by humans around the world
  • 8. Galaxy Zoo was 1st launched July 2007 Alexa global / country traffic rank of Galaxy Zoo (June 2014): 606,971 / 100,298 citizen science
  • 9. Kickstarter was 1st launched in 2009 Alexa global / country traffic rank of Kickstarter (June 2014): 782 / 326 60,000+ projects successfully funded with more than USD $1,000,000,000 crowd funding
  • 11. Family Search Indexing was 1st launched (beta) 2004 Alexa global / country traffic rank of FamilySearch (June 2014): 4,385 / 1,321
  • 12. Project Gutenberg was 1st launched Dec 1971 Alexa global / country traffic rank of Project Gutenberg (June 2014): 6,615 / 4,066
  • 13.
  • 14. Alexa global / country traffic rank of National Library of Finland 2,535,854 (31-Oct-2012) / 199 (2-Apr-2012)
  • 15. so what? why should a library care about crowdsourcing? Time Life Pictures Getty Images
  • 16. “user engagement refers to the quality of the user experience that emphasizes the positive aspects of the interaction with a web application, and in particular the phenomena associated with wanting to use that web application longer and frequently” Elad Yom-Tov, Mounia Lalmas, Georges Dupret, Ricardo Baeza-Yates, Pinar Donmez, and Janette Lehmann. 2012. The effect of links on networked user engagement. In Proceedings of the 21st international conference companion on World Wide Web (WWW '12 Companion). ACM, New York, NY, USA, 641-642. DOI=10.1145/2187980.2188167 http://doi.acm.org/10.1145/2187980.2188167
  • 17. “in addition to increasing search accuracy or lowering the costs of document transcription, crowdsourcing is the single greatest advancement in getting people using and interacting with library collections” Paraphrased from Trevor Owens’s blog http://www.trevorowens.org/2012/03/crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013).
  • 18. “While [the National Library of Australia’s] Trove offers a range of user engagement features, and use of each of these features continues to grow, it is Trove’s newspaper text correction features that have attracted the highest level of user engagement.” Marie-Louise Ayres. 2013. ‘Singing for their supper’: Trove, Australian newspapers, and the crowd. Paper presented at IFLA WLIC 2013, Singapore. Accessed June 2014, IFLA Library, http://library.ifla.org/id/eprint/245.
  • 19. Alexa global / country rank of National Library of Australia (June 2014): 10,964 / 249 Trove gets ~78% of all National Library web traffic.
  • 20. National Library of Australia • Online since 2008 • More than 13,000,000 newspaper pages / 127,437,967 articles (May 2014) • Top text corrector 2,625,205+ lines (May 2014) • 2,682,119 lines corrected each month (average for first 5 months of 2014) • 129,046,297 lines corrected as of May 2014, up from 66,527,535 lines corrected as of May 2012 • 129,300 registered / 8,218 active users (May 2014)
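
As a rough cross-check of the Trove figures on this slide, the average monthly correction rate between the May 2012 and May 2014 totals can be computed directly. This is only a sketch: it assumes exactly 24 months between the two snapshots, and the function and variable names are illustrative, not from the slides.

```python
# Cumulative Trove lines corrected, as reported on the slide.
LINES_MAY_2012 = 66_527_535
LINES_MAY_2014 = 129_046_297

# Assumption: the two snapshots are exactly 24 months apart.
MONTHS_BETWEEN = 24


def average_monthly_rate(start_total: int, end_total: int, months: int) -> float:
    """Average number of lines corrected per month between two cumulative totals."""
    if months <= 0:
        raise ValueError("months must be positive")
    return (end_total - start_total) / months


rate = average_monthly_rate(LINES_MAY_2012, LINES_MAY_2014, MONTHS_BETWEEN)
print(f"{rate:,.0f} lines per month")  # prints "2,604,948 lines per month"
```

Under that assumption the two-year average is close to the 2,682,119 lines per month reported for the first five months of 2014.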
  • 23.
  • 24. California Digital Newspaper Collection • CDNC began digitizing newspapers in 2005 as part of NDNP • Newspapers digitized to article-level as well as to page-level as required by NDNP • Hosted on Veridian beginning 2009 • Collection size 61,412 issues, 545,955 pages, 6,364,529 articles (May 2014)
  • 25. OCR text correction • OCR text correction added Aug 2011 • Corrections are done line by line • 2,246 registered / 1,266 active users (Jun 2014) • 2,656,497+ lines of text corrected (Jun 2014) • ~2% of the collection corrected, 98% to go! • Top corrector 717,855 lines > 2x 2nd corrector
  • 26.
  • 27. Cambridge Public Library Historic Newspaper Collection • Cambridge Historic Newspapers online since Jan 2012. • Cambridge Massachusetts Public Library digitized local newspapers (http://cambridge.dlconsulting.com/) • Newspapers digitized to article-level • Collection size 6,346 issues, 59,070 pages, 669,406 articles (May 2014) • Collection includes 13,099 obituary cards
  • 28. [Chart: 2013 monthly averages, with shares of 0%, 10%, and 90% across Historic Cambridge Newspapers (1846-1923), Cambridge City Directories (1848-1910), and Cambridge Chronicle (August 2005 to present)]
  • 29. why correct text? here’s why… Image copyright Karl R Lilliendahl Photographer
  • 30. Deaths. lln»rieff, Esq. of <c .. Qn. Sunday, the till. greatly Drandrellt, of Orms4irJi.- ~ ; ;✓ ' • * On ijfr r inn l j j j i l F i i j ' 1 1 f H a v o d i v y d , Carnarvonshire, S ; **" *- ' « ' March Oxford, F. Tfovmeud, Uerald. » • V . •On Tncsdav last, Mr. Charles. IWilinson, this 8 ; had vf thesis#,, a week ago, which tcrminate<i'iu his death. . / ' ■ O'i Sunday, dJst nit. at. AsbtCnvHall, mar Lancaster, Mr.,Geo. Worn ick, many years house'steward hit late Once The Hamilton and Brandon. He locked himself h»oWn'r«wte<: soon. twelve o'clock" that dny, and fii»-d a loaded pistol "through Ins bead, 1 which instantaneously killed him. Coronet's Verdict, shot himself in a temporary fit of Friday week, raw OCR text Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3. newspaper image
  • 31. Top 10 users by lines corrected* (two tables)
      user | lines corrected*
      1 | 646,873
      2 | 236,323
      3 | 111,749
      4 | 100,749
      5 | 99,999
      6 | 87,720
      7 | 82,768
      8 | 63,786
      9 | 57,441
      10 | 56,458

      lines corrected* | user
      2,455,338 | 1
      1,822,422 | 2
      1,448,370 | 3
      1,265,217 | 4
      1,174,835 | 5
      1,069,669 | 6
      1,058,179 | 7
      1,020,462 | 8
      949,694 | 9
      886,315 | 10
      *numbers from Mar 2014
  • 32. User rank | Lines corrected Jun 2014 | Lines corrected Oct 2012
      1 | 717,855 | 242,965
      2 | 271,972 | 87,515
      3 | 120,220 | 31,318
      4 | 113,787 | 24,144
      5 | 109,999 | 23,184
      6 | 99,999 | 19,240
      7 | 94,742 | 18,898
      8 | 65,637 | 16,875
      9 | 63,786 | 11,784
      10 | 59,724 | 9,762
  • 33. uncorrected OCR accuracy by newspaper title
      Title | OCR character accuracy | ~OCR word accuracy
      PRP Pacific Rural Press 1871 - 1922 | 92.6% | 68.1%
      SFC San Francisco Call 1890 - 1913 | 92.6% | 68.1%
      LAH Los Angeles Herald 1873 - 1910 | 88.7% | 54.9%
      LH Livermore Herald 1877 - 1899 | 88.6% | 54.6%
      DAC Daily Alta California 1841 - 1891 | 88.2% | 53.4%
      CFJ California Farmer and Journal of Useful Sciences 1855 - 1880 | 86.5% | 48.4%
      SN Sausalito News 1885 - 1922 | 70.4% | 17.3%
      *Word accuracy assumes average word length is 5 characters
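
The footnote on this slide implies a simple estimate: if a word averages 5 characters and each character is recognized correctly with the stated character accuracy, the chance that a whole word is correct is roughly the character accuracy raised to the 5th power. The sketch below illustrates that reading; treating character errors as independent is an assumption, and the function name is illustrative rather than anything from the slides.

```python
def estimated_word_accuracy(char_accuracy: float, avg_word_length: int = 5) -> float:
    """Estimate word accuracy from character accuracy.

    Assumes every character in a word is recognized independently, so a word
    is correct only if all of its characters are correct.
    """
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    if avg_word_length < 1:
        raise ValueError("avg_word_length must be at least 1")
    return char_accuracy ** avg_word_length


# Character accuracies for the titles listed on the slide.
char_accuracy_by_title = {
    "PRP": 0.926,
    "SFC": 0.926,
    "LAH": 0.887,
    "LH": 0.886,
    "DAC": 0.882,
    "CFJ": 0.865,
    "SN": 0.704,
}

for title, char_acc in char_accuracy_by_title.items():
    word_acc = estimated_word_accuracy(char_acc)
    print(f"{title}: {char_acc:.1%} characters -> ~{word_acc:.1%} words")
```

With these inputs the estimate reproduces the slide's word-accuracy column, for example 68.1% for a character accuracy of 92.6% and 17.3% for 70.4%.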
  • 34. corrected OCR accuracy by newspaper title
      Title | OCR character accuracy | Corrected accuracy
      PRP Pacific Rural Press 1871 - 1922 | 92.6% | 99.3%
      SFC San Francisco Call 1890 - 1913 | 92.6% | 99.6%
      LAH Los Angeles Herald 1873 - 1910 | 88.7% | 99.1%
      LH Livermore Herald 1877 - 1899 | 88.6% | 99.9%
      DAC Daily Alta California 1841 - 1891 | 88.2% | 99.9%
      CFJ California Farmer and Journal of Useful Sciences 1855 - 1880 | 86.5% | 99.8%
      SN Sausalito News 1885 - 1922 | 70.4% | 100.0%
  • 35. corrected OCR accuracy by newspaper title
      Title | OCR character accuracy | ~OCR word accuracy | Corrected accuracy | ~Corrected word accuracy
      PRP 1871 - 1922 | 92.6% | 68.1% | 99.3% | 96.5%
      SFC 1890 - 1913 | 92.6% | 68.1% | 99.6% | 98.0%
      LAH 1873 - 1910 | 88.7% | 54.9% | 99.1% | 95.6%
      LH 1877 - 1899 | 88.6% | 54.6% | 99.9% | 99.5%
      DAC 1841 - 1891 | 88.2% | 53.4% | 99.9% | 99.5%
      CF 1855 - 1880 | 86.5% | 48.4% | 98.3% | 91.8%
      SN 1885 - 1922 | 70.4% | 17.3% | 100.0% | 100.0%
      *Word accuracy assumes average word length is 5 characters
  • 36. correction accuracy by user
      User | Average OCR accuracy | Correction accuracy
      A | 70.4% | 100.0%
      B | 87.1% | 99.5%
      C | 95.4% | 99.5%
      D | 86.5% | 98.3%
      E | 95.3% | 100.0%
      F | 91.0% | 100.0%
      G | 91.0% | 99.8%
      H | 90.5% | 99.0%
      I | 96.6% | 99.8%
      J | 94.8% | 100.0%
      K | 86.8% | 99.3%
  • 37. that’s interesting, but who wants to correct OCR text? it’s
  • 38. Graphic from Kaufmann et al. “More than fun and money. Worker Motivation in Crowdsourcing – A Study on Mechanical Turk.” Motivation
  • 39. Motivation Genealogists and family historians • National Library of Australia’s 2012 Trove status report showed that ~50% of Trove users are family historians • National Library of New Zealand survey found that ~50% of PapersPast users are genealogists
  • 40. • 72% visit UDN for genealogical research • 20% visit for various other types of historical research • 87% find obituaries useful • Over 60% find the other genealogical article types (birth and wedding announcements) useful • Only 7% do not find genealogical articles useful • Many are writing family histories and consequently also look for general background information • Older content is much more highly valued than more recent content (see more detailed explanation that follows) • 44% find smaller, rural papers more useful, while only 15% find larger, metropolitan papers more useful Motivation 2012 user survey John Herbert and Randy Olsen. Small town papers: still delivering the news. WLIC 2012, Helsinki, Finland. http://conference.ifla.org/past-wlic/2012/119-herbert-en.pdf
  • 41. • CDNC and Cambridge Public Library published a user survey in Mar 2013 • 604 / 32 responses • Surveys are (mostly) identical except for organization name Motivation 2013 user survey
  • 46. • “I enjoy the correction - it’s a great way to learn more about past history and things of interest whilst doing a ‘service to the community’ by correcting text for the benefit of others.” • “I have recently retired from IT and thought that I could be of some assistance to the project. It benefits me and other people. It helps with family research.” Rose Holley. March 2009. Many Hands Make Light Work. National Library of Australia. Accessed June 2014 http://www.nla.gov.au/ndp/project_details/documents/ANDP_ManyHands.pdf. Motivation Trove users’ report
  • 47. “The ‘typical’ Trove user is a very well educated, highly paid, English speaking employed woman aged fifty or over, with a significant or primary interest in family or local history, who visits the Trove website very frequently. Users of Trove newspapers are older than the average Trove user; only 13% of newspaper users are under 40 years of age.” Marie-Louise Ayres. ‘Singing for their supper’: Trove, Australian newspapers, and the crowd. WLIC 2013, Singapore. http://library.ifla.org/245/1/153-ayres-en.pdf. Motivation Engaged users: Who are they?
  • 48. “Many of Trove’s user engagement features are very popular. More than 100,000 users have registered to date, and more than 2 million tags and nearly 60,000 comments had been added… [Trove] text correction, however, stands head and shoulders above any other user engagement features.” Motivation Engaged users: What do they do? Marie-Louise Ayres. ‘Singing for their supper’: Trove, Australian newspapers, and the crowd. WLIC 2013, Singapore. http://library.ifla.org/245/1/153-ayres-en.pdf.
  • 49. “when someone transcribes a document, they are actually better fulfilling the mission of a cultural heritage organization than someone who simply stops by to flip through the pages” Paraphrased from Trevor Owens’s blog http://www.trevorowens.org/2012/03/crowdsourcing-cultural-heritage-the-objectives-are-upside-down/ (accessed June 2013). Motivation Engaged users
  • 50. “I am interested in all kinds of history. I have pursued genealogy as a hobby for many years. I correct text at CDNC because I see it as a constructive way to contribute to a worthwhile project. Because I am interested in history, I enjoy it.” Wesley, California Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 51. “I only correct the text on articles of local interest - nothing at state, national or international level, no advertisements, etc. The objective is to be able to help researchers to locate local people, places, organizations and events using the on-line search at CDNC. I correct local news & gossip, personal items, real estate transactions, superior court proceedings, county and local board of supervisors meetings, obituaries, birth notices, marriages, yachting news, etc.” Ann, California Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 52. “I am correcting text for the Coronado Tent City Program for 1903.  It is important to correct any problems with personal names and other information so that researchers will be able to search by keyword and be assured of retrieving desired results. ... type fonts cause a great deal of difficulty in digitizing the text and can cause problems for searchers.  Also, many of the guests' names at Tent City and Hotel Del Coronado were taken from the registration books and reported in the Program.  This led to many problems in spelling of last names and the editors were not careful to be consistent in the spellings.  This Program is an important resource since it provides an excellent picture of daily life in Tent City and captures much of the history of Coronado itself.” Gene, California Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 53. “I have always been interested in history, especially the development of the American West, and nothing brings it alive better than newspapers of the time. I believe them to be an invaluable source of knowledge for us and future generations.” David, United Kingdom Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 54. CDNC is an excellent source of information matching my personal interest in such topics as sea history, development of shipbuilding, clippers and other ships etc. ... Unfortunately, the quality of text ... is rather poor I’m afraid. This is why I started to do all corrections necessary for myself ... and to leave the corrected text for use of others. .... I am not doing this very regularly as this is just my hobby and pleasure. Jerzey, Poland Personal communications with CDNC text correctors. Motivation CDNC users’ report
  • 55. As an amateur historical researcher my time for research is very limited.  Making time to travel to archives, libraries, and historical societies does not happen as often as I would like.  The Cambridge Public Library’s online newspaper collection has been an invaluable resource and it is fun.  I am very grateful for all the help I have received over the years from so many research organizations. Correcting text has several benefits.  It makes it much more likely that I will find a story if I decide to search for it in the future.  It is a way of saying ‘thank you’ to the Cambridge Library for having such a great resource available and maybe I can make the next person’s research a little easier. It is my own little historical preservation project. Cambridge Historical Newspapers Text Corrector Personal communications with CDNC text correctors. Motivation Cambridge users’ report
  • 56. so old, boring, easily entertained people correct text. convince me there are real benefits.
  • 58. $ Economics Financial value of outsourced OCR text correction for newspapers? The Assumptions • 25 to 50 characters per line in a newspaper column: Assume 40 characters per line (CDNC sample average) • Outsourced text transcription or correction costs USD $0.35 to $1.20 per 1000 characters: Assume $0.50 per 1000 characters
  • 59. Economics Outsourced correction, CDNC: 2,656,497 lines x 40 characters per line x 1/1000 x $0.50 = $53,130. Outsourced correction, National Library of Australia: 129,046,297 lines x 40 characters per line x 1/1000 x $0.50 = $2,580,926.
  • 60. Economics Financial value of in-house OCR text correction? The Assumptions • Correction takes 15 seconds per line • Cost is hourly wage plus benefits of lowest level employee, $10 for CDNC, $41.88* for Australia * AUD $40.38 = USD $41.88 is the actual labor value assumed by the National Library of Australia to calculate avoided costs due to crowdsourced OCR text correction in its 2012 Trove Status Report.
  • 61. Economics In-house correction, CDNC: 2,656,497 lines x 15 seconds per line x 1/3600 hrs per second x $10.00 per hr = $110,687. In-house correction, National Library of Australia: 129,046,297 lines x 15 seconds per line x 1/3600 hrs per second x $41.88 per hr = $22,518,579.
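The outsourced and in-house estimates on slides 59 and 61 can be reproduced with a short Python sketch. The function and parameter names are illustrative assumptions, not part of the presentation; the line counts, characters per line, price per 1000 characters, seconds per line, and hourly rates are the ones stated on the slides.

```python
def outsourced_cost(lines, chars_per_line=40, usd_per_1000_chars=0.50):
    """Estimated cost (USD) of paying a vendor to correct `lines` lines of text."""
    return lines * chars_per_line / 1000 * usd_per_1000_chars


def in_house_cost(lines, usd_per_hour, seconds_per_line=15):
    """Estimated labor cost (USD) of correcting `lines` lines with paid staff."""
    hours = lines * seconds_per_line / 3600
    return hours * usd_per_hour


if __name__ == "__main__":
    # (label, lines corrected, hourly wage plus benefits) as given on the slides
    collections = [("CDNC", 2_656_497, 10.00), ("NLA", 129_046_297, 41.88)]
    for label, lines, wage in collections:
        print(f"{label}: outsourced ${outsourced_cost(lines):,.0f}, "
              f"in-house ${in_house_cost(lines, wage):,.0f}")
```

Keeping the rates as parameters makes it easy to test the sensitivity of the estimate, for example against the $0.35 to $1.20 per 1000 characters range quoted for outsourced correction.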
  • 62. Accuracy “His Accuracy Depends on Ours!" Office for Emergency Management. Office of War Information. Domestic Operations Branch. Bureau of Special Services. [Photo held at US National Archives and Records Administration]
  • 63. Accuracy • Edwin Klijn (Koninklijke Bibliotheek, the Netherlands) reports raw OCR character accuracies of 68% for early 20th century newspapers • Rose Holley (National Library of Australia) reports that raw OCR character accuracy varied from 71% to 98% on a sample of Trove digitized newspapers Rose Holley. How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs. D-Lib Magazine. Mar/Apr 2009. Accessed June 2014 http://www.dlib.org/dlib/march09/holley/03holley.html. Edwin Klijn. The current state-of-art in newspaper digitization. D-Lib Magazine. Jan/Feb 2008. Accessed June 2014 http://www.dlib.org/dlib/january08/klijn/01klijn.html. Public domain graphic courtesy of Wikimedia Commons.
  • 64. Accuracy MAPPING TEXTS* assesses digitization quality of digital newspapers by comparing the number of words recognized to the total number of words scanned * Mapping texts is a collaboration between the University of North Texas and Stanford University aimed at experimenting with new methods for finding and analyzing meaningful patterns embedded in massive collections of digital newspapers.
  • 65. How does low text accuracy affect search recall? The Facts • Average uncorrected OCR character accuracy of the CDNC sample data is ~89% • Average length of an English word is 5 characters • Average word accuracy is 89% x 89% x 89% x 89% x 89% = 55.8% - round up to 60% or 6 out of 10 words correct Accuracy
  • 66. Search recall, no text correction [Figure: ten instances of “ARNDT”; legend: instances of “ARNDT” found / instances of “ARNDT” not found]
  • 67. Accuracy The Facts • Average corrected character accuracy of the CDNC sample data is ~99.4% • Average word accuracy of CDNC corrected text is 99.4% x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%
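The word-accuracy figures on slides 65 and 67 come from multiplying the character accuracy by itself once per character. A minimal Python sketch of that calculation follows; the function name is an assumption, and the model treats character errors as independent, which is what the slides' repeated multiplication implies.

```python
def word_accuracy(char_accuracy, word_length=5):
    """Probability that every character of a word is recognized correctly,
    assuming each character is right independently with `char_accuracy`."""
    if not 0.0 <= char_accuracy <= 1.0:
        raise ValueError("char_accuracy must be between 0 and 1")
    return char_accuracy ** word_length


if __name__ == "__main__":
    for label, accuracy in [("uncorrected", 0.89), ("corrected", 0.994)]:
        print(f"{label}: {word_accuracy(accuracy):.1%} of 5-character words correct")
```

The `word_length` parameter lets the same function be applied to names longer than the five-character average.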
  • 68. Search recall with text correction [Figure: ten instances of “ARNDT”; legend: instances of “ARNDT” found / instances of “ARNDT” not found]
  • 69. A search for “Arndt” at Chronicling America gives 10,267 results* • If Chronicling America text accuracy is 55.8% (same as uncorrected CDNC sample), then 8,133 instances of “Arndt” were not found • If text accuracy is 97.0%, then 317 instances of “Arndt” were not found Accuracy * Search performed 31 Oct 2012
  • 70. Accuracy Suppose the word/name is longer than 5 characters? The Facts • Assume that average uncorrected / corrected OCR character accuracy is ~89% / ~99%, same as CDNC. Name (name length): raw text accuracy / corrected text accuracy • Eklund (6): 49.7% / 94.2% • Kennedy (7): 44.2% / 93.25% • Espinosa (8): 39.4% / 92.3% • Bonaparte (9): 35% / 91.4% • Chatterjee (10): 31.2% / 90.4%
  • 71. Accuracy Name: number of search results / missing results with raw text accuracy / missing results with corrected text accuracy • Eklund: 2,951 / 2,987 / 182 • Kennedy: 360,723 / 455,392 / 26,111 • Espinosa: 1,918 / 2,950 / 160 • Bonaparte: 44,664 / 82,947 / 4,203 • Chatterjee: 19 / 42 / 2 Chronicling America searches done 19-Mar-2013 (6,025,474 pages from 1836 to 1922).
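The "missing results" figures on slides 69 and 71 appear to follow from one assumption: the hits a search returns are only the correctly recognized share of all true occurrences. A hedged Python sketch of that inference follows; the function name is hypothetical, and the formula is reconstructed from the slides' numbers rather than stated in the presentation.

```python
def estimated_missing(found, word_accuracy):
    """Estimate how many occurrences a search missed.

    Assumes found = word_accuracy * true_total, so
    missing = true_total - found = found / word_accuracy - found.
    """
    if not 0.0 < word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be in (0, 1]")
    return found / word_accuracy - found


if __name__ == "__main__":
    # (name, search results, raw word-level accuracy) taken from the slides
    for name, found, accuracy in [("Arndt", 10_267, 0.558),
                                  ("Eklund", 2_951, 0.497),
                                  ("Chatterjee", 19, 0.312)]:
        print(f"{name}: about {estimated_missing(found, accuracy):,.0f} missed")
```

With these inputs the sketch gives the slides' raw-text figures for Arndt, Eklund, and Chatterjee; for Arndt at 97.0% accuracy it gives about 318, within one of the slide's 317.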
  • 72. but you left out long tails… Public domain illustration from "On The Genesis of Species" by St. George Mivart
  • 73.
  • 74. the long tail* of crowdsourced OCR text correction: a probability distribution has a long tail if a larger share of the population rests within its tail than it would under a normal distribution. The most productive users represent a small fraction of the total user population and ~50% of total production; said a different way, the much larger group of individually less productive users is, taken together, as important as the most productive users. * The phrase “long tail” was popularized by Chris Anderson in the October 2004 Wired magazine article “The Long Tail” and by Clay Shirky’s February 2003 essay “Power laws, web logs, and inequality”.
  • 75. First table (user: lines corrected*) • 1: 646,873 • 2: 236,323 • 3: 111,749 • 4: 100,749 • 5: 99,999 • 6: 87,720 • 7: 82,768 • 8: 63,786 • 9: 57,441 • 10: 56,458 Second table (user: lines corrected*) • 1: 2,455,338 • 2: 1,822,422 • 3: 1,448,370 • 4: 1,265,217 • 5: 1,174,835 • 6: 1,069,669 • 7: 1,058,179 • 8: 1,020,462 • 9: 949,694 • 10: 886,315 * numbers from Mar 2014
  • 76. OCR text correction long tails [Charts: CDNC lines corrected by text corrector (axis 0 to 300,000; top corrector 242,965) and NLA lines corrected by text corrector (axis 0 to 3,000,000; top corrector 1,456,906); each chart is divided into two 50% shares]
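The long-tail claim, that a small number of top correctors account for about half of all corrected lines, can be checked on any per-user tally with a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and the sample data are only the ten figures from the first table on slide 75, not the full user population, so the result illustrates the method and does not reproduce the ~50% split shown in the charts.

```python
def users_needed_for_share(lines_per_user, share=0.5):
    """Return how many of the most productive users are needed to reach
    `share` of all corrected lines in `lines_per_user`."""
    ranked = sorted(lines_per_user, reverse=True)
    target = share * sum(ranked)
    running = 0
    for count, lines in enumerate(ranked, start=1):
        running += lines
        if running >= target:
            return count
    return len(ranked)


if __name__ == "__main__":
    # Top-ten figures from the first table on slide 75 (Mar 2014); partial data only.
    top_ten = [646_873, 236_323, 111_749, 100_749, 99_999,
               87_720, 82_768, 63_786, 57_441, 56_458]
    needed = users_needed_for_share(top_ten)
    print(f"{needed} of {len(top_ten)} listed users account for half of the listed lines")
```

Run against a complete export of per-user correction counts, the same function would show how few correctors sit in the head of the distribution and how many sit in the tail.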
  • 77. Future considerations • How to market / advertise crowdsourcing? • How to motivate crowdsourcers? • Is authentication / identity of crowdsourcers an issue? • How to administer crowdsourced data? Photo of Aleister Crowley [Public domain] from Wikimedia Commons
  • 78. Conclusions Conclusion of the Sonata for piano #32, opus 111 by Ludwig van Beethoven • Lots of crowdsourcing in cultural heritage organizations and elsewhere • Benefits are multi-faceted: Economic, data accuracy, user engagement, increased web traffic
  • 79. are we finished now? Image copyright Dan Heller www.danheller.com
  • 80. Resources Public domain photo “A useful instruction for young sailors from the Royal Hospital School, Greenwich” from the National Maritime Museum.
  • 81. Correct California newspapers at http://cdnc.ucr.edu Correct Cambridge MA newspapers http://bit.ly/cambridgepublic Correct Australian newspapers http://trove.nla.gov.au Correct Virginia newspapers http://virginiachronicle.com Try crowdsourcing!
  • 82. Other resources Mapping Texts at http://mappingtexts.stanford.edu/ Wragge Labs at http://wraggelabs.com/ Wikipedia list of crowdsourcing projects https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects Wikipedia list of digitized newspapers http://en.wikipedia.org/wiki/List_of_online_newspaper_archives
  • 83. ? Photo held by John Oxley Library, State Library of Queensland. Original from Courier-mail, Brisbane, Queensland, Australia. Frederick Zarndt frederick@frederickzarndt.com Secretary, IFLA Newspapers Section