20130321 Putting the world's cultural heritage online with crowdsourcing [rootstech salt lake city]

Putting the world’s
cultural heritage online
with crowdsourcing
Frederick Zarndt
@cowboyMontana
frederick@frederickzarndt.com
Slides @ http://bit.ly/crowdsrootstech2013

CCS / Digital Divide Data / DL Consulting
Photo held by John Oxley Library, State Library of
Queensland. Original from Courier-mail, Brisbane,
Queensland, Australia.

In 2004 James Surowiecki published ...

The Wisdom of Crowds: Why
the Many Are Smarter Than
the Few and How Collective
Wisdom Shapes Business,
Economies, Societies and
Nations

In it he says ...

... a crowd of persons who are diverse ...

... independent ...

... usually makes better judgements or decisions than single persons

“Country Fair” by Grandma Moses. Original painting 1950.

“crowdsourcing”

was coined by Jeff Howe in “The rise of
crowdsourcing” published in Wired
magazine June 2006.

web trends for
“crowdsourcing”
Jan-2006 to Jan-2013

• On 1-Jun-2006, the date Jeff Howe’s Wired magazine article was published, Wikipedia did not have an entry (list) of crowdsourcing projects*.
• On 25-Jan-2010 Wikipedia’s list of crowdsourcing projects had 35 entries*.
• On 17-Mar-2013 Wikipedia’s list of crowdsourcing projects had 158 entries+.

* From the Internet Archive’s Wayback Machine.
+ Wikipedia contributors, "List of crowdsourcing projects," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects (accessed March 17, 2013).

Crowdsourcing is the practice of obtaining
needed services, ideas, or content by
soliciting contributions from a large group of
people, and especially from an online
community, rather than from traditional
employees or suppliers. ... [It] is different
from ordinary outsourcing since it is a task or
problem that is outsourced to an undefined
public rather than a specific, named group.

Wikipedia contributors, "Crowdsourcing," Wikipedia, The Free Encyclopedia, http://en.wikipedia.org/wiki/Crowdsourcing (accessed March 17, 2013).

crowd*: crowdsourcing, crowdcollaboration, crowdfunding, crowdcasting, crowdvoting, citizen science

what is Alexa?

• Alexa collects and analyzes Internet data for purposes of web analytics. Web analytics is the
measurement, collection, analysis and reporting of Internet data for the purposes of understanding
and optimizing web usage. Alexa is now a subsidiary of Amazon.

• Alexa was founded in 1996 by Brewster Kahle (Internet Archive) and Bruce Gilliat.

• Alexa’s operations include archiving webpages as they are crawled. This database served as the basis for the creation of the Internet Archive, accessible through the Wayback Machine.

• Alexa continually crawls all publicly-available websites to create a series of snapshots of the web.

• Alexa gathers information from a variety of sources to provide key statistics about each site on the web, for example Traffic Rank, PageViews, site Speed, and Bounce Rate. This information is derived from Alexa toolbar users (~6,000,000 worldwide).

definitions

• A PageView is a request for a file whose type is defined as a page.

• A Unique Visitor is a uniquely identified client generating requests on the web server or viewing pages within a defined time period (e.g. a day, week, or month). A Unique Visitor counts once within that period.

• A Visit is a series of page requests from the same uniquely identified client with no more than 30 minutes between consecutive page requests.

• Bounce Rate is the percentage of visits where the visitor enters and exits at the same
page without visiting any other pages on the site in between.

• World | Country Rank is a function of the average daily unique visits and the number
of unique pages requested.

definitions adapted from Wikipedia http://en.wikipedia.org/wiki/Web_analytics
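The Visit and Bounce Rate definitions above can be sketched in a few lines of Python. This is a minimal illustration, not Alexa’s or any analytics vendor’s actual method: it assumes page requests arrive as hypothetical (client_id, timestamp, page) tuples and applies the 30-minute gap rule literally.

```python
from datetime import datetime, timedelta

# 30-minute inactivity rule from the Visit definition above.
SESSION_GAP = timedelta(minutes=30)

def split_into_visits(requests):
    """Group hypothetical (client_id, timestamp, page) requests into visits.

    A new visit starts when a client's request comes more than 30 minutes
    after that same client's previous request.
    """
    visits = []       # each visit is a list of pages, in request order
    last_seen = {}    # client_id -> timestamp of the client's previous request
    current = {}      # client_id -> the client's open visit (a list inside visits)
    for client, ts, page in sorted(requests, key=lambda r: (r[0], r[1])):
        if client in last_seen and ts - last_seen[client] <= SESSION_GAP:
            current[client].append(page)
        else:
            current[client] = [page]
            visits.append(current[client])
        last_seen[client] = ts
    return visits

def bounce_rate(visits):
    """Percentage of visits that consist of a single page request."""
    if not visits:
        return 0.0
    single_page = sum(1 for visit in visits if len(visit) == 1)
    return 100.0 * single_page / len(visits)

requests = [
    ("a", datetime(2013, 3, 17, 9, 0), "/"),
    ("a", datetime(2013, 3, 17, 9, 10), "/search"),
    ("a", datetime(2013, 3, 17, 10, 0), "/"),  # 50-minute gap: new visit
    ("b", datetime(2013, 3, 17, 9, 5), "/"),
]
visits = split_into_visits(requests)
print(len(visits), f"{bounce_rate(visits):.1f}%")  # 3 66.7%
```

Client "a" makes two visits (one of two pages, one of a single page) and client "b" makes one single-page visit, so two of the three visits bounce.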

crowdfunding

Kickstarter (http://www.kickstarter.com/) was 1st launched in Apr 2009. As of 17-Mar-2013
its Alexa Internet traffic rank is 751 (global) / 294 (USA).
35,000+ projects successfully funded with $500,000,000+ by 3,000,000+ people.

crowdvoting

reddit (http://www.reddit.com/) was 1st launched in June 2005. As of 17-Mar-2013 its Alexa
Internet traffic rank is 124 (global) / 54 (USA). reddit had more than 55,000,000 unique
visitors from 175 countries who cast more than 17,000,000 votes about which stories are
important.

Amazon Mechanical Turk (https://www.mturk.com) was launched Nov 2005.
As of 17-Mar-2013 its Alexa Internet traffic rank is 8,219 (global) / 3,036 (USA).

crowdsourcing

Each day 200,000,000 reCAPTCHAs are solved by humans around the world.

Zooniverse (https://www.zooniverse.org) was 1st launched as Galaxy Zoo July 2007.
As of 17-Mar-2013 it has 801,682 participants worldwide. Its Alexa traffic rank is
271,574 (global) / 127,695 (USA).

Wikipedia

• Wikipedia began in 2001

• Now in 285 languages, 24,640,000 articles

• 4,210,000 articles in English

• More than 1,000,000 articles each in German, French, Italian, and Dutch

• 40 Wikipedia languages with more than 100,000 articles

• 112 Wikipedia languages with more than 10,000 articles

• 488,470,000 unique visitors (Jan 2013)

• 84,848 active (5+ edits) contributors

• Alexa global traffic rank: #6 in worldwide web traffic

Statistics from Wikimedia Report Card http://reportcard.wmflabs.org

Family Search Indexing was 1st launched (beta) 2004. As of 17-Mar-2013 Family Search’s
(https://familysearch.org/) Alexa Internet traffic rank is 4,480 (global) / 1,208 (USA).

• Started (beta) 2004

• More than 780,000 worldwide registered volunteers from ~25
countries index records relevant to family history

• Approximately 100,000 active volunteers each month

• UI in Chinese, English, German, French, Italian, Japanese,
Korean, Portuguese, and Russian

• Blind double-key entry with arbitration / reconciliation

• More than 1,500,088,741 records indexed (July 2012)

• Accuracy typically > 99.95%

Statistics from private communication with Family Search 5-Jul-2012
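Blind double-key entry with arbitration can be sketched as follows. This is an illustration under stated assumptions, not FamilySearch’s implementation: `reconcile` and `normalize` are hypothetical names, and treating keyings that differ only in case or spacing as agreement is an assumed rule.

```python
from typing import Callable, Tuple

def normalize(value: str) -> str:
    # Assumption: differences only in case or spacing count as agreement.
    return " ".join(value.split()).casefold()

def reconcile(key1: str, key2: str,
              arbitrate: Callable[[str, str], str]) -> Tuple[str, bool]:
    """Return (accepted value, needed_arbitration) for one field keyed twice blind."""
    if normalize(key1) == normalize(key2):
        return " ".join(key1.split()), False
    # Disagreement: a third person (the arbitrator) compares both keyings
    # with the record image and chooses or enters the correct value.
    return arbitrate(key1, key2), True

# Hypothetical arbitrator that already knows the right answer for this example.
truth = {("Eklund", "Ekland"): "Eklund"}

def arbitrator(a: str, b: str) -> str:
    return truth[(a, b)]

print(reconcile("Arndt", "arndt ", arbitrator))   # ('Arndt', False)
print(reconcile("Eklund", "Ekland", arbitrator))  # ('Eklund', True)
```

Only fields where the two volunteers disagree reach an arbitrator, which keeps the extra review effort proportional to the disagreement rate.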

Project Gutenberg was 1st launched Dec 1971.
As of 17-Mar-2013 Project Gutenberg’s Alexa Internet traffic rank is 5,192 (global) / 2,851 (USA).

• Started Dec 1971

• Worldwide volunteers transcribe or proofread OCR’d public
domain books through Distributed Proofreaders

• 42,000 free ebooks completed (March 2013)

• More than 100,000 free ebooks offered by its partners and
affiliates

• Partner / affiliated projects for Australia, Canada, Europe,
Germany, Runeberg (Nordic literature), self-published
contemporary authors, Consortia Center in collaboration with
the World eBook Library, ...

As of 17-Mar-2013 the National Library of Australia’s (http://trove.nla.gov.au/) Alexa Internet traffic
rank is 14,490 (global) / 330 (Australia). Trove gets ~75% of all National Library web traffic.

National Library of
Australia
• Online since 2008
• 7,200,000+ pages
• Top text corrector 1,250,000 lines (June 2012)
• 2,450,000+ lines corrected each month (average for
1st 6 months 2012)
• 68,908,757 lines corrected as of July 2012, up from
42,411,468 lines corrected July 2011.
• 63,613 total registered users (July 2012)
• 4,146 active users (June 2012)
Statistics from private communication with the National Library of Australia Oct 2012

Courtesy of Tim Sherratt, Tinkerer-in-Chief at WraggeLabs Emporium (http://wraggelabs.com/)

As of 17-Mar-2013 National Library of Finland’s (http://www.nationallibrary.fi/) Alexa Internet global
traffic rank is 4,303,901. Its Internet traffic rank for Finland was 199 as of 2-Apr-2012.

National Library of
Finland
• Digitalkoot is a project to improve OCR text in digitized
newspapers -- by playing games!
• Digitalkoot is a collaboration between the National
Library and Microtask
• Players correct OCR text by playing Myyräsillassa
(Mole Bridge) or Myyräjahdissa (Mole Hunt)
• National Library has 4,000,000+ digitized pages
• 109,321 registered players (October 2012)
• Since February 2011, 8,024,530 micro-tasks have been completed

As of 17-Mar-2013 UC Riverside’s Alexa Internet traffic rank is 11,782 (global) / 4,120 (USA).
CDNC gets ~3.30% of all UC Riverside web traffic.

California Digital
Newspaper Collection
• CDNC began digitizing newspapers in 2005 as part of the Library of Congress National Digital Newspaper Program (NDNP)
• Newspapers digitized to article level, in addition to the page level required by NDNP (same as Utah Digital Newspapers)
• Since 2009 hosted on Veridian at http://cdnc.ucr.edu
• Collection size 55,970 issues, 495,175 pages, 5,658,224
articles, 498,000,000+ lines (Mar-2013)

OCR text correction

• OCR text correction added August 2011
• Corrections are done line by line
• 578,000+ lines of text corrected (Oct 2012)
• 935,398+ lines of text corrected (Mar 2013)
• ~0.2% of the collection corrected, 99.8% to go!
• Top corrector: 327,244 lines, more than 2x the 2nd corrector

Cambridge Public Library
Historic Newspaper Collection

• Cambridge Historic Newspapers online since Jan 2012.
• Cambridge Massachusetts Public Library digitized local
newspapers (http://cambridge.dlconsulting.com/)
• Newspapers digitized to article-level
• Collection size 6,346 issues, 59,070 pages, 669,406
articles (Mar-2013)
• Collection includes 13,099 obituary cards

Why correct text?
Here’s why ...

Raw OCR text Newspaper image
Deaths. lln»rieff, Esq. of <c .. Qn.
Sunday, the till. greatly Drandrellt, of
Orms4irJi.- ~ ; ;✓ ' • * On ijfr r inn
ljjjil F iij '11 f Havodivyd,
Carnarvonshire, S ; **" *- ' « ' March
Oxford, F. Tfovmeud, Uerald. » • V .
•On Tncsdav last, Mr. Charles.
IWilinson, this 8 ; had vf thesis#,, a
week ago, which tcrminate<i'iu his
death. . / ' ■ O'i Sunday, dJst nit. at.
AsbtCnvHall, mar Lancaster,
Mr.,Geo. Worn ick, many years
house'steward hit late Once The
Hamilton and Brandon. He locked
himself h»oWn'r«wte<: soon. twelve
o'clock" that dny, and fii»-d a loaded
pistol "through Ins bead, 1 which
instantaneously killed him. Coronet's
Verdict, shot himself in a temporary fit of
Friday week,

Excerpt from The British Newspaper Archive, Chester Courant, Tuesday 6-Apr-1819, page 3.

Motivation
Graphic from Kaufmann et al. “More than fun and money. Worker Motivation in
Crowdsourcing – A Study on Mechanical Turk.”

Wisdom of crowds

Diversity: Each person should have private information, even if it's just an eccentric interpretation of the known facts.

Independence: People's opinions aren't determined by the opinions of those around them.

Decentralization: People are able to specialize and draw on local knowledge.

Aggregation: Some mechanism exists for turning private judgments into a collective decision.

James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How
Collective Wisdom Shapes Business, Economies, Societies and Nations, Anchor Books, New York, 2005.

Cognitive surplus

... people are learning to use their free time for creative activities
rather than consumptive ones [such as watching TV] ...

... the total human cognitive effort in creating all of Wikipedia in
every language is about one hundred million hours ...

... Americans alone watch two hundred billion hours of TV every
year, or enough time, if it would be devoted to projects similar to
Wikipedia, to create about 2000 of them ...

Clay Shirky. Cognitive surplus: Creativity and generosity in a connected age. Penguin Press. New York. 2010.

Motivation
Genealogists and family historians

• The 2012 National Library of Australia’s Trove
status report showed that ~50% of Trove users are
family historians

• A National Library of New Zealand survey found that ~50% of PapersPast users are genealogists
• A 2013 California Digital Newspaper Collection survey shows that more than 65% of its users are genealogists; 75% are 50 years old or older
• A 2012 Utah Digital Newspapers survey showed that 72% of its users are genealogists*
*John Herbert and Randy Olsen. “Small town papers: Still delivering the news”. Paper given
at 2012 World Library and Information Congress. Helsinki. August 2012.

Motivation
Trove users’ report

• “I enjoy the correction - it’s a great way to learn more about past
history and things of interest whilst doing a ‘service to the
community’ by correcting text for the benefit of others.”

• “I have recently retired from IT and thought that I could be of
some assistance to the project. It benefits me and other people. It
helps with family research.”

From Rose Holley in “Many Hands Make Light Work.” National Library of Australia March 2009.

Motivation
CDNC users’ report

“I am interested in all kinds of history. I have pursued genealogy as a
hobby for many years. I correct text at CDNC because I see it as a
constructive way to contribute to a worthwhile project. Because I am
interested in history, I enjoy it.”
Wesley, California

Personal communications with CDNC text correctors.

Motivation

“I only correct the text on articles of local interest - nothing at state,
national or international level, no advertisements, etc. The objective
is to be able to help researchers to locate local people, places,
organizations and events using the on-line search at CDNC. I correct
local news & gossip, personal items, real estate transactions, superior
court proceedings, county and local board of supervisors meetings,
obituaries, birth notices, marriages, yachting news, etc.”
Ann, California


Motivation

“I am correcting text for the Coronado Tent City Program for 1903.
It is important to correct any problems with personal names and
other information so that researchers will be able to search by
keyword and be assured of retrieving desired results. ... type fonts
cause a great deal of difficulty in digitizing the text and can cause
problems for searchers. Also, many of the guests' names at Tent
City and Hotel Del Coronado were taken from the registration books
and reported in the Program. This led to many problems in spelling
of last names and the editors were not careful to be consistent in the
spellings. This Program is an important resource since it provides
an excellent picture of daily life in Tent City and captures much of
the history of Coronado itself.”
Gene, California


Motivation

“I have always been interested in history, especially the
development of the American West, and nothing brings it alive
better than newspapers of the time. I believe them to be an
invaluable source of knowledge for us and future generations.”
David, United Kingdom


Motivation

CDNC is an excellent source of information matching my
personal interest in such topics as sea history, development of
shipbuilding, clippers and other ships etc. ... Unfortunately, the
quality of text ... is rather poor I’m afraid. This is why I started to
do all corrections necessary for myself ... and to leave the
corrected text for use of others. .... I am not doing this very
regularly as this is just my hobby and pleasure.
Jerzey, Poland


Ok, raw OCR newspaper text
is bad. But how much
difference can one person
(me) really make?

You can make a
difference

Graphic courtesy of TYPEinspire (http://typeinspire.com/)

Rank | CDNC lines corrected | NLA lines corrected
1 | 242,965 | 1,456,906
2 | 87,515 | 1,385,369
3 | 31,318 | 1,010,360
4 | 24,144 | 960,230
5 | 23,184 | 847,340
6 | 19,240 | 786,147
7 | 18,898 | 657,187
8 | 16,875 | 600,513
9 | 11,784 | 582,276
10 | 9,762 | 565,384

Statistics from Oct 2012

uncorrected OCR accuracy by newspaper title

Title | Years | OCR character accuracy | ~OCR word accuracy*
PRP Pacific Rural Press | 1871 - 1922 | 92.6% | 68.1%
SFC San Francisco Call | 1890 - 1913 | 92.6% | 68.1%
LAH Los Angeles Herald | 1873 - 1910 | 88.7% | 54.9%
LH Livermore Herald | 1877 - 1899 | 88.6% | 54.6%
DAC Daily Alta California | 1841 - 1891 | 88.2% | 53.4%
CFJ California Farmer and Journal of Useful Sciences | 1855 - 1880 | 86.5% | 48.4%
SN Sausalito News | 1885 - 1922 | 70.4% | 17.3%

*Word accuracy assumes average word length is 5 characters
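The ~word accuracy column follows from raising character accuracy to the power of the average word length (5), which assumes each character is recognized independently. A minimal sketch; `word_accuracy` is a hypothetical helper name:

```python
# Estimated word accuracy = character accuracy ** average word length.
# Assumes independent character errors and 5-character words, as in the footnote.
AVG_WORD_LENGTH = 5

def word_accuracy(char_accuracy: float, word_length: int = AVG_WORD_LENGTH) -> float:
    return char_accuracy ** word_length

titles = {"PRP": 0.926, "SFC": 0.926, "LAH": 0.887, "LH": 0.886,
          "DAC": 0.882, "CFJ": 0.865, "SN": 0.704}
for title, acc in titles.items():
    print(f"{title}: {acc:.1%} characters -> ~{word_accuracy(acc):.1%} words")
# First line printed: PRP: 92.6% characters -> ~68.1% words
```

The same function reproduces the corrected word accuracy figures, for example 99.3% character accuracy gives about 96.5% word accuracy.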

corrected OCR accuracy by newspaper title

Title | Years | OCR character accuracy | Corrected accuracy
PRP Pacific Rural Press | 1871 - 1922 | 92.6% | 99.3%
SFC San Francisco Call | 1890 - 1913 | 92.6% | 99.6%
LAH Los Angeles Herald | 1873 - 1910 | 88.7% | 99.1%
LH Livermore Herald | 1877 - 1899 | 88.6% | 99.9%
DAC Daily Alta California | 1841 - 1891 | 88.2% | 99.9%
CFJ California Farmer and Journal of Useful Sciences | 1855 - 1880 | 86.5% | 98.3%
SN Sausalito News | 1885 - 1922 | 70.4% | 100.0%

corrected OCR accuracy by newspaper title

Title | Years | OCR character accuracy | ~OCR word accuracy* | Corrected accuracy | ~Corrected word accuracy*
PRP | 1871 - 1922 | 92.6% | 68.1% | 99.3% | 96.5%
SFC | 1890 - 1913 | 92.6% | 68.1% | 99.6% | 98.0%
LAH | 1873 - 1910 | 88.7% | 54.9% | 99.1% | 95.6%
LH | 1877 - 1899 | 88.6% | 54.6% | 99.9% | 99.5%
DAC | 1841 - 1891 | 88.2% | 53.4% | 99.9% | 99.5%
CFJ | 1855 - 1880 | 86.5% | 48.4% | 98.3% | 91.8%
SN | 1885 - 1922 | 70.4% | 17.3% | 100.0% | 100.0%

*Word accuracy assumes average word length is 5 characters

correction accuracy by user

User | Average uncorrected text accuracy | Average corrected text accuracy
A | 70.4% | 100.0%
B | 87.1% | 99.5%
C | 95.4% | 99.5%
D | 86.5% | 98.3%
E | 95.3% | 100.0%
F | 91.0% | 100.0%
G | 91.0% | 99.8%
H | 90.5% | 99.0%
I | 96.6% | 99.8%
J | 94.8% | 100.0%
K | 86.8% | 99.3%

the long tail* of crowdsourced OCR text
correction

a probability distribution has a long tail if a larger share of the population lies within its tail than would under a normal distribution

the most productive users are a small fraction of the total user population yet produce ~50% of the total; put another way, the many individually less productive users, taken together, are as important as the few most productive users

*The phrase “long tail” was popularized by Chris Anderson in his October 2004 Wired magazine article “The Long Tail” and by Clay Shirky’s February 2003 essay “Power laws, web logs, and inequality”.
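A rough way to check the long-tail claim is to compute what share of all corrected lines the top correctors produced. This sketch assumes, as the chart labels suggest, that the first “Lines corrected” column of the Oct 2012 top-10 table belongs to CDNC, and it uses the ~578,000 CDNC lines corrected by Oct 2012 as the total; `top_share` and `users_for_half` are hypothetical helper names.

```python
def top_share(lines_per_user, total_lines):
    """Cumulative share of total_lines contributed by the top 1, top 2, ... users."""
    shares, running = [], 0
    for lines in sorted(lines_per_user, reverse=True):
        running += lines
        shares.append(running / total_lines)
    return shares

def users_for_half(lines_per_user, total_lines):
    """Smallest number of top users reaching 50% of the total (None if never reached)."""
    for k, share in enumerate(top_share(lines_per_user, total_lines), start=1):
        if share >= 0.5:
            return k
    return None

# Assumed CDNC top-10 correctors (Oct 2012) and assumed CDNC total (~578,000 lines).
top10 = [242_965, 87_515, 31_318, 24_144, 23_184, 19_240, 18_898, 16_875, 11_784, 9_762]
shares = top_share(top10, 578_000)
print(f"top corrector: {shares[0]:.1%}, top 10: {shares[-1]:.1%}")  # 42.0%, 84.0%
print("users needed for 50%:", users_for_half(top10, 578_000))      # 2
```

Under those assumptions the top corrector alone accounts for about 42% of the lines and the top two for more than half.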

OCR text correction long tails

Charts: NLA lines corrected by text corrector (top corrector 1,456,906 lines) and CDNC lines corrected by text corrector (top corrector 242,965 lines), each marking the point where correctors reach 50% of all lines corrected.

Website traffic

After a crowdsourcing transcription project of diaries from the American
War Between the States, Nicole Saylor, Head of Digital Library Services
at the University of Iowa Libraries, reported

“On June 9, 2011, we went from about 1000 daily
hits to our digital library on a really good day to
more than 70,000.”

Nicole Saylor interviewed by Trevor Owens. “Crowdsourcing the Civil War: Insights Interview with Nicole Saylor.” Blog post at http://blogs.loc.gov/digitalpreservation/2011/12/crowdsourcing-the-civil-war-insights-interview-with-nicole-saylor/. Dec 6, 2011.

Website traffic

Website traffic at CDNC before / after implementing crowdsourcing

Metric | before crowdsourcing (11-Jun-2011 / 12-Jul-2011) | after crowdsourcing (11-Jun-2012 / 12-Jul-2012) | change
visits | 17,485 | 21,488 | +22.9%
unique visitors | 11,381 | 13,376 | +17.5%
visit duration | 9m 24s | 11m 7s | +18.3%
bounce rate | 51.3% | 44.5% | -6.8 percentage points
pages per visit | 14.9 | 11.7 | -21.5%
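The change column can be recomputed as a relative change, (after - before) / before. Bounce rate is already a percentage, so its -6.8 is a difference in percentage points; the relative drop is larger. A small sketch with hypothetical helper names:

```python
def pct_change(before, after):
    """Relative change in percent."""
    return 100.0 * (after - before) / before

def seconds(minutes, secs):
    return 60 * minutes + secs

metrics = {
    "visits": (17_485, 21_488),
    "unique visitors": (11_381, 13_376),
    "visit duration": (seconds(9, 24), seconds(11, 7)),
    "pages per visit": (14.9, 11.7),
}
for name, (before, after) in metrics.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")

# Bounce rate: difference in percentage points vs. relative change.
before_bounce, after_bounce = 51.3, 44.5
print(f"bounce rate: {after_bounce - before_bounce:+.1f} points "
      f"({pct_change(before_bounce, after_bounce):+.1f}% relative)")
# bounce rate: -6.8 points (-13.3% relative)
```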

Crowdsourcing
benefits

Public domain photo courtesy of US Navy

Economics

Financial value of outsourced OCR text correction for
newspapers?
The Assumptions
• 25 to 50 characters per line in a newspaper column:
Assume 40 characters per line (CDNC sample average)
• Outsourced text transcription or correction costs USD
$0.35 to $1.20 per 1000 characters: Assume $0.50 per
1000 characters

Economics

• CDNC: 578,000 lines x 40 characters per line x 1/1000 x $0.50 = $11,560
• NLA: 68,908,757 lines x 40 characters per line x 1/1000 x $0.50 = $1,378,175
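The outsourcing estimate is a single multiplication. This sketch restates the slide’s assumptions (40 characters per line, $0.50 per 1000 characters) as parameters so that other rates in the quoted $0.35 to $1.20 range can be tried; `outsourced_value` is a hypothetical name.

```python
CHARS_PER_LINE = 40          # slide assumption: CDNC sample average
USD_PER_1000_CHARS = 0.50    # slide assumption within the quoted $0.35 to $1.20 range

def outsourced_value(lines, chars_per_line=CHARS_PER_LINE, usd_per_1000=USD_PER_1000_CHARS):
    """Cost in USD to have `lines` of OCR text corrected by an outside vendor."""
    return lines * chars_per_line / 1000 * usd_per_1000

for label, lines in (("CDNC", 578_000), ("NLA", 68_908_757)):
    print(f"{label}: ${outsourced_value(lines):,.0f}")  # $11,560 and $1,378,175
    # Sensitivity to the low and high ends of the quoted price range
    low = outsourced_value(lines, usd_per_1000=0.35)
    high = outsourced_value(lines, usd_per_1000=1.20)
    print(f"  range: ${low:,.0f} to ${high:,.0f}")
```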

Economics

Financial value of in-house OCR text correction?
The Assumptions
• Correction takes 15 seconds per line
• Cost is hourly wage plus benefits of lowest level
employee, $10 for CDNC, $41.88* for Australia

*AUD $40.38 = USD $41.88 is the actual labor value the National Library of Australia assumed to calculate costs avoided through crowdsourced OCR text correction in its 2012 Trove Status Report.

Economics

• CDNC: 578,000 lines x 15 seconds per line x 1/3600 hours per second x $10.00 per hour = $24,083
• NLA: 68,908,757 lines x 15 seconds per line x 1/3600 hours per second x $41.88 per hour = $12,024,578
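The in-house estimate converts lines to hours of work and then to wages. A sketch using the slide’s assumptions (15 seconds per line, $10.00 per hour for CDNC, $41.88 per hour for Australia); `in_house_value` is a hypothetical name.

```python
SECONDS_PER_LINE = 15   # slide assumption: time to correct one line

def in_house_value(lines, usd_per_hour, seconds_per_line=SECONDS_PER_LINE):
    """Labor cost in USD if staff paid `usd_per_hour` corrected `lines` themselves."""
    hours = lines * seconds_per_line / 3600
    return hours * usd_per_hour

print(f"CDNC: ${in_house_value(578_000, 10.00):,.0f}")     # CDNC: $24,083
print(f"NLA: ${in_house_value(68_908_757, 41.88):,.0f}")   # NLA: $12,024,578
print(f"NLA hours of work: {68_908_757 * SECONDS_PER_LINE / 3600:,.0f}")  # 287,120
```

Separating hours from the wage makes it easy to see that the NLA figure represents roughly 287,000 hours of volunteer effort under the 15-second assumption.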

Accuracy

“His Accuracy Depends on Ours!"
Office for Emergency Management. Office of War Information.
Domestic Operations Branch. Bureau of Special Services. [Photo
held at US National Archives and Records Administration]

• Edwin Klijn (Koninklijke Bibliotheek, the Netherlands) reports raw OCR character accuracies of 68% for early 20th century newspapers
• Rose Holley (National Library of Australia) reports that raw OCR character accuracy varied from 71% to 98% on a sample of Trove digitized newspapers

Edwin Klijn. “The current state-of-art in newspaper digitization.” D-Lib Magazine. January/February 2008.
Rose Holley. “How good can it get? Analysing and improving OCR accuracy in large scale historic newspaper digitisation programs.” D-Lib Magazine. March/April 2009.
Public domain graphic courtesy of Wikimedia Commons.
Graphic is logo for Accuracy in Media (http://www.aim.org/)

Accuracy
Mapping Texts* assesses the digitization quality of digital newspapers by comparing the number of words recognized to the total number of words scanned

* Mapping Texts is a collaboration between the University of North Texas and Stanford University aimed at experimenting with new methods for finding and analyzing meaningful patterns embedded in massive collections of digital newspapers.
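A minimal sketch of a recognized-words ratio in that spirit. It is not Mapping Texts’ actual procedure: `recognition_rate` is a hypothetical name, the tiny lexicon is made up for illustration, and a real assessment would need a full dictionary plus lists of names and places. The sample is a fragment of the raw OCR excerpt from the Chester Courant shown earlier.

```python
import re

def recognition_rate(ocr_text, lexicon):
    """Share of alphabetic tokens in `ocr_text` that appear in `lexicon` (lowercase words)."""
    tokens = re.findall(r"[A-Za-z]+", ocr_text)
    if not tokens:
        return 0.0
    recognized = sum(1 for token in tokens if token.lower() in lexicon)
    return recognized / len(tokens)

# Hypothetical miniature lexicon, for illustration only.
lexicon = {"week", "ago", "which", "terminated", "his", "death"}
sample = "week ago, which tcrminate<i'iu his death"   # raw OCR fragment
print(recognition_rate(sample, lexicon))   # 0.625
```

Note that OCR noise can split one word into several tokens ("tcrminate", "i", "iu"), which inflates the denominator, so a ratio like this is only a rough proxy for character or word accuracy.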

Accuracy

How does low text accuracy affect search recall?
The Facts
• Average uncorrected OCR character accuracy of the CDNC sample data is ~89%
• Average length of an English word is 5 characters
• Assuming character errors are independent, average word accuracy is 89% x 89% x 89% x 89% x 89% = 55.8%, rounded up to 60%, or about 6 out of 10 words correct

Search recall with no text correction

Figure: instances of “ARNDT” in uncorrected text, marked as found or not found by a search.

Accuracy

The Facts
• Average corrected character accuracy of the CDNC
sample data is ~99.4%
• Average word accuracy of CDNC corrected text is 99.4%
x 99.4% x 99.4% x 99.4% x 99.4% = 97.0%

Search recall with text correction

Figure: instances of “ARNDT” in corrected text, marked as found or not found by a search.

Accuracy

A search for “Arndt” at Chronicling America gives 10,267 results*
• If Chronicling America word accuracy is 55.8% (same as the uncorrected CDNC sample), the 10,267 results are only ~55.8% of the actual occurrences, so about 8,133 instances of “Arndt” were not found
• If word accuracy is 97.0% (same as the corrected CDNC sample), then 317 instances of “Arndt” were not found
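The reasoning behind these figures: if a search finds only the fraction of true occurrences equal to the word accuracy, the true count is about found / accuracy, and the misses are the difference. A sketch under that assumption (`estimated_missing` is a hypothetical name); it reproduces 8,133 and gives about 318 where the slide shows 317, a rounding difference.

```python
def estimated_missing(found, word_accuracy):
    """Estimate occurrences a search missed.

    Assumes the search finds only `word_accuracy` of the true occurrences,
    so true count ~= found / word_accuracy and misses ~= true - found.
    """
    return found / word_accuracy - found

found = 10_267  # Chronicling America results for "Arndt", 31 Oct 2012
for accuracy in (0.558, 0.970):
    missed = estimated_missing(found, accuracy)
    print(f"word accuracy {accuracy:.1%}: about {missed:,.0f} missed")
# word accuracy 55.8%: about 8,133 missed
```

Applied with the name-length accuracies, the same formula matches the missing-results figures for longer names, e.g. 2,951 Eklund results at 49.7% gives about 2,987 missed.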

* Search performed 31 Oct 2012

Accuracy
Suppose the word/name is longer than 5
characters?
The Facts
• Assume that average uncorrected / corrected OCR character accuracy is ~89% / ~99%, the same as CDNC.

Name | Name length | Raw text accuracy | Corrected text accuracy
Eklund | 6 | 49.7% | 94.2%
Kennedy | 7 | 44.2% | 93.2%
Espinosa | 8 | 39.4% | 92.3%
Bonaparte | 9 | 35.0% | 91.4%
Chatterjee | 10 | 31.2% | 90.4%

Accuracy

Chronicling America searches done 19-Mar-2013
(6,025,474 pages from 1836 to 1922).

Name | Number of search results | Missing results with raw text accuracy | Missing results with corrected text accuracy
Eklund | 2,951 | 2,987 | 182
Kennedy | 360,723 | 455,392 | 26,111
Espinosa | 1,918 | 2,950 | 160
Bonaparte | 44,664 | 82,947 | 4,203
Chatterjee | 19 | 42 | 2

Resources

Public domain photo “A useful instruction for young sailors from the Royal
Hospital School, Greenwich” from the National Maritime Museum.

Comprehensive worldwide list of online
newspaper archives

Wikipedia contributors, "List of online newspaper archives," Wikipedia, The Free Encyclopedia, https://en.wikipedia.org/wiki/Wikipedia:List_of_online_newspaper_archives (accessed March 17, 2013).

Search many digital newspaper
collections at once!

As of 17-Mar-2013 elephind (http://www.elephind.com) has indexed 930 newspapers from 11
historical digital collections comprising 1,041,086 issues and 44,158,901 pages/articles.

Try crowdsourcing!

Correct California newspapers at http://cdnc.ucr.edu

Correct Australian newspapers http://trove.nla.gov.au

Correct Cambridge MA newspapers http://bit.ly/cambridgepublic

Correct Tennessee newspapers http://tndp.lib.utk.edu

Correct Virginia newspapers http://virginiachronicle.com

Login with user name “crowdsatrootstech2013” or
“crowdsatrootstech2013@gmail.com”,
password “roots$tech”

Hãy thử crowdsourcing! (Try crowdsourcing!)
Correct Vietnamese newspapers http://bit.ly/nationallibraryofvietnam

Попробуйте краудсорсинга! (Try crowdsourcing!)
Or try Russian language periodicals http://bit.ly/russianperiodicals

Kokeile crowdsourcing! (Try crowdsourcing!)
Or try Finnish newspapers http://digi.lib.helsinki.fi/sanomalehti

Other resources

Mapping Texts at http://mappingtexts.stanford.edu/

Wragge Labs at http://wraggelabs.com/

Wikipedia list of crowdsourcing projects
https://en.wikipedia.org/wiki/List_of_crowdsourcing_projects

?

FYI about Trove

• If you hope to begin your text correction hobby
with Trove’s family notices (births, deaths,
weddings), you may have a tough go of it. As of
17-Mar-2013, there were 768,333 family notices
in Trove digitized newspapers; most seem to
have already been corrected.
• Lack of text correction opportunity
notwithstanding, now you know where to find
768,333 family notices published in Australia
from 1803 to 1954.

Try crowdsourcing!

Correct British newspapers http://www.britishnewspaperarchive.co.uk/

The British Newspaper Archive is a subscription service from
brightsolid and the British Library. From now until the end of
RootsTech you can use it at no cost with the user name and
password below.

Login with user name “crowdsatrootstech2013” or
“crowdsatrootstech2013@gmail.com”,
password “roots$tech”
