Milena Dobreva (University of Malta, MT): How to Index Biographical Data from Archival Documents Using the Methods of Citizen Science
co:op-READ-Convention Marburg
Technology meets Scholarship, or how Handwritten Text Recognition will Revolutionize Access to Archival Collections.
With a special focus on biographical data in archives
Hessian State Archives Marburg Friedrichsplatz 15, D - 35037 Marburg
19-21 January 2016
1. How to Index Biographical
Data from Archival
Documents
Using the Methods of
Citizen Science?
Milena Dobreva
University of Malta
2. Grain elevators. Caldwell, Idaho, 1941. Photo by Russell Lee.
Prints and Photographs Division, Library of Congress
3. Main topics
• What is citizen science? (And what is the
connection to crowdsourcing? And archives?)
• Why is citizen science still not very popular in
memory institutions?
• How can specific tasks (e.g. indexing biographical
data from archival documents) benefit
from citizen science?
• What hybrid models combining automated
methods and human contributions are
currently emerging?
4. Citizen science
• Involvement of members of the general public
in scholarly projects designed by academics
– Non-professional researchers
– Voluntary participation (the reward for the
volunteers is intrinsic)
• Tasks may vary but currently those most
popular involve data collection or data entry.
• Crowdsourcing is one possible method (but it
is not necessarily aimed at research tasks!)
5. Indexing
• Structured data (databases, linked
open data; Encoded Archival
Context - Corporate bodies,
Persons, and Families (EAC-CPF))
• Extracts (rather than full texts)
• Quality control by professionals
• Extensive work on:
– Prosopography (SNAC, PROSOP, etc.)
– Merging lists of persons from
different sources/researchers
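The last point, merging lists of persons from different sources, can be illustrated with a minimal Python sketch. It is not the method used by SNAC, PROSOP or any project named on the slide. The record fields (`name`, `born`), the source labels and the matching rule (accent-insensitive, order-insensitive name plus birth year) are assumptions made only for illustration.

```python
import unicodedata
from collections import defaultdict


def normalise(name: str) -> str:
    """Lower-case a name, strip accents and punctuation, and sort its parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def merge_person_lists(*sources):
    """Group records from several (label, records) pairs under a shared key.

    The key is (normalised name, birth year); this rule is a hypothetical
    simplification, real prosopographical matching needs far more evidence.
    """
    merged = defaultdict(list)
    for label, records in sources:
        for record in records:
            key = (normalise(record["name"]), record.get("born"))
            merged[key].append((label, record["name"]))
    return dict(merged)


# Hypothetical example data: one archival index and one volunteer-made list.
archive_index = [{"name": "Müller, Johann", "born": 1801}]
volunteer_list = [
    {"name": "Johann Muller", "born": 1801},
    {"name": "Anna Schmidt", "born": 1820},
]

merged = merge_person_lists(
    ("archive_index", archive_index),
    ("volunteer_list", volunteer_list),
)
for key, variants in merged.items():
    print(key, variants)
```

In this sketch the two spellings of Johann Müller end up under one key, while every original spelling and its source are kept so that a professional can review the match.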
6. In a nutshell…
• Change in research – open science
• Big data and vast digital resources
• From ‘standing on the shoulders of
giants’ to ‘picking the brains’ of
these giants (and not only theirs!)
7. One historical example
In July 1857 the ‘Unregistered Words
Committee’ of the Philological Society of London
issued a circular asking for volunteers to read
particular books and copy out quotations
illustrating ‘unregistered’ words. The volume
was such that in January 1858, the Society
decided that “efforts should be directed toward
the compilation of a complete dictionary, and
one of unprecedented comprehensiveness.”
In April 1879, the newly-appointed editor James
Murray issued a new appeal to the public,
asking for volunteers to read specific books in
search of quotations to be included in the future
dictionary. Within a year there were close to
800 volunteers and over the next three years,
3,500,000 quotation slips were received and
processed by the OED team.
Photo: Sir James Murray in the Scriptorium, Banbury Road, before 1910
15. Tasks aligned with participatory
models of citizen science
• Source: Bonney et al. (2009)
16. Some findings from previous
research
• Including citizens in research studies has contributed
to a rise in interest in the area. When research data
are made public, citizens are encouraged to
interpret and study the data and to come to their
own conclusions. This is one of the most educational
features of citizen science.
• Citizen science is a good way to get cheap or free labour,
skills and computation power – but not in the memory
institutions!
• This is a good way for citizens to understand and
appreciate research. (They also get to see how their tax
money is being utilized).
17. Project outcomes:
intended vs actual
• Data sets: 56 intended, 48 actual
• Data analysis: 46 intended, 40 actual
• Academic publications: 43 intended, 33 actual
• Technical reports: 25 intended, 21 actual
• New discoveries: 31 intended, 22 actual
• New research methods: 17 intended, 11 actual
• New inquiry: 21 intended, 14 actual
• Policy changes: 21 intended, 11 actual
• Community action: 38 intended, 26 actual
• Environmental restoration: 23 intended, 17 actual
• Individual learning: 47 intended, 42 actual
Based on Wiggins, A., K. Crowston. (2012).
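The gap between intended and actual outcomes can be made explicit with a little arithmetic. The Python sketch below takes the higher series of figures on this slide as the intended outcomes and the lower series as the actual ones, which is an assumption based on the order in the slide title. The "realisation rate" is a name introduced here, not a measure from Wiggins and Crowston (2012).

```python
# Figures copied from the slide; the assignment of the two series to
# "intended" and "actual" is an assumption based on the slide title order.
intended = {
    "Data sets": 56, "Data analysis": 46, "Academic publications": 43,
    "Technical reports": 25, "New discoveries": 31, "New research methods": 17,
    "New inquiry": 21, "Policy changes": 21, "Community action": 38,
    "Environmental restoration": 23, "Individual learning": 47,
}
actual = {
    "Data sets": 48, "Data analysis": 40, "Academic publications": 33,
    "Technical reports": 21, "New discoveries": 22, "New research methods": 11,
    "New inquiry": 14, "Policy changes": 11, "Community action": 26,
    "Environmental restoration": 17, "Individual learning": 42,
}

# Hypothetical measure: share of the intended count that was actually reported.
realisation = {outcome: actual[outcome] / intended[outcome] for outcome in intended}

# Print outcomes from the largest shortfall to the smallest.
for outcome, rate in sorted(realisation.items(), key=lambda item: item[1]):
    print(f"{outcome}: {actual[outcome]}/{intended[outcome]} = {rate:.0%}")
```

Under this reading, policy changes show the largest shortfall and individual learning the smallest.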
18. Data ownership
• No policy: 11
• Currently developing policy: 4
• Researchers own the data: 15
• Project contributors own the data: 13
• Third party owns the data: 1
• Public owns the data: 23
• Not sure/don’t know: 6
Based on Wiggins, A., K. Crowston. (2012).
21. Citizen science: where?
(applied vs pure/basic research)
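Figure: quadrant model with two questions, "Quest for fundamental understanding?" and "Considerations of use?"
• Fundamental understanding: yes; considerations of use: no. Pure basic research (Bohr)
• Fundamental understanding: yes; considerations of use: yes. Use-inspired basic research (Pasteur)
• Fundamental understanding: no; considerations of use: no. (empty quadrant)
• Fundamental understanding: no; considerations of use: yes. Pure applied research (Edison)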
22. Some challenges
• Matching projects and people
• Division of labour and integration of contributions (contribution,
collaboration, co-creation)
• Platforms and their interoperability with other tools used within
the institution
• Trust in citizens’ contributions
• Motivation and its fluctuations
• What do the citizens gain (in terms of “domain literacy”) – longer
engagement beneficial!
• Practical issues: how/what domains are addressed in citizen
science projects; issues of quality and quantity of research
output; data ownership and data interoperability still not
sufficiently addressed
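For the challenge of trust in citizens' contributions, one commonly used approach is redundancy: the same item is given to several volunteers and an answer is accepted only when enough of them agree. The slides do not prescribe this. The Python sketch below is an illustration under stated assumptions, and the function name `consensus` and the 60% threshold are hypothetical.

```python
from collections import Counter


def consensus(transcriptions, min_agreement=0.6):
    """Return the most common answer if enough volunteers agree, else None.

    Answers are compared case-insensitively with whitespace collapsed.
    The threshold is an arbitrary illustrative value.
    """
    if not transcriptions:
        return None
    cleaned = [" ".join(text.split()).casefold() for text in transcriptions]
    value, votes = Counter(cleaned).most_common(1)[0]
    return value if votes / len(cleaned) >= min_agreement else None


# Hypothetical volunteer readings of the same name in a document.
agreed = consensus(["Johann Müller", "johann  müller", "Johann Miller"])
disputed = consensus(["Johann Müller", "Johann Miller"])
print(agreed, disputed)
```

Items that return `None` would go to a professional for review, which fits the "quality control by professionals" step on the indexing slide.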
23. Civic Epistemologies
• Roadmap: www.civic-epistemologies.eu/roadmap/
• Registry of tools: www.civic-epistemologies.eu/registry-of-resources/
• During the CIVIC EPISTEMOLOGIES Final Conference in Berlin
(12-13 November 2015), organised in cooperation with the
RICHES project, the partners of both projects proposed a set
of principles aiming to encourage and support the
participation of citizens in digital cultural heritage and
humanities research. The Berlin Charter, available online at
www.civic-epistemologies.eu/berlincharter, is open for
adoption by cultural and academic institutions, private
organisations, artists, professionals, researchers and
interested citizens.
24. Summary
• Tools
• Academics involved (research
project)
• Tasks
– Indexing: tasks like “where is Waldo”
• Quality expectations (users of the
outcome)
• Relationships with the volunteers
25. Archives and citizen science:
possible scenarios
• Competition (data created/analysed by machines vs by people)
• Facilitation (citizen science seen as method to generate big data)
• Interpretation (using humans to contextualise data applications)
• Complementarity (combining both in various combinations)
• Strategic partnerships
26. Citizen science and archives…
friends or foes?
• Capitalising on the tradition of voluntary work
• Identifying projects
(and new types of
user involvement)
• Building hybrid
infrastructures (tools,
citizens, professionals)
28. References
1. Bonney, R., H. Ballard, R. Jordan, E. McCallie, T. Phillips, J. Shirk, and C.
Wilderman (2009). Public participation in scientific research: defining the
field and assessing its potential for informal science education. A CAISE
Inquiry Group Report. Center for Advancement of Informal Science
Education (CAISE), Washington, D.C., USA.
2. Bonney, R., C. B. Cooper, J. Dickinson, S. Kelling, T. Phillips, K. V.
Rosenberg, and J. Shirk (2009) Citizen science: a developing tool for
expanding science knowledge and scientific literacy. BioScience 59(11):977–
984.
3. European Commission (2013). Green Paper on Citizen Science. Available:
http://www.socientize.eu/sites/default/files/Green%20Paper%20on%20Citizen%20Science%202013.pdf
4. Franzoni, C., H. Sauermann (2014). Crowd Science: The Organization of
Scientific Research in Open Collaborative Projects. Research Policy
43(1):1–20.
5. Wiggins, A., K. Crowston (2012). Describing Public Participation in Scientific
Research. iConference 2012, Toronto, Ontario, Canada. Available:
http://crowston.syr.edu/system/files/iConference2012.pdf