Abstract: Founders and Survivors is an Australian Research Council-funded research project to build biographies of the approx. 70,000 convicts transported to Tasmania, and their descendants. The project is a collaboration between historians, public health scientists, and a growing number of volunteer genealogists and amateur historians. Our tools include a massive and complex XML database, and social and collaboration tools built with Drupal and Google Docs. This presentation will describe the goals and challenges of the project, the motivations behind the adoption of these tools, and their implementation. I am one of two developers on the Founders and Survivors project. I will introduce the project and how we use 'crowdsourcing' methods to enrich our archival sources.
Introduction to the project. Founders and Survivors: Tasmanian convicts and their descendants -- health and resilience. Collaboration between researchers from Universities of Melbourne and Tasmania and elsewhere.
Digital history: 'problem' or opportunity. Historical, archival resources that have not been compiled or explored.
Some examples of current research.
Shoestring budget. Experimental. Lots of research questions, limited technical resources.
Users of this website. Research team: diverse backgrounds, locations. Amateur historians with interest in Tasmanian history.
Some existing archival material has been digitised. We want to incorporate material from lives of convicts after leaving the convict system, from a range of primary sources and family histories.
The project began with digitised images of archival documents from court trials, ships and prisons, recording the physical and behavioural characteristics of convicts before and during transportation and during their sentence in Van Diemen's Land.
Good starting point for quantitative history. High-quality data.
Less reliable. Less accessible. Collaboration between 'professional' and 'amateur' historians.
Our volunteers include family historians, retired historians, librarians and engineers ... Interest in family or local histories, or convicts in general. Varying levels of experience with technology and historical research.
What happened to convicts after they left the convict system?
How to collate different sources of data and incorporate new data (from volunteers and other researchers or archives). Experimentation – solutions not planned from the start.
Our other developer has consolidated the different sources of tabular data into one massive XML database using the BaseX engine and a data format based on the Text Encoding Initiative.
At the same time, I was experimenting with presenting some of the same data in Drupal, but it would not scale (73,000 convicts, many different source documents for each). Drupal is now used to document the project, collect some data from volunteers, and coordinate volunteer efforts.
Some tabular data has been captured in Excel or CSV form. Most textual/narrative documents are yet to be transcribed and will require more human intervention to incorporate them into the master database. Unfulfilled dream about GEDCOM import.
Public and staff views of consolidated convict biographies using XSLT. Link between BaseX and ccc: scripts to add links to BaseX, run as cron jobs.
Convict biographies are captured in Drupal. The XSLT template for a convict record includes a URL to create a new entry in a Drupal form, using the Prepopulate module to capture enough from the XML record (just the record ID) to assist in two-way linkage.
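A minimal sketch of how such a link could be built, shown in Python rather than XSLT. The site address, content type, field name and record ID are all hypothetical. The `edit[...]` query key follows the general convention of the Prepopulate module, but its exact shape depends on the Drupal version and field type, so treat it as an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical names: none of these come from the project itself.
SITE = "https://example.org"
CONTENT_TYPE = "convict-biography"
ID_FIELD_KEY = "edit[field_record_id][und][0][value]"


def prepopulate_url(record_id: str) -> str:
    """Return a node-creation URL that carries only the record ID."""
    query = urlencode({ID_FIELD_KEY: record_id})  # percent-encodes brackets, slashes, spaces
    return f"{SITE}/node/add/{CONTENT_TYPE}?{query}"


def record_id_from_url(url: str) -> str:
    """Recover the record ID from such a URL (the reverse direction of the link)."""
    return parse_qs(urlsplit(url).query)[ID_FIELD_KEY][0]
```

Passing only the record ID keeps the link short and leaves the XML database as the single source for everything else about the convict.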
Automated process to incorporate community-contributed content into the master database (Perl).
Consolidated source info from the XML entry and prepopulated Drupal form with link to Archive Index ID number. [NB some of our record IDs are obscure. Here: CON31/40...]
What if more than one person submits info on the same convict? These will not be identical because every descendant or researcher has different (but overlapping) info. All submissions are checked by staff before being added to the master database.
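One way to picture the staff check is to group submissions by record ID so that overlapping entries for the same convict are reviewed side by side. This is an illustrative sketch only, not the project's Perl process; the tuple layout and function names are assumptions.

```python
from collections import defaultdict


def group_for_review(submissions):
    """Group (record_id, contributor, text) submissions by convict record ID."""
    grouped = defaultdict(list)
    for record_id, contributor, text in submissions:
        grouped[record_id].append((contributor, text))
    return dict(grouped)


def needs_comparison(grouped):
    """Return the record IDs that received more than one submission."""
    return sorted(rid for rid, items in grouped.items() if len(items) > 1)
```

Nothing is merged automatically here: the grouping only tells a reviewer which records have several contributions to reconcile by hand.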
More committed volunteers are assigned to ships and try to trace all convicts on that ship. In addition to convict biographies in Drupal, some summary data (targeted at analysis) is captured in Google Spreadsheets, one for each ship. Prepopulated using the Perl Google Docs API.
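The prepopulation step can be sketched as building one sheet per ship, with a row for each convict and blank cells left for volunteers to fill in. The sketch below emits CSV text as a stand-in for the Google Docs API calls the project makes from Perl; the column names and input dictionary keys are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical columns; the project's real sheets are not described in detail.
COLUMNS = ["record_id", "name", "notes"]


def sheets_by_ship(convicts):
    """Return {ship: CSV text}, one prepopulated sheet per ship."""
    by_ship = defaultdict(list)
    for convict in convicts:
        by_ship[convict["ship"]].append(convict)

    sheets = {}
    for ship, rows in by_ship.items():
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(COLUMNS)
        for convict in sorted(rows, key=lambda c: c["record_id"]):
            # The last cell is left empty for the volunteer to complete.
            writer.writerow([convict["record_id"], convict["name"], ""])
        sheets[ship] = buf.getvalue()
    return sheets
```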
Links to XML and Drupal records.
Scale: both developers started on this project around the same time, with our own experiments (Drupal and XML), and XML appeared more suitable to the scale of our dataset. That was when we had much less data than we do now. Complex nature of our data: combination of tabular, textual and image sources; XML was a more natural fit for presenting an individual's whole life course. Expertise: Some staff and volunteers seemed to have difficulty navigating complex forms. For the ship project, which involved volunteers making lots of numerical entries, we decided to use spreadsheets with validation controls instead.
Building a web frontend which is more than the requisite "About the project" site – interface to XML database, data capture, and volunteer forums. BaseX and Drupal live on our own servers – not dependent on Google.
This model has evolved as new data has become available and new analytical questions have been proposed – we did not know exactly what we needed to do when we began 3-4 years ago.