Thank you for inviting me to speak here today. I am going to talk about the potential of crowdsourcing for archives. This is based on what I have learnt as manager of a crowdsourcing site and from talking to other managers of crowdsourcing sites. I am expressing my own personal viewpoints today and they are not necessarily the same as those of the National Library of Australia, where I work.
Crowdsourcing is a new term, one which my spellchecker does not recognise and a word that hasn’t been used much in a library context until now. There isn’t actually an agreed definition of crowdsourcing, though there is a great Wikipedia article on it. My explanation of crowdsourcing, and of the difference between crowdsourcing, outsourcing and social engagement, is this: crowdsourcing is usually done by a large group of unpaid volunteers, rather than a company, working towards a clear big goal, for the common good. The group may use social engagement strategies such as reviewing, marking, checking and identifying items, but rather than just helping them personally, these activities when joined together result in a big overall achievement. Crowdsourcing may, but does not always, require a greater level of effort than social engagement, e.g. rather than clicking a checkbox to rate something you may be asked to read it and categorise it. Crowdsourcing projects almost always have a big, seemingly unachievable goal at the beginning.
For example: making out-of-copyright books electronically available, transcribing birth, death and marriage notices so that they become searchable, or creating a free online encyclopedia.
Why should archives even think about doing this? The answer is that there are 8 significant benefits for us. We can achieve goals that we would never have the resources, financial or staff, to achieve in-house or to outsource. Crowdsourcing galvanises people to work fast towards a goal, so results happen quickly. The community is actively engaged and we are able to effectively utilise their knowledge.
The community is adding huge value to our collections and services, and in turn we are encouraging a sense of public ownership and responsibility towards cultural heritage items, many of which hold significance for our nation. We build the trust and loyalty of our community, and through the activity we can demonstrate the relevance and value of libraries in our society today. In my talk this morning I am going to show you 8 brilliant examples of crowdsourcing. 2 are from libraries and the other 6 have direct relevance to archives. I’m going to explain to you the common factors in crowdsourcing and give some tips for crowdsourcing. Finally I’m going to look at why libraries aren’t already doing it and what we need to think about and change to go forward into this exciting area. This information has been gathered through interviews I have undertaken with other crowdsourcing site managers, asking them simple questions like ‘what lessons have you learnt?’. I have been contacted by crowdsourcing site managers since March, when I published a report called ‘Many Hands Make Light Work’, which was reviewed internationally and widely discussed. It was about the digital volunteers in the Australian Newspapers service, of which I am manager.
This is the first site I am going to discuss as a library example. The site was released in August 2008 and contains 20 million articles from out-of-copyright Australian newspapers from 1803 to 1954. Since release it has been heavily used.
It is very innovative since we not only allow, but also encourage, all users to correct the electronically translated text of articles. The text is poor because it is the raw OCR and the newspapers are mostly of very poor quality. The electronically generated text created through the OCR process is displayed on the left hand side. This is also where the users can use the 3 enhancement features: tagging of articles, adding comments to articles and correcting the text. Of the 3, text correction is the most popular and the most used feature. This innovative feature is not available in any other online newspaper service, and so has created a high level of interest from national libraries internationally. They have been watching us to see the results and activity that is occurring around this, and thinking about its wider application.
The results are pretty astounding both to the National Library of Australia and the world in general. As of November 2009, over 6000 users were actively correcting text each month and they had so far corrected 7 million lines of text. They have also been using the other features, especially tagging, to further improve the quality and depth of the article information. By October 2010, 20 million lines had been corrected.
My third example is FamilySearchIndexing, a site run by the Church of Latter Day Saints in Utah. In August 2005 they enabled the Indexing part of the site, which encourages members of the public to view handwritten BMD (birth, marriage and death) records and transcribe them. These records are then transferred into the search system.
This is one of the largest sites of its kind. There are currently 160,000 volunteers around the world working on BMD records for different countries (including NZ and Australia). 334 million records have been transcribed over the last 4 years. The volunteers need to help out because most of the records are handwritten and so can’t be effectively OCR’d.
The most interesting example in my opinion is the one started in June this year by the UK newspaper The Guardian. There was a big controversy in the UK over MPs’ expenses, which caused public outcry. The result was that the MPs’ expenses claims documents were to be made publicly available. The Guardian digitised them all and in a matter of a few days put up a public website where people could easily read them and mark those they thought were potentially scandalous and needed further investigation. Most of the claims were handwritten and largely illegible.
Within 80 hours 20,000 volunteers had read and checked nearly half of the expenses claims, a staggering 170,000 documents that would have been very boring had it not been for their very personal nature. People were looking for juicy things like expenses claims for pornographic videos, and the discovery of a duckhouse costing $4000, modelled to the very detail on a French chateau. Hardly necessary for taxpayers to pay for.
My second example is Picture Australia. This contains digital images from different Australian institutions. In 2006 a new feature was implemented in partnership with Flickr to encourage members of the public to upload their own photographs on particular subjects into the national collections, in order to improve the quality and depth of the collections. For example, modern day people, places and events are topics we want the public to add.
The public were keen to do this and there is an active pool of volunteers who to date have added 55,000 images to our collections. The quality and standard of these images is very high.
My favourite site is Galaxy Zoo. I strongly recommend you have a look at this one when you get home. It has hooked in the world. It is exposing millions of digital images of the galaxy, never seen before and getting the public to help classify and identify them.
So far there are 150,000 volunteers who have classified over 50 million previously unseen galaxies – exciting stuff!!
An early example, and one which is no longer active, is the BBC WW2 People’s War. In 2003 the BBC set up an interactive website to enable the public to record their stories of WW2 and upload their photos and artifacts. It was mainly older people without any previous computer or keyboard skills who did this, and libraries assisted by giving free internet access to those who wanted to contribute. A side outcome was the establishment of an active community who could communicate with each other online. The people in this group were very sad when the project closed and their group communication was shut down.
Distributed Proofreaders was established in 2000, originally to help Project Gutenberg. Their mission is to make out-of-copyright texts available for free online. They now work for anyone. Each country has volunteers, including Australia and NZ.
They have managed to make 16,000 public domain books and journal issues available over the last 9 years as e-books, with their volunteers doing every step of the process through a distributed system: finding the books, scanning the books, OCR’ing the books, proofreading and marking up the books, and finally converting them into e-books.
Wikipedia is of course our best-known crowdsourcing example. Although we may not be able to remember life before Wikipedia, it has actually only been in existence for 8 years.
Its achievements have been immense, having a real effect on society. The English version of the encyclopedia has 3 million articles, but there are actually 250 different language encyclopedias containing a total of 10 million articles, with the German and Spanish versions being very large.
An example that one of my own digital volunteers alerted me to is Mariners and Ships in Australian Waters. They are transcribing shipping and other related lists. The original items are in the state archives, but this site has been instigated and set up by volunteers, not the state archives. They have 600 volunteers.
There are other examples, but lastly I just want to mention the FREEUKGen project. This has different parts. It’s one of the oldest projects, having started in 1999, and the public are transcribing British BMD records, the census and other things. It is similar to the FamilySearchIndexing project. There is a real need for handwritten archives, manuscripts and records to be transcribed by hand so that they can become searchable and accessible.
In looking at all these sites I have been trying to find out if there are common factors in crowdsourcing, and if what we are experiencing in Australian Newspapers is unique due to our country and resource, or whether crowdsourcing would work just as effectively in other countries, with other resources. I was also interested in finding out the lessons we have all learnt so that we can apply them when we set up new crowdsourcing sites. I did research on this in 2009: all the sites were interviewed and statistics taken. My discovery is that there are commonalities in almost every project, and it is my belief that if libraries, as non-profit-making organisations, were to apply the tips for crowdsourcing I am about to share they would undoubtedly be successful.
We’re now going to look at the common factors amongst the examples, which are:
- Volunteer numbers and achievements
- Volunteer profiles
- Volunteer motivations
- Rewards and acknowledgement
- Management of volunteers
All the projects started very quietly and mostly continued without any fanfare, publicity or marketing. Initially the numbers of volunteers were very low, but via viral marketing (forums and blogs) volunteer numbers increased exponentially. All sites wondered what would happen if they ran an advert on TV. In all cases volunteers did far more work, to a higher standard, than expected and made significant achievements.
The most common questions people ask me are “Who are the volunteers?” and “Why do they do it?” Some people suspected that our text correctors were really library staff, which is not the case. The text correctors are real, normal people. They are anyone and everyone. I sent some of our volunteers a survey (as had the Distributed Proofreaders and FamilySearchIndexing) to find out the answer to these questions. Our survey results matched those of other sites and were very interesting.
The majority of the work is done by ‘super’ users or volunteers. The top 10% of volunteers can do as much as 89% of the work. Their age varies. It is not all older people as some imagine, in fact it is highly likely that moderators or those with extra responsibilities, or the super users are dynamic young professionals who have full-time jobs. There are retired people, but also stay at home mums and disabled or sick people. The volunteer profile is broad.
The motivating factors people gave for doing online voluntary work were no different to those that motivate anyone to do anything. For example, they enjoy it, it’s interesting and fun, and they’re thinking about their own personal goals and also the group outcome. They like to think that what they are doing matters to their country or the world at large, so historical and scientific projects especially are big draw cards.
When given a high level of trust and respect they want to repay this so work extra hard. When given a big goal they like the challenge, the bigger the better. Giving something back to the community and helping each other were often cited, and many of these projects proved for unknown reasons to be totally addictive. Especially so the Galaxy Zoo and Australian Newspapers.
Not realising that volunteers had such high and sustainable levels of self-motivation, the sites had all initially asked them what would motivate them more. Their answers were:
- Give us more stuff to do
- Raise the bar of the goal
- Progress chart
- We want online camaraderie
- Clear instructions
- Acknowledgement
- Reward
Very few of the sites had thought to give reward or acknowledgement (and had initially associated this with money, of which they had none), but several, such as ourselves, had instigated rewards and acknowledgements suggested by users. All of this was simple and cost free. The most requested was for individuals to be able to identify themselves to other volunteers, and also sometimes the public, and for them to see overall ranking tables to see where they fitted into the big picture. The ranking tables were more about the big picture than being of a competitive nature. Other ideas were meeting the paid staff (it surprised the paid staff that this would be considered a reward), certificates and promotional gifts.
All organisations agreed that management of volunteers was not a big task, nor should it become one. None had dedicated staff to manage volunteers (even Wikimedia, which has 10 million volunteers). Instead they all agreed that the way to minimise staff time was to get some volunteers to manage others and to set up communication and sharing software such as wikis and forums. For example, instead of a staff member answering an enquiry, another volunteer who saw the question in the forum could answer it.
So after all this talking I am finally able to summarise for you 14 tips that you should implement on your site if you want to crowdsource effectively. I’m going to illustrate my points with screenshots from the sites I have discussed. I should say no site does all of these things. I think this is largely because no-one has ever looked into crowdsourcing techniques as seriously as I have over the last few months and pulled all the pieces together. Therefore if you set up a site which does all 14 things I think you would be on to a winner for sure!
The next is to show your progress towards the goal. This simple red bar from the Guardian is very effective.
They’ve taken it to the next level by having progress bars on groups of records as well. They’ve also personalised this one by adding a photo of the MP which motivated people even more.
Distributed Proofreaders (DP) and Wikipedia are next.
The front page is updated in real time.
Your system has to be quick to get into and reliable once in. Really seriously consider whether you want people to have to register first or whether they can do it anonymously. You want as few blocks and clicks as possible so they do stuff quickly and on the spur of the moment. This is Australian Newspapers (AN), where we decided it was not necessary to log in or register first, but users do need to complete a CAPTCHA for the session to stop spammers and robots.
It must be both easy and fun. Many of the sites that require use of the human eye showed the original image on the left and the action or questions you need to answer on the right. Simple large boxes are key. Here is the Guardian expenses again.
They have only 2 actions to make. The wording on the buttons is also very encouraging.
Here we are in Galaxy Zoo, starting the identification process of an image of a galaxy, with our first simple question.
This is followed by 2 more simple questions. The boxes are clear, easy and quick to just click on.
In Australian Newspapers no knowledge of wiki editing, HTML or markup is required. You simply look at the image, correct the text on the left by clicking on it, and then save.
All sorts of interesting stuff is discovered in these projects and often outcomes you had not expected, as well as your goal happening. It is really important to remember to tell all your volunteers this information, because it spurs them on.
Guardian – don’t you just want to click on the ‘best individual discoveries’?
Here is the ‘hall of fame’ from the AN service. The top 5 correctors show on the home page as well as in the hall of fame. Originally the hall of fame only showed the top 10 but users wanted to see more, so now it is anyone who has corrected more than 5000 lines per month. Users are still asking for entire league tables however so they can see where they are in the big picture. This is a motivating factor for them. During development it was suggested that we need to use gaming technologies to encourage people to correct text but this has so far not proved necessary!
The Guardian implemented ranking tables as well.
Picture Australia acknowledges outstanding contributors by name, publicly (if they agree), and in newsletters and library publications.
The remaining tips are as on this slide.
Tip 7. The content or thing must be interesting (history, science, animals, personal, topical, e.g. Guardian scandals).
Tip 8. Give volunteers options to be visible (to each other and the public, via profiles on items they have created or helped with, names of galaxies).
Tip 9. Give volunteers an online team environment, e.g. wiki, forum, camaraderie and fun.
Tip 10. Give volunteers choices (do the next or pick something).
Tip 11. Assume it will be done well (to build trust and expectation).
Tip 12. Keep the site alive (new content, activity).
Tip 13. Take advantage of topical events (news, disasters, anniversaries, deaths etc., as Wikipedia does).
Tip 14. Listen to your ‘super’ volunteers carefully. Whatever they say is important: they are your heaviest users.
However, the concerns of some libraries, archives and museums about doing social engagement and crowdsourcing activities are loss of power and control. I have addressed each of the concerns listed on this slide with cultural heritage managers and am able to say, through experience on the Australian Newspapers project, that they have all been disproven: vandalism vs disinterest, data corrupted, loss of control, loss of power. None of these things happened, in fact the reverse. So good things can happen when you are not a power and control freak.
The future potential of crowdsourcing with digital volunteers is mind boggling when you think of it in the world context, and how many people have internet access. In Australia alone we have 21 million people, more than half of whom have internet access at home and so could potentially be volunteers. The FamilySearchIndexing project reports that in their first year they had 2000 volunteers and by their third year they had 160,000 volunteers transcribing birth, marriage and death records. The Australian Newspapers program is set to match this easily.