1. 1http://labs.bl.uk @BL_Labs @sheffhallamuni #bldigital labs@bl.uk
http://www.bl.uk/projects/british-library-labs
18th March 2016 – BL Labs Roadshow 2016
Presentation at Sheffield Hallam University
Funded by the Andrew W. Mellon Foundation
4.
Digital research methods
http://labs.bl.uk/Launch+Event (has some examples from researchers)
Corpus analysis tools
Text Mining
Visualisations
Location based searching
Geotagging
Annotation
Natural Language Processing
Using Application Programming Interfaces for datasets, e.g. Metadata, Images
Transcribing
Crowdsourcing / Human Computation
6.
Managing expectations:
Looking for very specific things digitally
•Only a small amount of content is digitised!
•Might not be the treasure you expect at the
end of a digital journey!
•Starts with a conversation!
•Research interests?
•What digital collections are you interested
in?
8.
Intersection
You and the Library
Your research interests
Digital collections you are interested in
British Library
Digital collections we have
This is where Labs works
9.
http://www.bl.uk/subjects/digital-scholarship
http://labs.bl.uk/Digital+Collections
Soon…http://data.bl.uk
Mini Network Attached Storage (NAS) device guide:
http://goo.gl/E8aRyQ
In 20 years’ time…
11.
•Submit ideas by 11 April 2016.
•Two finalists announced late May 2016.
•Residency June – October 2016.
•Up to £3600 support, technical, curatorial etc.
•Showcase @ Symposium Monday 7 Nov 2016.
•Winner £3000 & Runner up £1000!
Competition
12.
•Projects already using BL digital content in
interesting and innovative ways.
•Submit projects (previous and new) by
5 September 2016.
•Artistic, Commercial, Research and Learning /
Teaching categories.
•Winners announced @Symposium 7 Nov 16.
•£500 Winner & £100 Runner Up.
Awards
13.
Projects & Ideas
•Ideas change once you try to access, examine
and use the data!
•Talk to us about working on potential ideas /
projects.
14.
2013
Pieter Francois
Dan Norton
2014
Desmond Schmidt
Bob Nicholson
2015
Katrina Navickas
Adam Crymble
Dina Malkova
Mario Klingemann
Spatial Humanities Project at
Lancaster University
James Heald
Who and Why?
Please refer to the Winners’ Handout
19.
Katrina Navickas (2015)
Political Meetings Mapper
http://politicalmeetingsmapper.co.uk
Labs Symposium 2015 (33 mins)
https://goo.gl/Qq78Oa
Interview 2015 (4 mins)
https://goo.gl/BSA3be
The Chartist Newspaper
http://goo.gl/vOLSnH
Chartist Monster Meeting
See after lunch…
20.
Adam Crymble (2015)
Crowdsource Arcade
What if crowd sourcing
looked like this?
http://goo.gl/LBfJ4W
Game Jam - http://goo.gl/OH9pOZ
30 mins talk
Labs Symposium (2015)
https://goo.gl/7z0j8p
5 min interview (2015)
https://goo.gl/SSRsdd
http://goo.gl/0APpE8
23.
Curious Images Event 2014
https://goo.gl/ubL0AO
Labs Symposium 2015
https://goo.gl/gRZ5Ia
44 Men who Look 44
(Notice the direction of the faces)
Tragic Looking Women
Collage Art
Artistic (2015)
Mario Klingemann - Quasimondo
Read more: http://goo.gl/dM8ieA
A Hat on the Ground
Spells trouble
25.
http://www.lancaster.ac.uk/fass/projects/spatialhum.wordpress/
Labs Symposium 2015: https://goo.gl/ZCU56a
Research (2015)
Spatial Humanities: Lancaster University
Combining Text and
Geographic Information
http://goo.gl/yZ3xCJ
Investigating geographical representation of disease in
digitised 19th Century newspapers
26.
Special Jury’s Prize (2015)
James Heald – Wikimedia and Map work
https://goo.gl/WYZCB2
http://goo.gl/HNQq5e
https://goo.gl/VPgffL
https://commons.wikimedia.org/
Labs Symposium (2015)
https://goo.gl/djtm1b
28.
Creative with Wildlife Sounds
http://goo.gl/s7siv0
Sound Edit Wildlife Films
Competition 2013 http://vimeo.com/60401313
'Dave's Wild Life' by
Samuel de Ceccatty, won first prize!
http://sounds.bl.uk/Environment
31.
Snipping out images
from 65,000 Digitised Books
Face recognition
Mechanical Curator
Flickr
Worked better for female faces
than men’s
Press
http://mechanicalcurator.tumblr.com
>400,000,000 views
> 500,000 tags
http://www.flickr.com/photos/britishlibrary/
1,020,418 images
need tagging!
Creative uses of images
http://goo.gl/qPPgxX
32.
Tagging a million images
Iterative Crowdsourcing
http://goo.gl/j6fxac
Cardiff University’s
Lost Visions Project
http://www.metadatagames.org/
James Heald
Mario Klingemann
Chico 45
Use computational methods
Human Tagger
Top British Library Flickr Commons Taggers
34.
Finding one image
on Flickr
Finding many more!
Make collages
Make 4 paintings
Exhibit light boxes at Burning Man 2014 in Nevada, USA
Work with Labs & British Library to install light boxes in London in 2015
36.
Let’s have a party!
Music mix by DJ Yoda using British Library Sounds
Exhibited from
June to Nov 2015
20th June 2015
https://soundcloud.com/bbc6music/dj-yodas-library-jam
Installation in the Poet’s Circle in the Piazza outside the BL at St Pancras, London
Images from Burning Man and Flickr
brought into the Poet’s Circle
41.
Data Problems [stuff goes in here]
•Metadata isn’t as clean as many
•Square brackets to indicate inferred
information
•Code to plot when a record had square
brackets by Ben O’Steen
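The square-bracket convention can be checked mechanically. The following is a minimal sketch in Python, not Ben O’Steen’s original code: it assumes each record is a dict with hypothetical `title` and `year` fields, and counts per year how many records contain bracketed (inferred) text. The resulting counts could then be handed to any plotting library.

```python
import re
from collections import Counter

# Matches any non-empty run of text enclosed in square brackets, e.g. "[1850?]".
BRACKETED = re.compile(r"\[[^\[\]]+\]")

def bracketed_counts_by_year(records):
    """Count, per year, the records whose fields contain square brackets.

    `records` is an iterable of dicts; the field names "title" and "year"
    are hypothetical and stand in for whatever the real metadata uses.
    """
    counts = Counter()
    for record in records:
        # Search every field, since inferred information may appear anywhere.
        text = " ".join(str(value) for value in record.values())
        if BRACKETED.search(text):
            counts[record.get("year")] += 1
    return dict(counts)

# Invented sample records, for illustration only.
sample = [
    {"title": "A Tour in Wales", "year": 1850},
    {"title": "[Poems.]", "year": 1850},
    {"title": "Travels [in Italy]", "year": 1862},
]
print(bracketed_counts_by_year(sample))
```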
44.
Simple Data Structures
•Everything has a URL
•URL links to page which tells you about the
thing
•Link to other things
•URL readable by machine not just a human
•Not there yet!
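As an illustration of the idea only, here is a small Python sketch in which every thing is keyed by a URL, each URL resolves to a machine-readable (JSON) description, and descriptions link to other things. The `example.org` URLs and the field names `label` and `links` are assumptions made for the sketch, not a real British Library scheme.

```python
import json

# Hypothetical identifiers: example.org is a placeholder, not a real BL URL scheme.
things = {
    "https://example.org/book/1": {
        "label": "A digitised book",
        "links": ["https://example.org/image/7"],
    },
    "https://example.org/image/7": {
        "label": "An illustration snipped from the book",
        "links": ["https://example.org/book/1"],
    },
}

def describe(url):
    """Return a machine-readable (JSON) description of the thing at `url`."""
    record = things[url]
    return json.dumps({"id": url, **record}, sort_keys=True)

def linked_things(url):
    """Follow the links from one thing to the descriptions of the others."""
    return [json.loads(describe(target)) for target in things[url]["links"]]

print(describe("https://example.org/book/1"))
print([thing["label"] for thing in linked_things("https://example.org/book/1")])
```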
45.
Training /
Teaming up with Experts?
•Many researchers have the domain knowledge but
lack the technical skills to use Digital Research
Methods
•Should our support be more focused on training?
•There are plenty of computational experts looking
for problems to solve
•Should they be teamed up with those that have
problems that need solving?
47.
Finding Open Digital Collections
• Copyright cleared for research
use
• Curated (Is there someone who
knows the ‘story’ about the
collection?)
• Collection / Item Level
Metadata available? (What state
is it in, and does it need cleaning?)
• Where is it?
Image from : https://goo.gl/Qjeqo1
50.
Supporting
Digital Experiments better
BL Labs GitHub Site
Re-OCRing Newspapers
Flickr API
BL Explore – Search Catalogue
Python Code
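As a hedged illustration of what using the Flickr API from Python can look like, the sketch below only builds a request URL for Flickr’s `flickr.people.getPublicPhotos` REST method; it makes no network call. The API key and account identifier are placeholders you would need to supply, and this is not code from the BL Labs GitHub site.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def public_photos_url(api_key, user_id, page=1, per_page=100):
    """Build a flickr.people.getPublicPhotos request URL (no network call)."""
    params = {
        "method": "flickr.people.getPublicPhotos",
        "api_key": api_key,      # placeholder; obtain a real key from Flickr
        "user_id": user_id,      # the account's Flickr NSID (placeholder here)
        "page": page,
        "per_page": per_page,
        "format": "json",        # ask for JSON rather than the default XML
        "nojsoncallback": 1,     # plain JSON, not wrapped in a JS callback
    }
    return FLICKR_REST + "?" + urlencode(params)

url = public_photos_url("YOUR_API_KEY", "USER_NSID")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) and paging through the results is left out so that the sketch runs without credentials.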
51.
Lessons…
•Huge appetite to use BL digital content & data
(e.g. Flickr Commons stats).
•Identifying / bridging gaps for researchers to
use BL data.
•Labs can help researchers navigate through
the Library to get the data they want.
52.
Perfection vs Imperfection
•If we focus too much on perfection
we will never get anything done!
•Fear of failure seen as a negative
thing.
•Just don’t be scared to try
experiments!
54.
Finally…
•Examine and use our data, and talk
to us about your ideas and projects!
•Consider entering the Competition and
Awards!
•You never know, you might….
56.
Accessing the Mini NAS
Mini Network Attached Storage (NAS) device guide:
http://goo.gl/E8aRyQ
Find the ‘opendata’ wireless access
point and join it.
The passphrase is ‘opendata’
Accessing data folders
Username: guest
Password: guest
Or use ftp://10.0.0.1/
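For those who prefer a script to a file browser, the FTP route can be sketched with Python’s standard `ftplib`. This is a minimal sketch that assumes the NAS accepts the guest credentials over FTP at 10.0.0.1 as stated above; the helper names `list_top_level` and `list_nas` are made up for illustration.

```python
from ftplib import FTP

def list_top_level(ftp, user="guest", password="guest"):
    """Log in with the guest credentials and return the top-level entries.

    `ftp` is an already-created ftplib.FTP connection (or any object with
    the same login/nlst/quit methods).
    """
    ftp.login(user=user, passwd=password)
    try:
        return ftp.nlst()
    finally:
        # Always close the session politely, even if the listing fails.
        ftp.quit()

def list_nas(host="10.0.0.1"):
    """Connect to the Mini NAS and list its top-level folders."""
    return list_top_level(FTP(host, timeout=10))
```

Calling `list_nas()` should only work while your machine is joined to the ‘opendata’ wireless access point.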
58.
What’s on the Mini NAS?
• British National Bibliography – 3.5 million records (2 GB)
• 90,000 Playbills – 1602 – 1902 (60 GB)
• ALTO XML (includes METS and MODS) for OCR of 65,000 volumes, 22 million pages mostly from the 19th Century (909 GB)
• 1 million images snipped from books put on Flickr (667 GB)
• 70,000 tagged images (170 GB)
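To give a feel for working with the ALTO XML in the list above, here is a minimal Python sketch that pulls the OCR text out of an ALTO page, line by line. It relies only on the standard ALTO convention that each word is a `String` element with a `CONTENT` attribute inside a `TextLine`. The sample fragment is hand-written for illustration and is not taken from the BL data, whose files may differ in namespace and layout.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]

def alto_lines(xml_text):
    """Return the text of each TextLine by joining its String CONTENT values."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.attrib.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines

# A tiny hand-written fragment, not taken from the BL data.
sample = """<alto><Layout><Page><PrintSpace><TextBlock>
<TextLine><String CONTENT="Monster"/><SP/><String CONTENT="meeting"/></TextLine>
<TextLine><String CONTENT="at"/><SP/><String CONTENT="Kennington"/></TextLine>
</TextBlock></PrintSpace></Page></Layout></alto>"""
print(alto_lines(sample))
```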
60.
Ideas Lab
•Get into groups of 2-6
•Read the instructions in the Ideas Lab
Pack
•We are around to help and advise.
•Enjoy it and have fun!
25 Seconds (68 Words)
My name is Mahendra Mahey and I work on a project called British Library Labs. We are based at the British Library in London, in the Digital Scholarship department, and we work closely with the Digital Research team there; Stella Wisdom from that team is here today. The project has been running for three years now and is funded by the Andrew W. Mellon Foundation.
33 Seconds (100 Words)
In a nutshell the project encourages researchers, artists, entrepreneurs, educators and anyone else,
<Click>
to ‘experiment’ with our digital collections and data. We are particularly interested in those who have questions which focus on the potential to find and create NEW things through access to the digital content. For example, asking a question across thousands of digitised books or newspapers is possible with computational techniques but would not be feasible using manual methods. Let’s look at a clear example.
<Click>
Adam Crymble was doing his PhD research on Distant Reading at King’s College. He won a competition to explain his thesis in 2 minutes in the PhD Comics competition.
Examples like this will hopefully INSPIRE YOU to use the British Library’s digital content in some way in your work by showing some of what others have done.
17 Seconds (53 Words)
<Click>The British Library is one of the largest libraries in the world <Click> with an estimated 180 million physical items, only a small proportion of which have been digitised. <Click>We estimate this is around 1-2%, but no one really knows exactly how much. However, more and more items are being stored as ‘born digital’, such as the UK Web Archive.<Click>
35 Seconds overall
We have created collection guides detailing some of these digital collections <Click>on our Digital Scholarship site.
<Click>and some on the Labs site.
<Click> Soon data.bl.uk will be the place where people can directly access some of the digital collections we have available.
<Click> Today we have brought data with us; see the guide on how to access it and the printouts on the tables.
A pause for thought and reflection, however: digital is just a current technology to deliver information. Perhaps in years to come <Click>we won’t be using the word ‘Digital’ <Click>in front of the word ‘Scholarship’. It will just be ‘Scholarship’, and digital technology will be part of the EVERYDAY process of research. Anyway, back to the present.
6 Seconds (20 Words)
So <Click> ‘how’ do we try and engage those who might be interested in the BL’s digital collections and data? <Click>
41 Seconds (123 Words)
One way is by running an annual competition which is open to the world! All you have to do is
<Click>submit an idea by 11 April 2016.
<Click>The two finalists will be announced in late May <Click>and they will work with us in residence between June and October,
<Click>where they will get up to £3600 financial support, together with technical, curatorial and other types of support.
<Click>The winners will showcase their work and receive their prizes at our symposium on Monday 7th of November.
<Click>£3000 will be awarded to the winner and £1000 for the runner up.
15 Seconds (45 Words)
The next way we try to engage those interested in using our digital content is through our Awards,
<Click>these recognise work already carried out using our digital content.<Click>The deadline for this year is the 5 September. You can submit previous and new projects<Click>in one of four categories: Artistic, Commercial, Research and Teaching & Learning <Click> Winners will be announced on Monday 7th of November
<Click> where each category winner will receive £500, with £100 for each runner-up.
8 Seconds (24 Words)
The final way to engage with our digital collections and data is simply to examine and use our data. We have learnt that ideas usually change once people have done this. Talk to us about projects or ideas you would like to work on, whether it’s for the competition, awards or something else.<Click>
21 Seconds (63 Words) (LEAVE as Automatic)
The library is learning WHO wants to use our digital content and most importantly WHY? What you can see are just the winners of our competition and awards for the last 3 years. There are so many more people who have been engaging with the Labs. I will give a flavour of some of the work carried out and later we will talk about this engagement in more detail. <Click>
7 Seconds (21 Words)
So focusing back on the competition, let’s look at a few examples.
21 Seconds (65 Words)
Katrina Navickas was particularly interested in the <Click>Chartist Movement, a group campaigning for the vote for working people. <Click>They were the biggest popular movement for democracy in 19th century British history, as this early picture of a huge monster meeting at Kennington Common shows. <Click>She wanted to use a combination of manual and computational methods to explore our Digitised Newspapers to find out when and where they met and plot the meetings on a map, <Click>hopefully unearthing new history.
27 Seconds (82 Words)
Adam Crymble <Click>wanted to harness the power of playing fun games on arcade machines to help with crowdsourcing the tagging of un-described images. He particularly wanted to engage a younger audience in crowdsourcing. <Click>On the right you can see a replica 1980s arcade machine we built <Click>and on the bottom left some tagging games that were developed for the machine through a ‘Games Jam’. <Click>Let’s take a closer look at two of the games…<Click>
79 Seconds Video Clip
We are close to installing the machine at the National Videogame Arcade in Nottingham to see how successful the games will be. If you’re interested in having the machine in your institution, please contact us.
https://www.youtube.com/watch?v=xoCgHo2rwN4 (Switch on Subtitles)
1.47 – 3.06 1 min 19 seconds
<Click>
From 1.47 to 2.28 – Art Treachery – 41 seconds
From 2.28 to 3.06 – Art Attack – 38 seconds
Total for both clips – 1 min 19 seconds
9 Seconds (25 Words)
Now on to our Awards, these recognise work *already* carried out using our digital content. Last year’s categories were Artistic, Entrepreneurial and Research. <Click>
37 Seconds (112 Words)
The artistic winner was Mario Klingemann, otherwise known as ‘Quasimondo’. He tries to use computers to generate art or do clever and interesting things such as finding images. He worked a lot with a collection of un-described images, largely from the 19th Century. <Click> Here you can see a picture of 44 men he found algorithmically who looked around 44; <Click>notice how the eyes of the faces change from left to right. <Click>Bottom left is an attempt to use code to find images of <Click> ‘Tragic looking women’ and <Click>top right is an attempt to create computer art by snipping bits of images together computationally.
26 Seconds (78 Words)
Dina Malkova was the winner of the Commercial category. <Click>Inspired by a small digitised fragment of an <Click>illustration from the original handwritten manuscript of Alice’s Adventures Under Ground, <Click>Dina made handmade, bespoke bow ties and cufflinks. <Click>You can still buy these items in the Alice pop up shop in London and of course online on Etsy.
12 Seconds (37 Words)
<Click>The research winners were a Spatial Humanities group of researchers from Lancaster University <Click>who focussed on analysing digitised newspapers to establish when and where diseases were mentioned in the Victorian Era and <Click>plotting them on a map to look for patterns.<Click><Click>
18 Seconds (56 Words)
‘Indexing the BL 1 million & Mapping the Maps’ was led by James Heald in collaboration with others. <Click>They produced an index of 1 million 'Mechanical Curator collection' images on <Click>Wikimedia Commons from a collection of largely un-described images. <Click>This led to the discovery of 50,000 maps within the collection, partly through a map-tag-a-thon. <Click>These are now being geo-referenced. <Click>
105 seconds
Cheryl Tipp, Curator of Environment and Nature Sounds <click>in Digital Scholarship, worked with the creative industries department at the British Library and a company called Ideas Tap to launch the <click>‘Sound Edit Wildlife Films Competition’, which challenged animators, filmmakers and photographers to create a short film inspired by the Library's collection of 10 wildlife sound recordings.
<click>The winning entry was 'Dave's Wild Life' from Samuel de Ceccatty, a fantastic short which follows Dave, an amateur naturalist whose sole aim is to have his own TV show. The clip I will show uses the ‘Haddock drumming calls’ to give a voice to the cranes or, as Dave liked to call them, the Diplodocus longus cranum.
Cue up video and play from 47 s- 1.58
http://vimeo.com/60401313
Cue up video and play from 45 s- 1.58
75 seconds
The work of Labs is really about a number of stories, stories about digital collections and about researchers wanting to ask fascinating research questions about them. Let’s now tell you a story about one collection and the intended and unintended consequences of working with it.
The Library digitised 65,000 17th to 19th century books from our collections a few years ago (around 2.7% of the physical total in that period). You can view them from our catalogue or read them on your <click>iPad via the Historical Books app developed by BiblioLabs. We also captured 22 million individual page images, along with full-text scans of these images, all of which contain an untold quantity of useful data such as names of people, places, historical events and dates.
So the question became then, what next? What can 65,000 books tell us?
The Mechanical Curator posts small illustrations taken almost at random from the digitised book corpus to a Tumblr blog.
This experiment with undirected engagement was a by-product of work to uncover the hidden wealth of illustrations within the digitised pages.
Play from 4m 50 seconds to 5m 19 seconds
6 Seconds (18 Words)
Just to inspire you, I couldn’t resist showing you the animating of some British Library images, using Creature Software by Kestrel Moon, developed by a former PIXAR animator.
Let’s look at the finished work!
16 Seconds Video Clip
https://goo.gl/QilqqT
1.27- 1.43 – 16 seconds
85 seconds
<click>The British Library faces many challenges of access to our Digital collections!
<click> Sometimes digital content is only available onsite due to license restrictions,
<click>or even only on a specific computer in a reading room! Technically there are very few reasons why digital content can’t be online
<click> though it might be too big or hasn’t been transferred from other digital storage media.
<click>Sometimes access is through a paywall. Finally,
<click>some content is in the happy sunny place, online, open and freely available.
The real reasons why there are challenges to accessing digital content are of course human. They require different approaches from the Library and may often involve an honest, open dialogue and negotiation with the publishers.
The Labs project has tried to address this problem by creating a ‘residency model’ for researchers to work intensively with a digital collection on-site, so as not to infringe access conditions. I will say more about this later.
115 seconds
Finding openly licensed collections is sometimes like detective work. From lessons learned, Labs uses the following four filters for digital content:
<click>Is the Copyright cleared for research and non-commercial use?
<click>Is it Curated? (Is there someone who knows the ‘story’ about the collection?)
<click>Is there Collection / Item Level Metadata available? And importantly, what state is it in, and does it need cleansing?
<click>Finally, where is it?
<click>These have been effective filters in doing the work of Labs in an agile way.
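The filtering described above can be expressed as a simple checklist in code. This is a minimal sketch: the `Collection` record, its field names and the two example entries are all invented for illustration and do not describe real BL collections or any actual Labs tooling.

```python
from dataclasses import dataclass

@dataclass
class Collection:
    """Hypothetical record describing one digital collection."""
    name: str
    copyright_cleared: bool   # cleared for research / non-commercial use?
    curated: bool             # does someone know the collection's 'story'?
    metadata_available: bool  # collection- or item-level metadata exists?
    location_known: bool      # do we know where the data actually is?

def passes_filters(collection):
    """A collection is a candidate only if it passes all four filters."""
    return all((
        collection.copyright_cleared,
        collection.curated,
        collection.metadata_available,
        collection.location_known,
    ))

# Invented examples for illustration only.
candidates = [
    Collection("Example collection A", True, True, True, True),
    Collection("Example collection B", True, False, True, True),
]
usable = [c.name for c in candidates if passes_filters(c)]
print(usable)
```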
<click>Labs has therefore identified several collections at the website above, some are shown in the slide:
<click>Due to our licensing conditions, we are in the process of text mining the abstracts for a large number of journal titles in electronic form. The visualisation indicates the subject spread of our collections.
<click>We have been harvesting the UK Web since 1993 and this is available as a resource under specific conditions for research.
<click>We are also investigating the use of our item request data (around 17 million records) and anonymised reader data, data protection allowing.
<click>The British National Bibliography has over 3 million catalogue records available as linked open data, licensed under CC0 from the British and Irish National Library catalogues.
More information is available on the Labs website, and we hope one day to develop data.bl.uk as a place where all our open content and data lives, with a unique identifier for each data set.
5 Seconds (15 Words)
<Click> Why is the Library doing this? Well there are many reasons, but essentially it is about…
20 Seconds (62 Words)
<Click>Labs is learning important lessons on how we are supporting researchers who want to experiment with our digital content using digital methods. <Click>We are learning what we are doing right.<Click>Understanding what researchers want, <Click>learning if we provide the appropriate services, tools and resources to support them. Trying to understand where the gaps are <Click>and what we should be doing in the future. <Click>
28 Seconds (86 Words)
We have learned many lessons. I will touch on a few briefly here. <Click>There is a tremendous appetite from researchers, artists, entrepreneurs and others who want to use our digital content/data (see our Flickr Commons Image statistics later). <Click>We are identifying and bridging gaps for researchers to access BL data <Click>and helping researchers navigate through the Library’s systems and processes to get to it. <Click>At our first roadshow, student Alison Pope suggested that BL Labs acts like a human API (or access point) connecting people to the BL’s digital data.
20 Seconds (62 Words)
The Labs is a place where we do many small experiments quickly. Most importantly it’s where it’s OK to make mistakes and learn from them. Fail faster and fail better! Perhaps the advice of Jimmy Wales (founder of Wikipedia) can sum up what we have learned time and time again.<Click>
40 Seconds
Video Clip
http://www.bbc.co.uk/news/business-34808495
20 Seconds (61 Words)
<Click>Examine and use our data and talk to us about your ideas and projects.<Click>Consider entering the Competition and Awards <Click>You never know YOU might….
9 Seconds (28 Words)
A tweet from Professor Melissa Terras from University College London, <Click>.