Some collected uses of the British Library Flickr collection, illustrating how a new presentation changed its usage.
It also outlines the bias present in collections, especially in digitised material.
7. Getting to the heart of it
British Library Labs works with researchers on their specific problems, trying to assess how widely each problem is felt.
With their help, we talk to communities of researchers and try to pinpoint what they actually need, as opposed to what they think they need to ask us for.
8. One theme keeps appearing:
All projects to date would have been far easier if every “item” were accessible and citable in a way that a computer can follow (see the sketch below).
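A minimal sketch of what that could mean in practice, assuming a hypothetical resolver service (the URL, identifier scheme, and JSON fields below are invented for illustration; they are not a real British Library API):

import json
import urllib.request

ITEM_ID = "example:item/12345"              # a persistent, citable identifier
RESOLVER = "https://resolver.example.org/"  # hypothetical resolver service

def fetch_item_metadata(item_id):
    # A stable identifier that dereferences to machine-readable metadata
    # lets a program, not just a human reader, follow the citation.
    with urllib.request.urlopen(RESOLVER + item_id) as response:
        return json.load(response)

metadata = fetch_item_metadata(ITEM_ID)
print(metadata.get("title"), metadata.get("source"))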
13. Impact?
Hard to measure but:
- 13-20 million hits on average every month, over 500,000,000 hits to date.
- Over 450,000 tags added by volunteers and machine algorithms.
- Iterative crowdsourcing is key to making the collection more useful to more people.
14. Iterative crowdsourcing?
(The term is borrowed from Mia Ridge.)
1. Crowdsource broad facts, and subcollections of related items emerge.
2. No 'one-size-fits-all': subcollections allow for more focussed curation.
GOTO 1
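Read as pseudocode, that loop is easy to express. Here is a minimal runnable sketch in Python; the ask_crowd_* functions are invented stand-ins for a real crowdsourcing platform, not a British Library Labs pipeline:

from collections import defaultdict

def ask_crowd_for_tags(item):
    # Placeholder for step 1: a broad task such as "tag this image".
    # A real system would post the task to volunteers and collect answers.
    return set(item.get("seed_tags", []))

def ask_crowd_focused_question(subcollection, about):
    # Placeholder for step 2: a narrower task that only makes sense within
    # one subcollection, e.g. "georeference this" for items tagged 'map'.
    print(f"focussed pass on {len(subcollection)} item(s) tagged {about!r}")

def cluster_by_tag(items):
    # Subcollections of related items emerge from the broad tags.
    subcollections = defaultdict(list)
    for item in items:
        for tag in item["tags"]:
            subcollections[tag].append(item)
    return subcollections

def iterative_crowdsourcing(items, rounds=2):
    for _ in range(rounds):                  # the "GOTO 1"
        for item in items:                   # 1. crowdsource broad facts
            item.setdefault("tags", set()).update(ask_crowd_for_tags(item))
        for tag, sub in cluster_by_tag(items).items():
            ask_crowd_focused_question(sub, about=tag)  # 2. focussed curation
    return items

items = [{"id": 1, "seed_tags": ["map"]}, {"id": 2, "seed_tags": ["portrait"]}]
iterative_crowdsourcing(items)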
41. Infancy of understanding
Large-scale analysis of text is evolving but still young.
This creates an exasperating situation where algorithmic ‘black boxes’ are used to draw conclusions.
http://www.scottbot.net/HIAL/?p=41271
42. “Black Boxes”: a misnomer
It is legitimate and useful to use code that you could not write.
It is not legitimate to simply believe the ‘label’ on the side of the box.
E.g. “Sentiment Analysis” is often nothing of the sort.
43. Quoting Scott Weingart: (emphasis mine)
● Do sentiment analysis algorithms agree with one another enough to be considered valid?
● Do sentiment analysis results agree with humans performing the same task enough to be considered valid?
● Is Jockers’ instantiation of aggregate sentiment analysis validly measuring anything besides random fluctuations?
● Is aggregate sentiment analysis, by human or machine, a valid method for revealing plot arcs?
● If aggregate sentiment analysis finds common but distinct patterns and they don’t seem to map onto plot arcs, can they still be valid measurements of anything at all?
● Can a subjective concept, whether measured by people or machines, actually be considered invalid or valid?
(again from http://www.scottbot.net/HIAL/?p=41271)
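Weingart's first question can be probed directly. A minimal sketch, assuming the off-the-shelf vaderSentiment and TextBlob packages (my choice of tools, not anything named in the talk), checks whether two popular analyzers even agree on the sign of a few sentences:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

sentences = [
    "The journey through the Alps was breathtaking.",
    "The inn was damp, cold, and the food barely edible.",
    "We departed at dawn.",  # neutral-ish: where analyzers often diverge
]

vader = SentimentIntensityAnalyzer()
agreements = 0
for text in sentences:
    v = vader.polarity_scores(text)["compound"]  # in [-1, 1]
    t = TextBlob(text).sentiment.polarity        # in [-1, 1]
    same_sign = (v > 0) == (t > 0)
    agreements += same_sign
    print(f"{text!r}: vader={v:+.2f} textblob={t:+.2f} agree={same_sign}")

print(f"sign agreement: {agreements}/{len(sentences)}")

If two analyzers routinely disagree on such simple cases, simply believing the ‘label’ on either box is clearly risky.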
44. “I am interested in travel accounts in Europe during the 19th Century”
46. Bias in digitisation
The tool was made to give a statistically valid sample.
Because so little has been digitised, it showed how skewed the digital corpus is compared to the overall holdings.
Allen B. Riddell, in “Where are the novels?”*, estimates that using HathiTrust’s corpus:
“... about 58%—somewhere between 47% and 68%—of the 2,903 novels [all publications in English between 1800 and 1836] have publicly accessible scans.”
* (2012) https://ariddell.org/where-are-the-novels.html
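Interval estimates like Riddell's “47% and 68%” come from treating the checked titles as a random sample and computing a confidence interval for a proportion. A minimal sketch of a Wilson score interval (the counts below are invented for illustration; they are not Riddell's data):

import math

def wilson_interval(successes, n, z=1.96):
    # 95% Wilson score interval for a binomial proportion.
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# e.g. 58 of 100 sampled novels found with publicly accessible scans
low, high = wilson_interval(58, 100)
print(f"point estimate 58%, 95% CI: {low:.0%} to {high:.0%}")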
47. In Summary:
- Context about how a digitised image came to be and why it was scanned is both crucial to understand and sometimes crucial to hide.
- i.e. opening up large collections brings its own issues.
- Presentation shapes perception.
- We place too much trust in black-box algorithms, like search engines or social feed suggestions.
- So little of our history is online that there is a natural bias. The gaps are being filled in with less credible sources.
- It still might have happened even if you cannot google it, and vice versa!