Introduction to British Library digital resources for social scientists
1. Welcome and introduction to
British Library digital resources
for social scientists
John Kaye – Lead Curator Digital Social Science
Peter Webster - Web Archiving Engagement and Liaison
Manager
7th December 2012
www.slideshare.net/johnkayebl
2. What kind of library are we?
“We exist for everyone who wants to
do research – for academic, personal
or commercial purposes”
Our collections cover all known
subject areas: sciences, technology,
medicine, arts & humanities, social
sciences…
We have a copy of every item
published in the UK
Our collections cover all formats:
sound, images, video, newspapers,
maps, manuscripts, databases,
books and journals, much more…
7. Social Science online resources for researchers
ESRC online resource
Management and Business Studies Portal
Social Welfare Portal
www.bl.uk/oralhistory
Social Science blog
12. Oral history at a glance
www.bl.uk/oralhistory
370 collections from 1 tape to 5,500 (Millennium Memory Bank)
100-150 hours of new digital fieldwork recordings per month
2200 catalogue records added or updated per year
4000 public enquiries per year
40 talks and lectures per year
60 training sessions per year with OHS (500+ people)
13. Guides and support
Reference services: reading room, telephone, email
Help for Researchers web pages
Collection guides, eg for government publications:
http://www.bl.uk/reshelp/findhelprestype/offpubs/guides/govtgu
Topical bibliographies, eg Globalisation and
employment, Gang culture and knife crime, Corporate
Social Responsibility, Far Right in Britain …
Welfare Reform on the Web
15. Doctoral Open Days 2013
11 February – Social Sciences
18 February – Media, Cultural Studies and Journalism
http://www.bl.uk/whatson/events/docopendays/index.html
16. Web archives and digital method
Dr Peter Webster
Web Archiving Engagement and Liaison Officer
@UKWebArchive / @pj_webster
Peter.Webster@bl.uk
http://www.webarchive.org.uk
December 7th 2012
17. The lost web: people
[votedavidcameron.com (archived 24/5/05)]
18. The lost web: people
[robincook.org.uk (archived 8/8/05)]
19. The lost web: organisations
[tvpa.police.uk (archived 21/11/12)]
20. The lost web: organisations
[woolworthsgroupplc.com (archived 12/12/08)]
21. Our mission:
Collect, preserve, and
make accessible
web sites of
cultural and scholarly
importance
from the UK domain
22. UK Web Archive http://www.webarchive.org.uk
Selective Web Archive
over 11,000 websites collected since
2004
over 50,000 instances
Over 16TB of compressed data
British Library, National Library
of Wales, JISC
Also National Library of Scotland,
the National Archives, Wellcome
Library
Many collaborators
eg Women’s Library, Live Arts
Development Agency, Quakers in
Britain
23. A typical event-based special collection
25. A comprehensive special collection
26. Web archiving: the basics
What
Selecting, capturing, storing, preserving and managing access to snapshots of websites over time
How
Use crawler software to download websites automatically
Selective or domain archiving
Provide access in a Web Archive
When
Since mid 1990s
Who
Heritage and memory organisations, eg BL, The National Archives
University libraries
Not-for-profit and commercial organisations, eg Internet Archive
Individual researchers
Why
Global information resource
Artefact of cultural and technology change
Representative sample of the web: historical and sociological data that may not be found
elsewhere
Part of national digital heritage - legal requirements
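As a rough illustration of the "How" above, a crawler starts from a seed page, stores a snapshot of it, and follows the links it finds to further pages on the same site. The sketch below is a toy under stated assumptions, not the software used by any archive named here: `SITE`, `fetch` and `crawl` are hypothetical names, and an in-memory dictionary stands in for real HTTP requests so the example runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "website" standing in for real HTTP responses.
SITE = {
    "http://example.org.uk/": '<a href="/about">About</a> <a href="/news/">News</a>',
    "http://example.org.uk/about": '<a href="/">Home</a>',
    "http://example.org.uk/news/": (
        '<a href="item1">Item</a> <a href="http://other.example.com/">Other</a>'
    ),
    "http://example.org.uk/news/item1": "No links here.",
}


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Assumed stand-in for an HTTP GET; returns None for unknown URLs."""
    return SITE.get(url)


def crawl(seed, max_pages=100):
    """Breadth-first crawl of one host, returning {url: page body}."""
    host = urlparse(seed).netloc
    snapshots = {}
    queue = deque([seed])
    while queue and len(snapshots) < max_pages:
        url = queue.popleft()
        if url in snapshots:
            continue
        body = fetch(url)
        if body is None:
            continue
        snapshots[url] = body
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            target = urljoin(url, href)
            # Stay on the seed's host; skip pages already captured.
            if urlparse(target).netloc == host and target not in snapshots:
                queue.append(target)
    return snapshots
```

Calling `crawl("http://example.org.uk/")` captures the four pages of the toy site and ignores the link to the other host. A real crawler would also need politeness delays, robots.txt handling and a timestamp on every capture, all of which are left out here.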
27. Selective versus domain archiving
Two complementary approaches: selective and domain archiving
[Diagram axes: width and depth]
Domain harvesting:
- Typically once/twice a year
- Domain wide snapshot
- Supported by national legislative framework
-- automated & cost-effective
Selective archiving:
- More frequent gathers; manual QA
- Guided by collection policy
- Can be based on events or themes e.g. credit crunch
-- manual & expensive
28. Non-print Legal Deposit 2013: what will we collect?
A deposit library is entitled to copy UK publications from the
open web.
A deposit library is entitled to collect other password-protected
material by harvesting, subject to giving at least 1 month’s
written notice for the publisher to provide a password or
access credentials.
29. What will we be collecting?
Includes resources:
• that are issued from a .uk or other UK geographic top-level
domain, or
• where part of the publishing process takes place in the UK;
• but excluding any which are only accessible to audiences
outside the UK.
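The three criteria above can be read as a simple scoping test applied to each candidate resource. The sketch below is only one possible reading, not the deposit libraries' actual procedure: the function name `in_scope`, its parameters and the `UK_TLDS` tuple are hypothetical, and the slide names only `.uk`, so that is the only top-level domain included by default.

```python
from urllib.parse import urlparse

# Assumption: which top-level domains count as "UK geographic" is a policy
# decision. Only ".uk" is named in the slide, so it is the only default.
UK_TLDS = ("uk",)


def in_scope(url, published_in_uk=False, only_accessible_outside_uk=False,
             uk_tlds=UK_TLDS):
    """Return True if a resource appears to fall within the collecting scope.

    published_in_uk and only_accessible_outside_uk cannot be read from the
    URL alone; here they are assumed to be supplied by a curator or by some
    other process.
    """
    # The exclusion overrides both inclusion criteria.
    if only_accessible_outside_uk:
        return False
    host = (urlparse(url).hostname or "").lower().rstrip(".")
    tld = host.rsplit(".", 1)[-1]
    return tld in uk_tlds or published_in_uk
```

For example, `in_scope("http://www.bl.uk/oralhistory")` is true on the domain test alone, while a `.com` address is in scope only when `published_in_uk=True` is supplied.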
30. What will we NOT be collecting?
Film and recorded sound where the audio-visual content
predominates
Private intranets and emails
Personal data in social networking sites or that are only
available to restricted groups.
31. What will users be able to do with it?
Users may:
• access deposited material while on “library premises
controlled by a deposit library”.
• print one copy of a restricted amount of any deposited
material, for non-commercial research or other defined ‘fair
dealing’ purposes such as court proceedings, statutory
enquiry, criticism and review or journalism.
32. What will users NOT be able to do with it?
Users may NOT:
• use an item simultaneously with another user;
• make any digital copies, except by specific and explicit
licence of the publisher.
33. A web archiving strategy based on prioritisation
[Diagram: Domain Crawl, with Event, Event, Event]
Domain harvesting:
• Broad sweep of .uk domain
• Survey and discovery
• Implement Legal Deposit
Events:
• Political, cultural, social and economic events of national interest, eg Olympics 2012
Special Collection:
• Focused, thematic collections
• Support priority subjects
34. JISC UK Web Domain Dataset (1996-2010)
Funded by JISC to create a research collection of UK
websites
Collaboration between the Internet Archive, JISC and the
British Library
Copy of subset of the Internet Archive’s web collection that
relates to the UK
470,466 files, mostly arc.gz, with 4,494 warc.gz.
Total size: 32TB
No local access – possible through the Internet Archive
Can be used to generate secondary datasets and make
these available
Analytical access the main route
Welcome to the British Library. My name is John Kaye, and I am Lead Curator for Digital Social Sciences. I’m going to take this opportunity to outline very briefly some of the digital resources that our team and others have produced that could be useful to social scientists in your field. I will then hand over to Peter Webster, Web Archiving Engagement and Liaison Manager, who will go into more detail about one of our best digital assets, the UK Web Archive. So that you can easily find these resources, I have uploaded my slides to SlideShare and have also placed some leaflets
The British Library is the national library of the UK and a legal deposit library (we receive a copy of all material published in the UK). It is one of the five largest research reference libraries in the world. It has two main functions: to provide access to these collections to whoever needs to use them, and to preserve them for future generations (including material in printed and electronic formats). Who uses the BL? Our audiences are diverse and include students and academics from around the world. Currently over 60% of the users of our collections and services are from UK higher education, which includes university libraries, academic staff, postgraduate students, etc. The general public also come for exhibitions and events, and to visit the building. We also run a large engagement programme with schools, which includes online resources and workshops. The collections include a wide range of formats, not just books and journals but also sound, maps, photographs, illustrations and electronic databases. For example, our Sound Archive is a national resource of audio history, recordings and music, ranging from a large collection of wildlife sounds to regional dialects.
The American Vogue archive is now available digitally in our British Library Reading Rooms, including the Business & IP Centre. It features every issue of American Vogue from 1892 to the present day, spanning over 400,000 pages. You can find inspiration from style icons past and present, from Suzy Parker and Jean Shrimpton to Kate Moss. Explore the history of fashion brands such as Chanel, Elizabeth Arden and Revlon over 120 years. The archive allows you to search across the issues by designer, contributor, type of garment or even fabric. Individual covers, advertisements, photo shoots and fold-outs have been pulled out as separate reports for you to search. Jack the Ripper, Illustrated Police News, October 27 1888: digitised as part of the digitisation of 19th century newspapers, which is available to academic users via institutional login.
Broadcast News: this service provides access to daily television and radio news and current affairs programmes from seventeen channels (fifteen TV, two radio) broadcast in the UK since May 2010, recorded off-air by the British Library. The programmes will be almost instantly available, with new programmes reaching our Reading Rooms within hours of broadcast. We currently record forty-six hours per day, including television services of the BBC, ITV, Channel 4, Sky News, Al-Jazeera English, NHK World, CNN, France 24, Bloomberg, Russia Today and China's CCTV News, plus key news and current affairs programmes from BBC Radio 4 and the BBC World Service. Many of the television programmes come with subtitles, which we have made word-searchable, greatly enhancing Broadcast News as a research resource.
Working with photographs and other visual sources at the British Library is complex but offers great opportunities. It is important to note that the examples provided above can be applied to other sorts of visual material (maps, illustrations, etc). Working with visual materials will continue to evolve as the Library and researchers ‘go digital’, creating new opportunities and challenges.
For some types of material, there are services that can provide digital copies remotely. UK theses can now be searched using EThOS, with some 59 institutions offering some or all of their theses. BL Sounds provides digital copies from our sound recordings holdings (though by no means all of them). Digitised collections include ethnographic recordings, dialect, oral histories, wildlife sounds and folk songs. The UK Web Archive, mentioned earlier, is also freely accessible from any computer.
A major part of the work we do revolves around finding ways to improve access to our collections and support researchers (both inside academia and beyond – for example in the third sector). We have a new resource on the ESRC website which introduces PhD researchers to our collections. We have two portals which enable access to numerous articles and reports online – both of which are absolutely free. Our Sports and Society website has explored the Olympics and Paralympics through the lens of social science and includes numerous original pieces by academics and other researchers. Our new social science blog is a place to find out more about interesting and unusual collections and the work that we do with our different audiences.
We’ve recently produced our new ESRC guide to using the British Library, written specifically with researchers in the social sciences in mind. This is found on the ESRC website: just google ESRC British Library. It provides all the information you need to get started, with a rich collection of case studies from researchers who have used the Library collections, providing inspiration and practical advice. Case studies cover, amongst other topics: market research; government publications and United Nations documentation; political pamphlets; and cookery and fashion publications.
Other services have been developed with a particular audience in mind. The Management and Business Studies portal is a free service for practitioners and researchers. It brings together information on the Library’s content, with access to a curated set of research papers, policy documents, briefings and other material. It includes articles on key management thinkers, produced in association with the Chartered Management Institute. You need to register to use some of the features and content on the portal; once registered you can get regular updates on research published and content added.
The MBS portal has proved to be a big success, so we have followed it up with Social Welfare at the British Library, which is our latest service. Like the MBS portal, it’s aimed at practitioners as well as researchers on social policy and social welfare. In addition to access to research and policy documents, you can also find the Welfare Reform Digest, which abstracts news articles, government publications, research reports etc on social policy around the world.
We have recently launched our social sciences team blog, focusing on research methods and resources. It has posts from our curators about projects we are working on, but we are also keen to hear what members of the research community are doing, so we gladly accept guest posts and contributions. If any of you would like a place to talk about your work, then please get in touch.
I'm now going to talk in more depth about some of the formats and materials we collect which may be of interest to you. I’m going to try to say something about the materials which aren’t books or journals, i.e. the things you would be less likely to find in your local or university library. One such collection area for the British Library is oral histories. We collect and commission oral history recordings on subjects of national interest. Many of these are funded by a charitable trust called National Life Stories, which has its home at the BL. Examples include:
The Irish Women Travellers (catalogue no: C1106) is a collection of life story interviews with women from the Irish Traveller community. The recordings are part of an oral history research project undertaken by Sue Beck for an MSc in Public Health and Health Promotion at South Bank University, which explores the health of these women across generations and across the life span.
The HIV/AIDS Testimonies (catalogue no: C743) is a collection of life story interviews with people with HIV and AIDS. This project has been recorded in two stages. Interviewees from the original set of interviews, recorded between 1995 and 2000, were re-approached and interviewed again between 2005 and 2008. This project, led by Dr Wendy Rickard, was conducted in conjunction with the University of East London and then London South Bank University.
The Socialist Workers' Party Collection (catalogue no: C797) includes recordings made between 1992 and 2000 at the annual 'Marxism' event held in July each year. Speakers include Arthur Scargill, Tony Benn, Terry Eagleton, Tony Cliff, Chris Mullin, John Pilger, Patricia Hewitt, Michael Bogdanov, Christopher Hill, and George Galloway.
On site we also hold two major exhibitions every year. Our new exhibition, Mughal India: Art, Culture and Empire, has just launched, and our On the Road exhibition has been on since October and is here until Christmas. Our next exhibition will be on Propaganda (May 2013). Myths and Realities events are public evening events run by the social sciences team, and we have three coming up in the spring on family, work and addiction.
Doctoral Open Days are designed for new postgraduate students (eg in their first year) and are a more detailed introduction to the Library, with curator-led workshops and talks relating to specific parts of the collections. They are free to attend, but booking is essential as they fill up quickly. You can book in the What’s On section of our website. Lastly, if you are not able to attend these days but would like to know more, our reference teams and curatorial staff are very happy to discuss your research with you in more detail.
Search by URL, title, full-text Browse by
Broadly, there are two main, complementary approaches to web archiving. Selective archiving is in general driven by an institution’s collection policy, which focuses on a small, selected portion of the national domain. The gathers tend to be more in-depth and drill into the structure of a website beyond the top-level pages. It is a labour-intensive archiving procedure involving detailed manual QA. Sites are gathered more frequently as well. In our case, we also ask permission from the site owners: web resources deemed appropriate for inclusion are selected, and the copyright holders of the sites are then contacted and asked to grant us a licence to archive and preserve the sites over time and make them available for public access.