This document discusses applying principles of trust and authenticity to digital archives at the LSE Library. It summarizes how the library has taken in digital archives from the Royal Economic Society and Press for Change. The library's processes for ingesting, virus checking, imaging, profiling, and creating metadata for digital objects are described. Challenges around digital preservation, infrastructure, and skills are discussed. The importance of collaboration, communication, and prioritizing achievable practices over perfection is emphasized for building trust in digital archives.
Applying Traditional Principles of Authenticity and Trust to Digital Archives at LSE
1. Trust: applying traditional
principles of authenticity and trust
to digital archives at LSE Library
Sue Donnelly, Archivist
Ellie Robinson, Digital Archivist
Ed Fay, Digital Library Manager
5. Royal Economic Society
• Founded 1889
• Began depositing at LSE in 1979
• Moved to digital submission of journal
articles in c.2007
6. Press for Change
• Founded 1992
• Worked with government on legislation:
o Gender Recognition Act 2004
o Equality Act 2010
• Deposited archive in 2012
o arrived on a 4 GB memory stick
7. A Day in the Life of a USB Stick
What we do and how we stay authentic
16. Metadata
• Technical, preservation, descriptive
• Some automated, some not
• Multi-purpose – to support preservation in
the long term but also to track ownership,
rights.
22. Making the case: articulating value
• Benefits and risks
o Strategic alignment
• Evidence base
o We understand our problems
o We can propose achievable solutions
• Context and terminology
o Key messages, but for whom?
• Importance of internal stakeholders
23. Making the case: articulating value
• Terminology
o Persistent access
o Long-term availability
o Digital continuity/stewardship
o Indefinite retention
o Protection of investment
o Legal compliance
o Competition, reputation, embarrassment
24. Making the case: risk register
• Activity overlooked or under-resourced
• Inadequate staff skills
• Media degradation or obsolescence
• Failure of authenticity, integrity, provenance
• Loss of essential characteristics
• Loss of trust or reputation
• Insufficient backups
• Cannot implement preservation plans
• Infrastructure cannot support requirements
25. The Iceberg Model of
Digital Libraries
interfaces
collections/objects
workflows
systems
storage
digital preservation
26. Roles and responsibilities
• Innovation vs service development
o Core skills and focus
o Embedding operational capacity
• Communication
o Bi-lateral (archivists/techies)
o Confident in requirements
o Long process of engagement
o Interesting IT challenges
27. Roles and responsibilities
Senior Management
• Strategy
• Resources
Academic Services
• Collection development
• Information skills training
Digital Library Team
• Policy
• Skills / expertise
• Innovation / projects
Collection Services
• Preservation
• Description
• Infrastructure
Archive Services
• Collection development
• Description
• Preservation
28. Trust and collaboration
• Comparator analysis (vs conformance)
• ‘Prioritising’ OAIS/TRAC
o Know what is most important for you
o Move in the right direction
o ‘Better’ rather than ‘best’ practice
• Shared infrastructure or services (?)
30. Trustable Digital Repository
• Sufficient investment
o Necessary skills/time/infrastructure
o Key drivers: provenance, authenticity
o Plan to scale, don’t plan to do it all now
• ‘Better’ rather than ‘best’ practice
o Continuous improvement
o Aiming towards maturity of practice
o Not trying to get there in one go
o This will take years...
31. SPRUCE
a project to inspire, guide, support and enable UK HEIs to
address preservation gaps; and to use the knowledge
gathered from that support work to articulate a
compelling business case for digital preservation
• Events: digital preservation solutions
• Embedding: grants to continue work
• Business case: benefits, skills gaps
http://dpconline.org/advocacy/spruce
32. Conclusions
• Trust isn’t a new issue
but the lack of standards is
• Need to learn by practice
getting hands on with the materials
• Keep talking
develop engagement and ways of
communicating requirements
33. “I [trust] LSE [Digital] Library”
Trust is slow to earn...
...and quick to burn
34. Useful links
Out of the Box (LSE Archives blog)
http://lib-1.lse.ac.uk/archivesblog/?tag=digital-archives
Sustainable Preservation Using Community Engagement
http://dpconline.org/advocacy/spruce
You've Got to Walk Before You Can Run: First Steps for Managing Born-
Digital Content Received on Physical Media (OCLC)
http://www.oclc.org/research/publications/library/2012/2012-06r.html
Editor's notes
Show clip from Hal Hartley’s Trust. Trust is fundamental to the work of archives: people have been throwing their archives at us and trusting us to catch them for many years, or abandoning them entirely, not knowing that they would be ‘adopted’ as useful and important by their local archive service. We’ve been catching paper, parchment, photographs, even video and audio, for a while now, and mostly we do catch them. Occasionally we drop things, but not too often. Today people are throwing CDs, memory sticks, hard drives and their cloud storage at us, and suddenly the trust doesn’t seem quite so certain.
In the past, trust in archives and archivists has been based on concrete and tangible ways of working. Strong rooms, with big doors, locks, fire systems, shelving and the magic code BS (or now PD) 5454. Strong boxes, acid free folders and bleached tape. Reading rooms with lists of rules and regulations, registered readers and a supervisor at the desk. The terms and words are something people understand, and archives staff tend to have standard and comprehensible ways of explaining this work to potential depositors and users. We also have something that can be easily seen and demonstrated. To be honest, I have often been amazed by how often our trustworthiness is taken at face value. Only a minority of depositors even visit our archive before making their decision on deposit; they accept that if we say we have locks on the door and a fire detection system then that must be true. Users will ask questions about the provenance of an archive but seem to show little concern about how we might ensure the ongoing authenticity and veracity of our holdings.
In a digital world much of this has changed. We have servers sitting somewhere, not even necessarily in our own buildings. We have the OAIS model, which looks pretty but often seems very complicated. We have online access for faceless users. As archivists we need to navigate this new world for ourselves and, just as crucially, we need to be able to explain this world to those who create archives and those who want to use them. The experience of LSE is that, in the main, both depositors and users are asking a lot more questions in this online world, and we need to answer them. One of our key strategic learning points has been that we need a strong narrative about what we want to do with digital archives, why we want to do it and how we will do it.
Differences in experience with different researchers can be seen in our work with a couple of recent depositors. Begin with one of our long-standing relationships, with the Royal Economic Society. We have held the RES archives since 1979, taking in regular deposits since then, including minutes, correspondence, and papers relating to the management of the Economic Journal (EJ). It was the latter which began our discussions with the RES about digital archiving since, in common with many academic journals, the EJ’s submission and management of articles is all digital. We began talking to the RES about these files in 2008 but were unable to make much progress for a long time. There were a number of changes: the archives were no longer directly managed by the RES but by a publisher, which added a level of complexity (but may be a common experience), and there was a lack of shared vocabulary. The archive staff were just getting to grips with the digital world, and the RES staff were also just getting to grips with how the move to the digital world was going to change their working practices. Discussions continued over a few years: we looked at some sample data and talked, looked at more data and talked again. In 2012 we held a more formal meeting with the Chair, the administrative secretary and the head of publications (who has a keen interest in the history of economics). By this time we had a little more experience and the LSE Digital Library was almost launched. We were able to show the group our anticipated model of storage and access, and to talk directly about the limits of experience and technology. At the end of the meeting the RES were persuaded that they should be involved with our development work, and we are hoping to move forward with them in the future. There is a high level of concern about confidentiality and access: they would want users to continue to be governed by current arrangements for registration of users, and it is likely to be some time before they are happy with online, remote access.
Press for Change belongs to a different model. It was founded in 1992 and has been the key lobbying and legal support group for transgender people. Its main presence is as an online group through its website, and its active members and supporters have always used digital communications. We were approached in 2012. The organisation had very different concerns: they had struggled to find a place of deposit because of the subject matter and were therefore happy to deposit their archives in a repository which felt comfortable with this kind of material. The depositors knew more about their digital records and were able to negotiate the deposit agreement and the transfer of data. They were also relatively open about access to the materials they were sending, including emails. This was an attitude informed by the organisation’s commitment to publicity and information. The material has now been transferred and Ellie will be saying more about the process of accepting the archives.
Initial correspondence with depositor. Record same sort of information in the same manner as paper. We have taken custody of the files.
We have intellectual control over the USB stick in that we have assigned reference numbers to it and created an official accession record. Now we need to find out what’s on there and take control over the actual content.
This is an example of our write-blocker for USB devices. We also have one for internal disk drives, e.g. if we take on a whole computer. For our floppy disks we just ensure that the ‘write protect’ tab is enabled, and it’s very difficult to write onto a CD or DVD accidentally! We plug the USB stick into one end of the write-blocker, and the other end is plugged into the PC. As far as the PC is concerned, it’s the same as plugging the USB stick straight in. The write-blocker prevents the host PC system from back-writing onto the USB stick, so you can’t accidentally delete or move files, or install updates. So the original media is not altered in any way when we look at it.
We can customise the virus checker provided by the School to suit our needs. Virus checking helps to preserve the security of our collections, and it also assures our end users that we are not passing on any viruses to them when they access our material.
A forensic imager can create an exact bit-level copy of files. This protects the original media, as we don’t have to reload it every time we want to look at something. It also copies the original metadata, directory structure etc., so this is always documented, which supports authenticity. It can also be used to recover deleted and temporary files, a capability that needs to be used with caution.
We can make it more user-friendly by imaging just the files, rather than things like unallocated space. We can also use FTK to preview files, which aids appraisal, to export files from the image to secure storage (whilst maintaining metadata), and to generate MD5 and SHA-1 checksums. Using checksums helps to maintain authenticity, because if an object has changed for whatever reason, its checksum will also change. Therefore we can periodically calculate and compare checksums and be assured that the file is still the same. Using the two different algorithms of MD5 and SHA-1 adds an extra layer of protection.
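The periodic checksum comparison described in this note can be sketched in a few lines of Python. This is an illustration only, not the FTK workflow used at LSE: the function names `compute_checksums` and `is_unchanged` are hypothetical, and the sketch assumes the recorded digests were stored somewhere safe at ingest.

```python
import hashlib


def compute_checksums(path, chunk_size=1 << 20):
    """Return the MD5 and SHA-1 hex digests of one file, read in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}


def is_unchanged(path, recorded):
    """True only if BOTH current digests match the digests recorded earlier."""
    return compute_checksums(path) == recorded
```

At ingest one would call `compute_checksums` and store the result with the accession record; a later fixity check calls `is_unchanged` with that stored result. Requiring both digests to match mirrors the "extra layer of protection" mentioned above.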
Gain understanding of what is actually in our possession in terms of file formats, versions etc. DROID maintains directory structure and provides a manifest for what we have, which we can use as a receipt of deposit. Will inform our preservation planning.
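To make the idea of a manifest concrete, here is a minimal Python sketch that walks a directory and lists what is there. It is not DROID and does not reproduce its output: DROID identifies formats by signature, whereas this sketch only records the file extension as a rough stand-in. The function names and column names are assumptions chosen for illustration.

```python
import csv
from pathlib import Path

FIELDS = ["relative_path", "size_bytes", "extension"]


def build_manifest(root):
    """List every file under root with its relative path, size and extension."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rows.append({
                # Relative paths preserve the original directory structure.
                "relative_path": path.relative_to(root).as_posix(),
                "size_bytes": path.stat().st_size,
                # Extension only: a crude hint, not a format identification.
                "extension": path.suffix.lower(),
            })
    return rows


def write_manifest(rows, destination):
    """Write the manifest rows to a CSV file with a header line."""
    with open(destination, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

A CSV written this way could serve the same purpose as the receipt of deposit mentioned above: a dated list of exactly what was received and where it sat in the directory tree.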
We have many objects in obsolete formats. We need to preserve these in a format that is open and common so that we can continue to perform preservation actions in future and also provide access in the long term. We will also keep the original file format for authenticity and audit trail purposes. We use Xena, developed by the National Archives of Australia, to conduct normalisation.
Talk about the repository perspective, which encompasses all digital collections preserved by the Library, and involves colleagues from most other teams within the Library.
Digital preservation is often the organisational necessity that people are unwilling to address. It is crucial to have strategies to articulate value and demonstrate benefit, to convince people who aren’t otherwise persuaded of the necessity. And, once you’ve got people to recognise the problem, you have to manage expectations, your own as well as theirs, about what can be achieved and the effort and amount of time it will take.
This is how we’ve gone about eating the Trustworthy Digital Elephant at LSE. We proposed to develop our capabilities incrementally, and this is a very high-level summary of the functional areas we chose to address. In OAIS terms, you can think of this as a prioritisation or set of aspirations. We made the very clear decision to implement in phases, starting with preservation, moving on to ingest/management, and finally access. We didn’t actually stick to our initial plan, but I’ll explain why.
This is more or less what LSE Digital Library looks like today. The workflow that Ellie has been talking about is represented here, with cataloguing taking place in existing collection management tools, such as Calm, augmented by the digital preservation tools that we feel are necessary. This is in some respects the most complex and currently fluid area of our development. It highlights a lot of questions which have impact further down the line. The next important part is our preservation repository, which is based on Fedora and the application framework Hydra. At the core, this is about bitstream preservation: replicating enough copies on enough different technologies in enough locations to assure ourselves first of the recoverability of anything we put in there, and second that it is the same thing that we put in there in the first place. It is worth saying that the preservation repository makes no assumptions about any longer period of time, over which we might see further format problems. Then again, we might not. By normalising up front, where necessary, we assure ourselves of the current renderability of any file. That, coupled with high confidence that the file is safe and retrievable, will do us for now. So it’s fair to say that our long-term approach is to hedge for as long as possible. The final piece is our access repository.
Or repositories. We learnt pretty early on that the requirements of preservation, including security and so forth, are not entirely opposed to those of access, especially when we are talking about closed archival material. Business continuity requirements are opposed: being able to recover data without caring how long it takes, versus service availability. This can also lead to communication challenges with IT service providers. Architectural considerations: modularity (each functional area is entirely independent of the others, meaning we can replace as the need arises); extensibility (we can adapt to changing collections and requirements); flexibility (we can change our approach).
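The two assurances of bitstream preservation named in these notes (that an object is recoverable from at least one copy, and that each copy is the same thing that was put in) can be illustrated with a small Python sketch. This is not how the Fedora-based repository at LSE does it; the function names, the status labels and the choice of SHA-1 as the recorded digest are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_replicas(replica_paths, recorded_sha1):
    """Classify each replica as 'ok', 'changed' or 'missing'."""
    report = {}
    for replica in replica_paths:
        replica = Path(replica)
        if not replica.is_file():
            report[str(replica)] = "missing"
        elif sha1_of(replica) == recorded_sha1:
            report[str(replica)] = "ok"
        else:
            report[str(replica)] = "changed"
    return report


def is_recoverable(report):
    """An object is recoverable if at least one replica still matches."""
    return any(status == "ok" for status in report.values())
```

In this sketch a "changed" or "missing" replica would be repaired by copying from any replica reported as "ok", which is why several copies in different locations matter.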
How did we make the case within the Library? We had to be able to talk about isolated benefits and risks to items or collections in terms that aligned with Library and University strategy, in terms that the people we would be talking to would care about. These tend to be student experience or Library usage, so we had to show how digital preservation was core to the mission, firstly of the Library and also to the role that the Library plays within the wider institution. We had to be able to talk meaningfully about our collections, which means understanding what we have and understanding what the potential risks are: not just vague warnings about digital black holes, but specific examples from our collections and how that would impact our real-world users. It also means proposing realistic pieces of development. We couldn’t propose an approach like a national institution, due to the vast difference in scale, but we also needed to be sensitive to our local context: the politics, who would be involved, their prejudices, how the IT works in terms of organisational preference for commercial or open source solutions, and obstacles we might encounter along the way. It’s about understanding who you are talking to in this context and the messages that will be meaningful to them. A director isn’t going to care about degrading CDs or old file formats. But they are going to care about depositors losing their data and deciding to terminate long-standing relationships, headlines in the press about unverifiable research, or funders withdrawing grants due to an inability to meet expectations. You have to be able to convince everyone internally, firstly of the needs, but also that you know what you’re talking about, and that you can deliver working solutions to well-understood problems.
Used: DRAMBORA. A set of 13 high-level risks, not a detailed functional analysis. It shows how technical risks lead to user impact and ultimately affect the trust placed in you as a memory institution.
Articulating a convincing case can be hard, and to explain why I’ll take a slight tangent into my “iceberg model of digital libraries”. This makes the basic point that most people are concerned with what they can see: a user interface they can kick the tires of and do useful stuff with, such as discover and access content. In physical terms this separation of access and preservation is not so dramatic. People can see items on shelves, know that professionals look after them and know that they are there when they want them. There are further signifiers of trustworthiness all around them: security barriers, access control, environmental control. But in the digital world, the real extent of requirements for trustworthy digital preservation can be opaque. People either think it’s trivial (it’s not) or that it is so hideously complex as to be impossible (it’s not that either). So the irony is that you are often talking about the benefits of what is immediately apparent to a non-specialist audience, but making the case for the opaque world which sits below.
Some of the practicalities of how we have built our skills and organisational structure. Core team: dedicated, permanent staff, including digital library manager, developer and assistant. Working team: core team plus collection specialists, including digital archivist, metadata, preservation and sysadmin. Service capacity: embedded in existing library teams, including archives, collection development, collection preservation and bibliographic services. Key to the implementation has been communication, with learning in both directions.
Moving towards this model. This is what works _for us_; it is not a blueprint for everyone.
External working relationships. Community best practice investigation: 4 site visits, and 7 additional organisations for desk research and interviews. Discover how other people do this. There are only a handful of TDRs, but plenty of people who are working to solve their problems. Learn from them. See implementation not as an attempt to cast a perfect solution from nowhere, but as a process of continually improving practice that addresses _your_ most important requirements and deprioritises others until they need to be addressed.
TDR is now the elephant. It is tempting to think: who wouldn’t want a TDR? But the reality is that starting with ISO whatever and working your way through is unlikely to be the best approach. It could be a useful tool to help along the way, but pragmatic solutions, and an understanding that this will take time and in all likelihood annoy at least some people within your organisation, are a far better way to make tangible progress. It is probably the worst-kept secret in the preservation community that no-one has actually read OAIS all the way through. But there is a lot in there which can be useful ammunition and a way of consolidating organisational understanding and direction.
Trust: we don’t yet have a fully developed framework for digital archives. Standards are developing and evolving, and we need to be flexible in our approaches. Best practice: in this environment we are still not able to fully articulate what might be best practice. What we have is the ‘best practice right now’, and we need to be developing ‘better practice’. The only way to learn in this environment is to get stuck in and learn by doing. We need to get hands on with files and programmes, and see what works for our own archive in terms of risks, advocacy, processes or storage. Following these three things will help develop a narrative about preserving digital archives which is coherent to archivists, depositors and users. It is clear that if we can’t explain this work to ourselves then we won’t be able to explain it to others. George MacDonald, the Scottish poet, said ‘to be trusted is a greater compliment than being loved’, but I don’t see why we shouldn’t aim at both: to be trusted to care for and preserve digital archives, and loved because we care enough to do it.