This course revision presents a rapid recap of all the tools covered in the KeepIt course. It reproduces selected slides from each of the presentations given during the course to illustrate three aspects of each tool encountered: what they do, what they look like, and what we did with them. The presentation was given as part of the final module of a five-module course on digital preservation tools for repository managers, presented by the JISC KeepIt project. For more on this and other presentations in this course, look for the tag 'KeepIt course' on the project blog: http://blogs.ecs.soton.ac.uk/keepit/
KeepIt Course 5: Revision
1. Digital Preservation Tools for
Repository Managers
A practical course in five parts
presented by the KeepIt project
By Chris Blakeley
Revision with Steve Hitchcock
A rapid recap of tools from the course:
what they do, what they look like, what we did with them
2. Tools Module 1
• The Data Asset Framework (DAF), Sarah
Jones, University of Glasgow, and Harry
Gibbs, University of Southampton
• The AIDA toolkit: Assessing Institutional
Digital Assets, Ed Pinsent, University of
London Computer Centre
3. … because good research needs good data
Themes addressed in DAF surveys
• Data: type / format, volume, description, creator, funder
• Creation: policy, naming, versioning, metadata & documentation
• Management: storage, backup, roles and responsibilities, planning
• Access: restrictions, rights, security, frequency, ease of retrieval, publish
• Sharing: collaborators, requirements to share, methods, concerns
• Preservation: selection / retention, repository services, obsolescence
• Gaps / needs: services, advice, support, infrastructure
www.data-audit.eu/
DAF at KeepIt Digital preservation tools for repositories, 19/01/10, Southampton
4. The methodology
http://www.data-audit.eu/DAF_Methodology.pdf
5. How would you scope:
1) the range of data being created at your institution?
2) user expectations / requirements on the repository
to help manage and preserve those data?
• What would you want to find out?
- what would your key questions be?
• How would you go about collecting information?
• How would you ensure participation?
6. Relevance to this Course
• AIDA can…
– Measure your ability to manage digital content effectively
– Show how good you are at sustaining continued access
– Be directly relevant to managing a repository (access, sharing, and usage)
– Help you find out where you are
– Help you decide what to do next
8. Exercise
• Divide into four teams
• One element from each leg, relating to one activity
• Agree on the scope of what you will assess - work on a single institution (real or imaginary)
• Assess the capacity for this activity
• Expected results:
– A score for the element in each leg and at each level (6 scores in all)
– An explanation of why you arrived at that decision
– Roles / job titles of people consulted
– An outline of evidentiary sources that might help
9. Tools Module 2
• Keeping Research Data Safe (KRDS), Costs,
Policy, and Benefits in Long-term Digital
Preservation, Neil Beagrie, Charles
Beagrie Ltd consultancy
• LIFE3: Predicting Long Term Preservation
Costs, Brian Hole, The British Library
10. What was Produced?
• A cost framework consisting of:
– An activity model in 3 parts: pre-archive, archive, support services
– Key cost variables divided into economic adjustments and service adjustments
– A resources template for the Transparent Approach to Costing (TRAC)
• 4 detailed case studies (ADS, Cambridge, KCL, Southampton)
• Data from other services
11. Benefits Framework
KRDS2 Benefits Taxonomy
• Dimension 1 (Type of Outcome): Direct | Indirect (costs avoided)
• Dimension 2 (When): Near-term Benefits | Long-term Benefits
• Dimension 3 (Who): Private | Public
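The three dimensions above lend themselves to a simple data structure. This sketch is illustrative only: the class and field names are my own, not part of KRDS2, though the dimension values mirror the taxonomy on the slide.

```python
from dataclasses import dataclass

# The three KRDS2 dimensions; the values mirror the taxonomy above.
OUTCOME = ("direct", "indirect")        # Dimension 1: type of outcome
TIMEFRAME = ("near-term", "long-term")  # Dimension 2: when
BENEFICIARY = ("private", "public")     # Dimension 3: who

@dataclass
class Benefit:
    """One benefit classified along the three KRDS2 dimensions."""
    name: str
    outcome: str
    timeframe: str
    beneficiary: str

    def __post_init__(self):
        # Reject classifications outside the taxonomy.
        assert self.outcome in OUTCOME
        assert self.timeframe in TIMEFRAME
        assert self.beneficiary in BENEFICIARY

# Example: re-use of data by other researchers, classified (plausibly,
# not canonically) as an indirect, long-term, public benefit.
b = Benefit("data re-use by others", "indirect", "long-term", "public")
```

A structure like this makes the group exercise below concrete: each identified benefit gets one position in the eight cells of the taxonomy.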
12. Group Exercise
• Agree a spokesperson and “recorder”
• Using KRDS2 Benefits Taxonomy:
– Q1 Identify which benefits can be costed
– Q2 Select 3 key benefits (include costed and uncosted)
– Q3 Identify the information you might need to measure them
• Report back at 12.10!
13. LIFE3
LIFE3: Estimating preservation costs
The LIFE3 Project:
Aim: To develop the ability to estimate preservation costs across the digital lifecycle
The project is developing:
• A series of costing models for each stage and element of the digital lifecycle
• An easy-to-use costing tool
• Support to enable easy input of data
• Integration to facilitate use of the results
[Diagram: the Content Profile, Organisational Profile and Context feed the Estimation Tool, which outputs a Predicted Lifecycle Cost]
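The shape of a lifecycle costing tool can be sketched as a sum of per-stage annual costs compounded over time. The stage names and rates below are invented for illustration; LIFE3's actual models are far more detailed.

```python
def lifecycle_cost(stage_costs, years, inflation=0.03):
    """Total cost over `years`, compounding each annual stage cost.

    stage_costs: dict mapping lifecycle stage -> annual cost (currency units).
    inflation: annual rate applied cumulatively to each later year.
    """
    total = 0.0
    for year in range(years):
        factor = (1 + inflation) ** year  # year 0 costs are unadjusted
        total += sum(stage_costs.values()) * factor
    return total

# Illustrative stages and figures only; LIFE3 defines its own stage model.
stages = {"acquisition": 1000, "ingest": 500, "storage": 200, "preservation": 300}
cost_10y = lifecycle_cost(stages, years=10)
```

The point of the sketch is the structure: costs are estimated per lifecycle stage, then aggregated across the planning horizon with economic adjustments.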
15. LIFE3
Exercise
• Excel model
• The Content Profile
• Refining the calculations
Feedback
• Do you feel that this approach is sound?
• Have we included all relevant factors?
• Is the model suitable for the kind of content your repository deals with?
• Are we making correct assumptions, and is it clear what these are?
• How could we improve it?
16. Tools Module 3
• Significant characteristics, Stephen Grace
and Gareth Knight, King’s College London
• PREMIS, Open Provenance Model
17. Preservation workflow
• Check: format identification and versioning; file validation; virus check; bit checking and checksum calculation
• Analyse: preservation planning; characterisation (significant properties and technical characteristics, provenance, format, risk factors); risk analysis
• Action: migration; emulation; storage selection
Tools: e.g. DROID, JHOVE, PRONOM (TNA), FITS, Plato (Planets), P2 risk registry (KeepIt), INFORM (U Illinois)
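The "check" stage of the workflow (format identification, bit checking, checksum calculation) can be approximated in a few lines. Real workflows use tools like DROID and JHOVE against the full PRONOM registry; the two magic numbers below are a toy table for illustration only.

```python
import hashlib

# Minimal magic-number table (illustrative; DROID/PRONOM cover thousands).
MAGIC = {
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
}

def identify_format(data: bytes) -> str:
    """Guess a format from the file's leading bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"

def checksum(data: bytes) -> str:
    """SHA-256 fixity value for bit-level integrity checking."""
    return hashlib.sha256(data).hexdigest()

sample = b"%PDF-1.4 ..."
fmt = identify_format(sample)
digest = checksum(sample)  # store alongside the object; recompute to verify
```

Storing the digest at ingest and recomputing it on each audit is the essence of the bit-checking step above.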
18. A group task on format risks
1. Choose two formats to compare (e.g. Word vs PDF, Word vs ODF, PDF vs XML, TIFF vs JPEG)
2. By working through the (surviving) list of format risks, select a winner (or a draw) between your chosen formats for each risk category (1 point for a win)
3. Total the scores to find an overall winning format
4. Suggest one reason why the winning format using this method may not be the one you would choose for your repository
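The scoring in steps 2 and 3 can be tallied mechanically. The risk categories and verdicts below are made up for illustration; the course's actual risk list was longer.

```python
def tally(verdicts):
    """Score a format comparison.

    verdicts: dict mapping risk category -> winning format name,
              or None where the category was a draw.
    Returns points per format (1 point per category won).
    """
    scores = {}
    for winner in verdicts.values():
        if winner is not None:  # draws score no points
            scores[winner] = scores.get(winner, 0) + 1
    return scores

# Hypothetical verdicts for a Word vs PDF comparison.
verdicts = {"openness": "PDF", "tool support": "Word", "obsolescence": "PDF"}
scores = tally(verdicts)  # {'PDF': 2, 'Word': 1}
```

Step 4 of the task is the caveat to this arithmetic: an unweighted point-per-category total treats every risk as equally important, which your repository's priorities may not.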
19. Determine expected behaviours
Workflow: Select object type for analysis → Identify purpose of technical properties → Determine expected behaviours → Classify behaviours into functions → Associate structure with each function → Analyse structure → Review & finalise
• What activities would a user – any type of stakeholder – perform when using an email?
• Draw upon the list of property descriptions produced in the previous step, formal standards and specifications, or other information sources.
Task 2: Identify the types of action that a user would be able to perform using the email (groups, 15 mins).
• E.g. establish the name of the person who sent the email
• E.g. confirm that the email originated from the stated source
[Sidebar lists example email behaviour/structure properties: subject, message text, line break, paragraph, underline, strikethrough, body background, body text colour, in-reply-to, references, message-id, trace-route, sender display-name, sender local-part, sender domain-part, recipient display-name, recipient local-part, recipient domain-part]
20. Exercise overview
• Analyse the content of an email
• Analyse structure of email message
• Determine the purpose that each technical property serves
• Consider how email will be used by
stakeholders
• Identify set of expected behaviours
• Classify set of behaviours into functions for
recording
24. Some revision from KeepIt Module 3
• Preservation workflow
– Recognised we have digital objects with formats and other characteristics we
need to identify and record. These can change over time, or may need to be
changed pre-emptively depending on a risk assessment, using a preservation
action. Risk is subjective.
• Significant properties
– We considered which characteristics might be significant using the function-behaviour-structure (FBS) framework, classifying the functions of formatted emails
– We recognised that assessment of behaviour, and so of significance, can vary according to the viewpoint of the stakeholder - e.g. creator, user, archivist
• Documentation
– We looked at two means to document these characteristics, and the changes
over time
1. Broad and established (PREMIS)
2. Focussed, and work-in-progress (Open Provenance Model)
• Provenance in action: transmission and recording
– Through a simple game we learned that if we don't recognise the necessary properties at the outset, and maintain a record through all stages of transmission, the information at the end of the chain will likely not be the same as the information we started with
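The lesson of the transmission game can be shown with checksums: record fixity at every stage and any undocumented change surfaces at the end of the chain. The event fields below are a loose, PREMIS-flavoured simplification, not the PREMIS schema itself.

```python
import hashlib

def fixity(data: bytes) -> str:
    """SHA-256 fixity value for the object at this point in the chain."""
    return hashlib.sha256(data).hexdigest()

def transmit(data: bytes, agent: str, events: list) -> bytes:
    """Record a provenance event: who handled the object, and its fixity."""
    events.append({"agent": agent, "fixity": fixity(data)})
    return data

events = []
obj = b"original message"
obj = transmit(obj, "creator", events)
obj = transmit(obj, "archivist", events)
# All fixity values match: the record shows the object is unchanged.
intact = all(e["fixity"] == events[0]["fixity"] for e in events)

# An undocumented change at any hop shows up as a fixity mismatch.
obj = transmit(b"altered message", "unknown hop", events)
still_intact = all(e["fixity"] == events[0]["fixity"] for e in events)
```

Without the per-stage record, the final recipient has no way to tell whether what arrived is what was sent; with it, the mismatch pinpoints where the chain broke.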
25. Tools Module 4
• EPrints preservation apps, including the storage controller, Dave Tarrant and Adam Field, University of Southampton
• Plato, preservation planning tool from the
Planets project, Andreas Rauber and
Hannes Kulovits, TU Wien
33. Preservation Planning with Plato
Plato
Assists in analysing the collection
- Profiling, analysis of sample objects via PRONOM and other services
Allows creation of an objective tree
- Within the application or via import of mind maps
Allows selection of preservation action tools
34. Preservation Planning with Plato
Plato
Runs experiments and documents results
Allows definition of transformation rules and weightings
Performs evaluation and sensitivity analysis
Provides a recommendation (ranks solutions)
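Plato's evaluation step has the general shape of weighted-sum multi-criteria analysis: score each alternative against each objective, weight the objectives, and rank. The criteria, weights and utility scores below are invented for illustration and are not Plato's actual model.

```python
def rank(alternatives, weights):
    """Rank alternatives by weighted score, best first.

    alternatives: {name: {criterion: utility in 0..1}}
    weights: {criterion: weight}, summing to 1.
    """
    def score(utilities):
        return sum(weights[c] * u for c, u in utilities.items())
    return sorted(alternatives, key=lambda a: score(alternatives[a]), reverse=True)

# Hypothetical plan for the GIF yearbook scenario below: two candidate
# migration actions scored against three made-up objectives.
weights = {"image quality": 0.5, "tool cost": 0.2, "openness": 0.3}
alternatives = {
    "GIF -> PNG":  {"image quality": 0.9, "tool cost": 0.8, "openness": 0.9},
    "GIF -> TIFF": {"image quality": 0.9, "tool cost": 0.5, "openness": 0.7},
}
ranking = rank(alternatives, weights)  # best alternative first
```

Plato goes further than this sketch: it runs real experiments on sample objects to obtain the utility values, and applies transformation rules and sensitivity analysis before recommending.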
35. Exercise Time! The Scenario
National library
Scanned yearbooks archive
GIF images
The purpose of this plan is to find a strategy for preserving this collection for the future, i.e. to choose a tool to handle our collection with.
The tool must be compatible with our existing hardware and software infrastructure, so that it can be installed within our server and network environment.
The files haven't been touched for several years and no detailed description exists. However, we have to ensure their accessibility for years to come.
Re-scanning is not an option because of costs, and some pages from the original newspapers no longer exist.
38. Tools Module 5
• TRAC, Trustworthy Repositories Audit & Certification: criteria and checklist
• DRAMBORA, Digital Repository Audit
Method Based On Risk Assessment,
Martin Donnelly, Digital Curation Centre,
University of Edinburgh
39. Trustworthy Repositories Audit & Certification (TRAC) Criteria and Checklist
• RLG/NARA assembled an International Task Force to address the
issue of repository certification
• TRAC is a set of criteria applicable to a range of digital repositories
and archives, from academic institutional preservation repositories to
large data archives and from national libraries to third-party digital
archiving services
• Provides tools for the audit, assessment, and potential certification of digital repositories
• Establishes the documentation requirements for an audit
• Delineates a process for certification
• Establishes appropriate methodologies for determining the soundness and sustainability of digital repositories
www.data-audit.eu www.repositoryaudit.eu
DRAMBORA and DAF, EDINA, 27th October 2009
41. To certify or not to certify?
That is the question
1. Take a spreadsheet with all 84 TRAC criteria.
2. Select one.
3. Decide whether you could certify your repository for this, based on where your repository is now or where you think it might be after participating in this course.
(Images by Cayusa and by fabiux)
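Scaled up from one criterion to all 84, the self-assessment reduces to a tally of where the repository stands. The criterion IDs below are placeholders, not TRAC's actual numbering.

```python
def summarise(assessment):
    """Summarise a TRAC-style self-assessment.

    assessment: {criterion_id: True (could certify now),
                 False (could not), or None (unsure)}.
    """
    yes = sum(1 for v in assessment.values() if v is True)
    no = sum(1 for v in assessment.values() if v is False)
    unsure = sum(1 for v in assessment.values() if v is None)
    return {"certifiable": yes, "not yet": no, "unsure": unsure}

# Placeholder criterion IDs; TRAC defines 84 criteria in full.
assessment = {"A1.1": True, "A1.2": None, "B2.1": False}
summary = summarise(assessment)  # {'certifiable': 1, 'not yet': 1, 'unsure': 1}
```

The "not yet" and "unsure" counts are the useful output: they point at where documentation or practice would need to improve before seeking certification.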
42. DRAMBORA Method
• Discrete phases of (self-)assessment, reflecting the realities of audit
• Preservation is fundamentally a risk management process:
– Define scope
– Document context and classifiers
– Formalise organisation
– Identify and assess risks
• Builds audit into internal repository management procedures
www.repositoryaudit.eu
KeepIt #5: University of Northampton, 30 March 2010
43. Repository Administration
44. Part I – Identify a risk (30 minutes)
Each group should identify one risk (based on your own
experiences wherever possible), and complete the
DRAMBORA worksheet.
Groups should complete:
• name and description of the risk;
• example manifestations of the risk;
• nature of the risk;
• risk owner(s);
• stakeholders who would be affected;
• if possible, relationships with other risks.
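A completed worksheet entry can be held as a record like the one below. DRAMBORA itself assesses risks by probability and potential impact; the field names and 1–5 scales here are a simplified illustration, not the toolkit's own schema.

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    """One row of a DRAMBORA-style risk worksheet (simplified)."""
    name: str
    description: str
    owner: str
    probability: int  # illustrative scale: 1 (rare) .. 5 (almost certain)
    impact: int       # illustrative scale: 1 (negligible) .. 5 (catastrophic)
    stakeholders: list = field(default_factory=list)

    @property
    def severity(self) -> int:
        """Probability x impact, a common way to rank risks for mitigation."""
        return self.probability * self.impact

r = Risk(
    name="server loss",
    description="primary storage server fails irrecoverably",
    owner="systems administrator",
    probability=2,
    impact=5,
    stakeholders=["depositors", "readers"],
)
# r.severity == 10; sorting risks by severity prioritises Part II's mitigation work
```

Ranking the group's identified risks by a score like this is one way to decide which mitigation strategies in Part II deserve effort first.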
45. Part II – Mitigate the risk (30 minutes)
Now identify what steps your archive might take to
manage and mitigate the identified risk over time…
Each group should complete:
• Risk management strategy or strategies;
• Risk management activities;
• Risk management activity owner(s).