1. Preservation metadata
Andrew Waugh
Senior Manager, Standards and Policy
Public Record Office Victoria
2. Structure of the talk
• What is preservation metadata?
• Recordkeeping metadata in theory
• NAA/ANZ recordkeeping metadata standard
• PREMIS – standard for preservation metadata
• Practical tips for reading and implementing metadata schemes
• Conclusions
3. What is preservation?
• The ability to access content for as
long as it is required
• Access means
– Being able to find the content
– Extract information from the content
– Understand the context of the content
– Be confident of the history of the content
4. Preservation metadata
• Preservation metadata is the information
necessary to maintain access to content
• Difference between short and long term
access is one of degree of metadata, not kind
• As preservation professionals, we are rarely
interested in the content itself, only in managing it.
Preservation metadata is the basic information
that we use to do our job
5. Examples of preservation
metadata
• Identifier
• Creation date
• Title
• History information
• Relationship between objects
• Data formats
6. Recordkeeping Metadata
• The archival profession has been developing
recordkeeping (=preservation) metadata for
around a decade
• This work provides a useful framework to think
about preservation and metadata
7. RK Metadata Standards
• ISO 23081 Information and documentation –
Records management processes – Metadata for
records
– Part 1: Principles
– Part 2: Conceptual and implementation issues
• National Archives of Australia (and Archives New
Zealand) - Recordkeeping Metadata Standard
Version 2.0
– http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
• Forthcoming Australian/New Zealand Standard
8. Metadata from a records view
• Records are content, context, and structure
• Record management metadata is data
describing the context, content, and structure
of records and their management through time
(ISO 15489-1:2001, 3.12)
• Recordkeeping metadata is the key to
providing access (and hence preservation)
• In practice, metadata is everything except the
actual content of the record
9. Purpose of recordkeeping
metadata
• The purpose of recordkeeping metadata includes
– Protecting records as evidence
– Ensuring their accessibility and usability through time
– Facilitating the ability to understand records
– Helping ensure the authenticity, reliability and integrity of
records
– Supporting and managing access, privacy, and rights
– Supporting the migration of records from one
(preservation) system to another
10. Metadata at record capture
• Records are captured into a system, and
metadata is created/captured with them
• This metadata documents
– Environment in which records were created
– Purpose or business activity being undertaken
– Relationship with other records or aggregations
– Physical or technical structure of the record
– Logical structure of the record
11. Metadata after record capture
• Metadata captured after record creation
documents what happened to a record over
time
– Demonstrates authenticity, reliability, usability, and
integrity
• Answers the basic questions of who, what,
when, where, why
12. Metadata after disposal
• Metadata is itself a record, and some parts
may need to be kept after the record has been
disposed of to account for its existence,
management, and disposition
13. Four entity model
• Modern Australian recordkeeping metadata
models are normally expressed in terms of
entities
– Records (the objects to be preserved: record, file,
series…)
– Agents (people who create and use the records)
– The business transacted
– Mandates (the rules governing the business)
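The four entities can be sketched as plain data classes. This is only an illustration of the model, not code from any of the standards; the class layout, the field names, and the sample values ("Example Act", "Issue permits") are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    identifier: str  # unique within the domain
    name: str


@dataclass
class Agent(Entity):
    """A person or organisation that creates or uses records."""


@dataclass
class Mandate(Entity):
    """A rule governing the business."""


@dataclass
class Business(Entity):
    """The business transacted, linked to the mandates that govern it."""
    mandate_ids: List[str] = field(default_factory=list)


@dataclass
class Record(Entity):
    """The object to be preserved, linked to its agents and business."""
    agent_ids: List[str] = field(default_factory=list)
    business_ids: List[str] = field(default_factory=list)


# Hypothetical sample data: one entity of each kind, linked by identifier.
act = Mandate("M1", "Example Act")
permits = Business("B1", "Issue permits", mandate_ids=["M1"])
clerk = Agent("A1", "Permit clerk")
permit = Record("R1", "Permit 42", agent_ids=["A1"], business_ids=["B1"])
```

Keeping the links as identifiers rather than nested objects means each entity can be stored and updated separately.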
15. One, two, three, four entity models
• The four entity model can be flattened to
facilitate implementation
– A system could only store one entity (record)
which contains metadata for agents, business,
and mandates
– Practical because most metadata is captured at
creation; subsequent changes in relationships or
information are less relevant
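A minimal sketch of flattening to a one entity model, assuming the entities are held as dictionaries keyed by identifier. The function name `flatten` and all key names are hypothetical. The flat record carries copies of the agent, business, and mandate names as they stood at the time of flattening.

```python
def flatten(record, agents, businesses, mandates):
    """Copy agent, business, and mandate names into a single flat record."""
    mandate_ids = [
        m for b in record["business"] for m in businesses[b]["mandates"]
    ]
    return {
        "identifier": record["identifier"],
        "title": record["title"],
        "agent_names": [agents[a]["name"] for a in record["agents"]],
        "business_names": [businesses[b]["name"] for b in record["business"]],
        "mandate_names": [mandates[m]["name"] for m in mandate_ids],
    }


# Hypothetical sample data.
agents = {"A1": {"name": "Permit clerk"}}
mandates = {"M1": {"name": "Example Act"}}
businesses = {"B1": {"name": "Issue permits", "mandates": ["M1"]}}
record = {"identifier": "R1", "title": "Permit 42",
          "agents": ["A1"], "business": ["B1"]}

flat = flatten(record, agents, businesses, mandates)
```

The trade-off is the one noted above: later changes to an agent or mandate are not reflected in records that were already flattened.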
17. Identity metadata
• Distinguishes entity from all other entities in
the domain
– Entity type (e.g. record, agent)
– Aggregation (e.g. file, record)
– Registration Identifier (the actual identifier)
18. Description metadata
• Describes the entity so that a user can
determine whether it is the entity sought
– Title
– Classification
– Abstract
– Place
– External Identifiers
• WARNING – description elements are
normally business specific
19. Use metadata
• Assists long-term access to the entity
– Technical environment
– Rights (who may legally use it & under what
conditions)
– Access (access control)
– Language
– Integrity
– Documentary form
20. Event plan
• Allows the entity to be managed
• Consists of management actions that are
planned to occur in the future
– Appraisal (To keep or not)
– Disposal (Implementation of appraisal decision)
– Preservation
– Access Control (Changes to)
– Rights (Changes to)
21. Event history
• Documents the trail of past events
• Who, what, when, why
– Event identifier
– Event date/time
– Event type
– Event description
– Event relation (mandate, agent)
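An event history can be sketched as an append-only log. The class and field names below are assumptions that mirror the bullet list above; they are not taken from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a past event should not be edited
class Event:
    event_id: str
    when: datetime
    event_type: str
    description: str
    agent_id: Optional[str] = None    # who
    mandate_id: Optional[str] = None  # why (the rule that authorised it)


class EventHistory:
    """Append-only trail of events for one entity."""

    def __init__(self):
        self._events = []

    def record(self, event_type, description,
               agent_id=None, mandate_id=None, when=None):
        event = Event(
            event_id=f"E{len(self._events) + 1}",
            when=when or datetime.now(timezone.utc),
            event_type=event_type,
            description=description,
            agent_id=agent_id,
            mandate_id=mandate_id,
        )
        self._events.append(event)
        return event

    def events(self):
        # Return a tuple so callers cannot alter the stored trail.
        return tuple(self._events)


history = EventHistory()
history.record("capture", "Record captured into the system", agent_id="A1")
history.record("migration", "Converted to a new format",
               agent_id="A2", mandate_id="M1")
```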
22. Relation
• Links two (or more) entities
• Implicitly bi-directional, but need not be
implemented this way
• Relationships often have a time span
– Entity Identifiers (from, to)
– Relationship type
– Relationship description
– Relationship date range
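A sketch of a relationship that is stored once (from, to) but queried in both directions, with an optional date range. All names here are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Relationship:
    from_id: str
    to_id: str
    rel_type: str
    description: str = ""
    start: Optional[date] = None  # None means open-ended
    end: Optional[date] = None

    def active_on(self, day: date) -> bool:
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def related(relationships, entity_id, day):
    """List (type, other entity) pairs active on `day`, in either direction."""
    found = []
    for rel in relationships:
        if not rel.active_on(day):
            continue
        if rel.from_id == entity_id:
            found.append((rel.rel_type, rel.to_id))
        elif rel.to_id == entity_id:
            found.append((rel.rel_type, rel.from_id))
    return found


# Hypothetical sample data: an agent controlled a record during 2005 only.
rels = [
    Relationship("A1", "R1", "controls",
                 start=date(2005, 1, 1), end=date(2005, 12, 31)),
    Relationship("R1", "R2", "is part of"),
]
```

For example, `related(rels, "R1", date(2005, 6, 1))` finds both relationships, while the same query for a date in 2006 finds only the open-ended one.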
23. NAA/ANZ metadata standard
• Same content, two standards
• NAA version
– Recordkeeping Metadata Standard Version 2.0
– http://www.naa.gov.au/Images/AGRkMS_Final%20Edit_16%2007%2008_Revised_tcm2-12630.pdf
– Based on five entities (Record, Agent, Business,
Mandate, Relationship)
– Defines 26 elements with 44 sub-elements
– Includes extensive element schemes
24. NAA/ANZ Elements
• Legend: Mandatory Element, Conditional Element, Optional Element
• All Entities: Entity Type, Category, Identifier*, Name*, Date Range, Description
• Record: Jurisdiction*, Security Class*, Security Caveat*, Rights*, Language*, Coverage*, Keyword*, Disposal*, Format, Extent*, Medium, Integrity Check, Location*, Document Form, Precedence
• Agent: Jurisdiction*, Permissions*, Contact*, Position*, Language*
• Business: Jurisdiction*, Security Class*, Permissions*
• Mandate: Jurisdiction*, Security Class*, Security Caveat*, Coverage*
• Relationship: Related Entity*, Change History*
25. Future Australian Standard
• Work is in progress on an Australian Standard
for recordkeeping metadata
• Based on the NAA/ANZ metadata standard
• Focus on relationships
26. PREMIS
• Preservation metadata is the information a
repository uses to support the digital preservation
process
• Supports the viability, renderability,
understandability, authenticity, and identity of digital
objects
• Built on OAIS reference model
• Data dictionary & supporting materials
– http://www.loc.gov/standards/premis/
27. PREMIS scope
• Not intended to define all preservation elements,
only those that most repositories are likely to need to
know in order to support digital preservation
• Excludes
– Format specific metadata (even for a class of format)
– Repository specific metadata and business rules
– Descriptive metadata
– Detailed information about media or hardware
– Information about agents, apart from minimum required for
identification
– Information about rights and permissions, except those
that directly affect preservation functions
28. PREMIS Data Model
• From Understanding PREMIS http://www.loc.gov/standards/premis/understanding-premis.pdf
29. PREMIS Entities
• Intellectual Entity – set of content that is a single
intellectual unit – has no metadata in PREMIS
• Object Entity – things actually stored in a repository
– Representation Object – collection of all file objects
necessary to represent an intellectual entity
– File Object – discrete object on a computer file system
– Bitstream Object – portion of a file
• Event Entity – contains the history of an Object
• Rights Entity – rights and permissions about object
• Agent Entity – actors involved in events or rights
30. Elements for Object Entities
• Unique Identifier
• Fixity information
• Size
• Format
• Original Name
• Creators
• Inhibitors (things designed to prevent use)
• Significant properties (aspects that must be preserved)
• Environment (infrastructure required to use)
• Storage media
• Digital signatures
• Relationship with other entities
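Fixity information and size are the easiest of these elements to generate automatically. The sketch below assumes SHA-256 as the digest and uses made-up dictionary keys; they are not the PREMIS semantic unit names, and the functions are hypothetical helpers.

```python
import hashlib


def describe_bytes(identifier, data, original_name, fmt):
    """Build a small metadata dict for one file object held in memory."""
    return {
        "identifier": identifier,
        "fixity": {
            "algorithm": "SHA-256",
            "digest": hashlib.sha256(data).hexdigest(),
        },
        "size": len(data),  # size in bytes
        "format": fmt,
        "original_name": original_name,
    }


def fixity_ok(metadata, data):
    """Recompute the digest and compare it with the stored value."""
    return hashlib.sha256(data).hexdigest() == metadata["fixity"]["digest"]


# Hypothetical sample object.
content = b"minutes of the meeting"
meta = describe_bytes("obj-1", content, "minutes.txt", "text/plain")
```

Running `fixity_ok(meta, content)` again at a later date is one way to check that the stored bytes have not changed.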
31. NAA/ANZ vs PREMIS
• NAA/ANZ
– Recordkeeping is about relationships, so includes
the context of objects, which is often necessary
to understand the object
– Documents the management plan for the
object
• PREMIS
– Deliberately focuses on preserving the files that
form a digital object – context is important, but
not documented
– Documents critical information necessary to
use objects
33. General observations
• Most metadata schemes are lengthy, but
contain relatively little information
• If you understand the typical structure, it is
easy to quickly pick out the information you
need
• Metadata schemes tend to be aspirational –
what the drafters thought you should do, often
beyond what you can do or have to do
34. Metadata schemes
• Typical metadata schemes contain
– Entities (i.e. objects modelled)
• Definition
• Lists valid elements
– Elements (i.e. specific pieces of information)
• Definition
• Mandatory, optional, conditional flag
• Repeatable or not
• Structure (child elements)
– Element schemes (i.e. controls over the values that can be
used)
• Lists of valid values (e.g. States)
• Format controls (e.g. dates)
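The typical structure above can be sketched as a set of element rules plus a checker. Everything here is hypothetical (the rule class, the element names, the list of states) and is meant only to show how obligation, repeatability, and an element scheme fit together.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ElementRule:
    name: str
    obligation: str = "optional"  # "mandatory", "conditional", or "optional"
    repeatable: bool = False
    valid_values: Optional[set] = None  # element scheme: list of valid values
    condition: Optional[Callable[[dict], bool]] = None  # when conditional applies


def validate(entity, rules):
    """Return a list of problems found when checking an entity against rules."""
    problems = []
    for rule in rules:
        values = entity.get(rule.name, [])
        if not isinstance(values, list):
            values = [values]
        required = rule.obligation == "mandatory" or (
            rule.obligation == "conditional"
            and rule.condition is not None
            and rule.condition(entity)
        )
        if required and not values:
            problems.append(f"{rule.name}: missing")
        if len(values) > 1 and not rule.repeatable:
            problems.append(f"{rule.name}: not repeatable")
        if rule.valid_values is not None:
            for value in values:
                if value not in rule.valid_values:
                    problems.append(f"{rule.name}: invalid value {value!r}")
    return problems


# Hypothetical scheme with one rule of each obligation type.
rules = [
    ElementRule("identifier", "mandatory"),
    ElementRule("state", valid_values={"VIC", "NSW"}),
    ElementRule("disposal_date", "conditional",
                condition=lambda e: e.get("disposed") is True),
]
```

An entity such as `{"identifier": "R1", "state": "VIC"}` passes, while one with no identifier or with a state outside the list is reported.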
35. Implementation
• Metadata schemes are information models, not
implementation instructions
• Adopting a scheme means that your implementation
has
– the mandatory elements
– the conditional elements (if relevant)
– (perhaps) some of the optional elements
– the correct element structure
• Metadata schemes are often associated with a
representation standard (e.g. in XML)
– Still not an implementation – often just for exchange
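To illustrate the split between internal storage and an exchange representation, the sketch below keeps metadata in an ordinary dictionary and produces XML only at the point of exchange. The element names and the `record` wrapper are invented for the example and do not follow any published schema.

```python
import xml.etree.ElementTree as ET


def to_exchange_xml(record):
    """Serialise an internal metadata dict to a simple XML string."""
    root = ET.Element("record")
    for name, value in record.items():
        values = value if isinstance(value, list) else [value]
        for item in values:  # repeatable elements become repeated tags
            ET.SubElement(root, name).text = str(item)
    return ET.tostring(root, encoding="unicode")


def from_exchange_xml(text):
    """Parse the exchange XML back into a dict of lists."""
    parsed = {}
    for child in ET.fromstring(text):
        parsed.setdefault(child.tag, []).append(child.text or "")
    return parsed


# Hypothetical internal representation.
internal = {"identifier": "R1", "keyword": ["permit", "fence"]}
exchanged = to_exchange_xml(internal)
```

The internal form could equally be database rows or another structure; only `exchanged` needs to match whatever representation the exchange partners have agreed on.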
36. Conclusions
• Preservation metadata is simply the
information that preservation professionals
use to ensure continued access to objects
• What is viewed as essential depends on your
discipline (what features is it necessary to
preserve?)
– E.g. archivists are concerned about context,
librarians less so
37. Conclusions (2)
• Typical preservation metadata
– Identity information
– Technical details and organisation of the
objects to be preserved
– Rights and access
– History of object
• Other common metadata
– Description
– Management Plans
– Relationships between objects
38. Conclusions (3)
• You only have to implement the logical model
and the mandatory elements
• Standards are usually aspirational – include
metadata that is nice to have, but not essential
• Specific representations (e.g. XML) are for
data exchange; they do not dictate how you
store the metadata internally