The Adventures of Digi:
Ideas, Requirements
and Reality
       David Pearson
      National Library of
           Australia
     Future Perfect 2012

                                       Digi
                            By Imogene Pearson (7 years)
                                   (March 2012)
1.) Some Context




From a preservation point of view, the Library’s digital collections present:

•   A mix of materials needing to be kept in perpetuity, along with materials that can be
    discarded after specified periods or events;
•   Mixed levels of complexity in terms of object structure, relationships and dependencies;
•   Mixed levels of intellectual control;
•   A wide range of file formats (and carrier formats);
•   Different levels of complexity in preservation planning and processing;
•   Different timetables for preservation action;
•   A need for different preservation approaches, often at different scales; and
•   A need for recurring – and possibly changing – preservation action cycles over time, using a
    changing suite of tools.
NLA Image
2.) A caveat




               NAA Image
Ecology
Ecology or Layers of consciousness for the need for digital preservation intervention
                         (Given some need to access content over time)
 Unaware:
• I am unaware if I have any digital content; or
• I am unaware if I may have a problem accessing any of my digital content.

Aware – no response:
• I don’t think that I have a problem accessing any of my digital content;
• I recognise that I have a problem accessing some of my digital content;
• I recognise that I have a problem accessing some of my digital content. However, the
   problem is not my problem; or
• I recognise that I have a problem, but have no response in place - not even a limited one.

Aware – taking some action:
• I accept that I may have a problem accessing some of my digital content. I am taking limited
   actions to manage this problem; or
• I accept that I may have a problem accessing some of my digital content. The preservation
   mandate is a part of my enterprise or system ecology.
Another way of looking at it might be:




                                         David Pearson 2012
3.) What we have come to understand over time




                   http://www.motifake.com/79532 via Google Images
Preservation responsibilities:

Preservation of the Library's digital collections involves three main goals:
• Maintaining access to reliable data at bit-stream level;
• Maintaining access to content encoded in the bit streams; and
• Maintaining access to the intended and available meaning of the content.

While specific preservation activities may focus on one or more of these goals, the Library’s
   preservation responsibility is only fulfilled when all three goals have been adequately
   addressed.

This responsibility applies across all digital collections, subject to curatorial and policy decisions
    for specific groups of digital objects.
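
The first goal, reliable data at bit-stream level, is commonly checked by comparing a stored checksum against a freshly computed one. The sketch below is a minimal illustration only: the function names are hypothetical, and SHA-256 is an assumed choice that the slides do not specify.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of a bit stream as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """True if the bit stream still matches the checksum recorded earlier."""
    return checksum(data) == recorded


# Record a checksum at ingest, then re-check the same bytes later.
recorded = checksum(b"digital object")
print(verify(b"digital object", recorded))   # unchanged bytes
print(verify(b"digital objecT", recorded))   # one altered byte
```

A passing check addresses only the first goal; it says nothing about whether the content or its meaning can still be accessed.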
Mission: The primary objective of preservation activities within the NLA is to maintain the
ability to meaningfully access digital collection content over time.
[Diagram (David Pearson 2012; image via Google Images): ‘Logical on Physical Stuff’ shown at points A and B along a time axis, supported by Contextual Information – About Content; Dependency Information – About Formats etc.; Systems to Ingest, Manage, Report and take Actions; and Systems to Access – Master or Derivative. A separate image is labelled ‘Stuffed?’.]
Required preservation processes

The Library must be able to:
• Understand what it holds in its collections;
• Understand what its preservation intentions are for every digital object and what it is
    entitled to do to realise its intentions;
• Understand what is required to provide access, existing inhibitors to access, and the current
    level of support the Library is able to provide;
• Evaluate and monitor the degree of risk arising from collection composition, preservation
    intentions and available level of support within the Library for digital collection content, and
    monitor for risk conditions arising during general Digital Library operations;
• Anticipate the effects of changes in support;
• Recognise planning triggers, and plan and take appropriate action on a scale appropriate to
    the size of the target; and
• Audit the effectiveness of its preservation arrangements and modify the arrangements if
    necessary.
Risk or ‘Risk-on’ (are you a splitter or a lumper?)

•   ‘parameter-based’ risks: a match against a criterion defined by Library staff to indicate a
    preservation risk – for example, video encoded with a codec considered to be problematic;
•   ‘exception’ risks: the value of a monitored parameter is outside a set of acceptable values;
•   ‘change’ risks: there has been a change in status for a monitored parameter for content – for
    example, the confidence in format identification for a particular file has changed;
•   ‘conflict’ risks: conflicting values for the parameter are reported by one or more tools – for
    example, file format identification returns conflicting values;
•   ‘unknown value’ risks: undetermined values for defined parameters – for example,
    undetermined values for file format and version;
•   ‘access support’ risks: changes in level of support which affect the Library’s ability to access
    content in accordance with preservation intent and significance – for example, reduction
    below an acceptable threshold in the availability of supporting software for a particular file
    format; and
•   ‘content-based’ risks: characteristics of content that may not be identifiable from metadata –
    for example, presence of deprecated HTML tags.
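
As a rough illustration of how some of these categories might be detected automatically, the sketch below classifies a single monitored parameter. Everything in it is an assumption made for illustration: the `Observation` record, the rule tables and their values are hypothetical and not taken from any NLA system. It covers five categories only; ‘access support’ and ‘content-based’ risks need information beyond one parameter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical record of what monitoring tools report for one file."""
    parameter: str                  # e.g. "format" or "video_codec"
    reported: list                  # one value per tool; None = undetermined
    previous: Optional[str] = None  # value recorded at the last check, if any


# Illustrative staff-defined rules (assumed values, not Library policy).
PROBLEM_VALUES = {"video_codec": {"indeo3"}}
ACCEPTABLE_VALUES = {"format": {"pdf", "tiff", "wav"}}


def classify(obs: Observation) -> set:
    """Return the risk categories triggered by one observation."""
    known = {v for v in obs.reported if v is not None}
    if not known:
        return {"unknown value"}      # no tool could determine a value
    risks = set()
    if len(known) > 1:
        risks.add("conflict")         # tools disagree with each other
    if known & PROBLEM_VALUES.get(obs.parameter, set()):
        risks.add("parameter-based")  # matches a staff-defined criterion
    allowed = ACCEPTABLE_VALUES.get(obs.parameter)
    if allowed is not None and not known <= allowed:
        risks.add("exception")        # outside the acceptable set
    if obs.previous is not None and obs.previous not in known:
        risks.add("change")           # differs from the last recorded value
    return risks


print(classify(Observation("format", ["pdf", "tiff"])))
```

One observation can trigger several categories at once, which is why the function returns a set rather than a single label.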
Likely preservation treatment actions

Broad preservation action approaches that are likely to be required will include:

•   Format migration at the point of collecting;
•   Format migration on recognition of risks;
•   Format migration at the point of delivery;
•   Emulation of various levels of software and hardware environments;
•   Maintenance or supply of appropriate software or hardware;
•   Documenting known problems for which no other action can be taken; and
•   Deaccessioning or deletion.
Prioritising Preservation Treatment:

The Library expects to take into account indicators of ‘preservation intent’, ‘significance’, and
    ‘level of support’ within monitoring and reporting activities, and in evaluations of risk and
    prioritisation for preservation planning and action.




                         http://callmemilo.deviantart.com/art/Thunderbirds-are-GO-20717927
Preservation intent – indicates the expectations for preservation for content:

•   whether content is to be preserved;
•   who is responsible for preservation of the content;
•   the period over which content must be preserved;
•   the required level of support for access to the content over time; for example, that the
    Library intends to actively maintain the ability to both present and modify content, or only to
    present content, or does not intend to actively maintain access to content beyond its
    expected useful life.

•   Preservation intent may also extend to include more specific characteristics to be supported,
    based on curatorial input or constraints imposed by rights policies or agreements with rights
    holders.
Significance – indicates the relative priority required for taking preservation action to maintain
access to content, as determined by collection curators; for example, content rated as highly
significant would be prioritised for preservation planning and action before content of lower
significance.

Level of support – indicates how well a digital collection object is supported within the Library,
based on a combination of how much is known about the object and its components (including
their file formats), and the degree to which supporting software or hardware environments are
available.




                                                                                  NLA Image
4.) This got us thinking




                           Colin Webb 2009
Which turned into this




                         NLA 2011
Preservation assessment and reporting

The Library must be able to review the composition and characteristics of its digital collections to
    assess trends that may affect preservation management, to aid setup of preservation
    monitoring, planning and action, and to report on specific aspects of content when
    necessary.

A solution must enable staff to define and request, on both an ad hoc and scheduled basis:

•   summary reports of content, metadata characteristics and risks across collections or defined
    sets of managed content;

•   detailed metadata reports for individual items or sets of items; and

•   audit trail history reports for individual items or sets of items.
Reference knowledgebases (General)
Enable staff to create, update and maintain reference information
    knowledgebases on:
•   File formats and versions
•   Software and hardware components that support access to
    file formats and versions, for maintaining access to managed
    content; and
•   The level of support available for particular file formats and
    versions:
      – i. sets of software or hardware components available to
           support access to formats;
      – ii. functions supported, both for providing access to
           content and for use in preservation action – for example,
           presentation, modification, batch processing;
      – iii. fidelity of support – how well functions are
           supported; and
      – iv. known risks, including potential inhibitors to
           preservation, associated with formats or supporting
           software or hardware.
•   Preservation intent descriptions and parameters for sets of
    content.
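
One possible shape for such a knowledgebase is sketched below. It is only an assumed data model for illustration: the class names, fields and example entries are hypothetical and do not describe the NLA prototype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FormatVersion:
    """A file format and version, usable as a dictionary key."""
    name: str
    version: str


@dataclass
class SupportSet:
    """One set of components that supports access to a format."""
    components: list                                 # item i: software or hardware
    functions: dict                                  # items ii and iii: function -> fidelity note
    known_risks: list = field(default_factory=list)  # item iv


@dataclass
class KnowledgeBase:
    support: dict = field(default_factory=dict)      # FormatVersion -> [SupportSet]

    def add(self, fmt, support_set):
        """Register another support set for a format."""
        self.support.setdefault(fmt, []).append(support_set)

    def functions_for(self, fmt):
        """Union of functions offered by every support set for a format."""
        found = set()
        for support_set in self.support.get(fmt, []):
            found.update(support_set.functions)
        return found


kb = KnowledgeBase()
fmt = FormatVersion("ExampleFormat", "1.0")          # made-up format
kb.add(fmt, SupportSet(["viewer-x"], {"presentation": "good"}))
kb.add(fmt, SupportSet(["editor-y"], {"modification": "partial"},
                       ["vendor ended support"]))
print(sorted(kb.functions_for(fmt)))
```

Keeping fidelity and known risks on each support set, rather than on the format, lets the same format be well supported by one tool and poorly supported by another.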
Other systems are also required to interrelate in this
ecosystem such as:

•Preservation monitoring, reporting and prioritisation

•Preservation options and preservation action planning

•Preservation action evaluation
5.) Pres Intent (current NLA prototype)




                           David Pearson 2012
NLA 2011
Collections

Preservation Intent - Asian Collections and Overseas Collections Management — Version 1.0
Preservation Intent - Australian Books and Serials — Version 1.0
Preservation Intent - Dance — Version 1.0
Preservation Intent - Manuscripts — Draft
Preservation Intent - Maps — Version 1.0
Preservation Intent - Music — Draft
Preservation Intent - Newspaper Digitisation — Version 1.0
Preservation Intent - Oral History — Unknown
Preservation Intent - Pictures — Version 1.0
Preservation Intent - Selective Web Harvesting — Version 1.0
Preservation Intent - Web Domain Harvests — Version 1.0
An attempt to systematise Pres Intent (requires some additional thinking)
This is how collections thought about it.
This is how we tended to think about it (a job for a new system).
6.) Info on Formats, software and level of support (some prototyping)




                                          =




                                                           NLA 2011
7.) Level of support and Prioritisation




                                          NLA 2011
Level of support (an early concept model)




                                            DP 2011
Prioritising preservation treatment based on level of support

In evaluations of risk and prioritisation for preservation planning and action, we must take into
    account the Level of Support/Access Risks and:

•   Any constraints imposed by rights policies or agreements; and
•   The amount of resources available.

Based on these factors, the Library (Management, Collections and Digi Pres) should be able to
    prioritise material to be preserved.
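
A minimal sketch of how these factors could be combined into a work list follows. The scales, field names and the simple rule (highest significance first, then weakest support, within a fixed budget) are all assumptions for illustration, not a method stated in the slides.

```python
from dataclasses import dataclass


@dataclass
class CollectionSet:
    name: str
    significance: int          # assumed scale: 3 = high, 1 = low
    support_level: int         # assumed scale: 3 = well supported, 0 = unsupported
    rights_allow_action: bool  # constraint from rights policies or agreements
    cost: int                  # assumed effort units needed to treat the set


def prioritise(sets, budget):
    """Pick sets to treat: highest significance first, then weakest support."""
    eligible = [s for s in sets if s.rights_allow_action]
    ranked = sorted(eligible, key=lambda s: (-s.significance, s.support_level))
    chosen, remaining = [], budget
    for s in ranked:
        if s.cost <= remaining:   # skip anything the remaining budget cannot cover
            chosen.append(s.name)
            remaining -= s.cost
    return chosen


# Entirely made-up example data.
example = [
    CollectionSet("set-a", 2, 1, True, 4),
    CollectionSet("set-b", 3, 0, True, 5),
    CollectionSet("set-c", 3, 2, False, 1),
    CollectionSet("set-d", 1, 0, True, 2),
]
print(prioritise(example, budget=7))
```

In practice the ranking rule would be a policy decision for Management, Collections and Digi Pres; the code only shows that the inputs named above are enough to produce an ordered list.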
8.) Preservation actions and options generation




                                       NLA 2011
Options for preservation actions

We would like to enable staff to:

•define types of preservation actions for use within preservation planning and evaluation.

•update and delete reference information on options for preservation action, both in general and
for particular formats or format types.

•link to information available from the software KB which provides information on what actions
specific software might be useful for and the proximity of the software to the format.

•Link to other linked data sources.
Pres action options generation
The Library must be able to test and evaluate preservation action plans to determine if they
satisfactorily achieve the preservation intent for managed content. For example, a solution
should:
•enable staff to develop and test executable preservation action plans for sets of managed
content, including:
     –   Single and multiple step actions (combining manual and automated workflows)
     –   Replacing file/s and linkages in complex objects
     –   Linking to a specific emulation environment (if available)
     –   Replacing access software
     –   Specifying that no action is required


•Support simulations or testing of preservation actions against a content Testbed. For example,
enable staff to perform 'what if' simulations to determine impact of changes to availability of
support for access, including:
     –   a. Removal of software or hardware sets supporting access, to assess risks or impacts on access; and
     –   b. Addition or revision of software or hardware sets supporting access, to assess proposed remedial preservation
         action plans.


•enable staff to define quality assurance criteria for preservation action plan outcomes
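
The 'what if' simulations described above can be sketched as a small function over support sets. This is an assumed model for illustration: a format counts as accessible when at least one of its support sets is fully available, and all names and example values are hypothetical.

```python
def formats_without_access(support_sets, removed=frozenset(), added=None):
    """Return formats left with no fully available support set.

    support_sets: dict mapping format name -> list of sets of components.
    removed:      components assumed to become unavailable (case a).
    added:        optional dict of extra support sets proposed as a remedy (case b).
    """
    # Work on a copy so the simulation never alters the real reference data.
    merged = {fmt: list(sets) for fmt, sets in support_sets.items()}
    for fmt, sets in (added or {}).items():
        merged.setdefault(fmt, []).extend(sets)
    at_risk = []
    for fmt, sets in merged.items():
        # A support set survives only if none of its components was removed.
        if not any(components.isdisjoint(removed) for components in sets):
            at_risk.append(fmt)
    return sorted(at_risk)


current = {
    "format-a": [{"viewer-1", "os-1"}],
    "format-b": [{"viewer-1"}, {"viewer-2"}],
}
print(formats_without_access(current, removed={"viewer-1"}))
print(formats_without_access(current, removed={"viewer-1"},
                             added={"format-a": [{"emulator-1"}]}))
```

The first call assesses the impact of losing a component; the second tests whether a proposed remedial support set would restore access.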
9.) Preservation Options Evaluation




                                      NLA 2011
Preservation options evaluation

•   support import and integration of preservation-treated content and metadata, from either
    internal or external processes, including:
      – a. Verifying that preservation-treated digital content conforms to acceptance criteria for
         preservation outcomes for designated sets of digital content;
      – b. Enabling staff to quality assure and approve preservation-treated digital content for
         incorporation into the collection; and
      – c. After approval, send to preservation action scheduler for treatment of file/s,
         metadata and associated relationships.

•   support ‘rollback’ of updated versions of content, metadata and associated relationships to
    restore previous versions, if necessary.

•   enable staff to define and approve acceptance criteria for preservation action outcomes for
    sets of managed content.
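
The acceptance and rollback requirements above could be sketched as follows. This is a hypothetical in-memory illustration, not a description of any tendered system: the class, the criteria and the example values are assumptions.

```python
class VersionedItem:
    """Keeps every accepted version so an update can be rolled back."""

    def __init__(self, content, metadata):
        self._versions = [(content, dict(metadata))]

    @property
    def current(self):
        return self._versions[-1]

    def apply_treatment(self, content, metadata, criteria):
        """Accept treated content only if every acceptance criterion passes.

        Returns the names of failed criteria; an empty list means accepted.
        """
        failed = [name for name, check in criteria.items()
                  if not check(content, metadata)]
        if failed:
            return failed                 # nothing is changed on failure
        self._versions.append((content, dict(metadata)))
        return []

    def rollback(self):
        """Restore the previous version, if there is one."""
        if len(self._versions) > 1:
            self._versions.pop()
        return self.current


# Staff-defined acceptance criteria (illustrative only).
criteria = {
    "format is target": lambda c, m: m.get("format") == "new-format",
    "not empty": lambda c, m: len(c) > 0,
}
item = VersionedItem(b"old", {"format": "old-format"})
print(item.apply_treatment(b"", {"format": "new-format"}, criteria))
print(item.apply_treatment(b"new", {"format": "new-format"}, criteria))
print(item.rollback())
```

A real system would also version relationships between objects and record each step in an audit trail; both are omitted here to keep the sketch short.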
10.) So what!

Currently, these ideas and requirements
    have become ‘partially real’. They still
    need to be implemented.

They formed the basis for the preservation
   requirements in a subsequent:
•      RFP (Request for Proposal) process;
   and
•      RFT (Request for Tender) process.

                                      http://www.wildsound-filmmaking-feedback-events.com/images/austin_powers_dr_evil.jpg
RFP
So all of these ideas were consolidated as
    requirements for a Request for Proposal which
    went to the market in July 2011.

A number of responses were received for:
• Core systems
• Preservation
• Digitisation
• Other Workflows
                                                    http://www.melbournesumos.com.au/pics/twister/Twister078.jpg



These were evaluated and some of the vendors
   were invited to participate in the next stage.
RFT
Based on the RFP, the NLA clarified the
    requirements for the next process.

A select group from the RFP process were
    invited to participate in a Request for
    Tender, which closed in late December
    2011.

                                               http://simpro.co/wp-content/uploads/2010/10/paperwork2.jpg
What version of reality
have we decided upon?




                                   Everything, for Everyone
                                           Forever




                                     http://www.flickr.com/photos/ricksmit/15671245/

Más contenido relacionado

Similar a Dave Pearson The Adventures of Digi

Digital Preservation Planning: Just Do It!
Digital Preservation Planning: Just Do It!Digital Preservation Planning: Just Do It!
Digital Preservation Planning: Just Do It!valariek
 
Digital Preservation for DAMs
Digital Preservation for DAMsDigital Preservation for DAMs
Digital Preservation for DAMsEmily Kolvitz
 
Preserving repository content: practical steps for repository managers by Mig...
Preserving repository content: practical steps for repository managers by Mig...Preserving repository content: practical steps for repository managers by Mig...
Preserving repository content: practical steps for repository managers by Mig...JISC KeepIt project
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curationGarethKnight
 
What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?
What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?
What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?National Library of Australia
 
Digital Preservation Policy at the Library of Congress
Digital Preservation Policy at the Library of CongressDigital Preservation Policy at the Library of Congress
Digital Preservation Policy at the Library of CongressChelcie Rowell
 
Preservation for 21st Century Library Collections
Preservation for 21st Century Library CollectionsPreservation for 21st Century Library Collections
Preservation for 21st Century Library CollectionsJolo Van Clyde Abatayo
 
ArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolMark Matienzo
 
Introduction to Digital Preservation
Introduction to Digital PreservationIntroduction to Digital Preservation
Introduction to Digital PreservationBill LeFurgy
 
NCompass Live: Digital Preservation, Part 1: Inventory and Selection
NCompass Live: Digital Preservation, Part 1: Inventory and SelectionNCompass Live: Digital Preservation, Part 1: Inventory and Selection
NCompass Live: Digital Preservation, Part 1: Inventory and SelectionNebraska Library Commission
 
Preservation planning
Preservation planningPreservation planning
Preservation planningSarah Jones
 
Digital library
Digital libraryDigital library
Digital libraryanueldhose
 
Building Digital Collections: Managing and Sharing
Building Digital Collections: Managing and SharingBuilding Digital Collections: Managing and Sharing
Building Digital Collections: Managing and SharingRecollection Wisconsin
 
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)dri_ireland
 
Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...
Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...
Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...Millie Gonzalez
 
Module 1 - Chapter1.pptx
Module 1 - Chapter1.pptxModule 1 - Chapter1.pptx
Module 1 - Chapter1.pptxSoniaDevi15
 
NCompass Live: Digital Preservation, Part 3: Management and Providing Access
NCompass Live: Digital Preservation, Part 3: Management and Providing AccessNCompass Live: Digital Preservation, Part 3: Management and Providing Access
NCompass Live: Digital Preservation, Part 3: Management and Providing AccessNebraska Library Commission
 
What goes where? Bringing a new repository online at the Ohio State Universit...
What goes where? Bringing a new repository online at the Ohio State Universit...What goes where? Bringing a new repository online at the Ohio State Universit...
What goes where? Bringing a new repository online at the Ohio State Universit...Emily Frieda Shaw
 

Similar a Dave Pearson The Adventures of Digi (20)

Digital Preservation Planning: Just Do It!
Digital Preservation Planning: Just Do It!Digital Preservation Planning: Just Do It!
Digital Preservation Planning: Just Do It!
 
Digital Preservation for DAMs
Digital Preservation for DAMsDigital Preservation for DAMs
Digital Preservation for DAMs
 
Preserving repository content: practical steps for repository managers by Mig...
Preserving repository content: practical steps for repository managers by Mig...Preserving repository content: practical steps for repository managers by Mig...
Preserving repository content: practical steps for repository managers by Mig...
 
Introduction to digital curation
Introduction to digital curationIntroduction to digital curation
Introduction to digital curation
 
Fundamental concepts in digital preservation
Fundamental concepts in digital preservation Fundamental concepts in digital preservation
Fundamental concepts in digital preservation
 
What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?
What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?
What are some of the things that the ‘Feds’ are doing about Digital ‘Stuff’?
 
An Introduction to Digital Preservation
An Introduction to Digital PreservationAn Introduction to Digital Preservation
An Introduction to Digital Preservation
 
Digital Preservation Policy at the Library of Congress
Digital Preservation Policy at the Library of CongressDigital Preservation Policy at the Library of Congress
Digital Preservation Policy at the Library of Congress
 
Preservation for 21st Century Library Collections
Preservation for 21st Century Library CollectionsPreservation for 21st Century Library Collections
Preservation for 21st Century Library Collections
 
ArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management ToolArchivesSpace: Building a Next-Generation Archives Management Tool
ArchivesSpace: Building a Next-Generation Archives Management Tool
 
Introduction to Digital Preservation
Introduction to Digital PreservationIntroduction to Digital Preservation
Introduction to Digital Preservation
 
NCompass Live: Digital Preservation, Part 1: Inventory and Selection
NCompass Live: Digital Preservation, Part 1: Inventory and SelectionNCompass Live: Digital Preservation, Part 1: Inventory and Selection
NCompass Live: Digital Preservation, Part 1: Inventory and Selection
 
Preservation planning
Preservation planningPreservation planning
Preservation planning
 
Digital library
Digital libraryDigital library
Digital library
 
Building Digital Collections: Managing and Sharing
Building Digital Collections: Managing and SharingBuilding Digital Collections: Managing and Sharing
Building Digital Collections: Managing and Sharing
 
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
Rebecca Grant - Archiving and Digital Preservation (Figshare Fest)
 
Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...
Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...
Reflections of a Digital Steward: Recommendations for Scholarship and Preserv...
 
Module 1 - Chapter1.pptx
Module 1 - Chapter1.pptxModule 1 - Chapter1.pptx
Module 1 - Chapter1.pptx
 
NCompass Live: Digital Preservation, Part 3: Management and Providing Access
NCompass Live: Digital Preservation, Part 3: Management and Providing AccessNCompass Live: Digital Preservation, Part 3: Management and Providing Access
NCompass Live: Digital Preservation, Part 3: Management and Providing Access
 
What goes where? Bringing a new repository online at the Ohio State Universit...
What goes where? Bringing a new repository online at the Ohio State Universit...What goes where? Bringing a new repository online at the Ohio State Universit...
What goes where? Bringing a new repository online at the Ohio State Universit...
 

Más de Future Perfect 2012

Working Across Organizations white paper
Working Across Organizations white paperWorking Across Organizations white paper
Working Across Organizations white paperFuture Perfect 2012
 
Ensuring Data Integrity white paper
Ensuring Data Integrity white paperEnsuring Data Integrity white paper
Ensuring Data Integrity white paperFuture Perfect 2012
 
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...Future Perfect 2012
 
Joe Coleman Biodiversity Heritage Library
Joe Coleman Biodiversity Heritage LibraryJoe Coleman Biodiversity Heritage Library
Joe Coleman Biodiversity Heritage LibraryFuture Perfect 2012
 
James Smithies Academic Earthquake Research
James Smithies Academic Earthquake ResearchJames Smithies Academic Earthquake Research
James Smithies Academic Earthquake ResearchFuture Perfect 2012
 
Shaun Hendy Innovation Ecosystem
Shaun Hendy Innovation EcosystemShaun Hendy Innovation Ecosystem
Shaun Hendy Innovation EcosystemFuture Perfect 2012
 
Martin Donnelly Sarah Jones DMP Online
Martin Donnelly Sarah Jones DMP OnlineMartin Donnelly Sarah Jones DMP Online
Martin Donnelly Sarah Jones DMP OnlineFuture Perfect 2012
 
Steve Mc Eachern Australian Data Archive
Steve Mc Eachern Australian Data ArchiveSteve Mc Eachern Australian Data Archive
Steve Mc Eachern Australian Data ArchiveFuture Perfect 2012
 
Parul Sharma Sally Vermaaten Right Combination
Parul Sharma Sally Vermaaten Right CombinationParul Sharma Sally Vermaaten Right Combination
Parul Sharma Sally Vermaaten Right CombinationFuture Perfect 2012
 
Alison Fleming Michael Upton Collaborating for Success
Alison Fleming Michael Upton Collaborating for SuccessAlison Fleming Michael Upton Collaborating for Success
Alison Fleming Michael Upton Collaborating for SuccessFuture Perfect 2012
 
Clare Somerville Trish O’Kane Data in Databases
Clare Somerville Trish O’Kane Data in DatabasesClare Somerville Trish O’Kane Data in Databases
Clare Somerville Trish O’Kane Data in DatabasesFuture Perfect 2012
 
Cochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsCochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsFuture Perfect 2012
 
Jay Gattuso Persistently Identifying Formats
Jay Gattuso Persistently Identifying FormatsJay Gattuso Persistently Identifying Formats
Jay Gattuso Persistently Identifying FormatsFuture Perfect 2012
 
Jeff Rothenberg Digital Preservation Perspective
Jeff Rothenberg Digital Preservation PerspectiveJeff Rothenberg Digital Preservation Perspective
Jeff Rothenberg Digital Preservation PerspectiveFuture Perfect 2012
 
Stuart Wakefield Cloud Computing
Stuart Wakefield Cloud ComputingStuart Wakefield Cloud Computing
Stuart Wakefield Cloud ComputingFuture Perfect 2012
 

Más de Future Perfect 2012 (20)

Working Across Organizations white paper
Working Across Organizations white paperWorking Across Organizations white paper
Working Across Organizations white paper
 
Ensuring Data Integrity white paper
Ensuring Data Integrity white paperEnsuring Data Integrity white paper
Ensuring Data Integrity white paper
 
Bigger Hard Drive Jamie Lean
Bigger Hard Drive Jamie LeanBigger Hard Drive Jamie Lean
Bigger Hard Drive Jamie Lean
 
Steve Knight by Design
Steve Knight by DesignSteve Knight by Design
Steve Knight by Design
 
Michael Parsons Passion
Michael Parsons PassionMichael Parsons Passion
Michael Parsons Passion
 
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
 
Joe Coleman Biodiversity Heritage Library
Joe Coleman Biodiversity Heritage LibraryJoe Coleman Biodiversity Heritage Library
Joe Coleman Biodiversity Heritage Library
 
James Smithies Academic Earthquake Research



Dave Pearson The Adventures of Digi

  • 1. The Adventures of Digi: Ideas, Requirements and Reality David Pearson National Library of Australia Future Perfect 2012 Digi By Imogene Pearson (7 years) (March 2012)
  • 2. 1.) Some Context Digi By Imogene Pearson (7 years) (March 2012)
  • 3. From a preservation point of view, the Library’s digital collections present: • A mix of materials needing to be kept in perpetuity, along with materials that can be discarded after specified periods or events; • Mixed levels of complexity in terms of object structure, relationships and dependencies; • Mixed levels of intellectual control; • A wide range of file formats (and carrier formats); • Different levels of complexity in preservation planning and processing; • Different timetables for preservation action; • A need for different preservation approaches, often at different scales; and • A need for recurring – and possibly changing - preservation action cycles over time, using a changing suite of tools.
  • 5. 2.) A caveat NAA Image
  • 6. Ecology Ecology or Layers of consciousness for the need for digital preservation intervention (Given some need to access content over time) Unaware: • I am unaware if I have any digital content; or • I am unaware if I may have a problem accessing any of my digital content. Aware - no response: • I don’t think that I have a problem accessing any of my digital content; • I recognise that I have a problem accessing some of my digital content; • I recognise that I have a problem accessing some of my digital content. However, the problem is not my problem; or • I recognise that I have a problem, but have no response in place - not even a limited one. Aware – taking some action: • I accept that I may have a problem accessing some of my digital content. I am taking limited actions to manage this problem; or • I accept that I may have a problem accessing some of my digital content. The preservation mandate is a part of my enterprise or system ecology.
  • 7. Another way of looking at it might be: David Pearson 2012
  • 8. 3.) What we have come to understand over time. http://www.motifake.com/79532 via Google Images
  • 9. Preservation responsibilities: Preservation of the Library's digital collections involves three main goals: • Maintaining access to reliable data at bit-stream level; • Maintaining access to content encoded in the bit streams; and • Maintaining access to the intended and available meaning of the content. While specific preservation activities may focus on one or more of these goals, the Library’s preservation responsibility is only fulfilled when all three goals have been adequately addressed. This responsibility applies across all digital collections, subject to curatorial and policy decisions for specific groups of digital objects.
  • 10. Mission: The primary objective of preservation activities within the NLA is to maintain the ability to meaningfully access digital collection content over time. [Diagram, David Pearson 2012: two models, A and B, of ‘logical on physical stuff’ over time. Labels: contextual information about content; dependency information about formats etc.; systems to ingest, manage, report and take actions; systems to access (master or derivative); ‘Stuffed?’. Image via Google Images.]
  • 11. Required preservation processes The Library must be able to: • Understand what it holds in its collections; • Understand what its preservation intentions are for every digital object and what it is entitled to do to realise its intentions; • Understand what is required to provide access, existing inhibitors to access, and the current level of support the Library is able to provide; • Evaluate and monitor the degree of risk arising from collection composition, preservation intentions and available level of support within the Library for digital collection content, and monitor for risk conditions arising during general Digital Library operations; • Anticipate the effects of changes in support; • Recognise planning triggers, and plan and take appropriate action on a scale appropriate to the size of the target; and • Audit the effectiveness of its preservation arrangements and modify the arrangements if necessary.
  • 12. Risk or ‘Risk-on’ (are you a splitter or a lumper?) • ‘parameter-based’ risks: a match against a criterion defined by Library staff to indicate a preservation risk – for example, video encoded with a codec considered to be problematic; • ‘exception’ risks: the value of a monitored parameter is outside a set of acceptable values; • ‘change’ risks: there has been a change in status for a monitored parameter for content – for example, the confidence in format identification for a particular file has changed; • ‘conflict’ risks: conflicting values for the parameter are reported by one or more tools – for example, file format identification returns conflicting values; • ‘unknown value’ risks: undetermined values for defined parameters – for example, undetermined values for file format and version; • ‘access support’ risks: changes in level of support which affect the Library’s ability to access content in accordance with preservation intent and significance – for example, reduction below an acceptable threshold in the availability of supporting software for a particular file format; and • ‘content-based’ risks: characteristics of content that may not be identifiable from metadata – for example, presence of deprecated HTML tags.
  • 13. Likely preservation treatment actions Broad preservation action approaches that are likely to be required will include: • Format migration at the point of collecting; • Format migration on recognition of risks; • Format migration at the point of delivery; • Emulation of various levels of software and hardware environments; • Maintenance or supply of appropriate software or hardware; • Documenting known problems for which no other action can be taken; and • Deaccessioning or deletion.
  • 14. Prioritising Preservation Treatment: The Library expects to take into account indicators of ‘preservation intent’, ‘significance’, and ‘level of support’ within monitoring and reporting activities, and in evaluations of risk and prioritisation for preservation planning and action. http://callmemilo.deviantart.com/art/Thunderbirds-are-GO-20717927
  • 15. Preservation intent – indicates the expectations for preservation for content: • whether content is to be preserved; • who is responsible for preservation of the content; • the period over which content must be preserved; • the required level of support for access to the content over time; for example, that the Library intends to actively maintain the ability to both present and modify content, or only to present content, or does not intend to actively maintain access to content beyond its expected useful life. • Preservation intent may also extend to include more specific characteristics to be supported, based on curatorial input or constraints imposed by rights policies or agreements with rights holders.
  • 16. Significance – indicates the relative priority required for taking preservation action to maintain access to content, as determined by collection curators; for example, content rated as highly significant would be prioritised for preservation planning and action before content of lower significance. Level of support – indicates how well a digital collection object is supported within the Library, based on a combination of how much is known about the object and its components (including their file formats), and the degree to which supporting software or hardware environments are available. NLA Image
  • 17. 4.) This got us thinking Colin Webb 2009
  • 18. Which turned into this NLA 2011
  • 19. Preservation assessment and reporting The Library must be able to review the composition and characteristics of its digital collections to assess trends that may affect preservation management, to aid setup of preservation monitoring, planning and action, and to report on specific aspects of content when necessary. A solution must enable staff to define and request, on both an ad hoc and scheduled basis: • summary reports of content, metadata characteristics and risks across collections or defined sets of managed content; • detailed metadata reports for individual items or sets of items; and • audit trail history reports for individual items or sets of items.
  • 20. Reference knowledgebases (General) Enable staff to create, update and maintain reference information knowledgebases on: • File formats and versions • Software and hardware components that support access to file formats and versions, for maintaining access to managed content; and • The level of support available for particular file formats and versions: – i. sets of software or hardware components available to support access to formats; – ii. functions supported, both for providing access to content and for use in preservation action – for example, presentation, modification, batch processing; – iii. fidelity of support – how well functions are supported; and – iv. known risks, including potential inhibitors to preservation, associated with formats or supporting software or hardware. • Preservation intent descriptions and parameters for sets of content.
  • 21. Other systems are also required to interrelate in this ecosystem such as: •Preservation monitoring, reporting and prioritisation •Preservation options and preservation action planning •Preservation action evaluation
  • 22. 5.) Pres Intent (current NLA prototype) David Pearson 2012
  • 24. Collections Preservation Intent - Asian Collections and Overseas Collections Management — Version 1.0 Preservation Intent - Australian Books and Serials — Version 1.0 Preservation Intent - Dance — Version 1.0 Preservation Intent - Manuscripts — Draft Preservation Intent - Maps — Version 1.0 Preservation Intent - Music — Draft Preservation Intent - Newspaper Digitisation — Version 1.0 Preservation Intent - Oral History — Unknown Preservation Intent - Pictures — Version 1.0 Preservation Intent - Selective Web Harvesting — Version 1.0 Preservation Intent - Web Domain Harvests — Version 1.0
  • 25.
  • 26. An attempt to systematise Pres Intent (requires some additional thinking)
  • 27. This is how collections thought about it.
  • 28. This is how we tended to think about it (a job for a new system).
  • 29. 6.) Info on Formats, software and level of support (some prototyping) = NLA 2011
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36. 7.) Level of support and Prioritisation NLA 2011
  • 37. Level of support (an early concept model) DP 2011
  • 38.
  • 39. Prioritising preservation treatment based on level of support In evaluations of risk and prioritisation for preservation planning and action, we must take into account the Level of Support/Access Risks and: • Any constraints imposed by rights policies or agreements; and • The amount of resources available. Based on these factors, the Library (Management, Collections and Digi Pres) should be able to prioritise material to be preserved.
  • 40. 8.) Preservation actions and options generation NLA 2011
  • 41. Options for preservation actions We would like to enable staff to: • define types of preservation actions for use within preservation planning and evaluation; • update and delete reference information on options for preservation action, both in general and for particular formats or format types; • link to information available from the software KB, which indicates what actions specific software might be useful for and the proximity of the software to the format; and • link to other linked data sources.
  • 42. Pres action options generation The Library must be able to test and evaluate preservation action plans to determine if they satisfactorily achieve the preservation intent for managed content. For example, a solution should: • enable staff to develop and test executable preservation action plans for sets of managed content, including: – Single and multiple step actions (combining manual and automated workflows) – Replacing file/s and linkages in complex objects – Linking to a specific emulation environment (if available) – Replacing access software – Specifying that no action is required • support simulations or testing of preservation actions against a content Testbed. For example, enable staff to perform 'what if' simulations to determine the impact of changes to the availability of support for access, including: – a. Removal of software or hardware sets supporting access, to assess risks or impacts on access; and – b. Addition or revision of software or hardware sets supporting access, to assess proposed remedial preservation action plans. • enable staff to define quality assurance criteria for preservation action plan outcomes.
  • 43. 9.) Preservation Options Evaluation NLA 2011
  • 44. Preservation options evaluation • support import and integration of preservation-treated content and metadata, from either internal or external processes, including: – a. Verifying that preservation-treated digital content conforms to acceptance criteria for preservation outcomes for designated sets of digital content; – b. Enabling staff to quality assure and approve preservation-treated digital content for incorporation into the collection; and – c. After approval, send to preservation action scheduler for treatment of file/s, metadata and associated relationships. • support ‘rollback’ of updated versions of content, metadata and associated relationships to restore previous versions, if necessary. • enable staff to define and approve acceptance criteria for preservation action outcomes for sets of managed content.
  • 45. 10.) So what! Currently, these ideas and requirements have become ‘partially real’. They still need to be implemented. They formed the basis for the preservation requirements in a subsequent: • RFP (Request for Proposal) process; and • RFT (Request for Tender) process. http://www.wildsound-filmmaking-feedback-events.com/images/austin_powers_dr_evil.jpg
  • 46. RFP So all of these ideas were consolidated as requirements for a Request for Proposal which went to the market in July 2011. A number of responses were received for: • Core systems • Preservation • Digitisation • Other Workflows http://www.melbournesumos.com.au/pics/twister/Twister078.jpg These were evaluated and some of the vendors were invited to participate in the next stage.
  • 47. RFT Based on the RFP, the NLA clarified the requirements for the next process. A select group from the RFP process were invited to participate in a Request for Tender, which closed in late December 2011. http://simpro.co/wp-content/uploads/2010/10/paperwork2.jpg
  • 48. What version of reality have we decided upon? Everything, for Everyone Forever Digi By Imogene Pearson (7 years) (March 2012) http://www.flickr.com/photos/ricksmit/15671245/
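
The risk categories listed on slide 12 could be sketched as a small classifier. This is a hypothetical illustration only, not the NLA's system: the `FileRecord` fields, the `PROBLEM_CODECS` set and the `classify_risks` function are all names assumed for the example, and only three of the seven categories are covered.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Hypothetical staff-defined criterion: codecs treated as problematic.
PROBLEM_CODECS = {"example-legacy-codec"}


@dataclass
class FileRecord:
    """Illustrative metadata for one file; the field names are assumptions."""
    name: str
    # Identification tool name -> reported format (None if the tool could not identify it).
    format_ids: Dict[str, Optional[str]] = field(default_factory=dict)
    codec: Optional[str] = None


def classify_risks(record: FileRecord) -> Set[str]:
    """Return the slide-12 risk categories that apply to one record."""
    risks: Set[str] = set()
    reported = {fmt for fmt in record.format_ids.values() if fmt is not None}
    if not reported:
        # No tool produced a format value: an 'unknown value' risk.
        risks.add("unknown value")
    elif len(reported) > 1:
        # Tools disagree about the format: a 'conflict' risk.
        risks.add("conflict")
    if record.codec in PROBLEM_CODECS:
        # Matches a criterion defined by staff: a 'parameter-based' risk.
        risks.add("parameter-based")
    return risks
```

Keeping the categories separate, rather than lumping them into a single score, lets a report ask different questions of the same collection, for example "which files have conflicting identifications?" versus "which files use a codec we distrust?".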

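The 'what if' simulation described on slide 42 (removing software sets to assess the impact on access) might look something like the following minimal sketch. It assumes a very simple knowledgebase shape, a mapping from format name to the set of software able to render it; the function names and data layout are hypothetical and are not taken from the NLA prototypes.

```python
def unsupported_formats(support, removed=frozenset()):
    """Formats left with no rendering software once `removed` is withdrawn.

    `support` maps a format name to the set of software that can render it
    (an assumed, simplified knowledgebase layout).
    """
    removed = set(removed)
    return sorted(fmt for fmt, tools in support.items() if not (set(tools) - removed))


def what_if_removal(support, holdings, removed):
    """Count held files that would newly lose access if `removed` software went away.

    `holdings` maps a format name to the number of files held in that format.
    Formats that are already unsupported are excluded, so the result shows
    only the additional impact of the simulated change.
    """
    before = set(unsupported_formats(support))
    after = set(unsupported_formats(support, removed))
    return {fmt: holdings.get(fmt, 0) for fmt in sorted(after - before)}
```

For instance, with `support = {"fmt-a": {"viewer-1", "viewer-2"}, "fmt-b": {"viewer-2"}}` and `holdings = {"fmt-a": 10, "fmt-b": 5}`, simulating the loss of `viewer-2` reports only `fmt-b`, because `fmt-a` can still be opened with `viewer-1`.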
Editor's notes

  1. I was asked to give a presentation on some of the ideas which the Digital Preservation team at the NLA has been working on over the last year. These ideas have formed the basis for requirements and a subsequent tender to replace key components of our Digital Library Infrastructure. The NLA wants to either: source a product to provide this functionality; work with a product to extend this functionality; or build this functionality ourselves.
  2. Like many libraries, the NLA has a very diverse and complex ecosystem. In relation to preservation requirements we have to consider: lots of stuff (around 1.5 PB of total data); lots of relationships (especially in our Domain Harvests); mixed levels of intellectual control (catalogued at the file level and the box level); and many different format types, possibly requiring different and recurring actions at different times in the life of a digital object. Because we do not mandate the formats that are accepted into the ecosystem, the NLA will have many formats that we are unable to identify and support.
  3. A general breakdown of the collections is shown. The largest proportion of the content in the repository is digitised material (primarily newspapers, although digitised materials can be found in almost all other collections). More problematic for us is born-digital material, which is also found in most collections but in lesser volumes. Arguably the most problematic collection area is web archives (domain harvests and selective harvests) because of: size; the fact that they can contain potentially anything; and complex relationships.
  4. First, a caveat.
  5. Pete and Jay from the NLNZ and I have been talking for some time about an ecology or layers of consciousness for the need for digital preservation intervention. What is presented to you is not a perfect representation of reality. However, it is useful when we are trying to explore whether our aims, goals and expectations of preservation are even remotely compatible. At the NLA we have been trying to change the perception of the library over the last 5-10 years to steer us towards integrated preservation systems. We are currently in the process of trying to achieve this last state. If you in the audience are somewhere else in this ecology, perhaps the rest of my talk will be gobbledegook.
  6. We can see this ecology in another way: high versus low resources; high versus low awareness; long-term versus short-term retention.
  7. The following is a mixture of observations, common practice and some new ideas.
  8. In order to preserve content over time we believe that we need to do the following: maintain access to the bit-streams; maintain access to content encoded in the bit streams; and maintain access to the meaning of the content. We need to have all of these components covered in order to have a chance of preserving content over time.
  9. Thus, the primary mission of the digital preservation section at the NLA is to maintain the ability to meaningfully access digital collection content over time. For example, two models are presented. In model A, doing nothing, we will over time lose access not only to the content but also to the bits. In model B, through managed systems we are more likely to maintain access to the bits and, through pres actions, to the content. However, if we don’t have the context then we will be literally ‘preserving in the dark’.
  10. Furthermore, our preservation processes need to allow us to: understand what is in the collection; understand why we have it and what we want to do with it; understand how to access it; understand when access is going to become, or already is, problematic; continue to take steps to maintain access; and audit our arrangements.
  11. In preservation we talk a lot about risk. The concept of risk is itself risky because there are potentially many different types of risk. Also, obsolescence information is subjective and relative; much of what we know is a best guess. The only really concrete information that we can get is: can ‘we’ access the format; what is our level of support and the vendor’s, and when is it likely that we won’t be able to access it; and does a format have characteristics which are problematic, and which may therefore make it more risky? Also, when we do use risk metrics in some kind of meaningful way, we tend to lump them into one bucket. However, there are different kinds of risk which are useful for different circumstances in a repository. We have started to refine risk based on a high-level use-case classification. For example: parameter-based risk – e.g. specific M/D which we consider good or bad; exception risk – e.g. the format is not valid or did not validate; change risk – e.g. a number of files have failed fixity checks; conflict risk – e.g. tool X says it is a tiff, tool Y says that it is a pdf!; unknown value risk – e.g. our tools cannot identify 100,000 files in this transfer; access support risk – e.g. we can no longer access 1,000,000 files in format X.
  12. We are going to need many different types of preservation actions. We want to be both proactive and reactive: to be able to see the current state of the repository as well as to run ‘what if’ scenarios, and to understand whether we need to take any action on a file. Do we need to take actions on all files in the repository of a particular type, or only on those that belong to a particular group (e.g. TIFFs in a particular collection)? These actions could be as simple as replacing the access software (not touching the file), or as complex as replacing a file and links inside a complex web object, or even building and maintaining emulation environments over time. We also want to be able to get rid of stuff we don’t want (e.g. it may not be our responsibility, or it should not have been taken into the collection in the first place). If we are going to take an action on a file, we want to know what about the file is important to the collection owners.
  13. Based on the last point we think that the system at the NLA must take into account: Preservation Intent; Significance; and Level of support of formats and therefore access to the content.
  14. We need a system that can express Pres Intent – does the content need to be preserved. If so, who is responsible for it, how long and what aspects? As we don’t believe in the ‘it is impossible to define significant properties for digital objects school of thought and because we find significant properties so problematic – we have adopted a middle position, expressing pres intent at a fairly hight level (e.g. want to view, edit, navigate, manipulate the content) This is a collaborative process of defining and articulating how the collections see their content and their required level of support for access to the content over time. Including which specific aspects they think are important.
  15. Also, we need to build a system in which an intellectual entity, at any given level of granularity, can be recorded as being significant. We also need to be able to know what the ‘Level of Support’ is for any given digital object within our ecosystem at a given time; e.g. given that we can identify what it is, how well (or not) do we maintain access to the content in this file? This will help us to work out priorities and what pres action/s we will need to take.
  16. So, some of these ideas are expressed on this early painting that we found on a cave wall.
  17. Then we took most of the fun out of it. On your right are knowledgebases and systems that deal with: Formats; Software; Level of Support; Pres Intent; Priority; Pres Actions; Pres Options; Pres Evaluation.
  18. This model is based on being able to access: information that is both human and machine accessible; and consistent preservation metadata which has been recorded and maintained for every digital object in the repository. There will also need to be consistent, specific M/D for particular format types (if identified). A summary of this information, which can be grouped into defined sets of managed content (e.g. collections), needs to be readily available.
  19. To start to build a system which looks at pres intent, significance, level of support and other risk metrics, we need to be careful about the level of detail in our system: is it best to have relative indicators, or will we drown in the detail? Having said this, we require: relevant information on formats and versions in our system; and relevant information on software, versions and dependencies that can access particular formats in our system. To be able to build relationships between formats and software (specifically, what can open or open/edit a format), we need to know: What software is available? Do I have it? What is the external and internal support? What is the proximity of the software to the format (e.g. was this software made for this format, or is it generic software)? We also need to: take into account Pres Intent and Significance (does it matter?); and use other risk indicators carefully, in a measured and meaningful way. Reporting on level of support: based on these relationships, we can determine if we can maintain access to the format and what priority (if any) should be given to its treatment.
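To make the questions above concrete, here is a minimal Python sketch of a format-to-software relationship and a relative level-of-support indicator derived from it. It is a sketch under stated assumptions, not the prototype itself: `SoftwareLink`, `level_of_support`, the software names and the four labels ("none", "weak", "partial", "strong") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareLink:
    """One software-to-format relationship (hypothetical record shape)."""
    software: str
    format_name: str
    capability: str             # e.g. "open" or "open/edit"
    installed: bool             # "Do I have it?"
    vendor_supported: bool      # external support
    internally_supported: bool  # internal support
    purpose_built: bool         # proximity: made for this format, not generic


def level_of_support(links, format_name):
    """Return a relative indicator; the labels and rules are assumptions."""
    usable = [l for l in links if l.format_name == format_name and l.installed]
    if not usable:
        return "none"
    if any(l.vendor_supported and l.internally_supported and l.purpose_built
           for l in usable):
        return "strong"
    if any(l.vendor_supported or l.internally_supported for l in usable):
        return "partial"
    return "weak"


# Hypothetical entries; none of these describe real NLA holdings.
links = [
    SoftwareLink("EditorA", "TIFF", "open/edit", True, True, True, True),
    SoftwareLink("ViewerB", "PDF", "open", True, False, True, False),
    SoftwareLink("ViewerC", "FormatX", "open", False, True, False, False),
]
```

A coarse label like this is one way to stay with relative indicators and avoid drowning in detail; a real system would also fold in pres intent and significance.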
  20. There are other parts of this system which we have not prototyped. However, these will need to be built as a part of the new system. These are: preservation monitoring, reporting and prioritisation; preservation options and preservation action planning; and preservation action evaluation.
  21. So I will describe this part of the ecology within the red box.
  22. These are the preservation intent statements which we have currently compiled.
  23. We started with an agreed statement of Pres Intent for each collection. This was divided into a number of parts: context of the collection and what they collect; the Preservation Intent of the collection for their identified material; identified collecting issues/limitations in how the material is collected; and other issues which may affect preservation.
  24. We then started to look at how we might systematise this info at a high level. This raised some very interesting questions about vocabulary and granularity. This partially worked but not to my satisfaction.
  25. This table summarises the previous screen. For example, the fields can be characterised as: Owner; Description of material; Intent: preserve (yes/no), time, what aspects (e.g. view, edit, navigate, manipulate content); Responsibility/Authorisation; Detailed Notes. Interestingly, the collections tended to view their material based on ingest workflows and catalogue-level records, not files or formats.
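The fields listed above could be captured as a structured record with a small controlled vocabulary for the aspects. This is only a sketch of one possible shape: the class name `PreservationIntent`, the field names and the sample values are assumptions, and the aspect vocabulary is taken from the examples in the talk.

```python
from dataclasses import dataclass

# Aspect vocabulary taken from the examples in the talk.
ASPECTS = frozenset({"view", "edit", "navigate", "manipulate"})


@dataclass
class PreservationIntent:
    """One preservation intent statement (hypothetical record shape)."""
    owner: str
    description: str
    preserve: bool            # preserve: yes/no
    time: str                 # free text, e.g. "in perpetuity"
    aspects: frozenset        # which aspects matter to the owner
    responsibility: str
    notes: str = ""

    def __post_init__(self):
        # Reject aspects outside the controlled vocabulary.
        unknown = set(self.aspects) - ASPECTS
        if unknown:
            raise ValueError(f"unknown aspects: {sorted(unknown)}")


# Hypothetical example statement.
intent = PreservationIntent(
    owner="Collection A",
    description="Digitised images",
    preserve=True,
    time="in perpetuity",
    aspects=frozenset({"view", "navigate"}),
    responsibility="Collection A manager",
)
```

Validating the aspects at creation time is one way to address the vocabulary question; it does not settle the granularity question of workflows versus files and formats.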
  26. We have a slightly different view on how to describe their material. We tend to think about it in terms of files and formats, not workflows. However, resolving this in a systematic way will be a job for our next system.
  27. Now I will describe this part of the ecology within the red box. We have prototyped some of these systems, which I will briefly show you.
  28. This is the File Format home page. It contains formats that are relevant to our environment. The levels of abstraction are format family, followed by versions.
  29. If we take a look at the entry for TIFF: we spent a lot of time thinking about description headings for the free text (what makes a sustainable format). This is subjective information, but helpful. We also have a controlled vocab which is integrated with the text field. We currently have a staff member working full time populating these fields for 6 months, and hopefully longer.
  30. On the version page we have listed the software that can be used to identify this format (we could have many). This info will be linked back to format M/D summaries from the repository. You can also see the relationship to our software registered in the system; this is expressed by the vocab (open, open/edit and transcode). We could list other info as required.
  31. This is the Software home page. It lists software relevant to our environment. We have not concentrated on this.
  32. If we choose software like Photoshop CS 5, we have a vocab and free text descriptions. We see: version releases; plug-ins; support levels at the NLA; support levels from the vendor; software and hardware requirements; etc. We came to the conclusion that, in a relative system, major releases are what we record.
  33. And the most important aspect for establishing the level of support is the format-to-software relationship. The summary list created by these two knowledgebases shows the list of formats that this software can access. These relationships can be used in other knowledgebases and to run reports on access configurations, possible migration paths, and ‘what if’ scenarios.
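A ‘what if’ scenario over these relationships can be as simple as asking which formats would lose all access if some software were retired. The following Python sketch assumes the relationships are available as (software, format) pairs; the function names and the sample software names are hypothetical.

```python
def accessible_formats(links, retired=frozenset()):
    """Formats that at least one non-retired software can still access."""
    return {fmt for software, fmt in links if software not in retired}


def formats_losing_access(links, retired):
    """Formats that become inaccessible once `retired` software is removed."""
    return accessible_formats(links) - accessible_formats(links, retired)


# Hypothetical relationships: (software, format it can access).
links = [
    ("EditorA", "TIFF"),
    ("EditorA", "PDF"),
    ("ViewerB", "PDF"),
]
lost = formats_losing_access(links, {"EditorA"})
```

Here retiring `EditorA` leaves PDF covered by `ViewerB`, so only TIFF appears in `lost`.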
  34. Another important part of our future system will be the level of support and prioritisation KBs
  35. As stated, the key to level of support is the relationship between a format and many software instances. However, we could only build that part of this system because, at this time, we: currently cannot get consistent info for all files from the repository (except No. of files by collection); can only connect to text fields on the pres intent screen; have no consistent significance info in the system; and have no systematised risk metrics in our current system. This will change soon!
  36. Another way of looking at the level of support to give us access risk could be: the overall level of support by format (including vendor support, internal support and proximity); other risk metrics (e.g. has it been deemed obsolete in the outside world?); the number of files affected; the preservation intent, by collection; and any significance info.
  37. Prioritisation of treatment could be based on a summary of all the previous fields, including: any constraints imposed by rights policies or agreements; and the amount of resources available. This summary could give the NLA collections and management the information that they require to prioritise what they want preserved.
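One simple way to turn such a summary into an ordered list is to keep only material with an intent to preserve, then sort by weakest support first and largest number of files next. This is a sketch of that ordering under assumptions, not a recommended policy: the `Candidate` fields, the support labels and the sample rows are hypothetical, and rights constraints and resources are left out.

```python
from dataclasses import dataclass

# Assumed ordering of relative support labels, weakest first.
SUPPORT_ORDER = {"none": 0, "weak": 1, "partial": 2, "strong": 3}


@dataclass(frozen=True)
class Candidate:
    """Summary row for one format within one collection (hypothetical)."""
    collection: str
    format_name: str
    support: str       # one of the SUPPORT_ORDER keys
    file_count: int
    preserve: bool     # from the collection's preservation intent


def prioritise(candidates):
    """Order candidates: weakest support first, then most files affected."""
    wanted = [c for c in candidates if c.preserve]
    return sorted(wanted,
                  key=lambda c: (SUPPORT_ORDER[c.support], -c.file_count))


# Hypothetical summary rows.
candidates = [
    Candidate("Collection A", "TIFF", "strong", 500_000, True),
    Candidate("Collection B", "FormatX", "none", 1_000, True),
    Candidate("Collection C", "FormatY", "weak", 20_000, True),
    Candidate("Collection D", "FormatZ", "none", 50, False),
]
ranked = [c.format_name for c in prioritise(candidates)]
```

The output is only a starting point for discussion with the collections and management; it does not replace their decision.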
  38. We have a number of other modules in this model. For example, Pres options & action planning
  39. We would like to know what options we can support. For example: report on relationships between specific options which have been linked to specific formats (e.g. migration); report on specific software in our KB which is noted as being relevant for specific preservation actions (for example, tell us all the software we have which can access X and is registered as a migration path); and link to other information available through other linked data sources.
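The example query above ("all the software we have which can access X and is registered as a migration path") could look like the following Python sketch. It assumes two hypothetical inputs: `access`, mapping software to the formats it can access, and `migration_paths`, a list of (software, source format, target format) registrations. All names and sample data are invented for illustration.

```python
def migration_software(access, migration_paths, source_format):
    """Software that can access `source_format` and has a registered
    migration path starting from it, in alphabetical order."""
    return sorted({
        software
        for software, source, _target in migration_paths
        if source == source_format
        and source_format in access.get(software, set())
    })


# Hypothetical knowledgebase content.
access = {
    "ConverterA": {"TIFF", "PDF"},
    "ViewerB": {"TIFF"},
    "ConverterC": {"PDF"},
}
migration_paths = [
    ("ConverterA", "TIFF", "FormatY"),
    ("ConverterC", "TIFF", "PDF"),   # registered, but cannot access TIFF
]
result = migration_software(access, migration_paths, "TIFF")
```

`ViewerB` is excluded because it has no migration path registered, and `ConverterC` because it cannot access the source format.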
  40. The part of the system that looks at generating options should also: enable staff to define, approve and prioritise preservation action plans for sets of managed content; and support preservation action plans which include: multiple steps, combining manual and automated workflows; replacing files and linkages within a complex object; linking to a specific emulation environment; replacing existing software to change the level of support; and specifying the action ‘no action is required’. It should also be able to support simulating changes to the environment.
  41. And finally, pres options evaluation
  42. Ultimately, we want to be able to tell if what we have planned is any good before we start any processing in the repository that could take some time.
  43. Currently, these ideas and requirements have become ‘partially real’ (almost like ‘Mostly Dead’ from the movie The Princess Bride). They still need to be implemented. They formed the basis for the preservation requirements in a subsequent: RFP (Request for Proposal) process; and RFT (Request for Tender) process.
  44. RFP: went to market July 2011. A number of responses were received for: Core systems; Preservation; Digitisation; Other Workflows. Select vendors were invited to participate in the next stage.
  45. RFT: closed at the end of Dec 2011.
  46. So which version of reality have we decided upon? The evaluation report has recommended that the Library proceed to contract negotiations with selected tenderers for each scope of work. Currently, the Library is preparing a submission for ministerial approval prior to commencement of contract negotiations with vendors. Thanks for your time.