Ensuring Data Integrity in a Digital Preservation Archive
                    Future Perfect Conference 2012: Digital Preservation by Design (March 2012)



Abstract

This paper discusses the challenges of, and some working solutions to, a key requirement of digital preservation: ongoing data integrity of the archive. The solutions were developed cooperatively by three vendors in conjunction with the Church of Jesus Christ of Latter-day Saints. State-of-the-art, in-drive data validation plays a key role in ensuring ongoing data integrity.

Introduction to the Church of Jesus Christ of Latter-day Saints

The Church of Jesus Christ of Latter-day Saints is a worldwide Christian church with more than 14.4 million members and 28,784 congregations. With headquarters in Salt Lake City, Utah (USA), the Church operates 136 temples, three universities, a business college, and thousands of seminaries and institutes of religion around the world that enroll more than 700,000 students in religious training.

The Church has a scriptural mandate to keep records of its proceedings and preserve them for future generations. Accordingly, the Church has been creating and keeping records since 1830, when it was organized. A Church Historian’s Office was formed in the 1840s, and in 1972 it was renamed the Church History Department.

Digital Preservation and the Church History Department

Today, the Church History Department has ultimate responsibility for preserving records of enduring value that originate from its ecclesiastical leaders and within the various Church departments, the Church’s educational institutions, and its affiliations.

To fulfill its responsibility, the Church History Department has implemented a Digital Records Preservation System (DRPS) that is based on Ex Libris Rosetta.

Rosetta provides configurable preservation workflows and advanced preservation planning functions, but only writes a single copy of an Archival Information Package (AIP) [1] to a storage device for permanent storage. An appropriate storage layer must be integrated with Rosetta in order to provide the full capabilities of a digital preservation archive, including AIP replication.

After investigating a host of potential storage layer solutions, the Church History Department chose NetApp StorageGRID to provide the Information Lifecycle Management (ILM) capabilities that were desired. In particular, StorageGRID’s data integrity, data resilience, and data replication capabilities were attractive.

In order to support ILM migration of AIPs from disk to tape, StorageGRID utilizes IBM Tivoli Storage Manager (TSM) as an interface to tape libraries.

DRPS also employs software extensions developed by Church Information and Communications Services (shown in the red boxes below and described later).

[Figure: Architecture of the Church History Department’s Digital Records Preservation System (DRPS)]

Within a decade, the Church anticipates that it will have generated a cumulative archival capacity of more than 100 petabytes for a single copy of AIPs. Therefore, the total cost of ownership of DRPS archival storage must be minimized.

An internal study showed that the total cost of automated tape cartridges would be a third of the corresponding cost of disk arrays. Therefore, the Church History Department currently uses automated tape libraries for DRPS archival storage.

Internal departments of the Church generate multiple petabytes of records annually. Record format types range from documents and images to videos of the birth, life, death, and resurrection of the Lord Jesus Christ that were given to the world as a free gift last Christmas by the Church (available at biblevideos.lds.org).

[Figure: Nativity scene from biblevideos.lds.org]

Each department has a detailed records management plan that specifies which of its collections are appraised as having enduring value. Typically, less than a tenth of the collections are targeted for preservation. In the future, selected Church websites will also be preserved. In addition, a multi-petabyte backlog of audiovisual collections is presently being ingested into DRPS.

Data Corruption in a Digital Preservation Archive

A critical requirement of a digital preservation system is the ability to continuously ensure data integrity of its archive. This requirement differentiates a tape archive from other tape farms.

Modern IT equipment, including servers, storage, network switches, and routers, incorporates advanced features to minimize data corruption. Nevertheless, undetected errors still occur for a variety of reasons.

Whenever data files are written, read, stored, transmitted over a network, or processed, there is a small but real possibility that corruption will occur. Causes range from hardware and software failures to network transmission failures and interruptions. Bit flips (also called bit rot) within data stored on tape also cause data corruption.
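To give a sense of scale, the expected number of flipped bits can be estimated by multiplying an archive's size in bits by its bit error rate. The sketch below is only back-of-the-envelope arithmetic: it assumes decimal petabytes (10^15 bytes) and a uniform error rate, and the function name is illustrative. The sample rates are the ones this paper reports for the DRPS and USC Shoah Foundation tape archives.

```python
PETABYTE_BYTES = 10**15  # assumption: decimal petabytes


def expected_bit_errors(capacity_pb: float, bit_error_rate: float) -> float:
    """Expected number of corrupted bits in an archive of the given capacity."""
    total_bits = capacity_pb * PETABYTE_BYTES * 8
    return total_bits * bit_error_rate


if __name__ == "__main__":
    # USC Shoah Foundation rate over 8 PB of archive capacity
    print(round(expected_bit_errors(8, 2.3e-14)))
    # DRPS validation-run rate applied to the projected 100 PB archive
    print(round(expected_bit_errors(100, 3.3e-14)))
```

Under these assumptions the first figure comes to roughly 1,470 bits, in line with the approximately 1500 bits per 8 petabytes quoted for the USC Shoah Foundation archive, and the second to roughly 26,400 bits for a 100-petabyte archive.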
Recently, data integrity of the entire DRPS tape archive was validated. This validation run encountered a 3.3 × 10^-14 bit error rate.

The USC Shoah Foundation Institute for Visual History and Education has observed a 2.3 × 10^-14 bit error rate within its tape archive, which required the preservation team to flip back 1500 bits per 8 petabytes of archive capacity. [2]

These real-life measurements (one taken from a large archive and the other from a relatively small archive) provide a credible estimation of the amount of data corruption that will occur in a digital preservation tape archive. Therefore, working solutions must be implemented to detect and correct these bit errors.

DRPS Solutions to Data Corruption

In order to continuously ensure data integrity of its tape archive, DRPS employs fixity information.

Fixity information is a checksum (i.e., integrity value) calculated by a secure hash algorithm to ensure data integrity of an AIP file throughout preservation workflows and after the file has been written to the archive.

By comparing fixity values before and after records are written, transferred across a network, moved, or copied, DRPS can determine if data corruption has taken place during the workflow or while the AIP is stored in the archive. DRPS uses a variety of hash values, cyclic redundancy check values, and error-correcting codes for such fixity information.

In order to implement fixity information as early as possible in the preservation process, and thus minimize data errors, DRPS provides ingest tools developed by Church Information and Communications Services (ICS) that create SHA-1 fixity information for producer files before they are transferred to DRPS for ingest (see the DRPS architecture shown previously).

Within Rosetta, SHA-1 fixity checks are performed three times: (i) when the deposit server receives a Submission Information Package (SIP) [1], (ii) during the SIP validation process, and (iii) when a file is moved to permanent storage.

Rosetta also provides the capability to perform fixity checks on files after they have been written to permanent storage, but the ILM features of StorageGRID do not utilize this capability. Therefore, StorageGRID must take over control of the fixity information once files have been ingested into the grid.

By collaborating on this process, ICS and Ex Libris have successfully implemented the handoff of fixity information from Rosetta to StorageGRID.

This is accomplished with a web service developed by ICS that retrieves SHA-1 hash values generated independently by StorageGRID when the files are written to the StorageGRID gateway node. Ex Libris developed a Rosetta plug-in that calls this web service and compares the StorageGRID SHA-1 hash values with those in the Rosetta database, which are known to be correct.

Turning now to the storage layer of DRPS, StorageGRID is constructed around the concept of object storage. To ensure object data integrity, StorageGRID provides a layered and overlapping set of protection domains that guard against data corruption and alteration of files that are written to the grid.
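The fixity comparison described in this section (compute a SHA-1 value when a file is produced, recompute it after each transfer or move, and compare the two) can be sketched in a few lines. This is a minimal illustration built on Python's standard hashlib module, not the DRPS ingest tools or the Rosetta plug-in; the function names, file name, and chunk size are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_matches(path, recorded_digest):
    """Compare a freshly computed digest with the recorded fixity value."""
    return sha1_of_file(path) == recorded_digest.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "producer_file.bin")
        with open(path, "wb") as f:
            f.write(b"record of enduring value")

        recorded = sha1_of_file(path)          # fixity created before transfer
        print(fixity_matches(path, recorded))  # True: file is unchanged

        # Simulate a single bit flip in the first byte of the stored file.
        with open(path, "r+b") as f:
            first = f.read(1)
            f.seek(0)
            f.write(bytes([first[0] ^ 0x01]))
        print(fixity_matches(path, recorded))  # False: corruption detected
```

Reading in chunks keeps memory use constant regardless of file size, which matters for large audiovisual files.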
The highest level domain utilizes the SHA-1 fixity information discussed above.

A SHA-1 hash value is generated for each AIP (or object) that Rosetta writes to permanent storage (i.e., to StorageGRID). Also called the Object Hash, the SHA-1 hash value is self-contained and requires no external information for verification.

Each object contains a SHA-1 object hash of the StorageGRID formatted data that comprise the object. The object hash is generated when the object is created (i.e., when the gateway node writes it to the first storage node).

To assure data integrity, the object hash is verified every time the object is stored and accessed. Furthermore, a background verification process uses the SHA-1 object hash to verify that the object, while stored on disk, has neither become corrupt nor has been altered by tampering.

Underneath the SHA-1 object hash domain, StorageGRID also generates a Content Hash when the object is created. Since objects consist of AIP content data plus StorageGRID metadata, the content hash provides additional protection for AIP content files.

Because the content hash is not self-contained, it requires external information for verification, and therefore is checked only when the object is accessed.

Each StorageGRID object has a third and fourth domain of data protection applied, and two different types of protection are utilized.

First, a cyclic redundancy check (CRC) checksum is added that can be quickly computed to verify that the object has not been corrupted or accidentally altered. This CRC provides a verification process that minimizes resource use, but is not secure against deliberate alteration.

Second, a key-based hash value is appended. This value can be verified using the key that is stored as part of the metadata managed by StorageGRID. Although this hash value takes more resources to implement than the CRC checksum described above, it is secure against all forms of tampering as long as the key is protected.

The CRC checksum is verified during every StorageGRID object operation, i.e., store, retrieve, transmit, receive, access, and background verification. But, as with the object hash, the key-based hash value is only verified when the object is accessed.

Once a file has been correctly written to a StorageGRID storage node (i.e., its data integrity has been ensured through both SHA-1 object hash and CRC fixity checks), StorageGRID invokes the TSM Client running on the archive node server in order to write the file to tape.

As this happens, the SHA-1 (object hash) fixity information is not handed off to TSM. Rather, it is superseded with new fixity information composed of various cyclic redundancy check values and error-correcting codes that provide TSM end-to-end logical block protection when writing the file to tape.

Thus the DRPS fixity information chain of control is altered when StorageGRID invokes TSM. Nevertheless, validation of the file’s data integrity continues seamlessly until it is written to tape.

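The difference between the CRC checksum and the key-based hash value described above can be illustrated with Python's standard library: a CRC is cheap to compute, but anyone who alters the data can recompute it, whereas a keyed hash cannot be recomputed without the key. The sketch uses CRC-32 and HMAC-SHA-256 purely as stand-ins, since this paper does not say which CRC or key-based hash StorageGRID uses; the key, payloads, and function names are made up for the example.

```python
import hashlib
import hmac
import zlib


def crc_checksum(data: bytes) -> int:
    """Fast checksum that detects accidental corruption."""
    return zlib.crc32(data)


def keyed_hash(data: bytes, key: bytes) -> bytes:
    """Key-based hash; it cannot be recomputed without the key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(data: bytes, crc: int, tag: bytes, key: bytes):
    """Return (crc_ok, keyed_hash_ok) for the given data."""
    crc_ok = crc_checksum(data) == crc
    tag_ok = hmac.compare_digest(keyed_hash(data, key), tag)
    return crc_ok, tag_ok


if __name__ == "__main__":
    key = b"hypothetical-grid-key"
    original = b"AIP content"
    crc, tag = crc_checksum(original), keyed_hash(original, key)

    print(verify(original, crc, tag, key))           # (True, True)

    # Accidental corruption: both checks fail.
    print(verify(b"AIP cOntent", crc, tag, key))     # (False, False)

    # Deliberate alteration: the attacker recomputes the CRC but lacks the key.
    forged = b"AIP c0ntent"
    print(verify(forged, crc_checksum(forged), tag, key))  # (True, False)
```

The last case shows why the CRC alone is described as not secure against deliberate alteration.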
The process begins when the TSM server calculates and appends a CRC value to each logical block of the file before transferring it to a tape drive for writing. Each appended CRC is called the “original data CRC” for that logical block.

When the tape drive receives a logical block, it computes its own CRC for the data and compares it to the original data CRC. If an error is detected, a check condition is generated, forcing a re-drive or a permanent error, which effectively guarantees protection of the logical block during transfer.

In addition, as the logical block is loaded into the drive’s main data buffer, two other processes occur:

(1) Data received at the buffer is cycled back through an on-the-fly verifier that once again validates the original data CRC. Any introduced error will again force a re-drive or a permanent error.

(2) In parallel, an error-correcting code (ECC) is computed and appended to the data. Referred to as the “C1 code,” this ECC protects data integrity of the logical block as it goes through additional formatting steps, including the addition of a second ECC, referred to as the “C2 code.”

As part of these formatting steps, the C1 code is checked every time data is read from the data buffer. Thus, protection of the original data CRC is essentially transformed to protection from the more powerful C1 code.

Finally, the data is read from the main buffer and is written to tape using a read-while-write process. During this process, the just-written data is read back from tape and loaded into the main data buffer so the C1 code can be checked once again to verify the written data.

A successful read-while-write operation assures that no data corruption has occurred from the time the file’s logical block was transferred from the TSM server until it is written to tape. And using these ECCs and CRCs, the tape drive can validate logical blocks at full line speed as they are being written!

During a read operation (i.e., when Rosetta accesses an AIP), data is read from the tape and all three codes (C1, C2, and the original data CRC) are decoded and checked. A read error is generated if any process indicates an error.

The original data CRC is then appended to the logical block when it is transferred to the TSM server so it can be independently verified by that server, thus completing the TSM end-to-end logical block protection cycle.

This advanced and highly efficient TSM end-to-end logical block protection is enabled with state-of-the-art functions available with IBM LTO-5 and TS1140 tape drives.

When the TSM server sends the data over the network to a TSM client, CRC checking is done once again to ensure integrity of the data as it is written to the StorageGRID storage node.

From there, StorageGRID fixity checking occurs, as explained previously for object access (including content hash and key-based hash value checking), until the data is transferred to Rosetta for delivery to its requestor, thus completing the DRPS data integrity validation cycle.

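The end-to-end logical block protection described above follows a simple pattern: the sender appends a CRC to each logical block, and each receiver recomputes the CRC and compares it before accepting the block. The sketch below models only that pattern in software. It uses CRC-32 from zlib as a stand-in (the CRC used by TSM and the tape drives is not specified here), the block size and function names are hypothetical, and the C1 and C2 error-correcting codes are not modeled.

```python
import struct
import zlib

BLOCK_SIZE = 256 * 1024  # hypothetical logical block size


def protect_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Sender side: split data into logical blocks and append a 4-byte CRC to each."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield block + struct.pack(">I", zlib.crc32(block))


def check_block(protected: bytes) -> bytes:
    """Receiver side: recompute the CRC and compare it with the appended one."""
    block = protected[:-4]
    (stored_crc,) = struct.unpack(">I", protected[-4:])
    if zlib.crc32(block) != stored_crc:
        raise ValueError("CRC mismatch: logical block must be re-sent")
    return block


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    protected = list(protect_blocks(payload))
    restored = b"".join(check_block(p) for p in protected)
    print(restored == payload)  # True: every block passed its CRC check

    # Flip one bit in transit and show that the receiver rejects the block.
    damaged = bytearray(protected[0])
    damaged[10] ^= 0x01
    try:
        check_block(bytes(damaged))
    except ValueError as exc:
        print(exc)
```

Raising an error on mismatch plays the role of the check condition that forces a re-drive in the real system.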
[Figure: DRPS Data Integrity Validation. Summary of the DRPS data integrity validation cycle]

Fixity in control   Component                     Fixity action
SHA-1               DRPS Ingest Tools             SHA-1 created for producer files
SHA-1               Rosetta                       SHA-1 checked upon ingest and write to permanent storage
                    Storage Extensions            Web service retrieves StorageGRID SHA-1, then Rosetta plug-in compares with Rosetta SHA-1
SHA-1               StorageGRID                   SHA-1 created for ingested files; SHA-1 and other fixity checked during write to storage nodes
CRCs, ECCs          IBM Tivoli Storage Manager    TSM end-to-end logical block protection

Ensuring Ongoing Data Integrity

Unfortunately, continuously ensuring data integrity of a DRPS AIP does not end once the AIP has been written correctly to tape. Periodically, the tape(s) containing the AIP need to be checked to uncover errors (i.e., bit flips) that may have occurred since the AIP was correctly written.

Fortunately, IBM LTO-5 and TS1140 tape drives can perform this check without having to stage the AIP to disk, which is clearly a resource-intensive task, especially for an archive with a capacity measured in hundreds of petabytes!

IBM LTO-5 and TS1140 drives can perform data integrity validation in-drive, which means a drive can read a tape and concurrently check the AIP logical block CRC and ECCs discussed above (C1, C2, and the original data CRC). Good or bad status is reported as soon as these internal checks are completed. And this is done without requiring any other resources!

Clearly, this advanced capability enhances the ability of DRPS to perform periodic data integrity validations of the entire archive more frequently, which will facilitate the correction of bit flips that occur after AIPs are written correctly to tape.

Conclusion

The Church of Jesus Christ of Latter-day Saints is making a substantial investment to continuously ensure data integrity of its DRPS archive as described in this paper.

The benefits of preserving the Church’s exalting and inspiring records cannot be measured in financial terms, however. Those benefits include building character and strengthening families, both of which are designed to foster personal and family happiness.

References

[1] CCSDS 650.0-B-1 Blue Book, “Reference Model for an Open Archival Information System (OAIS),” Consultative Committee for Space Data Systems (2002).
[2] Private conversation with Sam Gustman (CTO) at the USC Shoah Foundation Institute, August 19, 2009.


Más contenido relacionado

Similar a Ensuring Data Integrity white paper

Digital preservation geoscinfo
Digital preservation geoscinfoDigital preservation geoscinfo
Digital preservation geoscinfosmtcd
 
In Search of Simplicity: Redesigning the Digital Bleek and Lloyd
In Search of Simplicity: Redesigning the Digital Bleek and LloydIn Search of Simplicity: Redesigning the Digital Bleek and Lloyd
In Search of Simplicity: Redesigning the Digital Bleek and LloydLighton Phiri
 
Digital library
Digital libraryDigital library
Digital libraryanueldhose
 
Digital library and metadata
Digital library and metadataDigital library and metadata
Digital library and metadataramncsi
 
Digital Library
 Digital Library Digital Library
Digital LibraryShiv Kumar
 
Data Library Services at the University of Edinburgh
Data Library Services at the University of EdinburghData Library Services at the University of Edinburgh
Data Library Services at the University of EdinburghRobin Rice
 
Digital preservation and curation of information.presentation
Digital preservation and curation of information.presentationDigital preservation and curation of information.presentation
Digital preservation and curation of information.presentationPrince Sterling
 
Two day-long training on "DSpace" Institutional Repository
Two day-long training on "DSpace" Institutional RepositoryTwo day-long training on "DSpace" Institutional Repository
Two day-long training on "DSpace" Institutional RepositoryNur Ahammad
 
The Role of OAIS Representation Information in the Digital Curation of Crysta...
The Role of OAIS Representation Information in the Digital Curation of Crysta...The Role of OAIS Representation Information in the Digital Curation of Crysta...
The Role of OAIS Representation Information in the Digital Curation of Crysta...ManjulaPatel
 
Website designing company_in_delhi_digitization practices
Website designing company_in_delhi_digitization practicesWebsite designing company_in_delhi_digitization practices
Website designing company_in_delhi_digitization practicesCss Founder
 
Consortium on Digitization of Indian Agricultural Library Resources
Consortium on Digitization of Indian Agricultural Library  ResourcesConsortium on Digitization of Indian Agricultural Library  Resources
Consortium on Digitization of Indian Agricultural Library ResourcesDevakumar Jain
 
ICTD departmental meeting presentation on repository development
ICTD departmental meeting presentation on repository developmentICTD departmental meeting presentation on repository development
ICTD departmental meeting presentation on repository developmentChris Awre
 
Digital library
Digital libraryDigital library
Digital libraryparvathykj
 
Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...faflrt
 

Similar a Ensuring Data Integrity white paper (20)

Digital Libray
Digital LibrayDigital Libray
Digital Libray
 
Digital preservation geoscinfo
Digital preservation geoscinfoDigital preservation geoscinfo
Digital preservation geoscinfo
 
In Search of Simplicity: Redesigning the Digital Bleek and Lloyd
In Search of Simplicity: Redesigning the Digital Bleek and LloydIn Search of Simplicity: Redesigning the Digital Bleek and Lloyd
In Search of Simplicity: Redesigning the Digital Bleek and Lloyd
 
Digital library
Digital libraryDigital library
Digital library
 
Prometheus
PrometheusPrometheus
Prometheus
 
Prometheus
PrometheusPrometheus
Prometheus
 
Digital library and metadata
Digital library and metadataDigital library and metadata
Digital library and metadata
 
ARTICLE_MEDICI
ARTICLE_MEDICIARTICLE_MEDICI
ARTICLE_MEDICI
 
Digital Library
 Digital Library Digital Library
Digital Library
 
Data Library Services at the University of Edinburgh
Data Library Services at the University of EdinburghData Library Services at the University of Edinburgh
Data Library Services at the University of Edinburgh
 
Digital preservation and curation of information.presentation
Digital preservation and curation of information.presentationDigital preservation and curation of information.presentation
Digital preservation and curation of information.presentation
 
Two day-long training on "DSpace" Institutional Repository
Two day-long training on "DSpace" Institutional RepositoryTwo day-long training on "DSpace" Institutional Repository
Two day-long training on "DSpace" Institutional Repository
 
The Role of OAIS Representation Information in the Digital Curation of Crysta...
The Role of OAIS Representation Information in the Digital Curation of Crysta...The Role of OAIS Representation Information in the Digital Curation of Crysta...
The Role of OAIS Representation Information in the Digital Curation of Crysta...
 
Inroduction to Dspace
Inroduction to DspaceInroduction to Dspace
Inroduction to Dspace
 
Website designing company_in_delhi_digitization practices
Website designing company_in_delhi_digitization practicesWebsite designing company_in_delhi_digitization practices
Website designing company_in_delhi_digitization practices
 
Consortium on Digitization of Indian Agricultural Library Resources
Consortium on Digitization of Indian Agricultural Library  ResourcesConsortium on Digitization of Indian Agricultural Library  Resources
Consortium on Digitization of Indian Agricultural Library Resources
 
Dlindia
DlindiaDlindia
Dlindia
 
ICTD departmental meeting presentation on repository development
ICTD departmental meeting presentation on repository developmentICTD departmental meeting presentation on repository development
ICTD departmental meeting presentation on repository development
 
Digital library
Digital libraryDigital library
Digital library
 
Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...Reference Model for an Open Archival Information Systems (OAIS): Overview and...
Reference Model for an Open Archival Information Systems (OAIS): Overview and...
 

Más de Future Perfect 2012

Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...Kris Carpenter Negulescu Gordon Paynter Archiving the National Web of New Zea...
Ensuring Data Integrity white paper

  • 1. Ensuring Data Integrity in a Digital Preservation Archive
Future Perfect Conference 2012: Digital Preservation by Design (March 2012)

Abstract

This paper discusses the challenges of, and some working solutions to, a key requirement of digital preservation: ongoing data integrity of the archive. The solutions were developed cooperatively by three vendors in conjunction with the Church of Jesus Christ of Latter-day Saints. State-of-the-art, in-drive data validation plays a key role in ensuring ongoing data integrity.

Introduction to the Church of Jesus Christ of Latter-day Saints

The Church of Jesus Christ of Latter-day Saints is a worldwide Christian church with more than 14.4 million members and 28,784 congregations. With headquarters in Salt Lake City, Utah (USA), the Church operates 136 temples, three universities, a business college, and thousands of seminaries and institutes of religion around the world that enroll more than 700,000 students in religious training.

The Church has a scriptural mandate to keep records of its proceedings and preserve them for future generations. Accordingly, the Church has been creating and keeping records since 1830, when it was organized. A Church Historian’s Office was formed in the 1840s, and in 1972 it was renamed the Church History Department.

Digital Preservation and the Church History Department

Today, the Church History Department has ultimate responsibility for preserving records of enduring value that originate from its ecclesiastical leaders and within the various Church departments, the Church’s educational institutions, and its affiliations.

To fulfill its responsibility, the Church History Department has implemented a Digital Records Preservation System (DRPS) that is based on Ex Libris Rosetta.

Rosetta provides configurable preservation workflows and advanced preservation planning functions, but only writes a single copy of an Archival Information Package [1] (AIP) to a storage device for permanent storage. An appropriate storage layer must be integrated with Rosetta in order to provide the full capabilities of a digital preservation archive, including AIP replication.

After investigating a host of potential storage layer solutions, the Church History Department chose NetApp StorageGRID to provide the Information Lifecycle Management (ILM) capabilities that were desired. In particular, StorageGRID’s data integrity, data resilience, and data replication capabilities were attractive.

In order to support ILM migration of AIPs from disk to tape, StorageGRID utilizes IBM Tivoli Storage Manager (TSM) as an interface to tape libraries.
  • 2. DRPS also employs software extensions developed by Church Information and Communications Services (shown in the red boxes below and described later).

[Figure: Architecture of the Church History Department’s Digital Records Preservation System (DRPS)]

Within a decade, the Church anticipates that it will have generated a cumulative archival capacity of more than 100 petabytes for a single copy of AIPs. Therefore, the total cost of ownership of DRPS archival storage must be minimized.

An internal study showed that the total cost of automated tape cartridges would be a third of the corresponding cost of disk arrays. Therefore, the Church History Department currently uses automated tape libraries for DRPS archival storage.

Internal departments of the Church generate multiple petabytes of records annually. Record format types range from documents and images to videos of the birth, life, death, and resurrection of the Lord Jesus Christ that were given to the world as a free gift last Christmas by the Church (available at biblevideos.lds.org).

[Image: Nativity scene from biblevideos.lds.org]

Each department has a detailed records management plan that specifies which of its collections are appraised as having enduring value. Typically, less than a tenth of the collections are targeted for preservation.

In the future, selected Church websites will also be preserved. In addition, a multi-petabyte backlog of audiovisual collections is presently being ingested into DRPS.

Data Corruption in a Digital Preservation Archive

A critical requirement of a digital preservation system is the ability to continuously ensure data integrity of its archive. This requirement differentiates a tape archive from other tape farms.

Modern IT equipment, including servers, storage, network switches, and routers, incorporates advanced features to minimize data corruption. Nevertheless, undetected errors still occur for a variety of reasons.

Whenever data files are written, read, stored, transmitted over a network, or processed, there is a small but real possibility that corruption will occur. Causes range from hardware and software failures to network transmission failures and interruptions. Bit flips (also called bit rot) within data stored on tape also cause data corruption.
  • 3. Recently, data integrity of the entire DRPS tape archive was validated. This validation run encountered a 3.3 × 10^-14 bit error rate.

The USC Shoah Foundation Institute for Visual History and Education has observed a 2.3 × 10^-14 bit error rate within its tape archive, which required the preservation team to flip back 1500 bits per 8 petabytes of archive capacity [2].

These real life measurements (one taken from a large archive and the other from a relatively small archive) provide a credible estimation of the amount of data corruption that will occur in a digital preservation tape archive. Therefore, working solutions must be implemented to detect and correct these bit errors.

DRPS Solutions to Data Corruption

In order to continuously ensure data integrity of its tape archive, DRPS employs fixity information.

Fixity information is a checksum (i.e., integrity value) calculated by a secure hash algorithm to ensure data integrity of an AIP file throughout preservation workflows and after the file has been written to the archive.

By comparing fixity values before and after records are written, transferred across a network, moved or copied, DRPS can determine if data corruption has taken place during the workflow or while the AIP is stored in the archive. DRPS uses a variety of hash values, cyclic redundancy check values, and error-correcting codes for such fixity information.

In order to implement fixity information as early as possible in the preservation process, and thus minimize data errors, DRPS provides ingest tools developed by Church Information and Communications Services (ICS) that create SHA-1 fixity information for producer files before they are transferred to DRPS for ingest (see the DRPS architecture shown previously).

Within Rosetta, SHA-1 fixity checks are performed three times: (i) when the deposit server receives a Submission Information Package [1] (SIP), (ii) during the SIP validation process, and (iii) when a file is moved to permanent storage.

Rosetta also provides the capability to perform fixity checks on files after they have been written to permanent storage, but the ILM features of StorageGRID do not utilize this capability. Therefore, StorageGRID must take over control of the fixity information once files have been ingested into the grid.

By collaborating with Ex Libris on this process, ICS and Ex Libris have been successful in making the fixity information hand off from Rosetta to StorageGRID. This is accomplished with a web service developed by ICS that retrieves SHA-1 hash values generated independently by StorageGRID when the files are written to the StorageGRID gateway node. Ex Libris developed a Rosetta plug-in that calls this web service and compares the StorageGRID SHA-1 hash values with those in the Rosetta database, which are known to be correct.

Turning now to the storage layer of DRPS, StorageGRID is constructed around the concept of object storage. To ensure object data integrity, StorageGRID provides a layered and overlapping set of protection domains that guard against data corruption and alteration of files that are written to the grid.
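The fixity comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, not the ICS web service or the Ex Libris plug-in: the function names `sha1_of_file` and `fixity_matches` are hypothetical, and the recomputed digest simply stands in for a value that a second system would generate independently.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_matches(recorded_digest, path):
    """Compare a previously recorded digest with a freshly computed one."""
    return recorded_digest == sha1_of_file(path)


def demo():
    """Record fixity for a file, then re-check it intact and after a bit flip."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "producer_file.bin")  # hypothetical file
        with open(path, "wb") as handle:
            handle.write(b"record of enduring value" * 1000)

        recorded = sha1_of_file(path)  # fixity created before transfer
        intact_ok = fixity_matches(recorded, path)

        # Simulate a single bit flip in the stored copy.
        with open(path, "r+b") as handle:
            first = handle.read(1)
            handle.seek(0)
            handle.write(bytes([first[0] ^ 0x01]))
        corrupted_ok = fixity_matches(recorded, path)

    return intact_ok, corrupted_ok


if __name__ == "__main__":
    print(demo())
```

Reading in chunks keeps memory use flat for large audiovisual files; the comparison itself is only a string equality test between the recorded and recomputed digests.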
  • 4. The highest level domain utilizes the SHA-1 fixity information discussed above.

A SHA-1 hash value is generated for each AIP (or object) that Rosetta writes to permanent storage (i.e., to StorageGRID). Also called the Object Hash, the SHA-1 hash value is self-contained and requires no external information for verification.

Each object contains a SHA-1 object hash of the StorageGRID formatted data that comprise the object. The object hash is generated when the object is created (i.e., when the gateway node writes it to the first storage node).

To assure data integrity, the object hash is verified every time the object is stored and accessed. Furthermore, a background verification process uses the SHA-1 object hash to verify that the object, while stored on disk, has neither become corrupt nor been altered by tampering.

Underneath the SHA-1 object hash domain, StorageGRID also generates a Content Hash when the object is created. Since objects consist of AIP content data plus StorageGRID metadata, the content hash provides additional protection for AIP content files.

Because the content hash is not self-contained, it requires external information for verification, and therefore is checked only when the object is accessed.

Each StorageGRID object has a third and fourth domain of data protection applied, and two different types of protection are utilized.

First, a cyclic redundancy check (CRC) checksum is added that can be quickly computed to verify that the object has not been corrupted or accidentally altered. This CRC provides a verification process that minimizes resource use, but is not secure against deliberate alteration.

Second, a key-based hash value is appended. This value can be verified using the key that is stored as part of the metadata managed by StorageGRID. Although this hash value takes more resources to implement than the CRC checksum described above, it is secure against all forms of tampering as long as the key is protected.

The CRC checksum is verified during every StorageGRID object operation (i.e., store, retrieve, transmit, receive, access, and background verification). But, as with the object hash, the key-based hash value is only verified when the object is accessed.

Once a file has been correctly written to a StorageGRID storage node (i.e., its data integrity has been ensured through both SHA-1 object hash and CRC fixity checks), StorageGRID invokes the TSM Client running on the archive node server in order to write the file to tape.

As this happens, the SHA-1 (object hash) fixity information is not handed off to TSM. Rather, it is superseded with new fixity information composed of various cyclic redundancy check values and error-correcting codes that provide TSM end-to-end logical block protection when writing the file to tape.

Thus the DRPS fixity information chain of control is altered when StorageGRID invokes TSM. Nevertheless, validation of the file’s data integrity continues seamlessly until it is written to tape.
  • 5. The process begins when the TSM server calculates and appends a CRC value to each logical block of the file before transferring it to a tape drive for writing. Each appended CRC is called the “original data CRC” for that logical block.

When the tape drive receives a logical block, it computes its own CRC for the data and compares it to the original data CRC. If an error is detected, a check condition is generated, forcing a re-drive or a permanent error, effectively guaranteeing protection of the logical block during the transfer.

In addition, as the logical block is loaded into the drive’s main data buffer, two other processes occur:

(1) Data received at the buffer is cycled back through an on-the-fly verifier that once again validates the original data CRC. Any introduced error will again force a re-drive or a permanent error.

(2) In parallel, an error-correcting code (ECC) is computed and appended to the data. Referred to as the “C1 code,” this ECC protects data integrity of the logical block as it goes through additional formatting steps, including the addition of a second ECC, referred to as the “C2 code.”

As part of these formatting steps, the C1 code is checked every time data is read from the data buffer. Thus, protection by the original data CRC is essentially transformed to protection by the more powerful C1 code.

Finally, the data is read from the main buffer and is written to tape using a read-while-write process. During this process, the just written data is read back from tape and loaded into the main data buffer so the C1 code can be checked once again to verify the written data.

A successful read-while-write operation assures that no data corruption has occurred from the time the file’s logical block was transferred from the TSM server until it is written to tape. And using these ECCs and CRCs, the tape drive can validate logical blocks at full line speed as they are being written!

During a read operation (i.e., when Rosetta accesses an AIP), data is read from the tape and all three codes (C1, C2, and the original data CRC) are decoded and checked. A read error is generated if any process indicates an error.

The original data CRC is then appended to the logical block when it is transferred to the TSM server so it can be independently verified by that server, thus completing the TSM end-to-end logical block protection cycle.

This advanced and highly efficient TSM end-to-end logical block protection is enabled with state-of-the-art functions available with IBM LTO-5 and TS1140 tape drives.

When the TSM server sends the data over the network to a TSM client, CRC checking is done once again to ensure integrity of the data as it is written to the StorageGRID storage node.

From there, StorageGRID fixity checking occurs, as explained previously for object access (including content hash and key-based hash value checking), until the data is transferred to Rosetta for delivery to its requestor, thus completing the DRPS data integrity validation cycle.
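The first step of the logical block protection described above (the sender appends a CRC to each logical block, and the receiver recomputes and compares it) can be sketched in Python. This is only an illustration under stated assumptions: CRC-32 from `zlib`, the 256 KiB block size, and the names `frame_blocks`, `receive_block`, and `BlockCheckError` are all hypothetical, and the exception merely stands in for the drive's check condition. The C1 and C2 error-correcting codes are not modeled.

```python
import struct
import zlib

BLOCK_SIZE = 256 * 1024  # hypothetical logical block size


class BlockCheckError(Exception):
    """Stands in for the check condition raised on a CRC mismatch."""


def frame_blocks(data, block_size=BLOCK_SIZE):
    """Sender side: split data into logical blocks and append a CRC to each."""
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield block + struct.pack(">I", zlib.crc32(block))


def receive_block(framed):
    """Receiver side: recompute the CRC and compare it with the appended one."""
    block, trailer = framed[:-4], framed[-4:]
    (original_crc,) = struct.unpack(">I", trailer)
    if zlib.crc32(block) != original_crc:
        raise BlockCheckError("CRC mismatch on logical block")
    return block


def demo():
    payload = bytes(range(256)) * 4096  # 1 MiB of sample data
    framed = list(frame_blocks(payload))

    # Clean transfer: every block verifies and the payload is restored.
    restored = b"".join(receive_block(item) for item in framed)

    # Corrupted transfer: flip one bit in the first block.
    damaged = bytearray(framed[0])
    damaged[10] ^= 0x80
    try:
        receive_block(bytes(damaged))
        detected = False
    except BlockCheckError:
        detected = True

    return restored == payload, len(framed), detected


if __name__ == "__main__":
    print(demo())
```

Because the CRC travels with each block, every hop that handles the block can verify it independently, which is the sense in which the protection is end-to-end.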
  • 6. [Figure: DRPS Data Integrity Validation. Summary of the DRPS data integrity validation cycle: the DRPS ingest tools create SHA-1 values for producer files; SHA-1 is checked upon ingest and upon write to permanent storage; the storage extensions’ web service retrieves the StorageGRID SHA-1, which the Rosetta plug-in compares with the Rosetta SHA-1; StorageGRID creates SHA-1 values for ingested files and checks SHA-1 and other fixity information during writes to storage nodes; IBM Tivoli Storage Manager provides TSM end-to-end logical block protection using CRCs and ECCs.]

Ensuring Ongoing Data Integrity

Unfortunately, continuously ensuring data integrity of a DRPS AIP does not end once the AIP has been written correctly to tape. Periodically, the tape(s) containing the AIP need to be checked to uncover errors (i.e., bit flips) that may have occurred since the AIP was correctly written.

Fortunately, IBM LTO-5 and TS1140 tape drives can perform this check without having to stage the AIP to disk, which is clearly a resource intensive task, especially for an archive with a capacity measured in hundreds of petabytes!

IBM LTO-5 and TS1140 drives can perform data integrity validation in-drive, which means a drive can read a tape and concurrently check the AIP logical block CRC and ECCs discussed above (C1, C2, and the original data CRC). Good or bad status is reported as soon as these internal checks are completed. And this is done without requiring any other resources!

Clearly, this advanced capability enhances the ability of DRPS to perform periodic data integrity validations of the entire archive more frequently, which will facilitate the correction of bit flips that occur after AIPs are written correctly to tape.

Conclusion

The Church of Jesus Christ of Latter-day Saints is making a substantial investment to continuously ensure data integrity of its DRPS archive as described in this paper.

The benefits of preserving the Church’s exalting and inspiring records cannot be measured in financial terms, however. Those benefits include building character and strengthening families, both of which are designed to foster both personal and family happiness.

References

[1] CCSDS 650.0-B-1 Blue Book, “Reference Model for an Open Archival Information System (OAIS),” Consultative Committee for Space Data Systems (2002)
[2] Private conversation with Sam Gustman (CTO) at the USC Shoah Foundation Institute, August 19, 2009