A presentation by Geof Huth on the PDF/A preservation format, given at a meeting of the Mid-Atlantic Regional Archives Conference in Bethlehem, Pennsylvania, on October 21, 2011. It places PDF/A in the context of a digital preservation program, explains what the format is used for, and covers some details of this international standard.
2. +
File Format Confusion
From 5,000 to 15,000 extant file formats
Most are proprietary
The numbers add complexity to preservation
Real preservation formats are few in number
And we cannot fully count on any of them
3. +
Two General Classes of Formats
Proprietary
Controlled by one company
Underlying code is a trade secret
If the company goes under, the file format becomes obsolete
Open
Controlled by a standards body, a consortium, or a wiki-like body
Code is free and open to all
In absence of an “owner,” can still use the code to make a reader
Neither Guarantees Preservation
But open formats give you an opening to preservation
4. +
Proprietary Formats
Tend to be rich in features
Limited readers for each format
Limited ability to exchange data
Difficult for long-term accessibility
Greater associated costs
5. +
Advantages of Open Formats
More choice in what application to use
Better exchange of data
Better support of long-term preservation
Possible lower costs
Ability to create own readers
6. +
Format/Software Confusion
Software
Creates a file in the format
Reads the file for you
Allows you to interact with the file
Format
Is the specific technical form in which a certain file exists
Can be created by one software product or many
Examples
Adobe Acrobat (and many others) vs PDF
Microsoft Word vs .doc (and .docx, etc.)
7. +
Criteria for Preservation Formats (and Files)
Ubiquitous
Long-lived
Documented
Metadata-supporting
Accurate
Open
Uncompressed
Unencrypted
8. +
When to Use a Preservation Format
Creation
Begin with a format you know will last
If you do, choose a format that still allows the file to be modified
Recordation
When information becomes a record, save it in a chosen format
This freezes the file and demonstrates it is a record
Archiving
Convert to persistent formats those records needed long-term
The conversion preserves the records and marks them as permanent
Early Action Can Save Money and Time
9. +
Normalization
(action at the point of archiving)
Conversion to a format
Not expected to change
Not expected to disappear
Not expected to become unreadable
Usually conversion to a different format from original
Generally how preservation formats are used
Still, may cause data loss or corruption
10. +
Options for Preservation of Text
American Standard Code for Information Interchange (ASCII)
Unicode
Portable Document Format / Archive (PDF/A)
Extensible Markup Language (XML)
OpenDocument Format (ODF) (ISO/IEC 26300:2006)
Office Open XML (OOXML) (ISO/IEC 29500:2008)
11. +
What is Portable Document Format?
Originally developed by Adobe in 1991
Specifications made available for free in 2001
Format made an open international standard in 2008
Includes text and image features
12. +
Advantages of PDF
Has accessibility across platforms
Saves look and searchability of original
Embeds fonts (if desired)
Allows copying of text from files
Remains fairly stable and universal
Is difficult to modify
Has enhanced document security
Supports authenticity
13. +
Disadvantages of PDF
Won’t always perfectly represent original
Some files are more difficult to convert
Some formatting may be lost if saved back to original file format
Limited ability to modify
A complex format saving image and text
Tends to be larger than a word processing document
14. +
PDF’s Advantage over Others
Image and text in one bundle
Intelligent text
Accepts importance of format to meaning
Ubiquity of format and readers
15. +
Conversion Practices
Have necessary fonts installed
Ensure lossless compression
Important for embedded images
When converting PDF to PDF/A
Eliminate prohibited features
Check beforehand or fix during
16. +
Flavors of the PDF Standard
PDF (vanilla)
PDF/A (for archival preservation)
PDF/X (for publishing)
PDF/E (for engineering drawings)
PDF/VT (for variable data and transactional printing)
PDF/UA (for accessibility—in development)
PDF/H (for healthcare records—a guide, not a standard)
GeoPDF (for geospatial records—only based on standards)
17. +
Portable Document Format / Archive Standards
PDF/A-1
ISO Standard 19005-1:2005
Based on PDF Reference 1.4 (Acrobat 5)
PDF/A-2
ISO Standard 19005-2:2011
Based on PDF Reference 1.7
Published 20 June 2011
New versions of PDF/A expected
18. +
Uses of PDF/A
Standard textual documents
Paper documents
Word-processing and PDF documents
Sequences of related digital images
Documents where appearance matters
Static documents
19. +
Less Appropriate for PDF/A
Webpages
Databases
Spreadsheets
Dynamic documents
20. +
Creating PDF/As
Need a product that can produce one
Like Adobe Acrobat 8 Professional
Can convert documents individually
Opening and converting one at a time
Can use batch processing
Converting multiple documents at once
Supported by Acrobat 8
21. +
General Goals of PDF/A
Specifies limited stable set of features
To ensure long-term validity
Eliminate features that are not “archival”
An open preservation standard
Format designed to be a preservation standard
22. +
Required in PDF/A
All fonts embedded
Unlimited legal use of embedded fonts
Device-independent color
Metadata describing the file
File must self-identify the PDF/A version
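The self-identification requirement can be illustrated with a small sketch. To the best of my understanding, a PDF/A file declares its version in its XMP metadata through two properties, `pdfaid:part` and `pdfaid:conformance`, in the namespace `http://www.aiim.org/pdfa/ns/id/`. The function name and the sample packet below are illustrative assumptions, and the sketch reads only an XMP packet that has already been extracted from a file; it does not open a PDF or validate it.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def read_pdfa_identification(xmp_xml):
    """Return (part, conformance) declared in an XMP packet, or None.

    Hypothetical helper: XMP may write a property either as an attribute
    of rdf:Description or as a child element, so both forms are checked.
    """
    root = ET.fromstring(xmp_xml)
    found = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        for name in ("part", "conformance"):
            key = f"{{{PDFAID_NS}}}{name}"
            if key in description.attrib:
                found[name] = description.attrib[key]
            else:
                child = description.find(key)
                if child is not None and child.text:
                    found[name] = child.text.strip()
    if "part" in found and "conformance" in found:
        return found["part"], found["conformance"]
    return None


# Made-up sample packet declaring PDF/A-1, Level B.
SAMPLE_XMP = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
    <pdfaid:part>1</pdfaid:part>
    <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
</rdf:RDF>"""

print(read_pdfa_identification(SAMPLE_XMP))  # ('1', 'B')
```

A declared version is only a claim by the file; whether the file actually conforms still has to be checked with a validation tool.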
23. +
Excluded from PDF/A-1
Audio and video content
JavaScript and executable files
Encryption
LZW and JPEG 2000 image compression
Reference to outside content
Transparency
Embedded files
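The exclusion list above can be treated as a pre-conversion checklist. The following is a minimal sketch under stated assumptions: the feature labels are hypothetical shorthand for the items on this slide, and the caller is assumed to know already which features a file uses. The code does not inspect a PDF and is not a substitute for a validator.

```python
# Hypothetical labels mapped to the exclusions listed on the slide.
PDFA1_EXCLUDED = {
    "audio_video": "Audio and video content",
    "javascript": "JavaScript and executable files",
    "encryption": "Encryption",
    "lzw_compression": "LZW image compression",
    "jpeg2000_compression": "JPEG 2000 image compression",
    "external_references": "Reference to outside content",
    "transparency": "Transparency",
    "embedded_files": "Embedded files",
}


def pdfa1_problems(features_in_file):
    """Return the slide's exclusions that apply to a set of feature labels."""
    return [
        description
        for label, description in PDFA1_EXCLUDED.items()
        if label in features_in_file
    ]


# A file using transparency and encryption; embedded fonts are not excluded.
print(pdfa1_problems({"transparency", "embedded_fonts", "encryption"}))
# ['Encryption', 'Transparency']
```

Each item returned is something to eliminate or fix before or during conversion to PDF/A-1.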
24. +
Differences in PDF/A-2
Allows embedding of OpenType fonts
Allows JPEG 2000 image compression
Supports transparent objects
Supports layers, which can be hidden for viewing
Defines use of digital signatures
Defines rules via PDF Advanced Electronic Signatures (PAdES)
Specifies requirements for custom XMP metadata
Allows embedded files, but in only one context
In a PDF/A-2 you can embed PDF/A files
Allows creation of sets of documents in a single file (e.g. emails)
All PDF/A-1s are compliant with PDF/A-2 standard
PDF/A-2 is an extension of PDF/A-1
25. +
PDF/A-1 Conformance Levels
PDF/A-1, Level A (full compliance)
Preserves document’s logical structure
Preserves text stream in reading order
Requires language specification
Requires Unicode mapping
PDF/A-1, Level B (minimal compliance)
Preserves visual appearance
Doesn’t require as much descriptive info
Less “accessible” format
26. +
Flavors of PDF/A
PDF/A-1a (a = accessible)
RGB Color
CMYK Color
PDF/A-1b (b = basic)
Same color choices
PDF/A-2a (extension of A-1a)
PDF/A-2b (extension of A-1b)
PDF/A-2u (u = Unicode)
Must use Unicode
Does not require representation of logical structure
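The naming scheme for these flavors (a part number followed by a conformance letter) can be sketched as a small parser. The function name and the one-line summaries are illustrative assumptions drawn from the slides above; the sketch covers only the five flavors listed here.

```python
import re

# Short summaries paraphrased from the slides; not wording from the standard.
LEVEL_SUMMARY = {
    "a": "accessible: logical structure, reading order, language, Unicode mapping",
    "b": "basic: preserves visual appearance",
    "u": "Unicode required, logical structure not required",
}


def parse_pdfa_flavor(name):
    """Split a flavor name such as 'PDF/A-2u' into (part, level, summary)."""
    match = re.fullmatch(r"PDF/A-([12])([abu])", name.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a flavor covered by this sketch: {name!r}")
    part = int(match.group(1))
    level = match.group(2).lower()
    if part == 1 and level == "u":
        # The slides list level u only under PDF/A-2.
        raise ValueError("level u is not defined for PDF/A-1")
    return part, level, LEVEL_SUMMARY[level]


print(parse_pdfa_flavor("PDF/A-2u"))
# (2, 'u', 'Unicode required, logical structure not required')
```

Rejecting "PDF/A-1u" reflects the fact that the u level was introduced with PDF/A-2.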
28. +
PDF/A Validation Tools
Adobe Acrobat Preflight Function (www.adobe.com)
Callas Software pdfaPilot (www.callassoftware.com)
PDF Tools AG's 3-Heights PDF Validator (www.pdf-tools.com)
29. +
Formats are Not Everything
Preservation Programs Require Work
Conversion procedures
Quality control
Version control
Environmental controls
Metadata creation and maintenance
Metadata about the records and their information
Metadata about your preservation actions
Data management controls (backups, etc.)
Ensuring that chosen normalized formats are still valid
Vigilance
Editor's notes
JPEG 2000 compression was introduced after the release of the PDF/A-1 standard
Transparency was not defined well enough by the time of the PDF/A-1 standard
Transparency is found in drop shadows, cross fades, and highlighting
Layers allow parts of maps and engineering drawings to be hidden to help the viewer see the data better