Digital representation of medieval manuscripts and their key elements, ranging from beautiful illuminations to ancient hidden diagrams and texts, poses significant challenges for the application of appropriate technologies that are efficient and useful to scholars. While users and institutions tend to focus on the technologies and their technical capabilities, one of the most significant elements in the development of digital representations of manuscripts is the ability to share and archive digital data for philology, scholarship and preservation research and analysis. Large datasets need to be created and archived with clear storage and access procedures to ensure data integrity and full knowledge of the digital content. Only with common standards, work processes and access can libraries use advanced digitization technologies for the study of medieval manuscripts. Such technologies are being used in institutions ranging from the ancient library of St. Catherine’s Monastery in the Sinai to the Library of Congress, the Walters Art Museum and the University of Pennsylvania Library in the United States. Wherever they are located, each is grappling with the challenges of collecting and preserving digital information from medieval manuscripts and codices for future generations.
These libraries use advanced camera systems to capture high-resolution images of manuscripts. Some of these institutions are also conducting spectral imaging studies of manuscripts, using advanced collection and digital processing to reveal erased information (such as the earliest copies of Archimedes’ diagrams and treatises) without damaging the upper layer of text and artwork. These technologies yield large collections of high-quality digital images for access and study, but the resulting digital counterparts must be effectively stored, managed and preserved to be truly useful for study. Integrating complex sets of digital images and hosting them on the Web for users worldwide poses its own set of challenges.
Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology
Michael B. Toth
R. B. Toth Associates
rbtoth.com
http://www.thedigitalwalters.org/
On behalf of the Walters Art Museum Digitization Team, especially:
Lynley Herbert, Ariel Tabritha, Diane Bockrath, Kimber Wiegand, Doug Emery
Supported by the US National Endowment for the Humanities
2. Walters Art Museum
W.562, 2b
Koran
9th century AH / 15th CE
Walters Art Museum, Baltimore, Maryland
Digital Imaging System
12. Digital Manuscript Challenges
“…an ultimate challenge to creators and users of digital tools wishing to produce useful and reliable digital counterparts to these medieval sources of knowledge and testimonies of intellectual creativity.”
• Complex, Changing Technical Climate
• Range of Digital Products & Formats
• Need for Integrity of Entire Data Set
• Demand for Continual & Faster Access
• User Repurposing of Content
• Restrictions on Access and Use
13. Simplicity of Data
1. Access to data
• By People
• By Machines
2. Licensing
• Global Storage & Access
20. The Digital Walters
Over 10 Terabytes of Data . . . and growing!
                            Islamic     Parchment   Total
No. of Manuscripts          172         107         279
No. of TEI Descriptions     170         37          207
Distinct Images             46,857      34,084      80,941
Image Files                 187,266     134,698     321,964
Data Size                   5.99 TB     4.09 TB     10.08 TB
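The totals in this table can be cross-checked mechanically, the kind of small automated check that helps keep a growing data set consistent. A minimal Python sketch, with the figures copied from the table; the names `COUNTS`, `column_sums_match` and `files_per_image` are illustrative only. It also computes the average number of stored image files per distinct image, which comes out close to, but not exactly, four.

```python
# Figures copied from the Digital Walters table: (Islamic, Parchment, Total).
COUNTS = {
    "manuscripts": (172, 107, 279),
    "tei_descriptions": (170, 37, 207),
    "distinct_images": (46_857, 34_084, 80_941),
    "image_files": (187_266, 134_698, 321_964),
}
DATA_SIZE_TB = (5.99, 4.09, 10.08)


def column_sums_match(counts):
    """Return True if Islamic + Parchment equals Total for every integer row."""
    return all(isl + parch == total for isl, parch, total in counts.values())


def files_per_image(counts):
    """Average number of image files per distinct image, for each column."""
    images = counts["distinct_images"]
    files = counts["image_files"]
    return tuple(round(f / i, 2) for f, i in zip(files, images))


if __name__ == "__main__":
    print(column_sums_match(COUNTS))
    print(files_per_image(COUNTS))
    # Terabyte figures are floats, so compare with a tolerance instead of ==.
    print(abs(DATA_SIZE_TB[0] + DATA_SIZE_TB[1] - DATA_SIZE_TB[2]) < 1e-9)
```

Running the script prints `True`, `(4.0, 3.95, 3.98)` and `True`: every row adds up, and each distinct image is stored as roughly four files.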
21. Data & Metadata
• Long-term data set viability beyond the
lifetime of current technologies
– Adherence to existing broadly accepted
standards
– Simple, flat metadata records
• Integration of metadata with images,
supporting data and scholarly products
22. Cataloging & Metadata
• Metadata Integrated with Digital Object
– Adherence to broadly accepted standards
– Simple, flat metadata records
• Persistent Identifiers
• Accepted Standards
– Standardized Vocabularies
– Metadata Schema
– XML to support conversion to other formats (e.g., MARC, MODS, EAD)
• Document & Preserve Standards
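Keeping records in simple, flat XML makes conversion to other schemas a matter of mapping elements. The sketch below is a deliberately minimal illustration, assuming a flat Dublin Core record wrapped in a hypothetical `<record>` element; it maps only `dc:title` and `dc:identifier` into MODS, broadly in the spirit of the Library of Congress Dublin Core to MODS mapping. A production crosswalk covers many more elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("mods", MODS_NS)


def dc_to_mods(dc_xml: str) -> ET.Element:
    """Map dc:title and dc:identifier from a flat record into a minimal MODS record."""
    source = ET.fromstring(dc_xml)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for title in source.findall(f"{{{DC_NS}}}title"):
        # dc:title becomes mods:titleInfo/mods:title
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title.text
    for ident in source.findall(f"{{{DC_NS}}}identifier"):
        # dc:identifier becomes mods:identifier
        ET.SubElement(mods, f"{{{MODS_NS}}}identifier").text = ident.text
    return mods


# Hypothetical flat record; the <record> wrapper element is an assumption.
SAMPLE = f"""<record xmlns:dc="{DC_NS}">
  <dc:identifier>W.579</dc:identifier>
  <dc:title>Walters Ms. W.579, Prayer</dc:title>
</record>"""

if __name__ == "__main__":
    print(ET.tostring(dc_to_mods(SAMPLE), encoding="unicode"))
```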
24. Standardize
• Cataloging
• Metadata
• File Format
• Imaging and Color
• Resolution or Fidelity
• Vocabulary and Geographic Names
• Foreign Language and English
• Intellectual Property
• Storage
• Quality and Quality Control
• Others
25. Preservation & Access
Owner of Archimedes Palimpsest:
• Preserve data in “flat files”
– Do not tailor data for Web interfaces
• Host data on “spinning disks”
– Did not want digital product to end up on media that
could become obsolete, with limited access
• Make broadly available on Internet
– Do not place restrictions on use
26. Data Layout
[Figure: data layout diagram; labels include Walters Manuscripts, Other Books, Access, ReadMe, Technical ReadMe, Data and Supplemental]
33. Cataloging Information
• Manuscript level: all information that applies to the manuscript as a whole, including an abstract and the physical dimensions and features of the manuscript, such as size, extent, collation and binding.
• Manuscript item level: all information that applies to the intellectual divisions of the book, including the titles of works, rubrics, incipits, colophons and layout information about the written surface.
• Manuscript piece level: all information for the pieces imaged (i.e., binding pieces, flyleaves and folios), including piece name, folio number and, for illuminated pieces, detailed descriptions of the art work.
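The three cataloging levels nest naturally: one manuscript contains several items (works) and several imaged pieces. A minimal Python sketch of that hierarchy using dataclasses; the class and field names are hypothetical and do not reproduce the Walters TEI schema, and the example values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Piece:
    """Piece level: one imaged binding piece, flyleaf or folio."""
    name: str
    folio: Optional[str] = None            # folio number, if any
    art_description: Optional[str] = None  # only for illuminated pieces


@dataclass
class Item:
    """Item level: one intellectual division of the book (a work)."""
    title: str
    rubric: Optional[str] = None
    incipit: Optional[str] = None
    colophon: Optional[str] = None
    layout: Optional[str] = None  # layout of the written surface


@dataclass
class Manuscript:
    """Manuscript level: information that applies to the book as a whole."""
    shelf_mark: str
    abstract: str
    dimensions: Optional[str] = None
    extent: Optional[str] = None
    collation: Optional[str] = None
    binding: Optional[str] = None
    items: List[Item] = field(default_factory=list)
    pieces: List[Piece] = field(default_factory=list)

    def illuminated_pieces(self) -> List[Piece]:
        """Return the imaged pieces that carry a description of the art work."""
        return [p for p in self.pieces if p.art_description]


# Hypothetical example values, for illustration only.
ms = Manuscript(
    shelf_mark="W.582",
    abstract="(abstract text)",
    items=[Item(title="(title of work)")],
    pieces=[
        Piece(name="Upper board outside"),
        Piece(name="fol. 1a", folio="1a", art_description="(description of illumination)"),
    ],
)
print([p.name for p in ms.illuminated_pieces()])
```

Keeping pieces on the manuscript rather than inside items reflects that binding pieces and flyleaves do not belong to any single work.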
35. Manuscript DCMI Elements
• Identifier: the shelf mark for manuscripts (e.g., W.582), and the image
serial number for images (e.g., W582_000001)
• Creator: always the Walters Art Museum
• Contributor: one entry for each project participant responsible for the
creation of the manuscript’s data set
• Date: the date of web page or image creation
• Title: the title of the manuscript (e.g., “Walters Ms. W.579, Prayer”)
• Description: a description of the manuscript or image
• Source: source of the object used to create the image or image collection
• Type: Image for individual images; Collection for all images of a manuscript
• Format: image/tiff for images, text/html for a manuscript web page
• Subject: keywords describing the manuscript or imaged folio
• Rights: license and usage terms
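A flat record built from these elements has one child element per value, with repeatable elements such as Contributor simply appearing more than once. A minimal Python sketch using the standard Dublin Core element namespace and `xml.etree.ElementTree`; the `<record>` wrapper element, the function name `dc_record` and the contributor names are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields: dict) -> ET.Element:
    """Build a flat record with one dc:* child element per value.

    A value may be a string or a list of strings (e.g. several contributors).
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return root


record = dc_record({
    "identifier": "W.579",
    "creator": "Walters Art Museum",
    "contributor": ["Participant A", "Participant B"],  # placeholder names
    "title": "Walters Ms. W.579, Prayer",
    "type": "Collection",   # all images of a manuscript
    "format": "text/html",  # manuscript web page
})
print(ET.tostring(record, encoding="unicode"))
```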
36. License and use: UPDATED! 6 February 2013
All Walters manuscript images and descriptions provided here are licensed for use under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License.
You are free to download and use the images and descriptions on this website under the licenses named above. You do not need to apply to the Walters prior to using the images. We ask only that you cite the source of the images as the Walters Art Museum.
Additionally, we request that a copy of any work created using these materials be sent to the Curator of Manuscripts and Rare Books at the Walters Art Museum, 600 N. Charles Street, Baltimore, MD 21201, mss-curator@thewalters.org.
Note these terms mark a change from our previous license, which placed a noncommercial restriction on the use of these materials. The noncommercial restriction no longer applies, and this license supersedes the previously advertised license, and replaces that found in many of the archival TIFF image headers.
This change follows the Walters Art Museum’s licensing policy. More information on the Walters’ intellectual property policy can be found on the Walters website: http://art.thewalters.org/license/.
37. Metadata XML Information
• /manuscript: top-level container of metadata for a manuscript’s images
• /manuscript/image_object: description of the manuscript, primarily Dublin Core
metadata, with the number of images captured in the imageCount element
• /manuscript/images: container for the manuscript’s image data
• /manuscript/images/image: information about a single capture and its
derivatives, including:
– /manuscript/images/image/index: the order of the image in the set, beginning with 0
– /manuscript/images/image/image_subject: the folio number or name of the piece imaged
– /manuscript/images/image/capture: detailed information about the image’s capture, extracted from the imaging software database
– /manuscript/images/image/masterDerivation: description of how the archival TIFF image was generated from the camera raw file, including cropping and color correction information
– /manuscript/images/image/jhoveData: XML output of the JHOVE utility run on the archival TIFF file
– /manuscript/images/image/derivative: three elements containing cropping and scaling information needed to generate the 300 PPI, SAP, and thumbnail files from the archival TIFF
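Because the element paths are fixed, a script can walk a manuscript’s metadata file directly. A minimal Python sketch using `xml.etree.ElementTree`; the sample document is hypothetical and heavily reduced (no Dublin Core, capture, masterDerivation, jhoveData or derivative content), and placing `imageCount` as a direct child of `image_object` is an assumption. The function lists images in index order and checks the declared count and the zero-based index sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced document following the paths listed above.
SAMPLE = """<manuscript>
  <image_object><imageCount>2</imageCount></image_object>
  <images>
    <image><index>1</index><image_subject>fol. 1a</image_subject></image>
    <image><index>0</index><image_subject>Upper board outside</image_subject></image>
  </images>
</manuscript>"""


def list_images(xml_text: str):
    """Return (index, subject) pairs sorted by index, after consistency checks."""
    root = ET.fromstring(xml_text)
    images = sorted(
        (int(img.findtext("index")), img.findtext("image_subject"))
        for img in root.findall("images/image")
    )
    declared = root.findtext("image_object/imageCount")
    if declared is not None and int(declared) != len(images):
        raise ValueError(f"imageCount {declared} != {len(images)} image elements")
    # Indexes are documented as beginning with 0; expect 0, 1, ..., n-1.
    if [i for i, _ in images] != list(range(len(images))):
        raise ValueError("image indexes are not a 0-based sequence")
    return images


if __name__ == "__main__":
    for index, subject in list_images(SAMPLE):
        print(index, subject)
```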
40. Standard Workflows for Data Management
• Transfer & archive digital data for research and analysis by the curatorial, scholarly, preservation and imaging communities
• Clear access procedures
– Ensuring data integrity for digital storage repositories
– Preventing introduction of mislabeled and incorrect metadata
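Data integrity in a storage repository is usually checked with fixity information: a checksum recorded for every file and recomputed later. The presentation notes list checksums among the core data but do not name an algorithm, so this minimal Python sketch assumes SHA-256 from the standard `hashlib` module; the function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel)
    return problems
```

A manifest built at ingest and re-verified after every transfer or migration turns "ensuring data integrity" into a repeatable, automated step.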
41. Quality Control
• Data Quality
– Automate data handling to avoid errors
– Audit trail for manual data manipulation
• Quality Management
– Implement processes for quality review
– Verification and Validation
• Documentation
– Define metrics & quality goals
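One automated validation step is checking that image identifiers actually belong to the manuscript they are filed under, which catches mislabeled metadata before it enters the repository. A minimal Python sketch; the identifier pattern is an assumption inferred from the single example on the DCMI slide (shelf mark W.582 with image serial W582_000001), and the function names are hypothetical.

```python
import re
from typing import List

# Assumed pattern: shelf mark without its period, underscore, 6-digit serial.
SERIAL_RE = re.compile(r"^(?P<mark>[A-Z]+\d+)_(?P<serial>\d{6})$")


def expected_prefix(shelf_mark: str) -> str:
    """Turn a shelf mark such as 'W.582' into the assumed prefix 'W582'."""
    return shelf_mark.replace(".", "")


def check_image_ids(shelf_mark: str, image_ids: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the IDs pass."""
    problems = []
    prefix = expected_prefix(shelf_mark)
    seen = set()
    for image_id in image_ids:
        m = SERIAL_RE.match(image_id)
        if m is None:
            problems.append(f"{image_id}: malformed identifier")
        elif m.group("mark") != prefix:
            problems.append(f"{image_id}: belongs to {m.group('mark')}, not {prefix}")
        if image_id in seen:
            problems.append(f"{image_id}: duplicate identifier")
        seen.add(image_id)
    return problems
```

Returning a list of problems rather than stopping at the first one suits a quality review, where a reviewer wants the full set of issues in one report.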
42. Data Management System
• Internal Digital Asset Management System
– Internal Server
• Image Files
• Catalog Data
• Access Infrastructure
• Security
• Backup
– Internet Systems Consortium
43. IDR Access Model
[Figure: IDR access model diagram for the Johns Hopkins Metadata Application, linking Agent Metadata (METS); Preservation Metadata (PREMIS): Event, Implementation Strategies, Request Event; Digital Representations (e.g., TIFF image); and Dublin Core Metadata Initiative (DCMI) and TEI metadata]
44. Preservation of the Data
Preservation Heresy:
The Digital information is closer to the original than the Artifact itself
“I don’t use the parchment. The parchment is gone! As far as the scholars are concerned, there is no parchment. You only work from digital images on the laptop – that’s the only thing that matters for the reading.” – Dr. Reviel Netz, 14 Jan WYPR
45. What Will Happen to the Data?
“There’s a big technical issue that has me worried. The information on the Net is not all simple text. It’s structured, whether it’s Microsoft Word documents or PDFs. That means the information is only really accessible if you understand how to interpret the bits. What happens when files are there and we don’t know how to interpret them anymore?
“If you have a CD but the form isn’t known anymore. I have 5 1/4-in. diskettes, but nothing to read them. Even 3 1/2-in. diskette readers are becoming hard to come by. The physical source media change. We may lose the ability to read them.”
Vint Cerf, Google Internet Evangelist, recipient of the US Presidential Medal of Freedom, and co-designer of the basic architecture of the Internet.
July 30, 2007 (Computerworld)
46. Digital Preservation
Impermanence of Digitized Data
• Dynamic technology, media and formats
• Rapid obsolescence
• Regular reformatting required
• Ensure utility of data
• Broad distribution to service providers
• Standardized formats & encoding
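With rapid obsolescence and regular reformatting, a quick scan that confirms files still carry the expected format signature is a cheap first check before and after each migration. A minimal Python sketch that inspects only the four-byte TIFF header (byte-order mark "II" or "MM" followed by 42, or 43 for BigTIFF); it is not a validator, and full format validation is what a tool such as JHOVE, named on the metadata slide, is for. The function names and the `*.tif*` file pattern are assumptions.

```python
from pathlib import Path
from typing import List, Optional

# "II" = little-endian, "MM" = big-endian, then 42 (TIFF) or 43 (BigTIFF)
# written in that byte order: 42 is 0x2A ('*'), 43 is 0x2B ('+').
TIFF_SIGNATURES = {
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"II+\x00": "BigTIFF (little-endian)",
    b"MM\x00+": "BigTIFF (big-endian)",
}


def sniff_tiff(path: Path) -> Optional[str]:
    """Return a label if the file starts with a TIFF signature, else None."""
    with path.open("rb") as fh:
        header = fh.read(4)
    return TIFF_SIGNATURES.get(header)


def non_tiff_files(root: Path, pattern: str = "*.tif*") -> List[Path]:
    """List files matching the pattern whose header is not a TIFF signature."""
    return [
        p for p in sorted(root.rglob(pattern))
        if p.is_file() and sniff_tiff(p) is None
    ]
```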
47. License
All artworks in the photographs are in the public domain due to age. The photographs of two-dimensional objects are also in the public domain. Photographs of three-dimensional objects and all descriptions have been released under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License.
You are free to download and use the images and descriptions on this website under the licenses
named above, but if you desire digital images at a higher resolution, for scholarly or commercial
publication, please contact our photo services department.
48. Trusted Digital Repository
• Compliance with the Reference Model for an
Open Archival Information System (OAIS)
• Administrative responsibility
• Organizational viability
• Financial sustainability
• Technological and procedural suitability
• System security
• Procedural accountability
1. Title Slide: Meeting the Challenge: Digitizing Islamic Manuscripts at the Walters
Data: all core data (images, transcriptions, metadata and checksums). Documents: internal and external documentation. ResearchContrib: important data that is not integrated with the core data set, such as conservation information and special or experimental images. Supplemental: source files for other core data files; for example, the folio-by-folio transcriptions are derived from the work-length transcriptions (Floating Bodies, Method).