4. • A bit about me
• About this session
• Why do we digitize?
• Context: Libraries,
Archives and
Museums (LAMs)
Introduction
5. 01.1 Introduction
Just a bit about me ...
@UDCMRK
• Twitter: @udcmrk
• www.udc793.org
• www.linkedin.com/in/martinkalfatovic
• … an inordinate
fondness for Dodos
6. 01.2.1 Introduction
About this session
Taking a broad overview of standards, lingo, hardware, software and
planning considerations, this session will get everyone current and
on a level playing field to proceed through the subsequent topics.
The session will help you establish the foundational vocabulary to
both enrich your SEI experience and increase your capacity to
communicate with your colleagues about the basics of digital
reformatting.
The session will introduce the practices, standards, and challenges
evident across the spectrum of cultural heritage institutions
acquiring, managing, and providing access to digital collections. The
session considers the digital curation life-cycle as well as lightly
touching on funding and aggregators.
7. About this session
1. Lecture/Discussion
2. Vocabulary Building
3. Exercises
01.2.2 Introduction
8. 01.3 Introduction
Context
Libraries | Archives | Museums
In principle, the work of art has always
been reproducible. Objects made by
humans could always be copied by
humans. Replicas were made by
pupils in practicing for their craft, by
masters in disseminating their works,
and, finally, by third parties in pursuit of
profit. But the technological
reproduction of artworks is something
new. Having appeared intermittently in
history, at widely spaced intervals, it is
now being adopted with ever-
increasing intensity.
Das Kunstwerk im Zeitalter seiner technischen
Reproduzierbarkeit. Walter Benjamin (1936)
9. 01.3 Introduction
Context
Libraries | Archives | Museums
Digitization has magnified
our ability to reproduce art,
books, and even objects,
with increasing rapidity,
ease, and added
functionality.
10. 01.4 Introduction
• Provide online access to
collections
• Make digitized material and
metadata available through
online catalogs AND for reuse on
other platforms.
• Maximize value to the largest
audience in new and creative
ways.
• Advance the preservation by
reducing wear and tear on the
originals.
Why do we digitize?
Based on NARA strategic plan
11. 01.4 Introduction
Why do we digitize?
Based on NARA strategic plan
• Provide access to those
materials that can no longer be
accessed in their original format.
• Maximize the efficient and
effective use of resources to
carry out digitization and achieve
cost-saving benefits whenever
possible.
• Improve our service to
customers by responding to their
evolving expectations
12. • Let’s all count in
binary!
• Analog vs. Binary :
Wave vs. Sample
• Bytes vs. Bits
Basics
13. 02.1.1 Basics
BASE 10 (Decimal)
0 1 10 100 1,000 10,000 100,000 1,000,000
BASE 2 (Binary)
0 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096
BASE 3 (Ternary) – not on the test!
0 1 3 9 27 81 243 729 2,187 6,561 19,683
Basics
Let’s All Count in Binary!
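A small sketch of the counting exercise, for anyone who wants to try it at a keyboard. The function names place_values and to_binary are made up for this illustration. The first lists the place values (powers) of a base, the second writes a number out in binary digits by repeated division by two.

```python
def place_values(base, count):
    """Return the first `count` place values (powers) of a base."""
    return [base ** n for n in range(count)]


def to_binary(n):
    """Write a non-negative integer as a string of binary digits."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, right to left
        n //= 2
    return "".join(reversed(digits))


print(place_values(10, 4))   # place values in base 10
print(place_values(2, 13))   # place values in base 2
for n in range(9):
    print(n, to_binary(n))   # counting from 0 to 8 in binary
```

Each binary digit is worth twice the digit to its right, in the same way each decimal digit is worth ten times its neighbor.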
19. 02.2 Basics
Analog vs. Binary : Wave vs. Sample
By Hyacinth - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=30716342
By Hyacinth - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=23867344
20. 02.3 Basics
Bytes vs Bits
BIT
A bit is the smallest unit of information that can be stored or
manipulated on a computer; it consists of either zero or one.
A bit is also known as a binary digit, especially when working with
the 0 or 1 values.
BYTE
A byte is how many bits are needed to represent letters of
the alphabet and other characters. For example, the letter
"A" would be 01000001. 8 bits = 1 byte
WORD
Groups of 4 Bytes (translated into Hexadecimal – Base 16!),
e.g. 4B 4A 57 00 = K J W <null>
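The bit, byte, and word examples above can be checked with a few lines of Python. This is a minimal sketch; the helper names char_to_bits and word_to_text are invented here, and the character codes are assumed to be ASCII.

```python
def char_to_bits(ch):
    """Show the 8 bits (1 byte) behind a single ASCII character."""
    return format(ord(ch), "08b")


def word_to_text(hex_word):
    """Decode a group of hexadecimal bytes into characters."""
    raw = bytes.fromhex(hex_word)  # spaces between byte pairs are allowed
    return ["<null>" if b == 0 else chr(b) for b in raw]


print(char_to_bits("A"))            # 01000001
print(word_to_text("4B 4A 57 00"))  # ['K', 'J', 'W', '<null>']
```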
23. I
DIGITIZATION VOCABULARY
Vocabulary I: Introduction and Basics
Definition:
In mathematics and digital electronics, a binary number is a
number expressed in the binary numeral system or base-2
numeral system which represents numeric values using two
different symbols: typically 0 (zero) and 1 (one). The base-2
system is a positional notation with a radix of 2. Because of its
straightforward implementation in digital electronic circuitry
using logic gates, the binary system is used internally by almost
all modern computers and computer-based devices. Each digit
is referred to as a bit.
Term: Binary
25. I
DIGITIZATION VOCABULARY
Vocabulary I: Introduction and Basics
Definition:
An analog signal has a theoretically infinite resolution. In
practice an analog signal is subject to electronic noise and
distortion introduced by communication channels and signal
processing operations, which can progressively degrade the
signal-to-noise ratio (SNR). In contrast, digital signals have a
finite resolution.
Term: Analog
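To make the finite resolution of a digital signal concrete, here is a hedged sketch that samples one cycle of a sine wave (standing in for an analog signal) and rounds each sample to a fixed number of levels. The function name, the sample count, and the 3-bit depth are all assumptions chosen for illustration.

```python
import math


def sample_and_quantize(num_samples, bits):
    """Sample one cycle of a sine wave and round each sample to 2**bits levels."""
    levels = 2 ** bits
    codes = []
    for i in range(num_samples):
        t = i / num_samples
        value = math.sin(2 * math.pi * t)  # "analog" value between -1 and 1
        # Map -1..1 onto 0..levels-1 and round to the nearest whole level.
        codes.append(round((value + 1) / 2 * (levels - 1)))
    return codes


print(sample_and_quantize(8, 3))  # 8 samples, each stored in 3 bits
```

More samples and more bits per sample follow the wave more closely, at the cost of larger files.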
27. I
DIGITIZATION VOCABULARY
Vocabulary I: Introduction and Basics
Definition:
The byte is a unit of digital information that most commonly
consists of eight bits. Historically, the byte was the number of
bits used to encode a single character of text in a computer and
for this reason it is the smallest addressable unit of memory in
many computer architectures. The size of the byte has
historically been hardware dependent and no definitive
standards existed that mandated the size. The de facto standard
of eight bits is a convenient power of two permitting the values 0
through 255 for one byte.
Term: Byte
29. I
DIGITIZATION VOCABULARY
Vocabulary I: Introduction and Basics
Definition:
The process of recording an analog signal in a digital form. In
relation to content of this site, it describes the process of
translating analog signal data emanating from an object (light or
sound) into a digitally encoded format. Audio, still and moving
images are commonly digitized for increased access or for
preservation purposes.
Term: Digitization
31. I
DIGITIZATION VOCABULARY
Vocabulary I: Introduction and Basics
Definition:
The prefix bronto, as used in the term brontobyte, has been
used to represent anything from 10^15 to 10^27 bytes, most often
10^27.
Term: Brontobytes
32. • Get ready!
• Prioritization: Some
things are more
equal than others
• Tools to help you
Get Ready and
Prioritize
• Copyright!
Measure Twice, Cut Once
33. 03.1 Measure Twice …
Measure Twice, Cut Once
Get ready!
Staffing Resources
Acknowledge that digitizing for public access is a significant
business process that crosses multiple business units.
Develop a separate human resource plan to support this
digitization business process.
• IT Infrastructure
• Along with staffing, require an IT plan to support digitization
that includes bandwidth, storage, the ability to share images
and metadata across business units, among other
requirements.
34. 03.1 Measure Twice …
Policy and Guidance for Digitization Activities
• Promulgate policy and guidance that provides further
implementation direction as business units begin
implementing the strategy
• Technical Digitization Standards
• Develop technical digitization requirements for the
approaches outlined above to ensure uniformity and
standardization.
• Funding Strategies
• Seek out and explore other options and relationships to
digitize and make content available
Measure Twice, Cut Once
Get ready!
35. 03.2 Measure Twice …
Candidates for digitization projects will be prioritized according
to established criteria for significance and use.
• Candidates for digitization projects will be prioritized in order
to achieve a demonstrated high priority preservation benefit
for the agency.
• Funding is available or likely to be available and sustainable
for the project.
Measure Twice, Cut Once
Prioritization: Some things are more equal
than others
36. 03.3 Measure Twice
Starting up
Tools to help you Get ready & Prioritize
• Digitization Plans
• Digital Asset Management Plan
• Web Access Plan
37. 03.4 Measure Twice …
Copyright
https://xkcd.com/14/
Sometimes I just can't get outraged over copyright law ...
38. Copyright
To promote the Progress
of Science and useful Arts,
by securing for limited
Times to Authors and
Inventors the exclusive
Right to their respective
Writings and Discoveries.
Article I, Section 8, Clause 8 of the United
States Constitution
… but most of the time I am … but still …
03.4 Measure Twice …
40. 04.1 Staffing
Staffing a digitization project
Depending on the size of
the institution, staff
members may fill a number
of roles. Also, do not forget
that in addition to your
regular staff, your
volunteers, interns, and
student help can participate
in the digitization process
(with the proper training
and supervision).
41. 04.2 Staffing
Staffing a digitization project
How
• In-house staffing
• Outsourcing
• Hybrid approach
42. 04.3 Staffing
Staffing a digitization project
Who
• Director / CEO
• Project Manager
• Curator
• Technical Staff
• Conservator
• Scanning Operators
43. 04.3.1 Staffing
Staffing a digitization project
Director / CEO
As with any LAM activity,
the overall responsibility for
all functions ultimately rests
with the director. Strong
leadership and vision for
digitization is necessary for
a successful program.
44. 04.3.2 Staffing
Staffing a digitization project
Project Manager
Manages goals and
expectations, identifies
further staffing and equipment
needs, liaises between
departments and staff,
creates workplans and
associated documents, and
manages funds.
45. 04.3.3 Staffing
Staffing a digitization project
Curator
Or, the person in charge of
a collection. In addition to
their responsibilities of
caring for the collections,
curators are also generally
responsible for the display
of the objects in coherent
and informative or
educational ways.
46. 04.3.4 Staffing
Staffing a digitization project
Technical Staff
Database development,
web/database integration,
CGI (Common Gateway
Interface) script writing, Perl
programming, and related
activities that simplify the
process of getting objects
to the scanning operations
and the resulting files in a
usable state.
47. 04.3.5 Staffing
Staffing a digitization project
Conservator
Depending upon the types
of collections, consultation
with the preservation /
conservation staff in varying
degrees will be necessary
to determine if (and how)
the items can be digitized
and/or photographed.
48. 04.3.6 Who
Staffing a digitization project
Scanning Operators
Scanning, photography,
handling materials
(packing, shipping) and
other such skills are just a
few that may be required of
staff doing the actual
conversion.
49. Getting Technical
• Pixel tricks
• Color and not color
• File formats
• Cost and
implementation
factors
• File naming
• Compression
• Bundling file
formats
50. 05.1 Getting Technical
Getting Technical
Pixel Tricks
Pixel
An abbreviation of picture element,
this term may refer to a component
of either a digital image or a digital
sensor. In the case of a digital
image, the pixel is the smallest
discrete unit of information in the
image's structure. In the case of the
sensor in a scanner or digital
camera, a pixel is the smallest
photosensitive component or cell
providing a response to light (or
photons).
51. 05.1.1 Getting Technical
Getting Technical
Why do we care?
Remember back to wave vs.
sample? Pixels can be thought
of as those elements of the
samples that fall within the
wave.
54. 05.1.3 Getting Technical
Getting Technical
Pixel Parameters
Sampling Frequency
This parameter measures the physical
pixel count in pixels per inch (ppi), pixels
per mm, etc. This parameter informs us
about the size of the original and also
provides part of the data needed to
determine the level of detail recorded in
the file.
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
55. 05.1.3 Getting Technical
Getting Technical
Pixel Parameters
Sharpening
Sharpening artificially enhances details to
create the illusion of greater definition.
There are three major sharpening
processes in a typical imaging pipeline:
capture sharpening (through camera
setting adjustment), image sharpening in
post processing, and output sharpening
for print or display purposes. Sharpening
is usually implemented through image
edge enhancement, such as filtering
techniques using unsharp masks and
inverse image diffusion.
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
57. 05.1.3 Getting Technical
Getting Technical
Pixel Parameters
Reproduction Scale Accuracy
This parameter measures the relationship
between the size of the original object to
the size of that object in the digital image.
This parameter is measured in relation to
the pixels per inch (ppi) or pixels per mm
(ppmm) of the original digital capture.
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
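One way to picture reproduction scale accuracy is to image a target of known length and compare the pixels it actually covers with the pixels expected at the stated ppi. The sketch below is an illustration only, not the FADGI measurement procedure; the function name and the sample numbers are assumptions.

```python
def scale_error_percent(measured_pixels, true_length_in, stated_ppi):
    """Percent difference between measured and expected size in pixels."""
    expected_pixels = true_length_in * stated_ppi
    return (measured_pixels - expected_pixels) / expected_pixels * 100.0


# A 6 inch ruler captured at a stated 400 ppi should span 2400 pixels.
print(scale_error_percent(2412, 6, 400))  # about 0.5 (percent too large)
```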
58. 05.2 Getting Technical
• Bitonal
• Grayscale
• Color
• Additive color
• Subtractive color
Getting Technical
Color and Not Color
60. 05.2.2 Getting Technical
Getting Technical
Color and Not Color: CMYK
The CMYK color model (process color,
four color) is a subtractive color
model, used in color printing, and is
also used to describe the printing
process itself. CMYK refers to the four
inks used in some color printing: cyan,
magenta, yellow, and key (black).
Though it varies by print house, press
operator, press manufacturer, and
press run, ink is typically applied in the
order of the abbreviation.
By Viliam Furík - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=39316936 (upper)
By SharkD at English Wikipedia Later version uploaded by Jacobolus, Dacium
at en.wikipedia. - Transferred from en.wikipedia to Commons., Public Domain,
https://commons.wikimedia.org/w/index.php?curid=3791468 (lower)
61. 05.2.3 Getting Technical
Getting Technical
Color and Not Color: RGB
The RGB color model is an additive
color model in which red, green, and
blue light are added together in various
ways to reproduce a broad array of
colors. The name of the model comes
from the initials of the three additive
primary colors, red, green, and blue.
The main purpose of the RGB color
model is for the sensing,
representation, and display of images
in electronic systems, such as
televisions and computers
Wikipedia https://en.wikipedia.org/wiki/RGB_color_model
62. 05.2.4 Getting Technical
Getting Technical
Color and Not Color: RGB vs. CMYK
By RGB_CMYK_4.jpg: Annette Shacklett derivative work: Marluxia.Kyoshu [Public domain], via Wikimedia Commons
A comparison of RGB and CMYK color
spaces. The image demonstrates the
difference between the RGB and CMYK
color gamuts. The CMYK color gamut is
much smaller than the RGB color gamut,
thus the CMYK colors look muted. If you
were to print the image on a CMYK device
(an offset press or maybe even an inkjet
printer) the two sides would likely look
much more similar, since the combination
of cyan, yellow, magenta and black cannot
reproduce the range (gamut) of color that
a computer monitor displays. This is a
constant issue for those who work in print
production. Clients produce bright and
colorful images on their computers and
are disappointed to see them look muted
in print.
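The relationship between the additive and subtractive models can be sketched with the textbook arithmetic below. This is a naive conversion for illustration only: it ignores gamut limits and ICC color profiles, which is exactly why real print output looks different from the screen. The function name rgb_to_cmyk is made up for this example.

```python
def rgb_to_cmyk(r, g, b):
    """Naive conversion of 8-bit RGB values to CMYK fractions (0.0 to 1.0)."""
    if (r, g, b) == (0, 0, 0):
        return (0.0, 0.0, 0.0, 1.0)  # pure black: key ink only
    # Each subtractive primary is the complement of an additive primary.
    c = 1 - r / 255
    m = 1 - g / 255
    y = 1 - b / 255
    k = min(c, m, y)  # the shared darkness is handed to the black (key) ink
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)


print(rgb_to_cmyk(255, 0, 0))  # red light -> magenta plus yellow ink
```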
64. Getting Technical
File Formats
Master
A digital file (images, video, audio)
which has been stored in its original
captured state. These master files
are also referred to as master
copies, preservation masters or
preservation copies.
05.3.1 Getting Technical
65. Getting Technical
File Formats
Archival
A file that is composed of one or
more computer files along with
metadata. Archive files are used to
collect multiple data files together
into a single file for easier portability
and storage, or simply to compress
files to use less storage space.
Archive files often store directory
structures, error detection and
correction information, arbitrary
comments, and sometimes use
built-in encryption.
05.3.1 Getting Technical
66. Getting Technical
File Formats
Access
Often used for low resolution
images (thumbnails, screen images)
that are made available on the Internet.
See also delivery copy and surrogate
image. This could be an identical
copy of the original file or perhaps a
lower quality version with a smaller
file size. Sometimes called delivery or
surrogate or derivative.
05.3.1 Getting Technical
68. File Formats
Three Factors
• Cost
• System
Implementation
• Sustainability
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.4 Getting Technical
69. File Formats
Cost Factors
• Implementation Cost
• Cost of Software Tools
• Cost of equipment needed to
produce files
• Storage Cost
• Network Cost
• Ongoing Cost of Production
• Cost of Providing Access
• Cost of Preservation
Processing
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.4.1 Getting Technical
70. File Formats
Implementation Factors
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
• Level of difficulty/complexity
• Technical Complexity
• Toolset Complexity
• Availability of tools
• Ease and accuracy for OCR
• Ease and accuracy of File
validation
• Ease and accuracy of
monitoring of quality
05.4.2 Getting Technical
71. File Formats
Sustainability Factors
• Disclosure
• Adoption
• Transparency
• Self-Documentation
• Native Embedded Metadata Capabilities
• Embedded Metadata Capabilities Through
Extension
• Level of Work Necessary to Embed Native
Metadata
• Level of Work Necessary to Embed
Metadata Through Extension
• Geo-referencing Metadata
• Level of Effort to Embed Geo-referencing
Metadata
• Impact of Patents
• Technical Protection Mechanisms
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.4.3 Getting Technical
73. File Formats
File Naming (Zhèngmíng | 正名)
05.5 Getting Technical
The Master replied, “What is necessary
is to rectify names.” “So! indeed!” said
Tsze-lu. “You are wide of the mark! Why
must there be such rectification?”
“Therefore a superior man considers it
necessary that the names he uses may
be spoken appropriately, and also that
what he speaks may be carried out
appropriately. What the superior man
requires is just that in his words there
may be nothing incorrect.”
From The Analects of Confucius, Book 13, Verse 3 (James R. Ware, translated in
1980):
74. File Formats
File Naming: Some Guidelines
Semantic Names
There is meaning encoded in the
name, like:
• “BCA_03_04_00_145”
• Syrnium fulvescens from
Biologia Centrali-Americana.
Aves. Volume IV (1879-1904) by
Osbert Salvin and F. DuCane
Godman (4th volume of the 3rd
part of Biologia Centrali-
Americana, plate 145)
05.5.1 Getting Technical
BCA_03_04_00_145.jpg
75. File Formats
File Naming: Some Guidelines
Practical
• Use barcodes, accession
numbers, etc.
• “39088002738714-0003”
• Wiener Farbenkabinet (1794)
05.5.2 Getting Technical
39088002738714-0003.jpg
76. File Formats
File Naming: Three Parts
• Prefix
• Ordinal Position
• Suffix
05.5.3 Getting Technical
39088002738714-0003.jpg
77. File Formats
File Naming: Three Parts
Prefix
BCA_03_04_00
This is the part
that’s either
semantic or
practical
05.5.3 Getting Technical
Ordinal
145
Position of the
item in relation
to a compound
object
Suffix
jpg
File type
BCA_03_04_00_145.jpg
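The three parts can be pulled back out of a file name with a short pattern. This is a sketch that assumes the ordinal is the last run of digits, separated from the prefix by an underscore or a hyphen, as in the two sample names in this section; the function name split_filename is hypothetical.

```python
import re

# prefix, then "_" or "-", then the ordinal digits, then "." and the suffix
PATTERN = re.compile(r"^(?P<prefix>.+)[_-](?P<ordinal>\d+)\.(?P<suffix>\w+)$")


def split_filename(name):
    """Split a file name into (prefix, ordinal, suffix)."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name}")
    return match.group("prefix"), match.group("ordinal"), match.group("suffix")


print(split_filename("BCA_03_04_00_145.jpg"))
print(split_filename("39088002738714-0003.jpg"))
```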
78. File Formats
File Naming: Last Thoughts
05.5.3 Getting Technical
• Stick with three letter
extension for the suffix
(.tif, .jpg, .jp2, .png)
• Keep file names the same
length (padding with Zeros
not spaces!)
• Better to be consistent
than right!
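A minimal sketch of the zero-padding guideline, assuming a barcode-style identifier and a four-digit page sequence as in the sample name 39088002738714-0003.jpg. The function name page_filename and its default values are assumptions for illustration.

```python
def page_filename(identifier, sequence, extension="jpg", width=4):
    """Build a fixed-length file name, padding the sequence with zeros."""
    return f"{identifier}-{sequence:0{width}d}.{extension}"


names = [page_filename("39088002738714", n) for n in (3, 12, 145)]
for name in names:
    print(name)

# Same-length names sort in page order even in a plain alphabetical listing.
print(sorted(names) == names)
```

Without the padding, a name ending in -12 would sort ahead of one ending in -3.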
79. 05.6 Technical
File Formats
Compression / Lossy / Lossless
Some file format types,
specifically JPEG, JPEG
2000, and TIFF, allow you to
compress the file size.
Lossy compression discards
data to make the files smaller;
lossless compression shrinks
files without losing data.
Be aware of any data loss
when compressing files.
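The difference between lossless and lossy can be sketched with Python's standard zlib module. The byte string below is only a stand-in for image data, and the "lossy" step (throwing away the low four bits of every byte) is a deliberately crude assumption, not how JPEG works.

```python
import zlib

original = bytes(range(256)) * 64  # stand-in for 16,384 bytes of image data

# Lossless: the data comes back exactly as it went in.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Lossy (crude illustration): discard detail, then compress what is left.
coarse = bytes(b & 0xF0 for b in original)
print(len(zlib.compress(coarse)), coarse == original)
```

Once detail has been discarded, no amount of decompression brings it back, which is why masters are usually kept uncompressed or losslessly compressed.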
80. 05.7 Technical
File Formats
Bundling file formats
When you need to move
many associated files around,
you may wish to “bundle”
them to pull them all together
into one package. Common
formats are:
• ZIP
• TAR
• RAR
• 7z
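As a hedged sketch of bundling, the standard zipfile module can pull several associated files into one ZIP package. The file names and contents below are invented for the example, and the bundle is built in memory so nothing is written to disk.

```python
import io
import zipfile


def bundle(files):
    """Pack a dict of {file name: bytes} into a single ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def list_bundle(blob):
    """List the file names stored inside a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


package = bundle({
    "39088002738714-0003.jpg": b"image bytes go here",
    "39088002738714-0003.xml": b"<metadata/>",
})
print(list_bundle(package))
```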
82. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
An abbreviation of picture element, this term may refer to
a component of either a digital image or a digital sensor.
In the case of a digital image, the pixel is the smallest
discrete unit of information in the image's structure. In the
case of the sensor in a scanner or digital camera, a pixel
is the smallest photosensitive component or cell
providing a response to light (or photons).
Term: Pixel
84. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
A term used to describe an abrupt and unnatural
transition over an edge feature. Also referred to as
"staircasing" because of the jagged and abrupt transition.
Term: Pixelation
86. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
PPI stands for pixels per inch, commonly used in
describing the resolution capabilities of an imaging
device such as a scanner or the resolution of a digital
image. The terms DPI (dots per inch) and PPI are used
somewhat interchangeably today.
Term: PPI
88. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
Pixel Dimensions are the horizontal and vertical
measurements of an image expressed in pixels. The pixel
dimensions may be determined by multiplying both the
width and the height by the DPI. A digital camera will also
have pixel dimensions, expressed as the number of
pixels horizontally and vertically that define its resolution
(e.g., 2,048 by 3,072). Calculate the DPI achieved by
dividing a document's dimension into the corresponding
pixel dimension against which it is aligned.
Term: Pixel Dimensions
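The two calculations in this definition, sketched in Python. The function names and the letter-size example are assumptions for illustration.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel dimensions of an original of the given size scanned at ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def achieved_ppi(pixels, inches):
    """Resolution achieved along one side of the original."""
    return pixels / inches


print(pixel_dimensions(8.5, 11, 400))  # (3400, 4400)
print(achieved_ppi(3400, 8.5))         # 400.0
```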
90. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
Color is the visual perceptual property corresponding in
humans to the categories called red, blue, yellow, etc.
Color derives from the spectrum of light (distribution of
light power versus wavelength) interacting in the eye with
the spectral sensitivities of the light receptors. Color
categories and physical specifications of color are also
associated with objects or materials based on their
physical properties such as light absorption, reflection, or
emission spectra.
Term: Color
92. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
The number of bits used to represent each pixel in an image. The term
can be confusing since it is sometimes used to represent bits per pixel
and at other times, the total number of bits used multiplied by the
number of total channels. For example, a typical color image using 8
bits per channel is often referred to as a 24-bit color image (8 bits x 3
channels). Color scanners and digital cameras typically produce 24 bit
(8 bits x 3 channels) images or 36 bit (12 bits x 3 channels) capture,
and high-end devices can produce 48 bit (16 bit x 3 channels) images.
A grayscale scanner would generally be 1 bit for monochrome or 8 bit
for grayscale (producing 256 shades of gray). Bit depth is also referred
to as color depth.
Term: Bit depth (image)
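The arithmetic behind bit depth can be sketched as follows. The function names are invented for this example, and the file size estimate assumes an uncompressed image with no header or metadata.

```python
def tonal_levels(bits_per_channel):
    """Number of distinct values one channel can hold."""
    return 2 ** bits_per_channel


def bits_per_pixel(bits_per_channel, channels):
    """Total bits used for each pixel."""
    return bits_per_channel * channels


def uncompressed_bytes(width_px, height_px, bits_per_channel, channels):
    """Approximate size of the raw pixel data, rounded up to whole bytes."""
    total_bits = width_px * height_px * bits_per_pixel(bits_per_channel, channels)
    return (total_bits + 7) // 8


print(tonal_levels(1), tonal_levels(8))       # 2 256
print(bits_per_pixel(8, 3))                   # 24
print(uncompressed_bytes(3400, 4400, 8, 3))   # 44880000
```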
94. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
A subtractive color model used in printing that is based
on cyan (C), magenta (M), yellow (Y) and black (K).
These are typically referred to as process colors. Cyan
absorbs the red component of white light, magenta
absorbs green, and yellow absorbs blue. In theory, the
mix of the three colors will produce black, but a black ink
is used to increase the density of black in a print.
Term: CMYK
96. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
An additive color model based on the three primary
colors of red (R), green (G) and blue (B).
Term: RGB
98. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
Grayscale is a range of monochromatic shades from
black to white. Therefore, a grayscale image contains
only shades of gray and no color.
Term: Grayscale
100. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
A bitonal image is represented by pixels consisting of 1
bit each, which can represent two tones (typically black
and white), using the values 0 for black and 1 for white or
vice versa.
Term: Bitonal
102. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
Generally referring to pictorial images where there is a
non-broken range of tones from white to black that may
have every shade of gray represented. There are
theoretically an infinite number of tones. Traditional
photography (photochemical photography) produces
continuous tone images. When reformatting pictorial
items, it is important to distinguish continuous tone
originals from printed halftones, since these two classes
are likely to require different strategies and methods for
making the digital images.
Term: Continuous tone
104. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
The choice of colorspace determines how many unique
colors are potentially possible in your digital file, and how
fine the gradations are between shades of color.
Each colorspace was designed for a specific purpose,
none is superior to the others for all applications.
However, FADGI recommends selecting an appropriate
colorspace from the recommendations in the charts in
this document.
Term: Color Space
106. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
Optical Character Recognition (OCR) is a technology that
allows dots or pixels representing machine generated
characters in a raster image to be converted into digitally
coded text. In addition to recognizing and coding text,
OCR programs attempt to recognize and code the
structural elements of a document page, such as
columns and non-text graphical elements.
Term: OCR
108. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
File that represents the best copy produced by a
digitizing organization, with best defined as meeting the
objectives of a particular project or program. These
objectives differ from one content category to another
and the specifications to be recommended at this Web
site (forthcoming) will be tailored to fit a variety of
common categories and objectives. In some cases, an
archive may produce more than one archival master file.
Term: Archival master file
110. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
The choice of master file format is a decision which
affects how your digitized materials can be used and
managed. There is no one correct master file format for
all applications, all format choices involve compromises
between quality, access and lifecycle management. The
FADGI star system tables list the most appropriate
master file formats for each imaging project type.
Selection of the most appropriate format within these
recommended choices is an important decision that
should be consistent with your long term archive strategy.
Term: Master File Format
112. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
FADGI anticipates continual evolution in the availability of
access file formats, each new format designed to provide
specific advantages over others for a specific application.
Care should be taken when selecting access formats to
ensure long-term viability.
Term: Access File Format
114. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
Vector graphics is the use of polygons to represent
images in computer graphics. Vector graphics are based
on vectors, which lead through locations called control
points or nodes. Each of these points has a definite
position on the x and y axes of the work plane and
determines the direction of the path; further, each path
may be assigned a stroke color, shape, curve, thickness,
and fill.
Term: Vector graphics
116. II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Definition:
These formats encapsulate their constituent files and,
save for a directory that provides the filenames, do not
describe the content and the relationships that may
obtain between files. (In this, they differ from what are
often called wrappers.) Archetypes include ZIP, StuffIt,
and TAR, the latter associated with the UNIX operating
system. Simple bundling formats tend to be generic, i.e.,
they may be used for a wide range of content types.
Term: Bundling file format
120. 06.1.2 DIY
Do It Yourself!
Describing the Collection
The sheer number of metadata
standards in the cultural heritage
sector is overwhelming, and their
inter-relationships further
complicate the situation. This
visual map of the metadata
landscape is intended to assist
planners with the selection and
implementation of metadata
standards
Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley
121. 06.1.2 DIY
Do It Yourself!
Describing the Collection
Each of the 105 standards listed
here is evaluated on its strength
of application to defined
categories in each of four axes:
community, domain, function, and
purpose. The strength of a
standard in a given category is
determined by a mixture of its
adoption in that category, its
design intent, and its overall
appropriateness for use in that
category.
Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley
123. 06.2.1 DIY
Do It Yourself!
Scanner vs. Camera
The World’s First Digital
Camera (1975) by Kodak and
Steve Sasson
124. 06.2.1 DIY
Do It Yourself!
Camera
At the heart of a digital
camera is the sensor. The
size and density of the sensor
determines the pixel count of
the resulting image.
The sensor in combination
with an optical lens creates
the digital image.
By C-M - own Image, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=2150801
125. 06.2.3 DIY
Do It Yourself!
Scanner
A scanner is a much different
device; the sensor array is
long and thin.
By moving across the target
at varying speeds and angles,
higher or lower resolution
outputs can be generated.
By Scanner_a_plat_fonctionnement.png: User:Jean-noderivative work: Pluke (talk) -
Scanner_a_plat_fonctionnement.png, FAL, https://commons.wikimedia.org/w/index.php?curid=17009063
126. 06.2.4 DIY
Do It Yourself!
A Note on Sensors
CCD vs CMOS
Charge-Coupled Devices vs.
Complementary Metal–
Oxide–Semiconductor
Both types of sensor
accomplish the same task of
capturing light and converting
it into electrical signals.
By Filya1 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6304562
127. 06.3 DIY
Do It Yourself!
Digitization Equipment
There are many options for
equipment to create digital
files from collections objects:
• Scanners
• Slide/Negative scanners
• Specialized tools
The World’s First Digital Camera (1975) by Kodak and Steve Sasson
128. 06.3.1 DIY
Do It Yourself!
Digitization Equipment: Flatbed Scanners
• 25-50 Page Automated
Document Feeder
• Flat Bed Scanning
• Support for TWAIN and/or
ISIS interface drivers
• USB or SCSI Interface
• Support for largest expected
documents
• Duplex (Automatic) scanning
(two sides at one pass)
• Optical 200 x 200 - 600 x 600
Dots Per Inch (DPI)
129. 06.3.2 DIY
Do It Yourself!
Digitization Equipment: MFM / Slide / Neg
Keep your original size in
mind when scanning
transparencies and
negatives. You will need to
scan at a much higher PPI to
create an image closer in
size to the original for display
or printing
• Microfilm scanners
• Slide & Negative scanners
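A rough sketch of why small originals need a much higher PPI: the scan has to supply every pixel that the larger display or print size will use. The function name and the sample sizes are assumptions for illustration.

```python
def required_scan_ppi(original_in, output_in, output_ppi):
    """PPI needed on the original to reach output_ppi at the output size."""
    enlargement = output_in / original_in
    return enlargement * output_ppi


# A 1.5 inch wide transparency shown 9 inches wide at 300 ppi
print(required_scan_ppi(1.5, 9, 300))  # 1800.0
```

A six-times enlargement needs six times the sampling rate on the original, so a scanner that is adequate for letter-size documents may fall short for slides and negatives.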
130. 06.3.4 DIY
Do It Yourself!
Specialized Tools
• Digital cameras
• Scanning back
cameras
• High speed book
scanners
• 3D scanners
131. 06.3.4.1 DIY
Do It Yourself!
Digital Cameras
Digital cameras come in a
vast array of sizes,
shapes, and formats.
DT ATOM
Frame footprint: 24.625” wide x 31” deep
With table top: 25.625” wide x 31.25” deep
Height with Column: 65.5”
Light Arm Span: 78.5”
132. 06.3.4.1 DIY
Do It Yourself!
Digital Cameras
DT BC100
Materials
Frame is made from black anodized extruded
aluminum, custom brackets are made from
black anodized aircraft grade aluminum.
Overall Dimensions
7’H x 6’5” W x 5’D
Footprint
6’W x 5’D
Glass Platen Dimensions
24.9” x 17.48” on each side
Book Binding Limitations
6” binding
Working Table Height
30”
Accessory & Monitor Shelf Dimensions
Side shelves are 19” x 34” black laminate
Compressor
.5 HP with a 6.3 gallon tank
133. 06.3.4.2 DIY
Do It Yourself!
Scanning Back Cameras
Scanning back cameras
provide generally higher
resolution by replacing the
sensor array with a sensing
device that scans across the
image created by the camera
lens.
• Higher-megapixel images
• Slower scanning times
XF Phase One
DT RCam
DT RG3040
Images from Digital Transitions
http://dtdch.com/
134. 06.3.4.3 DIY
Do It Yourself!
High Speed Book Cameras
For high throughput, robotic
scanners are available for
some materials, such as
books.
135. 06.3.4.4 DIY
Do It Yourself!
3D Equipment
There are a wide variety of 3D
digitization tools and
processes. What type of
equipment and process you
use should be carefully
thought out with the end-use
of the digitization in the
forefront.
Smithsonian 3D Imaging Team
136. 06.4 DIY
Do It Yourself!
Where to do it: The Gray Room
Having dedicated workspace
for your digitization is, of
course, optimal. In reality,
digitization will occur
wherever it is most practical.
Still, if you have the luxury of
dedicated space, here are
some guidelines on building it
out.
Internet Archive, San Francisco
137. 06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
The working environment
should be painted/decorated
a neutral, matte gray with a
60% reflectance or less to
minimize flare and perceptual
biases.
FADGI Guidelines
138. 06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
Monitors should be positioned
to avoid reflections and direct
illumination on the screen.
FADGI Guidelines
Smithsonian Libraries Scanning Room
139. 06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
ISO 12646 requires that the room
illumination be less than 32 lux when
measured anywhere between the
monitor and the observer, and that the
light have a color temperature of
approximately 5000K. Consistent
room illumination is a fundamental
element of best practice in imaging.
Changes in color temperature or light
level from a window, for example, can
dramatically affect the perception of
an image displayed on a monitor.
FADGI Guidelines
Smithsonian Libraries Scanning Room
140. 06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
Each digitization station should
be in a separate room, or
separated from each other by
sufficient space and with
screening to minimize the light
from one station affecting
another. It is critically important
to maintain consistent
environmental conditions within
the working environment.
FADGI Guidelines
Internet Archive, San Francisco
141. 06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
Care should be taken to
maintain the work environment
at the same temperature and
humidity as the objects are
normally kept in. Variations can
cause stress to some materials
and in severe cases may
damage the originals. The use
of a datalogger in both imaging
and storage areas is highly
recommended.
FADGI Guidelines
Smithsonian Libraries Scanning Room
142. 06.5 DIY
Do It Yourself!
Quality Control
Quality Control (QC), or Quality
Assurance (QA), is key to
maintaining the overall quality
and fidelity of any digitization
project. Differing levels of QC
may be needed for the type of
project and materials being
digitized.
In large scale projects, 100%
QC will rarely be feasible.
http://www.sil.si.edu/imagegalaxy/imagegalaxy_imageDetail.cfm?id_image=7403
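When 100% QC is not feasible, one common fallback is to review a random sample of each batch. The following is a minimal sketch under stated assumptions: the function name, the 10% fraction, and the file names are hypothetical and are not a FADGI recommendation.

```python
import math
import random


def select_qc_sample(filenames, fraction, seed=None):
    """Pick a random subset of files for manual quality-control review."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    files = list(filenames)
    if not files:
        return []
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    sample_size = max(1, math.ceil(len(files) * fraction))
    return sorted(rng.sample(files, sample_size))


# Hypothetical batch of 500 page images; review 10% of them.
batch = [f"page_{n:04d}.tif" for n in range(1, 501)]
to_review = select_qc_sample(batch, fraction=0.10, seed=42)
print(len(to_review), "of", len(batch), "files selected for review")
```

Recording the seed alongside the batch lets a second reviewer reproduce exactly the same selection.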
144. 06.6 DIY
Do It Yourself!
Storage
Now that we’ve
created all this
data, we need to
store it
somewhere …
https://flic.kr/p/86miWv
145. 06.6.1 DIY
Do It Yourself!
Storage
• Short Term
• Medium Term
• Long Term
https://flic.kr/p/86miWv
146. 06.6.2 DIY
Do It Yourself!
Primary magnetic storage
• Diskettes
• Hard disks (both fixed
and removable)
• High capacity floppy
disks
• Disk cartridges
• Magnetic tape
Smithsonian Data Center
https://flic.kr/p/cGFn2f
147. 06.6.3 DIY
Do It Yourself!
Primary optical storage
• Compact Disk Read
Only Memory (CD
ROM)
• Digital Video Disk Read
Only Memory (DVD
ROM)
• CD Recordable (CD R)
• CD Rewritable (CD
RW)
Smithsonian Data Center
https://flic.kr/p/cGFn4E
148. 06.6.3 DIY
Do It Yourself!
Solid-state storage
Solid-state storage is a type of
non-volatile computer storage
that stores and retrieves digital
information using only
electronic circuits, without any
involvement of moving
mechanical parts. (Wikipedia)
Examples:
• SSD
• Flash drive
Internet Archive
https://flic.kr/p/dnDS11
149. 06.6.4 DIY
Do It Yourself!
Acronyms
• DAS
• NAS
• SAN
• DAM
Internet Archive
https://flic.kr/p/8Ms4QV
150. 06.6.4.1 DIY
Do It Yourself!
Direct-attached storage (DAS)
... is a traditional mass
storage, that does not
use any network. This
is still a most popular
approach. This
retronym was coined
recently, together with
NAS and SAN.
(Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
151. 06.6.4.2 DIY
Do It Yourself!
Network-attached storage (NAS)
… is mass storage
attached to a computer
which another computer
can access at file level
over a local area network,
a private wide area
network, or in the case of
online file storage, over the
Internet. (Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
152. 06.6.4.3 DIY
Do It Yourself!
Storage area network (SAN)
... is a specialized network, that
provides other computers with
storage capacity. The crucial
difference between NAS and
SAN is the former presents and
manages file systems to client
computers, whilst the latter
provides access at
block-addressing (raw) level.
(Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
153. 06.6.4.3 DIY
Do It Yourself!
Digital asset management (DAM)
… consists of
management tasks and
decisions surrounding
the ingestion,
annotation,
cataloguing, storage,
retrieval and
distribution of digital
assets. (Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
154. 06.6.5 DIY
Do It Yourself!
Digital Preservation
http://www.xkcd.com/1683/
155. 06.6.5 DIY
Do It Yourself!
Digital Preservation
https://xkcd.com/242/
There are two
kinds of
preservationists:
those who have
lost data and those
who will.
Minimum Digitization Capture
Recommendations (2013)
156. 06.7 DIY
Do It Yourself!
Recap of Scanning
• Scan at best resolution you
can afford to store
• Manuscripts and text: 300
ppi
• Photographs: 400-800 ppi
• Graphic materials: 600-
800 ppi
• Maps: 600 ppi (up to 36”)
or 300-400 ppi (greater
than 36”)
• Calibrate monitor and
scanning devices
Smithsonian Libraries Scanning Room
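"The best resolution you can afford to store" is easier to judge with a size estimate. The sketch below computes the uncompressed pixel-data size of a scan; the function name and the letter-size, 24-bit color example are assumptions for illustration, and real files add a small amount of header and metadata overhead.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, channels=3, bits_per_channel=8):
    """Estimate the pixel-data size of an uncompressed raster scan."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * channels * bits_per_channel // 8


# Hypothetical letter-size page (8.5 x 11 in), 24-bit color.
for ppi in (300, 400, 600):
    size_mb = uncompressed_size_bytes(8.5, 11, ppi) / 1_000_000
    print(f"{ppi} ppi: about {size_mb:.1f} MB")
```

Doubling the resolution quadruples the storage needed, because both the width and the height in pixels double.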
157. 06.8 DIY
Do It Yourself!
Recap of Process
• Create master (uncompressed)
file
• For analog content:
Scan/sample
• For born-digital content:
Convert
• Name the file in a consistent way
• Perform quality control; edit as
needed
• Save master on stable, long-term
storage
• Create derivative or access file
• Share access files as needed
http://xkcd.com/730/
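Naming files "in a consistent way" is easiest when a small helper builds every name. This is a sketch of one possible convention, not a standard: the pattern (identifier, zero-padded sequence, role), the function name, and the identifier SIL001 are all hypothetical.

```python
import re


def build_filename(identifier: str, sequence: int, role: str, extension: str) -> str:
    """Compose a predictable file name from an object identifier, a
    zero-padded sequence number, and a role ('master' or 'access')."""
    if not re.fullmatch(r"[A-Za-z0-9]+", identifier):
        raise ValueError("identifier should contain only letters and digits")
    if role not in {"master", "access"}:
        raise ValueError("role must be 'master' or 'access'")
    clean_ext = extension.lstrip(".").lower()
    return f"{identifier}_{sequence:04d}_{role}.{clean_ext}"


print(build_filename("SIL001", 1, "master", "tif"))   # SIL001_0001_master.tif
print(build_filename("SIL001", 1, "access", ".JPG"))  # SIL001_0001_access.jpg
```

Zero-padding keeps files in page order when sorted by name, and sharing one stem ties each access derivative back to its master.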
158. 06.9 DIY
Do It Yourself!
Digitization Life-cycle
• Create master (uncompressed)
file
• For analog content:
Scan/sample
• For born-digital content:
Convert
• Name the file in a consistent way
• Perform quality control; edit as
needed
• Save master on stable, long-term
storage
• Create derivative or access file
• Share access files as needed
Biodiversity Heritage Library Digitization Life-Cycle
159. 06.10 DIY
Do It Yourself!
Better, Faster, Cheaper!
Now that we’ve
mastered
digitization, how
do we scale it to
hundreds, thousands,
even millions of
objects?
Smithsonian Natural History Museum
165. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Any orderly arrangement of individual sensor elements.
In digital imaging, there are primarily three array types:
two dimensional or area arrays, one dimensional or linear
arrays, and tri-linear arrays consisting of three
consecutive linear arrays of red, green, and blue
sensitive sensor elements.
Term: Array
167. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Light existing in the environment that is not produced by
the imaging system. Ambient light can be natural or
artificial light. Ambient light is generally uncontrolled and
can be highly variable, posing a possible risk to image
quality. The level of ambient light should be minimized in
relation to the level of light produced by the imaging
system.
Term: Ambient light
169. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The comparison of instrument performance to a standard
of higher accuracy. The standard is considered the
reference and the more correct measure. Calibrations
should be performed against a specified tolerance.
Term: Calibration
171. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Exchangeable image file format (Exif) describes a
metadata set to accompany TIFF, JPEG, and RIFF WAV
formatted image files. Exif was prepared by the Technical
Standardization Committee on AV & IT Storage Systems
and Equipment and is published by the Japan
Electronics and Information Technology Industries
Association (JEITA). The Exif 2.2 specification (JEITA
CP-3451) is in nearly universal use by camera
manufacturers.
Term: Exif
173. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Embedded metadata used for image management. IPTC
metadata is primarily composed of descriptive,
administrative, and rights metadata, as opposed to the
technical nature of Exif. IPTC metadata was developed
and is controlled by the IPTC.
Term: IPTC Metadata
175. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The Metadata Encoding and Transmission Standard
(METS) is a metadata standard for encoding descriptive,
administrative, and structural metadata regarding objects
within a digital library, expressed using the XML schema
language of the World Wide Web Consortium (W3C). The
standard is maintained by the Network Development and
MARC Standards Office of the Library of Congress, and is
being developed as an initiative of the Digital Library
Federation (DLF).
Term: METS
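To show how structural metadata looks in practice, here is a minimal sketch that builds a bare-bones METS document with Python's standard library. It covers only a file section and a physical structure map; descriptive and administrative sections are omitted, so the result is not a complete METS record. The function name, object identifier, and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(object_id, file_names):
    """Build a skeletal METS tree: one file entry and one page div per image."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    file_sec = ET.SubElement(root, q("fileSec"))
    file_grp = ET.SubElement(file_sec, q("fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    book = ET.SubElement(struct_map, q("div"), {"TYPE": "book"})
    for order, name in enumerate(file_names, start=1):
        file_id = f"FILE{order:04d}"
        file_el = ET.SubElement(file_grp, q("file"), {"ID": file_id})
        ET.SubElement(file_el, q("FLocat"),
                      {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": name})
        # The structure map records page order and points back to each file.
        page = ET.SubElement(book, q("div"), {"TYPE": "page", "ORDER": str(order)})
        ET.SubElement(page, q("fptr"), {"FILEID": file_id})
    return root


mets = build_mets("SIL001", ["SIL001_0001_master.tif", "SIL001_0002_master.tif"])
print(ET.tostring(mets, encoding="unicode"))
```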
177. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The relationship between the horizontal and vertical
dimensions of an image. The horizontal dimension is
normally listed first. For example, a 4 (vertical) by 6 inch
(horizontal) print has an aspect ratio of 3:2.
Term: Aspect ratio
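The 3:2 figure in the definition comes from reducing the two dimensions by their greatest common divisor. A minimal sketch follows; the function name and the 6000 x 4000 pixel example are assumptions.

```python
from math import gcd


def aspect_ratio(horizontal: int, vertical: int) -> str:
    """Reduce horizontal:vertical to lowest terms, horizontal listed first."""
    if horizontal <= 0 or vertical <= 0:
        raise ValueError("dimensions must be positive integers")
    divisor = gcd(horizontal, vertical)
    return f"{horizontal // divisor}:{vertical // divisor}"


print(aspect_ratio(6, 4))        # 3:2, the print from the definition
print(aspect_ratio(6000, 4000))  # 3:2 for a hypothetical 6000 x 4000 pixel image
```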
179. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
General term to describe a broad range of undesirable
flaws or distortions in digital reproductions produced
during capture or data processing. Some common forms
of image artifacts include noise, chromatic aberration,
blooming, interpolation, and imperfections created by
compression, among others. In digital sound recordings,
the effect of lossy compression is often cited as
accounting for audible artifacts, although several other
types of artifacts may also be present.
Term: Artifact (defect)
181. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
A sampling effect that leads to spatial frequencies being
falsely interpreted as other spatial frequencies.
Term: Aliasing
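A small numeric experiment illustrates the definition. In this sketch, whose frequencies and sampling rate are assumed values chosen for illustration, a pattern of 9 cycles per unit sampled only 10 times per unit yields exactly the same samples as a pattern of 1 cycle per unit, so the fine detail is falsely recorded as coarse detail.

```python
import math

SAMPLE_RATE = 10  # samples per unit length (assumed)
TRUE_FREQ = 9     # cycles per unit length, above the Nyquist limit of 5
ALIAS_FREQ = 1    # SAMPLE_RATE - TRUE_FREQ


def sample(freq, rate, count):
    """Sample a cosine of the given frequency at evenly spaced points."""
    return [math.cos(2 * math.pi * freq * n / rate) for n in range(count)]


high = sample(TRUE_FREQ, SAMPLE_RATE, 20)
low = sample(ALIAS_FREQ, SAMPLE_RATE, 20)

indistinguishable = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, low))
print("samples identical:", indistinguishable)
```

In scanning terms, this is why fine regular patterns such as halftone screens can show up as moiré when the sampling resolution is too low.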
183. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Data compressed using a lossless compression
technique will allow the decompressed data to be exactly
the same as the original data before compression, bit for
bit. The compression of data is achieved by coding
redundant data in a more efficient manner than in the
uncompressed format. The compression ratios that can
be achieved with lossless compression are generally
much lower than those that can be achieved using lossy
compression techniques.
Term: Compression, lossless
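The "bit for bit" property is easy to check. The sketch below uses zlib from Python's standard library as a stand-in for any lossless codec; the sample data is made up and deliberately redundant.

```python
import zlib

# Highly redundant sample data stands in for an image with large flat areas.
original = bytes([200] * 5000 + [35] * 5000)

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("bit-for-bit identical:", restored == original)
```

Data with little redundancy, such as a noisy photograph, would shrink far less under the same codec.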
185. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Data compressed using a lossy compression technique
results in the loss of information. The decompressed data
will not be identical to the original uncompressed data.
Conservative lossy compression can result in a form
of lossy compression referred to as visually lossless
compression.
Term: Compression, lossy
187. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
A form or manner of lossy compression where the data
that is lost after the file is compressed and
decompressed is not detectable to the eye; the
compressed data appearing identical to the
uncompressed data.
Term: Compression, visually lossless
189. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The ratio of a file's uncompressed size over its
compressed size. A file compressed ten-fold over its
uncompressed size would be described as having a ten-
to-one compression, expressed as 10:1. Some formats
such as JPEG and JPEG 2000 allow the user to specify
the compression ratio.
Term: Compression ratio
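As a worked example of the definition, this sketch divides the uncompressed size by the compressed size and formats the result in the N:1 style. The function name and the byte counts are assumptions.

```python
def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> str:
    """Express uncompressed size over compressed size as 'N:1'."""
    if uncompressed_bytes <= 0 or compressed_bytes <= 0:
        raise ValueError("sizes must be positive")
    ratio = round(uncompressed_bytes / compressed_bytes, 1)
    return f"{ratio:g}:1"


# Hypothetical 25 MB master compressed to 2.5 MB.
print(compression_ratio(25_000_000, 2_500_000))  # 10:1
```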
191. III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Federal Agencies Digitization Guidelines Initiative. A
collaborative effort by federal agencies formed as a
group in 2007 to define common guidelines, methods,
and practices to digitize historical content in a sustainable
manner.
Term: FADGI
192. A Little Bit about Smithsonian …
Rapid Capture and 3D Imaging
194. Courtesy of Keri Thompson, SIL
Courtesy of Karen Weiss, AAA
feat. Archives of American Art case study
195.
196. What Is Rapid Capture?
• Rapid Capture is more than just taking a digital picture
of an object or specimen. . .
• Rapid Capture Workflows are comprehensive, end-to-
end digitization workflows.
• Rapid Capture workflows follow a collection object or
specimen from its shelf in permanent storage all the
way to its potential destination as a virtual object online,
available for access by the public.
07.1 Smithsonian
198. Rapid Capture Digitization: Object & Data Workflow
[Diagram showing an Object Path and a Data Path. Labels: Staging; DataMatrix Barcode; Barcode put in Filename and/or IPTC Title field; DAMS Hot Folder Ingest; DAMS; CDIS; TMS; IDS; Generates Derivative Media Image; Generates Derivative Metadata. Example barcodes: 123456ABC, 123456ABD, 123456ABE.]
199. • Show what rapid capture looks like
Rapid Capture In Action
NMAH Numismatics
Hillery York, NMAH Collection Mgr, moves objects from staging to the capture station
200. Rapid Capture Impact
Access:
From the shelf to the public in less
than 24 hours.
Throughput:
Flat objects: 100,000+ to 1.8M per year
Non-flat objects/specimens: 30,000 to 60,000 per year
07.1 Smithsonian
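To get a feel for what these throughput ranges mean at scale, the sketch below divides a collection size by an annual rate. The rates are the ones quoted on this slide; the one-million-object collection and the function name are hypothetical.

```python
def years_to_digitize(collection_size: int, objects_per_year: int) -> float:
    """Estimate how many years a collection takes at a steady annual rate."""
    if objects_per_year <= 0:
        raise ValueError("objects_per_year must be positive")
    return collection_size / objects_per_year


COLLECTION_SIZE = 1_000_000  # hypothetical collection
RATES_PER_YEAR = {
    "flat objects, low end": 100_000,
    "flat objects, high end": 1_800_000,
    "non-flat objects, low end": 30_000,
    "non-flat objects, high end": 60_000,
}

for label, rate in RATES_PER_YEAR.items():
    years = years_to_digitize(COLLECTION_SIZE, rate)
    print(f"{label}: {years:.1f} years")
```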
201. What we’ve learned
Moving fast requires a holistic approach.
• Moving Collections: object handling, cleaning, etc.
• Digitizing Collections: dedicated hardware, quality control
• Moving Data: network / systems
07.1 Smithsonian
215. 08 Resources
You Are Not Alone!
There is a wealth of
resources to help with
digitization projects of
all types …
Resources
216. 08 Resources
Digital Library Federation
Strategy meets practice at the
Digital Library Federation (DLF).
Through its programs, working
groups, and initiatives, DLF
connects the vision and research
agenda of its parent organization,
the Council on Library and
Information Resources (CLIR), to an
active and exciting network of
practitioners working in digital
libraries, archives, labs, and
museums. DLF is a place where
ideas can be road-tested, and from
which new strategic directions can
emerge.
Resources
https://www.diglib.org/
217. 08 Resources
Museums and the Web
The Museums and the Web
Bibliography comprises all papers
published on MW conference websites
or in annual selected proceedings.
Entries can be filtered by year and are
listed alphabetically by the primary
author's name. Clicking a paper title
shows details including an abstract and
a live URL link if appropriate. Clicking an
author's name lists all papers by that
author. This bibliography is a work in
progress as we standardize all entries.
Resources
http://www.museumsandtheweb.com/
218. 08 Resources
Federal Agencies Digitization Guidelines Initiative
Federal Agencies Digitization Guidelines
Initiative. Formed as a group in 2007 to
define common guidelines, methods,
and practices to digitize historical
content in a sustainable manner. Two
separate working groups were formed.
The Federal Agencies Still Image
Digitization Working Group will
concentrate its efforts on image content
such as books, manuscripts, maps, and
photographic prints and negatives.
The Federal Agencies Audio-Visual
Working Group is focusing its work on
sound, video, and motion picture film.
Resources
http://www.digitizationguidelines.gov/
221. THANKS
TO…
• Sarah Osborne Bender
• Smithsonian Digitization Program Office
Günter Waibel
Adam Metallo
Vincent Rossi
• Richard Naples (Smithsonian Libraries)
• Keri Thompson (Smithsonian Libraries)
• Jacqueline Chapman (Smithsonian Libraries)
Editor's notes
The old prioritization methodology focused on whittling down collections to a manageable size.
The new prioritization methodology is focused more on the physical characteristics of collections; identifying large, homogenous collections.
These are known as “digistreets”
The opportunity is the massive economies of scale that industrial-scale digitization provides.
- Many of you have seen our Rapid Capture Projects during Open Houses and as of now, 5 units have been deeply immersed in Rapid Capture through our Pilot Projects. . .
But for those that haven’t had the opportunity to see Rapid Capture in action, let me give a quick definition of what we’re doing. . .
First and foremost. . .
Rapid Capture workflows follow a collection object or specimen from its shelf in permanent storage, to its digitization, to storage of the image in DAMS and of the collection records in your unit’s CIS, and finally, all the way to its potential destination as a virtual object online, available for access by the public.
In addition to the various types of objects we’ve digitized at each pilot project, we’ve also integrated various SI collections management systems into our rapid capture workflows to include SIRIS at Gardens, TMS at NMAAHC & Freer Sackler and MIMSY-XG at American History.
Additionally, during each pilot project we’ve introduced new techniques, technologies and processes. For example at Gardens we introduced new tools that support Quality Control in the digitization workflow and at American History we integrated Transcription Center into the Rapid Capture workflow.
In our next pilot project with Natural History we’ll integrate the EMu Collection Management System into our workflow, and we’ll also introduce barcoding into the rapid capture workflow.
These RCPPs have shown us several things. . .
- In typical Rapid Capture workflows, it’s possible for an object or specimen to go from permanent storage, where it hasn’t been seen by the public in years, decades or perhaps ever, to publicly accessible Smithsonian websites in as little as a few hours!
- And because Rapid Capture workflows are fine-tuned, improved, and continuously optimized at every step of the way, high-quality digital assets can be generated, depending on the collection object or specimen, at throughput rates from 150 objects per day, to 700 to 1,300 specimens per day, and upwards of 6,000 objects or specimens per day!
To take advantage of the opportunities presented by comprehensive, end-to-end rapid capture workflows, we’ve learned we need to take a holistic approach to our workflows; workflows which include not only the digitization process itself, but the object or specimen handling that comes before it and the movement of mass amounts of data that come after it.
We’ve shown what we can do with Human Driven Rapid Capture Workflows. . .
… and the DPO is actively researching technologies available in the digitization community from around the world. . .
To include semi-automated conveyor based digitization systems that have demonstrated their efficiency with high volume, flat collections.
As well as robotic systems that can digitize small objects such as insects and large objects such as paintings at extremely high resolutions.
The bottom line is we’re constantly looking for ways to expand our capabilities, improve efficiencies and reduce costs all while maintaining the highest quality levels.