Martin R. Kalfatovic | 8 June 2016 | Chapel Hill, NC
DIGITIZATION BASICS for LIBRARIES, ARCHIVES and MUSEUMS
• A bit about me
• About this session
• Why do we digitize?
• Context: Libraries,
Archives and
Museums (LAMs)
Introduction
01.1 Introduction
Just a bit about me ...
@UDCMRK
• Twitter: @udcmrk
• www.udc793.org
• www.linkedin.com/in/martinkalfatovic
• … an inordinate
fondness for Dodos
01.2.1 Introduction
About this session
Taking a broad overview of standards, lingo, hardware, software and
planning considerations, this session will get everyone current and
on a level playing field to proceed through the subsequent topics.
The session will help you establish the foundational vocabulary to
both enrich your SEI experience and increase your capacity to
communicate with your colleagues about the basics of digital
reformatting.
The session introduces the practices, standards, and challenges
evident across the spectrum of cultural heritage institutions
acquiring, managing, and providing access to digital collections. The
session considers the digital curation life-cycle as well as lightly
touching on funding and aggregators.
About this session
1. Lecture/Discussion
2. Vocabulary Building
3. Exercises
01.2.2 Introduction
01.3 Introduction
Context
Libraries | Archives | Museums
In principle, the work of art has always
been reproducible. Objects made by
humans could always be copied by
humans. Replicas were made by
pupils in practicing for their craft, by
masters in disseminating their works,
and, finally, by third parties in pursuit of
profit. But the technological
reproduction of artworks is something
new. Having appeared intermittently in
history, at widely spaced intervals, it is
now being adopted with ever-
increasing intensity.
Das Kunstwerk im Zeitalter seiner technischen
Reproduzierbarkeit. Walter Benjamin (1936)
01.3 Introduction
Context
Libraries | Archives | Museums
Digitization has magnified
our ability to reproduce art,
books, and even objects,
with increasing rapidity,
ease, and added
functionality.
01.4 Introduction
• Provide online access to
collections
• Make digitized material and
metadata available through
online catalogs AND for reuse on
other platforms.
• Maximize value to the largest
audience in new and creative
ways.
• Advance preservation by
reducing wear and tear on the
originals.
Why do we digitize?
Based on NARA strategic plan
01.4 Introduction
Why do we digitize?
Based on NARA strategic plan
• Provide access to those
materials that can no longer be
accessed in their original format.
• Maximize the efficient and
effective use of resources to
carry out digitization and achieve
cost-saving benefits whenever
possible.
• Improve our service to
customers by responding to their
evolving expectations
• Let’s all count in
binary!
• Analog vs. Binary :
Wave vs. Sample
• Bytes vs. Bits
Basics
02.1.1 Basics
BASE 10 (Decimal)
0 1 10 100 1,000 10,000 100,000 1,000,000
BASE 2 (Binary)
0 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096
BASE 3 (Ternary) – not on the test!
0 1 2 3 9 27 81 243 729 2,187 6,561 19,683
Basics
Let’s All Count in Binary!
02.1.2 Basics
Basics
Let’s All Count in Binary!
1 - 1
2 - 10
3 - 11
4 - 100
5 - 101
6 - 110
7 - 111
8 - 1000
9 - 1001
10 - 1010
11 - 1011
12 - 1100
13 - 1101
14 - 1110
15 - 1111
16 - 10000
17 - 10001
18 - 10010
19 - 10011
20 - 10100
91 - 1011011
92 - 1011100
93 - 1011101
94 - 1011110
95 - 1011111
96 - 1100000
97 - 1100001
98 - 1100010
99 - 1100011
100 - 1100100
02.1.3 Basics
Basics
Let’s All Count in Binary!
02.1.4 Basics
Basics
Let’s All Count in Binary!
Exercises
Exercise 1: Let’s Convert Between Binary and Decimal
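For checking answers to the conversion exercise, here is a minimal Python sketch. The function names to_binary and to_decimal are illustrative choices, not part of the session materials; Python's built-in bin() and int(text, 2) do the same job.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next bit, right to left
        n //= 2
    return "".join(reversed(digits))


def to_decimal(bits: str) -> int:
    """Convert a binary string to an integer, reading digits left to right."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)  # shift one place left, then add the new bit
    return total


print(to_binary(100))         # 1100100
print(to_decimal("1011011"))  # 91
```

Both results match the counting table earlier in this section (100 is 1100100, and 1011011 is 91).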
02.2 Basics
Analog vs. Binary : Wave vs. Sample
By Hyacinth - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=30716342
By Hyacinth - Own work, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=23867344
02.3 Basics
Bytes vs Bits
BIT
A bit is the smallest unit of information that can be stored or
manipulated on a computer; it consists of either zero or one.
A bit is also known as a binary digit, especially when working with the 0 or
1 values.
BYTE
A byte is how many bits are needed to represent letters of
the alphabet and other characters. For example, the letter
"A" would be 01000001. 8 bits = 1 byte
WORD
Groups of 4 Bytes (translated into Hexadecimal – Base 16!),
e.g. 4B 4A 57 00 = K J W <null>
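The bit, byte and word examples can be reproduced in a few lines of Python. This is only a sketch using standard built-ins; the variable names are illustrative.

```python
letter = "A"
code = ord(letter)           # character code of "A" (65)
bits = format(code, "08b")   # the same value written as 8 bits

# A 4-byte word written in hexadecimal, two hex digits per byte.
word = bytes.fromhex("4B4A5700")
chars = [chr(b) if b else "<null>" for b in word]

print(bits)   # 01000001
print(chars)  # ['K', 'J', 'W', '<null>']
```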
02.4 Basics
Naming the Bytes
1 Bit = Binary Digit
8 Bits = 1 Byte
1000 Bytes = 1 Kilobyte
1000 Kilobytes = 1 Megabyte
1000 Megabytes = 1 Gigabyte
1000 Gigabytes = 1 Terabyte
1000 Terabytes = 1 Petabyte
1000 Petabytes = 1 Exabyte
1000 Exabytes = 1 Zettabyte
1000 Zettabytes = 1 Yottabyte
1000 Yottabytes = 1 Brontobyte
1000 Brontobytes = 1 Geopbyte
https://flic.kr/p/Knb8k
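As a rough illustration of the naming table, the sketch below expresses a raw byte count in the largest convenient unit, using the decimal (1000-based) steps listed above. The function name human_size and the cut-off at petabytes are assumptions made for this example.

```python
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes", "petabytes"]


def human_size(num_bytes: float) -> str:
    """Express a byte count using 1000-based units, as in the table above."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop once the number is small enough, or we run out of units.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000


print(human_size(1_500_000))  # 1.5 megabytes
```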
I
DIGITIZATION VOCABULARY
Vocabulary I: Introduction and Basics
Term: Binary
Definition:
In mathematics and digital electronics, a binary number is a
number expressed in the binary numeral system or base-2
numeral system which represents numeric values using two
different symbols: typically 0 (zero) and 1 (one). The base-2
system is a positional notation with a radix of 2. Because of its
straightforward implementation in digital electronic circuitry
using logic gates, the binary system is used internally by almost
all modern computers and computer-based devices. Each digit
is referred to as a bit.
Term: Analog
Definition:
An analog signal has a theoretically infinite resolution. In
practice an analog signal is subject to electronic noise and
distortion introduced by communication channels and signal
processing operations, which can progressively degrade the
signal-to-noise ratio (SNR). In contrast, digital signals have a
finite resolution.
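To make the "wave vs. sample" idea concrete, the sketch below samples one cycle of a sine wave (standing in for the analog signal) and rounds each sample to one of a fixed number of levels. The function name and the choice of a sine wave are illustrative assumptions, not part of the definition above.

```python
import math


def sample_and_quantize(num_samples: int, bits: int) -> list:
    """Sample one cycle of a sine wave and round each sample to an integer level."""
    levels = 2 ** bits                      # e.g. 3 bits give 8 possible levels
    samples = []
    for i in range(num_samples):
        t = i / num_samples                 # position within the cycle, 0.0 up to 1.0
        value = math.sin(2 * math.pi * t)   # "analog" amplitude between -1.0 and 1.0
        level = round((value + 1) / 2 * (levels - 1))  # map to 0 .. levels - 1
        samples.append(level)
    return samples


print(sample_and_quantize(8, 3))
```

More samples and more bits per sample follow the wave more closely, but the result always has a finite resolution.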
Term: Byte
Definition:
The byte is a unit of digital information that most commonly
consists of eight bits. Historically, the byte was the number of
bits used to encode a single character of text in a computer and
for this reason it is the smallest addressable unit of memory in
many computer architectures. The size of the byte has
historically been hardware dependent and no definitive
standards existed that mandated the size. The de facto standard
of eight bits is a convenient power of two permitting the values 0
through 255 for one byte.
Term: Digitization
Definition:
The process of recording an analog signal in a digital form. In
relation to content of this site, it describes the process of
translating analog signal data emanating from an object (light or
sound) into a digitally encoded format. Audio, still and moving
images are commonly digitized for increased access or for
preservation purposes.
Term: Brontobytes
Definition:
The prefix bronto, as used in the term brontobyte, has been
used to represent anything from 10^15 to 10^27 bytes, most often
10^27.
• Get ready!
• Prioritization: Some
things are more
equal than others
• Tools to help you
Get Ready and
Prioritize
• Copyright!
Measure Twice, Cut Once
03.1 Measure Twice …
Measure Twice, Cut Once
Get ready!
Staffing Resources
Acknowledge that digitizing for public access is a significant
business process that crosses multiple business units.
Develop a separate human resource plan to support this
digitization business process.
• IT Infrastructure
• Along with staffing, require an IT plan to support digitization
that includes bandwidth, storage, the ability to share images
and metadata across business units, among other
requirements.
03.1 Measure Twice …
Policy and Guidance for Digitization Activities
• Promulgate policy and guidance that provides further
implementation direction as business units begin
implementing the strategy
• Technical Digitization Standards
• Develop technical digitization requirements for the
approaches outlined above to ensure uniformity and
standardization.
• Funding Strategies
• Seek out and explore other options and relationships to
digitize and make content available
Measure Twice, Cut Once
Get ready!
03.2 Measure Twice …
• Candidates for digitization projects will be prioritized according
to established criteria for significance and use.
• Candidates for digitization projects will be prioritized in order
to achieve a demonstrated high priority preservation benefit
for the agency.
• Funding is available or likely to be available and sustainable
for the project.
Measure Twice, Cut Once
Prioritization: Some things are more equal
than others
03.3 Measure Twice
Starting up
Tools to help you Get ready & Prioritize
• Digitization Plans
• Digital Asset Management Plan
• Web Access Plan
03.4 Measure Twice …
Copyright
https://xkcd.com/14/
Sometimes I just can't get outraged over copyright law ...
Copyright
To promote the Progress
of Science and useful Arts,
by securing for limited
Times to Authors and
Inventors the exclusive
Right to their respective
Writings and Discoveries.
Article I, Section 8, Clause 8 of the United
States Constitution
… but most of the time I am … but still …
03.4 Measure Twice …
Staffing a digitization project
• How
• Who
04.1 Staffing
Staffing a digitization project
Depending on the size of
the institution, staff
members may fill a number
of roles. Also, do not forget
that in addition to your
regular staff, your
volunteers, interns, and
student help can participate
in the digitization process
(with the proper training
and supervision).
04.2 Staffing
Staffing a digitization project
How
• In-house staffing
• Outsourcing
• Hybrid approach
04.3 Staffing
Staffing a digitization project
Who
• Director / CEO
• Project Manager
• Curator
• Technical Staff
• Conservator
• Scanning Operators
04.3.1 Staffing
Staffing a digitization project
Director / CEO
As with any LAM activity,
the overall responsibility for
all functions ultimately rests
with the director. Strong
leadership and vision for
digitization is necessary for
a successful program.
04.3.2 Staffing
Staffing a digitization project
Project Manager
Manages goals and
expectations; identifies
further staffing and equipment needs;
serves as liaison between
departments and staff;
creates workplans and
associated documents;
manages funds.
04.3.3 Staffing
Staffing a digitization project
Curator
Or, the person in charge of
a collection. In addition to
their responsibilities of
caring for the collections,
curators are also generally
responsible for the display
of the objects in coherent
and informative or
educational ways.
04.3.4 Staffing
Staffing a digitization project
Technical Staff
Database development,
web/database integration,
CGI (Common Gateway
Interface) script writing, Perl
programming, and related
activities that simplify the
process of getting objects
to the scanning operations
and the resulting files in a
usable state.
04.3.5 Staffing
Staffing a digitization project
Conservator
Depending upon the types
of collections, consultation
with the preservation /
conservation staff in varying
degrees will be necessary
to determine if (and how)
the items can be digitized
and/or photographed.
04.3.6 Staffing
Staffing a digitization project
Scanning Operators
Scanning, photography,
handling materials
(packing, shipping) and
other such skills are just a
few that may be required of
staff doing the actual
conversion.
Getting Technical
• Pixel tricks
• Color and not color
• File formats
• Cost and
implementation
factors
• File naming
• Compression
• Bundling file
formats
05.1 Getting Technical
Getting Technical
Pixel Tricks
Pixel
An abbreviation of picture element,
this term may refer to a component
of either a digital image or a digital
sensor. In the case of a digital
image, the pixel is the smallest
discrete unit of information in the
image's structure. In the case of the
sensor in a scanner or digital
camera, a pixel is the smallest
photosensitive component or cell
providing a response to light (or
photons).
05.1.1 Getting Technical
Getting Technical
Why do we care?
Remember back to wave vs.
sample? Pixels can be thought
of as those elements of the
samples that fall within the
wave.
Getting Technical
05.1.1 Getting Technical
Analog vs. Binary : Wave vs. Sample
Exercises
Exercise 2: Worksheet: Calculate PPI
05.1.3 Getting Technical
Getting Technical
Pixel Parameters
Sampling Frequency
This parameter measures the physical
pixel count in pixels per inch (ppi), pixels
per mm, etc. This parameter informs us
about the size of the original and also
provides part of the data needed to
determine the level of detail recorded in
the file.
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.1.3 Getting Technical
Getting Technical
Pixel Parameters
Sharpening
Sharpening artificially enhances details to
create the illusion of greater definition.
There are three major sharpening
processes in a typical imaging pipeline:
capture sharpening (through camera
setting adjustment), image sharpening in
post processing, and output sharpening
for print or display purposes. Sharpening
is usually implemented through image
edge enhancement, such as filtering
techniques using unsharp masks and
inverse image diffusion.
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
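The unsharp mask idea can be sketched on a single row of grayscale pixel values: blur the row, measure what the blur removed, and add that difference back. This is a simplified illustration (a 3-pixel box blur and a hypothetical amount parameter), not the algorithm any particular scanner or editing package uses.

```python
def unsharp_mask(row, amount=1.0):
    """Sharpen a 1-D row of 8-bit pixel values with a simple unsharp mask."""
    sharpened = []
    for i, value in enumerate(row):
        left = row[max(i - 1, 0)]                # repeat the edge pixel at the borders
        right = row[min(i + 1, len(row) - 1)]
        blurred = (left + value + right) / 3     # 3-pixel box blur
        detail = value - blurred                 # what the blur removed
        result = value + amount * detail         # add the detail back, exaggerated
        sharpened.append(min(255, max(0, round(result))))  # clamp to the 0-255 range
    return sharpened


edge = [50, 50, 50, 200, 200, 200]   # a soft dark-to-light edge
print(unsharp_mask(edge))            # [50, 50, 0, 250, 200, 200]
```

The pixels on either side of the edge are pushed darker and lighter, which is the "illusion of greater definition" described above; no new detail has been captured.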
05.1.3 Getting Technical
Getting Technical
Sharpening
Basic image Autosharpen Extreme sharp
05.1.3 Getting Technical
Getting Technical
Pixel Parameters
Reproduction Scale Accuracy
This parameter measures the relationship
between the size of the original object to
the size of that object in the digital image.
This parameter is measured in relation to
the pixels per inch (ppi) or pixels per mm
(ppmm) of the original digital capture.
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.2 Getting Technical
• Bitonal
• Grayscale
• Color
• Additive color
• Subtractive color
Getting Technical
Color and Not Color
05.2.1 Getting Technical
Getting Technical
Color and Not Color
RGB Color Grayscale Bitonal
05.2.2 Getting Technical
Getting Technical
Color and Not Color: CMYK
The CMYK color model (process color,
four color) is a subtractive color
model, used in color printing, and is
also used to describe the printing
process itself. CMYK refers to the four
inks used in some color printing: cyan,
magenta, yellow, and key (black).
Though it varies by print house, press
operator, press manufacturer, and
press run, ink is typically applied in the
order of the abbreviation.
By Viliam Furík - Own work, CC BY-SA 4.0,
https://commons.wikimedia.org/w/index.php?curid=39316936 (upper)
By SharkD at English Wikipedia Later version uploaded by Jacobolus, Dacium
at en.wikipedia. - Transferred from en.wikipedia to Commons., Public Domain,
https://commons.wikimedia.org/w/index.php?curid=3791468 (lower)
05.2.3 Getting Technical
Getting Technical
Color and Not Color: RGB
The RGB color model is an additive
color model in which red, green, and
blue light are added together in various
ways to reproduce a broad array of
colors. The name of the model comes
from the initials of the three additive
primary colors, red, green, and blue.
The main purpose of the RGB color
model is for the sensing,
representation, and display of images
in electronic systems, such as
televisions and computers.
Wikipedia https://en.wikipedia.org/wiki/RGB_color_model
05.2.4 Getting Technical
Getting Technical
Color and Not Color: RGB vs. CMYK
By RGB_CMYK_4.jpg: Annette Shacklett derivative work: Marluxia.Kyoshu [Public domain], via Wikimedia Commons
A comparison of RGB and CMYK color
spaces. The image demonstrates the
difference between the RGB and CMYK
color gamuts. The CMYK color gamut is
much smaller than the RGB color gamut,
thus the CMYK colors look muted. If you
were to print the image on a CMYK device
(an offset press or maybe even an inkjet
printer) the two sides would likely look
much more similar, since the combination
of cyan, yellow, magenta and black cannot
reproduce the range (gamut) of color that
a computer monitor displays. This is a
constant issue for those who work in print
production. Clients produce bright and
colorful images on their computers and
are disappointed to see them look muted
in print.
05.3 Getting Technical
• Master / Archival /
Access
• Image file types
Getting Technical
File Formats
Getting Technical
File Formats
Master
A digital file (images, video, audio)
which has been stored in its original
captured state. These master files
are also referred to as master
copies, preservation masters or
preservation copies.
05.3.1 Getting Technical
Getting Technical
File Formats
Archival
A file that is composed of one or
more computer files along with
metadata. Archive files are used to
collect multiple data files together
into a single file for easier portability
and storage, or simply to compress
files to use less storage space.
Archive files often store directory
structures, error detection and
correction information, arbitrary
comments, and sometimes use
built-in encryption.
05.3.1 Getting Technical
Getting Technical
File Formats
Access
Often used for low-resolution
images (thumbnails, screen images)
that are made available on the Internet.
See also delivery copy and surrogate
image. This could be an identical
copy of the original file or perhaps a
lower quality version with a smaller
file size. Sometimes called delivery or
surrogate or derivative.
05.3.1 Getting Technical
• RAW
• TIFF
• JPEG
• JPEG 2000
• PNG
• PDF
• GIF
Getting Technical
Image file types
05.3.2 Getting Technical
File Formats
Three Factors
• Cost
• System
Implementation
• Sustainability
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.4 Getting Technical
File Formats
Cost Factors
• Implementation Cost
• Cost of Software Tools
• Cost of equipment needed to
produce files
• Storage Cost
• Network Cost
• Ongoing Cost of Production
• Cost of Providing Access
• Cost of Preservation
Processing
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.4.1 Getting Technical
File Formats
Implementation Factors
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
• Level of difficulty/complexity
• Technical Complexity
• Toolset Complexity
• Availability of tools
• Ease and accuracy for OCR
• Ease and accuracy of File
validation
• Ease and accuracy of
monitoring of quality
05.4.2 Getting Technical
File Formats
Sustainability Factors
• Disclosure
• Adoption
• Transparency
• Self-Documentation
• Native Embedded Metadata Capabilities
• Embedded Metadata Capabilities Through
Extension
• Level of Work Necessary to Embed Native
Metadata
• Level of Work Necessary to Embed
Metadata Through Extension
• Geo-referencing Metadata
• Level of Effort to Embed Geo-referencing
Metadata
• Impact of Patents
• Technical Protection Mechanisms
Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
05.4.3 Getting Technical
Exercises
Exercise 3: Deep Dive: FADGI File Format Sheet
File Formats
File Naming (Zhèngmíng | 正名)
05.5 Getting Technical
The Master replied, “What is necessary
is to rectify names.” “So! indeed!” said
Tsze-lu. “You are wide of the mark! Why
must there be such rectification?”
“Therefore a superior man considers it
necessary that the names he uses may
be spoken appropriately, and also that
what he speaks may be carried out
appropriately. What the superior man
requires is just that in his words there
may be nothing incorrect.”
From The Analects of Confucius, Book 13, Verse 3 (James R. Ware, translated in
1980):
File Formats
File Naming: Some Guidelines
Semantic Names
There is meaning encoded in the
name, like:
• “BCA_03_04_00_145”
• Syrnium fulvescens from
Biologia Centrali-Americana.
Aves. Volume IV (1879-1904) by
Osbert Salvin and F. DuCane
Godman (4th volume of the 3rd
part of Biologia Centrali-Americana,
plate 145)
05.5.1 Getting Technical
BCA_03_04_00_145.jpg
File Formats
File Naming: Some Guidelines
Practical
• Use barcodes, accession
numbers, etc.
• “39088002738714-0003”
• Wiener Farbenkabinet (1794)
05.5.2 Getting Technical
39088002738714-0003.jpg
File Formats
File Naming: Three Parts
• Prefix
• Ordinal Position
• Suffix
05.5.3 Getting Technical
39088002738714-0003.jpg
File Formats
File Naming: Three Parts
Bca_03_04_00_145.jpg
• Prefix: Bca_03_04_00 (the part that’s either semantic or practical)
• Ordinal: 145 (position of the item in relation to a compound object)
• Suffix: jpg (file type)
05.5.3 Getting Technical
File Formats
File Naming: Last Thoughts
05.5.3 Getting Technical
• Stick with three letter
extension for the suffix
(.tif, .jpg, .jp2, .png)
• Keep file names the same
length (padding with Zeros
not spaces!)
• Better to be consistent
than right!
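The naming guidelines can be captured in a small helper so every file gets the same shape. This is a sketch only: the function name build_filename, the hyphen separator and the default four-digit width are assumptions modeled on the barcode example above.

```python
def build_filename(prefix: str, position: int, suffix: str, width: int = 4) -> str:
    """Join prefix, zero-padded ordinal position and suffix into one file name."""
    # The :0{width}d format pads with zeros (never spaces) to a fixed length.
    return f"{prefix}-{position:0{width}d}.{suffix}"


print(build_filename("39088002738714", 3, "jpg"))    # 39088002738714-0003.jpg
print(build_filename("39088002738714", 145, "tif"))  # 39088002738714-0145.tif
```

Because the ordinal is always the same width, the files sort in page order in any file browser.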
05.6 Technical
File Formats
Compression / Lossy / Lossless
Some file format types,
specifically JPEG, JPEG
2000, and TIFF, allow you to
compress the file size.
Lossy compression makes
files smaller by discarding
data; lossless compression
shrinks files without
discarding any.
Be aware of the data loss
when using lossy compression.
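A quick way to see what "lossless" means is a round trip: compress, decompress, and compare with the original. The sketch below uses Python's zlib module on made-up repetitive data as a stand-in; it is not an image codec, and a lossy format such as baseline JPEG would not pass the same comparison.

```python
import zlib

original = b"ABABABABABABABABABAB" * 50   # made-up, highly repetitive data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))     # the compressed copy is much smaller
print(restored == original)               # True: nothing was lost
```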
05.7 Technical
File Formats
Bundling file formats
When you need to move
many associated files around,
you may wish to “bundle”
them to pull them all together
into one package. Common
formats are:
• ZIP
• TAR
• RAR
• 7z
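As an illustration of bundling, the sketch below packs two small files into a ZIP held in memory and then lists its contents, using Python's standard zipfile module. The file names and contents are hypothetical placeholders.

```python
import io
import zipfile

# Hypothetical files to bundle: name -> contents.
files = {"page_0001.txt": b"first page", "page_0002.txt": b"second page"}

buffer = io.BytesIO()  # an in-memory stand-in for a .zip file on disk
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as bundle:
    for name, data in files.items():
        bundle.writestr(name, data)

with zipfile.ZipFile(buffer) as bundle:
    names = bundle.namelist()
    first = bundle.read("page_0001.txt")

print(names)  # ['page_0001.txt', 'page_0002.txt']
```

The bundle records file names and contents, but nothing about how the files relate to each other; that description has to travel separately as metadata.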
II
DIGITIZATION VOCABULARY
Vocabulary II: Getting Technical
Term: Pixel
Definition:
An abbreviation of picture element, this term may refer to
a component of either a digital image or a digital sensor.
In the case of a digital image, the pixel is the smallest
discrete unit of information in the image's structure. In the
case of the sensor in a scanner or digital camera, a pixel
is the smallest photosensitive component or cell
providing a response to light (or photons).
Term: Pixilation
Definition:
A term used to describe an abrupt and unnatural
transition over an edge feature. Also referred to as
"staircasing" because of the jagged and abrupt transition.
Term: PPI
Definition:
PPI stands for pixels per inch, commonly used in
describing the resolution capabilities of an imaging
device such as a scanner or the resolution of a digital
image. The terms DPI (dots per inch) and PPI are used
somewhat interchangeably today.
Term: Pixel Dimensions
Definition:
Pixel Dimensions are the horizontal and vertical
measurements of an image expressed in pixels. The pixel
dimensions may be determined by multiplying both the
width and the height by the DPI. A digital camera will also
have pixel dimensions, expressed as the number of
pixels horizontally and vertically that define its resolution
(e.g., 2,048 by 3,072). Calculate the DPI achieved by
dividing a document's dimension into the corresponding
pixel dimension against which it is aligned.
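The two calculations in this definition can be written out as a short sketch. The function names and the sample measurements (a letter-size page, a 10-inch original) are illustrative assumptions.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: int) -> tuple:
    """Multiply each physical dimension (in inches) by the sampling rate."""
    return round(width_in * ppi), round(height_in * ppi)


def achieved_ppi(pixels: int, inches: float) -> float:
    """Divide a pixel dimension by the matching physical dimension."""
    return pixels / inches


print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300)
print(achieved_ppi(3000, 10))          # 300.0
```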
Term: Color
Definition:
Color is the visual perceptual property corresponding in
humans to the categories called red, blue, yellow, etc.
Color derives from the spectrum of light (distribution of
light power versus wavelength) interacting in the eye with
the spectral sensitivities of the light receptors. Color
categories and physical specifications of color are also
associated with objects or materials based on their
physical properties such as light absorption, reflection, or
emission spectra.
Term: Bit depth (image)
Definition:
The number of bits used to represent each pixel in an image. The term
can be confusing since it is sometimes used to represent bits per pixel
and at other times, the total number of bits used multiplied by the
number of total channels. For example, a typical color image using 8
bits per channel is often referred to as a 24-bit color image (8 bits x 3
channels). Color scanners and digital cameras typically produce 24 bit
(8 bits x 3 channels) images or 36 bit (12 bits x 3 channels) capture,
and high-end devices can produce 48 bit (16 bit x 3 channels) images.
A grayscale scanner would generally be 1 bit for monochrome or 8 bit
for grayscale (producing 256 shades of gray). Bit depth is also referred
to as color depth.
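Bit depth feeds directly into storage planning. As a rough sketch (the function name is illustrative, and real files add headers and metadata on top), the uncompressed size of an image is width times height times bits per pixel, divided by 8 to get bytes.

```python
def uncompressed_bytes(width_px: int, height_px: int,
                       bits_per_channel: int, channels: int) -> int:
    """Estimate uncompressed image size in bytes, ignoring headers and metadata."""
    bits_per_pixel = bits_per_channel * channels   # e.g. 8 bits x 3 channels = 24
    return width_px * height_px * bits_per_pixel // 8


# A 2550 x 3300 pixel page (8.5 x 11 inches at 300 ppi):
print(uncompressed_bytes(2550, 3300, 8, 3))   # 25245000 (24-bit color, about 25 MB)
print(uncompressed_bytes(2550, 3300, 8, 1))   # 8415000  (8-bit grayscale)
print(uncompressed_bytes(2550, 3300, 1, 1))   # 1051875  (1-bit bitonal)
```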
Term: CMYK
Definition:
A subtractive color model used in printing that is based
on cyan (C), magenta (M), yellow (Y) and black (K).
These are typically referred to as process colors. Cyan
absorbs the red component of white light, magenta
absorbs green, and yellow absorbs blue. In theory, the
mix of the three colors will produce black, but a black ink
is used to increase the density of black in a print.
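The relationship between the additive and subtractive primaries can be shown with the common textbook conversion below. It is a naive sketch that ignores color profiles and real ink behavior, so it should not be mistaken for how a print workflow or color management system converts files; the function name is illustrative.

```python
def rgb_to_cmyk(r: int, g: int, b: int) -> tuple:
    """Naively convert 8-bit RGB to CMYK fractions (0.0 to 1.0), no color management."""
    if (r, g, b) == (0, 0, 0):
        return 0.0, 0.0, 0.0, 1.0              # pure black: key ink only
    # Each ink absorbs one primary: cyan absorbs red, magenta green, yellow blue.
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)                           # darkness shared by all three inks
    return tuple((x - k) / (1 - k) for x in (c, m, y)) + (k,)


print(rgb_to_cmyk(255, 0, 0))      # (0.0, 1.0, 1.0, 0.0): red needs magenta + yellow
print(rgb_to_cmyk(255, 255, 255))  # (0.0, 0.0, 0.0, 0.0): white is bare paper
```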
Term: RGB
Definition:
An additive color model based on the three primary
colors of red (R), green (G) and blue (B).
Term: Grayscale
Definition:
Grayscale is a range of monochromatic shades from
black to white. Therefore, a grayscale image contains
only shades of gray and no color.
Term: Bitonal
Definition:
A bitonal image is represented by pixels consisting of 1
bit each, which can represent two tones (typically black
and white), using the values 0 for black and 1 for white or
vice versa.
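One simple way to get from grayscale to bitonal is thresholding: every pixel at or above a cut-off becomes 1 and everything below becomes 0. The sketch below assumes 8-bit grayscale values, a hypothetical threshold of 128, and the convention that 1 means white.

```python
def to_bitonal(gray_rows, threshold=128):
    """Turn rows of 8-bit grayscale values into 1-bit values (1 = white, 0 = black)."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray_rows]


gray = [[0, 64, 128],
        [192, 255, 100]]
print(to_bitonal(gray))  # [[0, 0, 1], [1, 1, 0]]
```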
Term: Continuous tone
Definition:
Generally referring to pictorial images where there is a
non-broken range of tones from white to black that may
have every shade of gray represented. There are
theoretically an infinite number of tones. Traditional
photography (photochemical photography) produces
continuous tone images. When reformatting pictorial
items, it is important to distinguish continuous tone
originals from printed halftones, since these two classes
are likely to require different strategies and methods for
making the digital images.
Term: Color Space
Definition:
The choice of colorspace determines how many unique
colors are potentially possible in your digital file, and how
fine the gradations are between shades of color.
Each colorspace was designed for a specific purpose;
none is superior to the others for all applications.
However, FADGI recommends selecting an appropriate
colorspace from the recommendations in the charts in
this document.
Term: OCR
Definition:
Optical Character Recognition (OCR) is a technology that
allows dots or pixels representing machine generated
characters in a raster image to be converted into digitally
coded text. In addition to recognizing and coding text,
OCR programs attempt to recognize and code the
structural elements of a document page, such as
columns and non-text graphical elements.
Term: Archival master file
Definition:
File that represents the best copy produced by a
digitizing organization, with best defined as meeting the
objectives of a particular project or program. These
objectives differ from one content category to another
and the specifications to be recommended at this Web
site (forthcoming) will be tailored to fit a variety of
common categories and objectives. In some cases, an
archive may produce more than one archival master file.
Term: Master File Format
Definition:
The choice of master file format is a decision which
affects how your digitized materials can be used and
managed. There is no one correct master file format for
all applications; all format choices involve compromises
between quality, access and lifecycle management. The
FADGI star system tables list the most appropriate
master file formats for each imaging project type.
Selection of the most appropriate format within these
recommended choices is an important decision that
should be consistent with your long term archive strategy.
Term: Access File Format
Definition:
FADGI anticipates continual evolution in the availability of
access file formats, each new format designed to provide
specific advantages over others for a specific application.
Care should be taken when selecting access formats to
ensure long-term viability.
Term: Vector graphics
Definition:
Vector graphics is the use of polygons to represent
images in computer graphics. Vector graphics are based
on vectors, which lead through locations called control
points or nodes. Each of these points has a definite
position on the x and y axes of the work plane and
determines the direction of the path; further, each path
may be assigned a stroke color, shape, curve, thickness,
and fill.
Term: Bundling file format
Definition:
These formats encapsulate their constituent files and,
save for a directory that provides the filenames, do not
describe the content and the relationships that may
obtain between files. (In this, they differ from what are
often called wrappers.) Archetypes include ZIP, StuffIt,
and TAR, the latter associated with the UNIX operating
system. Simple bundling formats tend to be generic, i.e.,
they may be used for a wide range of content types.
Do it yourself (DIY)
06.1 DIY
Do It Yourself!
Describing the Collection
06.1.1 DIY
Do It Yourself!
Describing the Collection
06.1.2 DIY
Do It Yourself!
Describing the Collection
The sheer number of metadata
standards in the cultural heritage
sector is overwhelming, and their
inter-relationships further
complicate the situation. This
visual map of the metadata
landscape is intended to assist
planners with the selection and
implementation of metadata
standards
Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley
06.1.2 DIY
Do It Yourself!
Describing the Collection
Each of the 105 standards listed
here is evaluated on its strength
of application to defined
categories in each of four axes:
community, domain, function, and
purpose. The strength of a
standard in a given category is
determined by a mixture of its
adoption in that category, its
design intent, and its overall
appropriateness for use in that
category.
Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley
06.2 DIY
Do It Yourself!
Digitization Tools
06.2.1 DIY
Do It Yourself!
Scanner vs. Camera
The World’s First Digital
Camera (1975) by Kodak and
Steve Sasson
06.2.1 DIY
Do It Yourself!
Camera
At the heart of a digital
camera is the sensor. The
size and density of the sensor
determine the pixel count of
the resulting image.
The sensor in combination
with an optical lens creates
the digital image.
By C-M - own Image, CC BY-SA 3.0,
https://commons.wikimedia.org/w/index.php?curid=2150801
06.2.3 DIY
Do It Yourself!
Scanner
A scanner is a much different
device; the sensor array is
long and thin.
By moving the array across the
target at varying speeds and angles,
the scanner can generate higher- or
lower-resolution output.
By Scanner_a_plat_fonctionnement.png: User:Jean-no; derivative work: Pluke (talk) -
Scanner_a_plat_fonctionnement.png, FAL, https://commons.wikimedia.org/w/index.php?curid=17009063
06.2.4 DIY
Do It Yourself!
A Note on Sensors
CCD vs CMOS
Charge-Coupled Devices vs.
Complementary Metal–
Oxide–Semiconductor
Both types of sensor
accomplish the same task of
capturing light and converting
it into electrical signals.
By Filya1 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6304562
06.3 DIY
Do It Yourself!
Digitization Equipment
There are many options for
equipment to create digital
files from collections objects:
• Scanners
• Slide/Negative scanners
• Specialized tools
The World’s First Digital Camera (1975) by Kodak and Steve Sasson
06.3.1 DIY
Do It Yourself!
Digitization Equipment: Flatbed Scanners
• 25-50 Page Automated
Document Feeder
• Flat Bed Scanning
• Support for TWAIN and/or
ISIS interface drivers
• USB or SCSI Interface
• Support for largest expected
documents
• Duplex (Automatic) scanning
(two sides in one pass)
• Optical 200 x 200 - 600 x 600
Dots Per Inch (DPI)
06.3.2 DIY
Do It Yourself!
Digitization Equipment: MFM / Slide / Neg
Keep the size of the original
in mind when scanning
transparencies and
negatives. Because these
originals are small, you will
need to scan at a much higher
PPI to create an image large
enough for display or printing.
• Microfilm scanners
• Slide & Negative scanners
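The arithmetic behind "a much higher PPI" can be sketched as follows. This is only an illustration: the function name, the 1.42 inch long side assumed for a 35 mm frame, and the 10 inch, 300 ppi output target are assumptions, not figures from the slides.

```python
import math

def required_scan_ppi(original_long_side_in, output_long_side_in, output_ppi):
    """Return the scan PPI needed so the enlarged output still has output_ppi."""
    # How many times larger the output is than the original.
    enlargement = output_long_side_in / original_long_side_in
    # Round up so the output never falls below the target resolution.
    return math.ceil(enlargement * output_ppi)

# Assumed example: a 35 mm frame (about 1.42 in on its long side)
# reproduced 10 in wide at 300 ppi.
print(required_scan_ppi(1.42, 10, 300))
```

The smaller the original relative to the intended output, the higher the scan resolution has to be.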
06.3.4 DIY
Do It Yourself!
Specialized Tools
• Digital cameras
• Scanning back
cameras
• High speed book
scanners
• 3D scanners
06.3.4.1 DIY
Do It Yourself!
Digital Cameras
Digital cameras come in a
vast array of sizes,
shapes, and formats.
DT ATOM
Frame footprint: 24.625” wide x 31” deep
With table top: 25.625” wide x 31.25” deep
Height with Column: 65.5”
Light Arm Span: 78.5”
06.3.4.1 DIY
Do It Yourself!
Digital Cameras
DT BC100
Materials
Frame is made from black anodized extruded
aluminum, custom brackets are made from
black anodized aircraft grade aluminum.
Overall Dimensions
7’H x 6’5” W x 5’D
Footprint
6’W x 5’D
Glass Platen Dimensions
24.9” x 17.48” on each side
Book Binding Limitations
6” binding
Working Table Height
30”
Accessory & Monitor Shelf Dimensions
Side shelves are 19” x 34” black laminate
Compressor
.5 HP with a 6.3 gallon tank
06.3.4.2 DIY
Do It Yourself!
Scanning Back Cameras
Scanning back cameras
provide generally higher
resolution by replacing the
sensor array with a sensing
device that scans across the
image created by the camera
lens.
• Higher-megapixel images
• Slower scanning times
XF Phase One
DT RCam
DT RG3040
Images from Digital Transitions
http://dtdch.com/
06.3.4.3 DIY
Do It Yourself!
High Speed Book Cameras
For high throughput, robotic
scanners are available for
some materials, such as
books.
06.3.4.4 DIY
Do It Yourself!
3D Equipment
There is a wide variety of 3D
digitization tools and
processes. The type of
equipment and process you
use should be carefully
thought out, with the end use
of the digitization at the
forefront.
Smithsonian 3D Imaging Team
06.4 DIY
Do It Yourself!
Where to do it: The Gray Room
Having dedicated workspace
for your digitization is, of
course, optimal. In reality,
digitization will occur
wherever it is most practical.
Still, if you have the luxury of
dedicated space, here are
some guidelines on building it
out.
Internet Archive, San Francisco
06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
The working environment
should be painted/decorated
a neutral, matte gray with a
60% reflectance or less to
minimize flare and perceptual
biases.
FADGI Guidelines
06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
Monitors should be positioned
to avoid reflections and direct
illumination on the screen.
FADGI Guidelines
Smithsonian Libraries Scanning Room
06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
ISO 12646 requires that the room
illumination be less than 32 lux when
measured anywhere between the
monitor and the observer, and that the
light have a color temperature of
approximately 5000 K. Consistent
room illumination is a fundamental
element of best practice in imaging.
Changes in color temperature or light
level from a window, for example, can
dramatically affect the perception of
an image displayed on a monitor.
FADGI Guidelines
Smithsonian Libraries Scanning Room
06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
Each digitization station should
be in a separate room, or
separated from each other by
sufficient space and with
screening to minimize the light
from one station affecting
another. It is critically important
to maintain consistent
environmental conditions within
the working environment.
FADGI Guidelines
Internet Archive, San Francisco
06.4.1 DIY
Do It Yourself!
FADGI Space Guidelines
Care should be taken to
maintain the work environment
at the same temperature and
humidity as the objects are
normally kept in. Variations can
cause stress to some materials
and in severe cases may
damage the originals. The use
of a datalogger in both imaging
and storage areas is highly
recommended.
FADGI Guidelines
Smithsonian Libraries Scanning Room
06.5 DIY
Do It Yourself!
Quality Control
Quality Control (QC), or Quality
Assurance (QA), is key to
maintaining the overall quality
and fidelity of any digitization
project. Differing levels of QC
may be needed for the type of
project and materials being
digitized.
In large scale projects, 100%
QC will rarely be feasible.
http://www.sil.si.edu/imagegalaxy/imagegalaxy_imageDetail.cfm?id_image=7403
Exercises
Exercise 4: Quality Control Sampling
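When 100% QC is not feasible, one common approach is to inspect a random sample of each batch. A minimal sketch, assuming a 10% sampling rate with a floor of 10 files; the rate, the floor, the file names, and the function name are illustrative assumptions, not recommendations from the slides.

```python
import random

def qc_sample(filenames, fraction=0.1, minimum=10, seed=None):
    """Pick a random subset of files for visual inspection."""
    rng = random.Random(seed)  # a seed makes the selection repeatable
    wanted = max(minimum, round(len(filenames) * fraction))
    size = min(len(filenames), wanted)  # never ask for more than exist
    return sorted(rng.sample(filenames, size))

# Hypothetical batch of 500 page images.
batch = [f"page_{n:04d}.tif" for n in range(1, 501)]
sample = qc_sample(batch, fraction=0.1, seed=42)
print(len(sample), "of", len(batch), "files selected for review")
```

Recording the seed alongside the QC log lets someone else reproduce exactly which files were checked.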
06.6 DIY
Do It Yourself!
Storage
Now that we’ve
created all this
data, we need to
store it
somewhere …
https://flic.kr/p/86miWv
06.6.1 DIY
Do It Yourself!
Storage
• Short Term
• Medium Term
• Long Term
https://flic.kr/p/86miWv
06.6.2 DIY
Do It Yourself!
Primary magnetic storage
• Diskettes
• Hard disks (both fixed
and removable)
• High capacity floppy
disks
• Disk cartridges
• Magnetic tape
Smithsonian Data Center
https://flic.kr/p/cGFn2f
06.6.3 DIY
Do It Yourself!
Primary optical storage
• Compact Disk Read
Only Memory (CD
ROM)
• Digital Video Disk Read
Only Memory (DVD
ROM)
• CD Recordable (CD R)
• CD Rewritable (CD
RW)
Smithsonian Data Center
https://flic.kr/p/cGFn4E
06.6.3 DIY
Do It Yourself!
Solid-state storage
Solid-state storage is a type of
non-volatile computer storage
that stores and retrieves digital
information using only
electronic circuits, without any
involvement of moving
mechanical parts. (Wikipedia)
Examples:
• SSD
• Flash drive
Internet Archive
https://flic.kr/p/dnDS11
06.6.4 DIY
Do It Yourself!
Acronyms
• DAS
• NAS
• SAN
• DAM
Internet Archive
https://flic.kr/p/8Ms4QV
06.6.4.1 DIY
Do It Yourself!
Direct-attached storage (DAS)
... is traditional mass
storage that does not
use any network. This
is still a very popular
approach. The
retronym was coined
recently, together with
NAS and SAN.
(Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
06.6.4.2 DIY
Do It Yourself!
Network-attached storage (NAS)
… is mass storage
attached to a computer
which another computer
can access at file level
over a local area network,
a private wide area
network, or in the case of
online file storage, over the
Internet. (Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
06.6.4.3 DIY
Do It Yourself!
Storage area network (SAN)
... is a specialized network that
provides other computers with
storage capacity. The crucial
difference between NAS and
SAN is that the former presents and
manages file systems to client
computers, whilst the latter
provides access at the block-
addressing (raw) level.
(Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
06.6.4.3 DIY
Do It Yourself!
Digital asset management (DAM)
… consists of
management tasks and
decisions surrounding
the ingestion,
annotation,
cataloguing, storage,
retrieval and
distribution of digital
assets. (Wikipedia)
Internet Archive
https://flic.kr/p/8Ms4QV
06.6.5 DIY
Do It Yourself!
Digital Preservation
http://www.xkcd.com/1683/
06.6.5 DIY
Do It Yourself!
Digital Preservation
https://xkcd.com/242/
There are two
kinds of
preservationists:
those who have
lost data and those
who will.
Minimum Digitization Capture
Recommendations (2013)
06.7 DIY
Do It Yourself!
Recap of Scanning
• Scan at the best resolution you
can afford to store
• Manuscripts and text: 300
ppi
• Photographs: 400-800 ppi
• Graphic materials: 600-
800 ppi
• Maps: 600 ppi (up to 36”)
or 300-400 ppi (greater
than 36”)
• Calibrate monitor and
scanning devices
Smithsonian Libraries Scanning Room
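To get a feel for what "afford to store" means, the uncompressed size of a scan can be estimated from its physical size and resolution. A rough sketch, assuming a letter-size (8.5 x 11 inch) original and 24-bit RGB color; both are illustrative assumptions, as is the function name.

```python
def uncompressed_bytes(width_in, height_in, ppi, bits_per_channel=8, channels=3):
    """Estimate uncompressed image size from physical size and resolution."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # Total bits divided by 8 gives bytes.
    return width_px * height_px * channels * bits_per_channel // 8

# Assumed example: the same letter-size page at two of the resolutions above.
for label, ppi in [("text at 300 ppi", 300), ("graphic at 600 ppi", 600)]:
    size_mb = uncompressed_bytes(8.5, 11, ppi) / 1_000_000
    print(f"{label}: {size_mb:.1f} MB")
```

Doubling the resolution quadruples the pixel count, and with it the storage needed for each master file.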
06.8 DIY
Do It Yourself!
Recap of Process
• Create master (uncompressed)
file
• For analog content:
Scan/sample
• For born-digital content:
Convert
• Name the file in a consistent way
• Perform quality control; edit as
needed
• Save master on stable, long-term
storage
• Create derivative or access file
• Share access files as needed
http://xkcd.com/730/
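Two of the steps above, consistent naming and saving the master to long-term storage, lend themselves to a small script. The sketch below is hypothetical: the naming convention (item ID, four-digit sequence, role, extension) is invented for illustration, and recording a SHA-256 checksum for later verification is an added assumption rather than something the slides prescribe.

```python
import hashlib
import re

# Hypothetical convention: <itemid>_<4-digit sequence>_<role>.<extension>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_\d{4}_(master|access)\.[a-z0-9]+$")

def build_name(item_id, sequence, role, extension):
    """Build a name like 'ms0042_0001_master.tif' and validate it."""
    name = f"{item_id.lower()}_{sequence:04d}_{role}.{extension.lower()}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name

def sha256_of(data):
    """Return a SHA-256 checksum that can be re-checked after storage."""
    return hashlib.sha256(data).hexdigest()

print(build_name("MS0042", 1, "master", "TIF"))
```

Whatever convention is chosen, enforcing it in code at creation time is cheaper than renaming thousands of files later.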
06.9 DIY
Do It Yourself!
Digitization Life-cycle
• Create master (uncompressed)
file
• For analog content:
Scan/sample
• For born-digital content:
Convert
• Name the file in a consistent way
• Perform quality control; edit as
needed
• Save master on stable, long-term
storage
• Create derivative or access file
• Share access files as needed
Biodiversity Heritage Library Digitization Life-Cycle
06.10 DIY
Do It Yourself!
Better, Faster, Cheaper!
Now that we’ve
mastered
digitization, how
do we scale it to
hundreds, thousands,
millions of
objects?
Smithsonian Natural History Museum
06.10 DIY
Digitization Program Office (Smithsonian)
THE OLD
PARADIGM
This seems like an
interesting and feasible
subset…
06.10 DIY
Digitization Program Office (Smithsonian)
06.10 DIY
Digitization Program Office (Smithsonian)
THE NEW
PARADIGM
These homogeneous
subsets have rapid-capture
technologies available…
06.10 DIY
Digitization Program Office (Smithsonian)
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Array
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Any orderly arrangement of individual sensor elements.
In digital imaging, there are primarily three array types:
two-dimensional or area arrays, one-dimensional or linear
arrays, and tri-linear arrays consisting of three
consecutive linear arrays of red, green, and blue
sensitive sensor elements.
Term: Array
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Ambient light
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Light existing in the environment that is not produced by
the imaging system. Ambient light can be natural or
artificial light. Ambient light is generally uncontrolled and
can be highly variable, posing a possible risk to image
quality. The level of ambient light should be minimized in
relation to the level of light produced by the imaging
system.
Term: Ambient light
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Calibration
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The comparison of instrument performance to a standard
of higher accuracy. The standard is considered the
reference and the more correct measure. Calibrations
should be performed against a specified tolerance.
Term: Calibration
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Exif
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Exchangeable image file format (Exif) describes a
metadata set to accompany TIFF and JPEG image files
and RIFF WAV audio files. Exif was prepared by the Technical
Standardization Committee on AV & IT Storage Systems
and Equipment and is published by the Japan
Electronics and Information Technology Industries
Association (JEITA). The Exif 2.2 specification (JEITA
CP-3451) is in nearly universal use by camera
manufacturers.
Term: Exif
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: IPTC Metadata
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Embedded metadata used for image management. IPTC
metadata is primarily composed of descriptive,
administrative, and rights metadata, as opposed to the
technical nature of Exif. IPTC metadata was developed
and is controlled by the IPTC.
Term: IPTC Metadata
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: METS
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The Metadata Encoding and Transmission Standard
(METS) is a metadata standard for encoding descriptive,
administrative, and structural metadata regarding objects
within a digital library, expressed using the XML schema
language of the World Wide Web Consortium (W3C). The
standard is maintained by the Network Development and
MARC Standards Office of the Library of Congress, and is
being developed as an initiative of the Digital Library
Federation (DLF).
Term: METS
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Aspect ratio
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The relationship between the horizontal and vertical
dimensions of an image. The horizontal dimension is
normally listed first. For example, a print 6 inches wide
(horizontal) by 4 inches tall (vertical) has an aspect
ratio of 6:4, which reduces to 3:2.
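The reduction to lowest terms can be sketched in a few lines. The function name is illustrative; the 1920 x 1080 example is an added assumption, not from the slides.

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce width:height to lowest terms, horizontal dimension first."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"

print(aspect_ratio(6, 4))        # 3:2
print(aspect_ratio(1920, 1080))  # 16:9
```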
Term: Aspect ratio
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Artifact (defect)
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
General term to describe a broad range of undesirable
flaws or distortions in digital reproductions produced
during capture or data processing. Some common forms
of image artifacts include noise, chromatic aberration,
blooming, interpolation, and imperfections created by
compression, among others. In digital sound recordings,
the effect of lossy compression is often cited as
accounting for audible artifacts, although several other
types of artifacts may also be present.
Term: Artifact (defect)
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Aliasing
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
A sampling effect that leads to spatial frequencies being
falsely interpreted as other spatial frequencies.
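A one-dimensional stand-in can make the effect concrete. In this sketch (an illustration in time rather than space, with assumed frequencies), a 9 Hz sine sampled only 10 times per second produces exactly the samples of an inverted 1 Hz sine, so the two cannot be told apart after sampling.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, count):
    """Sample sin(2*pi*f*t) at evenly spaced times t = n / sample_rate."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

# 9 Hz is above half the 10 Hz sampling rate, so it aliases to 1 Hz.
high = sample_sine(9, 10, 10)
low = sample_sine(1, 10, 10)
matches = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))
print(matches)  # True
```

The same thing happens spatially when fine detail, such as a halftone screen, is scanned at too low a resolution and shows up as a coarser false pattern.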
Term: Aliasing
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Compression, lossless
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Data compressed using a lossless compression
technique will allow the decompressed data to be exactly
the same as the original data before compression, bit for
bit. The compression of data is achieved by coding
redundant data in a more efficient manner than in the
uncompressed format. The compression ratios that can
be achieved with lossless compression are generally
much lower than those that can be achieved using lossy
compression techniques.
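The "bit for bit" property can be checked directly. A minimal sketch using Python's standard zlib module on made-up, highly redundant sample data; zlib stands in here for whatever lossless codec a given file format actually uses.

```python
import zlib

# Highly redundant sample data compresses well.
original = b"ABABABABAB" * 1000
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(restored == original)             # True: identical, bit for bit
print(len(compressed) < len(original))  # True: redundancy was coded away
```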
Term: Compression, lossless
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Compression, lossy
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Data compressed using a lossy compression technique
results in the loss of information. The decompressed data
will not be identical to the original uncompressed data.
Conservative lossy compression can result in a form
of lossy compression referred to as visually lossless
compression.
Term: Compression, lossy
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Compression, visually lossless
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
A form or manner of lossy compression where the data
that is lost after the file is compressed and
decompressed is not detectable to the eye; the
decompressed data appears identical to the
uncompressed data.
Term: Compression, visually lossless
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: Compression ratio
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
The ratio of a file's uncompressed size over its
compressed size. A file compressed ten-fold over its
uncompressed size would be described as having a ten-
to-one compression, expressed as 10:1. Some formats
such as JPEG and JPEG 2000 allow the user to specify
the compression ratio.
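As a worked example of the definition, the ratio is simply one size divided by the other. The function name and the byte counts below are illustrative assumptions.

```python
def compression_ratio(uncompressed_size, compressed_size):
    """Return the ratio as a string such as '10:1'."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    # The 'g' format drops a trailing '.0' so 10.0 prints as 10.
    return f"{uncompressed_size / compressed_size:g}:1"

# Assumed example: a 25 MB master compressed to a 2.5 MB access file.
print(compression_ratio(25_000_000, 2_500_000))  # 10:1
```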
Term: Compression ratio
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Term: FADGI
III
DIGITIZATION VOCABULARY
Vocabulary III: Do It Yourself!
Definition:
Federal Agencies Digitization Guidelines Initiative. A
collaborative effort by federal agencies formed as a
group in 2007 to define common guidelines, methods,
and practices to digitize historical content in a sustainable
manner.
Term: FADGI
A Little Bit about Smithsonian …
Rapid Capture and 3D Imaging
07.1 Smithsonian
A Little Bit about Smithsonian …
Rapid Capture
Courtesy of Keri Thompson, SIL
Courtesy of Karen Weiss, AAA
feat. Archives of American Art case study
What Is Rapid Capture?
• Rapid Capture is more than just taking a digital picture
of an object or specimen. . .
• Rapid Capture Workflows are comprehensive, end-to-
end digitization workflows.
• Rapid Capture workflows follow a collection object or
specimen from its shelf in permanent storage all the
way to its potential destination as a virtual object online,
available for access by the public.
07.1 Smithsonian
THE NEW
PARADIGM
These homogeneous
subsets have rapid-capture
technologies available…
Diagram: Rapid Capture Digitization: Object & Data Workflow
Labels: Object Path; Data Path; Staging; DataMatrix Barcode (examples: 123456ABC, 123456ABD, 123456ABE); DAMS Hot Folder Ingest; DAMS; CDIS; TMS; IDS
Barcode put in Filename and/or IPTC Title field
Generates Derivative Media Image
Generates Derivative Metadata
• Show what rapid capture looks like
Rapid Capture In Action
NMAH Numismatics
Hillery York, NMAH Collection Mgr, moves objects from staging to the capture station
Rapid Capture Impact
Access:
From the shelf to the public in less
than 24 hours.
Throughput:
Flat objects: 100,000+ to 1.8M per year
Non-flat objects/specimens: 30,000 to 60,000 per year
07.1 Smithsonian
What we’ve learned
Moving fast requires a holistic approach.
Moving
Collections
Digitizing
Collections
Moving
Data
Object handling,
cleaning, etc.
Dedicated hardware,
Quality control
Network / Systems
07.1 Smithsonian
From Human-Driven Systems
07.1 Smithsonian
To Conveyor-Driven Systems
07.1 Smithsonian
Rapid Capture – Conveyor Powered
07.1 Smithsonian
07.2 Smithsonian
A Little Bit about Smithsonian …
3D Imaging
Resources
Overview of Resources
08 Resources
You Are Not Alone!
There is a wealth of
resources to help with
digitization projects of
all types …
Resources
08 Resources
Digital Library Federation
Strategy meets practice at the
Digital Library Federation (DLF).
Through its programs, working
groups, and initiatives, DLF
connects the vision and research
agenda of its parent organization,
the Council on Library and
Information Resources (CLIR), to an
active and exciting network of
practitioners working in digital
libraries, archives, labs, and
museums. DLF is a place where
ideas can be road-tested, and from
which new strategic directions can
emerge.
Resources
https://www.diglib.org/
08 Resources
Museums and the Web
The Museums and the Web
Bibliography comprises all papers
published on MW conference websites
or in annual selected proceedings.
Entries can be filtered by year and are
listed alphabetically by the primary
author's name. Clicking a paper title
shows details including an abstract and
a live URL link if appropriate. Clicking an
author's name lists all papers by that
author. This bibliography is a work in
progress as we standardize all entries.
Resources
http://www.museumsandtheweb.com/
08 Resources
Federal Agencies Digitization Guidelines Initiative
Federal Agencies Digitization Guidelines
Initiative. Formed as a group in 2007 to
define common guidelines, methods,
and practices to digitize historical
content in a sustainable manner. Two
separate working groups were formed.
The Federal Agencies Still Image
Digitization Working Group will
concentrate its efforts on image content
such as books, manuscripts, maps, and
photographic prints and negatives.
The Federal Agencies Audio-Visual
Working Group is focusing its work on
sound, video, and motion picture film.
Resources
http://www.digitizationguidelines.gov/
Last Thoughts
http://xkcd.com/1685/
THANK
YOU!
QUESTIONS?
THANKS
TO…
• Sarah Osborne Bender
• Smithsonian Digitization Program Office
  - Günter Waibel
  - Adam Metallo
  - Vincent Rossi
• Richard Naples (Smithsonian Libraries)
• Keri Thompson (Smithsonian Libraries)
• Jacqueline Chapman (Smithsonian Libraries)

 
Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > Future
Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > FutureSeeing a Butterfly & Knowing What It Is: BHL: Past > Present > Future
Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > FutureMartin Kalfatovic
 
Managing Scholarly Research Output: The Smithsonian Institution Experience
Managing Scholarly Research Output: The Smithsonian Institution ExperienceManaging Scholarly Research Output: The Smithsonian Institution Experience
Managing Scholarly Research Output: The Smithsonian Institution ExperienceMartin Kalfatovic
 
Digital Programs & Initiatives
Digital Programs & InitiativesDigital Programs & Initiatives
Digital Programs & InitiativesMartin Kalfatovic
 
Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...
Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...
Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...Martin Kalfatovic
 
Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...
Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...
Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...Martin Kalfatovic
 
Smithsonian Libraries: Digital Programs and Initiatives Division
Smithsonian Libraries: Digital Programs and Initiatives DivisionSmithsonian Libraries: Digital Programs and Initiatives Division
Smithsonian Libraries: Digital Programs and Initiatives DivisionMartin Kalfatovic
 
The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...
The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...
The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...Martin Kalfatovic
 
Natura non facit saltus: But Humans Do, The Need for Taxonomic Annotation
Natura non facit saltus: But Humans Do, The Need for Taxonomic AnnotationNatura non facit saltus: But Humans Do, The Need for Taxonomic Annotation
Natura non facit saltus: But Humans Do, The Need for Taxonomic AnnotationMartin Kalfatovic
 
2018 BHL Program Director’s Report: Secretariat & Technical Update
2018 BHL Program Director’s Report: Secretariat & Technical Update2018 BHL Program Director’s Report: Secretariat & Technical Update
2018 BHL Program Director’s Report: Secretariat & Technical UpdateMartin Kalfatovic
 
Expanding Access for the Local and Global Increasing Access & Empowering Glob...
Expanding Access for the Local and Global Increasing Access & Empowering Glob...Expanding Access for the Local and Global Increasing Access & Empowering Glob...
Expanding Access for the Local and Global Increasing Access & Empowering Glob...Martin Kalfatovic
 
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...Martin Kalfatovic
 
A Vast Library of Life: The Biodiversity Heritage Library
A Vast Library of Life: The Biodiversity Heritage LibraryA Vast Library of Life: The Biodiversity Heritage Library
A Vast Library of Life: The Biodiversity Heritage LibraryMartin Kalfatovic
 
Smithsonian Libraries in Service of Scholarly Communications: An Introduction...
Smithsonian Libraries in Service of Scholarly Communications: An Introduction...Smithsonian Libraries in Service of Scholarly Communications: An Introduction...
Smithsonian Libraries in Service of Scholarly Communications: An Introduction...Martin Kalfatovic
 
Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...
Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...
Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...Martin Kalfatovic
 
Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...
Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...
Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...Martin Kalfatovic
 
“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library
“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library
“The Gift of Time”: Impact through Open: The Biodiversity Heritage LibraryMartin Kalfatovic
 

Más de Martin Kalfatovic (20)

ebooks 4 eVeryBody
ebooks 4 eVeryBodyebooks 4 eVeryBody
ebooks 4 eVeryBody
 
BHL and Specimen Collection Data: The needle in the Festuca stack
BHL and Specimen Collection Data: The needle in the Festuca stackBHL and Specimen Collection Data: The needle in the Festuca stack
BHL and Specimen Collection Data: The needle in the Festuca stack
 
Managing Scholarly Research Output: The Smithsonian Institution Experience
Managing Scholarly Research Output: The Smithsonian Institution ExperienceManaging Scholarly Research Output: The Smithsonian Institution Experience
Managing Scholarly Research Output: The Smithsonian Institution Experience
 
Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > Future
Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > FutureSeeing a Butterfly & Knowing What It Is: BHL: Past > Present > Future
Seeing a Butterfly & Knowing What It Is: BHL: Past > Present > Future
 
Managing Scholarly Research Output: The Smithsonian Institution Experience
Managing Scholarly Research Output: The Smithsonian Institution ExperienceManaging Scholarly Research Output: The Smithsonian Institution Experience
Managing Scholarly Research Output: The Smithsonian Institution Experience
 
BHL & The Catalogue of Life
BHL & The Catalogue of LifeBHL & The Catalogue of Life
BHL & The Catalogue of Life
 
Digital Programs & Initiatives
Digital Programs & InitiativesDigital Programs & Initiatives
Digital Programs & Initiatives
 
Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...
Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...
Discoverable, Accessible, Reusable, and Transparent (DART): Scholarly Communi...
 
Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...
Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...
Cultural Heritage and the Technology of Culture: Finding the Nature of Illumi...
 
Smithsonian Libraries: Digital Programs and Initiatives Division
Smithsonian Libraries: Digital Programs and Initiatives DivisionSmithsonian Libraries: Digital Programs and Initiatives Division
Smithsonian Libraries: Digital Programs and Initiatives Division
 
The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...
The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...
The Biodiversity Heritage Library & Botany: Empowering Discovery through Free...
 
Natura non facit saltus: But Humans Do, The Need for Taxonomic Annotation
Natura non facit saltus: But Humans Do, The Need for Taxonomic AnnotationNatura non facit saltus: But Humans Do, The Need for Taxonomic Annotation
Natura non facit saltus: But Humans Do, The Need for Taxonomic Annotation
 
2018 BHL Program Director’s Report: Secretariat & Technical Update
2018 BHL Program Director’s Report: Secretariat & Technical Update2018 BHL Program Director’s Report: Secretariat & Technical Update
2018 BHL Program Director’s Report: Secretariat & Technical Update
 
Expanding Access for the Local and Global Increasing Access & Empowering Glob...
Expanding Access for the Local and Global Increasing Access & Empowering Glob...Expanding Access for the Local and Global Increasing Access & Empowering Glob...
Expanding Access for the Local and Global Increasing Access & Empowering Glob...
 
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
The Biodiversity Information Standards (TDWG): Opportunities for Collaboratio...
 
A Vast Library of Life: The Biodiversity Heritage Library
A Vast Library of Life: The Biodiversity Heritage LibraryA Vast Library of Life: The Biodiversity Heritage Library
A Vast Library of Life: The Biodiversity Heritage Library
 
Smithsonian Libraries in Service of Scholarly Communications: An Introduction...
Smithsonian Libraries in Service of Scholarly Communications: An Introduction...Smithsonian Libraries in Service of Scholarly Communications: An Introduction...
Smithsonian Libraries in Service of Scholarly Communications: An Introduction...
 
Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...
Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...
Free & Open Access to Biodiversity Literature: An Introduction to the Biodive...
 
Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...
Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...
Digital Programs & Initiatives @ Smithsonian Libraries: Scholarly Communicati...
 
“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library
“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library
“The Gift of Time”: Impact through Open: The Biodiversity Heritage Library
 

Último

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 

Último (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Digitization Basics for Libraries, Archives, and Museums

  • 1. Martin R. Kalfatovic | 8 June 2016 | Chapel Hill, NC DIGITIZATION BASICS for LIBRARIES ARCHIVES MUSEUMS and
  • 4. • A bit about me • About this session • Why do we digitize? • Context: Libraries, Archives and Museums (LAMs) Introduction
  • 5. 01.1 Introduction Just a bit about me ... @UDCMRK • Twitter: @udcmrk • www.udc793.org • www.linkedin.com/in/martinkalfatovic • … an inordinate fondness for Dodos
  • 6. 01.2.1 Introduction About this session Taking a broad overview of standards, lingo, hardware, software and planning considerations, this session will get everyone current and on a level playing field to proceed through the subsequent topics. The session will help you establish the foundational vocabulary to both enrich your SEI experience and increase your capacity to communicate with your colleagues about the basics of digital reformatting. The session introduces the practices, standards, and challenges evident across the spectrum of cultural heritage institutions acquiring, managing, and providing access to digital collections. The session considers the digital curation life-cycle as well as lightly touching on funding and aggregators.
  • 7. About this session 1. Lecture/Discussion 2. Vocabulary Building 3. Exercises 01.2.2 Introduction
  • 8. 01.3 Introduction Context Libraries | Archives | Museums In principle, the work of art has always been reproducible. Objects made by humans could always be copied by humans. Replicas were made by pupils in practicing for their craft, by masters in disseminating their works, and, finally, by third parties in pursuit of profit. But the technological reproduction of artworks is something new. Having appeared intermittently in history, at widely spaced intervals, it is now being adopted with ever-increasing intensity. Das Kunstwerk im Zeitalter seiner technischen Reproduzierbarkeit. Walter Benjamin (1936)
  • 9. 01.3 Introduction Context Libraries | Archives | Museums Digitization has magnified our ability to reproduce art, books, and even objects, with increasing rapidity, ease, and added functionality.
  • 10. 01.4 Introduction • Provide online access to collections • Make digitized material and metadata available through online catalogs AND for reuse on other platforms. • Maximize value to the largest audience in new and creative ways. • Advance preservation by reducing wear and tear on the originals. Why do we digitize? Based on NARA strategic plan
  • 11. 01.4 Introduction Why do we digitize? Based on NARA strategic plan • Provide access to those materials that can no longer be accessed in their original format. • Maximize the efficient and effective use of resources to carry out digitization and achieve cost-saving benefits whenever possible. • Improve our service to customers by responding to their evolving expectations
  • 12. • Let’s all count in binary! • Analog vs. Binary : Wave vs. Sample • Bytes vs. Bits Basics
  • 13. 02.1.1 Basics BASE 10 (Decimal) 0 1 10 100 1,000 10,000 100,000 1,000,000 BASE 2 (Binary) 0 1 2 4 8 16 32 64 128 256 512 1,024 2,048 4,096 BASE 3 (Ternary) – not on the test! 0 1 2 3 9 27 81 243 729 2,187 6,561 19,683 Basics Let’s All Count in Binary!
  • 14. 02.1.2 Basics Basics Let’s All Count in Binary! 1 - 1 2 - 10 3 - 11 4 - 100 5 - 101 6 - 110 7 - 111 8 - 1000 9 - 1001 10 - 1010 11 - 1011 12 - 1100 13 - 1101 14 - 1110 15 - 1111 16 - 10000 17 - 10001 18 - 10010 19 - 10011 20 - 10100 91 - 1011011 92 - 1011100 93 - 1011101 94 - 1011110 95 - 1011111 96 - 1100000 97 - 1100001 98 - 1100010 99 - 1100011 100 - 1100100
  • 18. Exercises Exercise 1: Let’s Convert Between Binary and Decimal
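The conversion exercise can also be checked in code. The following is a minimal Python sketch, not part of the original slides; the function names `to_binary` and `to_decimal` are illustrative assumptions. It uses repeated division by 2 in one direction and positional doubling in the other.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative decimal integer to a binary string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2                    # move on to the next power of two
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def to_decimal(bits: str) -> int:
    """Convert a binary string such as '1011011' to a decimal integer."""
    value = 0
    for bit in bits:
        # Shifting left one place doubles the running value.
        value = value * 2 + int(bit)
    return value


print(to_binary(100))         # 1100100
print(to_decimal("1011011"))  # 91
```

Both printed values match the counting table on the earlier slide (100 is 1100100, and 1011011 is 91).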
  • 19. 02.2 Basics Analog vs. Binary : Wave vs. Sample By Hyacinth - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30716342 By Hyacinth - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=23867344
  • 20. 02.3 Basics Bytes vs Bits BIT A bit is the smallest unit of information that can be stored or manipulated on a computer; it is either zero or one. A bit is also called a binary digit, especially when working with the 0 or 1 values. BYTE A byte is the group of bits needed to represent a letter of the alphabet or another character. For example, the letter "A" would be 01000001. 8 bits = 1 byte WORD Groups of 4 bytes (translated into hexadecimal – base 16!), e.g. 4B 4A 57 00 = K J W <null>
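The two examples on this slide (the letter "A" as eight bits, and the four-byte word 4B 4A 57 00) can be reproduced with a short Python sketch. The helper names `char_to_bits` and `decode_word` are assumptions made for illustration, and the sketch assumes plain ASCII characters.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern for a single ASCII character."""
    # ord() gives the character's numeric code; "08b" pads it to 8 binary digits.
    return format(ord(ch), "08b")


def decode_word(hex_word: str) -> list[str]:
    """Decode a hexadecimal word into characters, one per byte."""
    raw = bytes.fromhex(hex_word)  # spaces between byte pairs are allowed
    # A zero byte has no printable character, so show it as <null>.
    return [chr(b) if b != 0 else "<null>" for b in raw]


print(char_to_bits("A"))           # 01000001
print(decode_word("4B 4A 57 00"))  # ['K', 'J', 'W', '<null>']
```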
  • 21. 02.4 Basics Naming the Bytes 1 Bit = Binary Digit 8 Bits = 1 Byte 1000 Bytes = 1 Kilobyte 1000 Kilobytes = 1 Megabyte 1000 Megabytes = 1 Gigabyte 1000 Gigabytes = 1 Terabyte 1000 Terabytes = 1 Petabyte 1000 Petabytes = 1 Exabyte 1000 Exabytes = 1 Zettabyte 1000 Zettabytes = 1 Yottabyte 1000 Yottabytes = 1 Brontobyte 1000 Brontobytes = 1 Geopbyte https://flic.kr/p/Knb8k
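As a rough illustration of the naming ladder above, the sketch below steps a byte count up through the units in multiples of 1000, as the slide does. The function name `describe_size` is hypothetical, and the list stops at Yottabytes because the larger names on the slide (Brontobyte, Geopbyte) are informal.

```python
UNITS = [
    "Bytes", "Kilobytes", "Megabytes", "Gigabytes", "Terabytes",
    "Petabytes", "Exabytes", "Zettabytes", "Yottabytes",
]


def describe_size(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value under 1000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at the last unit in the list.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000  # step up one unit (decimal multiples, as on the slide)


print(describe_size(1500))               # 1.5 Kilobytes
print(describe_size(2_000_000_000_000))  # 2.0 Terabytes
```

Note that some software reports sizes in multiples of 1024 instead, so the same file can appear slightly smaller or larger depending on the tool.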
  • 22. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Term: Binary
  • 23. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Definition: In mathematics and digital electronics, a binary number is a number expressed in the binary numeral system or base-2 numeral system which represents numeric values using two different symbols: typically 0 (zero) and 1 (one). The base-2 system is a positional notation with a radix of 2. Because of its straightforward implementation in digital electronic circuitry using logic gates, the binary system is used internally by almost all modern computers and computer-based devices. Each digit is referred to as a bit. Term: Binary
  • 24. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Term: Analog
  • 25. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Definition: An analog signal has a theoretically infinite resolution. In practice an analog signal is subject to electronic noise and distortion introduced by communication channels and signal processing operations, which can progressively degrade the signal-to-noise ratio (SNR). In contrast, digital signals have a finite resolution. Term: Analog
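The contrast between infinite analog resolution and finite digital resolution can be sketched in Python. This is an illustrative assumption rather than anything from the slides: a sine wave stands in for the analog signal, and the function name `sample_and_quantize` and its parameters are hypothetical. The wave is measured at regular intervals (sampling) and each measurement is rounded to one of a fixed number of levels set by the bit depth (quantization).

```python
import math


def sample_and_quantize(frequency_hz: float, sample_rate_hz: int,
                        bit_depth: int, duration_s: float) -> list[int]:
    """Sample a sine wave and round each sample to an integer level."""
    levels = 2 ** bit_depth                  # e.g. 3 bits give 8 possible levels
    count = int(sample_rate_hz * duration_s)  # number of samples to take
    samples = []
    for i in range(count):
        t = i / sample_rate_hz               # time of this sample in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # between -1 and 1
        # Map the -1..1 range onto 0..levels-1 and round to the nearest level.
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


# One second of a 1 Hz wave, sampled 8 times at 3-bit depth.
print(sample_and_quantize(1, 8, 3, 1))
```

Raising the sample rate or the bit depth brings the stored values closer to the original wave, at the cost of more data per second.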
  • 26. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Term: Byte
  • 27. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Definition: The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures. The size of the byte has historically been hardware dependent and no definitive standards existed that mandated the size. The de facto standard of eight bits is a convenient power of two permitting the values 0 through 255 for one byte. Term: Byte
  • 28. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Term: Digitization
  • 29. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Definition: The process of recording an analog signal in a digital form. In relation to content of this site, it describes the process of translating analog signal data emanating from an object (light or sound) into a digitally encoded format. Audio, still and moving images are commonly digitized for increased access or for preservation purposes. Term: Digitization
  • 30. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Term: Brontobytes
  • 31. I DIGITIZATION VOCABULARY Vocabulary I: Introduction and Basics Definition: The prefix bronto, as used in the term brontobyte, has been used to represent anything from 10^15 to 10^27 bytes, most often 10^27. Term: Brontobytes
  • 32. • Get ready! • Prioritization: Some things are more equal than others • Tools to help you Get Ready and Prioritize • Copyright! Measure Twice, Cut Once
  • 33. 03.1 Measure Twice … Measure Twice, Cut Once Get ready! Staffing Resources • Acknowledge that digitizing for public access is a significant business process that crosses multiple business units. • Develop a separate human resource plan to support this digitization business process. IT Infrastructure • Along with staffing, require an IT plan to support digitization that includes bandwidth, storage, the ability to share images and metadata across business units, among other requirements.
  • 34. 03.1 Measure Twice … Policy and Guidance for Digitization Activities • Promulgate policy and guidance that provides further implementation direction as business units begin implementing the strategy • Technical Digitization Standards • Develop technical digitization requirements for the approaches outlined above to ensure uniformity and standardization. • Funding Strategies • Seek out and explore other options and relationships to digitize and make content available Measure Twice, Cut Once Get ready!
  • 35. 03.2 Measure Twice … • Candidates for digitization projects will be prioritized according to established criteria for significance and use. • Candidates for digitization projects will be prioritized in order to achieve a demonstrated high priority preservation benefit for the agency. • Funding is available or likely to be available and sustainable for the project. Measure Twice, Cut Once Prioritization: Some things are more equal than others
  • 36. 03.3 Measure Twice Starting up Tools to help you Get ready & Prioritize • Digitization Plans • Digital Asset Management Plan • Web Access Plan
  • 37. 03.4 Measure Twice … Copyright https://xkcd.com/14/ Sometimes I just can't get outraged over copyright law ...
  • 38. Copyright To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries. Article I, Section 8, Clause 8 of the United States Constitution … but most of the time I am … but still … 03.4 Measure Twice …
  • 39. Staffing a digitization project • How • Who
  • 40. 04.1 Staffing Staffing a digitization project Depending on the size of the institution, staff members may fill a number of roles. Also, do not forget that in addition to your regular staff, your volunteers, interns, and student help can participate in the digitization process (with the proper training and supervision).
  • 41. 04.2 Staffing Staffing a digitization project How • In-house staffing • Outsourcing • Hybrid approach
  • 42. 04.3 Staffing Staffing a digitization project Who • Director / CEO • Project Manager • Curator • Technical Staff • Conservator • Scanning Operators
  • 43. 04.3.1 Staffing Staffing a digitization project Director / CEO As with any LAM activity, the overall responsibility for all functions ultimately rests with the director. Strong leadership and vision for digitization is necessary for a successful program.
  • 44. 04.3.2 Staffing Staffing a digitization project Project Manager Manages goals and expectations; identifies further staffing and equipment needs; liaises between departments and staff; creates workplans and associated documents; manages funds.
  • 45. 04.3.3 Staffing Staffing a digitization project Curator Or, the person in charge of a collection. In addition to their responsibilities of caring for the collections, curators are also generally responsible for the display of the objects in coherent and informative or educational ways.
  • 46. 04.3.4 Staffing Staffing a digitization project Technical Staff Database development, web/database integration, CGI (Common Gateway Interface) script writing, Perl programming, and related activities that simplify the process of getting objects to the scanning operations and the resulting files in a usable state.
  • 47. 04.3.5 Staffing Staffing a digitization project Conservator Depending upon the types of collections, consultation with the preservation / conservation staff in varying degrees will be necessary to determine if (and how) the items can be digitized and/or photographed.
  • 48. 04.3.6 Who Staffing a digitization project Scanning Operators Scanning, photography, handling materials (packing, shipping) and other such skills are just a few that may be required of staff doing the actual conversion.
  • 49. Getting Technical • Pixel tricks • Color and not color • File formats • Cost and implementation factors • File naming • Compression • Bundling file formats
  • 50. 05.1 Getting Technical Getting Technical Pixel Tricks Pixel An abbreviation of picture element, this term may refer to a component of either a digital image or a digital sensor. In the case of a digital image, the pixel is the smallest discrete unit of information in the image's structure. In the case of the sensor in a scanner or digital camera, a pixel is the smallest photosensitive component or cell providing a response to light (or photons).
  • 51. 05.1.1 Getting Technical Getting Technical Why do we care? Remember back to wave vs. sample? Pixels can be thought of as those elements of the samples that fall within the wave.
  • 52. Getting Technical 05.1.1 Getting Technical Analog vs. Binary : Wave vs. Sample
  • 54. 05.1.3 Getting Technical Getting Technical Pixel Parameters Sampling Frequency This parameter measures the physical pixel count in pixels per inch (ppi), pixels per mm, etc. This parameter informs us about the size of the original and also provides part of the data needed to determine the level of detail recorded in the file. Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
  • 55. 05.1.3 Getting Technical Getting Technical Pixel Parameters Sharpening Sharpening artificially enhances details to create the illusion of greater definition. There are three major sharpening processes in a typical imaging pipeline: capture sharpening (through camera setting adjustment), image sharpening in post processing, and output sharpening for print or display purposes. Sharpening is usually implemented through image edge enhancement, such as filtering techniques using unsharp masks and inverse image diffusion. Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
  • 56. 05.1.3 Getting Technical Getting Technical Sharpening Basic image Autosharpen Extreme sharp
  • 57. 05.1.3 Getting Technical Getting Technical Pixel Parameters Reproduction Scale Accuracy This parameter measures the relationship between the size of the original object to the size of that object in the digital image. This parameter is measured in relation to the pixels per inch (ppi) or pixels per mm (ppmm) of the original digital capture. Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014
  • 58. 05.2 Getting Technical • Bitonal • Grayscale • Color • Additive color • Subtractive color Getting Technical Color and Not Color
  • 59. 05.2.1 Getting Technical Getting Technical Color and Not Color RGB Color Grayscale Bitonal
  • 60. 05.2.2 Getting Technical Getting Technical Color and Not Color: CMYK The CMYK color model (process color, four color) is a subtractive color model, used in color printing, and is also used to describe the printing process itself. CMYK refers to the four inks used in some color printing: cyan, magenta, yellow, and key (black). Though it varies by print house, press operator, press manufacturer, and press run, ink is typically applied in the order of the abbreviation. By Viliam Furík - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=39316936 (upper) By SharkD at English Wikipedia Later version uploaded by Jacobolus, Dacium at en.wikipedia. - Transferred from en.wikipedia to Commons., Public Domain, https://commons.wikimedia.org/w/index.php?curid=3791468 (lower)
  • 61. 05.2.3 Getting Technical Getting Technical Color and Not Color: RGB The RGB color model is an additive color model in which red, green, and blue light are added together in various ways to reproduce a broad array of colors. The name of the model comes from the initials of the three additive primary colors, red, green, and blue. The main purpose of the RGB color model is for the sensing, representation, and display of images in electronic systems, such as televisions and computers. Wikipedia https://en.wikipedia.org/wiki/RGB_color_model
  • 62. 05.2.4 Getting Technical Getting Technical Color and Not Color: RGB vs. CMYK By RGB_CMYK_4.jpg: Annette Shacklett derivative work: Marluxia.Kyoshu [Public domain], via Wikimedia Commons A comparison of RGB and CMYK color spaces. The image demonstrates the difference between the RGB and CMYK color gamuts. The CMYK color gamut is much smaller than the RGB color gamut, thus the CMYK colors look muted. If you were to print the image on a CMYK device (an offset press or maybe even an inkjet printer) the two sides would likely look much more similar, since the combination of cyan, yellow, magenta and black cannot reproduce the range (gamut) of color that a computer monitor displays. This is a constant issue for those who work in print production. Clients produce bright and colorful images on their computers and are disappointed to see them look muted in print.
  • 63. 05.3 Getting Technical • Master / Archival / Access • Image file types Getting Technical File Formats
  • 64. Getting Technical File Formats Master A digital file (images, video, audio) which has been stored in its original captured state. These master files are also referred to as master copies, preservation masters or preservation copies. 05.3.1 Getting Technical
  • 65. Getting Technical File Formats Archival A file that is composed of one or more computer files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files to use less storage space. Archive files often store directory structures, error detection and correction information, arbitrary comments, and sometimes use built-in encryption. 05.3.1 Getting Technical
  • 66. Getting Technical File Formats Access Often used for low-resolution images (thumbnails, screen images) that are made available on the Internet. See also delivery copy and surrogate image. This could be an identical copy of the original file or perhaps a lower quality version with a smaller file size. Sometimes called delivery or surrogate or derivative. 05.3.1 Getting Technical
  • 67. • RAW • TIFF • JPEG • JPEG 2000 • PNG • PDF • GIF Getting Technical Image file types 05.3.2 Getting Technical
  • 68. File Formats Three Factors • Cost • System Implementation • Sustainability Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014 05.4 Getting Technical
  • 69. File Formats Cost Factors • Implementation Cost • Cost of Software Tools • Cost of equipment needed to produce files • Storage Cost • Network Cost • Ongoing Cost of Production • Cost of Providing Access • Cost of Preservation Processing Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014 05.4.1 Getting Technical
  • 70. File Formats Implementation Factors Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014 • Level of difficulty/complexity • Technical Complexity • Toolset Complexity • Availability of tools • Ease and accuracy for OCR • Ease and accuracy of File validation • Ease and accuracy of monitoring of quality 05.4.2 Getting Technical
  • 71. File Formats Sustainability Factors • Disclosure • Adoption • Transparency • Self-Documentation • Native Embedded Metadata Capabilities • Embedded Metadata Capabilities Through Extension • Level of Work Necessary to Embed Native Metadata • Level of Work Necessary to Embed Metadata Through Extension • Geo-referencing Metadata • Level of Effort to Embed Geo-referencing Metadata • Impact of Patents • Technical Protection Mechanisms Raster Still Images for Digitization A Comparison of File Formats. FADGI. 2014 05.4.3 Getting Technical
  • 72. Exercises Exercise 3: Deep Dive: FADGI File Format Sheet
  • 73. File Formats File Naming (Zhèngmíng | 正名) 05.5 Getting Technical The Master replied, “What is necessary is to rectify names.” “So! indeed!” said Tsze-lu. “You are wide of the mark! Why must there be such rectification?” “Therefore a superior man considers it necessary that the names he uses may be spoken appropriately, and also that what he speaks may be carried out appropriately. What the superior man requires is just that in his words there may be nothing incorrect.” From The Analects of Confucius, Book 13, Verse 3 (James R. Ware, translated in 1980):
  • 74. File Formats File Naming: Some Guidelines Semantic Names There is meaning encoded in the name, like: • “BCA_03_04_00_145” • Syrnium fulvescens, from Biologia Centrali-Americana. Aves. Volume IV (1879-1904) by Osbert Salvin and F. DuCane Godman (4th volume of the 3rd part of Biologia Centrali-Americana, plate 145) 05.5.1 Getting Technical BCA_03_04_00_145.jpg
  • 75. File Formats File Naming: Some Guidelines Practical • Use barcodes, accession numbers, etc. • “39088002738714-0003” • Wiener Farbenkabinet (1794) 05.5.2 Getting Technical 39088002738714-0003.jpg
  • 76. File Formats File Naming: Three Parts • Prefix • Ordinal Position • Suffix 05.5.3 Getting Technical 39088002738714-0003.jpg
  • 77. File Formats File Naming: Three Parts 05.5.3 Getting Technical • Prefix: Bca_03_04_00 (this is the part that’s either semantic or practical) • Ordinal: 145 (position of the item in relation to a compound object) • Suffix: jpg (file type) • Full name: Bca_03_04_00_145.jpg
  • 78. File Formats File Naming: Last Thoughts 05.5.3 Getting Technical • Stick with three letter extension for the suffix (.tif, .jpg, .jp2, .png) • Keep file names the same length (padding with Zeros not spaces!) • Better to be consistent than right!
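The three-part naming pattern and the zero-padding advice above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `build_file_name`, the hyphen separator, and the default width of four digits are assumptions modeled on the barcode example `39088002738714-0003.jpg`.

```python
def build_file_name(prefix: str, ordinal: int, suffix: str, width: int = 4) -> str:
    """Join a prefix, a zero-padded ordinal, and a file-type suffix.

    The hyphen separator and the default width are illustrative assumptions.
    """
    if ordinal < 0:
        raise ValueError("ordinal must be non-negative")
    if len(str(ordinal)) > width:
        raise ValueError("ordinal does not fit in the chosen width")
    # Pad with zeros (never spaces) so every name has the same length.
    clean_suffix = suffix.lower().lstrip(".")
    return f"{prefix}-{ordinal:0{width}d}.{clean_suffix}"


names = [build_file_name("39088002738714", n, "jpg") for n in (3, 45, 145)]
print(names)
```

Because every ordinal is padded to the same width, a plain alphabetical sort of the names also puts the pages in the right order.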
  • 79. 05.6 Technical File Formats Compression / Lossy / Lossless Some file format types, specifically JPEG, JPEG 2000, and TIFF, allow you to compress the file size. Lossy compression makes files smaller by discarding data that cannot be recovered; lossless compression makes files smaller without discarding any data. Be aware of this data loss when applying lossy compression.
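The lossy/lossless distinction named in the slide title can be checked directly. The sketch below uses Python's standard `zlib` module as a stand-in for a lossless codec and a crude "drop the low four bits" step as a stand-in for lossy reduction; neither is what JPEG, JPEG 2000, or TIFF actually do internally, and the sample bytes are synthetic rather than real image data.

```python
import zlib

# Synthetic stand-in for raw pixel data (not a real image).
original = bytes(range(256)) * 64

# Lossless: the decompressed bytes are identical to the input.
compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)
print(len(original), len(compressed), restored == original)

# Crude lossy stand-in: discard the low 4 bits of every byte.
# The discarded detail cannot be recovered afterwards.
reduced = bytes(b & 0xF0 for b in original)
print(reduced == original)
```

The first comparison holds and the second does not, which is the practical reason to keep lossy steps away from master files.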
  • 80. 05.7 Technical File Formats Bundling file formats When you need to move many associated files around, you may wish to “bundle” them to pull them all together into one package. Common formats are: • ZIP • TAR • RAR • 7z
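As a rough sketch of bundling, the example below packs a few associated files into one ZIP package with Python's standard `zipfile` module and then lists the contents. The file names and the helper `bundle` are hypothetical, and the placeholder text files stand in for real image files.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle(files, bundle_path):
    """Write the given files into a single ZIP package (stored, not compressed)."""
    with zipfile.ZipFile(bundle_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for f in files:
            # Store only the file name, not the full temporary path.
            zf.write(f, arcname=Path(f).name)
    return bundle_path


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    files = []
    for n in (1, 2):
        page = tmp / f"item-{n:04d}.txt"
        page.write_text(f"placeholder for page {n}")
        files.append(page)
    package = bundle(files, tmp / "item.zip")
    with zipfile.ZipFile(package) as zf:
        listed = zf.namelist()

print(listed)
```

`ZIP_STORED` is chosen here so the bundle only packages the files; switching to `zipfile.ZIP_DEFLATED` would also apply lossless compression.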
  • 81. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Pixel
  • 82. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: An abbreviation of picture element, this term may refer to a component of either a digital image or a digital sensor. In the case of a digital image, the pixel is the smallest discrete unit of information in the image's structure. In the case of the sensor in a scanner or digital camera, a pixel is the smallest photosensitive component or cell providing a response to light (or photons). Term: Pixel
  • 83. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Pixilation
  • 84. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: A term used to describe an abrupt and unnatural transition over an edge feature. Also referred to as "staircasing" because of the jagged and abrupt transition. Term: Pixilation
  • 85. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: PPI
  • 86. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: PPI stands for pixels per inch, commonly used in describing the resolution capabilities of an imaging device such as a scanner or the resolution of a digital image. The terms DPI (dots per inch) and PPI are used somewhat interchangeably today. Term: PPI
  • 87. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Pixel Dimensions
  • 88. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: Pixel Dimensions are the horizontal and vertical measurements of an image expressed in pixels. The pixel dimensions may be determined by multiplying both the width and the height by the DPI. A digital camera will also have pixel dimensions, expressed as the number of pixels horizontally and vertically that define its resolution (e.g., 2,048 by 3,072). Calculate the DPI achieved by dividing a document's dimension into the corresponding pixel dimension against which it is aligned. Term: Pixel Dimensions
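The two calculations in the definition above (physical size times resolution gives pixel dimensions; pixel dimension divided by physical size gives the achieved resolution) can be written out as a small sketch. The function names and the letter-size example are illustrative assumptions.

```python
def pixel_dimensions(width_in: float, height_in: float, ppi: float) -> tuple:
    """Multiply both physical dimensions (in inches) by the resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def achieved_ppi(pixels: int, inches: float) -> float:
    """Divide a pixel dimension by the matching physical dimension."""
    return pixels / inches


# An 8.5 x 11 inch page scanned at 300 ppi.
print(pixel_dimensions(8.5, 11, 300))
# A 10 inch edge captured across 3,000 pixels.
print(achieved_ppi(3000, 10))
```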
  • 89. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Color
  • 90. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: Color is the visual perceptual property corresponding in humans to the categories called red, blue, yellow, etc. Color derives from the spectrum of light (distribution of light power versus wavelength) interacting in the eye with the spectral sensitivities of the light receptors. Color categories and physical specifications of color are also associated with objects or materials based on their physical properties such as light absorption, reflection, or emission spectra. Term: Color
  • 91. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Bit depth (image)
  • 92. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: The number of bits used to represent each pixel in an image. The term can be confusing since it is sometimes used to represent bits per pixel and at other times, the total number of bits used multiplied by the number of total channels. For example, a typical color image using 8 bits per channel is often referred to as a 24-bit color image (8 bits x 3 channels). Color scanners and digital cameras typically produce 24 bit (8 bits x 3 channels) images or 36 bit (12 bits x 3 channels) capture, and high-end devices can produce 48 bit (16 bit x 3 channels) images. A grayscale scanner would generally be 1 bit for monochrome or 8 bit for grayscale (producing 256 shades of gray). Bit depth is also referred to as color depth. Term: Bit depth (image)
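Bit depth translates directly into tonal range and raw storage size. The sketch below assumes tightly packed pixels with no header, row padding, or compression, so real files will differ somewhat; the function names are hypothetical.

```python
def tones_per_channel(bits_per_channel: int) -> int:
    """Number of distinct values one channel can hold."""
    return 2 ** bits_per_channel


def uncompressed_bytes(width_px: int, height_px: int,
                       bits_per_channel: int, channels: int) -> int:
    """Raw pixel payload in bytes, ignoring headers and row padding."""
    total_bits = width_px * height_px * bits_per_channel * channels
    return (total_bits + 7) // 8  # round up to whole bytes


print(tones_per_channel(8))                   # grayscale shades at 8 bits
print(uncompressed_bytes(2550, 3300, 8, 3))   # 24-bit color page
print(uncompressed_bytes(2550, 3300, 1, 1))   # bitonal page
```

For the same 2550 x 3300 pixel page, the 24-bit color payload is 24 times the bitonal one, which is why bit depth belongs in storage planning.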
  • 93. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: CMYK
  • 94. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: A subtractive color model used in printing that is based on cyan (C), magenta (M), yellow (Y) and black (K). These are typically referred to as process colors. Cyan absorbs the red component of white light, magenta absorbs green, and yellow absorbs blue. In theory, the mix of the three colors will produce black, but a black ink is used to increase the density of black in a print. Term: CMYK
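The complementary relationship in the definition (cyan absorbs red, magenta absorbs green, yellow absorbs blue, with black pulled out separately) can be illustrated with the common textbook conversion below. This is a naive, device-independent sketch under the assumption of ideal inks; real print workflows use color-managed profiles instead, and the function name is hypothetical.

```python
def rgb_to_cmyk(r: int, g: int, b: int) -> tuple:
    """Naive 8-bit RGB to CMYK (each component in 0.0..1.0), ideal inks assumed."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)          # black replaces the shared darkness
    if k == 1:                       # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - rf - k) / (1 - k)       # cyan is the absence of red
    m = (1 - gf - k) / (1 - k)       # magenta is the absence of green
    y = (1 - bf - k) / (1 - k)       # yellow is the absence of blue
    return c, m, y, k


print(rgb_to_cmyk(255, 0, 0))   # pure red: no cyan, full magenta and yellow
print(rgb_to_cmyk(0, 0, 0))     # black: key ink only
```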
  • 95. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: RGB
  • 96. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: An additive color model based on the three primary colors of red (R), blue (B) and green (G). Term: RGB
  • 97. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Grayscale
  • 98. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: Grayscale is a range of monochromatic shades from black to white. Therefore, a grayscale image contains only shades of gray and no color. Term: Grayscale
  • 99. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Bitonal
  • 100. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: A bitonal image is represented by pixels consisting of 1 bit each, which can represent two tones (typically black and white), using the values 0 for black and 1 for white or vice versa. Term: Bitonal
  • 101. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Continuous tone
  • 102. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: Generally referring to pictorial images where there is a non-broken range of tones from white to black that may have every shade of gray represented. There are theoretically an infinite number of tones. Traditional photography (photochemical photography) produces continuous tone images. When reformatting pictorial items, it is important to distinguish continuous tone originals from printed halftones, since these two classes are likely to require different strategies and methods for making the digital images. Term: Continuous tone
  • 103. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Color Space
  • 104. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: The choice of colorspace determines how many unique colors are potentially possible in your digital file, and how fine the gradations are between shades of color. Each colorspace was designed for a specific purpose, none is superior to the others for all applications. However, FADGI recommends selecting an appropriate colorspace from the recommendations in the charts in this document. Term: Color Space
  • 105. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: OCR
  • 106. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: Optical Character Recognition (OCR) is a technology that allows dots or pixels representing machine generated characters in a raster image to be converted into digitally coded text. In addition to recognizing and coding text, OCR programs attempt to recognize and code the structural elements of a document page, such as columns and non-text graphical elements. Term: OCR
  • 107. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Archival master file
  • 108. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: File that represents the best copy produced by a digitizing organization, with best defined as meeting the objectives of a particular project or program. These objectives differ from one content category to another and the specifications to be recommended at this Web site (forthcoming) will be tailored to fit a variety of common categories and objectives. In some cases, an archive may produce more than one archival master file. Term: Archival master file
  • 109. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Master File Format
  • 110. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: The choice of master file format is a decision which affects how your digitized materials can be used and managed. There is no one correct master file format for all applications, all format choices involve compromises between quality, access and lifecycle management. The FADGI star system tables list the most appropriate master file formats for each imaging project type. Selection of the most appropriate format within these recommended choices is an important decision that should be consistent with your long term archive strategy. Term: Master File Format
  • 111. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Access File Format
  • 112. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: FADGI anticipates continual evolution in the availability of access file formats, each new format designed to provide specific advantages over others for a specific application. Care should be taken when selecting access formats to insure long term viability. Term: Access File Format
  • 113. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Vector graphics
  • 114. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: Vector graphics is the use of polygons to represent images in computer graphics. Vector graphics are based on vectors, which lead through locations called control points or nodes. Each of these points has a definite position on the x and y axes of the work plane and determines the direction of the path; further, each path may be assigned a stroke color, shape, curve, thickness, and fill. Term: Vector graphics
  • 115. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Term: Bundling file format
  • 116. II DIGITIZATION VOCABULARY Vocabulary II: Getting Technical Definition: These formats encapsulate their constituent files and, save for a directory that provides the filenames, do not describe the content and the relationships that may obtain between files. (In this, they differ from what are often called wrappers.) Archetypes include ZIP, StuffIt, and TAR, the latter associated with the UNIX operating system. Simple bundling formats tend to be generic, i.e., they may be used for a wide range of content types. Term: Bundling file format
  • 117. Do it yourself (DIY)
  • 118. 06.1 DIY Do It Yourself! Describing the Collection
  • 119. 06.1.1 DIY Do It Yourself! Describing the Collection
  • 120. 06.1.2 DIY Do It Yourself! Describing the Collection The sheer number of metadata standards in the cultural heritage sector is overwhelming, and their inter-relationships further complicate the situation. This visual map of the metadata landscape is intended to assist planners with the selection and implementation of metadata standards Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley
  • 121. 06.1.2 DIY Do It Yourself! Describing the Collection Each of the 105 standards listed here is evaluated on its strength of application to defined categories in each of four axes: community, domain, function, and purpose. The strength of a standard in a given category is determined by a mixture of its adoption in that category, its design intent, and its overall appropriateness for use in that category. Seeing Standards: A Visualization of the Metadata Universe by Jenn Riley
  • 122. 06.2 DIY Do It Yourself! Digitization Tools
  • 123. 06.2.1 DIY Do It Yourself! Scanner vs. Camera The World’s First Digital Camera (1975) by Kodak and Steve Sasson
  • 124. 06.2.1 DIY Do It Yourself! Camera At the heart of a digital camera is the sensor. The size and density of the sensor determine the pixel count of the resulting image. The sensor in combination with an optical lens creates the digital image. By C-M - own Image, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=2150801
  • 125. 06.2.3 DIY Do It Yourself! Scanner A scanner is a much different device; the sensor array is long and thin. By moving across the target at varying speeds and angles, higher or lower resolution outputs can be generated. By Scanner_a_plat_fonctionnement.png: User:Jean-noderivative work: Pluke (talk) - Scanner_a_plat_fonctionnement.png, FAL, https://commons.wikimedia.org/w/index.php?curid=17009063
  • 126. 06.2.4 DIY Do It Yourself! A Note on Sensors CCD vs CMOS Charge-Coupled Devices vs. Complementary Metal– Oxide–Semiconductor Both types of sensor accomplish the same task of capturing light and converting it into electrical signals. By Filya1 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6304562
  • 127. 06.3 DIY Do It Yourself! Digitization Equipment There are many options for equipment to create digital files from collections objects: • Scanners • Slide/Negative scanners • Specialized tools The World’s First Digital Camera (1975) by Kodak and Steve Sasson
  • 128. 06.3.1 DIY Do It Yourself! Digitization Equipment: Flatbed Scanners • 25-50 Page Automated Document Feeder • Flat Bed Scanning • Support for TWAIN and/or ISIS interface drivers • USB or SCSI Interface • Support for largest expected documents • Duplex (Automatic) scanning (two sides at one pass) • Optical 200 x 200 - 600 x 600 Dots Per Inch (DPI)
  • 129. 06.3.2 DIY Do It Yourself! Digitization Equipment: MFM / Slide / Neg Remember to keep in mind your original size when doing transparencies and negatives. You will need to scan at a much higher PPI to create an image closer in size to the original for display or printing. • Microfilm scanners • Slide & Negative scanners
  • 130. 06.3.4 DIY Do It Yourself! Specialized Tools • Digital cameras • Scanning back cameras • High speed book scanners • 3D scanners
  • 131. 06.3.4.1 DIY Do It Yourself! Digital Cameras Digital cameras come in a vast array of sizes, shapes, and formats. DT ATOM Frame footprint: 24.625” wide x 31” deep With table top: 25.625” wide x 31.25” deep Height with Column: 65.5” Light Arm Span: 78.5”
  • 132. 06.3.4.1 DIY Do It Yourself! Digital Cameras DT BC100 Materials Frame is made from black anodized extruded aluminum, custom brackets are made from black anodized aircraft grade aluminum. Overall Dimensions 7’H x 6’5” W x 5’D Footprint 6’W x 5’D Glass Platen Dimensions 24.9” x 17.48” on each side Book Binding Limitations 6” binding Working Table Height 30” Accessory & Monitor Shelf Dimensions Side shelves are 19” x 34” black laminate Compressor .5 HP with a 6.3 gallon tank
  • 133. 06.3.4.2 DIY Do It Yourself! Scanning Back Cameras Scanning back cameras provide generally higher resolution by replacing the sensor array with a sensing device that scans across the image created by the camera lens. • More megapixel images • Slower scanning times XF Phase One DT RCam DT RG3040 Images from Digital Transitions http://dtdch.com/
  • 134. 06.3.4.3 DIY Do It Yourself! High Speed Book Cameras For high throughput, robotic scanners are available for some materials, such as books.
  • 135. 06.3.4.4 DIY Do It Yourself! 3D Equipment There are a wide variety of 3D digitization tools and processes. What type of equipment and process you use should be carefully thought out with the end-use of the digitization in the forefront. Smithsonian 3D Imaging Team
  • 136. 06.4 DIY Do It Yourself! Where to do it: The Gray Room Having dedicated workspace for your digitization is, of course, optimal. In reality, digitization will occur wherever it is most practical. Still, if you have the luxury of dedicated space, here are some guidelines on building it out. Internet Archive, San Francisco
  • 137. 06.4.1 DIY Do It Yourself! FADGI Space Guidelines The working environment should be painted/decorated a neutral, matte gray with a 60% reflectance or less to minimize flare and perceptual biases. FADGI Guidelines
  • 138. 06.4.1 DIY Do It Yourself! FADGI Space Guidelines Monitors should be positioned to avoid reflections and direct illumination on the screen. FADGI Guidelines Smithsonian Libraries Scanning Room
  • 139. 06.4.1 DIY Do It Yourself! FADGI Space Guidelines ISO 12646 requires that the room illumination be less than 32 lux when measured anywhere between the monitor and the observer, and that the light have a color temperature of approximately 5000K. Consistent room illumination is a fundamental element of best practice in imaging. Changes in color temperature or light level from a window, for example, can dramatically affect the perception of an image displayed on a monitor. FADGI Guidelines Smithsonian Libraries Scanning Room
  • 140. 06.4.1 DIY Do It Yourself! FADGI Space Guidelines Each digitization station should be in a separate room, or separated from each other by sufficient space and with screening to minimize the light from one station affecting another. It is critically important to maintain consistent environmental conditions within the working environment. FADGI Guidelines Internet Archive, San Francisco
  • 141. 06.4.1 DIY Do It Yourself! FADGI Space Guidelines Care should be taken to maintain the work environment at the same temperature and humidity as the objects are normally kept in. Variations can cause stress to some materials and in severe cases may damage the originals. The use of a datalogger in both imaging and storage areas is highly recommended. FADGI Guidelines Smithsonian Libraries Scanning Room
  • 142. 06.5 DIY Do It Yourself! Quality Control Quality Control (QC), or Quality Assurance (QA), is key to maintaining the overall quality and fidelity of any digitization project. Differing levels of QC may be needed for the type of project and materials being digitized. In large scale projects, 100% QC will rarely be feasible. http://www.sil.si.edu/imagegalaxy/imagegalaxy_imageDetail.cfm?id_image=7403
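When 100% QC is not feasible, one common approach is to inspect a random sample of each batch. The sketch below is one possible way to draw such a sample; the 10% fraction, the fixed seed, and the function name `qc_sample` are illustrative assumptions, not a recommended standard.

```python
import random


def qc_sample(file_names, fraction, seed=None):
    """Pick a random subset of files for manual inspection.

    fraction is the share of the batch to check (0 < fraction <= 1).
    A seed makes the selection repeatable for audit purposes.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not file_names:
        return []
    count = max(1, round(len(file_names) * fraction))
    rng = random.Random(seed)
    return sorted(rng.sample(list(file_names), count))


batch = [f"39088002738714-{n:04d}.tif" for n in range(1, 101)]
to_check = qc_sample(batch, fraction=0.10, seed=2016)
print(len(to_check), to_check[:3])
```

Recording the seed alongside the batch lets a second reviewer reproduce exactly which files were inspected.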
  • 143. Exercises Exercise 4: Quality Control Sampling
  • 144. 06.6 DIY Do It Yourself! Storage Now that we’ve created all this data, we need to store it somewhere … https://flic.kr/p/86miWv
  • 145. 06.6.1 DIY Do It Yourself! Storage • Short Term • Medium Term • Long Term https://flic.kr/p/86miWv
  • 146. 06.6.2 DIY Do It Yourself! Primary magnetic storage • Diskettes • Hard disks (both fixed and removable) • High capacity floppy disks • Disk cartridges • Magnetic tape Smithsonian Data Center https://flic.kr/p/cGFn2f
  • 147. 06.6.3 DIY Do It Yourself! Primary optical storage • Compact Disk Read Only Memory (CD ROM) • Digital Video Disk Read Only Memory (DVD ROM) • CD Recordable (CD R) • CD Rewritable (CD RW) Smithsonian Data Center https://flic.kr/p/cGFn4E
  • 148. 06.6.3 DIY Do It Yourself! Solid-state storage Solid-state storage is a type of non-volatile computer storage that stores and retrieves digital information using only electronic circuits, without any involvement of moving mechanical parts. (Wikipedia) Examples: • SSD • Flash drive Internet Archive https://flic.kr/p/dnDS11
  • 149. 06.6.4 DIY Do It Yourself! Acronyms • DAS • NAS • SAN • DAM Internet Archive https://flic.kr/p/8Ms4QV
  • 150. 06.6.4.1 DIY Do It Yourself! Direct-attached storage (DAS) ... is traditional mass storage that does not use any network. This is still a very popular approach. This retronym was coined recently, together with NAS and SAN. (Wikipedia) Internet Archive https://flic.kr/p/8Ms4QV
  • 151. 06.6.4.2 DIY Do It Yourself! Network-attached storage (NAS) … is mass storage attached to a computer which another computer can access at file level over a local area network, a private wide area network, or in the case of online file storage, over the Internet. (Wikipedia) Internet Archive https://flic.kr/p/8Ms4QV
  • 152. 06.6.4.3 DIY Do It Yourself! Storage area network (SAN) ... is a specialized network that provides other computers with storage capacity. The crucial difference between NAS and SAN is that the former presents and manages file systems to client computers, whilst the latter provides access at block-addressing (raw) level. (Wikipedia) Internet Archive https://flic.kr/p/8Ms4QV
  • 153. 06.6.4.3 DIY Do It Yourself! Digital asset management (DAM) … consists of management tasks and decisions surrounding the ingestion, annotation, cataloguing, storage, retrieval and distribution of digital assets. (Wikipedia) Internet Archive https://flic.kr/p/8Ms4QV
  • 154. 06.6.5 DIY Do It Yourself! Digital Preservation http://www.xkcd.com/1683/
  • 155. 06.6.5 DIY Do It Yourself! Digital Preservation https://xkcd.com/242/ There are two kinds of preservationists: those who have lost data and those who will. Minimum Digitization Capture Recommendations (2013)
  • 156. 06.7 DIY Do It Yourself! Recap of Scanning • Scan at the best resolution you can afford to store • Manuscripts and text: 300 ppi • Photographs: 400-800 ppi • Graphic materials: 600-800 ppi • Maps: 600 ppi (up to 36”) or 300-400 ppi (greater than 36”) • Calibrate monitor and scanning devices Smithsonian Libraries Scanning Room
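To make "the best resolution you can afford to store" concrete, here is a minimal sketch of how resolution drives uncompressed file size. The function name, the 24-bit color depth, and the page sizes are illustrative assumptions, not figures from the slides, and file headers are ignored.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Estimate the size of an uncompressed raster scan (headers ignored).

    bits_per_pixel=24 assumes 8-bit RGB; this is an assumption, not a
    recommendation taken from the slides.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# A letter-size manuscript page at 300 ppi
letter_300 = uncompressed_size_bytes(8.5, 11, 300)
# An 8 x 10 inch photograph at 600 ppi
photo_600 = uncompressed_size_bytes(8, 10, 600)

print(f"{letter_300 / 1e6:.1f} MB")  # 25.2 MB
print(f"{photo_600 / 1e6:.1f} MB")   # 86.4 MB
```

Doubling the ppi roughly quadruples the storage needed, which is why resolution and storage budget have to be planned together.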
  • 157. 06.8 DIY Do It Yourself! Recap of Process • Create master (uncompressed) file • For analog content: Scan/sample • For born-digital content: Convert • Name the file in a consistent way • Perform quality control; edit as needed • Save master on stable, long-term storage • Create derivative or access file • Share access files as needed http://xkcd.com/730/
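As one possible reading of "name the file in a consistent way", the sketch below builds file names from a fixed pattern. The pattern itself (collection code, zero-padded item and page numbers, a role suffix) is a hypothetical convention chosen for illustration; the slides do not prescribe one.

```python
def build_filename(collection, item, page, role="master", ext="tif"):
    """Build a predictable file name for a scanned page.

    Hypothetical pattern: <collection>_<item:6 digits>_<page:4 digits>_<m|a>.<ext>
    where 'm' marks the master file and 'a' marks an access derivative.
    """
    suffix = {"master": "m", "access": "a"}[role]
    return f"{collection.lower()}_{item:06d}_{page:04d}_{suffix}.{ext}"


print(build_filename("SIL", 123, 4))                            # sil_000123_0004_m.tif
print(build_filename("SIL", 123, 4, role="access", ext="jpg"))  # sil_000123_0004_a.jpg
```

Zero-padding keeps files in page order when sorted alphabetically, and the shared stem ties each access file back to its master.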
  • 158. 06.9 DIY Do It Yourself! Digitization Life-cycle • Create master (uncompressed) file • For analog content: Scan/sample • For born-digital content: Convert • Name the file in a consistent way • Perform quality control; edit as needed • Save master on stable, long-term storage • Create derivative or access file • Share access files as needed Biodiversity Heritage Library Digitization Life-Cycle
  • 159. 06.10 DIY Do It Yourself! Better, Faster, Cheaper! Now that we’ve mastered digitization, how do we scale it to 100s, 1000s, millions of objects? Smithsonian Natural History Museum
  • 160. 06.10 DIY Digitization Program Office (Smithsonian)
  • 161. THE OLD PARADIGM This seems like an interesting and feasible subset… 06.10 DIY Digitization Program Office (Smithsonian)
  • 162. 06.10 DIY Digitization Program Office (Smithsonian)
  • 163. THE NEW PARADIGM These homogenous subsets have rapid-capture technologies available… 06.10 DIY Digitization Program Office (Smithsonian)
  • 164. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Array
  • 165. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Any orderly arrangement of individual sensor elements. In digital imaging, there are primarily three array types; two dimensional or area arrays, one dimensional or linear arrays, and tri-linear arrays consisting of three consecutive linear arrays of red, green, and blue sensitive sensor elements. Term: Array
  • 166. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Ambient light
  • 167. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Light existing in the environment that is not produced by the imaging system. Ambient light can be natural or artificial light. Ambient light is generally uncontrolled and can be highly variable, posing a possible risk to image quality. The level of ambient light should be minimized in relation to the level of light produced by the imaging system. Term: Ambient light
  • 168. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Calibration
  • 169. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: The comparison of instrument performance to a standard of higher accuracy. The standard is considered the reference and the more correct measure. Calibrations should be performed against a specified tolerance. Term: Calibration
  • 170. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Exif
  • 171. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Exchangeable image file format (Exif) describes a metadata set to accompany TIFF and JPEG image files and RIFF WAV audio files. Exif was prepared by the Technical Standardization Committee on AV & IT Storage Systems and Equipment and is published by the Japan Electronics and Information Technology Industries Association (JEITA). The Exif 2.2 specification (JEITA CP-3451) is in nearly universal use by camera manufacturers. Term: Exif
  • 172. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: IPTC Metadata
  • 173. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Embedded metadata used for image management. IPTC metadata is primarily composed of descriptive, administrative, and rights metadata, as opposed to the technical nature of Exif. IPTC metadata was developed and is controlled by the IPTC. Term: IPTC Metadata
  • 174. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: METS
  • 175. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: The Metadata Encoding and Transmission Standard (METS) is a metadata standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium (W3C). The standard is maintained by the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation (DLF). Term: METS
  • 176. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Aspect ratio
  • 177. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: The relationship between the horizontal and vertical dimensions of an image. The horizontal dimension is normally listed first. For example, a 4 (vertical) by 6 inch (horizontal) print has an aspect ratio of 3:2. Term: Aspect ratio
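The 4 by 6 inch example can be checked by dividing both dimensions by their greatest common divisor. This is a minimal sketch; the function name is an assumption, and it expects whole-number dimensions (inches or pixels).

```python
from math import gcd


def aspect_ratio(horizontal, vertical):
    """Reduce integer dimensions to the simplest horizontal:vertical ratio."""
    divisor = gcd(horizontal, vertical)
    return f"{horizontal // divisor}:{vertical // divisor}"


print(aspect_ratio(6, 4))        # 3:2  (the 4 x 6 inch print from the definition)
print(aspect_ratio(6000, 4000))  # 3:2  (the same shape expressed in pixels)
```

Because the ratio is independent of units, a scan keeps the aspect ratio of the original whatever the capture resolution.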
  • 178. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Artifact (defect)
  • 179. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: General term to describe a broad range of undesirable flaws or distortions in digital reproductions produced during capture or data processing. Some common forms of image artifacts include noise, chromatic aberration, blooming, interpolation, and imperfections created by compression, among others. In digital sound recordings, the effect of lossy compression is often cited as accounting for audible artifacts, although several other types of artifacts may also be present. Term: Artifact (defect)
  • 180. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Aliasing
  • 181. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: A sampling effect that leads to spatial frequencies being falsely interpreted as other spatial frequencies. Term: Aliasing
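A small numeric illustration of aliasing, under assumed numbers: a pattern with 9 cycles per inch sampled at only 10 samples per inch yields exactly the same samples as an inverted 1 cycle per inch pattern, so the fine detail is misread as a coarse one. The frequencies and function name are chosen for illustration only.

```python
import math


def sample(cycles_per_inch, samples_per_inch, n_samples):
    """Sample a sinusoidal pattern at evenly spaced positions."""
    return [
        math.sin(2 * math.pi * cycles_per_inch * n / samples_per_inch)
        for n in range(n_samples)
    ]


fine = sample(9, 10, 20)    # fine pattern, sampled too coarsely
coarse = sample(1, 10, 20)  # genuinely coarse pattern, same sampling rate

# Every sample of the fine pattern equals the negated coarse sample,
# so the two patterns cannot be told apart after sampling.
indistinguishable = all(
    math.isclose(f, -c, abs_tol=1e-9) for f, c in zip(fine, coarse)
)
print(indistinguishable)  # True
```

Sampling at more than twice the highest spatial frequency present (here, more than 18 samples per inch) would avoid this ambiguity.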
  • 182. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Compression, lossless
  • 183. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Data compressed using a lossless compression technique will allow the decompressed data to be exactly the same as the original data before compression, bit for bit. The compression of data is achieved by coding redundant data in a more efficient manner than in the uncompressed format. The compression ratios that can be achieved with lossless compression are generally much lower than those that can be achieved using lossy compression techniques. Term: Compression, lossless
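The "bit for bit" property can be demonstrated with Python's standard `zlib` module (DEFLATE, a lossless method). This is only a sketch with made-up, highly redundant sample data; real image data will usually compress far less.

```python
import zlib

# Highly redundant sample data (an assumption for illustration)
original = b"ABCD" * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the round trip returns exactly the original bytes
print(restored == original)                # True
# Redundant data is coded more efficiently, so the compressed form is smaller
print(len(compressed) < len(original))     # True
```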
  • 184. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Compression, lossy
  • 185. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Data compressed using a lossy compression technique results in the loss of information. The decompressed data will not be identical to the original uncompressed data. Conservative lossy compression can result in a form of lossy compression referred to as visually lossless compression. Term: Compression, lossy
  • 186. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Compression, visually lossless
  • 187. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: A form or manner of lossy compression where the data that is lost after the file is compressed and decompressed is not detectable to the eye; the compressed data appearing identical to the uncompressed data. Term: Compression, visually lossless
  • 188. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: Compression ratio
  • 189. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: The ratio of a file’s uncompressed size over its compressed size. A file compressed ten-fold over its uncompressed size would be described as having a ten-to-one compression, expressed as 10:1. Some formats such as JPEG and JPEG 2000 allow the user to specify the compression ratio. Term: Compression ratio
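The definition translates directly into a one-line calculation. In this sketch the function names and the file sizes are illustrative assumptions.

```python
def compression_ratio(uncompressed_bytes, compressed_bytes):
    """Uncompressed size divided by compressed size, as in the definition."""
    return uncompressed_bytes / compressed_bytes


def format_ratio(ratio):
    """Render a ratio in the conventional N:1 form."""
    return f"{ratio:g}:1"


# A 25,245,000-byte master reduced to 2,524,500 bytes
ratio = compression_ratio(25_245_000, 2_524_500)
print(format_ratio(ratio))  # 10:1
```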
  • 190. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Term: FADGI
  • 191. III DIGITIZATION VOCABULARY Vocabulary III: Do It Yourself! Definition: Federal Agencies Digitization Guidelines Initiative. A collaborative effort by federal agencies formed as a group in 2007 to define common guidelines, methods, and practices to digitize historical content in a sustainable manner. Term: FADGI
  • 192. A Little Bit about Smithsonian … Rapid Capture and 3D Imaging
  • 193. 07.1 Smithsonian A Little Bit about Smithsonian … Rapid Capture
  • 194. Courtesy of Keri Thompson, SIL Courtesy of Karen Weiss, AAA feat. Archives of American Art case study
  • 195.
  • 196. What Is Rapid Capture? • Rapid Capture is more than just taking a digital picture of an object or specimen. . . • Rapid Capture Workflows are comprehensive, end-to- end digitization workflows. • Rapid Capture workflows follow a collection object or specimen from its shelf in permanent storage all the way to its potential destination as a virtual object online, available for access by the public. 07.1 Smithsonian
  • 197. THE NEW PARADIGM These homogenous subsets have rapid-capture technologies available…
  • 198. Rapid Capture Digitization: Object & Data Workflow [Diagram. Barcode put in filename and/or IPTC Title field. Labels: Object Path, Data Path, DataMatrix Barcode, Staging, Hot Folder Ingest, DAMS, CDIS, TMS, IDS, Generates Derivative Media, Generates Derivative Metadata. Example barcodes: 123456ABC, 123456ABD, 123456ABE.]
  • 199. • Show what rapid capture looks like Rapid Capture In Action NMAH Numismatics Hillery York, NMAH Collection Mgr, moves objects from staging to the capture station
  • 200. Rapid Capture Impact Access: From the shelf to the public in less than 24 hours. Throughput: Flat objects: 100,000+ to 1.8M per year Non-flat objects/specimens: 30,000 to 60,000 per year 07.1 Smithsonian
  • 201. What we’ve learned Moving fast requires a holistic approach. Moving Collections Digitizing Collections Moving Data Object handling, cleaning, etc. Dedicated hardware, Quality control Network / Systems 07.1 Smithsonian
  • 204. Rapid Capture – Conveyor Powered 07.1 Smithsonian
  • 205. 07.2 Smithsonian A Little Bit about Smithsonian … 3D Imaging
  • 206.
  • 207.
  • 208.
  • 209.
  • 210.
  • 211.
  • 212.
  • 213.
  • 215. 08 Resources You Are Not Alone! There is a wealth of resources to help with digitization projects of all types … Resources
  • 216. 08 Resources Digital Library Federation Strategy meets practice at the Digital Library Federation (DLF). Through its programs, working groups, and initiatives, DLF connects the vision and research agenda of its parent organization, the Council on Library and Information Resources (CLIR), to an active and exciting network of practitioners working in digital libraries, archives, labs, and museums. DLF is a place where ideas can be road-tested, and from which new strategic directions can emerge. Resources https://www.diglib.org/
  • 217. 08 Resources Museums and the Web The Museums and the Web Bibliography comprises all papers published on MW conference websites or in annual selected proceedings. Entries can be filtered by year and are listed alphabetically by the primary author's name. Clicking a paper title shows details including an abstract and a live URL link if appropriate. Clicking an author's name lists all papers by that author. This bibliography is a work in progress as we standardize all entries. Resources http://www.museumsandtheweb.com/
  • 218. 08 Resources Federal Agencies Digitization Guidelines Initiative Federal Agencies Digitization Guidelines Initiative. Formed as a group in 2007 to define common guidelines, methods, and practices to digitize historical content in a sustainable manner. Two separate working groups were formed. The Federal Agencies Still Image Digitization Working Group will concentrate its efforts on image content such as books, manuscripts, maps, and photographic prints and negatives. The Federal Agencies Audio-Visual Working Group is focusing its work on sound, video, and motion picture film. Resources http://www.digitizationguidelines.gov/
  • 221. THANKS TO… • Sarah Osborne Bender • Smithsonian Digitization Program Office: Günter Waibel, Adam Metallo, Vincent Rossi • Richard Naples (Smithsonian Libraries) • Keri Thompson (Smithsonian Libraries) • Jacqueline Chapman (Smithsonian Libraries)

Editor's notes

  1. 53
  2. The old prioritization methodology focused on whittling down collections to a manageable size.
  3. The new prioritization methodology is focused more on the physical characteristics of collections; identifying large, homogenous collections.
  4. These are known as “digistreets.” The opportunity is the massive economies of scale that industrial-scale digitization provides.
  5. - Many of you have seen our Rapid Capture Projects during Open Houses and as of now, 5 units have been deeply immersed in Rapid Capture through our Pilot Projects. . . But for those that haven’t had the opportunity to see Rapid Capture in action, let me give a quick definition of what we’re doing. . . First and foremost. . .
  6. These are known as “digistreets.” The opportunity is the massive economies of scale that industrial-scale digitization provides.
  7. Rapid Capture workflows follow a collection object or specimen from its shelf in permanent storage, to its digitization, to storage of the image in DAMS and of the collection records in your unit’s CIS, and finally, all the way to its potential destination as a virtual object online, available for access by the public.
  8. In addition to the various types of objects we’ve digitized at each pilot project, we’ve also integrated various SI collections management systems into our rapid capture workflows, including SIRIS at Gardens, TMS at NMAAHC & Freer Sackler and MIMSY-XG at American History. Additionally, during each pilot project we’ve introduced new techniques, technologies and processes. For example, at Gardens we introduced new tools that support Quality Control in the digitization workflow, and at American History we integrated Transcription Center into the Rapid Capture workflow. In our next pilot project with Natural History we’ll integrate the EMu Collection Management System into our workflow, and we’ll also introduce barcoding into the rapid capture workflow.
  9. The impact of these RCPPs has shown us several things. . . - In typical Rapid Capture workflows, it’s possible for an object or specimen to go from permanent storage, where it hasn’t been seen by the public in years, decades or perhaps ever, to publicly accessible Smithsonian websites in as little as a few hours! - And because Rapid Capture workflows are fine-tuned, improved & continuously optimized at every step of the way, depending on the collection object or specimen, high-quality digital assets can be generated at throughput rates of 150 objects per day to 700 to 1,300 specimens, and upwards of 6,000 objects or specimens per day!
  10. To take advantage of the opportunities presented by comprehensive, end-to-end rapid capture workflows, we’ve learned we need to take a holistic approach to our workflows; workflows which include not only the digitization process itself, but the object or specimen handling that comes before it and the movement of mass amounts of data that come after it.
  11. We’ve shown what we can do with Human Driven Rapid Capture Workflows. . .
  12. … and the DPO is actively researching technologies available in the digitization community from around the world. . .
  13. To include semi-automated conveyor based digitization systems that have demonstrated their efficiency with high volume, flat collections. As well as robotic systems that can digitize small objects such as insects and large objects such as painting at extremely high resolutions. The bottom line is we’re constantly looking for ways to expand our capabilities, improve efficiencies and reduce costs all while maintaining the highest quality levels.