WGBH Media Library and Archives Director Karen Cariani and American Archive of Public Broadcasting Project Manager Casey Davis gave this presentation at the New England Archivists 2014 Fall Symposium. Karen and Casey discussed managing and preserving digital video; Project Hydra; metadata for audiovisual materials; and collaboration with other institutions through the lens of WGBH Media Library and Archives projects, including the American Archive of Public Broadcasting and the NEH-funded HydraDAM project.
Challenges, Workflows, and Insights in the Collaboration to Preserve America's Public Media
1. Karen Cariani, Director, WGBH Media Library and Archives
Casey E. Davis, AAPB Project Manager
CHALLENGES, WORKFLOWS AND INSIGHTS IN THE COLLABORATION TO PRESERVE AMERICA'S PUBLIC MEDIA
3. WHO WE ARE: AAPB
...and more than 120 public radio and television stations and archives nationwide
4. WHY ARE WE HERE TODAY?
Social media allows anyone to become a video publisher and broadcaster
100 hours of video uploaded to YouTube every minute
60:1 – 80:1 shooting ratio on documentary films
How often do you create videos?
“We’re all digital archivists now.” -Sibyl Schaefer
I would add to that, more specifically....
In a few years, we will also all be audiovisual archivists.
5. GOALS AND OBJECTIVES
• Manage and preserve born-digital AV materials
• Explore digital media repository solutions
• Generate metadata for digital AV materials
• Evaluate multi-institutional collaborations
6. A FEW QUESTIONS
How many of you have A/V materials in your collection?
How many of you are collecting born digital media?
How are you storing the files?
Can you easily access them?
What are your biggest concerns?
Who is collaborating with other institutions?
8. CHALLENGES OF MANAGING DIGITAL VIDEO
• Fragility, vulnerability of digital media
• No universally accepted standards or proof of concept
• Digital obsolescence
• Complexity of digital video and audio
• Complex intellectual property issues
• Huge file sizes make storage more expensive
• Storage limitations lead to decisions to compress
• Lack of training among archivists
[Diagram: parts of a digital video file: wrapper, synchronization information, chapter information, subtitles, multiple video streams, multiple audio streams, one or more codecs]
9. AAPB DIGITIZATION OF 40K HOURS
WGBH’s 7,010 tapes that were sent to Crawford Media Services
11. THE AAPB BORN DIGITAL DELIVERABLE
Addition of 5,000 hours of digitized and born digital media
Up to 59,000 files
Not to exceed 5.24 terabytes after transcoding occurred
12. WE HAD SOME CHALLENGES
Lack of staff resources at stations
Often no metadata for digital files
File names not consistent w/ metadata
System limitations
Bicycling hard drives
Access quality vs preservation quality
5.24 terabytes became 250+ terabytes
13. ACQUIRING DIGITAL MATERIALS
Create procedures for donors to submit their digital files
Provide donors with resources to inventory their collection
Get as much metadata as you can from the donor
Provide donors with instructions on file naming, drive naming, and organization
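One resource a donor could be given for inventorying a drive is a small script. The following is a minimal sketch in Python, not a tool named in the presentation; the function name `inventory_drive` and the three CSV columns are assumptions chosen for illustration.

```python
import csv
import os
from pathlib import Path


def inventory_drive(root, csv_path):
    """Write one CSV row per file under root; return the number of files listed.

    Hypothetical helper: the column names are illustrative, not a standard.
    """
    root = Path(root)
    csv_resolved = Path(csv_path).resolve()
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["relative_path", "filename", "size_bytes"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                full = Path(dirpath) / name
                # Skip the inventory file itself if it lives on the same drive.
                if full.resolve() == csv_resolved:
                    continue
                writer.writerow(
                    [full.relative_to(root).as_posix(), name, full.stat().st_size]
                )
                count += 1
    return count
```

The resulting spreadsheet gives the receiving archive a list of filenames to reconcile against whatever metadata the donor supplies.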
14. WGBH CONTRIBUTED FILES, TOO
Media currently stored on LTO-4 in an HSM system
The goal: send all video files to AAPB
10,648 files X approx. 100+ GB each = 201.6 TB
Copied files over network onto 70 3TB hard drives
Success!
15. THINK AGAIN
...we initially had a 57% failure rate.
We learned the hard way that everyday IT operations are not good enough.
In the end: 26.4% failure rate
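The slides do not say how the failed copies were detected or what caused them. One common safeguard when moving large files is to compute a checksum before and after the copy and compare the two. The sketch below assumes SHA-256 and uses hypothetical function names (`sha256_of`, `copy_with_fixity`); it is an illustration, not the workflow WGBH used.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source, destination):
    """Copy a file, then re-read the copy and compare checksums.

    Returns (matches, source_checksum) so the checksum can be stored
    alongside the file for later monitoring.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    actual = sha256_of(destination)
    return expected == actual, expected
```

Storing the returned checksum with the file's metadata lets the same comparison be repeated later, for example each time a drive is received or a tape is read back.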
16. NO, WE DON’T HAVE 7 COPIES
Consider the NDSA levels of preservation
1. Protect your data
2. Know your data
3. Monitor your data
4. Repair your data
Consider your resources
Do what you can
17. Levels: 1. Protect your data | 2. Know your data | 3. Monitor your data | 4. Repair your data
Categories: Storage & geographic location | File fixity & integrity | Information security | Metadata | File formats
Library of Congress. NDSA Levels of Preservation.
http://www.digitalpreservation.gov/ndsa/activities/levels.html.
18. RESOURCES
UK Data Service. Prepare and Manage Data.
http://ukdataservice.ac.uk/manage-data/
Digital Curation Centre. Checklist for a Data Management Plan.
http://www.dcc.ac.uk/resources/data-management-plans/checklist
Library of Congress. DPOE Training Modules.
http://www.digitalpreservation.gov/education/
WITNESS. Activists Guide to Archiving Video.
http://archiveguide.witness.org/
AMIA Education Committee Blog & forthcoming webinar series
https://amiaeducomm.wordpress.com/
20. WHAT MAKES VIDEO DIFFERENT?
Preservation files are large
Uncompressed
Slow to move around
Complicated formats
Not just one file type
Codecs, wrappers, frame speed, etc.
Need proxy files for viewing
Smaller size for quick transport over network
Need transcoding
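As an illustration of the proxy step, the sketch below builds a command line for FFmpeg, a widely used open-source transcoder. The presentation does not name a transcoding tool, so FFmpeg itself, the H.264/AAC settings, the 480-pixel height, and the `_proxy.mp4` naming are all assumptions. The function only assembles the command and does not run it.

```python
from pathlib import Path


def proxy_command(source, proxy_dir, height=480):
    """Build an ffmpeg command that makes a small H.264/AAC viewing copy.

    Hypothetical settings: adjust codec, quality and size to local policy.
    """
    source = Path(source)
    target = Path(proxy_dir) / (source.stem + "_proxy.mp4")
    return [
        "ffmpeg",
        "-i", str(source),               # preservation master (input)
        "-vf", f"scale=-2:{height}",     # shrink to the given height, keep aspect ratio
        "-c:v", "libx264",               # compressed video codec for the proxy
        "-crf", "23",                    # constant-quality setting
        "-c:a", "aac",                   # compressed audio codec
        "-b:a", "128k",                  # audio bitrate
        str(target),                     # proxy file (output)
    ]
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where FFmpeg is installed; the preservation master is left untouched.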
27. WHY HYDRA?
Aim to work towards a sustainable, open source reusable framework for multipurpose, multifunction, multi-institutional repository-enabled solutions
Challenges
Do more with less
Do it fast enough
Do it well
Get back on your feet quick
The Hydra Way - Working in Community
Shared Purpose
Continual Engagement & Assessment
Tangible Results
“If you want to go fast, go alone, if you want to go far, go together” --African Proverb
36. One Body, Many Heads…
ETDs (Theses)
Books, Articles
Images
Audio-Visual
Research Data
Maps & GIS
Documents
Scalable, Robust, Shared Management and Preservation Services
37. METADATA FOR AV MATERIALS
• Time consuming to give same level of detail that happens with other types of content
• Need rational balance
38. QUESTIONS
How many of you have an inventory of your AV assets?
For analog and digital?
Do you have full catalog records?
What metadata schema are you using to capture descriptive, intellectual property, technical & preservation metadata?
39. PBCORE | PBCORE.ORG
A standard way for anyone managing video or audio to speak the same language
Best practices for capturing critical descriptive, intellectual property, and technical metadata about video and audio
Under further development by the AAPB and PBCore Advisory Group
40. WHO USES IT?
Northeast Historic Film
Pop Up Archive
University of Illinois Center for Innovation in Teaching and Learning
Smithsonian Channel
International Criminal Tribunals, The Hague
Alliance for Community Media
University of South Carolina, Moving Image Research Collections
Bay Area Video Coalition
Columbia University Libraries
California Audiovisual Preservation Project
Rock and Roll Hall of Fame
Community Media Distribution Network
MyMassTV Network
Documentary Educational Resources
Washington University Film and Television Archive
American Archive of Public Broadcasting
Dance Heritage Coalition
University of Notre Dame
Greene County Public Library
WITNESS
Glenstone Art Museum
41. WHO USES IT?
WGBH
Illinois Public Media
Wisconsin Public Television
Wisconsin Public Radio
WYSO
WNYC-FM
WNET
Louisiana Public Broadcasting
Pacifica Radio Archives
KQED
SCETV
CUNY-TV
KUHF
Howard University Television
Database companies/orgs have PBCore profiles including:
Drupal
CollectiveAccess
Omeka
Islandora
And many video and audio digitization vendors...
42. FIRST THINGS FIRST: HOW TO STORE DATA
Local databases (FileMaker, Access, etc.)
DAM systems
Ready-made solutions:
Drupal
CONTENTdm
CollectiveAccess
Omeka
Spreadsheets
43. BEFORE WE GO ANY FURTHER
Asset / Intellectual Work
Instantiations / Instances
44. STRUCTURE OF PBCORE
4 content classes
Intellectual Content
Intellectual Property
Technical
Extensions
82 total elements
30 attributes
Suggested controlled vocabularies
45. FINDING & CREATING THE METADATA
Minimal fields you need to capture
Identifier
asset level & instantiation level
Source of the identifier
Title
Formal or devised
Type of title
Description
Location
Room, shelf, box, file path, hard drive ID, etc.
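The minimal fields above can be sketched as a small PBCore XML record. The following Python example uses PBCore 2.x element names (`pbcoreIdentifier`, `pbcoreTitle`, `pbcoreDescription`, `pbcoreInstantiation`, `instantiationIdentifier`, `instantiationLocation`); the function name and every sample value are hypothetical, and the output has not been validated against the PBCore schema here.

```python
import xml.etree.ElementTree as ET

PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


def _tag(name):
    """Qualify an element name with the PBCore namespace."""
    return f"{{{PBCORE_NS}}}{name}"


def build_minimal_record(asset_id, id_source, title, title_type,
                         description, instantiation_id, location):
    """Return a PBCore description document holding only the minimal fields."""
    ET.register_namespace("", PBCORE_NS)
    doc = ET.Element(_tag("pbcoreDescriptionDocument"))

    # Asset-level identifier, with the source of the identifier.
    ident = ET.SubElement(doc, _tag("pbcoreIdentifier"), {"source": id_source})
    ident.text = asset_id

    # Title and type of title.
    title_el = ET.SubElement(doc, _tag("pbcoreTitle"), {"titleType": title_type})
    title_el.text = title

    ET.SubElement(doc, _tag("pbcoreDescription")).text = description

    # Instantiation-level identifier and location (shelf, file path, drive ID...).
    inst = ET.SubElement(doc, _tag("pbcoreInstantiation"))
    inst_id = ET.SubElement(inst, _tag("instantiationIdentifier"),
                            {"source": id_source})
    inst_id.text = instantiation_id
    ET.SubElement(inst, _tag("instantiationLocation")).text = location

    return ET.tostring(doc, encoding="unicode")
```

A call such as `build_minimal_record("asset-0001", "Example Archive", "Sample Program", "Program", "Placeholder description.", "asset-0001.mov", "Drive 12")` yields one asset with one instantiation, matching the asset / instantiation split described earlier.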
49. DIGITAL MEDIA IDENTIFIER & FORMAT
Filename = Instantiation ID
From extension you can get
Digital Format
http://en.wikipedia.org/wiki/Internet_media_type
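A minimal sketch of this step in Python, using the standard library's `mimetypes` module to look up an Internet media type from the file extension. The function name and dictionary keys are hypothetical, and using the full filename (extension included) as the instantiation ID is an assumption. An extension only suggests the wrapper; it says nothing about the codecs inside.

```python
import mimetypes
from pathlib import Path


def describe_digital_file(path):
    """Derive an instantiation ID and a digital format from a file path.

    The format is a guess based on the extension alone; it is None when
    the extension is not recognized.
    """
    name = Path(path).name
    media_type, _encoding = mimetypes.guess_type(name)
    return {"instantiation_id": name, "digital_format": media_type}
```

Extensions missing from the local mapping return `None` for the format, which is a useful flag for files that need manual inspection.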
54. PBCORE IS FLEXIBLE & EXTENSIBLE
As an XML schema, PBCore can be implemented along with other standards
Within a METS wrapper
With PREMIS as a sidecar file or as a <pbcoreExtension>
To provide more granular item-level description along with collection-level description in EAD
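To illustrate the METS option, the sketch below embeds an existing PBCore document inside a METS descriptive metadata section using `dmdSec`, `mdWrap` and `xmlData`. The function name, the section ID, and the `OTHERMDTYPE` label "PBCORE" are assumptions made for this example. The result is a fragment: a complete METS document also needs other parts, such as a structural map.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PBCORE_NS = "http://www.pbcore.org/PBCore/PBCoreNamespace.html"


def wrap_in_mets(pbcore_xml, section_id="dmd-pbcore-1"):
    """Place a PBCore XML string inside a METS dmdSec/mdWrap/xmlData wrapper."""
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("pbcore", PBCORE_NS)

    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    # PBCore is declared through OTHER plus a free-text type label (assumed here).
    wrap = ET.SubElement(
        dmd, f"{{{METS_NS}}}mdWrap",
        {"MDTYPE": "OTHER", "OTHERMDTYPE": "PBCORE"},
    )
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(ET.fromstring(pbcore_xml))
    return ET.tostring(mets, encoding="unicode")
```

Keeping the PBCore record intact inside `xmlData` means it can be extracted later and used on its own.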
55. <PBCORETITLE>EXAMPLES</PBCORETITLE>
http://www.pbcore.org/documentation/
• Simple instantiation record
• Simple description document
• PBCore Collection
• PBCore in a METS record
• PBCore in a digital preservation setting
• Using PBCore for asset management
• Using PBCore for archival description
58. COLLABORATIONS
Content projects:
Vietnam, Boston Local News, China?
Content inventory project
Hydra community – open source project
AAPB – participating organizations
Digital Commonwealth
61. HOW TO GET WHAT YOU NEED
Planning time
Creating policy – but be flexible
Deliverables for collaboration
Org chart for decision making – who has the final word
Who is deeply involved, who is peripheral
Example: Inventory project:
Data gathering
Tools – PBCore validator
Forms – minimum fields
Hand holding – call us
63. FACE TO FACE
Important to build rapport with partners
Relationships: I love you, I need you, but I want you to change
Learn about hierarchy at partner institution so you can understand challenges and potential obstacles.
Manage expectations
65. SPEAK UP
Don’t be afraid to make sure your needs or your institution’s needs are being met
In a large collaboration, you are most likely not the only one to have those thoughts