Workshop on Preservation and Access for Audio and Video
1. Workshop: Preservation and
Access for Audio and Video
Richard Wright
BBC R&D – PrestoCentre
Goportis Digital Preservation Summit
Hamburg – 19 Oct 2011
12. The BBC Archive, 1995
• About 1 million hours BBC radio and television
• 1.5 million items of film and videotape
• 750,000 radio recordings
• 3 million photographs
• 1.2 million commercial recordings (“grams”)
• 4 million items of sheet music
• 22 million newspaper cuttings
• 550,000 document files; 20,000 rolls of microfilm
• 500,000 phonetic pronunciations
13. TV Archive Holdings, 1999
Film 30%
D3 6%
Digibeta 1%
Betacam 11%
VHS 14%
U-Matic 4.5%
1” C Format 12%
2” Quad 1%
Ekta, Reversal 12%
14. TV Formats and Holdings
• 2” Quad, 60s-80s: Dr Who, Dad’s Army, Steptoe & Son,
Forsyte Saga, Fawlty Towers, Secret Army
• 1” C, 70s-90s: Yes Minister, EastEnders, Angels, Wogan,
All Our Working Lives
• Film: 50 years, 40s-90s: Man Alive, 1984, Ascent of
Man, British Empire, Omnibus, Pride and Prejudice 95
• Ektachrome: 67-82 news: Vietnam War, Yom Kippur war,
all major domestic stories
• U-Matic: 82-90’s: Lockerbie, General Elections in 1983 &
1987, the Gulf War
15. Radio Archive, 1999
• Radio holdings: 750,000 recordings, 300,000 hours
• Fewer technical problems with audio recordings
1/4” tapes in regions 37%
1/4” tapes in London 32%
CD Sound effects 0.02%
LP Sound effects 0.4%
DAT NCA sequence programmes, bulletins 1%
1/4” News bulletins 3%
Cassette NCA sequence programmes 4%
CD Programme extract compilation 0.1%
1/4” complete programmes 15%
1/4” film unit tapes 1%
DAT, London & Region 2%
LP & 78RPM Programme Extract, 5% of sound holdings
16. Radio Formats and Holdings
• Radio One sessions: 40k recordings, all BBC
copyright; mainly ¼” tape
– Rolling Stones, Beatles, Who, Jimi Hendrix, Led
Zeppelin, The Fall ... and so many more
• News off-air on DAT 1990-2001, then CD to 2006
• Files from tapeless production onto CD (6 yrs)
• Shellac and vinyl pressings from 20s-60s
– Special problems with lacquer, acetate, aluminium
master discs: 16”, fragile, deteriorate
18. Decaying Obsolete Fragile
• Obsolescence: at least 2/3 of the material
• Deterioration: approximately 1/3 of the
material
• Fragile media: roughly 1/4 of the material
Overall: 70% of holdings have problems (2001)
The Solution: digitisation
19. Obsolescence
• Videotape
– 2”; 1”; U-Matic: no playback equipment
• Film
– Disappearing in post production
• Audio formats
– Grams: no playback equipment (in BBC!!)
– ¼” no longer accepted in BBC radio production and playout systems
20. Deterioration
• Videotape – decay of adhesive
– 2”; 1”; U-Matic (30% read failures at BBC)
• Audio – decay of adhesive: ¼” tape
– Lacquer discs
• Magnetic sound tracks: vinegar syndrome
• Other Acetate
• Decay of film splices
• General decay of polymer materials
– Even the sleeves on vinyl LPs
21. Fragile Media
• Vinyl (45s, LPs)
– and shellac (78s)
• Film
– 10 plays per print (videotape: 50)
• Video or audiotape
– physical damage
– magnetic fields
22. Size of the Problem
– in Europe
• Presto: found 5 million hours 2001
– Mainly broadcast archives
• Prestospace: found 10 million hours 2004
– Broadcast and large national collections
• TAPE: found additional 20 million hours
– In collections not covered previously
• UNESCO estimate: 200 million hours
worldwide (100 million in Europe)
23. Where is the material?
• Broadcast archives 30% (roughly)
• National collections 15%
• Other major collections 15%
• Small and specialist collections 40%
NB: all these figures refer to archived
material ONLY (TAPE survey)
24. What to do about it?
Presto preservation factory
• Efficient workflow for digitisation
–Staff specialisation
–Cartography and Triage
Adam Smith: “the division of labour in pin
manufacturing – and the great increase
in the quantity of work that results” (UK
£20 note)
wiki.prestospace.org preservation guide
25. Problems with the solution:
digitisation not fully accepted
“You’re not preserving anything; you’re only
making more proxies and adding to the
problem”
• Not accepted as a solution for film
• Not easy to implement for video (in full
quality); problems: encodings, compression,
formats, file size, bandwidth ...
• But – very much accepted for audio: BWF
26. Problems with the solution:
needs Digital Preservation
The approach in 2006:
• Media
• Multiple copies
• Maintenance
• Migration
27. Media
• Datatape: cheaper than hard drives
– Needs an expensive tape drive
– And has reliability issues
• Optical is cheapest of all
– But isn’t really mass storage (DVD=4.7 GB)
• New DVD format(s) promise 20 to 100 GB
– And have reliability issues
• Hard drive prices have dropped sharply
– Easiest to automate management
– And have reliability issues
28. Multiple copies
• Two copies
– Two technologies
• In two places
• But fastest recovery is by mirroring
– Which means identical technologies
• Big arguments about RAID vs simpler options
vs more complex options
29. Maintenance
• Life cycle management
• Should be every archive’s
built-in process
• Begins with blank media
– Then the writing
– Then the initial checking
– Then the periodic checking: aerobics, scrubbing
• Ends with migration to the next format
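The periodic checking step is commonly done as a fixity check: record a checksum when a file is first written, then recompute and compare it on every scrub. A minimal sketch in Python, assuming SHA-256 and a manifest held in a plain dict. The function names (checksum, build_manifest, scrub) are illustrative and do not come from the slides or from any PrestoPRIME tool.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Initial checking: record a checksum for each newly written file."""
    return {str(p): checksum(Path(p)) for p in paths}


def scrub(manifest):
    """Periodic checking: list files that are missing or whose content changed."""
    failed = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.is_file() or checksum(p) != expected:
            failed.append(name)
    return failed
```

Any file returned by scrub would then be restored from the second copy; how often to run the scrub is a cost/risk decision rather than something the code can settle.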
30. Migration
• A fact of life
• Every five years
• Can involve a lot of manual handling (of
datatapes or optical media)
• Or can be nearly transparent (disc upgrades) –
but: every three years!
• Best practice: uncompressed file formats
31. Digital Preservation 2009
formal management model
[Flowchart; box labels as shown on the slide]
• START HERE
• Is the format a problem? (YES / NO)
• Archive for a few years
• What cost/quality/risk option can you afford?
• Compress lossy
• Compress lossless
• Uncompress
• END HERE
Steps are numbered (1), (2), (3), (4), (5a), (5b), (5c) on the slide.
32. ... with emulation
[Flowchart; box labels as shown on the slide]
• START HERE
• Is the format at risk? (YES / NO)
• Archive for a few years
• What cost/quality/risk can you afford?
• Compress lossy
• Compress lossless
• Uncompress
• Multivalent
• END HERE
33. Stepping back: the real
problem with storage (1)
Medium   Bits/cm²   Life (years)
Stone    10         10 000
Paper    10^4       1000
Film     10^7       100
Disc     10^10      10
Each change is 1000 times cheaper,
but lasts 1/10th as long
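The two ratios can be checked directly from the table. A small sketch in Python, assuming the lifetimes are in years (the slide gives no unit) and treating storage density as the proxy for "cheaper", as the slide does. The name step_ratios is illustrative.

```python
# Figures from the slide: (medium, bits per cm², lifetime)
media = [
    ("Stone", 10**1, 10_000),
    ("Paper", 10**4, 1_000),
    ("Film", 10**7, 100),
    ("Disc", 10**10, 10),
]


def step_ratios(rows):
    """For each change of medium, return (density gain, lifetime factor)."""
    return [
        (new[1] // old[1], new[2] / old[2])
        for old, new in zip(rows, rows[1:])
    ]


for (old, new), (gain, life) in zip(zip(media, media[1:]), step_ratios(media)):
    print(f"{old[0]} -> {new[0]}: {gain}x denser, {life:.1f}x the lifetime")
```

Every step shows the same pair of factors, which is the trade-off the slide is pointing at: each gain in density is paid for with a shorter carrier life.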
34. The problem (2)
• Current storage media are unreliable
– Discs fail
– Data tapes fail
– Optical media fail (and are easily damaged)
– Companies fail
– exceptions? Glass discs; Holographic media;
Going back to film; Digital film;
35. The problem (3)
• Storage isn’t just about media
– Encoding and obsolescence
– File formats and obsolescence
– File management systems and obsolescence
– Physical interfaces and obsolescence
– Operating systems and obsolescence
– System complexity and associated risks
– Human errors
– Cost: continuous maintenance
36. What is the cost of continuous
maintenance?
• You need a model of storage operating and
replacement costs, into the future
• What storage? So you need a storage
strategy:
– Allocation of storage: primary, backup, cloud ...
– Operation of storage: cycles for copying, checking
– Some idea of relating costs to risks !!!
• NOT available from storage vendors
38. And now:
one PrestoPRIME tool
• A model for storage systems, to calculate
– Cost
– Risk
– Loss
– And compare what-if scenarios
• http://prestoprime.it-innovation.soton.ac.uk/
41. Storage Systems
HDD in servers
Migration required every 4 years. Running Costs
Access: €0.1 per GB
Storage: €1 per GB per year
Corruption Rates
Access: avg. 1 in 500 files
Latent: avg. 1 in 750 files per year
HDD on shelves
Migration required every 4 years. Running Costs
Access: €1 per GB
Storage: €0.25 per GB per year
Corruption Rates
Access: avg. 1 in 100 files
Latent: avg. 1 in 500 files per year
42. More Storage Systems
Data tapes in a robot
Migration required every 6 years. Running Costs
Access: €0.2 per GB
Storage: €0.4 per GB per year
Corruption Rates
Access: avg. 1 in 1×10^4 files
Latent: avg. 1 in 1×10^5 files per year
Data tapes on shelves
Migration required every 6 years. Running Costs
Access: €1 per GB
Storage: €0.1 per GB per year
Corruption Rates
Access: avg. 1 in 1×10^4 files
Latent: avg. 1 in 1×10^5 files per year
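The four storage systems can be compared side by side with a few lines of arithmetic. The sketch below, in Python, uses the running costs and corruption rates quoted on the two slides above; the class and function names are illustrative, and the formulas are a simple linear assumption, not the model used inside the PrestoPRIME tool. Migration costs are left out because the slides give only the interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageSystem:
    name: str
    migration_years: int            # migration required every N years
    access_eur_per_gb: float        # running cost per GB accessed
    storage_eur_per_gb_year: float  # running cost per GB stored per year
    access_corruption: float        # avg. fraction of files corrupted per access
    latent_corruption: float        # avg. fraction of files corrupted per year at rest


SYSTEMS = [
    StorageSystem("HDD in servers", 4, 0.1, 1.0, 1 / 500, 1 / 750),
    StorageSystem("HDD on shelves", 4, 1.0, 0.25, 1 / 100, 1 / 500),
    StorageSystem("Data tapes in a robot", 6, 0.2, 0.4, 1e-4, 1e-5),
    StorageSystem("Data tapes on shelves", 6, 1.0, 0.1, 1e-4, 1e-5),
]


def annual_cost_per_gb(system, accesses_per_year):
    """Storage charge plus access charges for one GB over one year (assumed linear)."""
    return system.storage_eur_per_gb_year + accesses_per_year * system.access_eur_per_gb


def expected_corruptions_per_year(system, n_files, accesses_per_year):
    """Expected number of corrupted files per year before any checking or repair."""
    rate = system.latent_corruption + accesses_per_year * system.access_corruption
    return n_files * rate


for s in SYSTEMS:
    cost = annual_cost_per_gb(s, accesses_per_year=0.25)
    bad = expected_corruptions_per_year(s, n_files=100_000, accesses_per_year=0.25)
    print(f"{s.name}: EUR {cost:.3f} per GB per year, about {bad:.1f} corrupted files per year")
```

The access frequency of 0.25 and the collection size of 100,000 files are example inputs; changing them shows how quickly shelf storage becomes expensive once files are actually used.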
44. Storage Configuration
Found 3 storage configurations. Add...
Disk with Tape
System 1: HDD in servers
Files accessed avg of 0.25 times per year, staying constant
Scrubbing every 1 year(s)
System 2: Data tapes in a robot
Files accessed avg of 0 times per year, staying constant
Scrubbing every 3 year(s)
46. File Collections
• Found 1 file collection. Add...
• read-only
• Default File Collection
• Length of cost/loss projection is 25 year(s).
Files
• 100 thousand initially, staying constant.
• Average File Size
• 25 GB.
48. Plans
Found 3 plans. Add...
Disk and Tape edit Delete Evaluate
File Collection: Default File Collection
25 year lifetime. 100 files, avg. 25 GB in size.
Storage Configuration: Disk with Tape
Uses HDD in servers and Data tapes in a robot
systems.
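A rough idea of what evaluating such a plan involves can be had from the running costs alone. The Python sketch below assumes costs scale linearly with volume and time, ignores the cost of the migrations themselves (the slides give only their interval), and takes 100,000 files as the collection size; the "Plans" slide shows 100 files instead, so N_FILES is a parameter to adjust. The function names are illustrative, and this is not the PrestoPRIME tool's own calculation.

```python
YEARS = 25            # length of the cost/loss projection
AVG_FILE_GB = 25
N_FILES = 100_000     # "File Collections" slide; the "Plans" slide shows 100 files

# Running costs in EUR and usage figures as quoted on the slides
DISK = {"storage": 1.0, "access": 0.1, "accesses_per_year": 0.25, "migrate_every": 4}
TAPE = {"storage": 0.4, "access": 0.2, "accesses_per_year": 0.0, "migrate_every": 6}


def running_cost(system, total_gb, years):
    """Storage plus access charges in euros over the whole period."""
    per_gb_year = system["storage"] + system["accesses_per_year"] * system["access"]
    return per_gb_year * total_gb * years


def migrations_due(system, years):
    """Migrations falling due within the period (one due at the very end counts)."""
    return years // system["migrate_every"]


total_gb = N_FILES * AVG_FILE_GB
for name, system in (("HDD in servers", DISK), ("Data tapes in a robot", TAPE)):
    cost = running_cost(system, total_gb, YEARS)
    print(f"{name}: EUR {cost:,.0f}, {migrations_due(system, YEARS)} migrations")
```

With these inputs the two systems together come to roughly €89 million in running costs over 25 years, with six disk migrations and four tape migrations on top. With 100 files the total is a thousand times smaller, which is why the collection size needs to be settled first.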
52. 1. Digitisation – Key Ideas
• Cartography and Triage
– Make a map of your holdings
– Decide on priorities
• Make a preservation plan
– Digitisation: in-house or a service provider?
• Better – Faster – Cheaper
– Division of labour: Adam Smith, industrial process
– Lower prices by contractors for archive work
Joanneum: Quality Analysis Tool
53. 2- Working with digital
content (lots of files)
It’s all about management
– DAM/MAM and Trusted Repositories – what do
they do, what don’t they do -- White Papers
– Storage – ITI online free tools
– Metadata – Joanneum mapping and validation ;
“tag gardening” Univ Amsterdam;
fingerprinting INA
– Digital library technology RAI, BBC MXF support
– Access – Joanneum Time-based navigation,
annotation
– Rights – RAI ontology, Eurix implementation
54. 3- Preserving the digital content
• Preservation Platform: P4=PrestoPRIME
Preservation Platform, Eurix; Rosetta, Ex Libris
• Standards: OAIS; formal control; formal
preservation actions eg migration; P4
• Emulation – Multivalent, Univ of Liverpool
• Formats, carriers, storage: Planning and
strategy: PrestoPRIME white papers
• Managing and maintaining storage into the
future – SLAs for outsourced service; white
papers, software for real-time SLA monitoring;
modelling and simulation tools
55. Access: Audiovisual Content
and Digital Libraries
• Digitisation makes audiovisual content
available to web access, including digital
libraries
• Broadcast archive projects: Birth of TV,
Video Active, EUscreen (link to Europeana)
• BUT – what are digital libraries doing to
provide access to audiovisual content?
59. Reference and Citation
• the core requirement for scholarly discourse
– along with a major change in attitude!
• Needs a permanent place for “things to be”
– Hence the need for stable audiovisual collections
“Hamlet, for example, is comparable to Saxo
Grammaticus' Gesta Danorum.[citation needed]
King Lear is based on King Leir in Historia
Regum Britanniae by Geoffrey of Monmouth,
retold in 1587 by Raphael Holinshed.[citation needed]”
wikipedia
60. Annotation
• the core requirement for
social web = interactivity
• individual interacts with
content
• individuals interact with
other individuals