METS-Bagger Tool
Normalizing existing digitized content into standardized
packages for robust long-term management.

Marcus Emmanuel Barnes
#c4lbc
2013-11-28
Background
● SFU Library holds about 15 TB of content
○ The Library has created high-quality master versions
of the content it has digitized, using ‘preservation-friendly’ formats.
○ Descriptive metadata exists for almost all of it.

However, this content was not previously
managed with generally accepted digital
preservation practice.
Solution
● SFU Library Digitized Content Packaging
Specification
● METS-Bagger tool for normalizing existing
digitized content based on this specification
for robust long-term management.
METS-Bagger Tool
● Two components:
○ Collection normalization script
○ Integrity scripts based on a collection
manifest
Collection Normalization
● Processes existing collections of files into a format
compliant with the SFU Library Digitized Content
Packaging Specification
● Packaging Formats:
○ METS (http://www.loc.gov/standards/mets/)
○ BagIt (http://tools.ietf.org/html/draft-kunze-bagit)
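As an illustration of the BagIt packaging format referenced above, here is a minimal sketch that uses only the Python standard library. A bag is a directory containing a `bagit.txt` declaration, a `data/` payload directory and a payload manifest with one checksum line per payload file. The helper names `make_minimal_bag` and `serialize_bag` are hypothetical, MD5 is an assumed checksum choice, and optional tag files such as `bag-info.txt` are not written. The Library of Congress's `bagit` Python package is a fuller implementation of the format.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_minimal_bag(source_files: list[Path], bag_dir: Path) -> Path:
    """Copy files into bag_dir/data and write bagit.txt plus manifest-md5.txt.

    Hypothetical helper showing a bare-bones BagIt layout, not METS-Bagger code.
    Files are placed flat in data/ by name, so names must be unique.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for src in source_files:
        dest = data_dir / src.name
        shutil.copy2(src, dest)
        # Manifest paths are relative to the bag root and use forward slashes.
        rel = dest.relative_to(bag_dir).as_posix()
        manifest_lines.append(f"{md5_of(dest)}  {rel}")

    # The bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def serialize_bag(bag_dir: Path) -> Path:
    """Write <bag_dir>.zip whose single top-level directory is the bag."""
    archive = shutil.make_archive(
        str(bag_dir), "zip", root_dir=bag_dir.parent, base_dir=bag_dir.name
    )
    return Path(archive)
```

Keeping the bag as the only top-level directory inside the archive means that unpacking the ZIP yields exactly one bag directory.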
How Collection Normalization Works
1. A configuration file holds the settings.
2. The script walks the directory tree of a collection and
compiles a list of files to be preserved.
3. Files are collated into items (e.g., a newspaper issue),
and a METS file is generated for each item.
4. Each item's files and its associated METS file are bagged
(and serialized).
5. Future: a collection manifest is created for integrity
checking (automatic or manual).
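Steps 2 and 3 can be sketched as follows. This is a hypothetical illustration, not the tool's code: it assumes that each directory directly containing files of interest corresponds to one item (for example, one newspaper issue) and that files are selected by extension. The function name `collate_items` is illustrative.

```python
import os
from collections import defaultdict
from pathlib import Path


def collate_items(collection_root: str, extensions: set[str]) -> dict[Path, list[Path]]:
    """Walk a collection and group files of interest by their parent directory.

    Hypothetical rule: each directory that directly contains matching files
    is treated as one item. Real collections may need a collection-specific rule.
    """
    wanted = {ext.lower() for ext in extensions}
    items: dict[Path, list[Path]] = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(collection_root):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            # Extension matching is case-insensitive (".TIF" matches ".tif").
            if path.suffix.lower() in wanted:
                items[Path(dirpath)].append(path)
    return dict(items)
```

Each resulting item (a directory and its sorted file list) would then be the unit that receives a METS file and is bagged in step 4.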
Before and After Processing
Design Principles
● A minimalist implementation: uses as few METS and
BagIt options as possible.
● Incorporates three widely implemented and understood
standards: METS, BagIt and UUID (Universally Unique
Identifiers).
● Technical metadata in the METS file should include, at
a minimum, bit-level checksums, file type identification,
the creating application and, where possible, format validity.
● Whenever possible, include descriptive metadata for the
item in the METS file.
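The principles above (METS, UUIDs, bit-level checksums) can be illustrated with a minimal sketch built on Python's standard `xml.etree.ElementTree`. It is not the tool's actual METS profile: the `USE` value, the `FILE0001`-style IDs, the `urn:uuid:` form of `OBJID`, the assumption that the METS file sits next to the files it lists, and the helper names are all hypothetical. Technical metadata from FITS and descriptive metadata are left out.

```python
import hashlib
import uuid
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns: str, tag: str) -> str:
    """Build an ElementTree qualified name of the form '{namespace}tag'."""
    return f"{{{ns}}}{tag}"


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_item_mets(files: list[Path], label: str) -> ET.ElementTree:
    """Return a minimal METS document describing one item (sketch only).

    A fresh UUID becomes the OBJID, each file gets an MD5 checksum in the
    fileSec, and a flat structMap points at every file.
    """
    root = ET.Element(
        q(METS_NS, "mets"),
        {"OBJID": f"urn:uuid:{uuid.uuid4()}", "LABEL": label},
    )
    # The METS schema expects fileSec before structMap.
    file_sec = ET.SubElement(root, q(METS_NS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    struct_map = ET.SubElement(root, q(METS_NS, "structMap"))
    item_div = ET.SubElement(struct_map, q(METS_NS, "div"), {"LABEL": label})

    for index, path in enumerate(files, start=1):
        file_id = f"FILE{index:04d}"
        file_el = ET.SubElement(
            file_grp,
            q(METS_NS, "file"),
            {"ID": file_id, "CHECKSUM": md5_of(path), "CHECKSUMTYPE": "MD5"},
        )
        # Assumes the METS file is stored in the same directory as the files.
        ET.SubElement(
            file_el,
            q(METS_NS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK_NS, "href"): path.name},
        )
        # Each file is also referenced from the item's structural map.
        ET.SubElement(item_div, q(METS_NS, "fptr"), {"FILEID": file_id})
    return ET.ElementTree(root)
```

The result can be saved with `tree.write("mets.xml", encoding="utf-8", xml_declaration=True)` before the item is bagged.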
Script Details
● Components: configuration file, main script, log file and
processed-collection output directory
● Written in Python so the tool can run on multiple platforms
● Plugins for technical metadata (FITS) and descriptive
metadata.
● Configuration options include:
○ test run (limited run size)
○ skipping technical metadata creation
○ file types of interest
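A configuration file carrying the options listed above might be read with Python's `configparser`. This is a hypothetical sketch: the `[bagger]` section, the key names, the comma-separated extension list and the `BaggerSettings` / `load_settings` / `select_items` names are assumptions, not the tool's documented format.

```python
import configparser
from dataclasses import dataclass

# Hypothetical example of a settings file for the options listed above.
EXAMPLE_CONFIG = """
[bagger]
test_run_limit = 5
skip_technical_metadata = yes
file_types = .tif, .jp2, .pdf
"""


@dataclass
class BaggerSettings:
    """Hypothetical settings object mirroring the listed options."""
    test_run_limit: int  # 0 means process the whole collection
    skip_technical_metadata: bool
    file_types: set[str]


def load_settings(text: str) -> BaggerSettings:
    """Parse INI-style text; section and key names are assumptions."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["bagger"]
    raw_types = section.get("file_types", fallback="")
    return BaggerSettings(
        test_run_limit=section.getint("test_run_limit", fallback=0),
        skip_technical_metadata=section.getboolean(
            "skip_technical_metadata", fallback=False
        ),
        file_types={ext.strip().lower() for ext in raw_types.split(",") if ext.strip()},
    )


def select_items(item_keys: list, settings: BaggerSettings) -> list:
    """Apply the test-run limit: keep only the first N items when N > 0."""
    if settings.test_run_limit > 0:
        return item_keys[: settings.test_run_limit]
    return item_keys
```

With `EXAMPLE_CONFIG`, a test run would process only the first five items, skip technical metadata generation (for example, the FITS step) and consider only TIFF, JPEG 2000 and PDF files.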
Future
● Addition of manifest and integrity checking
tools that check a collection against its
manifest
● Additional plugins
● Sharing code on GitHub
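The planned integrity check, which compares a collection against its manifest, could look roughly like the following sketch. The manifest format is hypothetical: one `checksum  relative/path` line per file, in the style of a BagIt payload manifest, with MD5 as an assumed algorithm. `verify_against_manifest` is an illustrative name, not part of the tool.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(root: Path, manifest_text: str) -> dict[str, list[str]]:
    """Check files under root against 'checksum  relative/path' lines.

    Returns the manifest paths that are missing on disk and those whose
    current checksum no longer matches the recorded one.
    """
    report: dict[str, list[str]] = {"missing": [], "changed": []}
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = root / rel_path
        if not target.is_file():
            report["missing"].append(rel_path)
        elif md5_of(target) != expected.lower():
            report["changed"].append(rel_path)
    return report
```

An empty `missing` and `changed` list means every file recorded in the manifest is present and bit-identical; such a check could run automatically on a schedule or be triggered manually.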
Thank You
This work was made possible by the support of:
● Simon Fraser University Library
● SFU Library Systems group
● Mark Jordan @mjordan

Similar to SFU Library's METS-Bagger Tool

BatIg
Presentation 16 may keynote karin bredenberg
NCompass Live: Best Practices for Digital Collections
2020 07-30 elastic agent + ingest management
APS-Presentation-MK.pptx
Biothings presentation
Archivematica and Local Authority Archive Services
PERICLES Process Compiler - ‘Eye of the Storm: Preserving Digital Content in ...
Introduction to digital curation
People aggregator
What is Digital Asset Management?
Asp .net folders and web.config
Webinar: What's New in Pipeline Pilot 8.5 Collection Update 1?
Page 18Goal Implement a complete search engine. Milestones.docx
File management in OS
Islandora & Archivematica combined NDSA RAG poster for LITA
The ECM world from the point of view of Alfresco - Linux Day 2013 - Rome
Lecture 8 comp forensics 03 10-18 file system
Personal Digital Archiving 2015 - NYU - Workshop

