Jpylyzer is a tool for validation and feature extraction for the JP2 (JPEG 2000 Part 1) still image format. The tool is being developed in the SCAPE Project and was presented by Johan van der Knijff at Archiving 2012 in Copenhagen.
Jpylyzer, a validation and feature extraction tool developed in the SCAPE project
1. SCAPE
Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library
IS&T, Archiving 2012, Copenhagen, 15.6.2012
2. SCAPE
Metamorfoze
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals
146 TB: migrate from TIFF to JP2 by end 2012
7. SCAPE
Possible solutions
Option 1
Improve JPEG 2000 module JHOVE
But no institutional support, superseded by JHOVE2 (?)
Option 2
Develop JPEG 2000 module for JHOVE2
Not ready for operational use (yet)
Option 3
Develop dedicated tool
16. SCAPE
Example 1: detection of broken JP2s in JISC 1 Newspapers
Number of images:     2,152,116
Total size:           45 TB
Average image size:   21.8 MB
Number of threads:    1
Time:                 21 days*
Images/day/thread:    100,000
TB/day/thread:        2
*Includes unzipping; actual time needed by jpylyzer is much less!
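The per-thread throughput figures in this table follow from the image count, total size and elapsed time. As a quick check of the arithmetic (variable names are illustrative only), the rounded values of 100,000 images and 2 TB per day per thread can be recomputed like this:

```python
# Figures taken from the table on this slide.
n_images = 2_152_116
total_size_tb = 45
elapsed_days = 21   # includes unzipping, per the footnote
n_threads = 1

# Throughput per day and per thread.
images_per_day_per_thread = n_images / elapsed_days / n_threads
tb_per_day_per_thread = total_size_tb / elapsed_days / n_threads

print(f"{images_per_day_per_thread:,.0f} images/day/thread")
print(f"{tb_per_day_per_thread:.2f} TB/day/thread")
```

Because the 21 days include unzipping, these numbers are a lower bound on what jpylyzer itself can process.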
17. SCAPE
Results
- 676 broken JP2s in JISC 1 collection (0.03 %)
TIFF originals still available
- JISC 2 (> 1 million images): 3 broken JP2s
- 19th Century books (> 22 million images): no broken JP2s
19. SCAPE
[Workflow diagram: TIFF and JP2 are pixel-compared with the Aware JP2K SDK (pixels identical? no: fail). The JP2 is checked with jpylyzer* (valid JP2? no: fail). The extracted image properties are compared with a properties profile (match? no: fail; yes: pass).]
*Imported as module in Python-based workflow
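The decision flow on this slide (pixel comparison, JP2 validation, then a comparison of properties against a profile) can be sketched as a short function. This is a minimal sketch, not the actual workflow code: the function and parameter names are hypothetical, and the pixel comparison (Aware JP2K SDK) and jpylyzer calls are passed in as plain callables so that no particular API is assumed.

```python
from typing import Callable, Mapping, Tuple


def check_migrated_image(
    pixels_identical: Callable[[], bool],
    is_valid_jp2: Callable[[], bool],
    extract_properties: Callable[[], Mapping[str, str]],
    profile: Mapping[str, str],
) -> Tuple[bool, str]:
    """Return (passed, reason) for one TIFF/JP2 pair.

    All callables are hypothetical stand-ins for the real tools.
    """
    # Step 1: the JP2 must decode to the same pixels as the TIFF.
    if not pixels_identical():
        return False, "pixels differ"
    # Step 2: the JP2 must be valid according to the validator.
    if not is_valid_jp2():
        return False, "not a valid JP2"
    # Step 3: extracted properties must match the expected profile.
    properties = extract_properties()
    mismatches = [k for k, v in profile.items() if properties.get(k) != v]
    if mismatches:
        return False, "profile mismatch: " + ", ".join(sorted(mismatches))
    return True, "pass"


# Usage with stub checks (values are made up for illustration).
result = check_migrated_image(
    pixels_identical=lambda: True,
    is_valid_jp2=lambda: True,
    extract_properties=lambda: {"order": "RPCL", "layers": "8"},
    profile={"order": "RPCL", "layers": "8"},
)
print(result)
```

Passing the checks in as callables means the later, more expensive steps run only when the earlier ones succeed.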
20. SCAPE
Example 3: pre-ingest quality control Wellcome Library
- JP2s produced in-house and by external suppliers
- Use jpylyzer to validate against JP2 spec
- Use extracted properties to validate against a profile
(Progression order, ratio, layers, …)
- Profile coded as XML schema
(So jpylyzer output can be validated against schema)
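To illustrate the idea of checking extracted properties against a profile, here is a simplified sketch. It deliberately swaps the XML schema described on the slide for a plain dictionary of expected values, because Python's standard library has no XML Schema validator. The sample XML and its element names are assumptions made for illustration and may not match real jpylyzer output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for jpylyzer output.
SAMPLE_OUTPUT = """
<jpylyzer>
  <isValidJP2>True</isValidJP2>
  <properties>
    <order>RPCL</order>
    <layers>8</layers>
  </properties>
</jpylyzer>
"""

# Hypothetical profile: element name -> required text value.
PROFILE = {"order": "RPCL", "layers": "8"}


def profile_violations(xml_text, profile):
    """Return a list of (tag, expected, actual) for every mismatch."""
    root = ET.fromstring(xml_text.strip())
    violations = []
    for tag, expected in profile.items():
        # Look for the first element with this tag anywhere in the tree.
        element = root.find(f".//{tag}")
        actual = element.text if element is not None else None
        if actual != expected:
            violations.append((tag, expected, actual))
    return violations


print(profile_violations(SAMPLE_OUTPUT, PROFILE))  # → []
```

An empty list means the file meets the profile. A schema-based check, as on the slide, would express the same constraints declaratively and could be run with any schema-aware XML validator.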
24. SCAPE
Acknowledgements
Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
- Rainer Schmidt (AIT)
Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
25. SCAPE
Funding
This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
http://www.scape-project.eu
#SCAPEProject