SCAPE


Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool
Johan van der Knijff (1, 2), René van der Ark (1), Carl Wilson (3)
(1) Koninklijke Bibliotheek – National Library of the Netherlands
(2) Open Planets Foundation
(3) The British Library

IS&T, Archiving 2012, Copenhagen, 15.6.2012
SCAPE
Metamorfoze
National Programme for preservation of paper heritage
Digitisation as a means to conserve threatened paper originals

TIFF → JP2: 146 TB, migrate by end 2012
SCAPE
JP2 from JISC 1 Newspaper Collection (BL)
                              “Well-formed and valid”
SCAPE




Hardware failure may result in corrupted images
Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg
SCAPE




Not all encoders produce standard-compliant images
SCAPE
Possible solutions

Option 1: Improve the JPEG 2000 module of JHOVE
   But no institutional support; superseded by JHOVE2 (?)
Option 2: Develop a JPEG 2000 module for JHOVE2
   Not ready for operational use (yet)
Option 3: Develop a dedicated tool
SCAPE
Jpylyzer tool
- First prototype: December 2011
- Refactoring of original code: January 2012
- Packaging (Debian): March 2012
   Univ. Southampton, KEEP Solutions, AIT Vienna
- Add remaining functionality, bug fixes: April-May 2012 (current version: 1.5)
SCAPE
JP2 file

- JPEG 2000 Signature box
- File Type box
- JP2 Header box (superbox)
- Contiguous Codestream box 0
- ...
- Contiguous Codestream box n
- IPR box
- XML box(es)
- UUID box(es)
- UUID Info box(es) (superbox)
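
The box layout above can be walked with very little code. The following is a minimal sketch, not jpylyzer's own implementation: it assumes the general JP2 box header (a 4-byte big-endian length, a 4-byte type, an optional 8-byte extended length when the length field is 1, and "runs to end of file" when it is 0). The function name `iter_boxes` and the toy `SAMPLE` bytes are made up for illustration, and `SAMPLE` is not a decodable image.

```python
import struct
from io import BytesIO


def iter_boxes(stream, end=None):
    """Yield (box_type, payload_offset, payload_length) for each box up to `end`."""
    while end is None or stream.tell() < end:
        start = stream.tell()
        header = stream.read(8)
        if len(header) < 8:
            break  # end of data (or a truncated header)
        length, box_type = struct.unpack(">I4s", header)
        header_size = 8
        if length == 1:
            # Extended length: the real box size is in the next 8 bytes.
            extended = stream.read(8)
            if len(extended) < 8:
                raise ValueError("truncated extended length field")
            (length,) = struct.unpack(">Q", extended)
            header_size = 16
        elif length == 0:
            # A length of 0 means the box runs to the end of the file.
            stream.seek(0, 2)
            length = stream.tell() - start
        if length < header_size:
            raise ValueError("box is shorter than its own header")
        yield box_type.decode("latin-1"), start + header_size, length - header_size
        stream.seek(start + length)  # jump to the next sibling box


# Toy byte string with the same top-level structure as a JP2 file (hypothetical data).
ihdr = struct.pack(">I4s", 22, b"ihdr") + bytes(14)
SAMPLE = (
    struct.pack(">I4s", 12, b"jP  ") + b"\r\n\x87\n"               # Signature box
    + struct.pack(">I4s", 20, b"ftyp") + b"jp2 " + bytes(4) + b"jp2 "  # File Type box
    + struct.pack(">I4s", 30, b"jp2h") + ihdr                      # JP2 Header superbox
    + struct.pack(">I4s", 0, b"jp2c") + b"\xff\x4f\xff\xd9"        # Codestream box
)

stream = BytesIO(SAMPLE)
top_level = []
children_of_jp2h = []
for box_type, offset, size in iter_boxes(stream):
    top_level.append(box_type)
    if box_type == "jp2h":
        # A superbox is simply a box whose payload is itself a sequence of boxes.
        stream.seek(offset)
        children_of_jp2h = [t for t, _, _ in iter_boxes(stream, end=offset + size)]

print(top_level)
print(children_of_jp2h)
```

A validator goes further than listing boxes: it also checks that required boxes are present, appear in a permitted order, and carry legal values.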
SCAPE
Command-line use
SCAPE
Result
SCAPE
Properties extraction (excerpt)
SCAPE
Properties embedded ICC profile
SCAPE
Documentation
SCAPE
Example 1: detection of broken JP2s in JISC 1 Newspapers

    Number of images           2,152,116
    Total size                 45 TB
    Average image size         21.8 MB
    Number of threads          1
    Time                       21 days*
    Images/day/thread          100,000
    TB/day/thread              2

    *Includes unzipping; the actual time needed by jpylyzer is much less!
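
The two throughput figures follow from the totals in the table. A short check, using only the numbers on the slide (the variable names are illustrative):

```python
images = 2_152_116   # number of images
total_tb = 45        # total size in TB
days = 21            # elapsed time, including unzipping
threads = 1

images_per_day = images / days / threads
tb_per_day = total_tb / days / threads

print(f"{images_per_day:,.0f} images/day/thread")
print(f"{tb_per_day:.1f} TB/day/thread")
```

Both values round to the figures quoted on the slide (about 100,000 images and about 2 TB per day per thread).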
SCAPE
Results

- 676 broken JP2s in JISC 1 collection (0.03 %)
  TIFF originals still available
- JISC 2 (> 1 million images): 3 broken JP2s
- 19th Century books (> 22 million images): no broken JP2s
SCAPE
Example 2: quality control Metamorfoze migration

TIFF → JP2: 146 TB, migrate by end 2012
SCAPE
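Quality-control workflow for each migrated image:

1. TIFF is converted to JP2 (Aware JP2K SDK)
2. Pixel compare of TIFF and JP2: pixels identical? no → fail
3. Jpylyzer* on JP2: valid JP2? no → fail
4. Compare image properties (from jpylyzer) with properties profile: properties match? no → fail
5. yes on all checks → pass

*Imported as module in Python-based workflow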
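
The decision chain in this workflow can be sketched as an ordered list of checks that stops at the first failure. This is an illustration only, not the KB's actual workflow code: `quality_control` and the stub check functions are hypothetical, and in a real pipeline the stubs would be replaced by a pixel comparison, a jpylyzer validation and a properties comparison.

```python
from typing import Callable, List, Tuple

# A check has a name and a function taking (tiff_path, jp2_path) and returning True on success.
Check = Tuple[str, Callable[[str, str], bool]]


def quality_control(tiff_path: str, jp2_path: str, checks: List[Check]) -> Tuple[bool, str]:
    """Run the checks in order; return (passed, name_of_failed_check)."""
    for name, check in checks:
        if not check(tiff_path, jp2_path):
            return False, name  # first "no" in the flowchart means fail
    return True, ""


# Hypothetical stand-ins for the three real checks.
def pixels_identical(tiff_path: str, jp2_path: str) -> bool:
    return True


def valid_jp2(tiff_path: str, jp2_path: str) -> bool:
    return not jp2_path.endswith("broken.jp2")


def properties_match(tiff_path: str, jp2_path: str) -> bool:
    return True


CHECKS: List[Check] = [
    ("pixels identical", pixels_identical),
    ("valid JP2", valid_jp2),
    ("properties match", properties_match),
]

print(quality_control("page1.tif", "page1.jp2", CHECKS))
print(quality_control("page2.tif", "page2_broken.jp2", CHECKS))
```

Returning the name of the failed check makes it straightforward to log why an image was rejected.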
SCAPE
Example 3: pre-ingest quality control Wellcome Library

 - JP2s produced in-house and by external suppliers
 - Use jpylyzer to validate against JP2 spec
 - Use extracted properties to validate against a profile
    (progression order, compression ratio, layers, ...)
 - Profile coded as XML schema
    (so jpylyzer output can be validated against schema)
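
The idea of checking extracted properties against a profile can be illustrated without an XML Schema validator. The sketch below swaps the schema for a plain Python dictionary and uses only the standard library. The XML sample is a simplified, hypothetical stand-in for jpylyzer output: the element names (`isValidJP2`, `order`, `layers`, `transformation`) and the profile values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the XML a validator might emit.
SAMPLE_OUTPUT = """\
<jpylyzer>
  <isValidJP2>True</isValidJP2>
  <properties>
    <order>RPCL</order>
    <layers>6</layers>
    <transformation>9-7 irreversible</transformation>
  </properties>
</jpylyzer>
"""

# Hypothetical institutional profile: required value per property.
PROFILE = {"order": "RPCL", "layers": "6", "transformation": "9-7 irreversible"}


def check_profile(xml_text, profile):
    """Return a list of (name, expected, found) tuples, one per mismatch."""
    root = ET.fromstring(xml_text)
    mismatches = []
    validity = root.findtext("isValidJP2")
    if validity != "True":
        mismatches.append(("isValidJP2", "True", validity))
    for name, expected in profile.items():
        found = root.findtext(f".//{name}")  # first matching element anywhere below the root
        if found != expected:
            mismatches.append((name, expected, found))
    return mismatches


print(check_profile(SAMPLE_OUTPUT, PROFILE))
print(check_profile(SAMPLE_OUTPUT, {**PROFILE, "layers": "8"}))
```

An XML Schema, as used on the slide, expresses the same constraints declaratively, so any schema-aware validator can apply them.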
SCAPE
Platforms and licensing stuff
SCAPE
http://www.openplanetsfoundation.org/software/jpylyzer
SCAPE
Community involvement
SCAPE
              Acknowledgements

Debian packages
- Dave Tarrant (Uni Southampton/OPF)
- Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions)
- Rainer Schmidt (AIT)


Feedback on early versions
- Christy Henshaw (Wellcome Library)
- Ross Spencer (TNA)
- Wouter Kool (KB)
SCAPE
                    Funding


This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under
FP7 ICT-2009.4.1 (Grant Agreement number 270137).


      http://www.scape-project.eu



                        #SCAPEProject

Advanced Computer Architecture – An Introduction
 

Jpylyzer, a validation and feature extraction tool developed in SCAPE project

  • 1. SCAPE Improved validation and feature extraction for JPEG 2000 Part 1: the jpylyzer tool. Johan van der Knijff (1,2), René van der Ark (1), Carl Wilson (3). (1) Koninklijke Bibliotheek – National Library of the Netherlands; (2) Open Planets Foundation; (3) The British Library. IS&T, Archiving 2012, Copenhagen, 15.6.2012
  • 2. SCAPE Metamorfoze National Programme for preservation of paper heritage Digitisation as a means to conserve threatened paper originals 146 TB Migrate by end 2012 TIFF JP2
  • 3. SCAPE JP2 from JISC 1 Newspaper Collection (BL)
  • 4. SCAPE JP2 from JISC 1 Newspaper Collection (BL) “Well-formed and valid”
  • 5. SCAPE Source: http://img70.imageshack.us/img70/9950/serversnm2.jpg Hardware failure may result in corrupted images
  • 6. SCAPE Not all encoders produce standard compliant images
  • 7. SCAPE Possible solutions. Option 1: improve the JPEG 2000 module of JHOVE (but no institutional support, superseded by JHOVE2?). Option 2: develop a JPEG 2000 module for JHOVE2 (not ready for operational use yet). Option 3: develop a dedicated tool.
  • 8. SCAPE Jpylyzer tool 0 1 1 1 1 0 0 1 0 1 1 1 0 1 0 1 1
  • 9. SCAPE Jpylyzer tool. First prototype: December 2011. Refactoring of original code: Jan 2012. Packaging (Debian): Mar 2012 (Univ. Southampton, KEEP Solutions, AIT Vienna). Add remaining functionality, bugfixes: Apr-May 2012 (current version: 1.5).
  • 10. SCAPE JP2 file JPEG 2000 Signature box File Type box JP2 Header box (superbox) Contiguous Codestream box 0 Contiguous Codestream box n IPR box XML box(es) UUID box(es) UUID Info box(es) (superbox)
  • 16. SCAPE Example 1: detection of broken JP2s in JISC 1 Newspapers. Number of images: 2,152,116; total size: 45 TB; average image size: 21.8 MB; number of threads: 1; time: 21 days*; images/day/thread: 100,000; TB/day/thread: 2. *Includes unzipping; actual time needed by jpylyzer is much less!
  • 17. SCAPE Results. JISC 1 collection: 676 broken JP2s (0.03 %), TIFF originals still available. JISC 2 (> 1 million images): 3 broken JP2s. 19th Century books (> 22 million images): no broken JP2s.
  • 18. SCAPE Example 2: quality control Metamorfoze migration 146 TB Migrate by end 2012 TIFF JP2
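  • 19. SCAPE [Flowchart: a TIFF is converted to JP2 with the Aware JP2K SDK. The checks shown are: Jpylyzer* (valid JP2?), comparison of image properties against a properties profile (properties match?), and pixel comparison of TIFF and JP2 (pixels identical?). A "no" at any check leads to fail; "yes" leads to pass.] *Imported as module in Python-based workflow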
  • 20. SCAPE Example 3: pre-ingest quality control Wellcome Library - JP2s produced in-house and by external suppliers - Use jpylyzer to validate against JP2 spec - Use extracted properties to validate against a profile (Progression order, ratio, layers, ….) - Profile coded as XML schema (So jpylyzer output can be validated against schema)
  • 24. SCAPE Acknowledgements Debian packages - Dave Tarrant (Uni Southampton/OPF) - Miguel Ferreira, Rui Castro, Hélder Silva (KEEP Solutions), - Rainer Schmidt (AIT) Feedback on early versions - Christy Henshaw (Wellcome Library) - Ross Spencer (TNA) - Wouter Kool (KB)
  • 25. SCAPE Funding This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137). http://www.scape-project.eu #SCAPEProject
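
The box layout on slide 10 can be illustrated with a minimal sketch that walks the top-level boxes of a JP2 byte string. It relies only on the general box header layout from JPEG 2000 Part 1 (a 4-byte big-endian length, a 4-byte type, an optional 8-byte extended length when the length field is 1, and a length of 0 meaning "runs to end of file"). The function name `list_top_level_boxes` and the synthetic sample bytes are assumptions made for illustration; this is not jpylyzer's own code, and the sample is not a valid JP2 (it has no JP2 Header box).

```python
import struct


def list_top_level_boxes(data: bytes):
    """Return (type, offset, length) for each top-level box found in data.

    Hypothetical helper for illustration; stops at the first box that
    is truncated or whose length field is inconsistent.
    """
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Box header: 4-byte big-endian length (LBox) + 4-byte type (TBox).
        lbox, tbox = struct.unpack(">I4s", data[offset:offset + 8])
        header_size = 8
        if lbox == 1:
            # Extended length: an 8-byte XLBox field follows the type.
            if offset + 16 > len(data):
                break
            (length,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header_size = 16
        elif lbox == 0:
            # Length 0: the box extends to the end of the file.
            length = len(data) - offset
        else:
            length = lbox
        if length < header_size or offset + length > len(data):
            # Malformed or truncated box: report what was found so far.
            break
        boxes.append((tbox.decode("latin-1"), offset, length))
        offset += length
    return boxes


# Synthetic sample (assumption): signature box, file type box, and a
# codestream box holding only two marker codes instead of real image data.
signature = struct.pack(">I4s", 12, b"jP  ") + b"\r\n\x87\n"
file_type = struct.pack(">I4s", 20, b"ftyp") + b"jp2 " + struct.pack(">I", 0) + b"jp2 "
codestream = struct.pack(">I4s", 0, b"jp2c") + b"\xff\x4f\xff\xd9"
sample = signature + file_type + codestream

for box_type, box_offset, box_length in list_top_level_boxes(sample):
    print(repr(box_type), box_offset, box_length)
```

Cutting the sample short (for example `sample[:20]`) makes the walk stop after the signature box. This is the kind of structural damage that slide 5 attributes to hardware failure, although a real validator checks far more than box lengths.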