June 14, 2013
Page 1
Content Conversion Specialists
WS Refinement and Quality Assessment
Claus Gravenhorst
Director Strategic Initiatives
CCS
Content Conversion Specialists
europeana newspapers
Workshop Refinement and Quality Assessment, Belgrade 14.6.2013
OLR at CCS
From unstructured to structured newspaper data and the role
of content providers in the overall process
Agenda
• About CCS
• General workflow for mass digitization of newspapers
• OLR – Layout and structure analysis
• ENP OLR workflow (involvement of CPs)
• Quality assurance
• Output - METS/ALTO package
• Demo of first results
About CCS
• CCS Content Conversion Specialists GmbH (Hamburg), as technical project
partner, will provide its expertise and docWorks technology to set up and
operate a mass digitisation workflow that creates high-quality structured content
from 2 million scanned newspaper pages provided by 5 library partners
• Page volume:
BNF = 1,000 k, NLE = 500 k, SUB HH = 480 k, NLF = 90 k, SBB = 10 k
• The distributed OLR workflow enables project partners (content providers)
to contribute to the integrated quality assurance process
• CCS will also contribute to the specification of the metadata model
General workflow for mass digitization
[Diagram: general workflow for mass digitization. Recoverable labels: Check in, Check out; Barcode, Item Tracking, Document UID; Scanner (robot, book, document, microfilm); Scanning; Imaging; Layout Analysis; OCR; ISR; Conversion; Automated QA; Manual QA (in-house, near-shore, off-shore, multiple locations); QA + Correction; QA random; Reject; Condition; Re-Scan; Delivery; Final Output; Image, Metadata, Database, Repository; Z 39.50 Metadata]
Layout and structure analysis
• Layout analysis based on a "bottom-up" approach
• A general rule system enables recognition of words, text
lines, text blocks and columns, and classification of text
blocks, illustrations, advertisements, tables and the
following page types:
- title page (the title page of an issue)
- content page (a page that consists of content/text only)
- illustration page (a page that has at least one illustration)
- advertisement page (a page that contains adverts only)
• Structure analysis through classification of headlines
and grouping of zones into articles
(incl. article continuation)
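The four page-type definitions above can be read as a small decision rule. The following is only a minimal sketch of that reading, not the docWorks rule system: the zone labels ("text", "illustration", "advertisement"), the `is_issue_title_page` flag and the order in which the rules are tried are all assumptions made for illustration.

```python
from enum import Enum


class PageType(Enum):
    TITLE = "title page"
    CONTENT = "content page"
    ILLUSTRATION = "illustration page"
    ADVERTISEMENT = "advertisement page"


def classify_page(zone_types, is_issue_title_page=False):
    """Return a PageType, or None when none of the listed definitions applies.

    zone_types: iterable of hypothetical zone labels for one page,
    e.g. ["text", "text", "illustration"].
    """
    kinds = set(zone_types)
    if is_issue_title_page:
        # "the title page of an issue"
        return PageType.TITLE
    if not kinds:
        return None
    if kinds == {"advertisement"}:
        # "a page that contains adverts only"
        return PageType.ADVERTISEMENT
    if "illustration" in kinds:
        # "a page that has at least one illustration"
        return PageType.ILLUSTRATION
    if kinds == {"text"}:
        # "a page that consists of content/text only"
        return PageType.CONTENT
    # Mixed pages (e.g. text plus adverts) are not covered by the definitions.
    return None
```

A page mixing text and adverts without any illustration matches none of the four definitions, so the sketch returns `None` for it rather than guessing.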
ENP OLR workflow | Conversion without scanning
[Diagram: exchange between the material location and the conversion facility. Recoverable labels: Digital Image, Metadata, Delivery; Inspection / Automatic QA; Reject; MD Recording; Conversion; Doc Delivery; Digital Object, Return]
Possible conversion scenarios
A) Conversion at library (on-site)
B) Conversion off-shore at CCS data center,
final QA at the library via internet transfer (remote QA solution)
C) Conversion off-shore at CCS,
final QA at the library by backup shipment
Scenario B | Remote QA at library
[Diagram: remote QA setup. Recoverable labels: Offshore Processing @ CCS (dW Share Master, Storage, IN, OUT, POOL); Internet; QA on-site @ Library (dW Share RQA, Storage, POOL); INPUT; OUTPUT METS ALTO]
Quality assurance
• @ CCS | Automated markup and basic manual correction:
- headlines, illustrations, tables, captions, advertisements, etc.
- article segmentation and grouping of zones into articles (incl. continuation)
• @ Content Provider (Library)
Recommended:
- Zoning: correct classification of blocks as "text" or "illustration"
- Article segmentation: correct identification of headlines/text blocks/captions
- Grouping: correct grouping of blocks (text, illustration) into articles
- Metadata: correct title, issue date and issue number
Optional:
- Page types: correct page types
- Page numbers: correct page sequence
- OCR: perform text correction of specific zones (e.g. headlines, captions)
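The recommended checks above could be supported by a small pre-check that points a reviewer at suspicious spots. This is a hypothetical sketch only: the `Block` and `Issue` classes, the role labels and the three heuristics (every article has a headline, every block belongs to exactly one article, no metadata field is empty) are assumptions for illustration and are not described in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    block_id: str
    zone: str  # expected: "text" or "illustration"
    role: str  # hypothetical labels: "headline", "text", "caption"


@dataclass
class Issue:
    title: str
    issue_date: str
    issue_number: str
    blocks: list = field(default_factory=list)
    articles: dict = field(default_factory=dict)  # article id -> list of block ids


def qa_findings(issue):
    """Return human-readable findings for a reviewer; an empty list means none."""
    findings = []

    # Metadata: title, issue date and issue number must be present.
    for name in ("title", "issue_date", "issue_number"):
        if not getattr(issue, name):
            findings.append(f"metadata: missing {name}")

    # Zoning: every block is classified as "text" or "illustration".
    for block in issue.blocks:
        if block.zone not in ("text", "illustration"):
            findings.append(
                f"zoning: block {block.block_id} has unknown zone {block.zone!r}"
            )

    # Article segmentation (assumed heuristic): each article has a headline.
    by_id = {block.block_id: block for block in issue.blocks}
    membership = {}
    for article_id, block_ids in issue.articles.items():
        roles = [by_id[i].role for i in block_ids if i in by_id]
        if "headline" not in roles:
            findings.append(f"segmentation: article {article_id} has no headline")
        for i in block_ids:
            membership[i] = membership.get(i, 0) + 1

    # Grouping (assumed heuristic): each block belongs to exactly one article.
    for block in issue.blocks:
        count = membership.get(block.block_id, 0)
        if count != 1:
            findings.append(f"grouping: block {block.block_id} is in {count} articles")

    return findings
```

The findings are hints for manual QA, not corrections; advertisements or continuation fragments may legitimately break these heuristics.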
Output | METS/ALTO package
• METS/ALTO metadata schemas describe the structured digital output object
• A newspaper issue processed in docWorks is converted into one METS XML
file. It reflects the whole physical and logical structure and manages all links to the
image files and the related ALTO XML files. ALTO is based on a standardized
page description schema and contains all information about a page (print space,
margins, coordinates, OCR results).
• Benefits of structural markup:
- better browsing and more precise text search
- better access and display on tablet and mobile devices
- automated article classification and clustering through data/text mining and
linguistic technologies
- user engagement for manual online text correction, article classification,
annotation, building personal collections, etc.
- sharing articles via social media platforms like Facebook, Twitter, etc.
_______________
METS = Metadata Encoding and Transmission Standard
ALTO = Analyzed Layout and Text Object
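To make the METS/ALTO relationship concrete, here is a minimal Python sketch that lists the file links in a METS document and reads words with their coordinates from an ALTO page. The element and attribute names (`file`, `FLocat`, `xlink:href`, `String`, `CONTENT`, `HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) come from the METS and ALTO schemas. The two sample documents are simplified, hand-written fragments (not schema-valid and not docWorks output), and the file names and `USE` values in them are assumptions.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Simplified illustrative fragments, not real project output.
SAMPLE_METS = """
<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="Images">
      <file ID="IMG00001"><FLocat LOCTYPE="URL" xlink:href="images/00001.tif"/></file>
    </fileGrp>
    <fileGrp USE="Text">
      <file ID="ALTO00001"><FLocat LOCTYPE="URL" xlink:href="alto/00001.xml"/></file>
    </fileGrp>
  </fileSec>
</mets>
"""

SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="TB1"><TextLine>
      <String CONTENT="Hamburg" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
      <String CONTENT="1913" HPOS="420" VPOS="200" WIDTH="120" HEIGHT="40"/>
    </TextLine></TextBlock>
  </PrintSpace></Page></Layout>
</alto>
"""


def local_name(tag):
    """Strip the XML namespace so the code does not depend on a schema version."""
    return tag.rsplit("}", 1)[-1]


def to_float(value):
    """Convert an attribute value to float, keeping None for missing attributes."""
    return None if value is None else float(value)


def mets_file_links(mets_xml):
    """Return (file ID, href) pairs for every file/FLocat in a METS document."""
    links = []
    for element in ET.fromstring(mets_xml).iter():
        if local_name(element.tag) != "file":
            continue
        for child in element:
            if local_name(child.tag) == "FLocat":
                links.append((element.get("ID"), child.get(XLINK_HREF)))
    return links


def alto_words(alto_xml):
    """Return (text, hpos, vpos, width, height) for every ALTO String element."""
    words = []
    for element in ET.fromstring(alto_xml).iter():
        if local_name(element.tag) == "String":
            words.append((
                element.get("CONTENT"),
                to_float(element.get("HPOS")),
                to_float(element.get("VPOS")),
                to_float(element.get("WIDTH")),
                to_float(element.get("HEIGHT")),
            ))
    return words


if __name__ == "__main__":
    print(mets_file_links(SAMPLE_METS))
    print(alto_words(SAMPLE_ALTO))
```

Matching on local element names is a deliberate simplification so that the same code reads several ALTO versions; a production reader would more likely validate against the specific schema version in use.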
Access and Presentation
• Access through Europeana as well as content provider portals
• Existing newspaper presentation systems at National Library of Australia
(Trove), Library of Congress/NDNP (Chronicling America), Dutch National
Library (DDD), National Library of Luxembourg (eLuxemburgensia), ...
• Veridian demo:
Example of a newspaper presentation system to demonstrate access to
already processed ENP newspaper issues
Questions + answers
Contact
Claus Gravenhorst
Director Strategic Initiatives
CCS Content Conversion Specialists GmbH
Weidestr. 134
22083 Hamburg
Germany
c.gravenhorst@content-conversion.com
www.content-conversion.com

ENP Belgrade WS OLR @ CCS

  • 1. June 14, 2013 Page 1 Content Conversion Specialists WS Refinement and Quality Assessment Claus Gravenhorst Director Strategic Initiatives CCS Content Conversion Specialists europeana newspapers Workshop Refinement and Quality Assessment, Belgrade 14.6.2013 OLR at CCS From unstructured to structured newspaper data and the role of content providers in the overall process
  • 2. June 14, 2013 Page 2 Content Conversion Specialists WS Refinement and Quality Assessment Claus Gravenhorst Director Strategic Initiatives Agenda  About CCS  General workflow for mass digitization of newspapers  OLR – Layout and structure analysis  ENP OLR workflow (involvement of CP‘s)  Quality assurance  Output - METS/ALTO package  Demo of first results
  • 3. About CCS
       • CCS Content Conversion Specialists GmbH (Hamburg), as technical project partner, will provide its expertise and docWorks technology to set up and operate a mass digitisation workflow that creates high-quality structured content from 2 million scanned newspaper pages provided by 5 library partners
       • Page volume: BNF = 1,000 k, NLE = 500 k, SUB HH = 480 k, NLF = 90 k, SBB = 10 k
       • The distributed OLR workflow enables project partners (content providers) to contribute to the integrated quality assurance process
       • CCS will also contribute to the specification of the metadata model
  • 4. General workflow for mass digitization
       [Workflow diagram. Labels: Check in; Scanning; Scanner (robot, book, document, microfilm); Imaging; Conversion; Layout Analysis; OCR; ISR; QA + Correction; Delivery; QA (random); Final Output; Check out; Re-Scan; Reject; Condition; Image/Metadata Database and Repository; Automated QA; Manual QA (in-house, near-shore, off-shore, multiple locations); Document UID, Barcode, Item Tracking; Z39.50 Metadata]
  • 5. Layout and structure analysis
       • Layout analysis based on a "bottom-up" approach
       • A general rule system enables recognition of words, text lines, text blocks and columns, and classification of text blocks, illustrations, advertisements, tables and the following page types:
         - title page (the title page of an issue)
         - content page (a page that consists of content/text only)
         - illustration page (a page that has at least one illustration)
         - advertisement page (a page that contains adverts only)
       • Structure analysis through classification of headlines and grouping of zones into articles (incl. article continuation)
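
The page-type definitions on slide 5 can be illustrated with a small sketch. This is not the docWorks rule system, which the slides do not describe in detail. The names `BlockType`, `PageType` and `classify_page` are hypothetical, the title-page flag is assumed to be supplied by the caller, and the order in which the rules are checked is an assumption.

```python
from enum import Enum


class BlockType(Enum):
    TEXT = "text"
    ILLUSTRATION = "illustration"
    ADVERTISEMENT = "advertisement"
    TABLE = "table"


class PageType(Enum):
    TITLE = "title page"
    CONTENT = "content page"
    ILLUSTRATION = "illustration page"
    ADVERTISEMENT = "advertisement page"


def classify_page(block_types, is_title_page=False):
    """Assign one of the four page types from slide 5.

    block_types: the BlockType of every classified block on the page.
    is_title_page: assumed to come from elsewhere; the slides do not
    say how the title page of an issue is detected.
    """
    blocks = list(block_types)
    if is_title_page:
        return PageType.TITLE
    # "a page that contains adverts only"
    if blocks and all(b is BlockType.ADVERTISEMENT for b in blocks):
        return PageType.ADVERTISEMENT
    # "a page that has at least one illustration"
    if any(b is BlockType.ILLUSTRATION for b in blocks):
        return PageType.ILLUSTRATION
    # Everything else is treated as a content page (assumption:
    # tables count as content).
    return PageType.CONTENT


if __name__ == "__main__":
    page = [BlockType.TEXT, BlockType.ILLUSTRATION, BlockType.TEXT]
    print(classify_page(page).value)
```

A page mixing adverts with text falls through to the content or illustration rule here; a production rule system would likely need finer distinctions.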
  • 6. ENP OLR workflow | Conversion without scanning
       [Workflow diagram. Labels: Material location; Conversion facility; Digital Image / Metadata Delivery; Doc Delivery; Inspection / Automatic QA; Reject; Conversion; MD Recording; Digital Object Return]
  • 7. Possible conversion scenarios
       A) Conversion at library (on-site)
       B) Conversion off-shore at CCS data center, final QA at the library via internet transfer (remote QA solution)
       C) Conversion off-shore at CCS, final QA at the library by backup shipment
  • 8. Scenario B | Remote QA at library
       [Diagram. Labels: Offshore Processing @ CCS (INPUT, Storage, POOL with IN and OUT, dW Share Master, OUTPUT METS ALTO); Internet; QA on-site @ Library (Storage, POOL, dW Share RQA)]
  • 9. Quality assurance
       • @ CCS | Automated markup and basic manual correction:
         - headlines, illustrations, tables, captions, advertisements, etc.
         - article segmentation and grouping of zones into articles (incl. continuation)
       • @ Content Provider (Library)
         Recommended:
         - Zoning: correct classification of blocks as "text" or "illustration"
         - Article segmentation: correct identification of headlines/text blocks/captions
         - Grouping: correct grouping of blocks (text, illustration) into articles
         - Metadata: correct title, issue date and issue number
         Optional:
         - Page types: correct page types
         - Page numbers: correct page sequence
         - OCR: perform text correction of specific zones (e.g. headlines, captions)
  • 10. Output | METS/ALTO package
       • METS/ALTO metadata schemas describe the structured digital output object
       • A newspaper issue processed in docWorks is converted into one METS XML file. It reflects the whole physical and logical structure and manages all links to the image files and the related ALTO XML files. ALTO is based on a standardized page description schema and contains all information about a page (print space, margins, coordinates, OCR results).
       • Benefits of structural markup:
         - better browsing and more precise text search
         - better access and display on tablet and mobile devices
         - automated article classification and clustering through data/text mining and linguistic technologies
         - user engagement for manual online text correction, article classification, annotation, building personal collections, etc.
         - sharing articles via social media platforms like Facebook, Twitter, etc.
       METS = Metadata Encoding and Transmission Standard
       ALTO = Analyzed Layout and Text Object
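
To make the ALTO description on slide 10 more concrete, here is a minimal sketch of reading OCR words and their coordinates from an ALTO page file with Python's standard library. The sample XML is invented for illustration and is not ENP output; the helper names `local_name` and `extract_words` are hypothetical. The code matches elements by local name, so it does not depend on which ALTO version namespace a file declares.

```python
import xml.etree.ElementTree as ET

# Invented sample page, not taken from the project. Real ALTO files
# carry more structure (Description, Styles, margins, etc.).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace HPOS="100" VPOS="100" WIDTH="1800" HEIGHT="2800">
        <TextBlock ID="TB1" HPOS="100" VPOS="100" WIDTH="800" HEIGHT="60">
          <TextLine HPOS="100" VPOS="100" WIDTH="800" HEIGHT="60">
            <String CONTENT="Morning" HPOS="100" VPOS="100" WIDTH="380" HEIGHT="60"/>
            <SP HPOS="480" VPOS="100" WIDTH="20"/>
            <String CONTENT="News" HPOS="500" VPOS="100" WIDTH="400" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return each String element's text and bounding box, in document order."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) != "String":
            continue
        words.append({
            "text": element.get("CONTENT", ""),
            "hpos": float(element.get("HPOS", "0")),
            "vpos": float(element.get("VPOS", "0")),
            "width": float(element.get("WIDTH", "0")),
            "height": float(element.get("HEIGHT", "0")),
        })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word["text"], word["hpos"], word["vpos"])
```

Word-level coordinates like these are what make search-hit highlighting on the page image possible; the METS file would be the place to look up which ALTO file and image belong to which page and article.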
  • 11. Access and Presentation
       • Access through Europeana as well as content provider portals
       • Existing newspaper presentation systems at National Library of Australia (Trove), Library of Congress/NDNP (Chronicling America), Dutch National Library (DDD), National Library of Luxembourg (eLuxemburgensia), ...
       • Veridian demo: example of a newspaper presentation system to demonstrate access to already processed ENP newspaper issues
  • 12. Questions + answers
  • 13. Contact
       Claus Gravenhorst
       Director Strategic Initiatives
       CCS Content Conversion Specialists GmbH
       Weidestr. 134
       22083 Hamburg
       Germany
       c.gravenhorst@content-conversion.com
       www.content-conversion.com

Editor's notes

  1. DDD = Databank of Digital Daily newspapers