On the Two Sides of the Pond
By Hans-Jörg Lieder,

Head of the Department of Bibliographic Services – Union Catalogue of Serials
Staatsbibliothek zu Berlin - Preußischer Kulturbesitz;

Dr. Katalin Radics,

Distinguished Librarian; Librarian of the West European Collections and Classics
Young Research Library, University of California, Los Angeles
UNIQUE EUROPEAN MATERIALS – HELD IN A US LIBRARY
Partnership
between the UCLA Library and Staatsbibliothek zu Berlin
Newspapers on the way to discoloring and disintegration
Storage facility of the University of California Libraries on
the UCLA campus
- Leaflets 13”x18.5” or 33cm x 47cm
- Imprint indicating the title, date, the number of the issue;
warning
-Published four or five times a day
UCLA stamps including receiving dates
Packed in wrapping paper probably
after 1940, packages of 700-800 sheets
No documentation (ordering or receiving
records) in the library archives; no
correspondence
Normal serial subscription scheme (?)
Very minimal cataloging record – very
low use
Towards a Weeding Decision
Brittle condition
Check for other holdings in California, US and World libraries
OCLC – no other holdings at the time of checking
Nine 1938 issues at the BnF (Bibliothèque nationale de France)
No holding at the German National Library (Deutsche
Nationalbibliothek)
Contact with head of Zeitungsabteilung, Staatsbibliothek – no
holding in Germany
UNIQUE!!!
Decision: keep and preserve the UCLA holdings.
Keep and Preserve
9600 pages
1936-1940 with gaps
Acid-free boxes
The most fragile pages in mylar
Digitization Project
Funding for digitization
Highest quality resolution: 600 dpi
RGB
Add minimal metadata
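To give a feel for what 600 dpi RGB means for leaflets of 13” x 18.5”, here is a rough storage estimate. It is only a sketch: the slides do not state bit depth or file format, so 8 bits per channel, no compression and a uniform page size are assumptions, and the page count of 9600 is taken from the "Keep and Preserve" slide.

```python
# Back-of-the-envelope size estimate for uncompressed master scans.
# Assumptions (not stated in the slides): 8 bits per RGB channel,
# no compression, every page has the full 13" x 18.5" leaflet size.
DPI = 600
WIDTH_IN, HEIGHT_IN = 13.0, 18.5
BYTES_PER_PIXEL = 3  # one byte each for R, G and B
PAGES = 9600


def pixel_dimensions(width_in, height_in, dpi):
    """Convert a physical page size in inches to pixel dimensions."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bytes_per_pixel=BYTES_PER_PIXEL):
    """Raw size of one scanned page in bytes."""
    return width_px * height_px * bytes_per_pixel


width_px, height_px = pixel_dimensions(WIDTH_IN, HEIGHT_IN, DPI)
page_bytes = uncompressed_bytes(width_px, height_px)
total_bytes = page_bytes * PAGES

print(f"{width_px} x {height_px} px per page")
print(f"{page_bytes / 1e6:.0f} MB per page, {total_bytes / 1e12:.2f} TB in total")
```

Under these assumptions a single page is roughly 260 MB and the whole holding roughly 2.5 TB, which is consistent with shipping the data on a hard drive rather than over the network.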
Title
Deutsches Nachrichtenbüro. 5 Jahrg., Nr. 1581, 1938 October 1,
Erste Morgen-Ausgabe
Alt ID
3813183_1938-10-01_1581 [Local]
AltTitle
Erste Morgen-Ausgabe [Descriptive]
Deutsches Nachrichtenbüro [Descriptive]
Date
October 1, 1938 [Publication]
1938-10-01 [Normalized]
Format
1 p. [Extent]
Language
ger
Name
University of California, Los Angeles. Library. Dept. of Special
Collections [Repository]
Type
newspapers [Genre]
text [Type Of Resource]
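Since the digitized copies are searchable only by date, the local Alt ID is a useful handle for scripts. The sketch below splits the sample ID from the record above into its parts. The reading of the three parts (local record number, normalized date, issue number) is an assumption inferred from this single record, and `IssueId` and `parse_alt_id` are hypothetical names, not part of any UCLA tooling.

```python
import re
from datetime import date
from typing import NamedTuple


class IssueId(NamedTuple):
    record_id: str      # assumed: local record number shared by the title
    issue_date: date    # matches the [Normalized] date in the record
    issue_number: int   # matches "Nr. 1581" in the title


# Assumed layout: <record id>_<YYYY-MM-DD>_<issue number>
ALT_ID_PATTERN = re.compile(r"^(\d+)_(\d{4})-(\d{2})-(\d{2})_(\d+)$")


def parse_alt_id(alt_id: str) -> IssueId:
    """Split a local Alt ID into record id, issue date and issue number."""
    match = ALT_ID_PATTERN.match(alt_id)
    if match is None:
        raise ValueError(f"unexpected Alt ID layout: {alt_id!r}")
    record_id, year, month, day, number = match.groups()
    return IssueId(record_id, date(int(year), int(month), int(day)), int(number))


issue = parse_alt_id("3813183_1938-10-01_1581")
print(issue.issue_date.isoformat(), issue.issue_number)
```

Parsing the date into a real `date` object rather than keeping it as a string means malformed IDs (for example a month of 13) are rejected early.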
Digitized copies: part of UCLA Digital Library at
http://digital2.library.ucla.edu/ -- freely accessible
Searchable only by date
More sophisticated searching capability needed – day by day chronicle of the
Third Reich for a short period of time
-events
-names
-institutions etc.
Deutsches Nachrichtenbüro – December 5, 1933; network of 36 local services (Landesdienste)
Indexing needed
Fraktur – major problem
Transliteration into Latin characters
OCR (Optical Character Recognition) – has to be done in Germany

Looking for a German
Partner
Not a problem … here we are!
… but who are “we”?
• Project: Europeana Newspapers: http://www.europeana-newspapers.eu/
• 18 partners from 12 countries
• Tasks:
• Provide OCR for 18 million pages
• Provide OLR for 2 million pages
• Provide NER experimentally in assorted languages
• Provide best practice recommendations for newspaper metadata
• Provide quality prediction tools
• Aggregate content and make it available to TEL and Europeana
OCR = Optical Character Recognition
OLR = Optical Layout Recognition
NER = Named Entity Recognition
A Dance of Acronyms:
UCLA, SBB and CCS
UCLA sent data on hard drive
SBB
• Checked data for correctness and moved images into directory
structure
• Sent data to CCS in Hamburg for OCR and OLR
CCS (Content Conversion Specialists)
• Created full texts per article
• Stuck data in NZ web service for preliminary presentation purposes
SBB
• Will perform QA of OCR and OLR results
• Will provide all data to UCLA for further use
• Will present data in ZEFYS, its own newspaper portal; to the
Deutsche Digitale Bibliothek; to TEL (The European Library) and to
Europeana
Layout and structure analysis
• Recognition of words, text lines, text blocks,
columns and classification of text blocks,
illustrations, advertisements, tables and the
following page types:
- title page (the title page of an issue)
- content page (a page that consists of content/text only)
- illustration page (a page that has at least one illustration)
- advertisement page (a page that contains adverts only)

• Structure analysis through classification of
headlines and grouping of zones into articles
(incl. article continuation)
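The four page types above can be read as a small decision rule. The following is a minimal sketch of that rule, not the classifier used in the project: the block type labels, the `is_issue_title_page` flag and the order of the checks are assumptions, and pages that mix text and adverts (a case the slide does not cover) fall back to "content" here.

```python
def classify_page(block_types, is_issue_title_page=False):
    """Assign one of the four page types to a page.

    block_types: labels of the zones found on the page, e.g.
                 ["headline", "text", "illustration"] (hypothetical labels).
    is_issue_title_page: True if the page is the title page of an issue;
                 assumed to be known from issue-level metadata.
    """
    kinds = set(block_types)
    if is_issue_title_page:
        return "title"
    if kinds and kinds <= {"advertisement"}:
        return "advertisement"      # adverts only
    if "illustration" in kinds:
        return "illustration"       # at least one illustration
    return "content"                # text only (and assumed fallback)


print(classify_page(["headline", "text"]))
print(classify_page(["text", "illustration"]))
print(classify_page(["advertisement", "advertisement"]))
```

Checking the title page first reflects the assumption that a title page keeps its type even when it carries an illustration.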
ENP OLR workflow | Conversion without scanning
[Workflow diagram. Stages shown between the material location and the conversion facility: Digital Image, Metadata, Delivery, Inspection, Conversion, MD Recording, Automatic QA, Reject, Doc Delivery, Digital Object, Return.]

Quality assurance


@ CCS | Automated markup and basic manual correction:
- headlines, illustrations, tables, captions, advertisements, etc.
- article segmentation and grouping of zones into articles (incl. continuation)



@ Content Provider (Library)
Recommended:
- Zoning: correct classification of blocks as “text” or “illustration”
- Article segmentation: correct identification of headlines/text blocks/captions
- Grouping: correct grouping of blocks (text, illustration) into articles
- Metadata: correct title, issue date and issue number
Optional:
- Page types: correct page types
- Page numbers: correct page sequence
- OCR: perform text correction of specific zones (e.g. headlines, captions)
Output | METS/ALTO package


METS/ALTO metadata schemas to describe the structured digital output object



A newspaper issue processed in docWorks is converted into one METS XML
file. It reflects the whole physical and logical structure, manages all links to the
image files and the related ALTO XML files. ALTO is based on a standardized
page description schema and contains all information of a page (print space,
margins, coordinates, OCR results).
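
As an illustration of how the OCR results sit inside an ALTO file, the sketch below reads the words of each text line from a tiny hand-written ALTO fragment. The fragment is invented for this example and is not docWorks output; the element and attribute names (`TextLine`, `String`, `CONTENT`) come from the ALTO schema, and the code matches on local names so that it does not depend on one particular ALTO namespace version.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, not real project data.
SAMPLE_ALTO = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="7800" HEIGHT="11100">
      <PrintSpace HPOS="300" VPOS="400" WIDTH="7200" HEIGHT="10300">
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Deutsches" HPOS="320" VPOS="420" WIDTH="900" HEIGHT="120"/>
            <SP/>
            <String CONTENT="Nachrichtenbüro" HPOS="1260" VPOS="420" WIDTH="1500" HEIGHT="120"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Erste" HPOS="320" VPOS="580" WIDTH="500" HEIGHT="120"/>
            <SP/>
            <String CONTENT="Morgen-Ausgabe" HPOS="860" VPOS="580" WIDTH="1400" HEIGHT="120"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def text_lines(alto_xml):
    """Return the recognized text of every TextLine as a list of strings."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "")
                     for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


for line in text_lines(SAMPLE_ALTO):
    print(line)
```

The coordinates on each `String` (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) are what allow a viewer to highlight search hits on the page image.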



Benefits of structural markup:
- better browsing and more precise text search
- better access and display on tablet and mobile devices
- automated article classification and clustering through data/text mining and
linguistic technologies
- user engagement for manual online text correction, article classification,
annotation, building personal collections, etc.
- sharing articles via social media platforms like Facebook, Twitter, etc.
_______________
METS = Metadata Encoding and Transmission Standard
ALTO = Analyzed Layout and Text Object

08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

On the two sides of the pond

  • 1. On the Two Sides of the Pond By Hans-Jörg Lieder, Head of the Department of Bibliographic Services – Union Catalogue of Serials Staatsbibliothek zu Berlin - Preußischer Kulturbesitz; Dr. Katalin Radics, Distinguished Librarian; Librarian of the West European Collections and Classics Young Research Library, University of California, Los Angeles
  • 2. UNIQUE EUROPEAN MATERIALS – HELD IN A US LIBRARY
  • 3. Partnership between the UCLA Library and Staatsbibliothek zu Berlin
  • 4. Newspapers on the way to discoloring and disintegration Storage facility of the University of California Libraries on the UCLA campus
  • 5. Leaflets 13" x 18.5" (33 cm x 47 cm); imprint indicating the title, date and number of the issue, plus a warning; published four or five times a day
  • 6. UCLA stamps including receiving dates. Packed in wrapping paper, probably after 1940, in packages of 700-800 sheets. No documentation (ordering or receiving records) in the library archives; no correspondence. Normal serial subscription scheme (?). Very minimal cataloging record; very low use.
  • 7. Towards a Weeding Decision. Brittle condition. Check for other holdings in California, US and world libraries: OCLC showed no other holdings at the time of checking; nine 1938 issues at the BNF; no holdings at the German National Library (Deutsche Nationalbibliothek); contact with the head of the Zeitungsabteilung, Staatsbibliothek, confirmed no holdings in Germany. UNIQUE!!! Decision: keep and preserve the UCLA holdings.
  • 8. Keep and Preserve: 9600 pages; 1936-1940 with gaps; acid-free boxes; the most fragile pages in mylar
  • 9. Digitization Project: funding for digitization; highest quality resolution (600 dpi, RGB); add minimal metadata
  • 10. Title: Deutsches Nachrichtenbüro. 5 Jahrg., Nr. 1581, 1938 October 1, Erste Morgen-Ausgabe
      Alt ID: 3813183_1938-10-01_1581 [Local]
      AltTitle: Erste Morgen-Ausgabe [Descriptive]; Deutsches Nachrichtenbüro [Descriptive]
      Date: October 1, 1938 [Publication]; 1938-10-01 [Normalized]
      Format: 1 p. [Extent]
      Language: ger
      Name: University of California, Los Angeles. Library. Dept. of Special Collections [Repository]
      Type: newspapers [Genre]; text [Type Of Resource]
  • 11. Digitized copies: part of the UCLA Digital Library at http://digital2.library.ucla.edu/, freely accessible. Searchable only by date. More sophisticated searching capability needed: this is a day-by-day chronicle of the Third Reich for a short period of time (events, names, institutions etc.). Deutsches Nachrichtenbüro: December 5, 1933; network of 36 local services (Landesdienste)
  • 12. Indexing needed. Fraktur is a major problem: transliteration into Latin characters is required, and OCR (Optical Character Recognition) has to be done in Germany. Looking for a German partner
  • 13. Not a problem … here we are!
  • 14. … but who are “we”? • Project: Europeana Newspapers, http://www.europeana-newspapers.eu/ • 18 partners from 12 countries • Tasks: provide OCR for 18 million pages; provide OLR for 2 million pages; provide NER experimentally in assorted languages; provide best practice recommendations for newspaper metadata; provide quality prediction tools; aggregate content and make it available to TEL and Europeana • OCR = Optical Character Recognition; OLR = Optical Layout Recognition; NER = Named Entity Recognition
  • 15. A Dance of Acronyms: UCLA, SBB and CCS. UCLA: sent data on hard drive. SBB: checked data for correctness and moved images into directory structure; sent data to CCS in Hamburg for OCR and OLR. CCS (Content Conversion Specialists): created full texts per article; stuck data in NZ web service for preliminary presentation purposes. SBB: will perform QA of OCR and OLR results; will provide all data to UCLA for further use; will present data in ZEFYS, its own newspaper portal, and provide it to the Deutsche Digitale Bibliothek, to TEL (The European Library) and to Europeana
  • 16. Layout and structure analysis • Recognition of words, text lines, text blocks and columns, and classification of text blocks, illustrations, advertisements, tables and the following page types: - title page (the title page of an issue) - content page (a page that consists of content/text only) - illustration page (a page that has at least one illustration) - advertisement page (a page that contains adverts only) • Structure analysis through classification of headlines and grouping of zones into articles (incl. article continuation)
  • 17. ENP OLR workflow | Conversion without scanning [workflow diagram; labels: Material location, Conversion facility, Digital Image, Metadata, Delivery, Inspection, Conversion, MD Recording, Automatic QA, Reject, Doc Delivery, Digital Object, Return]
  • 18. Quality assurance • @ CCS | Automated markup and basic manual correction: - headlines, illustrations, tables, captions, advertisements, etc. - article segmentation and grouping of zones into articles (incl. continuation) • @ Content Provider (Library) | Recommended: - Zoning: correct classification of blocks as "text" or "illustration" - Article segmentation: correct identification of headlines/text blocks/captions - Grouping: correct grouping of blocks (text, illustration) into articles - Metadata: correct title, issue date and issue number | Optional: - Page types: correct page types - Page numbers: correct page sequence - OCR: perform text correction of specific zones (e.g. headlines, captions)
  • 19. Output | METS/ALTO package • METS/ALTO metadata schemas describe the structured digital output object • A newspaper issue processed in docWorks is converted into one METS XML file. It reflects the whole physical and logical structure and manages all links to the image files and the related ALTO XML files. ALTO is based on a standardized page description schema and contains all information of a page (print space, margins, coordinates, OCR results). • Benefits of structural markup: - better browsing and more precise text search - better access and display on tablet and mobile devices - automated article classification and clustering through data/text mining and linguistic technologies - user engagement for manual online text correction, article classification, annotation, building personal collections, etc. - sharing articles via social media platforms like Facebook, Twitter, etc. • METS = Metadata Encoding and Transmission Standard; ALTO = Analyzed Layout and Text Object
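
As an illustration of what the ALTO files mentioned on slide 19 make possible, the following is a minimal Python sketch that pulls the OCR text out of an ALTO page, grouped by text block. The element and attribute names (`TextBlock`, `TextLine`, `String`, `CONTENT`) come from the ALTO schema; the sample document, its IDs and the helper names `local_name` and `alto_text_blocks` are assumptions made up for this example and are not actual docWorks output from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written ALTO fragment (not real project data).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Deutsches" HPOS="10" VPOS="10" WIDTH="90" HEIGHT="12"/>
            <SP/>
            <String CONTENT="Nachrichtenbüro" HPOS="105" VPOS="10" WIDTH="150" HEIGHT="12"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Erste" HPOS="10" VPOS="30" WIDTH="50" HEIGHT="12"/>
            <SP/>
            <String CONTENT="Morgen-Ausgabe" HPOS="65" VPOS="30" WIDTH="140" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_text_blocks(xml_text):
    """Return {TextBlock ID: [line text, ...]} for one ALTO page."""
    root = ET.fromstring(xml_text)
    blocks = {}
    for elem in root.iter():
        if local_name(elem.tag) != "TextBlock":
            continue
        lines = []
        for line in elem:
            if local_name(line.tag) != "TextLine":
                continue
            # Each recognized word is a String element; its text is in CONTENT.
            words = [s.get("CONTENT", "") for s in line
                     if local_name(s.tag) == "String"]
            lines.append(" ".join(words))
        blocks[elem.get("ID")] = lines
    return blocks


if __name__ == "__main__":
    for block_id, lines in alto_text_blocks(SAMPLE_ALTO).items():
        print(block_id, lines)
```

Matching on local tag names rather than a fixed namespace is a deliberate simplification, since ALTO versions use different namespace URIs. A full-text index built this way could also keep the `HPOS`/`VPOS` coordinates to highlight search hits on the page image.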