Transkribus.
A research infrastructure for
transcribing, recognizing and searching
archival documents
Günter Mühlberger
University of Innsbruck,
Digitisation and Digital Preservation Group
Googling?
Voorts Hooge Mogende heeren, Inde practyque vande decisie by uw Hoge
voorts hoomge Mogende heeren, Inde pra tiqune vande decasie by ende hoge
Mo: gedaen in t'voorJaer over t'vinden vande middelen syn verscheyden differenten
Mo: gedaen in t' voorJaer over t' vinden vande middelen syn verscheyden differenten
voorgevallen tusschen Stadt en Lande, als namentl ouer den voet ende
voorgevallen tusschen Stadt en Lande, als nanent ouer den voet ende
Neural Networks are taking over. (Ray Smith, Google)
Archives are starting to digitise their holdings.
BTW: Documents in archives are unique, were never published before and contain extremely
interesting content!
Digital Humanities are (big) data driven.
Volunteers and the crowd are willing to contribute to scientific and
cultural heritage projects.
→ Research Infrastructure!
[Diagram: Transkribus and the groups it connects: humanities scholars, archives, computer science and technology providers, and the public (volunteers / crowd). Labels: deliver documents; storage; work with documents; expert interfaces and tools; advanced search; web UI; contribute (crowd-sourcing); data; technology; enriched documents; get results.]
Wolpertinger, Bavarian mythical creature (is everything…)
READ
• H2020 e-Infrastructure Project
• Duration: 1.1.2016 – 30.6.2019
• Budget: EUR 8.2 million grant
• Coordinated by University of Innsbruck
• 14 partners and more than 20 institutions connected through a Memorandum of
Understanding
• Main objectives
• Foster research in Pattern Recognition, Machine Learning, Natural Language
Processing, Digital Humanities
• Set up a service platform (“Transkribus”) to make the technology available to
archives, scholars, public.
• Transform this research infrastructure into a permanent service
Transkribus as platform and as expert client
Documents in Transkribus
• Private
• All documents in Transkribus are first of all private – visible only to the “owner” of the
document
• Local
• For simple operations only; the full set of services is available only for remote documents
• Remote
• Standard mode
• Stored on the servers of the University of Innsbruck
• Upload of documents
• HTTP
• PDF
• FTP
• METS Link
• Direct download from repository
Documents directly loaded from the repository – one button!
Implemented by Intranda (Goobi Viewer)
Researchers can go “shopping” and collect documents from various
repositories and digital libraries in their private Transkribus collection
Transcribe text in a reliable, secure, and machine readable way = create
a scholarly transcription
→ And use the text to train the HTR engines
Finished? Write an email…
Training process will be made available to the user as well
(but will need some time due to a lack of resources in Innsbruck)
HTR engine(s) – current implementation
• Hidden Markov Models - HMM (already available)
• Training takes some hours
• Recognition takes 20-60 minutes per page
• Strong limitations on dictionary and resolution of images
• Recurrent Neural Networks - RNN (coming soon)
• Training takes some days
• Recognition takes less than 60 seconds per page (!)
• No limitation on resolution of images
• Free choice of dictionary – less dependency
• Main limitations for both HTR engines
• Layout Analysis (“line finder”)
• Need for dictionaries
What to do with the automatically recognized
text?
• Measure results
• Correct the text
• Search in the full-text
• Invite people to support you in transcribing
Measure results with Character Error Rate (CER) and
Word Error Rate (WER)
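As a rough illustration of how these two metrics are usually computed, the sketch below divides the Levenshtein (edit) distance between a reference transcription and the recognised text by the length of the reference, once over characters and once over whitespace-separated words. The function names are illustrative and not part of Transkribus, and evaluation tools may differ in details such as the normalisation of case and punctuation. The sample pair is the third line of the Dutch example shown earlier.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits / reference characters.

    Assumes a non-empty reference.
    """
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits / reference words (split on whitespace).

    Assumes a non-empty reference.
    """
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


reference = "voorgevallen tusschen Stadt en Lande, als namentl ouer den voet ende"
recognised = "voorgevallen tusschen Stadt en Lande, als nanent ouer den voet ende"

print(f"CER: {cer(reference, recognised):.1%}")
print(f"WER: {wer(reference, recognised):.1%}")
```

Here two character edits inside a single word give a low CER but a WER of one word in eleven, which is why the two figures are usually reported together.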
Correct text
• Character Error Rates
• Above 20% → correction takes as long as keying, but readers who have
difficulty deciphering the script may still benefit
• Between 10% and 20% → correction is faster, but experienced readers prefer to key
• Below 10% → correction is much faster, and even experienced readers will
accept correction instead of keying
• Typical figures are currently around 10% CER
• Under lab conditions, significantly better results are already possible
Search full-text
• Private search: you only get results from collections of which you
are a member
• Faceted search
• Configurable search
Share your documents among your working group, colleagues, students
and volunteers…
Export documents
• Various formats
• XML (PAGE)
• METS (Metadata Encoding and Transmission Standard – LoC)
• ALTO (Analyzed Layout and Text Object – LoC)
• DOCX
• TEI (Text Encoding Initiative)
• PDF
• Excel
• …
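For readers who want to post-process an export, the sketch below pulls the line-level text out of a PAGE XML document using only the Python standard library. The embedded sample is hand-written for illustration and is not real Transkribus output. The namespace is that of the 2013-07-15 PAGE schema, which is an assumption here, so check the `xmlns` attribute of your own export.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace (PAGE 2013-07-15 schema); verify against your export.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

# Hand-written illustrative sample, not actual Transkribus output.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="3000">
    <TextRegion id="r1">
      <Coords points="100,100 1900,100 1900,400 100,400"/>
      <TextLine id="r1l1">
        <Coords points="100,100 1900,100 1900,200 100,200"/>
        <TextEquiv><Unicode>voorgevallen tusschen Stadt en Lande,</Unicode></TextEquiv>
      </TextLine>
      <TextLine id="r1l2">
        <Coords points="100,220 1900,220 1900,320 100,320"/>
        <TextEquiv><Unicode>als namentl ouer den voet ende</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def page_text_lines(xml_text: str) -> List[str]:
    """Return the text of every TextLine, in document order."""
    root = ET.fromstring(xml_text)
    lines = []
    for line in root.iterfind(".//pc:TextLine", NS):
        # Use the line's own TextEquiv, not those of nested Word elements.
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        if unicode_el is not None and unicode_el.text:
            lines.append(unicode_el.text)
    return lines


for text in page_text_lines(SAMPLE):
    print(text)
```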
How to access services via machines?
Services in Transkribus are accessible via a REST interface
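A minimal sketch of what machine access over REST could look like, using only the Python standard library. The base URL, the `auth/login` path and the `user`/`pw` form fields are assumptions made for illustration and should be checked against the Transkribus wiki before use. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical base URL; replace with the documented service address.
BASE_URL = "https://example.org/rest"


def build_login_request(user: str, password: str,
                        base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a form-encoded POST login request."""
    # Field names "user" and "pw" are assumptions for this sketch.
    body = urlencode({"user": user, "pw": password}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/auth/login",  # assumed path
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_login_request("alice@example.org", "secret")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing the object to `urllib.request.urlopen` and reading the session information from the response, whose format is not specified in these slides.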
What will come next?
• Table editor
• eLearning Interface
• Web-interface for simplified transcription (crowd-sourcing)
• Text2Image matching tool
• ScanApp
• …
Table editor
Define table as template → automatic matching
Export data as CSV or Excel file
User learns with real objects
Self-evaluation is based on a simple metric: Word Error Rate
(the same as for the machine)
eLearning interface
Statistics
Web-interface
Every document in Transkribus will also be accessible via a web
interface suitable for involving volunteers and the crowd
txt2img tool
Many printed or digital editions are available.
Automated matching may simplify the training data production.
(only good matches will be taken for training)
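One way to picture "only good matches will be taken" is the following sketch, which is an illustration and not the project's matching algorithm. Each automatically recognised line is compared against the lines of an existing edition, and a pair is kept as training data only if its similarity clears a threshold. The 0.9 threshold and the use of `difflib` similarity are assumptions chosen for the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def good_matches(recognised: List[str], edition: List[str],
                 threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Pair each recognised line with its most similar edition line,
    keeping only pairs whose similarity ratio reaches the threshold."""
    kept = []
    for hyp in recognised:
        best_line, best_score = None, 0.0
        for ref in edition:
            score = SequenceMatcher(None, hyp, ref).ratio()
            if score > best_score:
                best_line, best_score = ref, score
        if best_line is not None and best_score >= threshold:
            kept.append((hyp, best_line, best_score))
    return kept


edition = [
    "Mo: gedaen in t'voorJaer over t'vinden vande middelen syn verscheyden differenten",
    "voorgevallen tusschen Stadt en Lande, als namentl ouer den voet ende",
]
recognised = [
    "voorgevallen tusschen Stadt en Lande, als nanent ouer den voet ende",
    "xxxx qqqq zzzz",  # a badly recognised line that should be rejected
]

for hyp, ref, score in good_matches(recognised, edition):
    print(f"{score:.2f}  {hyp!r} -> {ref!r}")
```

A stricter threshold yields less but cleaner training data; a looser one admits more pairs at the risk of teaching the engine wrong transcriptions.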
ScanApp
Researchers can use mobile phones as document scanners
(images are sent directly to their private Transkribus collection and archives may
benefit from this)
Try out?
We are happy to help you set up test projects
Signing a Memorandum of Understanding is a simple way to take
part in the project!
Credits
Hubert Alisade Hilde Boe Laurant Bolli Max Bryan Elaine Charwat Vincent
Christlein Sebastian Colutto Hervé Déjean Barbara Denicolo Markus Diem
Felix Dietrich Reko Etelävuori Stefan Fiel Basilis Gatos Beat Gnädinger Tobias
Grüning Vili Haukkovaara Gerhard Heyer Tobias Hodel Frederic Kaplan
Maria Kallio Istvan Kecskemeti Florian Kleber Roger Labahn Eva Lang Sören
Laube Gundram Leifert Georgios Louloudis Philip Kahle Rory McNicholl
Jean-Luc Meunier Johannes Michael Hannes Obermair Moises Pastor
Nathanael Philipp Hannelore Putz George Retsinas Veronica Romero Joan
Andreu Sanchez Robert Sablatnig Christian Sieber Giorgos Sfikas Philip
Schofield Louise Seaward Nikolaos Stamatopolous Tobias Strauss Melissa
Terras Alejandro Hector Toselli Enrique Vidal Mauricio Villegas Max
Weidemann Welf Wustlich Herbert Wurster and many, many more!
Thank you for your attention!
More information on the project and the Transkribus platform
http://read.transkribus.eu/
http://transkribus.eu/
http://transkribus.eu/wiki/
This project has received funding from the European Union’s
Horizon 2020 research and innovation programme under
grant agreement No 674943.

PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 

Transkribus | Günter Mühlberger

  • 1. Transkribus. A research infrastructure for transcribing, recognizing and searching archival documents Günter Mühlberger University of Innsbruck, Digitisation and Digital Preservation Group
  • 3. Voorts Hooge Mogende heeren, Inde practyque vande decisie by uw Hoge voorts hoomge Mogende heeren, Inde pra tiqune vande decasie by ende hoge
  • 4. Mo: gedaen in t'voorJaer over t'vinden vande middelen syn verscheyden differenten Mo: gedaen in t' voorJaer over t' vinden vande middelen syn verscheyden differenten
  • 5. voorgevallen tusschen Stadt en Lande, als namentl ouer den voet ende voorgevallen tusschen Stadt en Lande, als nanent ouer den voet ende
  • 7. Neural Networks are taking over. (Ray Smith, Google)
  • 8. Archives are starting to digitise their holdings. BTW: Documents in archives are unique, were never published before and contain extremely interesting content!
  • 9. Digital Humanities are (big) data driven.
  • 10. Volunteers and the crowd are willing to contribute to scientific and cultural heritage projects.
  • 12. [Diagram] TRANSKRIBUS at the centre, surrounded by four groups: HUMANITIES SCHOLARS; ARCHIVES; COMPUTER SCIENCE & TECHNOLOGY PROVIDERS; PUBLIC (VOLUNTEERS / CROWD). Diagram labels: Deliver documents; STORAGE; Work with documents; EXPERT INTERFACES; TOOLS; Advanced search; WEB UI; Contribute (crowd-sourcing); Data; Technology; Enriched documents; Get results
  • 13. Wolpertinger, Bavarian mythical creature (is everything…)
  • 15. READ • H2020 e-Infrastructure Project • Duration: 1.1.2016 – 30.6.2019 • Budget: EUR 8.2 million grant • Coordinated by the University of Innsbruck • 14 partners and more than 20 institutions connected through a Memorandum of Understanding • Main objectives • Foster research in Pattern Recognition, Machine Learning, Natural Language Processing, Digital Humanities • Set up a service platform (“Transkribus”) to make the technology available to archives, scholars and the public • Transform this research infrastructure into a permanent service
  • 16. Transkribus as platform and as expert client
  • 18. Documents in Transkribus • Private • All documents in Transkribus are first of all private: visible only to the “owner” of the document • Local • For simple operations only; the full set of services is available only for remote documents • Remote • Standard mode • Stored on the servers of the University of Innsbruck • Upload of documents • HTTP • PDF • FTP • METS link • Direct download from a repository
  • 21. Documents directly loaded from the repository – one button! Implemented by Intranda (Goobi Viewer)
  • 23. Researchers can go “shopping” and collect documents from various repositories and digital libraries in their private Transkribus collection
  • 24. Transcribe text in a reliable, secure, and machine-readable way = create a scholarly transcription → and use the text to train the HTR engines
  • 28. Finished? Write an email… The training process will be made available to users as well (but this will take some time due to a lack of resources in Innsbruck)
  • 29. HTR engine(s) – current implementation • Hidden Markov Models (HMM), already available • Training takes some hours • Recognition takes 20-60 minutes per page • Strong limitations on dictionary and image resolution • Recurrent Neural Networks (RNN), coming soon • Training takes some days • Recognition takes less than 60 seconds (!) • No limitation on image resolution • Free choice of dictionary, so less dependency on it • Main limitations for both HTR engines • Layout analysis (“line finder”) • Need for dictionaries
  • 33. What to do with the automatically recognized text? • Measure results • Correct the text • Search in the full-text • Invite people to support you in transcribing
  • 34. Measure results with Character Error Rate (CER) and Word Error Rate (WER)
  • 36. Correct text • Character Error Rates • Above 20% → correction takes as long as keying, but readers who have difficulty deciphering the script may benefit • Above 10% → correction is faster, but experienced readers prefer to key • Below 10% → correction is much faster, and even experienced readers will accept correction instead of keying • Typical figures are currently around 10% CER • Under lab conditions, significantly better results are already possible
  • 37. Search full-text • Private search: you only get results from collections of which you are a member • Faceted search • Configurable search
  • 41. Share your documents among your working group, colleagues, students and volunteers…
  • 43. Export documents • Various formats • XML (PAGE) • METS (Metadata Encoding and Transmission Standard – LoC) • ALTO (Analyzed Layout and Text Object – LoC) • DOCX • TEI (Text Encoding Initiative) • PDF • Excel • …
  • 45. How to access services via machines? Services in Transkribus are accessible via a REST interface
  • 47. What will come next? • Table editor • eLearning Interface • Web-interface for simplified transcription (crowd-sourcing) • Text2Image matching tool • ScanApp • …
  • 48. Table editor: Define a table as a template → automatic matching. Export data as CSV or Excel file
  • 50. eLearning interface: Users learn with real objects. Self-evaluation is based on a simple metric: Word Error Rate (the same as for the machine)
  • 54. Web interface: Every document in Transkribus will also be accessible via a web interface suitable for involving volunteers and the crowd
  • 56. txt2img tool Many printed or digital editions are available. Automated matching may simplify the training data production. (only good matches will be taken for training)
  • 57. ScanApp Researchers are enabled to use mobile phones as document scanners (images are sent directly to their private Transkribus collection and archives may benefit from this)
  • 59. Want to try it out? We are happy to help you set up test projects. Concluding a Memorandum of Understanding is a simple way to take part in the project!
  • 60. Credits Hubert Alisade Hilde Boe Laurant Bolli Max Bryan Elaine Charwat Vincent Christlein Sebastian Colutto Hervé Déjean Barbara Denicolo Markus Diem Felix Dietrich Reko Etelävuori Stefan Fiel Basilis Gatos Beat Gnädinger Tobias Grüning Vili Haukkovaara Gerhard Heyer Tobias Hodel Frederic Kaplan Maria Kallio Istvan Kecskemeti Florian Kleber Roger Labahn Eva Lang Sören Laube Gundram Leifert Georgios Louloudis Philip Kahle Rory McNicholl Jean-Luc Meunier Johannes Michael Hannes Obermair Moises Pastor Nathanael Philipp Hannelore Putz George Retsinas Veronica Romero Joan Andreu Sanchez Robert Sablatnig Christian Sieber Giorgos Sfikas Philip Schofield Louise Seaward Nikolaos Stamatopolous Tobias Strauss Melissa Terras Alejandro Hector Toselli Enrique Vidal Mauricio Villegas Max Weidemann Welf Wustlich Herbert Wurster and many, many more!
  • 61. Thank you for your attention! More information on the project and the Transkribus platform http://read.transkribus.eu/ http://transkribus.eu/ http://transkribus.eu/wiki/ This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 674943.
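
Slide 34 measures recognition quality with Character Error Rate (CER) and Word Error Rate (WER), and slide 50 reuses WER for self-evaluation. The slides do not spell out the formulas, so the following is only a minimal sketch assuming the common definition: the edit distance (insertions, deletions, substitutions) between a reference transcription and the recognized text, divided by the length of the reference. The function names are illustrative and not part of Transkribus. The sample line is taken from slide 5, on the assumption that the first variant is the manual transcription and the second the HTR output.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (characters or words)."""
    previous = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, ref_item in enumerate(ref, start=1):
        current = [i]  # distance from ref[:i] to empty hyp prefix
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # delete ref_item
                current[j - 1] + 1,      # insert hyp_item
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word edits per reference word (whitespace tokens)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Assumption: first string = manual transcription, second = HTR output.
reference = "voorgevallen tusschen Stadt en Lande, als namentl ouer den voet ende"
hypothesis = "voorgevallen tusschen Stadt en Lande, als nanent ouer den voet ende"

if __name__ == "__main__":
    print(f"CER: {cer(reference, hypothesis):.1%}")
    print(f"WER: {wer(reference, hypothesis):.1%}")
```

For this one line the sketch reports a CER of 2.9% (2 character edits over 68 characters) and a WER of 9.1% (1 wrong word out of 11). A single line is far too little to characterize an HTR model; in practice the rates would be computed over many transcribed pages.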