SlideShare una empresa de Scribd logo
1 de 17
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
An Experimental Workflow Development 
Platform for Historical Document Digitisation 
Clemens Neudecker, KB National Library of the Netherlands
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Background 
 IMPACT – Improving Access to Text (2008 – 2011) 
From a technical perspective: 
> 20 software components for solving specific issues 
Prototyping new algorithms, improving commercial solutions 
Different frameworks (C, C++, Java, etc.), platforms (Win/Linux) 
+ 3rd party applications 
“One ring to rule them all…” 
 IMPACT Interoperability Framework (IIF)
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Main requirements 
Behavioural: 
 Minimize integration effort 
 Minimize deployment effort 
 Maximize usability 
 Maximize scalability 
Functional: 
 Modular 
 Transparent 
 Expandable 
 Open source 
 Platform independent
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Framework integration 
 Simple to use generic command line wrapper for web services
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Architecture 
 IMPACT Interoperability Framework: Technologies 
- Java 
- Apache Maven 
- Apache Tomcat 
- Apache Axis2+Synapse 
- Taverna Workflow Engine 
 IMPACT Interoperability Framework: Dataset 
- more than 600.000 images from digital libraries 
- more than 50.000 ground truth transcriptions
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Generic Web Service Wrapper 
Only requirement: Command Line Application  HTML form 
Source code available on github: 
https://github.com/impactcentre/toolwrapper 
 Easy integration: developers can focus on their application 
and have to worry less about integration = 
higher quality software components
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Workflows 
 OCR workflow = 
data pipeline 
 Building blocks = 
processing modules 
 Integration = 
interaction between 
nodes (mashups) 
 Collaboration with
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Workflow management 
 Web 2.0 style registry: myExperiment 
 Local client: Taverna Workbench 
 Web client: Project website
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Local client: Taverna Workbench 
 Background: 
BioSciences 
 Developed and 
maintained by 
myGrid, UK 
 Available for 
Windows/Linux/OSX 
and as open source 
(Java)
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Web client: Taverna Server/ 
Workflow Parser 
 SOAP/REST API 
 Remote execution of workflows
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Community 
 Web2.0 style workflow registry 
 Community of experts 
 Sharing of resources 
 Knowledge exchange 
 A central meeting point 
for users and researchers
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Compute cluster 
 Enterprise Service Bus 
receives requests from 
users and distributes 
the load to the available 
worker nodes 
 Main effect: 
Process parallelization, 
Load distribution, 
Fail over 
 Test deployment on Dutch Supercomputing Cloud HPC
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Dataset 
 Representative and annotated dataset of significant size, with 
metadata, ground truth and search facilities
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Evaluation features 
 Text based comparison of result with ground truth, 
using Levenshtein distance method 
 Layout based comparison of result with ground truth, 
using the Page Analysis And Ground Truth Elements Framework 
 Example:
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Outlook 
 Online service for testing/evaluation/processing 
Results Repository (WebDAV, POI) 
 Extending the scope: 
Workflows for linguistic analysis: CLARIN 
Workflows for preservation: SCAPE 
 Even better scalability: MapReduce/Hadoop 
 Supported by a community of developers & practitioners
IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. 
Summary 
- Availability of resources (images, ground truth and tools) 
to the international research community 
- A common baseline for transparent evaluation and comparison 
- Ready-to-use components, reproducible experiments 
- Sharing of results and know-how 
- Enable scalability for prototypes/data intensive workflows 
- Simple and uniform user interface for all embedded tools 
- Consolidation of support and maintenance 
Thank you! 
Questions?

Más contenido relacionado

La actualidad más candente

Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02walter koch
 
G02 walter koch_micro_services
G02 walter koch_micro_servicesG02 walter koch_micro_services
G02 walter koch_micro_servicesevaminerva
 
Governments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overviewGovernments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overviewGovernments ENabled with IPv6
 
Up2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introductionUp2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introductionUp2Universe
 
Heterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth ProjectHeterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth ProjectBig Data Value Association
 
ICN in the IRTF and IETF
ICN in the IRTF and IETFICN in the IRTF and IETF
ICN in the IRTF and IETFDirk Kutscher
 
SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014Álvaro Sicilia
 
Polyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVMPolyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVMESUG
 
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF toolsCIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF toolsPieter Pauwels
 
Lime recommendation
Lime recommendationLime recommendation
Lime recommendationJohn Pereira
 
CV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect anCV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect anJurriaan Brandsma
 
VIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontologyVIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontologyValeria Pesce
 

La actualidad más candente (20)

Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02Amersfoort 2016 koch_wg_v02
Amersfoort 2016 koch_wg_v02
 
G02 walter koch_micro_services
G02 walter koch_micro_servicesG02 walter koch_micro_services
G02 walter koch_micro_services
 
FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2
 
Governments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overviewGovernments ENabled through IPv6 - GEN6 - project overview
Governments ENabled through IPv6 - GEN6 - project overview
 
Up2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introductionUp2U Workshop at TNC 2018-introduction
Up2U Workshop at TNC 2018-introduction
 
Fire at Net Futures2015
Fire at Net Futures2015Fire at Net Futures2015
Fire at Net Futures2015
 
Heterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth ProjectHeterogeneous HPC Computing in the DeepHealth Project
Heterogeneous HPC Computing in the DeepHealth Project
 
Deep Hybrid DataCloud
Deep Hybrid DataCloudDeep Hybrid DataCloud
Deep Hybrid DataCloud
 
AntoineLambertResume
AntoineLambertResumeAntoineLambertResume
AntoineLambertResume
 
FIRE at the ICT2015
FIRE at the ICT2015FIRE at the ICT2015
FIRE at the ICT2015
 
FIRE Brochure 2014 multimedia eBook -version
FIRE Brochure 2014 multimedia eBook -versionFIRE Brochure 2014 multimedia eBook -version
FIRE Brochure 2014 multimedia eBook -version
 
ICN in the IRTF and IETF
ICN in the IRTF and IETFICN in the IRTF and IETF
ICN in the IRTF and IETF
 
SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014SEMANCO poster at ESWC 2014
SEMANCO poster at ESWC 2014
 
Polyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVMPolyglot Notebooks with Squeak/Smalltalk on the GraalVM
Polyglot Notebooks with Squeak/Smalltalk on the GraalVM
 
TANGO Project Poster v2
TANGO Project Poster v2TANGO Project Poster v2
TANGO Project Poster v2
 
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF toolsCIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
CIB W78 Accelerating BIM Workshop 2015 - IFC2RDF tools
 
Lime recommendation
Lime recommendationLime recommendation
Lime recommendation
 
CV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect anCV Senior Integratie en Cloud Architect an
CV Senior Integratie en Cloud Architect an
 
The European Portal for documents and Archives: the APEnet Project
The European Portal for documents and Archives: the APEnet ProjectThe European Portal for documents and Archives: the APEnet Project
The European Portal for documents and Archives: the APEnet Project
 
VIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontologyVIVOCamp slides: agenda and slides on the extension of the ontology
VIVOCamp slides: agenda and slides on the extension of the ontology
 

Destacado

OCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTcneudecker
 
Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshellcneudecker
 
The IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondcneudecker
 
Experimental Workflow Development in Digitisation
Experimental Workflow Development in DigitisationExperimental Workflow Development in Digitisation
Experimental Workflow Development in Digitisationcneudecker
 
Succeed 2nd hackathon
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathoncneudecker
 
Collaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanitiescneudecker
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspaperscneudecker
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Onlinecneudecker
 
Bessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognitioncneudecker
 
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...cneudecker
 
IMPACT HPC Cloud Day
IMPACT HPC Cloud DayIMPACT HPC Cloud Day
IMPACT HPC Cloud Daycneudecker
 
Digitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliothekencneudecker
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoopcneudecker
 
Berliner DH Rundgang
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgangcneudecker
 
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...cneudecker
 
Preservation Workflows with Taverna
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Tavernacneudecker
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?cneudecker
 

Destacado (17)

OCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACTOCR challenges in historic documents and the contribution of IMPACT
OCR challenges in historic documents and the contribution of IMPACT
 
Europeana Newspapers in a nutshell
Europeana Newspapers in a nutshellEuropeana Newspapers in a nutshell
Europeana Newspapers in a nutshell
 
The IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyondThe IMPACT Interoperability Framework - Workflows for OCR and beyond
The IMPACT Interoperability Framework - Workflows for OCR and beyond
 
Experimental Workflow Development in Digitisation
Experimental Workflow Development in DigitisationExperimental Workflow Development in Digitisation
Experimental Workflow Development in Digitisation
 
Succeed 2nd hackathon
Succeed 2nd hackathonSucceed 2nd hackathon
Succeed 2nd hackathon
 
Collaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital HumanitiesCollaborative Workflow Development and Experimentation in the Digital Humanities
Collaborative Workflow Development and Experimentation in the Digital Humanities
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
 
Europeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers OnlineEuropeana Newspapers - the Gateway to European Newspapers Online
Europeana Newspapers - the Gateway to European Newspapers Online
 
Bessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity RecognitionBessere Suchergebnisse durch Named Entity Recognition
Bessere Suchergebnisse durch Named Entity Recognition
 
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
Climbing the Tower of Babel: Challenges and Opportunities in Multilingual Dat...
 
IMPACT HPC Cloud Day
IMPACT HPC Cloud DayIMPACT HPC Cloud Day
IMPACT HPC Cloud Day
 
Digitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in BibliothekenDigitale Kuratierungstechnologien in Bibliotheken
Digitale Kuratierungstechnologien in Bibliotheken
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
 
Berliner DH Rundgang
Berliner DH RundgangBerliner DH Rundgang
Berliner DH Rundgang
 
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
Neudecker who-cares-about-yesterday’s-news-–-use-cases-and-requirements-for-n...
 
Preservation Workflows with Taverna
Preservation Workflows with TavernaPreservation Workflows with Taverna
Preservation Workflows with Taverna
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?
 

Similar a IMPACT at OCR Summit

IMPACT Demo Dag at KB
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KBcneudecker
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerBiblioteca Nacional de España
 
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerIMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerBiblioteca Nacional de España
 
ECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsPaolo Nesi
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE Project
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbSCAPE Project
 
Application scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibraryApplication scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibrarySven Schlarb
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsSCAPE Project
 
Wonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual WorkplaceWonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual Workplacevincenzo de simone
 
Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01walter koch
 
Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7Fire Conference 2010
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01The European Library
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudTU Delft, Netherlands
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Project
 
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowdFranco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowdEOSC-hub project
 
RNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesRNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesEUBrasilCloudFORUM .
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesMichael Day
 

Similar a IMPACT at OCR Summit (20)

IMPACT Demo Dag at KB
IMPACT Demo Dag at KBIMPACT Demo Dag at KB
IMPACT Demo Dag at KB
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens Neudecker
 
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens NeudeckerIMPACT Interoperability and Evaluation Framework. Clemens Neudecker
IMPACT Interoperability and Evaluation Framework. Clemens Neudecker
 
ECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming artsECLAP White paper, social network for Cultural Heritage on Peforming arts
ECLAP White paper, social network for Cultural Heritage on Peforming arts
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven Schlarb
 
Application scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National LibraryApplication scenarios of the SCAPE project at the Austrian National Library
Application scenarios of the SCAPE project at the Austrian National Library
 
Bne impact co_c
Bne impact co_cBne impact co_c
Bne impact co_c
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
 
Wonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual WorkplaceWonderland @ Cattid - Sun's Virtual Workplace
Wonderland @ Cattid - Sun's Virtual Workplace
 
Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01Cbsdl 2015 koch_wg_v01
Cbsdl 2015 koch_wg_v01
 
Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7Per Blixt - Fire results from call 5 and plans for call 7
Per Blixt - Fire results from call 5 and plans for call 7
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
 
FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2FIRE slideshow @ECFI-2
FIRE slideshow @ECFI-2
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowdFranco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
Franco Niccolucci: Example of an EOSCpilot Science Demonstrator - TextCrowd
 
IMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEPIMPACT Final Conference - Muehlberger - FEP
IMPACT Final Conference - Muehlberger - FEP
 
RNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challengesRNP Cloud Infrastructure model, services and challenges
RNP Cloud Infrastructure model, services and challenges
 
The Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiativesThe Improving Access to Text (IMPACT) project and other European initiatives
The Improving Access to Text (IMPACT) project and other European initiatives
 

Más de cneudecker

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Librarycneudecker
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltextecneudecker
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungencneudecker
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?cneudecker
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspaperscneudecker
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...cneudecker
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritagecneudecker
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenzcneudecker
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-Dcneudecker
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspaperscneudecker
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...cneudecker
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...cneudecker
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentscneudecker
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Miningcneudecker
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltextecneudecker
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europecneudecker
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minutencneudecker
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshellcneudecker
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlincneudecker
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspaperscneudecker
 

Más de cneudecker (20)

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Library
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltexte
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungen
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspapers
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritage
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenz
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-D
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspapers
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documents
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Mining
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltexte
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europe
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minuten
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshell
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlin
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspapers
 

Último

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Último (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

IMPACT at OCR Summit

  • 1. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. An Experimental Workflow Development Platform for Historical Document Digitisation Clemens Neudecker, KB National Library of the Netherlands
  • 2. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Background  IMPACT – Improving Access to Text (2008 – 2011) From a technical perspective: > 20 software components for solving specific issues Prototyping new algorithms, improving commercial solutions Different frameworks (C, C++, Java, etc.), platforms (Win/Linux) + 3rd party applications “One ring to rule them all…”  IMPACT Interoperability Framework (IIF)
  • 3. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Main requirements Behavioural:  Minimize integration effort  Minimize deployment effort  Maximize usability  Maximize scalability Functional:  Modular  Transparent  Expandable  Open source  Platform independent
  • 4. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Framework integration  Simple to use generic command line wrapper for web services
  • 5. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Architecture  IMPACT Interoperability Framework: Technologies - Java - Apache Maven - Apache Tomcat - Apache Axis2+Synapse - Taverna Workflow Engine  IMPACT Interoperability Framework: Dataset - more than 600.000 images from digital libraries - more than 50.000 ground truth transcriptions
  • 6. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Generic Web Service Wrapper Only requirement: Command Line Application  HTML form Source code available on github: https://github.com/impactcentre/toolwrapper  Easy integration: developers can focus on their application and have to worry less about integration = higher quality software components
  • 7. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Workflows  OCR workflow = data pipeline  Building blocks = processing modules  Integration = interaction between nodes (mashups)  Collaboration with
  • 8. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
  • 9. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Workflow management  Web 2.0 style registry: myExperiment  Local client: Taverna Workbench  Web client: Project website
  • 10. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Local client: Taverna Workbench  Background: BioSciences  Developed and maintained by myGrid, UK  Available for Windows/Linux/OSX and as open source (Java)
  • 11. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Web client: Taverna Server/ Workflow Parser  SOAP/REST API  Remote execution of workflows
  • 12. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Community  Web2.0 style workflow registry  Community of experts  Sharing of resources  Knowledge exchange  A central meeting point for users and researchers
  • 13. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Compute cluster  Enterprise Service Bus receives requests from users and distributes the load to the available worker nodes  Main effect: Process parallelization, Load distribution, Fail over  Test deployment on Dutch Supercomputing Cloud HPC
  • 14. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Dataset  Representative and annotated dataset of significant size, with metadata, ground truth and search facilities
  • 15. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Evaluation features  Text based comparison of result with ground truth, using Levenshtein distance method  Layout based comparison of result with ground truth, using the Page Analysis And Ground Truth Elements Framework  Example:
  • 16. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Outlook  Online service for testing/evaluation/processing Results Repository (WebDAV, POI)  Extending the scope: Workflows for linguistic analysis: CLARIN Workflows for preservation: SCAPE  Even better scalability: MapReduce/Hadoop  Supported by a community of developers & practitioners
  • 17. IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands. Summary - Availability of resources (images, ground truth and tools) to the international research community - A common baseline for transparent evaluation and comparison - Ready-to-use components, reproducible experiments - Sharing of results and know-how - Enable scalability for prototypes/data intensive workflows - Simple and uniform user interface for all embedded tools - Consolidation of support and maintenance Thank you! Questions?