The document discusses the IMPACT Interoperability Framework (IIF), which allows various software components to work together. The IIF uses open source technologies like Java, Tomcat, and Axis2. It includes a generic web service wrapper that facilitates integration of new components. Workflows can be created in Taverna to combine components into processing pipelines. The workflows can be shared, executed, and monitored through various clients and repositories. The IIF supports scalability through an enterprise service bus and has been tested on a supercomputing cloud.
IMPACT Interoperability Framework - Clemens Neudecker
1. IMPACT Interoperability Framework
Clemens Neudecker,
National Library of the Netherlands
2. Background
IMPACT from a technical perspective:
• More than 20 software components for solving specific challenges
• Prototyping new algorithms, improving commercial solutions
• Different frameworks (C, C++, Java, etc.), platforms (Win/Linux)
• Extensible with third party applications
IMPACT Interoperability Framework (IIF)
4. Generic Web Service Wrapper
• Source code available: https://github.com/impactcentre/toolwrapper
• Facilitates easy integration: developers can focus on their application and worry less about integration, which leads to higher-quality software components
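The idea of a generic wrapper can be illustrated with a minimal sketch, assuming the wrapped component is a command-line tool described by an argument template. This is not the toolwrapper's actual code or configuration format; `ToolSpec`, `run_tool`, and the placeholder names are hypothetical and exist only for illustration.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class ToolSpec:
    """Hypothetical description of a command-line tool to be wrapped."""
    name: str
    argv_template: List[str]  # each entry may contain {placeholders}


def run_tool(spec: ToolSpec, **params: str) -> str:
    """Fill in the template, run the tool, and return its standard output."""
    # Each template entry becomes exactly one argument, so no shell quoting is needed.
    argv = [part.format(**params) for part in spec.argv_template]
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in "tool": the Python interpreter executing a one-line script.
    demo = ToolSpec(name="demo", argv_template=["{python}", "-c", "{code}"])
    print(run_tool(demo, python=sys.executable, code="print('ok')"))
```

A service layer would only need to map incoming request parameters onto the placeholders, so the tool developer never touches service code.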
6. Workflows
• OCR workflow = data pipeline
• Building blocks = processing modules
• Integration = interaction between nodes (mashups)
• Collaboration with
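The view of a workflow as a data pipeline built from processing modules can be sketched as plain function composition. This is only an illustration of the concept, not how Taverna represents workflows; the module names and the `steps` field are hypothetical, and the stand-in modules do no real image processing.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Module = Callable[[Document], Document]


def make_module(name: str) -> Module:
    """Return a stand-in processing module that only records that it ran."""
    def module(doc: Document) -> Document:
        return {**doc, "steps": [*doc["steps"], name]}
    return module


def build_workflow(modules: List[Module]) -> Module:
    """Chain the modules so that each one receives the previous one's output."""
    def workflow(doc: Document) -> Document:
        return reduce(lambda current, module: module(current), modules, doc)
    return workflow


if __name__ == "__main__":
    ocr_workflow = build_workflow(
        [make_module("binarisation"), make_module("ocr"), make_module("post-correction")]
    )
    print(ocr_workflow({"image": "page.tif", "steps": []}))
```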
9. Workflow Management
• Web 2.0 style registry: myExperiment
• Local client: Taverna Workbench
• Remote client: Project website
10. Local client: Taverna Workbench
Background:
• Life Sciences
• Developed and maintained by myGrid, UK
• Active community
Windows/Linux/OSX & source code available: http://www.taverna.org.uk/
11. Remote client: Taverna Server / Workflow Parser
• Remote execution of workflows via REST/SOAP API
• Client application for website integration
• Source code available: https://github.com/impactcentre/interfaces/taverna
12. Repository integration
• Custom WebDAV service for IMPACT:
– Configurable storage of result sets & provenance
– Fully interoperable, since HTTP-based
– Report API
– Source code available
• Integration with PRIMA image & ground truth repository
• Connectors for Fedora digital object repository
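Because WebDAV is plain HTTP with a few extra methods, any HTTP client can talk to such a repository. The sketch below only builds the requests and does not send them; the base URL and paths are hypothetical, and nothing here reflects the specific layout or Report API of the IMPACT service.

```python
import urllib.request

# Hypothetical repository location, used only for illustration.
BASE_URL = "http://repository.example.org/webdav/"


def build_store_request(path: str, payload: bytes, content_type: str) -> urllib.request.Request:
    """Build an HTTP PUT request that would store a result file."""
    return urllib.request.Request(
        BASE_URL + path,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def build_listing_request(path: str) -> urllib.request.Request:
    """Build a WebDAV PROPFIND request that would list a collection's direct members."""
    return urllib.request.Request(
        BASE_URL + path,
        method="PROPFIND",
        headers={"Depth": "1"},  # "1" = the collection and its immediate children
    )


if __name__ == "__main__":
    request = build_store_request("results/page-0001.txt", b"recognised text", "text/plain")
    print(request.get_method(), request.full_url)
    # Passing the request to urllib.request.urlopen() would actually send it.
```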
13. Community
• Web2.0 style workflow registry
• Discover, rate, tag, review
• Community of experts
• Sharing of resources
• Knowledge exchange
Central meeting point for users & researchers, tools & data
14. Scalability
• Enterprise Service Bus receives requests from users and distributes the load to the available worker nodes
• Main effects: process parallelization, load distribution, failover, monitoring
• Tested on Dutch Supercomputing Cloud HPC
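Load distribution, failover, and monitoring can be illustrated with a toy in-process dispatcher. This is a conceptual sketch only, not an enterprise service bus and not the IIF's configuration; the `Dispatcher` class and the worker names are hypothetical, and requests are handled one at a time, so parallelization is left out.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


class Dispatcher:
    """Round-robin dispatcher that skips failing workers and logs every attempt."""

    def __init__(self, workers: Dict[str, Worker]) -> None:
        self._workers = workers
        self._names = list(workers)
        self._next = 0  # index of the worker that is tried first for the next request
        self.log: List[Tuple[str, str, str]] = []  # (request, worker, status)

    def submit(self, request: str) -> str:
        count = len(self._names)
        for offset in range(count):
            name = self._names[(self._next + offset) % count]
            try:
                result = self._workers[name](request)
            except Exception:
                # Failover: record the failure and try the next worker.
                self.log.append((request, name, "failed"))
                continue
            self.log.append((request, name, "ok"))
            # Load distribution: the next request starts at the following worker.
            self._next = (self._next + offset + 1) % count
            return result
        raise RuntimeError(f"no worker could process {request!r}")
```

The `log` list stands in for monitoring: it records which worker handled or failed each request.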
15. Evaluation
• Text-based comparison of result with ground truth, using the Levenshtein distance
• Layout-based comparison of result with ground truth, using the Page Analysis and Ground Truth Elements (PAGE) framework
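The text-based comparison can be sketched with a standard dynamic-programming Levenshtein distance. This is an illustrative implementation, not the project's evaluation tool; the `character_accuracy` score derived from the distance is an assumed definition chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def character_accuracy(result: str, ground_truth: str) -> float:
    """Assumed score: 1 minus edit distance over ground-truth length, floored at 0."""
    if not ground_truth:
        return 1.0 if not result else 0.0
    return max(0.0, 1.0 - levenshtein(result, ground_truth) / len(ground_truth))


if __name__ == "__main__":
    print(levenshtein("Tne quick", "The quick"))  # one substituted character
```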
16. Outlook
• Extending the scope:
– Workflows for linguistic analysis: CLARIN
– Workflows for digital preservation: SCAPE
• Even better scalability: MapReduce/Hadoop
• Supported by a community of developers & practitioners in the Centre of Competence
Interested? Get in touch!
http://www.digitisation.eu/contact-us/