Regal - a Repository for Electronic Documents and Bibliographic Data
1. graphthinking
Felix Ostrowski (graphthinking, @literarymachine)
Jan Schnasse (hbz, @InspektorHicks)
ELAG, June 11th 2014, University of Bath
2. graphthinking
Rationale: A new foundation for
Edoweb
● A system to gather, describe and archive
deposit copies of electronic publications
and websites on behalf of the State Library
Center of Rhineland-Palatinate (LBZ)
● Operated by the North Rhine-Westphalian
Library Service Center (hbz) since 2002
● Technical evolution: OPUS – Digitool – regal
3. graphthinking
The current system and its
shortcomings: Digitool
● Digitool end-of-life is coming
● Unwanted/unexpected dependencies on other projects
hosted on the same Digitool instance
● Performance issues (we have millions of objects in
Digitool)
● No easily configurable search indexes or OAI-PMH
interfaces for single collections
● No out-of-the-box support for regional requirements (e.g.
metadata delivery to the German National Library); extra
money and developer hours needed
4. graphthinking
The current system and its
shortcomings: Homemade
● Mix of self-developed and Ex Libris components
● Vicious circle
– introduction of workarounds
– unpredictable migration costs
– decision to stay on obsolete version
– running out of support
– introduction of workarounds
● Administrative responsibilities in different hbz
working groups
6. graphthinking
The following aspects are
mandatory to achieve our goals
● Increase the overall performance
● Provide an up-to-date, modern user interface
● Use open source software (Fedora, Elasticsearch, Drupal)
● Seamlessly import (meta-)data from Digitool and potentially other
(repository) systems
● Integrate the system with the emerging Linked-Open-Data
ecosystem, especially authority data
● Loosen the tight integration with Ex Libris Aleph
● Expose (meta-)data for easy discovery & re-use by others.
7. graphthinking
Overview of the new architecture
regal (backend)
Fedora Elasticsearch
regal-drupal (frontend)
Ex Libris
Aleph
lobid API
8. graphthinking
Data model
● Simple hierarchical data model consisting of nodes
associated via hasPart and partOf relations
● Each node is identified by a namespace
combined with a Universally Unique Identifier
(UUID)
● Each node can have a bitstream and a metadata
stream
● Metadata canonically stored as RDF N-Triples
● Bitstream can contain arbitrary data
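
The data model above can be sketched in a few lines of Python. This is only an illustration, not the regal implementation: the `Node` class, the `edoweb` namespace, and the `example.org` URIs are assumptions made for the example; the slides name only the hasPart and partOf relations and the namespace-plus-UUID identifiers.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed base URIs, chosen for illustration only.
BASE = "http://example.org/resource/"
REL = "http://example.org/rel/"


@dataclass(eq=False)
class Node:
    """A node in the hierarchy, identified by namespace + UUID."""
    namespace: str
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)
    bitstream: Optional[bytes] = None  # arbitrary data, may be absent

    @property
    def pid(self) -> str:
        # Identifier of the form "<namespace>:<uuid>"
        return f"{self.namespace}:{self.uid}"

    def add_child(self, child: "Node") -> "Node":
        # Record both directions of the relation.
        child.parent = self
        self.children.append(child)
        return child

    def relation_triples(self) -> List[str]:
        """Serialize the node's hasPart/partOf relations as N-Triples lines."""
        subject = f"<{BASE}{self.pid}>"
        lines = [
            f"{subject} <{REL}hasPart> <{BASE}{c.pid}> ."
            for c in self.children
        ]
        if self.parent is not None:
            lines.append(f"{subject} <{REL}partOf> <{BASE}{self.parent.pid}> .")
        return lines


# A journal with one volume; a monograph would only have file children.
journal = Node("edoweb")
volume = journal.add_child(Node("edoweb"))
for line in journal.relation_triples() + volume.relation_triples():
    print(line)
```

Storing the relation on both nodes keeps traversal cheap in either direction, at the cost of having to update two nodes whenever the hierarchy changes.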
10. graphthinking
Fedora (3.7.1)
● mainly used to organize and associate
multiple datastreams and their versions
● provides a long term accessible data storage
● usage of Proai as OAI-PMH solution
11. graphthinking
Elasticsearch (1.1.0)
● Used to provide performant lookup (for
metadata and full-text)
● Stores compacted JSON-LD
● Faceting can be used to browse the collection
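
As a rough sketch of what "compacted JSON-LD plus faceting" can look like: the document below and its field names (`title`, `medium`) are invented for illustration and are not taken from the Edoweb index. The request body uses the terms-facet syntax of Elasticsearch 1.x; later Elasticsearch versions replace facets with aggregations. Nothing here contacts a server; `local_facet_counts` only mimics what a terms facet counts.

```python
import json
from collections import Counter

# Hypothetical compacted JSON-LD document; terms and values are illustrative.
doc = {
    "@context": {
        "title": "http://purl.org/dc/terms/title",
        "medium": "http://purl.org/dc/terms/medium",
    },
    "@id": "http://example.org/resource/edoweb:1234",
    "title": "Example monograph",
    "medium": "online",
}


def terms_facet_body(field, size=10):
    """Build an Elasticsearch 1.x search body with one terms facet."""
    return {
        "query": {"match_all": {}},
        "facets": {field: {"terms": {"field": field, "size": size}}},
    }


def local_facet_counts(docs, field):
    """Count values of `field` across docs, as a terms facet would."""
    return Counter(d[field] for d in docs if field in d)


print(json.dumps(terms_facet_body("medium"), indent=2))
print(local_facet_counts([doc], "medium"))
```

Because compaction turns full property URIs into short keys, the facet can be requested on a plain field name such as `medium`.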
12. graphthinking
Backend / API
● Java Web API (RESTful) implemented with
Jersey
● Abstracts access to storage & indexing,
transparently updates Fedora and different
Elasticsearch indexes
● Provides resources as OAI-ORE aggregations
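
The idea of one API that transparently updates storage and several indexes can be sketched as follows. The real backend is a Java/Jersey application; this Python sketch uses in-memory stand-ins for Fedora and Elasticsearch, and all class and method names (`Repository`, `InMemoryStore`, `InMemoryIndex`) are hypothetical.

```python
class InMemoryStore:
    """Stand-in for the storage layer (Fedora in the real system)."""

    def __init__(self):
        self.objects = {}

    def save(self, pid, metadata, bitstream):
        self.objects[pid] = {"metadata": metadata, "bitstream": bitstream}

    def remove(self, pid):
        self.objects.pop(pid, None)


class InMemoryIndex:
    """Stand-in for one search index (Elasticsearch in the real system)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, pid, metadata):
        self.docs[pid] = metadata

    def remove(self, pid):
        self.docs.pop(pid, None)


class Repository:
    """Single entry point: callers never touch store or indexes directly."""

    def __init__(self, store, indexes):
        self.store = store
        self.indexes = list(indexes)

    def put(self, pid, metadata, bitstream=None):
        # Write to storage first, then propagate to every index.
        self.store.save(pid, metadata, bitstream)
        for idx in self.indexes:
            idx.index(pid, metadata)

    def delete(self, pid):
        self.store.remove(pid)
        for idx in self.indexes:
            idx.remove(pid)


repo = Repository(InMemoryStore(), [InMemoryIndex("public"), InMemoryIndex("staff")])
repo.put("edoweb:1234", {"title": "Example monograph"}, b"%PDF-...")
repo.delete("edoweb:1234")
```

Keeping the fan-out in one place means a new index can be added without changing any caller.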
13. graphthinking
Drupal Frontend
● Re-use of common features
– User management
– Template system
– Field API
– RDF Mappings
– HTML Form API
● Extended with custom modules for
– Storage Backend
– Linked Data Fields
– JavaScript UI enhancements
29. graphthinking
Possible child nodes: in the case
of a monograph, these are
only files. Journals provide more
complex structures (volumes,
issues, articles).
38. graphthinking
Obstacles encountered / lessons
learned: Drupal
● is designed to be standalone, so we basically
have two backends
● its HTML Form API can be awkward to work
with if you don't want to do things the
"Drupal-way"
● a pure JavaScript / HTML5 frontend might
replace Drupal in upcoming versions
39. graphthinking
Obstacles encountered / lessons
learned: Fedora
● is more of an infrastructure than a storage
system
● because of its complexity, we consider
authorization via XACML a big disadvantage
● OAI-PMH is also not supported very well
● we are still looking for a more lightweight
solution
● perhaps as lightweight as simply using the file
system for both bitstreams and metadata
40. graphthinking
Obstacles encountered / lessons
learned: Elasticsearch
● Works very well with JSON-LD in general
● but needs some care to create proper
mappings
● and could use a more generic notion of
relations than only parent/child.
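
To illustrate the kind of care the mappings need, here is a minimal sketch in the Elasticsearch 1.x mapping syntax. The type names (`journal`, `volume`) and the choice of fields are assumptions for the example, not the project's actual mappings. Identifier fields such as `@id` are marked `not_analyzed` so that URIs are matched as whole values rather than being split into tokens, and `_parent` declares the single parent/child relation that Elasticsearch offers.

```python
import json


def build_mappings(parent_type, child_type, exact_fields):
    """Build an index-creation body with exact-match fields and a _parent link."""
    def props():
        # "string" + "not_analyzed" is the 1.x way to index a value verbatim.
        return {f: {"type": "string", "index": "not_analyzed"} for f in exact_fields}

    return {
        "mappings": {
            parent_type: {"properties": props()},
            child_type: {
                "_parent": {"type": parent_type},
                "properties": props(),
            },
        }
    }


# Hypothetical type names; "@id" is the JSON-LD node identifier.
body = build_mappings("journal", "volume", ["@id"])
print(json.dumps(body, indent=2))
```

Relations other than parent/child, for example links to authority records, would have to be stored as plain identifier fields and resolved by the application.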
42. graphthinking
Good news: Linked Data Works!
● regal / Edoweb is not a research project,
● it is integrated into the hbz IT landscape,
● it is on the web,
● it does not require expertise in Linked Data,
● and real librarians will use it to create real
catalog entries.