A walk-through of a framework, based around the distinctions between Abstraction, Implementation, and Audience, for considering the value and utility of data modeling patterns and paradigms in cultural heritage information systems. The focus is on CIDOC-CRM, BibFrame, RiC-CM/RiC-O, EDM, and IIIF, with the intent to demonstrate best practices and anti-patterns in modeling.
@azaroth42 · rsanderson@getty.edu
IIIF: Interoperability, Abstractions & Audiences
Digital Heritage

Digital:
• Non-rivalrous: use/consumption by an individual does not reduce use by others
• Consistency is Good: experience is better if the product reuses known interaction patterns
• Fewer is Better: having few highly functional and usable digital products improves community and sustainability

Cultural:
• Rivalrous: use/consumption by one individual reduces simultaneous use by others
• Diversity is Good: experience is better if the resource is novel, innovative and emotive
• More is Better: having many diverse cultural heritage resources available improves impact for the user
Model vs Ontology
(the Ontology encodes the Model)

• Single, new namespace or reuse terms?
  • Easier argument that new terms are needed, as other terms reflect other conceptual models
  • Reuse of terms can take place downstream
• Opaque term names vs human-readable?
  • Human-readable! The model gives the abstraction; the ontology can be encoded in different ways if needed
• Ontology can use technology features
  • Ontology encodes, not defines, the model
  • RDFS vs OWL; json-schema vs xml-schema
  • Property Graphs, Named Graphs, simple Trees
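The "encodes, not defines" point above can be sketched in code: the same model-level statement carried by two different encodings. All term and instance names here are hypothetical illustrations, not any real ontology.

```python
# The same conceptual statement ("object 1 is named 'Night Watch'")
# encoded two ways. Names and URIs below are invented for illustration.

# Encoding 1: flat RDF-style triples
triples = [
    ("ex:object/1", "ex:has_name", "ex:name/1"),
    ("ex:name/1", "ex:content", "Night Watch"),
]

# Encoding 2: a nested JSON-style tree
tree = {
    "id": "ex:object/1",
    "identified_by": [{"type": "Name", "content": "Night Watch"}],
}

def name_from_triples(graph, subject):
    """Follow has_name from the subject, then read the name's content."""
    for s, p, o in graph:
        if s == subject and p == "ex:has_name":
            for s2, p2, o2 in graph:
                if s2 == o and p2 == "ex:content":
                    return o2
    return None
```

Both encodings carry the same assertion, so a converter between them preserves meaning: semantic interoperability at the model level, even without technical interoperability at the syntax level.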
Ontology vs Vocabulary
(the Ontology encodes the Model; the Vocabulary refines the Ontology)

• Separation of Knowledge Management?
  • Model and Ontology should be lean and general to ensure breadth of use
  • Modelers are often not domain experts, and domain thesauri exist outside of any model
• Every Identity, Its Ontology
  • Description of the vocabulary entity requires an ontology, so the separation is complex
  • Concepts (AAT, ICONCLASS, LCSH) are more appropriate than Things (ULAN, VIAF, TGN, Geonames, …)
• Model needs to recognize Vocabulary
  • Need to have the right slots for external vocab terms
  • Otherwise the ontology will take up the responsibility
CIDOC-CRM and Vocabularies

P2 has type (is type of):
“This property allows sub typing of CIDOC CRM entities - a form of specialization - through the use of a terminological hierarchy, or thesaurus. The CIDOC CRM is intended to focus on the high-level entities and relationships needed to describe data structures. Consequently, it does not specialize entities any further than is required for this immediate purpose.” *
* This is very debatable ;)

Every entity can have an external classification, keeping the model lean. Examples:
• Human Made Object: Painting, Brush, Book, XRF Scanner, …
• Identifier: DOI, ISBN, Local, Accession Number, …
• Dimension: Height, Width, Duration, File Size, …
• Linguistic Object: Description, Article, Abstract, Chapter, …
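As a minimal sketch of this pattern, here is a CRM-style external classification in Linked Art JSON-LD shape. The AAT term for "paintings" and the Linked Art context URI are real; the object URI and label are hypothetical.

```python
# A generic model class, made specific via an external vocabulary term
# rather than a Painting subclass. Object URI and label are invented.
painting = {
    "@context": "https://linked.art/ns/v1/linked-art.json",
    "id": "https://example.org/object/1",  # hypothetical identifier
    "type": "HumanMadeObject",             # lean, generic model class
    "_label": "Example Painting",
    "classified_as": [
        {
            "id": "http://vocab.getty.edu/aat/300033618",  # AAT: paintings
            "type": "Type",
            "_label": "Painting",
        }
    ],
}

# Specificity lives in the vocabulary reference, not the ontology.
classifications = [t["_label"] for t in painting["classified_as"]]
```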
BibFrame, RiC-O and Vocabularies

• BibFrame relies on the Ontology for classification…
  Examples:
  • 41 Identifier subclasses: Ansi, AudioTake, Barcode, Coden, …
  • 9 Digital Characteristics: EncodingFormat, ObjectCount, Resolution, …
  • 6 Titles: Title, KeyTitle, VariantTitle, AbbreviatedTitle, ParallelTitle, …
• RiC-O expands a small set of model classes into a long list of ontology classes that could easily be solved with vocabulary…
  Examples:
  • 47 Relation subclasses: AccumulationRelation, … WorkRelation
  • 14 Type subclasses: ActivityType … RoleType
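To make the contrast concrete, here is a sketch of the same barcode identifier described both ways. The BIBFRAME namespace and bf:Barcode class are real; the instance URIs and the vocabulary term URI are hypothetical.

```python
# Ontology-subclass approach (BIBFRAME style): the kind of identifier is
# a class in the ontology itself.
subclass_style = {
    "@id": "https://example.org/identifier/1",  # hypothetical
    "@type": "http://id.loc.gov/ontologies/bibframe/Barcode",
    "rdf:value": "123456789",
}

# Vocabulary approach: one generic Identifier class, refined by an
# external term (this vocab URI is invented for illustration).
vocabulary_style = {
    "id": "https://example.org/identifier/1",
    "type": "Identifier",
    "content": "123456789",
    "classified_as": [
        {"id": "https://example.org/vocab/barcode",
         "type": "Type", "_label": "Barcode"}
    ],
}
```

With the vocabulary approach, supporting a new kind of identifier means adding a vocabulary entry, not releasing a new version of the ontology.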
Separating Abstraction and Implementation?

• Model, Ontology, Vocabulary
  • The available classes, properties and instances
• API
  • How to interact with the data over the network
• LOD does not separate Model / Ontology and API, by requiring the syntax to directly reflect the ontology
• LOD does not separate Vocabulary and API, by requiring the terms to be instances (as above)
• Solutions?
Case Study: Linked Art
A Linked Open Usable Data profile for cultural heritage,
collaboratively designed to work across organizations,
that is easy to publish and use in consuming applications.
Design Principles:
• Focused on Usability, not 100% precision / completeness
• Consistently solves actual challenges from real data
• Development is iterative, as new use cases are found
• Solve 90% of the use cases with 10% of the effort
• https://linked.art/
Linked Art Collaboration
Formalization of the profile in ICOM, funded by Kress & AHRC
• Getty
• Rijksmuseum
• Metropolitan Museum of Art
• Smithsonian
• MoMA
• V&A
• NGA
• Philadelphia Art Museum
• Indianapolis Art Museum
• The Frick Collection
• Princeton University
• Yale University
• Oxford University
• Academia Sinica
• ETH Zurich
• FORTH
• University of the Arts, London
• Canadian Heritage Info. Network
• American Numismatics Society
• Europeana
Profile vs API

A Profile is a selection of appropriate abstractions, to encode the scope of what can be described.
An API is a selection of appropriate technologies, to give access to the data managed using the profile.

Scope:
• Classes
• Properties and Relationships
• Structure of Graph
• Vocabulary Terms

Access:
• Document format(s)
• Document structure and boundary
• URI patterns
• Operations: CRUD, Browse, Search
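A hypothetical sketch of two of the access-side decisions: URI patterns per profile class, with one document per entity as the boundary. The base URI, class names, and patterns below are illustrative assumptions, not any real API.

```python
# Hypothetical URI patterns: each profile class gets a predictable
# retrieval path, and dereferencing a URI returns that entity's document.
BASE = "https://example.org/data"
URI_PATTERNS = {
    "HumanMadeObject": BASE + "/object/{ident}",
    "Person": BASE + "/person/{ident}",
    "Group": BASE + "/group/{ident}",
}

def uri_for(cls, ident):
    """Mint the retrieval URI for an entity of a given profile class."""
    return URI_PATTERNS[cls].format(ident=ident)
```

The point is that these choices (patterns, boundaries, operations) can change without touching the profile, because access and scope are separate concerns.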
Case Study: IIIF APIs
1. Scope design through shared use cases
2. Design for international use
3. As simple as possible, but no simpler
4. Make easy things easy, complex things possible
5. Avoid dependency on specific technologies
6. Use REST / Don’t break the web
7. Separate concerns, keep APIs loosely coupled
8. Design for JSON-LD, using LOD principles
9. Follow existing standards, best practices, when possible
10. Define success, not failure (for extensibility)
https://iiif.io/api/annex/notes/design_patterns/
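Several of these patterns are visible in a minimal IIIF Presentation API 3.0 Manifest: JSON-LD (pattern 8), internationalized labels (pattern 2), and as simple as possible (pattern 3). The context URI and structure follow the published specification; the `id` URIs and dimensions are hypothetical.

```python
# A minimal IIIF Presentation 3.0 Manifest skeleton (ids are invented).
manifest = {
    "@context": "http://iiif.io/api/presentation/3/context.json",
    "id": "https://example.org/iiif/manifest/1",
    "type": "Manifest",
    # Language map: design pattern 2, "design for international use"
    "label": {"en": ["Example Object"], "fr": ["Objet exemple"]},
    "items": [
        {
            "id": "https://example.org/iiif/canvas/1",
            "type": "Canvas",
            "height": 1800,
            "width": 1200,
        }
    ],
}
```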
Audiences: Progressive Enhancement

• Data for Humans: Strings
  • Separate entities, with attached textual descriptions
• Data for Machines: Structure
  • Entities with machine-processable, comparable values
• Data for the Network: d’Stributed
  • Entities are connected across systems and institutions
• Data for Research: Stringent
  • Sufficient accuracy and comprehensiveness to answer research questions from aggregated data

Each level enhances the previous: Human → Machine → Network → Research
Audience: Humans
• Strings: Entities with descriptions
• Easy to do with existing data
• Regardless of information system, can export data as strings
• Easy on-ramp … need to start somewhere
• Serves important audience: everyone
• It’s our cultural heritage, after all! :)
• Data not Document
  • Better than today, as it encourages multiple interfaces and reuse
  • Can be enhanced by third parties with more resources
Audience: Machines
• Structure: Connected, comparable values for machines
to process, rather than just display
• Comparison of Values
• Answering basic questions via dimensions, materials, age etc.
• Sorting entities by values rather than only computed relevance
• Indexing values
• Searching based on values rather than full text
• Facets require consistent, structured data
• Visualization
• Visualization, rather than mere display, is only possible with structured data
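A small sketch of why structured values matter for machines: sorting and faceting work on comparable numbers, not on display strings. The records below are invented for illustration.

```python
# Structured dimension values: the machine can compare them directly.
records = [
    {"title": "Panel A", "dimension": {"value": 65.4, "unit": "cm"}},
    {"title": "Panel B", "dimension": {"value": 120.0, "unit": "cm"}},
    {"title": "Panel C", "dimension": {"value": 9.5, "unit": "cm"}},
]

# Sorting by the numeric value is trivial; sorting the display strings
# would be wrong ("120.0 cm" sorts lexically before "9.5 cm").
by_height = sorted(records, key=lambda r: r["dimension"]["value"])
```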
Audience: Research
• Stringent: Answering research questions requires
sufficient aggregation, precision and completeness
• Requirement or Stretch Goal?
• Audience is relatively small, but important
• Cultural Sector: Entertainment or Educational?
• Requires Collaboration and Continuity
• To be cost effective, it must be an ongoing, sustainable resource used for multiple projects
• Need for Contextualization of Knowledge?
• The process of knowledge capture, and meta-meta-data
Context: Data Provenance
• Creation / Modification of the Data: Who, When, Why?
• Considerations
• Researcher (human) wants Confidence in the dataset
• Developer will ignore it if possible
• Internal use for structure (edits by X between t1 and t2)
but not external research use
• Dataset Level description in prose is fine for external use
• Otherwise, need named graphs per triple (expensive, support?)
or to reify everything (expensive, un-ignorable)
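The cost asymmetry above can be sketched directly: dataset-level provenance is one small description, while per-statement provenance via reification multiplies every assertion. All values below are invented for illustration.

```python
# Dataset-level provenance: prose description, cheap, easy to ignore.
dataset_level = {
    "dataset": "https://example.org/data",  # hypothetical
    "description": "Exported from the collection system; edited 2019-2023.",
}

# Per-statement provenance via reification: one triple becomes a whole
# record that every consumer must now navigate (hence "un-ignorable").
statement = ("ex:object/1", "ex:height_cm", 65.4)
reified = {
    "subject": statement[0],
    "predicate": statement[1],
    "object": statement[2],
    "asserted_by": "ex:person/curator-1",  # hypothetical agent
    "asserted_at": "1986",
}
```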
Libraries – We don’t care about the paper, only the balance sheet
Archives – Look at this complete set of every denomination (that we can’t describe because there’s too much of it)
Museums – Look at this mint condition 1975 Roosevelt Dime!
Conservation – The metal in a penny is worth more than a penny
Aggregators – Please give us everything in only unmarked dollar bills
In order to do that, we need to have appropriate abstractions and implementations that meet the technical and social needs of diverse stakeholders and user communities.
Encode = machine-actionable
Consistency of thinking about and managing diversity
Many trees’ worth of documentation, most of which is utterly opaque.
That’s it. There’s Work, Instance, Item, Agent, Event and Subject.
RiC is in early stages, but clearly building upon a conceptual model that is then instantiated in an ontology. Hard to know if inspired by CRM, but certainly has some of its hallmarks.
An ontology encodes a model, regardless of whether that model is separately documented. The advantage of the separation is the ability to have multiple ontologies encoding the same model, thereby having some degree of semantic interoperability, even if not necessarily technical interoperability. As few models as possible! Alignment between models helps to get to alignment between ontologies.
Neo4j and similar systems have property graphs. Named Graphs are standardized but not well implemented, and tend to add complexity. You only get one Named Graph, so use it wisely.
Reification is generally unloved, but is used in CRM’s ontology: PC14 is the class (named after the number of the property), and then P14.1 gives the role of the relationship on that class.
Partitioning avoids the issue at the expense of some semantic precision, which is what we try to do in the Linked Art profile.
Vocabulary allows subdomains to be specific about their content, within a more generalized model. This is important, as we want as few models as possible.
Vocabularies might conceptually relate to the model generally, but given the description of the vocabulary term, needs to be thought of in terms of the ontology encoding of it as well.
Ontologists gonna Ontologize, but you shouldn’t have to care. A profile selects the features and instances in order to meet the needs of the application domain.
Subset of the model, ontology and vocabularies as appropriate.
Name, Identifier, Contact Point, “Statement”, …
Michael Barth has six fundamental features for API evaluation, which relate directly to the value of the API as a standard for use. This seems like a good starting point for standards for digital interoperability.
Abstraction Level -- is the abstraction of the data and functionality appropriate to the audience and use cases? An end user of the "car" API presses a button or turns a key; a "car" developer needs access to the engine directly.
Comprehensibility -- is the audience able to understand how to use it to accomplish their goals?
Consistency -- if you know the "rules" of the API, how well does it stick to them? Or, how many exceptions are there to a core set of design principles?
Documentation -- how easy is it to find out the functionality of the API?
Domain Correspondence -- if you understand the core domain of the data and API, how closely does the understanding of the domain align with an understanding of the data?
And what barriers to getting started are there?
There are two more that I think are important.
Note – these are all about access, not about modeling.
Less important for IIIF to be complete as it’s about presentation not semantics. But for semantic description…
Stressful but Strategic Stretch goal.
Still within the framework of the profile and API – needs to be possible in the model, simultaneously with more structured data.
Corpus Art History
Painful to implement: ORE in Europeana, RiC-O.
Knowledge Provenance, not data provenance. It adds evidence that the dimension was valid in 1986, and could add confidence or the technique used. It doesn’t rely on reification, but doesn’t work everywhere.