Lkl talk-2012
1. Ontological Knowledge Engineering
for Cultural Heritage of Andean Textiles
Immanuel Normann
July 20, 2012
Department of Computer Science and Information Systems
2. Project Context
● Pre-Columbian Andean cultures had no writing system
● Alternative encoding systems were developed to pass down
cultural knowledge
● Hypothesis: weaving patterns as “writing systems” in this sense
● General research endeavour: deciphering these “writing
systems”
● Our objective: systematization of knowledge about Andean
weaving through an ontological approach
● implementation of an ontological knowledge system
● instantiation of the system with facts
3. Project Team
● La Paz
Instituto de Lengua y Cultura Aymara (Denise Y. Arnold)
● Domain experts: knowledge acquisition and creation, building
physical and virtual models, creating multimedia data.
● Software developer: web front end
● London
● AHRC (Luciana Martins):
principal investigator & domain experts (iconographic analysis)
● Birkbeck DCS (Sven Helmer):
Knowledge engineering + knowledge system implementation
9. Project status at the beginning of my work
● Project proposal intends ontological approach
● La Paz team already acquainted with ontology-related know-how:
● Methontology
● Protégé, CmapTools
● CIDOC-CRM
● Large amount of knowledge/data in spreadsheets
● Relational database schemas developed
● other sources:
● handwritten museum register documents
● images, videos, other multimedia documents,
● woven samples
10. Initial Steps
● identification of central research subdomains and their
documents
textiles, instruments, processes, historical/cultural
backgrounds, iconography, ...
● identification of central docs: concept maps, spreadsheets
● identification of the requirements for the KMS:
● identification of stakeholders
● development of use case scenarios
● competency questions
● setting up a communication platform & versioning system
12. Example Concept Map
[Figure: concept map (labels in Spanish) centred on Objeto textil (textile object), linking it to time period (e.g. P. Colonial, P. Contemporáneo), raw material (fibre, dye, mordant), style, instruments (e.g. horizontal loom, backstrap loom), images (photo, video), social life, places and sites, actors (e.g. weaver, group), structure, technique, and production processes (shearing, spinning, dyeing, warping, weaving, finishing).]
14. Example Competency Questions
● In which sites is there evidence of the practice of technique x?
● In which cultures is there evidence of the practice of technique x?
● What is the oldest record of technique T?
● In which type of garment was technique X used for the first time?
● Which types of textiles were woven using technique T in period P and region R?
15. Early Results from Requirement Analysis
● How much ontological reasoning is needed?
● Which system could provide it?
● Early tendency: RDBMS.
● RDB schema already defined
● content partially already inserted into the RDBMS
● most content in spreadsheets
● ideas for simple reasoning developed
(transitivity, ontological queries translated to SQL)
● Does this approach satisfy the requirements?
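The idea of translating simple ontological queries into SQL can be sketched with a recursive common table expression that computes the transitive closure of a subclass relation. This is a minimal sketch using Python's built-in sqlite3 module; the table name `subclass_of`, its columns and the sample hierarchy are illustrative assumptions, not the project's actual schema.

```python
import sqlite3

# Hypothetical table of direct subclass links (child -> parent).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subclass_of (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO subclass_of VALUES (?, ?)",
    [
        ("Facha Ancha", "Accesorios"),
        ("Accesorios", "Producto Textil"),
        ("Poncho", "Producto Textil"),
    ],
)

# WITH RECURSIVE joins the direct links onto the pairs found so far,
# yielding every (child, ancestor) pair: the transitive closure.
CLOSURE_SQL = """
WITH RECURSIVE ancestors(child, ancestor) AS (
    SELECT child, parent FROM subclass_of
    UNION
    SELECT a.child, s.parent
    FROM ancestors AS a JOIN subclass_of AS s ON a.ancestor = s.child
)
SELECT ancestor FROM ancestors WHERE child = ? ORDER BY ancestor
"""


def superclasses(cls):
    """Return all direct and indirect superclasses of cls, sorted by name."""
    return [row[0] for row in conn.execute(CLOSURE_SQL, (cls,))]


print(superclasses("Facha Ancha"))  # ['Accesorios', 'Producto Textil']
```

Using UNION rather than UNION ALL makes SQLite discard rows it has already produced, so the recursion also terminates if the hierarchy accidentally contains a cycle.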
16. Against the RDBMS approach
● Knowledge in concept maps
● graph-like knowledge representation, closer to ontological
knowledge representation.
● graph-like queries involving some reasoning.
● Dynamic model evolution
● RDB schema change vs. ontology change.
17. Relational Database vs. Ontology
Relational database systems
● are well suited to modelling relationships within a static knowledge model
(i.e. a static relational schema),
● make schema changes problematic, and
● have no built-in notion of hierarchies.
Ontology knowledge systems
● can store the same datatypes as relational database systems,
● allow for modelling relationships
– in a way closer to concept maps than to relational tables,
● have a built-in notion of hierarchies!
● and support even more reasoning.
18. Queries on Graph Structures
select all Accesorios that are made with (se elabora con) a Técnica para faz de trama (weft-faced technique)
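This query can be sketched as a small graph traversal in Python: collect the instances of Accesorios and of Técnica para faz de trama (including instances of their subclasses), then keep the pairs linked by the made-with relation. The triples, the individual names and the property name `made_with` (standing in for the concept map's "se elabora con") are hypothetical.

```python
# Hypothetical triples (subject, predicate, object); all names are illustrative.
TRIPLES = {
    ("Facha Ancha", "subClassOf", "Accesorios"),
    ("Chuspa", "subClassOf", "Accesorios"),
    ("t1", "type", "Facha Ancha"),
    ("t2", "type", "Chuspa"),
    ("t3", "type", "Poncho"),
    ("tec1", "type", "Técnica para faz de trama"),
    ("tec2", "type", "Técnica para faz de urdimbre"),
    ("t1", "made_with", "tec1"),
    ("t2", "made_with", "tec2"),
    ("t3", "made_with", "tec1"),
}


def with_subclasses(cls, triples):
    """Return cls plus all of its direct and indirect subclasses."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls, triples):
    """Individuals typed with cls or with any of its subclasses."""
    classes = with_subclasses(cls, triples)
    return {s for s, p, o in triples if p == "type" and o in classes}


def accessories_with_weft_faced_technique(triples):
    """select all Accesorios made with a Técnica para faz de trama"""
    accessories = instances_of("Accesorios", triples)
    techniques = instances_of("Técnica para faz de trama", triples)
    return sorted({
        s for s, p, o in triples
        if p == "made_with" and s in accessories and o in techniques
    })


print(accessories_with_weft_faced_technique(TRIPLES))  # ['t1']
```

Here t3 uses the same technique but is a Poncho, and t2 is an accessory made with a different technique, so only t1 qualifies.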
19. Requirements for Museum KMS
A museum knowledge management system should
● facilitate relations between entities
● have built in support for basic reasoning
● be flexible w.r.t. the evolution of the knowledge model
● facilitate storage of basic datatypes (numbers, boolean, ...), free
text, and multimedia.
Conclusion
● the RDB approach is insufficient w.r.t. model evolution and
reasoning.
● Ontological storage engine required.
● Which is the best for our purpose?
20. Review of Triplestores
State of the art surveys:
● http://www.w3.org/wiki/LargeTripleStores
● Europeana RDF Store Report (2011)
● An incomplete list of triple stores:
● Native stores: AllegroGraph, OWLIM, Stardog
● RDBMS based: Oracle, Jena SDB
● hybrid: Virtuoso, Sesame, Bigdata
21. Our Decision: Virtuoso
● Why Virtuoso:
● multi-paradigm storage: RDBMS (SQL), XML (XQuery), OWL
(SPARQL), reasoning.
● scalable, massive data processing, stable, open-source edition,
active community.
● some know-how from previous projects
● possible drawbacks:
● too many ways to implement a knowledge base.
● a 4000-page manual.
● reasoning capabilities fall short of dedicated reasoners like Pellet.
23. Ontology in a nutshell
● unary constructs:
● individuals (e.g. the textile object whose ID is ILCA_BML074)
● class (e.g. the set of all garments classified as Poncho)
● binary constructs:
● object property = relation between individuals (e.g. in custody of:
textile object ILCA_BML074 is in custody of the British Museum)
● data property = attribute of an individual (e.g. has width: textile
object ILCA_BML074 has width 52 cm)
● instance of (type) = a relation between individuals and classes (e.g.
textile object ILCA_BML074 is an instance of the class Facha
Ancha)
● subclass relation = relation between classes (e.g. Facha Ancha is
a subclass of Accesorios)
● and even more like: union, intersection, complement, quantification, number restriction, ...
25. Ontology Schema and Facts
Ontology schema (TBox)
● subclass relations (e.g. Poncho is subclass of Producto Textil)
● domain and range restrictions of
● object properties (e.g. in custody of has domain Producto Textil
and range Museum)
● data properties (e.g. has width has domain Producto Textil and
range cm)
Ontology facts (ABox)
● all relations involving individuals (instance of, object properties,
data properties)
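In RDFS and OWL, domain and range declarations drive inference rather than act as integrity constraints: a fact using in custody of lets a reasoner conclude the types of its subject and object (RDFS entailment rules rdfs2 and rdfs3). A minimal sketch for object properties, with hypothetical identifiers; data property ranges are left out because they describe literal values rather than classes.

```python
# Hypothetical TBox: schema-level axioms as (subject, predicate, object) triples.
TBOX = {
    ("Poncho", "subClassOf", "Producto Textil"),
    ("in_custody_of", "domain", "Producto Textil"),
    ("in_custody_of", "range", "Museum"),
}

# Hypothetical ABox: facts about individuals.
ABOX = {
    ("ILCA_BML074", "in_custody_of", "British_Museum"),
    ("ILCA_BML074", "has_width_cm", 52),
}


def infer_types_from_domain_range(tbox, abox):
    """Derive type facts from domain/range axioms (RDFS rules rdfs2 and rdfs3)."""
    inferred = set()
    for subj, prop, obj in abox:
        for axiom_prop, kind, cls in tbox:
            if axiom_prop != prop:
                continue
            if kind == "domain":
                inferred.add((subj, "type", cls))  # subject belongs to the domain
            elif kind == "range":
                inferred.add((obj, "type", cls))  # object belongs to the range
    return inferred


print(sorted(infer_types_from_domain_range(TBOX, ABOX)))
# [('British_Museum', 'type', 'Museum'), ('ILCA_BML074', 'type', 'Producto Textil')]
```

One consequence: asserting in custody of for an object that is not a textile raises no error; it silently makes that object a Producto Textil.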
28. Abstract Entities
● Abstract entities do not exist in space or in time.
● Concrete entities exist at least in time. For example:
● physical objects (like garments, books, etc.)
● events (like the production of a certain garment)
● Entities like colour, material, and shape are largely time-independent.
● what is the appropriate way to model abstract entities?
In OWL we have only two options: as classes or instances.
● For concrete entities it is easy:
● the jacket I am wearing is an instance of the class of all jackets, which is
a subclass of physical objects.
● the discovery of Machu Picchu by Hiram Bingham is an instance of the
class of all discoveries, which is a subclass of events.
29. Abstract Entities
● What about abstract entities: can they have subclasses or
instances? For example colours:
– is the red we see here one instance and the red we see there
another instance?
– If so, isn't it inconsistent to say that they are the same red?
(we introduced the concept of colour occurrence).
– is red a unique colour or a class of colours whose instances are e.g.
dark-red, orange-red?
– aren't dark-red and orange-red rather themselves classes of reds?
– are there any colours at all that cannot be subdivided into more granular
colour values? (we chose to stop at RGB. For physicists, wavelength
would make more sense).
30. Semi Abstract Entities
● structure, technique, motif:
● not localized in space: possibly in two different places at the same
time.
● not localized in time: may exist even if currently not applied or
observed.
● but: techniques / motifs are invented and can be forgotten
● epoch and style
● seem to be clearly bound to a certain time period, but
● at least styles may revive at any time.
● epoch is a highly debated concept anyway.
31. Anonymous Entities
● How should we formalize “Poncho p1 is made of Alpaca”?
The naive way:
p1 made_of a1. p1 type Poncho. a1 type Alpaca.
p1 is a concrete object we can point to. What about a1?
● Consider: “Poncho p2 is also made of Alpaca”.
p2 made_of a2. p2 type Poncho. a2 type Alpaca.
Is a1=a2 or not?
We don't know and we don't care!
32. Anonymous Entities
● Proper formalization of “Poncho p1 is made of Alpaca”:
p1 type (made_of some Alpaca)
● meaning:
● p1 is an instance of the class (made_of some Alpaca)
● (made_of some Alpaca) is the class of all x such that there exists
an a with x made_of a, where a is an instance of Alpaca.
short: “p1 is made of some instance of Alpaca”
33. Limited Reasoning in Virtuoso
● (made_of some Alpaca) is a quantified class expression
(some is its quantifier)
● Problem with Virtuoso: it accepts quantified expressions, but
does not support reasoning on them.
● Example:
p1 type (made_of some Alpaca)
Alpaca subClassOf Camelido
=> p1 type (made_of some Camelido)
● Virtuoso cannot infer this conclusion.
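The missing inference can be sketched for this single pattern: if x is an instance of (P some C) and C is a subclass of D, then x is an instance of (P some D). Encoding a quantified class expression as the tuple ("some", property, filler) is an assumption made for illustration only; a reasoner such as Pellet covers this and many other patterns.

```python
# A named class is a string; an existential restriction (P some C) is
# encoded, for this sketch only, as the tuple ("some", P, C).


def superclasses(cls, subclass_axioms):
    """cls together with all its named superclasses (reflexive, transitive)."""
    found, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def entailed_types(asserted_types, subclass_axioms):
    """Apply: x type (P some C), C subClassOf D  =>  x type (P some D)."""
    types = set(asserted_types)
    for t in asserted_types:
        if isinstance(t, tuple) and t[0] == "some":
            _, prop, filler = t
            for sup in superclasses(filler, subclass_axioms):
                types.add(("some", prop, sup))
    return types


subclass_axioms = {("Alpaca", "Camelido")}
p1_types = {"Poncho", ("some", "made_of", "Alpaca")}
print(("some", "made_of", "Camelido") in entailed_types(p1_types, subclass_axioms))  # True
```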
34. Prototypes as Workaround
Workaround for the Quantification Problem
● introduce a class Prototype
● create, for every class (where needed), a dedicated instance of
Prototype.
● Example:
alpaca type Prototype. alpaca type Alpaca.
alpaca prototype_for Alpaca.
35. Prototypes as Workaround
Reasoning via prototypes
● Replace p1 type (made_of some Alpaca)
by p1 made_of alpaca.
● Now Virtuoso can deduce:
p1 made_of alpaca. Alpaca subClassOf Camelido.
=> p1 made_of ?x. ?x type Camelido.
● Note:
● prototypes, in contrast to regular physical individuals, are not
located in space and time ( => modeling conflict )
● alpaca prototype_for Alpaca is not OWL-conformant.
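The workaround only needs plain type inheritance, which a triple store can materialize: whenever x has type C and C is a subclass of D, add x type D (RDFS rule rdfs9), repeating until nothing new appears. A minimal sketch using the slide's prototype triples; the function names are hypothetical.

```python
TRIPLES = {
    ("alpaca", "type", "Prototype"),
    ("alpaca", "type", "Alpaca"),
    ("alpaca", "prototype_for", "Alpaca"),
    ("Alpaca", "subClassOf", "Camelido"),
    ("p1", "type", "Poncho"),
    ("p1", "made_of", "alpaca"),
}


def materialize_types(triples):
    """Add (x type D) for every (x type C), (C subClassOf D) until a fixed point."""
    closed = set(triples)
    while True:
        new = {
            (x, "type", sup)
            for (x, p, c) in closed if p == "type"
            for (sub, q, sup) in closed if q == "subClassOf" and sub == c
        } - closed
        if not new:
            return closed
        closed |= new


def made_of_instance_of(subject, cls, triples):
    """Answer the pattern: subject made_of ?x . ?x type cls"""
    closed = materialize_types(triples)
    return sorted(
        o for s, p, o in closed
        if s == subject and p == "made_of" and (o, "type", cls) in closed
    )


print(made_of_instance_of("p1", "Camelido", TRIPLES))  # ['alpaca']
```

The answer is the prototype individual alpaca, which illustrates the modelling conflict noted above: a query for instances of Camelido now also returns an entity that is not located in space and time.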
36. Ontological Mistakes
Confusing subclass and instance with part of:
● Lake Titicaca is a spatial part of the Andes, but not a subclass of it.
● weaving is a temporal part of garment production (dyeing is another
one), but neither an instance nor a subclass of it.
● part of is a super-property of spatial part of and temporal part of.
Confusing subclass with instance:
● Poncho (as an indefinite term) is not an instance of garment but a
subclass: the class of all concrete ponchos.
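Declaring part of as a super-property means that every spatial-part or temporal-part fact also yields a part of fact (RDFS rule rdfs7), while no instance or subclass facts are created. A minimal sketch with hypothetical identifiers mirroring the examples above:

```python
# Hypothetical property hierarchy: (sub-property, super-property) pairs.
SUBPROPERTY_OF = {
    ("spatial_part_of", "part_of"),
    ("temporal_part_of", "part_of"),
}

# Hypothetical facts mirroring the slide's examples.
FACTS = {
    ("lake_Titicaca", "spatial_part_of", "Andes"),
    ("weaving", "temporal_part_of", "garment_production"),
    ("dyeing", "temporal_part_of", "garment_production"),
}


def with_super_properties(facts, subproperty_of):
    """Add (s Q o) for every (s P o) with P sub-property of Q, until a fixed point (rdfs7)."""
    closed = set(facts)
    while True:
        new = {
            (s, sup, o)
            for (s, p, o) in closed
            for (sub, sup) in subproperty_of
            if sub == p
        } - closed
        if not new:
            return closed
        closed |= new


closed = with_super_properties(FACTS, SUBPROPERTY_OF)
print(sorted(s for s, p, o in closed if p == "part_of" and o == "garment_production"))
# ['dyeing', 'weaving']
```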
37. Ontological Mistakes
Confusing determined with undetermined objects:
● in “this poncho (p1) is made of Alpaca”
Alpaca should not be modelled as a certain instance of the class
Alpaca!
Confusing equivalence with synonymy and/or translations:
● if cloak same as manto and manto same as coat,
then cloak same as coat.
● if chair same as Sessel and Sessel same as armchair,
then chair same as armchair.
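Equality in the sense of owl:sameAs is symmetric and transitive, so every chain of pairwise translations collapses into a single equivalence class. A small union-find sketch (the function name is hypothetical) makes the unwanted consequence visible:

```python
def same_as_groups(pairs):
    """Group terms merged by a symmetric, transitive 'same as' relation."""
    parent = {}

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two equivalence classes

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


pairs = [("chair", "Sessel"), ("Sessel", "armchair")]
print([sorted(g) for g in same_as_groups(pairs)])  # [['Sessel', 'armchair', 'chair']]
```

Since chair and armchair are not synonyms, one option (not taken from the slides) is to record such cross-language correspondences as language-tagged labels of one concept, or with a non-transitive mapping property such as SKOS closeMatch, rather than with owl:sameAs.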
38. Related Work
Controlled vocabularies:
● Getty Thesaurus of Geographic Names (TGN),
● Cataloging Cultural Objects (CCO),
● Categories for the Description of Works of Art (CDWA)
Foundational Ontologies:
● The CIDOC Conceptual Reference Model (CRM):
concepts and relationships used in cultural heritage documentation.
● DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering)
Linking open data (LOD):
● DBpedia, Freebase, GeoNames, ... (http://linkeddata.org/)
● Linked Data and SPARQL service of British Museum
42. Migration of Knowledge Representations
Separation of knowledge modelling:
● TBox knowledge created with graph drawing tools (http://www.yworks.com)
● ABox facts created in spreadsheets
Technical challenges:
● migration to target format for TBox and ABox: RDF triples
(source node - link - target node)
● TBox migration: easy
● ABox migration: difficult - due to irregular spreadsheets
● TBox & ABox vocabulary alignment: tedious
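The ABox migration step can be sketched with Python's csv module: each spreadsheet row becomes a subject node, each filled cell a (subject, property, value) triple, and rows without an identifier are reported instead of guessed. The column names, sample rows (including the identifier ILCA_X001) and the column-to-property mapping, which stands in for the TBox/ABox vocabulary alignment, are all hypothetical.

```python
import csv
import io

# Hypothetical spreadsheet export (one row per textile object).
SHEET = """ID,Tipo,Ancho (cm),Custodia
ILCA_BML074,Facha Ancha,52,British Museum
ILCA_X001,Poncho,,
,Poncho,80,
"""

# Hypothetical column-to-property mapping; this is where the spreadsheet
# vocabulary has to be aligned with the TBox vocabulary.
COLUMN_TO_PROPERTY = {
    "Tipo": "type",
    "Ancho (cm)": "has_width_cm",
    "Custodia": "in_custody_of",
}


def rows_to_triples(text):
    """Turn spreadsheet rows into (subject, property, value) triples.

    Returns the triples and the spreadsheet row numbers that were skipped.
    """
    triples, skipped = [], []
    reader = csv.DictReader(io.StringIO(text))
    # Data rows start at spreadsheet row 2 because row 1 holds the header.
    for row_no, row in enumerate(reader, start=2):
        subject = (row.get("ID") or "").strip()
        if not subject:
            skipped.append(row_no)  # no identifier, so no subject node
            continue
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = (row.get(column) or "").strip()
            if value:  # empty cells produce no triple
                triples.append((subject, prop, value))
    return triples, skipped


triples, skipped = rows_to_triples(SHEET)
for t in triples:
    print(t)
print("skipped rows:", skipped)  # skipped rows: [4]
```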