This document discusses understanding change in versioned knowledge organization systems (KOS) on the web. It describes work being done in the CEDAR project to harmonize and publish historical Dutch census data as RDF data cubes. The document outlines challenges with concept drift (changes in meaning over time) in dynamic classifications and ontologies used in census data from different time periods. It proposes using machine learning techniques to predict where and when versioned KOS on the web will change based on analyzing patterns of change in past versions. Features related to structural changes and changes in class membership are discussed for use in the machine learning models.
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organisation Systems (KOS)
1. Understanding Change in Versioned KOS on the Web
Albert Meroño-Peñuela, Christophe Guéret, Stefan Schlobach
@albertmeronyo
Evolution and variation of classification systems (KnoweScape workshop), 04-03-2015
6. Uniform queries on the Web
1795, 1830, 1840, 1849, 1859, 1869, 1879, 1889, 1899, 1909, 1919, 1920, 1930, 1947, 1956, 1960, 1971 (through ~3K heterogeneous tables)
7. RDF Data Cube
“There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts.”
10. RDF Data Cube vocabulary (QB)
• SDMX compatible
• Defines cubes as a set of observations that consist of dimensions, measures and attributes
• Dimensions: time period, region, sex (qb:DimensionProperty)
• Measure: population life expectancy (qb:MeasureProperty)
• Attribute: unit of measure = years, metadata status = measured (qb:AttributeProperty)
Observation: “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years”
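The observation on this slide can be sketched as a small data structure that keeps dimensions, measures and attributes apart and prints a Turtle-like description. This is only an illustration: the `ex:` property and resource names, the `Observation` class and the `to_turtle` helper are assumptions made for this example and are not taken from the CEDAR data or from the QB specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One cell of a data cube, split into the three QB component kinds."""
    dimensions: dict  # what the value is about (time period, region, sex)
    measures: dict    # the observed value itself
    attributes: dict  # how to interpret the value (unit, status)


def to_turtle(subject, obs):
    """Serialise the observation as Turtle-like text (prefixes omitted)."""
    pairs = [
        *obs.dimensions.items(),
        *obs.measures.items(),
        *obs.attributes.items(),
    ]
    lines = [f"{subject} a qb:Observation ;"]
    for i, (prop, value) in enumerate(pairs):
        end = " ." if i == len(pairs) - 1 else " ;"
        lines.append(f"    {prop} {value}{end}")
    return "\n".join(lines)


# Hypothetical ex: names; only the values come from the slide.
obs = Observation(
    dimensions={
        "ex:refPeriod": '"2004-2006"',
        "ex:refArea": "ex:Newport",
        "ex:sex": "ex:male",
    },
    measures={"ex:lifeExpectancy": "76.7"},
    attributes={"ex:unitMeasure": "ex:years", "ex:obsStatus": "ex:measured"},
)
turtle = to_turtle("ex:o1", obs)
print(turtle)
```

In a real cube each of the `ex:` properties would additionally be declared as a qb:DimensionProperty, qb:MeasureProperty or qb:AttributeProperty, as listed above.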
22. Predicting Change
• KOS version chains: subsequent unique version identifiers to unique states of KOS
• Problematic for
  – Data publishers (KOS maintainability)
  – Data users/linkers (link validity)
A. Meroño-Peñuela, C. Guéret, S. Schlobach. Predicting Change in Versioned Knowledge Organisation Systems on the Web. IJCAI 2015 (under review)
23. Predicting Change
• Proposal: generic approach to predict when and where a Web KOS of any domain will change
  – Using supervised learning on past versions of KOS
• SotA¹: prediction of class extension in
  – 1 OBO/OWL version chain (Gene Ontology)
  – using few classifiers
• Contribution²: prediction of concept drift in
  – 150 Web KOS version chains
  – using all (21) SotA classifiers (WEKA API)
¹ C. Pesquita, F.M. Couto. “Predicting the extension of biomedical ontologies”. PLoS Computational Biology 8 (9), e1002630
² A. Meroño-Peñuela, C. Guéret, S. Schlobach. “Predicting Change in Versioned Knowledge Organisation Systems on the Web”. IJCAI 2015 (under review)
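One way to read "supervised learning on past versions" is that each pair of consecutive versions yields labelled examples: the features of a concept in version i, paired with whether that concept changed in version i+1. The sketch below builds such a training set. The snapshot layout, the `build_training_set` name and the choice of "extension differs" as the label are assumptions made for illustration; the slides do not specify the exact setup, and no classifier (WEKA or otherwise) is trained here.

```python
def build_training_set(chain):
    """Turn a version chain into (features, changed) training examples.

    chain: list of snapshots ordered oldest to newest. Each snapshot maps
    a concept id to {"features": tuple, "members": set}.
    """
    examples = []
    for current, following in zip(chain, chain[1:]):
        for concept, info in current.items():
            if concept not in following:
                continue  # removed concepts are skipped in this sketch
            # Assumed label: did the set of members change in the next version?
            changed = info["members"] != following[concept]["members"]
            examples.append((info["features"], changed))
    return examples


# Toy chain of three versions with two concepts (hypothetical data).
chain = [
    {"A": {"features": (2, 0), "members": {"x"}},
     "B": {"features": (0, 1), "members": {"y"}}},
    {"A": {"features": (2, 0), "members": {"x", "z"}},
     "B": {"features": (0, 1), "members": {"y"}}},
    {"A": {"features": (3, 0), "members": {"x", "z"}},
     "B": {"features": (0, 1), "members": {"y"}}},
]
examples = build_training_set(chain)
print(examples)
```

The resulting pairs could then be handed to any off-the-shelf classifier; the most recent version, for which no successor exists yet, is the one to predict on.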
24. Concept Drift
• Proxy for change of meaning over time¹
  – Intension drift: occurs when there is a difference in the properties or attributes of two variants of the same concept
  – Extension drift: occurs when there is a difference in the individuals that belong to two variants of the same concept
  – Label drift: occurs when there is a difference in the labels of two variants of the same concept
¹ S. Wang, S. Schlobach, K. Klein. “What Is Concept Drift and How to Measure It?”. EKAW 2010.
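The three kinds of drift defined above can be sketched as plain set comparisons between two variants of a concept. This is a deliberately simplified boolean reading ("any difference counts as drift"); the `ConceptVariant` class, the `detect_drift` function and the sample occupation data are hypothetical, and the cited paper may measure drift differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptVariant:
    """A concept as it appears in one version of a KOS."""
    labels: frozenset      # human-readable names
    properties: frozenset  # intension: properties or attributes
    members: frozenset     # extension: individuals that belong to it


def detect_drift(old, new):
    """Report which kinds of drift occur between two variants."""
    return {
        "label": old.labels != new.labels,
        "intension": old.properties != new.properties,
        "extension": old.members != new.members,
    }


# Hypothetical variants of an occupation concept in two census versions.
v1 = ConceptVariant(
    labels=frozenset({"baker"}),
    properties=frozenset({"sector:food"}),
    members=frozenset({"p1", "p2"}),
)
v2 = ConceptVariant(
    labels=frozenset({"baker", "bread maker"}),
    properties=frozenset({"sector:food"}),
    members=frozenset({"p1", "p2", "p3"}),
)
drift = detect_drift(v1, v2)
print(drift)
```

Here the label set and the member set differ between the two variants while the properties stay the same, so the sketch reports label and extension drift but no intension drift.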
25. Input Datasets
KOS version chains from
• HISCO/CEDAR (1 version chain)
• DBpedia (2 version chains)
• Linked Open Vocabularies¹ (134 version chains)
• *Ontology chains from 637 SPARQL endpoints² (6 version chains)
¹ http://lov.okfn.org/
² https://github.com/albertmeronyo/ConceptDrift-data/tree/master/src
26. Features
• From which data characteristics (related to change) should we learn?
• SotA in Ontology Change [Stojanovic 2004]
  – Structure-driven (rdfs:subClassOf, skos:broader)
    • maxDepth, children, parents, siblings
  – Data-driven (rdf:type)
    • members, childMembers, parentMembers, siblingMembers
  – Usage-driven
    • incExtLinks (on the Web!)
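The structure-driven and data-driven features listed above can be sketched for a single concept from two lists of pairs: hierarchy edges (as from rdfs:subClassOf or skos:broader) and membership assertions (as from rdf:type). The function name, the input layout and the reading of maxDepth as "longest chain of descendants below the concept" are assumptions for this example; the slides do not define the features precisely, and the usage-driven incExtLinks feature is left out because it needs data from the Web.

```python
from collections import defaultdict


def concept_features(sub_class_of, type_of, concept):
    """Compute structural and membership features for one concept.

    sub_class_of: list of (child, parent) pairs; assumed to be acyclic.
    type_of: list of (individual, concept) pairs.
    """
    parents = {p for c, p in sub_class_of if c == concept}
    children = {c for c, p in sub_class_of if p == concept}
    siblings = {c for c, p in sub_class_of if p in parents and c != concept}

    members_of = defaultdict(set)
    for individual, cls in type_of:
        members_of[cls].add(individual)

    def count_members(classes):
        return sum(len(members_of[c]) for c in classes)

    def depth_below(node):
        # Assumed meaning of maxDepth: longest descendant chain under node.
        kids = [c for c, p in sub_class_of if p == node]
        return 0 if not kids else 1 + max(depth_below(k) for k in kids)

    return {
        "maxDepth": depth_below(concept),
        "children": len(children),
        "parents": len(parents),
        "siblings": len(siblings),
        "members": len(members_of[concept]),
        "childMembers": count_members(children),
        "parentMembers": count_members(parents),
        "siblingMembers": count_members(siblings),
    }


# Hypothetical occupation hierarchy and membership data.
sub_class_of = [
    ("Baker", "FoodWorker"),
    ("Butcher", "FoodWorker"),
    ("PastryBaker", "Baker"),
]
type_of = [
    ("p1", "Baker"), ("p2", "Baker"), ("p3", "Butcher"),
    ("p4", "PastryBaker"), ("p5", "FoodWorker"),
]
features = concept_features(sub_class_of, type_of, "Baker")
print(features)
```

Computing this dictionary for every concept in every version of a chain gives the feature vectors a learner would consume.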
35. Conclusions
• Semantic technology for Social History
  – It saved work!
• Historical datasets as an observatory of dynamic KOS
  – Logging usage of KOS in Linked Statistical Data
• Modeling change in Web KOS
  – Version chains are scarce (beware of bias)
  – Chain recipe: nSnapshots, avgTreeDepth, ratioStructural, ratioInserts, ratioComm
  – Classifier dependence: avgGap, totalSize
36. Thank you
Questions, suggestions, comments
most welcome
@albertmeronyo
https://github.com/albertmeronyo/ConceptDrift
http://www.cedar-project.nl
http://krr.cs.vu.nl/
http://easy.dans.knaw.nl/
http://lsd-dimensions.org/
37. Me in 6 tweets
http://www.albertmeronyo.org
• Background: Computer Science, Web hacker, AI & Law
• PhD candidate at the VU University Amsterdam, DANS, and eHumanities group (KNAW)
• Topic: Semantic Web for the Humanities
• CEDAR project (2012-2015): harmonized historical Dutch censuses in the Semantic Web
• Problem: statistical data publishing, concept drift and dynamics of meaning
• Last paper: What is Linked Historical Data? (EKAW 2014)