SlideShare una empresa de Scribd logo
1 de 33
Descargar para leer sin conexión
Supporting Data Interlinking 
in Semantic Libraries 
with Microtask Crowdsourcing 
Cristina Sarasua 
SWIB 2014, Bonn 
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 2
relation 
a b 
MARC 21 
EDM FRBR 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 3
relation 
a b 
MARC 21 
EDM FRBR 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 4
Please share your thoughts on interlinking! 
https://etherpad.mozilla.org/4IfZDaTBIe 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 5
https://etherpad.mozilla.org/4IfZDaTBIe 
Interlinking on the Web of Data 
Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, 
Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 6
Cross-dataset links 
D1 
https://etherpad.mozilla.org/4IfZDaTBIe 
(a,r,b) | a in D1, b in D2 D2 
d1:timbl owl:sameAs d2:timbernerslee; 
d1:donostia owl:sameAs d2:sansebastian; 
d1:timbl owl:sameAs d2:timbernerslee; 
d1:donostia owl:sameAs d2:sansebastian; 
d1:bjork dc:creator d2:volta; 
d1:bjork dc:creator d2:volta; 
d1:Bonn wgs84:location d2:Germany; 
d1:work2012 o:inspiredBy d2:song1900; 
d1:Bonn wgs84:location d2:Germany; 
d1:work2012 o:inspiredBy d2:song1900; 
o1:Conference owl:equivalentClass o2:Congress; 
o1:Conference owl:equivalentClass o2:Congress; 
o1:Democracy skos:related o2:Government; 
o1:Democracy skos:related o2:Government; 
o1:Publication skos:broader o2:JournalArticle; 
o1:Publication skos:broader o2:JournalArticle; 
o1:ImpressionistPainting rdfs:subClassOf o2:Painting; 
o1:ImpressionistPainting rdfs:subClassOf o2:Painting; 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 7
https://etherpad.mozilla.org/4IfZDaTBIe 
Why is interlinking important? 
 Enhance the 
description of local 
entities 
 Richer queries over 
aggregated data 
 Cross-data set 
browsing 
What is known about Berlin? 
What is known about Berlin? 
x:berlin owl:sameAs 
x:berlin owl:sameAs 
dbpedia:Berlin; 
tour:berlin; 
dbpedia:Berlin; 
tour:berlin; 
x:berlin o:homeOf 
x:berlin o:homeOf 
authors:berlin; 
authors:berlin; 
x:img09112014 
x:img09112014 
lode:atPlace geo:brandtor; 
lode:atPlace geo:brandtor; 
SELECT ?city 
WHERE { 
?city1 gov:population ?pop . 
?city1 owl:sameAs ?city2 . 
?city2 unesco:count ?mon . 
FILTER (?pop > 1000000 
SELECT ?city 
WHERE { 
?city1 gov:population ?pop . 
?city1 owl:sameAs ?city2 . 
?city2 unesco:count ?mon . 
FILTER (?pop > 1000000 
?mon > 50)} 
?mon > 50)} 
http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/ 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 8
Generating links 
D1 D2 Identify the 
Comparison 
criteria 
https://etherpad.mozilla.org/4IfZDaTBIe 
resources to 
be connected 
with relation R 
Picture: 
https://www.assembla.com/spaces/silk/wiki/Managin 
g_Reference_Links 
Decision boundary between 
link and non-link 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 9
He is already busy 
Attribution: Thomas Leuthard 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 10
Attribution: Thomas Leuthard 
He is already busy 
… but still would like 
correct and useful links 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 11
Crowdsourced Interlinking 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 12
Crowdsourcing 
“Crowdsourcing represents the act of a company or institution taking a 
function once performed by employees and outsourcing it to an undefined 
(and generally large) network of people in the form of an open call” 
Jeff Howe, 2006 
Scalable 
Fast 
Microtask 
crowdsourcing 
Macrotask 
crowdsourcing 
Contest-based 
crowdsourcing 
CCiittiizzeenn SScciieennccee 
-E.g. tweet sentiment 
analysis 
-Seconds, reward cents 
-Crowd workers register 
with simple profile, limited 
filtering 
-E.g. writing an E-Book 
-Months, $30per hour / 
hundreds or thousands of 
dollars 
-Freelancers recruitment, 
interviews 
-E.g. NLP algorithm for a 
particular challenging 
scenario 
-Months, up to thousands 
of dollards 
-Final evaluation and 
winner selection 
-E.g. classify galaxies in 
pictures 
- seconds/minutes, no 
money 
- Open to everyone 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 13
An interlinking microtask 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 14
An interlinking microtask 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 15
An interlinking microtask 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 16
Approach 
D1 D2 
1 
cl1: cl1: ((s,s,p,p,o) 
o) 
cl2: cl2: ((s,s,p,p,o) 
o) 
… 
… 
cln: cln: ((s,s,p,p,o) 
o) 
candidate links 
2 
3 
Analyse crowd workers 
Collect crowd responses for the candidate links to 
be processed 
Aggregated 
response 
Collect 
responses 
Generate 
RDF file with 
final links 
cl5: cl5: ((s,s,p,p,o) 
o) 
… 
… 
cln: cln: ((s,s,p,p,o) o) 
crowd interlinking 
4 
Parse 
RDF links 
Query 
D1,D2 
Generate 
and publish 
microtasks 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 17
Approach (II) 
 Analyse crowd workers to filter out people 
– With bad intentions (i.e. scammers) 
– Who do not have enough knowledge 
 Select representative links from which the answer is known 
(ground truth) and assess people → domain expert useful 
x:b rdfs:label “Berlin”; 
x:b2 rdfs:label “Berlinale”; 
x:b rdfs:label “Berlin”; 
rdf:type o:City; 
rdf:type o:City; 
x:b2 rdfs:label “Berlinale”; 
x:b rdfs:label “Córdoba”; 
x:b2 rdfs:label “Córdoba”; 
x:b rdfs:label “Córdoba”; 
rdf:type o:City; 
rdf:type o:City; 
rdf:type o:Event; 
rdf:type o:Event; 
x:b2 rdfs:label “Córdoba”; 
Measure 
difficulty based 
on data 
heuristics 
rdf:rdf:type type o:o:City; City; 
Select 
different 
matching 
cases 
x:b rdfs:label “Córdoba”; 
x:b rdfs:label “Córdoba”; 
rdf:type o:City; 
wgs84:lat -31.400; 
rdf:type o:City; 
wgs84:lat -31.400; 
x:b2 rdf:type o:City; 
x:b2 rdf:type o:City; 
wgs84:lat 37.883; 
wgs84:lat 37.883; 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 18
Approach (II) 
 Analyse crowd workers to filter out people 
– With bad intentions (i.e. scammers) 
– Who do not have enough knowledge 
 Select representative links from which the answer is known 
(ground truth) and assess people → domain expert useful 
x:b rdfs:label “Berlin”; 
x:b2 rdfs:label “Berlinale”; 
x:b rdfs:label “Berlin”; 
rdf:type o:City; 
rdf:type o:City; 
x:b2 rdfs:label “Berlinale”; 
Two-way feedback 
x:b rdfs:label “Córdoba”; 
x:b2 rdfs:label “Córdoba”; 
x:b rdfs:label “Córdoba”; 
rdf:type o:City; 
rdf:type o:City; 
rdf:type o:Event; 
rdf:type o:Event; 
x:b2 rdfs:label “Córdoba”; 
Measure 
difficulty based 
on data 
heuristics 
rdf:rdf:type type o:o:City; City; 
Select 
different 
matching 
cases 
x:b rdfs:label “Córdoba”; 
x:b rdfs:label “Córdoba”; 
rdf:type o:City; 
wgs84:lat -31.400; 
rdf:type o:City; 
wgs84:lat -31.400; 
x:b2 rdf:type o:City; 
x:b2 rdf:type o:City; 
wgs84:lat 37.883; 
wgs84:lat 37.883; 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 19
Approach 
D1 D2 
1 
cl1: cl1: ((s,s,p,p,o) 
o) 
cl2: cl2: ((s,s,p,p,o) 
o) 
… 
… 
cln: cln: ((s,s,p,p,o) 
o) 
candidate links 
2 
3 
Context information 
Analyse crowd workers 
Collect crowd responses for the candidate links to 
be processed 
Aggregated 
response 
Collect 
responses 
Generate 
RDF file with 
final links 
#workers per link 
cl5: cl5: ((s,s,p,p,o) 
o) 
… 
… 
cln: cln: ((s,s,p,p,o) o) 
crowd interlinking 
4 
Parse 
RDF links 
Query 
D1,D2 
Generate 
and publish 
microtasks 
agreement 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 20
Approach (II) 
D1 D2 
Manual interlinking 
D1 D2 
Algorithm 
Guide Review 
HCOMP interlinking 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 21
Use cases 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 22
Mapping vocabularies 
Run an automatic ontology 
alignment tool and post-process 
the results with the crowd 
See also: [Sarasua et al., 2012] 
Context information 
pre-configured 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 23
Discovering links between instances 
 
 
 
 
 
 
 
a) To extract the patterns of the linkage rules (i.e. labelling) 
b) To post-process irregular multilingual values, different name versions 
c) To automatically identify patterns of errors in a resulting set of links, which 
may be afterwards reviewed by the experts 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 24
Curating mapping extensions to authority files 
Quality control can be done 
by giving these answers to 
other crowd workers 
Checking usefulness of links with library users 
 There are different possible targets for the interlinking of a dataset: 
which possibility to select for the Web portal? 
 Embed Web site in a microtask and ask for specific information or 
observe next Web site opened 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 25
3 Challenges 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 26
# Deciding whether to crowdsource or not 
 Depends to a large extent on the data 
– Specific domains require more crowd management effort 
– Benefit compared to automatically generated links may vary 
– Availability of workers may change in time 
Libraries and the cultural heritage domain have high potential 
(multilinguality, different naming conventions, knowledge exploration) 
 What should be processed by the crowd 
– Criteria for selecting subsets of the data (e.g. confidence of 
machine) 
> Trial, error and assessment 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 27
# Building a loyal workforce 
 Attracting good crowd workers 
– Microtasks are constantly being published 
– Higher reward may also attract more malicious workers 
 Working with people repeatedly is not supported by majority of 
crowdsourcing platforms 
 How to make crowd workers keep on working in these microtasks 
without them getting demotivated? 
> Be fair (see also Guidelines on 
Crowd Work for Academic Researchers, 
2014) 
> Listen to crowd workers (e.g. direct 
comments, twitter, ratings, monitor online 
discussions) 
> Recognize their work 
> Be aware that gamification is not always 
the best solution 
It's really easy to change people's motivations, [at 
Zooniverse] we find people are motivated by wanting 
to contribute, they want a sense that this is something 
real. And in adding game-like elements you can 
destroy that quite quickly” Chris Lintott, Zoouniverse 
http://www.wired.co.uk/news/archive/2013- 
09/12/fraxinus-gamifying-science/viewgallery/307960 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 28
# Working with unknown humans 
 Open call can be a problem and an opportinty at the same 
time: people have diverse 
– Motivation and dedication 
– Context and profile 
– Background knowledge 
 Crowdsourcing platforms have limited support for 
personalisation 
 Working with suitable crowd 
– Identify what they can do best 
▪ Type of task / data level 
▪ Competences vs experience cross platform analysis 
– Assign work accordingly 
▪ Weight vs reject 
>Towards a Crowd Work CV 
See also: [Sarasua et al., 2014] 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 29
Plea to this community 
 Interlinking is much more than deduplication, consider using 
also other relations 
 Consider connecting library datasets to different 
complementary domains 
 Interlinking to non editorial data can also be enriching 
 The more datasets you connect the better 
 Document your interlinking on the VoiD description of your 
dataset 
 Query and make use of available links 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 30
If you need humans to process data while interlinking 
datasets, consider crowd intervention because it can be 
very valuable for enhancing your results. 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 31
Thank you for your attention! 
Cristina Sarasua 
Institute for Web Science and Technologies 
Universität Koblenz-Landau 
csarasua@uni-koblenz.de 
http://de.slideshare.net/cristinasarasua 
https://github.com/criscod 
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
References 
 Sarasua, C., Simperl, E., Noy, N.F.: CrowdMAP: Crowdsourcing ontology 
alignment with microtasks. In: Proceedings of the 
11th International Semantic Web Conference (ISWC). (2012) 
 Sarasua, C., Thimm, M. Crowd Work CV: Recognition for Micro Work. In: 
SoHuman workshop, co-located with Social Informatics (SocInfo). (2014) 
 Guidelines on Crowd Work for Academic Researchers (2014). 
http://wiki.wearedynamo.org/index.php/Guidelines_for_Academic_Requesters 
Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 33

Más contenido relacionado

Destacado

Clothes and complements - ESO 1C - School year 2013-14
Clothes and complements - ESO 1C - School year 2013-14Clothes and complements - ESO 1C - School year 2013-14
Clothes and complements - ESO 1C - School year 2013-14gemlops
 
Medical Career
Medical CareerMedical Career
Medical CareerAlan Lopez
 
1 a clothes and complements
1 a  clothes and complements1 a  clothes and complements
1 a clothes and complementsgemlops
 
A Short Guide to Bed Bugs
A Short Guide to Bed BugsA Short Guide to Bed Bugs
A Short Guide to Bed Bugsbrown57
 
Clothes and complements - ESO 1A - School year 2013-14
Clothes and complements - ESO 1A - School year 2013-14Clothes and complements - ESO 1A - School year 2013-14
Clothes and complements - ESO 1A - School year 2013-14gemlops
 
Clothes and complements - ESO 1B - School year 2013-14
Clothes and complements - ESO 1B - School year 2013-14Clothes and complements - ESO 1B - School year 2013-14
Clothes and complements - ESO 1B - School year 2013-14gemlops
 
Un tool per la visualizzazione e l'analisi di reti biologiche e sociali
Un tool per la visualizzazione e l'analisi di reti biologiche e socialiUn tool per la visualizzazione e l'analisi di reti biologiche e sociali
Un tool per la visualizzazione e l'analisi di reti biologiche e socialiFabio Rinnone
 
Nixmap. Funzionalità ed aspetti implementativi
Nixmap. Funzionalità ed aspetti implementativiNixmap. Funzionalità ed aspetti implementativi
Nixmap. Funzionalità ed aspetti implementativiFabio Rinnone
 
MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...
MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...
MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...Fabio Rinnone
 
I contenuti digitali open source: le regole per editare su Wikipedia
I contenuti digitali open source: le regole per editare su WikipediaI contenuti digitali open source: le regole per editare su Wikipedia
I contenuti digitali open source: le regole per editare su WikipediaFabio Rinnone
 
1 c clothes and complements
1 c  clothes and complements1 c  clothes and complements
1 c clothes and complementsgemlops
 
Nuovi Media: Definizioni | Case Histories | Lo stato dell'arte
Nuovi Media: Definizioni | Case Histories | Lo stato dell'arteNuovi Media: Definizioni | Case Histories | Lo stato dell'arte
Nuovi Media: Definizioni | Case Histories | Lo stato dell'arteSimone Arcagni
 
Wiki Loves Monuments: il contest fotografico di Wikimedia Italia a Niscemi
Wiki Loves Monuments: il contest fotografico di Wikimedia Italia a NiscemiWiki Loves Monuments: il contest fotografico di Wikimedia Italia a Niscemi
Wiki Loves Monuments: il contest fotografico di Wikimedia Italia a NiscemiFabio Rinnone
 
Dbpedia leipzig2014 csarasua_open
Dbpedia leipzig2014 csarasua_openDbpedia leipzig2014 csarasua_open
Dbpedia leipzig2014 csarasua_openCristina Sarasua
 

Destacado (15)

Clothes and complements - ESO 1C - School year 2013-14
Clothes and complements - ESO 1C - School year 2013-14Clothes and complements - ESO 1C - School year 2013-14
Clothes and complements - ESO 1C - School year 2013-14
 
Medical Career
Medical CareerMedical Career
Medical Career
 
1 a clothes and complements
1 a  clothes and complements1 a  clothes and complements
1 a clothes and complements
 
A Short Guide to Bed Bugs
A Short Guide to Bed BugsA Short Guide to Bed Bugs
A Short Guide to Bed Bugs
 
Clothes and complements - ESO 1A - School year 2013-14
Clothes and complements - ESO 1A - School year 2013-14Clothes and complements - ESO 1A - School year 2013-14
Clothes and complements - ESO 1A - School year 2013-14
 
Clothes and complements - ESO 1B - School year 2013-14
Clothes and complements - ESO 1B - School year 2013-14Clothes and complements - ESO 1B - School year 2013-14
Clothes and complements - ESO 1B - School year 2013-14
 
Un tool per la visualizzazione e l'analisi di reti biologiche e sociali
Un tool per la visualizzazione e l'analisi di reti biologiche e socialiUn tool per la visualizzazione e l'analisi di reti biologiche e sociali
Un tool per la visualizzazione e l'analisi di reti biologiche e sociali
 
Nixmap. Funzionalità ed aspetti implementativi
Nixmap. Funzionalità ed aspetti implementativiNixmap. Funzionalità ed aspetti implementativi
Nixmap. Funzionalità ed aspetti implementativi
 
MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...
MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...
MobileMap Enna: un'applicazione web-mobile per la consultazione di cartografi...
 
I contenuti digitali open source: le regole per editare su Wikipedia
I contenuti digitali open source: le regole per editare su WikipediaI contenuti digitali open source: le regole per editare su Wikipedia
I contenuti digitali open source: le regole per editare su Wikipedia
 
MobileMap Agrigento
MobileMap AgrigentoMobileMap Agrigento
MobileMap Agrigento
 
1 c clothes and complements
1 c  clothes and complements1 c  clothes and complements
1 c clothes and complements
 
Nuovi Media: Definizioni | Case Histories | Lo stato dell'arte
Nuovi Media: Definizioni | Case Histories | Lo stato dell'arteNuovi Media: Definizioni | Case Histories | Lo stato dell'arte
Nuovi Media: Definizioni | Case Histories | Lo stato dell'arte
 
Wiki Loves Monuments: il contest fotografico di Wikimedia Italia a Niscemi
Wiki Loves Monuments: il contest fotografico di Wikimedia Italia a NiscemiWiki Loves Monuments: il contest fotografico di Wikimedia Italia a Niscemi
Wiki Loves Monuments: il contest fotografico di Wikimedia Italia a Niscemi
 
Dbpedia leipzig2014 csarasua_open
Dbpedia leipzig2014 csarasua_openDbpedia leipzig2014 csarasua_open
Dbpedia leipzig2014 csarasua_open
 

Similar a Swib2014csarasua

Decentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic WebDecentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic Webhala Skaf
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataEUCLID project
 
Data-mining the Semantic Web
Data-mining the Semantic WebData-mining the Semantic Web
Data-mining the Semantic WebFrank Lynam
 
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)Joachim Neubert
 
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...Jennifer Bowen
 
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Amit Sheth
 
Modelling research output expressions : metadata schema modelling of publicat...
Modelling research output expressions : metadata schema modelling of publicat...Modelling research output expressions : metadata schema modelling of publicat...
Modelling research output expressions : metadata schema modelling of publicat...CILIP MDG
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Richard Ingram
 
Semantic Web and Web 3.0 - Web Technologies (1019888BNR)
Semantic Web and Web 3.0 - Web Technologies (1019888BNR)Semantic Web and Web 3.0 - Web Technologies (1019888BNR)
Semantic Web and Web 3.0 - Web Technologies (1019888BNR)Beat Signer
 
PhD Projects in Pervasive Computing Research Help
PhD Projects in Pervasive Computing Research HelpPhD Projects in Pervasive Computing Research Help
PhD Projects in Pervasive Computing Research HelpPhD Services
 
PhD Projects in Digital Forensics Research Guidance
PhD Projects in Digital Forensics Research GuidancePhD Projects in Digital Forensics Research Guidance
PhD Projects in Digital Forensics Research GuidancePhD Services
 
M phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projectsM phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projectsVijay Karan
 
M.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining ProjectsM.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining ProjectsVijay Karan
 
20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructure20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructureStefan Gradmann
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph IntroductionSören Auer
 

Similar a Swib2014csarasua (20)

Decentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic WebDecentralized Data Management for the Semantic Web
Decentralized Data Management for the Semantic Web
 
Microtask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked DataMicrotask Crowdsourcing Applications for Linked Data
Microtask Crowdsourcing Applications for Linked Data
 
Data-mining the Semantic Web
Data-mining the Semantic WebData-mining the Semantic Web
Data-mining the Semantic Web
 
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
Linking Knowledge Organization Systems via Wikidata (DCMI conference 2018)
 
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...Moving Library Metadata Toward Linked Data:  Opportunities Provided by the eX...
Moving Library Metadata Toward Linked Data: Opportunities Provided by the eX...
 
Knowledge Graphs for Scholarly Communication
Knowledge Graphs for Scholarly CommunicationKnowledge Graphs for Scholarly Communication
Knowledge Graphs for Scholarly Communication
 
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
Semantic Web & Information Brokering: Opportunities, Commercialization and Ch...
 
Modelling research output expressions : metadata schema modelling of publicat...
Modelling research output expressions : metadata schema modelling of publicat...Modelling research output expressions : metadata schema modelling of publicat...
Modelling research output expressions : metadata schema modelling of publicat...
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012
 
Cognitive data
Cognitive dataCognitive data
Cognitive data
 
Semantic Web and Web 3.0 - Web Technologies (1019888BNR)
Semantic Web and Web 3.0 - Web Technologies (1019888BNR)Semantic Web and Web 3.0 - Web Technologies (1019888BNR)
Semantic Web and Web 3.0 - Web Technologies (1019888BNR)
 
SDoW2010 keynote
SDoW2010 keynoteSDoW2010 keynote
SDoW2010 keynote
 
PhD Projects in Pervasive Computing Research Help
PhD Projects in Pervasive Computing Research HelpPhD Projects in Pervasive Computing Research Help
PhD Projects in Pervasive Computing Research Help
 
PhD Projects in Digital Forensics Research Guidance
PhD Projects in Digital Forensics Research GuidancePhD Projects in Digital Forensics Research Guidance
PhD Projects in Digital Forensics Research Guidance
 
M phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projectsM phil-computer-science-data-mining-projects
M phil-computer-science-data-mining-projects
 
M.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining ProjectsM.Phil Computer Science Data Mining Projects
M.Phil Computer Science Data Mining Projects
 
20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructure20130719 dh2013 beyond_infrastructure
20130719 dh2013 beyond_infrastructure
 
Knowledge Graph Introduction
Knowledge Graph IntroductionKnowledge Graph Introduction
Knowledge Graph Introduction
 
Addressing dm-cloud
Addressing dm-cloudAddressing dm-cloud
Addressing dm-cloud
 

Más de Cristina Sarasua

Editing Behavior over Time Power vs. Standard Wikidata Editors
Editing Behavior over Time  Power vs. Standard Wikidata EditorsEditing Behavior over Time  Power vs. Standard Wikidata Editors
Editing Behavior over Time Power vs. Standard Wikidata EditorsCristina Sarasua
 
Methods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of DataMethods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of DataCristina Sarasua
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greaterCristina Sarasua
 
Introduccion a Wikidata DSS Wikidata Editathon
Introduccion a Wikidata DSS Wikidata EditathonIntroduccion a Wikidata DSS Wikidata Editathon
Introduccion a Wikidata DSS Wikidata EditathonCristina Sarasua
 
Interlinking Is More Than owl:sameAs
Interlinking Is More Than owl:sameAsInterlinking Is More Than owl:sameAs
Interlinking Is More Than owl:sameAsCristina Sarasua
 
Crowd Work CV: Recognition for Micro Work
Crowd Work CV: Recognition for Micro WorkCrowd Work CV: Recognition for Micro Work
Crowd Work CV: Recognition for Micro WorkCristina Sarasua
 
Exploring the challenge of linking scientific publications and studies with c...
Exploring the challenge of linking scientific publications and studies with c...Exploring the challenge of linking scientific publications and studies with c...
Exploring the challenge of linking scientific publications and studies with c...Cristina Sarasua
 

Más de Cristina Sarasua (14)

Editing Behavior over Time Power vs. Standard Wikidata Editors
Editing Behavior over Time  Power vs. Standard Wikidata EditorsEditing Behavior over Time  Power vs. Standard Wikidata Editors
Editing Behavior over Time Power vs. Standard Wikidata Editors
 
Methods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of DataMethods for Intrinsic Evaluation of Links in the Web of Data
Methods for Intrinsic Evaluation of Links in the Web of Data
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
Closing session
Closing sessionClosing session
Closing session
 
Reviews and awards
Reviews and awardsReviews and awards
Reviews and awards
 
Crowd statement marathon
Crowd statement marathonCrowd statement marathon
Crowd statement marathon
 
Paper presentations1
Paper presentations1Paper presentations1
Paper presentations1
 
Paper presentations2
Paper presentations2Paper presentations2
Paper presentations2
 
Hello session
Hello sessionHello session
Hello session
 
Tecnología e Igualdad
Tecnología e IgualdadTecnología e Igualdad
Tecnología e Igualdad
 
Introduccion a Wikidata DSS Wikidata Editathon
Introduccion a Wikidata DSS Wikidata EditathonIntroduccion a Wikidata DSS Wikidata Editathon
Introduccion a Wikidata DSS Wikidata Editathon
 
Interlinking Is More Than owl:sameAs
Interlinking Is More Than owl:sameAsInterlinking Is More Than owl:sameAs
Interlinking Is More Than owl:sameAs
 
Crowd Work CV: Recognition for Micro Work
Crowd Work CV: Recognition for Micro WorkCrowd Work CV: Recognition for Micro Work
Crowd Work CV: Recognition for Micro Work
 
Exploring the challenge of linking scientific publications and studies with c...
Exploring the challenge of linking scientific publications and studies with c...Exploring the challenge of linking scientific publications and studies with c...
Exploring the challenge of linking scientific publications and studies with c...
 

Último

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 

Último (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 

Swib2014csarasua

  • 1. Supporting Data Interlinking in Semantic Libraries with Microtask Crowdsourcing Cristina Sarasua SWIB 2014, Bonn Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
  • 2. Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 2
  • 3. relation a b MARC 21 EDM FRBR Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 3
  • 4. relation a b MARC 21 EDM FRBR Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 4
  • 5. Please share your thoughts on interlinking! https://etherpad.mozilla.org/4IfZDaTBIe Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 5
  • 6. https://etherpad.mozilla.org/4IfZDaTBIe Interlinking on the Web of Data Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/ Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 6
  • 7. Cross-dataset links D1 https://etherpad.mozilla.org/4IfZDaTBIe (a,r,b) | a in D1, b in D2 D2 d1:timbl owl:sameAs d2:timbernerslee; d1:donostia owl:sameAs d2:sansebastian; d1:timbl owl:sameAs d2:timbernerslee; d1:donostia owl:sameAs d2:sansebastian; d1:bjork dc:creator d2:volta; d1:bjork dc:creator d2:volta; d1:Bonn wgs84:location d2:Germany; d1:work2012 o:inspiredBy d2:song1900; d1:Bonn wgs84:location d2:Germany; d1:work2012 o:inspiredBy d2:song1900; o1:Conference owl:equivalentClass o2:Congress; o1:Conference owl:equivalentClass o2:Congress; o1:Democracy skos:related o2:Government; o1:Democracy skos:related o2:Government; o1:Publication skos:broader o2:JournalArticle; o1:Publication skos:broader o2:JournalArticle; o1:ImpressionistPainting rdfs:subClassOf o2:Painting; o1:ImpressionistPainting rdfs:subClassOf o2:Painting; Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 7
  • 8. https://etherpad.mozilla.org/4IfZDaTBIe Why is interlinking important?  Enhance the description of local entities  Richer queries over aggregated data  Cross-data set browsing What is known about Berlin? What is known about Berlin? x:berlin owl:sameAs x:berlin owl:sameAs dbpedia:Berlin; tour:berlin; dbpedia:Berlin; tour:berlin; x:berlin o:homeOf x:berlin o:homeOf authors:berlin; authors:berlin; x:img09112014 x:img09112014 lode:atPlace geo:brandtor; lode:atPlace geo:brandtor; SELECT ?city WHERE { ?city1 gov:population ?pop . ?city1 owl:sameAs ?city2 . ?city2 unesco:count ?mon . FILTER (?pop > 1000000 SELECT ?city WHERE { ?city1 gov:population ?pop . ?city1 owl:sameAs ?city2 . ?city2 unesco:count ?mon . FILTER (?pop > 1000000 ?mon > 50)} ?mon > 50)} http://www.w3.org/2005/Incubator/lld/XGR-lld-20111025/ Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 8
  • 9. Generating links D1 D2 Identify the Comparison criteria https://etherpad.mozilla.org/4IfZDaTBIe resources to be connected with relation R Picture: https://www.assembla.com/spaces/silk/wiki/Managin g_Reference_Links Decision boundary between link and non-link Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 9
  • 10. He is already busy Attribution: Thomas Leuthard Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 10
  • 11. Attribution: Thomas Leuthard He is already busy … but still would like correct and useful links Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 11
  • 12. Crowdsourced Interlinking Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 12
  • 13. Crowdsourcing “Crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call” Jeff Howe, 2006 Scalable Fast Microtask crowdsourcing Macrotask crowdsourcing Contest-based crowdsourcing CCiittiizzeenn SScciieennccee -E.g. tweet sentiment analysis -Seconds, reward cents -Crowd workers register with simple profile, limited filtering -E.g. writing an E-Book -Months, $30per hour / hundreds or thousands of dollars -Freelancers recruitment, interviews -E.g. NLP algorithm for a particular challenging scenario -Months, up to thousands of dollards -Final evaluation and winner selection -E.g. classify galaxies in pictures - seconds/minutes, no money - Open to everyone Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 13
  • 14. An interlinking microtask Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 14
  • 15. An interlinking microtask Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 15
  • 16. An interlinking microtask Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 16
  • 17. Approach D1 D2 1 cl1: cl1: ((s,s,p,p,o) o) cl2: cl2: ((s,s,p,p,o) o) … … cln: cln: ((s,s,p,p,o) o) candidate links 2 3 Analyse crowd workers Collect crowd responses for the candidate links to be processed Aggregated response Collect responses Generate RDF file with final links cl5: cl5: ((s,s,p,p,o) o) … … cln: cln: ((s,s,p,p,o) o) crowd interlinking 4 Parse RDF links Query D1,D2 Generate and publish microtasks Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 17
  • 18. Approach (II)  Analyse crowd workers to filter out people – With bad intentions (i.e. scammers) – Who do not have enough knowledge  Select representative links from which the answer is known (ground truth) and assess people → domain expert useful x:b rdfs:label “Berlin”; x:b2 rdfs:label “Berlinale”; x:b rdfs:label “Berlin”; rdf:type o:City; rdf:type o:City; x:b2 rdfs:label “Berlinale”; x:b rdfs:label “Córdoba”; x:b2 rdfs:label “Córdoba”; x:b rdfs:label “Córdoba”; rdf:type o:City; rdf:type o:City; rdf:type o:Event; rdf:type o:Event; x:b2 rdfs:label “Córdoba”; Measure difficulty based on data heuristics rdf:rdf:type type o:o:City; City; Select different matching cases x:b rdfs:label “Córdoba”; x:b rdfs:label “Córdoba”; rdf:type o:City; wgs84:lat -31.400; rdf:type o:City; wgs84:lat -31.400; x:b2 rdf:type o:City; x:b2 rdf:type o:City; wgs84:lat 37.883; wgs84:lat 37.883; Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 18
  • 19. Approach (II)  Analyse crowd workers to filter out people – With bad intentions (i.e. scammers) – Who do not have enough knowledge  Select representative links from which the answer is known (ground truth) and assess people → domain expert useful x:b rdfs:label “Berlin”; x:b2 rdfs:label “Berlinale”; x:b rdfs:label “Berlin”; rdf:type o:City; rdf:type o:City; x:b2 rdfs:label “Berlinale”; Two-way feedback x:b rdfs:label “Córdoba”; x:b2 rdfs:label “Córdoba”; x:b rdfs:label “Córdoba”; rdf:type o:City; rdf:type o:City; rdf:type o:Event; rdf:type o:Event; x:b2 rdfs:label “Córdoba”; Measure difficulty based on data heuristics rdf:rdf:type type o:o:City; City; Select different matching cases x:b rdfs:label “Córdoba”; x:b rdfs:label “Córdoba”; rdf:type o:City; wgs84:lat -31.400; rdf:type o:City; wgs84:lat -31.400; x:b2 rdf:type o:City; x:b2 rdf:type o:City; wgs84:lat 37.883; wgs84:lat 37.883; Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 19
  • 20. Approach D1 D2 1 cl1: cl1: ((s,s,p,p,o) o) cl2: cl2: ((s,s,p,p,o) o) … … cln: cln: ((s,s,p,p,o) o) candidate links 2 3 Context information Analyse crowd workers Collect crowd responses for the candidate links to be processed Aggregated response Collect responses Generate RDF file with final links #workers per link cl5: cl5: ((s,s,p,p,o) o) … … cln: cln: ((s,s,p,p,o) o) crowd interlinking 4 Parse RDF links Query D1,D2 Generate and publish microtasks agreement Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 20
  • 21. Approach (II) D1 D2 Manual interlinking D1 D2 Algorithm Guide Review HCOMP interlinking Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 21
  • 22. Use cases Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 22
  • 23. Mapping vocabularies Run an automatic ontology alignment tool and post-process the results with the crowd See also: [Sarasua et al., 2012] Context information pre-configured Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 23
  • 24. Discovering links between instances        a) To extract the patterns of the linkage rules (i.e. labelling) b) To post-process irregular multilingual values, different name versions c) To automatically identify patterns of errors in a resulting set of links, which may be afterwards reviewed by the experts Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 24
  • 25. Curating mapping extensions to authority files Quality control can be done by giving these answers to other crowd workers Checking usefulness of links with library users  There are different possible targets for the interlinking of a dataset: which possibility to select for the Web portal?  Embed Web site in a microtask and ask for specific information or observe next Web site opened Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 25
  • 26. 3 Challenges Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 26
  • 27. # Deciding whether to crowdsource or not  Depends to a large extent on the data – Specific domains require more crowd management effort – Benefit compared to automatically generated links may vary – Availability of workers may change in time Libraries and the cultural heritage domain have high potential (multilinguality, different naming conventions, knowledge exploration)  What should be processed by the crowd – Criteria for selecting subsets of the data (e.g. confidence of machine) > Trial, error and assessment Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 27
  • 28. # Building a loyal workforce  Attracting good crowd workers – Microtasks are constantly being published – Higher reward may also attract more malicious workers  Working with people repeatedly is not supported by majority of crowdsourcing platforms  How to make crowd workers keep on working in these microtasks without them getting demotivated? > Be fair (see also Guidelines on Crowd Work for Academic Researchers, 2014) > Listen to crowd workers (e.g. direct comments, twitter, ratings, monitor online discussions) > Recognize their work > Be aware that gamification is not always the best solution It's really easy to change people's motivations, [at Zooniverse] we find people are motivated by wanting to contribute, they want a sense that this is something real. And in adding game-like elements you can destroy that quite quickly” Chris Lintott, Zoouniverse http://www.wired.co.uk/news/archive/2013- 09/12/fraxinus-gamifying-science/viewgallery/307960 Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 28
  • 29. # Working with unknown humans  Open call can be a problem and an opportinty at the same time: people have diverse – Motivation and dedication – Context and profile – Background knowledge  Crowdsourcing platforms have limited support for personalisation  Working with suitable crowd – Identify what they can do best ▪ Type of task / data level ▪ Competences vs experience cross platform analysis – Assign work accordingly ▪ Weight vs reject >Towards a Crowd Work CV See also: [Sarasua et al., 2014] Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 29
  • 30. Plea to this community  Interlinking is much more than deduplication, consider using also other relations  Consider connecting library datasets to different complementary domains  Interlinking to non editorial data can also be enriching  The more datasets you connect the better  Document your interlinking on the VoiD description of your dataset  Query and make use of available links Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 30
  • 31. If you need humans to process data while interlinking datasets, consider crowd intervention because it can be very valuable for enhancing your results. Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 31
  • 32. Thank you for your attention! Cristina Sarasua Institute for Web Science and Technologies Universität Koblenz-Landau csarasua@uni-koblenz.de http://de.slideshare.net/cristinasarasua https://github.com/criscod Institute for Web Science and Technologies · University of Koblenz-Landau, Germany
  • 33. References  Sarasua, C., Simperl, E., Noy, N.F.: CrowdMAP: Crowdsourcing ontology alignment with microtasks. In: Proceedings of the 11th International Semantic Web Conference (ISWC). (2012)  Sarasua, C., Thimm, M. Crowd Work CV: Recognition for Micro Work. In: SoHuman workshop, co-located with Social Informatics (SocInfo). (2014)  Guidelines on Crowd Work for Academic Researchers (2014). http://wiki.wearedynamo.org/index.php/Guidelines_for_Academic_Requesters Supporting Data Interlinking in Semantic Libraries with Microtask Cristina Sarasua Crowdsourcing 33