HUMAN COMPUTATION
Irene Celino – irene.celino@cefriel.com
Cefriel, Viale Sarca 226, 20126 Milano
Seminar @ Data Semantics course – April 11th, 2018
1. Introduction
2. Linked Data and Knowledge Graph Refinement
3. Human Computation and Games with a Purpose
4. Examples of GWAP for Data Linking
5. Truth Inference and Open Science
6. Guidelines
7. Indirect People Involvement
copyright © 2018 Cefriel – All rights reserved
from ideation to business value
1. INTRODUCTION
Is the Web a pure technological artefact?
What role can people play on the Web?
WEB AS A SOCIAL ARTEFACT
“The Web isn’t about what you can do with computers.
It’s people and, yes, they are connected by computers.
But computer science, as the study of
what happens in a computer, doesn’t tell
you about what happens on the Web”
– Sir Tim Berners-Lee
Open Source Software
“Given enough eyeballs, all bugs are
shallow.” Eric S. Raymond
(The Cathedral and the Bazaar)
OPEN EVERYTHING
Open Content
“It is easy when you skip the
intermediaries”
original motto of Creative Commons
(EN video) (IT video)
Open Data
“Raw. Data. Now.” Tim Berners-Lee
(The year open data went worldwide – TED Talk)
COOPERATION ON THE WEB TO PRODUCE OPEN KNOWLEDGE
WISDOM OF CROWDS
• “Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”
• Criteria for a wise crowd
  • Diversity of opinion (importance of interpretation)
  • Independence (not a “single mind”)
  • Decentralization (importance of local knowledge)
  • Aggregation (aim to get a collective decision)
• There are also failures/risks in crowd decisions:
  • Homogeneity, centralization, division, imitation, emotionality
James Surowiecki
The wisdom of crowds
Anchor, 2005
2. LINKED DATA & KNOWLEDGE GRAPH REFINEMENT
Do we need to involve people in Semantic Web systems?
What semantic data management tasks can we effectively “outsource” to humans?
HUMANS IN THE SEMANTIC WEB
• Knowledge-intensive and/or context-specific character of Semantic Web tasks:
  • e.g., conceptual modelling, multi-language resource labelling, content annotation with ontologies, concept/entity similarity recognition, …
• Need to engage users and involve them in executing tasks:
  • e.g., wikis for semantic content authoring, folksonomies to bootstrap formal ontologies, instance creation by data entry, …
SEMANTIC WEB TASKS (ALSO) FOR HUMANS
[Diagram: Semantic Web tasks arranged by task type and level]
Task types: Collection, Creation, Correction, Validation, Filtering, Ranking, Linking
Levels: Fact level, Schema level
Tasks shown: Conceptual modelling, Ontology population, Quality assessment, Ontology re-engineering, Ontology pruning, Ontology elicitation, Knowledge acquisition, Ontology repair, Knowledge base update, Data search/selection, Link generation, Ontology alignment, Ontology matching
AUTOMATIC METHODS IN THE SEMANTIC WEB?
• Knowledge Graph Refinement (and, in general, linked dataset refinement) is an emerging and hot topic; it aims to (1) identify and correct errors and (2) add missing knowledge
  • e.g., completing type assertions via classification, predicting relations from textual sources, finding erroneous type assertions, identifying erroneous literal values through anomaly/outlier detection, …
• Statistical and machine learning approaches require some partial gold standard, i.e. a “ground truth” dataset to train automatic models
  • Ground truth is usually put together manually by experts
  • Sourcing a gold standard from humans is expensive!
Heiko Paulheim. Knowledge graph refinement: A survey of
approaches and evaluation methods. Semantic Web Journal, 2017
DATA LINKING
• Creation of links in the form of RDF triples (subject, predicate, object)
  • Within the same dataset (i.e. generating new connections between resources of the same dataset or knowledge graph)
  • Across different datasets (i.e. creating RDF links, as they are called in the Linked Data world)
• Note:
  • In the literature, data linking often means finding equivalent resources (similarly to record linkage in database research), i.e. triples with a correspondence/match predicate (e.g. owl:sameAs) → in the following, data linking is intended in its broader meaning (i.e. links with any predicate)
DATA LINKING: SOME DEFINITIONS
• Resources: R is the set of all resources (and literals), whenever possible also described by their respective types. More specifically: R = Rs ∪ Ro, where Rs is the set of resources that can take the role of subject in a triple and Ro is the set of resources that can take the role of object in a triple; the two sets are not necessarily disjoint, i.e. it can happen that Rs ∩ Ro ≠ ∅.
• Predicates: P is the set of all predicates, whenever possible also described by their respective domain and range.
• Links: L is the set of all links; since links are triples created between resources and predicates, L ⊂ Rs × P × Ro; each link is defined as l = (rs,p,ro) ∈ L with rs ∈ Rs, p ∈ P, ro ∈ Ro. L is usually smaller than the full Cartesian product of Rs, P, Ro, because in each link (rs,p,ro) it must be true that rs ∈ domain(p) and ro ∈ range(p).
• Link scores: σ is the score of a link, i.e. a value indicating the confidence in the truth value of the link; usually σ ∈ [0,1]; each link l ∈ L can have an associated score.
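As an illustration of the definitions above, here is a minimal Python sketch of links as typed triples. The class names, the `types` lookup table and the simplification that each resource has exactly one type are assumptions made for this example; they do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str
    domain: str       # type a subject must have
    range_type: str   # type an object must have


@dataclass(frozen=True)
class Link:
    subject: str
    predicate: Predicate
    obj: str
    score: float = 0.0  # sigma, confidence in [0, 1]


def make_link(subject, predicate, obj, types, score=0.0):
    """Build l = (rs, p, ro), enforcing rs in domain(p) and ro in range(p)."""
    if types.get(subject) != predicate.domain:
        raise ValueError(f"{subject} is not in domain({predicate.name})")
    if types.get(obj) != predicate.range_type:
        raise ValueError(f"{obj} is not in range({predicate.name})")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return Link(subject, predicate, obj, score)


# Hypothetical resources, typed for the music classification example.
types = {"track:42": "Track", "style:jazz": "Style"}
genre = Predicate("genre", domain="Track", range_type="Style")

link = make_link("track:42", genre, "style:jazz", types, score=0.8)
```

Swapping subject and object in the call raises a `ValueError`, which is the domain/range constraint that keeps L smaller than the full Cartesian product.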
CASES OF DATA LINKING
• Link creation: given R = Rs ∪ Ro and P, the link l = (rs,p,ro), with rs ∈ Rs, p ∈ P, ro ∈ Ro, is created and added to L
  • e.g., music classification: assign one or more music styles to audio tracks by creating the link (track,genre,style)
• Link ranking: given the set of links L, a score σ ∈ [0,1] is assigned to each link l. The score represents the probability that the link is recognized as true. Links can be ordered on the basis of their score σ, thus obtaining a ranking
  • e.g., ranking photos depicting a specific person (an actor, a singer, a politician) to identify the pictures in which the person is more recognizable or more clearly depicted
• Link validation: given the set of links L, a score σ ∈ [0,1] is assigned to each link l. The score represents the actual truth value of the link. A threshold t ∈ [0,1] is set so that all links with score σ ≥ t are considered true
  • e.g., assessing the correct music style identification in audio tracks (music classification)
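Link ranking and link validation as defined above can be sketched in a few lines of Python. The link tuples and their scores are made-up values used only for illustration.

```python
def rank_links(scores):
    """Order links by decreasing score sigma."""
    return sorted(scores, key=scores.get, reverse=True)


def validate_links(scores, t):
    """Return the links considered true, i.e. those with sigma >= t."""
    return {link for link, sigma in scores.items() if sigma >= t}


# Hypothetical (track, genre, style) links with their scores.
scores = {
    ("track1", "genre", "jazz"): 0.9,
    ("track1", "genre", "rock"): 0.2,
    ("track2", "genre", "pop"): 0.6,
}

ranking = rank_links(scores)
accepted = validate_links(scores, t=0.6)
```

The same scores serve both cases: ranking only uses their order, while validation compares each one with the threshold t.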
3. HUMAN COMPUTATION & GAMES WITH A PURPOSE
What goals can humans help machines to achieve? How to involve a crowd of persons?
What extrinsic rewards (money, prizes, etc.) or intrinsic incentives can we adopt to
motivate people?
HUMAN COMPUTATION
• Human Computation is a computer science technique in which a computational process is performed by outsourcing certain steps to humans. Unlike traditional computation, in which a human delegates a task to a computer, in Human Computation the computer asks a person or a large group of people to solve a problem; then it collects, interprets and integrates their solutions
• The original concept of Human Computation, by its inventor Luis von Ahn, derived from the common-sense observation that people are intrinsically very good at solving some kinds of tasks which are, on the other hand, very hard for a computer to address; this is the case for a number of targets of Artificial Intelligence (like image recognition or natural language understanding) for which research is still open
Edith Law and Luis von Ahn. Human computation.
Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011
HUMAN COMPUTATION
Problem: an Artificial Intelligence algorithm is unable to achieve an adequate result with a satisfactory level of confidence
Solution: ask people to intervene when the AI system fails, “masking” the task within another human process
Example: https://www.google.com/recaptcha/
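The problem/solution pair above amounts to a confidence-based fallback, sketched here in Python. The function names, the 0.8 confidence threshold and the toy model and human are assumptions made for this example; they are not taken from the slides or from reCAPTCHA.

```python
def classify_with_fallback(item, model, ask_human, min_confidence=0.8):
    """Use the model's answer when it is confident enough, otherwise ask a person."""
    label, confidence = model(item)
    if confidence >= min_confidence:
        return label, "machine"
    return ask_human(item), "human"


# Hypothetical stand-ins for an AI model and a human contributor.
def toy_model(item):
    return ("cat", 0.95) if item == "clear photo" else ("cat", 0.40)


def toy_human(item):
    return "dog"


confident = classify_with_fallback("clear photo", toy_model, toy_human)
uncertain = classify_with_fallback("blurry photo", toy_model, toy_human)
```

Only the low-confidence item is routed to the human, which keeps human effort for the cases where the algorithm fails.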
CROWDSOURCING
• Crowdsourcing is the process of outsourcing tasks to a “crowd” of distributed people. The possibility of exploiting the Internet as a vehicle to recruit contributors and to assign tasks led to the rise of micro-work platforms, thus often (but not always) implying a monetary reward. The term Crowdsourcing, although quite recent, is used to indicate a wide range of practices; however, the most common meaning of Crowdsourcing implies that the “crowd” of workers involved in the solution of tasks is different from the traditional or intended groups of task solvers
Jeff Howe. Crowdsourcing: How the power of the crowd
is driving the future of business. Random House, 2008
CROWDSOURCING
Problem: a company needs to execute a lot of simple tasks, but cannot afford to hire a person to do that job
Solution: pack tasks in bunches (human intelligence tasks or HITs) and outsource them to a very cheap workforce through an online platform
Example: https://www.mturk.com/
CITIZEN SCIENCE
• Citizen Science is the involvement of volunteers to collect or process data as part of a scientific or research experiment; those volunteers can be the scientists and researchers themselves, but more often the name of this discipline “implies a form of science developed and enacted by citizens” including those “outside of formal scientific institutions”, thus representing a form of public participation in science. Formally, Citizen Science has been defined as “the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis”.
Alan Irwin. Citizen science: A study of people, expertise
and sustainable development. Psychology Press, 1995
CITIZEN SCIENCE
Problem: a scientific experiment requires the execution of a lot of simple tasks, but researchers are busy
Solution: engage the general audience in solving those tasks, explaining that they are contributing to science, research and the public good
Example: https://www.zooniverse.org/
SPOT THE DIFFERENCE…
• Similarities:
  • Involvement of people
  • No automatic replacement
• Variations:
  • Motivation
  • Reward (glory, money, passion/need)
• Hybrids or parallel!
[Diagram labels: Citizen Science, Crowdsourcing, Human Computation]
GAMES WITH A PURPOSE
• A GWAP lets you outsource some steps of a computational process to humans in an entertaining way
• The application has a “collateral effect”, because players’ actions are exploited to solve a hidden task
• The application *IS* a fully-fledged game (as opposed to gamification, which is the use of game-like features in non-gaming environments)
• The players are (usually) unaware of the hidden purpose; they simply meet game challenges
Luis Von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006
Luis Von Ahn and Laura Dabbish. Designing games with a purpose.
Communications of the ACM, 51(8):58–67, 2008
GAMES WITH A PURPOSE (GWAP)
Problem: the same as in Human Computation (ask humans when AI fails)
Solution: hide the task within a game, so that users are motivated by game challenges, often remaining unaware of the hidden purpose; the task solution comes from agreement between players
4. GWAPS FOR DATA LINKING
Can we embed data linking tasks within Games with a Purpose?
LINK RANKING
• Cultural heritage assets in Milano and their pictures
• Input: set of all links <asset> foaf:depiction <photo>
• Goal: assign a score σ to rank links on their recognisability/representativeness
• The score σ is a function of X/N, where X is the no. of successes (= recognitions) and N the no. of trials of the Bernoulli process (guess or not guess) realized by the game
http://bit.ly/indomilando
Pure GWAP with hidden purpose
Points, badges, leaderboard as intrinsic reward
Link ranking is a result of the “agreement” between players
But also an educational “collateral effect”
Irene Celino, Andrea Fiano, Riccardo Fino. Analysis of a Cultural Heritage Game with a Purpose
with an Educational Incentive. 16th International Conference on Web Engineering, 2016
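The score described for the link ranking game can be sketched as follows. The slide only says that σ is a function of X/N (successes over trials); the plain ratio and the Wilson lower bound shown here are two possible choices, given as assumptions and not as the function actually used by the game.

```python
import math


def recognition_ratio(successes, trials):
    """Plain estimate sigma = X / N (0 when the photo was never shown)."""
    return successes / trials if trials else 0.0


def wilson_lower_bound(successes, trials, z=1.96):
    """Conservative score: lower end of the Wilson interval for X / N.

    Assumption for illustration; it penalises links seen only a few times.
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / (1 + z * z / trials)


# Hypothetical photo recognised 8 times out of 10 showings.
plain = recognition_ratio(8, 10)
cautious = wilson_lower_bound(8, 10)
```

With few trials the two scores differ noticeably, so the choice affects how quickly a newly added photo can climb the ranking.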
LINK VALIDATION
• Two automatic classifications in disagreement: <land-cover-assigned-by-DUSAF> ≠ <land-cover-assigned-by-GL30>
• Input: set of links <land-area> clc:hasLandCover <land-cover>
• Goal: assign a score σ to each link to discover the “right” land cover class
• The score σ of each link is updated on the basis of players’ choices (incremented if the link is selected, decremented if it is not selected)
• When the score of a link reaches the threshold (σ ≥ t), the link is considered “true” (and removed from the game)
https://youtu.be/Q0ru1hhDM9Q
http://bit.ly/foss4game
Pure GWAP with not-so-hidden purpose (played by “experts”)
Points, badges, leaderboard as intrinsic reward
A player scores if he/she guesses one of the two disagreeing classifications
Link validation is a result of the “agreement” between players
Maria Antonia Brovelli, Irene Celino, Andrea Fiano, Monia Elisa Molinari, Vijaycharan Venkatachalam.
A crowdsourcing-based game for land cover validation. Applied Geomatics, 2017
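A minimal sketch of the score update described for the link validation game. The step size, the initial score of 0, the clamping to [0, 1] and the area and class names are assumptions; the slides do not state them.

```python
def update_scores(scores, offered, selected, step=0.25):
    """Raise the score of the selected link and lower the other offered ones."""
    for link in offered:
        delta = step if link == selected else -step
        scores[link] = min(1.0, max(0.0, scores[link] + delta))


def validated_links(scores, t):
    """Links whose score reached the threshold t are considered true."""
    return [link for link, sigma in scores.items() if sigma >= t]


# Two disagreeing classifications for the same (hypothetical) land area.
forest = ("area1", "hasLandCover", "forest")
urban = ("area1", "hasLandCover", "urban")
scores = {forest: 0.0, urban: 0.0}

# Three simulated players all pick "forest".
for _ in range(3):
    update_scores(scores, [forest, urban], selected=forest)

true_links = validated_links(scores, t=0.7)
```

Once a link appears in `true_links` it would be removed from the pool offered to players, as the slide describes.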
LINK COLLECTION & VALIDATION
• Identify pictures of cities from above among those taken on board the ISS (the pictures are then used in a scientific process in light pollution research)
• Input: set of subject resources (pictures) and object resources (classification categories)
• Goal: create links <picture> hasCategory <category> and assign a score σ to each link
• The score σ of each link is updated on the basis of players’ choices (incremented if the link is selected)
• When the score of a link reaches the threshold (σ ≥ t), the link is considered “true” (and the picture is removed from the game)
http://nightknights.eu
Pure GWAP with not-so-hidden purpose (but played by anybody)
Points, badges, leaderboard as intrinsic reward
A player scores if he/she agrees with another player
“Bonus” intrinsic reward with NASA pictures!
Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning:
an Experimental Comparison for Image Classification. Human Computation Journal, 2018.
5. TRUTH INFERENCE & OPEN SCIENCE
How do we aggregate the contributions from the crowd?
Are individual contributions of any value?
AGGREGATION OF CONTRIBUTIONS
• The same task is usually given to multiple human contributors (named workers in crowdsourcing)
• Results on the same task are then aggregated across different contributors (“wisdom of crowds”)
• How to perform the truth inference process?
  • Simplistic solution: majority voting across all contributors
  • But… are all contributors “created equal”? No! Less simplistic solutions:
    • Majority voting across “quality” contributors (filtering out “spammers”)
    • Weighted majority voting with estimation of contributor “reliability”
    • Expectation maximization
    • Message passing… and a lot more!
• How to compute contributor reliability?
  • Assessment tasks (gold standard) with known solution to measure reliability
  • History of contributions/past behaviours to compute a “reputation” value
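The difference between plain and weighted majority voting, with reliability measured on gold-standard assessment tasks, can be sketched as below. Contributor names, answers and the default weight of 0.5 for unknown contributors are made up for the example.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """answers maps contributor -> label; every contributor counts the same."""
    return Counter(answers.values()).most_common(1)[0][0]


def weighted_majority_vote(answers, reliability, default=0.5):
    """Each vote is weighted by the contributor's estimated reliability."""
    totals = defaultdict(float)
    for contributor, label in answers.items():
        totals[label] += reliability.get(contributor, default)
    return max(totals, key=totals.get)


def reliability_from_gold(gold_answers, gold_truth):
    """Share of assessment tasks (known solution) each contributor got right."""
    return {
        contributor: sum(given[task] == gold_truth[task] for task in given) / len(given)
        for contributor, given in gold_answers.items()
    }


# Hypothetical assessment tasks with known solutions.
gold_truth = {"g1": "true", "g2": "false"}
gold_answers = {
    "ann": {"g1": "true", "g2": "false"},
    "bob": {"g1": "false", "g2": "true"},
    "carl": {"g1": "false", "g2": "false"},
}
reliability = reliability_from_gold(gold_answers, gold_truth)

# Answers of the same contributors on a task with unknown solution.
answers = {"ann": "true", "bob": "false", "carl": "false"}
simple = majority_vote(answers)
weighted = weighted_majority_vote(answers, reliability)
```

Here the two methods disagree: two less reliable contributors outvote the reliable one under plain majority voting, while the weighted vote follows the contributor who answered every gold task correctly.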
TRUTH INFERENCE GENERIC ALGORITHM
Input: contributions
Output: truth and reliability
Step 1: compute an estimation of the truth (e.g. majority voting)
Step 2: compute an estimation of contributor reliability (e.g. precision on truth estimation)
Iterate until convergence (e.g. until some difference w.r.t. previous step is really small)
Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, Reynold Cheng.
Truth Inference in Crowdsourcing: Is the Problem Solved? VLDB 2017
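The generic two-step loop can be sketched in Python as follows. This is one simple instance under stated assumptions (reliability-weighted voting for the truth, share of agreeing answers for reliability, alphabetical tie-breaking), not a specific algorithm from the cited survey.

```python
from collections import defaultdict


def infer_truth(contributions, max_iter=50, tol=1e-6):
    """contributions is a list of (contributor, task, label) tuples."""
    contributors = {c for c, _, _ in contributions}
    # Equal weights at the start, so the first pass is plain majority voting.
    reliability = {c: 1.0 for c in contributors}
    truth = {}
    for _ in range(max_iter):
        # Step 1: estimate the truth with a reliability-weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for c, task, label in contributions:
            votes[task][label] += reliability[c]
        truth = {task: max(sorted(labels), key=labels.get)
                 for task, labels in votes.items()}
        # Step 2: estimate reliability as agreement with the current truth.
        agree, total = defaultdict(int), defaultdict(int)
        for c, task, label in contributions:
            total[c] += 1
            agree[c] += int(label == truth[task])
        updated = {c: agree[c] / total[c] for c in contributors}
        # Stop when reliabilities barely change with respect to the previous step.
        change = max(abs(updated[c] - reliability[c]) for c in contributors)
        reliability = updated
        if change < tol:
            break
    return truth, reliability


# Hypothetical contributions from three contributors on three tasks.
contributions = [
    ("a", "t1", "yes"), ("a", "t2", "no"), ("a", "t3", "yes"),
    ("b", "t1", "yes"), ("b", "t2", "no"), ("b", "t3", "no"),
    ("c", "t1", "no"), ("c", "t2", "yes"), ("c", "t3", "yes"),
]
truth, reliability = infer_truth(contributions)
```

The function returns both outputs named on the slide: the estimated truth per task and the estimated reliability per contributor.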
OPEN SCIENCE: ENABLING COMPARE & CONTRAST
• Open Science aims to make scientific research and data accessible to all levels of society
  • Repeatability and reproducibility are among the foundational principles of open science
• Human Computation aims at involving people in some step of the scientific process
  • Human contributors generate data to solve assigned tasks
  • Algorithms aggregate contributions in the truth inference process
• Can we compare different truth inference algorithms?
  • Yes, if we make available the data of the Human Computation process!
• What can we share, e.g. in the case of data linking tasks?
  • “True” and “false” links
  • Confidence scores of the links
  • Individual contributions and aggregation process
PROV-O AND HUMAN COMPUTATION ONTOLOGY
• Provenance is information about entities, activities, and people involved in producing a piece of data or thing (used to assess its quality, reliability or trustworthiness)
• W3C defined the PROV-O ontology to capture provenance information
https://www.w3.org/TR/prov-o/
• The Human Computation ontology extends PROV-O to describe the data shared within a Human Computation process
http://swa.cefriel.it/ontologies/hc
• Data linking process information can be published according to linked data principles, described with the HC ontology (e.g. data from the Urbanopoly GWAP at http://swa.cefriel.it/linkeddata/)
[Diagram: Human Computation ontology. Classes: Contributor, Contribution, Human Computation Task, Consolidated Information, Human Computation Algorithm; PROV-O classes: provo:Agent, provo:Entity, provo:Activity; properties: aggregatedFrom, solvedBy, enabledBy, contributionFrom, solutionTo, aggregatedBy]
Irene Celino. Human Computation VGI Provenance: Semantic Web-based Representation and Publishing.
IEEE Transactions on Geoscience and Remote Sensing, 2013
6. GUIDELINES
Is it that easy to involve people on the Web?
What should we care about when designing a human computation system?
MICE AND MEN (OR: KEEP IT SIMPLE)
• Crowdsourcing workers behave like mice
  • Mice prefer to use their motor skills (biologically cheap, e.g. pressing a lever to get food) rather than their cognitive skills (biologically expensive, e.g. going through a labyrinth to get food)
  • Workers prefer/are better at simple tasks (e.g. those that can be solved at first sight) and discard/are worse at more complex tasks (e.g. those that require logical reasoning)
• Crowdsourcing tasks should be carefully designed
  • Tasks as simple as possible for the workers to solve
  • Complex tasks together with other incentives (e.g. variety/novelty)
Panos Ipeirotis. On Mice and Men: The Role of Biology in Crowdsourcing,
Keynote talk at Collective Intelligence, 2012.
DIVIDE ET IMPERA (OR: FIND-FIX-VERIFY)
• Find-Fix-Verify crowd programming pattern
• A long and “expensive” task…
  • Summarize a text to shorten its total length
• …is decomposed into more atomic tasks…
  1. find sentences that need to be shortened
  2. fix a sentence by shortening it
  3. verify which summarized sentence maintains the original meaning
• …and the complex task is turned into a workflow of simple tasks, and each step is outsourced to a crowd
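The three stages can be sketched as a small Python workflow in which crowd workers are simulated by plain functions. The agreement threshold, the worker functions and the sample sentences are hypothetical; this illustrates the pattern and is not the Soylent implementation.

```python
from collections import Counter


def find_fix_verify(sentences, finders, fixers, verifiers, min_agreement=2):
    """Run the Find, Fix and Verify stages over a list of sentences."""
    # Find: keep the sentences flagged by at least `min_agreement` workers.
    flagged = [i for i, s in enumerate(sentences)
               if sum(1 for find in finders if find(s)) >= min_agreement]
    result = list(sentences)
    for i in flagged:
        # Fix: every fixer proposes a shorter rewrite of the sentence.
        candidates = sorted({fix(sentences[i]) for fix in fixers})
        # Verify: every verifier votes for one candidate; the most voted wins.
        votes = Counter(verify(sentences[i], candidates) for verify in verifiers)
        result[i] = votes.most_common(1)[0][0]
    return result


# Simulated workers (stand-ins for real crowd contributors).
def flags_over_5_words(s): return len(s.split()) > 5
def flags_over_6_words(s): return len(s.split()) > 6
def flags_over_20_words(s): return len(s.split()) > 20
def drop_both_fillers(s): return s.replace("really very ", "")
def drop_one_filler(s): return s.replace("very ", "")
def prefers_shortest(original, candidates): return min(candidates, key=len)


text = ["The crowd is really very good at this task.", "Short one."]
shortened = find_fix_verify(
    text,
    finders=[flags_over_5_words, flags_over_6_words, flags_over_20_words],
    fixers=[drop_both_fillers, drop_one_filler],
    verifiers=[prefers_shortest] * 3,
)
```

Each stage asks workers for one small decision only (flag, rewrite or vote), which is what makes the originally expensive task suitable for a crowd.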
M. Bernstein, G. Little, R. Miller, B. Hartmann, M. Ackerman, D. Karger, D. Crowell, K. Panovich.
Soylent: A Word Processor with a Crowd Inside, UIST Proceedings, 2010.
COMPARE AND CONTRAST
• A sort of “wisdom of the crowd(sourcing methods)”: (1) apply different approaches to solve the same problem and (2) compare results
  • Which is the best approach for a specific use case?
  • Which is the most suitable crowd?
  • Is human computation better/faster/cheaper than machine computation?
• Knowledge Graph Refinement: use Human Computation to “crowdsource” a gold standard and then use it to train some statistical/machine learning algorithm
[Figure: four diagrams, each showing an input task turned into an output solution by a different combination of Human Computation and Machine Computation]
Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning:
an Experimental Comparison for Image Classification. Human Computation Journal, 2018.
FINAL NOTE ON DISAGREEMENT
• Is there always a “right answer”? Or is there a “crowd truth”?
  • Not always true/false, because of human subjectivity, ambiguity and uncertainty
  • Disagreement across contributors is not necessarily bad, but a sign of: different opinions, interpretations, contexts, perspectives, …
• Remember the long tail theory…
  • …and ask yourself who your users are and whom you want to involve
Lora Aroyo, Chris Welty. Truth is a Lie: 7 Myths about Human Annotation. AI Magazine 2014.
7. INDIRECT PEOPLE INVOLVEMENT
Are there indirect ways to involve humans in data processing?
HUMANS AS A SOURCE OF INFORMATION
• People are not only task executors, they are also information providers!
• Opportunistic sensing
  • Voluntary or involuntary digital traces of human-related activities
  • e.g., phone call logs, GPS traces, social media activities
• Open content and cooperative knowledge
  • Data explicitly provided by people can “hide” further information
  • e.g., logs of wiki editing, statistical distribution of contributions
FROM POI INFORMATION AND PHONE CALL LOGS TO LAND USE
• General topic: exploit “low-cost” information about a geographic area as features to train a predictive model that outputs “expensive” information about the same area
• “Inexpensive” input information:
  • Geo-information about points of interest
  • Mobile traffic data processed using different time series techniques: smoothing, decomposition, filtering, time-windowing
• “Expensive” output information:
  • Land use characterization (usually collected through long and expensive workflows that mix machine processing and costly human labour)
Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Extracting Urban Land Use from Linked Open Geospatial Data. IJGI, 2015
Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Filtering and Windowing Mobile Traffic Time Series for Territorial Land Use Classification. COMCOM, 2016
FROM SPATIAL ANALYTICS TO GEO-ONTOLOGY ENGINEERING
• OpenStreetMap collects information about points of interest (POI)
• Spatial distribution and conglomeration of specific POIs can give hints about the geographical space
• Re-engineering of spatial features through comparison between areas: same POI type shows different distribution → evidence for different semantics (e.g. what is a pub in Milano vs. London)
• Semantic specification of spatial neighbourhoods:
  • Emerging neighbourhoods from spatial clustering of POIs (as opposed to administrative divisions)
  • Spatial version of tf-idf to compare between different areas (e.g. central or peripheral areas in different cities) and to characterise neighbourhoods (e.g. shopping district)
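One way to read the "spatial version of tf-idf" is to treat each area as a document and each POI type as a term. The sketch below uses the textbook tf-idf formula under that assumption; the exact formulation used by the authors is not given in the slides, and the areas and POI lists are invented.

```python
import math
from collections import Counter


def spatial_tf_idf(pois_by_area):
    """pois_by_area maps an area name to the list of POI types found in it."""
    n_areas = len(pois_by_area)
    counts = {area: Counter(pois) for area, pois in pois_by_area.items()}
    # Number of areas in which each POI type occurs at least once.
    area_freq = Counter(t for c in counts.values() for t in c)
    scores = {}
    for area, c in counts.items():
        total = sum(c.values())
        scores[area] = {
            # tf: share of the area's POIs of this type; idf: rarity across areas.
            poi_type: (n / total) * math.log(n_areas / area_freq[poi_type])
            for poi_type, n in c.items()
        }
    return scores


# Hypothetical areas with their POI types.
areas = {
    "centre": ["shop", "shop", "shop", "pub"],
    "suburb": ["school", "pub", "park"],
    "campus": ["school", "pub", "library"],
}
scores = spatial_tf_idf(areas)
characteristic = max(scores["centre"], key=scores["centre"].get)
```

POI types present everywhere (here "pub") get a score of 0, while types concentrated in one area stand out, which matches the idea of characterising a neighbourhood such as a shopping district.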
Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni. Supporting Geo-Ontology
Engineering through Spatial Data Analytics. 13th Extended Semantic Web Conference, 2016
MILANO
viale Sarca 226,
20126,
Milano - Italy
LONDON
4th floor
57 Rathbone Place
London W1T 1JU – UK
NEW YORK
One Liberty Plaza,
165 Broadway, 23rd Floor,
New York City, New York, 10006 USA
Cefriel.com
Thanks for your attention!
Any questions?
Irene Celino
Knowledge Technologies
Digital Interaction Division
irene.celino@cefriel.com

More Related Content

What's hot

Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Big Data Spain
Ā 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebJames Hendler
Ā 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2Connected Data World
Ā 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018Andre Freitas
Ā 
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation MatrixOWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation MatrixParis Open Source Summit
Ā 
Swap2010 agave
Swap2010 agaveSwap2010 agave
Swap2010 agavejuanaya
Ā 
Presentation of current research: distributed architecture for recommendation...
Presentation of current research: distributed architecture for recommendation...Presentation of current research: distributed architecture for recommendation...
Presentation of current research: distributed architecture for recommendation...Benjamin Heitmann
Ā 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - IreneSSSW
Ā 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for RealJames Hendler
Ā 
Semantic Web: The Inside Story
Semantic Web: The Inside StorySemantic Web: The Inside Story
Semantic Web: The Inside StoryJames Hendler
Ā 
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Artificial Intelligence Institute at UofSC
Ā 
SSSW 2016 Cognition Tutorial
SSSW 2016 Cognition TutorialSSSW 2016 Cognition Tutorial
SSSW 2016 Cognition TutorialIrene Celino
Ā 

What's hot (15)

Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Human-in-the-loop: a design pattern for managing teams which leverage ML by P...
Ā 
On Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the WebOn Beyond OWL: challenges for ontologies on the Web
On Beyond OWL: challenges for ontologies on the Web
Ā 
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
From Taxonomies and Schemas to Knowledge Graphs: Parts 1 & 2
Ā 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
Ā 
PhD thesis defense of Christopher Thomas
PhD thesis defense of Christopher ThomasPhD thesis defense of Christopher Thomas
PhD thesis defense of Christopher Thomas
Ā 
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation MatrixOWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
OWF14 - Plenary Session : Ori Pekelman, Founder, Constellation Matrix
Ā 
Swap2010 agave
Swap2010 agaveSwap2010 agave
Swap2010 agave
Ā 
Presentation of current research: distributed architecture for recommendation...
Presentation of current research: distributed architecture for recommendation...Presentation of current research: distributed architecture for recommendation...
Presentation of current research: distributed architecture for recommendation...
Ā 
Wither OWL
Wither OWLWither OWL
Wither OWL
Ā 
Tutorial Cognition - Irene
Tutorial Cognition - IreneTutorial Cognition - Irene
Tutorial Cognition - Irene
Ā 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
Ā 
Semantic Web: The Inside Story
Semantic Web: The Inside StorySemantic Web: The Inside Story
Semantic Web: The Inside Story
Ā 
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Hemant Purohit PhD Defense: Mining Citizen Sensor Communities for Cooperation...
Ā 
Implementing Artificial Intelligence with Big Data
Implementing Artificial Intelligence with Big DataImplementing Artificial Intelligence with Big Data
Implementing Artificial Intelligence with Big Data
Ā 
SSSW 2016 Cognition Tutorial
SSSW 2016 Cognition TutorialSSSW 2016 Cognition Tutorial
SSSW 2016 Cognition Tutorial
Ā 

Human Computation

  • 1. HUMAN COMPUTATION. Irene Celino, irene.celino@cefriel.com. Cefriel, Viale Sarca 226, 20126 Milano. Seminar @ Data Semantics course, April 11th, 2018
  • 2. 1. Introduction; 2. Linked Data and Knowledge Graph Refinement; 3. Human Computation and Games with a Purpose; 4. Examples of GWAP for Data Linking; 5. Truth Inference and Open Science; 6. Guidelines; 7. Indirect People Involvement. copyright © 2018 Cefriel - All rights reserved
  • 3. from ideation to business value. 1. INTRODUCTION: Is the Web a purely technological artefact? What role can people play on the Web?
  • 4. WEB AS A SOCIAL ARTEFACT: “The Web isn’t about what you can do with computers. It’s people and, yes, they are connected by computers. But computer science, as the study of what happens in a computer, doesn’t tell you about what happens on the Web” (Sir Tim Berners-Lee)
  • 5. OPEN EVERYTHING
      • Open Source Software: “Given enough eyeballs, all bugs are shallow.” Eric S. Raymond (The Cathedral and the Bazaar)
      • Open Content: “It is easy when you skip the intermediaries”, original motto of Creative Commons
      • Open Data: “Raw. Data. Now.” Tim Berners-Lee (The year open data went worldwide, TED Talk)
  • 6. COOPERATION ON THE WEB TO PRODUCE OPEN KNOWLEDGE
  • 7. WISDOM OF CROWDS
      • “Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”
      • Criteria for a wise crowd:
          • Diversity of opinion (importance of interpretation)
          • Independence (not a “single mind”)
          • Decentralization (importance of local knowledge)
          • Aggregation (aim to get a collective decision)
      • There are also failures/risks in crowd decisions:
          • Homogeneity, centralization, division, imitation, emotionality
      Reference: James Surowiecki. The Wisdom of Crowds. Anchor, 2005
  • 8. 2. LINKED DATA & KNOWLEDGE GRAPH REFINEMENT: Do we need to involve people in Semantic Web systems? What semantic data management tasks can we effectively “outsource” to humans?
  • 9. HUMANS IN THE SEMANTIC WEB
      • Knowledge-intensive and/or context-specific character of Semantic Web tasks:
          • e.g., conceptual modelling, multi-language resource labelling, content annotation with ontologies, concept/entity similarity recognition, …
      • Need to engage users and involve them in executing tasks:
          • e.g., wikis for semantic content authoring, folksonomies to bootstrap formal ontologies, instance creation by data entry, …
  • 10. SEMANTIC WEB TASKS (ALSO) FOR HUMANS
      [Figure: Semantic Web tasks arranged by level (fact level, schema level) and by activity (collection, creation, correction, validation, filtering, ranking, linking). Tasks shown: conceptual modelling, ontology population, quality assessment, ontology re-engineering, ontology pruning, ontology elicitation, knowledge acquisition, ontology repair, knowledge base update, data search/selection, link generation, ontology alignment, ontology matching.]
  • 11. AUTOMATIC METHODS IN THE SEMANTIC WEB?
      • Knowledge Graph Refinement (and, in general, linked dataset refinement) is an emerging and hot topic, aiming to (1) identify and correct errors and (2) add missing knowledge
          • e.g., completing type assertions via classification, predicting relations from textual sources, finding erroneous type assertions, identifying erroneous literal values through anomaly/outlier detection, …
      • Statistical and machine learning approaches require some partial gold standard, i.e. a “ground truth” dataset to train automatic models
      • Ground truth is usually put together manually by experts
      • Sourcing a gold standard from humans is expensive!
      Reference: Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web Journal, 2017
  • 12. DATA LINKING
      • Creation of links in the form of RDF triples (subject, predicate, object)
          • Within the same dataset (i.e. generating new connections between resources of the same dataset or knowledge graph)
          • Across different datasets (i.e. creating RDF links, as they are named in the Linked Data world)
      • Note:
          • In the literature, data linking often means finding equivalent resources (similarly to record linkage in database research), i.e. triples with a correspondence/match predicate (e.g. owl:sameAs)
          • In the following, data linking is intended in its broader meaning (i.e. links with any predicate)
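A minimal sketch of the two distinctions made on this slide, assuming links are plain (subject, predicate, object) tuples of URIs and that a dataset can be recognised by a URI prefix. The function names, the prefixes and the example URIs are hypothetical and only serve the illustration; the owl:sameAs URI is the standard one.

```python
# Standard URI of the owl:sameAs equivalence predicate.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def dataset_of(uri, dataset_prefixes):
    """Return the name of the dataset whose prefix matches the URI, or None."""
    for name, prefix in dataset_prefixes.items():
        if uri.startswith(prefix):
            return name
    return None

def describe_link(triple, dataset_prefixes):
    """Classify a link by scope (within/across datasets) and by kind."""
    subject, predicate, obj = triple
    source = dataset_of(subject, dataset_prefixes)
    target = dataset_of(obj, dataset_prefixes)
    if source is None or target is None:
        scope = "unknown"
    elif source == target:
        scope = "within"
    else:
        scope = "across"
    return {"scope": scope, "equivalence": predicate == OWL_SAME_AS}

# Hypothetical datasets and links, for illustration only.
prefixes = {
    "music": "http://example.org/music/",
    "artists": "http://example.com/artists/",
}
same_as_link = (
    "http://example.org/music/band1",
    OWL_SAME_AS,
    "http://example.com/artists/band1",
)
genre_link = (
    "http://example.org/music/track1",
    "http://example.org/music/genre",
    "http://example.org/music/rock",
)
same_as_info = describe_link(same_as_link, prefixes)
genre_info = describe_link(genre_link, prefixes)
```

Under the narrow reading only `same_as_link` would count as data linking; under the broader reading adopted in the slides both links do.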
  • 13. DATA LINKING: SOME DEFINITIONS
      • Resources: R is the set of all resources (and literals), whenever possible also described by the respective types. More specifically: R = Rs ∪ Ro, where Rs is the set of resources that can take the role of subject in a triple and Ro is the set of resources that can take the role of object in a triple; the two sets are not necessarily disjoint, i.e. it can happen that Rs ∩ Ro ≠ ∅.
      • Predicates: P is the set of all predicates, whenever possible also described by the respective domain and range.
      • Links: L is the set of all links; since links are triples created between resources and predicates, L ⊂ Rs × P × Ro; each link is defined as l = (rs, p, ro) ∈ L with rs ∈ Rs, p ∈ P, ro ∈ Ro. L is usually smaller than the full Cartesian product of Rs, P, Ro, because in each link (rs, p, ro) it must be true that rs ∈ domain(p) and ro ∈ range(p).
      • Link scores: σ is the score of a link, i.e. a value indicating the confidence in the truth value of the link; usually σ ∈ [0,1]; each link l ∈ L can have an associated score.
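The domain/range restriction in the definition of L can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: each resource is assumed to carry a single type, each predicate a single domain type and a single range type, and the toy resources, the `Predicate` class and the `admissible_links` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Predicate:
    name: str
    domain_type: str  # type required of the subject rs
    range_type: str   # type required of the object ro

# Hypothetical toy data: resource name -> type. Here Rs = Ro = all resources,
# which is allowed because the two sets need not be disjoint.
resource_types = {
    "track1": "Track",
    "track2": "Track",
    "rock": "Style",
    "jazz": "Style",
}
predicates = [Predicate("genre", domain_type="Track", range_type="Style")]

def admissible_links(resource_types, predicates):
    """Return every (rs, p, ro) with rs in domain(p) and ro in range(p)."""
    links = set()
    for (rs, rs_type), p, (ro, ro_type) in product(
        resource_types.items(), predicates, resource_types.items()
    ):
        if rs_type == p.domain_type and ro_type == p.range_type:
            links.add((rs, p.name, ro))
    return links

links = admissible_links(resource_types, predicates)
# Size of the unrestricted Cartesian product Rs x P x Ro, for comparison.
full_product_size = len(resource_types) * len(predicates) * len(resource_types)
```

With these toy values the constraint leaves 4 admissible triples out of the 16 in the full product. The admissible set is only an upper bound: the actual L contains the links that are really asserted.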
  • 14. CASES OF DATA LINKING • Link creation: given R = Rs ∪ Ro and P, the link l = (rs, p, ro), with rs ∈ Rs, p ∈ P, ro ∈ Ro, is created and added to L • e.g., music classification: assign one or more music styles to audio tracks by creating the link (track, genre, style) • Link ranking: given the set of links L, a score σ ∈ [0,1] is assigned to each link l. The score represents the probability that the link is recognized as true. Links can be ordered on the basis of their score σ, thus obtaining a ranking • e.g., ranking photos depicting a specific person (an actor, a singer, a politician) to identify the pictures in which the person is more recognizable or more clearly depicted • Link validation: given the set of links L, a score σ ∈ [0,1] is assigned to each link l. The score represents the actual truth value of the link. A threshold t ∈ [0,1] is set so that all links with score σ ≥ t are considered true • e.g., assessing the correct music style identification in audio tracks (music classification)
  • 15. 3. HUMAN COMPUTATION & GAMES WITH A PURPOSE • What goals can humans help machines to achieve? • How to involve a crowd of persons? • What extrinsic rewards (money, prizes, etc.) or intrinsic incentives can we adopt to motivate people?
  • 16. HUMAN COMPUTATION • Human Computation is a computer science technique in which a computational process is performed by outsourcing certain steps to humans. Unlike traditional computation, in which a human delegates a task to a computer, in Human Computation the computer asks a person or a large group of people to solve a problem; then it collects, interprets and integrates their solutions • The original concept of Human Computation by its inventor Luis von Ahn derived from the common-sense observation that people are intrinsically very good at solving some kinds of tasks which are, on the other hand, very hard to address for a computer; this is the case of a number of targets of Artificial Intelligence (like image recognition or natural language understanding) for which research is still open Edith Law and Luis von Ahn. Human computation. Synthesis Lectures on Artificial Intelligence and Machine Learning, 2011
  • 17. HUMAN COMPUTATION • Problem: an Artificial Intelligence algorithm is unable to achieve an adequate result with a satisfactory level of confidence • Solution: ask people to intervene when the AI system fails, “masking” the task within another human process • Example: https://www.google.com/recaptcha/
  • 18. CROWDSOURCING • Crowdsourcing is the process of outsourcing tasks to a “crowd” of distributed people. The possibility of exploiting the Internet as a vehicle to recruit contributors and to assign tasks led to the rise of micro-work platforms, thus often (but not always) implying a monetary reward. The term Crowdsourcing, although quite recent, is used to indicate a wide range of practices; however, the most common meaning of Crowdsourcing implies that the “crowd” of workers involved in the solution of tasks is different from the traditional or intended groups of task solvers Jeff Howe. Crowdsourcing: How the power of the crowd is driving the future of business. Random House, 2008
  • 19. CROWDSOURCING • Problem: a company needs to execute a lot of simple tasks, but cannot afford to hire a person to do that job • Solution: pack tasks in bunches (human intelligence tasks or HITs) and outsource them to a very cheap workforce through an online platform • Example: https://www.mturk.com/
  • 20. CITIZEN SCIENCE • Citizen Science is the involvement of volunteers to collect or process data as part of a scientific or research experiment; those volunteers can be the scientists and researchers themselves, but more often the name of this discipline “implies a form of science developed and enacted by citizens” including those “outside of formal scientific institutions”, thus representing a form of public participation in science. Formally, Citizen Science has been defined as “the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis”. Alan Irwin. Citizen science: A study of people, expertise and sustainable development. Psychology Press, 1995
  • 21. CITIZEN SCIENCE • Problem: a scientific experiment requires the execution of a lot of simple tasks, but researchers are busy • Solution: engage the general audience in solving those tasks, explaining that they are contributing to science, research and the public good • Example: https://www.zooniverse.org/
  • 22. SPOT THE DIFFERENCE… • Similarities: • Involvement of people • No automatic replacement • Variations: • Motivation • Reward (glory, money, passion/need) • Hybrids or parallel! [Diagram labels: Citizen Science, Crowdsourcing, Human Computation]
  • 23. GAMES WITH A PURPOSE • A GWAP makes it possible to outsource some steps of a computational process to humans in an entertaining way • The application has a “collateral effect”, because players’ actions are exploited to solve a hidden task • The application *IS* a fully-fledged game (as opposed to gamification, which is the use of game-like features in non-gaming environments) • The players are (usually) unaware of the hidden purpose; they simply meet game challenges Luis Von Ahn. Games with a purpose. Computer, 39(6):92–94, 2006 Luis Von Ahn and Laura Dabbish. Designing games with a purpose. Communications of the ACM, 51(8):58–67, 2008
  • 24. GAMES WITH A PURPOSE (GWAP) • Problem: the same as in Human Computation (ask humans when AI fails) • Solution: hide the task within a game, so that users are motivated by game challenges, often remaining unaware of the hidden purpose; the task solution comes from agreement between players
  • 25. 4. GWAPS FOR DATA LINKING • Can we embed data linking tasks within Games with a Purpose?
  • 26. LINK RANKING • Cultural heritage assets in Milano and their pictures • Input: set of all links <asset> foaf:depiction <photo> • Goal: assign score σ to rank links on their recognisability/representativeness • The score σ is a function of X/N, where X is the no. of successes (= recognitions) and N the no. of trials of the Bernoulli process (guess or not guess) realized by the game • http://bit.ly/indomilando • Pure GWAP with hidden purpose • Points, badges, leaderboard as intrinsic reward • Link ranking is a result of the “agreement” between players • But also an educational “collateral effect” Irene Celino, Andrea Fiano, Riccardo Fino. Analysis of a Cultural Heritage Game with a Purpose with an Educational Incentive. 16th International Conference on Web Engineering, 2016
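A minimal sketch of the ranking step, under a loud assumption: the slide only says that the score is "a function of" the ratio between successes and trials, so using the plain ratio X/N as the score is a simplification chosen for illustration. The function names and the input layout (a dictionary from link to a pair of counts) are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple


def recognition_score(successes: int, trials: int) -> Optional[float]:
    """Fraction of trials in which players recognised the photo (X / N).

    Returns None when the link was never shown, so that unseen links
    are not ranked as if they had score 0.
    """
    if trials == 0:
        return None
    return successes / trials


def rank_links(stats: Dict[Hashable, Tuple[int, int]]) -> List[Tuple[Hashable, float]]:
    """Order links by decreasing score; `stats` maps link -> (X, N)."""
    scored = [(link, recognition_score(x, n)) for link, (x, n) in stats.items()]
    seen = [(link, s) for link, s in scored if s is not None]
    return sorted(seen, key=lambda pair: pair[1], reverse=True)
```

A link shown ten times and recognised eight times gets 0.8 and ranks above one recognised twice in ten trials; a link never shown is left out of the ranking.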
  • 27. LINK VALIDATION • Two automatic classifications in disagreement: <land-cover-assigned-by-DUSAF> ≠ <land-cover-assigned-by-GL30> • Input: set of links <land-area> clc:hasLandCover <land-cover> • Goal: assign score σ to each link to discover the “right” land cover class • Score σ of each link is updated on the basis of players’ choices (incremented if link selected, decremented if link not selected) • When the score of a link reaches the threshold (σ ≥ t), the link is considered “true” (and removed from the game) • https://youtu.be/Q0ru1hhDM9Q • http://bit.ly/foss4game • Pure GWAP with not-so-hidden purpose (played by “experts”) • Points, badges, leaderboard as intrinsic reward • A player scores if he/she guesses one of the two disagreeing classifications • Link validation is a result of the “agreement” between players Maria Antonia Brovelli, Irene Celino, Andrea Fiano, Monia Elisa Molinari, Vijaycharan Venkatachalam. A crowdsourcing-based game for land cover validation. Applied Geomatics, 2017
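The score update described above can be sketched as follows. The slide does not state how much a score is incremented or decremented, nor the initial value, so the fixed `step`, the starting score of 0 and the clamping to [0, 1] are assumptions made for this example; the function names are hypothetical too.

```python
from typing import Dict, Hashable, Iterable, Set


def update_scores(scores: Dict[Hashable, float],
                  offered: Iterable[Hashable],
                  selected: Hashable,
                  step: float = 0.25) -> Dict[Hashable, float]:
    """Apply one player choice to the scores of the links offered in a round.

    The selected link is incremented by `step`, every other offered link
    is decremented by `step`; scores are clamped to [0, 1].
    """
    for link in offered:
        delta = step if link == selected else -step
        scores[link] = min(1.0, max(0.0, scores.get(link, 0.0) + delta))
    return scores


def validated_links(scores: Dict[Hashable, float], threshold: float) -> Set[Hashable]:
    """Links whose score reached the threshold (sigma >= t) are considered true."""
    return {link for link, sigma in scores.items() if sigma >= threshold}
```

For a land area with two competing classes, two players choosing the same class bring its score to 0.5 with the default step, so with t = 0.5 that link is validated and can be removed from the game.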
  • 28. LINK COLLECTION & VALIDATION • Identify pictures of cities from above among those taken on board the ISS (the pictures are then used in a scientific process in light pollution research) • Input: set of subject resources (pictures) and object resources (classification categories) • Goal: create links <picture> hasCategory <category> and assign score σ to each link • Score σ of each link is updated on the basis of players’ choices (incremented if link selected) • When the score of a link reaches the threshold (σ ≥ t), the link is considered “true” (and the picture is removed from the game) • http://nightknights.eu • Pure GWAP with not-so-hidden purpose (but played by anybody) • Points, badges, leaderboard as intrinsic reward • A player scores if he/she agrees with another player • “Bonus” intrinsic reward with NASA pictures! Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification. Human Computation Journal, 2018.
  • 29. 5. TRUTH INFERENCE & OPEN SCIENCE • How do we aggregate the contributions from the crowd? • Are individual contributions of any value?
  • 30. AGGREGATION OF CONTRIBUTIONS • The same task is usually given to multiple human contributors (named workers in crowdsourcing) • Results on the same task are then aggregated across different contributors (“wisdom of crowds”) • How to perform the truth inference process? • Simplistic solution: majority voting across all contributors • But… are all contributors “created equal”? No! Less simplistic solutions: • Majority voting across “quality” contributors (filtering out “spammers”) • Weighted majority voting with estimation of contributors’ “reliability” • Expectation maximization • Message passing… and a lot more! • How to compute contributor reliability? • Assessment tasks (gold standard) with known solution to measure reliability • History of contributions/past behaviours to compute a “reputation” value
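To make the first two options concrete, here is a small sketch of majority voting, weighted majority voting, and reliability measured on gold-standard tasks. The data layout and function names are assumptions for illustration; ties are broken in favour of the answer seen first, which is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def weighted_majority(votes: List[Tuple[str, Hashable]],
                      reliability: Optional[Dict[str, float]] = None) -> Hashable:
    """Aggregate (contributor, answer) votes on a single task.

    Without `reliability` every vote weighs 1 (plain majority voting);
    otherwise each vote weighs the contributor's reliability
    (contributors missing from the mapping default to weight 1).
    """
    totals: Dict[Hashable, float] = defaultdict(float)
    for contributor, answer in votes:
        weight = 1.0 if reliability is None else reliability.get(contributor, 1.0)
        totals[answer] += weight
    return max(totals, key=totals.get)


def reliability_from_gold(answers: Dict[str, Dict[str, Hashable]],
                          gold: Dict[str, Hashable]) -> Dict[str, float]:
    """Fraction of gold-standard tasks each contributor answered correctly."""
    result = {}
    for contributor, given in answers.items():
        assessed = [task for task in given if task in gold]
        if assessed:
            correct = sum(given[task] == gold[task] for task in assessed)
            result[contributor] = correct / len(assessed)
    return result
```

With three votes (one "urban", two "forest"), plain majority returns "forest"; if the "urban" voter has reliability 0.9 and the other two 0.3 each, the weighted result flips to "urban".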
  • 31. TRUTH INFERENCE GENERIC ALGORITHM • Input: contributions • Step 1: compute an estimation of the truth (e.g. majority voting) • Step 2: compute an estimation of contributor reliability (e.g. precision on truth estimation) • Iterate until convergence (e.g. until some difference w.r.t. previous step is really small) • Output: truth and reliability Yudian Zheng, Guoliang Li, Yuanbing Li, Caihua Shan, Reynold Cheng. Truth Inference in Crowdsourcing: Is the Problem Solved? VLDB 2017
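The generic loop can be sketched in a few lines. This is one simple instance, not the specific method of any paper: step 1 uses reliability-weighted voting, step 2 uses the fraction of a contributor's answers that match the current truth estimate, and convergence is declared when the truth estimate stops changing. The function name, the input layout and `max_iter` are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def infer_truth(contributions: List[Tuple[str, str, Hashable]],
                max_iter: int = 20) -> Tuple[Dict[str, Hashable], Dict[str, float]]:
    """Alternate truth estimation and reliability estimation.

    `contributions` is a list of (task, contributor, answer) triples.
    Returns (truth per task, reliability per contributor).
    """
    contributors = {c for _, c, _ in contributions}
    reliability = {c: 1.0 for c in contributors}  # start from plain majority voting
    truth: Dict[str, Hashable] = {}

    for _ in range(max_iter):
        # Step 1: estimate the truth with reliability-weighted voting.
        totals = defaultdict(lambda: defaultdict(float))
        for task, contributor, answer in contributions:
            totals[task][answer] += reliability[contributor]
        new_truth = {task: max(answers, key=answers.get)
                     for task, answers in totals.items()}

        # Step 2: estimate reliability as agreement with the estimated truth.
        hits, counts = defaultdict(int), defaultdict(int)
        for task, contributor, answer in contributions:
            counts[contributor] += 1
            hits[contributor] += int(new_truth[task] == answer)
        reliability = {c: hits[c] / counts[c] for c in contributors}

        # Convergence: the truth estimate did not change in this iteration.
        if new_truth == truth:
            break
        truth = new_truth

    return truth, reliability
```

More refined methods mentioned on the previous slide (e.g. expectation maximization) follow the same two-step structure but replace both estimates with probabilistic ones.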
  • 32. OPEN SCIENCE: ENABLING COMPARE & CONTRAST • Open Science aims to make scientific research and data accessible to all levels of society • Repeatability and reproducibility are among the foundational principles of open science • Human Computation aims at involving people in some step of the scientific process • Human contributors generate data to solve assigned tasks • Algorithms aggregate contributions in the truth inference process • Can we compare different truth inference algorithms? • Yes, if we make available the data of the Human Computation process! • What can we share, e.g. in the case of data linking tasks? • “True” and “false” links • Confidence scores of the links • Individual contributions and aggregation process
  • 33. PROV-O AND HUMAN COMPUTATION ONTOLOGY • Provenance is information about entities, activities, and people involved in producing a piece of data or thing (used to assess its quality, reliability or trustworthiness) • W3C defined the PROV-O ontology to capture provenance information https://www.w3.org/TR/prov-o/ • The Human Computation ontology extends PROV-O to describe the data shared within a Human Computation Process http://swa.cefriel.it/ontologies/hc • Data linking process information can be described with the HC ontology and published according to linked data principles (e.g. data from the Urbanopoly GWAP at http://swa.cefriel.it/linkeddata/) [Diagram: the HC ontology classes Contributor, Contribution, Human Computation Task, Consolidated Information and Human Computation Algorithm, related to provo:Agent, provo:Entity and provo:Activity, with the properties aggregatedFrom, solvedBy, enabledBy, contributionFrom, solutionTo and aggregatedBy] Irene Celino. Human Computation VGI Provenance: Semantic Web-based Representation and Publishing. IEEE Transactions on Geoscience and Remote Sensing, 2013
  • 34. 6. GUIDELINES • Is it that easy to involve people on the Web? • What should we take care of when designing a human computation system?
  • 35. MICE AND MEN (OR: KEEP IT SIMPLE) • Crowdsourcing workers behave like mice • Mice prefer to use their motor skills (biologically cheap, e.g. pressing a lever to get food) rather than their cognitive skills (biologically expensive, e.g. going through a labyrinth to get food) • Workers prefer/are better at simple tasks (e.g. those that can be solved at first sight) and discard/are worse at more complex tasks (e.g. those that require logical reasoning) • Crowdsourcing tasks should be carefully designed • Tasks as simple as possible for the workers to solve • Complex tasks together with other incentives (e.g. variety/novelty) Panos Ipeirotis. On Mice and Men: The Role of Biology in Crowdsourcing, Keynote talk at Collective Intelligence, 2012.
  • 36. DIVIDE ET IMPERA (OR: FIND-FIX-VERIFY) • Find-Fix-Verify crowd programming pattern • A long and “expensive” task… • Summarize a text to shorten its total length • …is decomposed in more atomic tasks… 1. find sentences that need to be shortened 2. fix a sentence by shortening it 3. verify which summarized sentence maintains original meaning • …and the complex task is turned into a workflow of simple tasks, and each step is outsourced to a crowd M. Bernstein, G. Little, R. Miller, B. Hartmann, M. Ackerman, D. Karger, D. Crowell, K. Panovich. Soylent: A Word Processor with a Crowd Inside, UIST Proceedings, 2010.
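The three-stage workflow above can be sketched with crowd workers modelled as plain Python callables. This is only an illustration of the decomposition, not the Soylent implementation: the function signature, the `min_agreement` rule for the Find stage and the vote counting in the Verify stage are all assumptions.

```python
from collections import Counter
from typing import Callable, List, Sequence, Set

FindWorker = Callable[[Sequence[str]], Set[int]]   # returns indices of sentences to shorten
FixWorker = Callable[[str], str]                   # returns a shortened sentence
VerifyWorker = Callable[[str, str], bool]          # does the candidate keep the meaning?


def find_fix_verify(sentences: Sequence[str],
                    find_workers: List[FindWorker],
                    fix_workers: List[FixWorker],
                    verify_workers: List[VerifyWorker],
                    min_agreement: int = 2) -> List[str]:
    """Shorten a text through three independent crowd stages."""
    # Find: keep only the sentences flagged by enough independent workers.
    flags = Counter(i for worker in find_workers for i in worker(sentences))
    flagged = [i for i, n in sorted(flags.items()) if n >= min_agreement]

    result = list(sentences)
    for i in flagged:
        # Fix: a different group of workers proposes shortened versions.
        candidates = list(dict.fromkeys(w(sentences[i]) for w in fix_workers))

        # Verify: a third group votes on the candidates that keep the meaning.
        votes = Counter()
        for worker in verify_workers:
            for candidate in candidates:
                if worker(sentences[i], candidate):
                    votes[candidate] += 1
        if votes:
            result[i] = votes.most_common(1)[0][0]
    return result
```

Each stage is a simple task on its own, and a sentence is left unchanged when no candidate rewrite collects any vote.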
  • 37. COMPARE AND CONTRAST • A sort of “wisdom of the crowd(sourcing methods)”: (1) apply different approaches to solve the same problem and (2) compare results • Which is the best approach for a specific use case? • Which is the most suitable crowd? • Is human computation better/faster/cheaper than machine computation? • Knowledge Graph Refinement: use Human Computation to “crowdsource” a gold standard and then use it to train some statistical/machine learning algorithm [Diagrams: four pipeline variants from input task to output solution that combine Human Computation and Machine Computation] Gloria Re Calegari, Gioele Nasi, Irene Celino. Human Computation vs. Machine Learning: an Experimental Comparison for Image Classification. Human Computation Journal, 2018.
  • 38. FINAL NOTE ON DISAGREEMENT • Is there always a “right answer”? Or is there a “crowd truth”? • Not always true/false, because of human subjectivity, ambiguity and uncertainty • Disagreement across contributors is not necessarily bad, but a sign of: different opinions, interpretations, contexts, perspectives, … • Remember the long tail theory… • …and ask yourself who your users are and who you want to involve Lora Aroyo, Chris Welty. Truth is a Lie: 7 Myths about Human Annotation. AI Magazine 2014.
  • 39. 7. INDIRECT PEOPLE INVOLVEMENT • Are there indirect ways to involve humans in data processing?
  • 40. HUMANS AS A SOURCE OF INFORMATION • People are not only task executors, they are also information providers! • Opportunistic sensing • Voluntary or involuntary digital traces of human-related activities • e.g., phone call logs, GPS traces, social media activities • Open content and cooperative knowledge • Data explicitly provided by people can “hide” further information • e.g., logs of wiki editing, statistical distribution of contributions
  • 41. FROM POI INFORMATION AND PHONE CALL LOGS TO LAND USE • General topic: exploit “low-cost” information about a geographic area as features to train a predictive model that outputs “expensive” information about the same area • “Inexpensive” input information: • Geo-information about points of interest • Mobile traffic data processed using different time series techniques (smoothing, decomposition, filtering, time-windowing) • “Expensive” output information: • Land use characterization (usually collected through long and expensive workflows that mix machine processing and costly human labour) Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Extracting Urban Land Use from Linked Open Geospatial Data. IJGI, 2015 Gloria Re Calegari, Emanuela Carlino, Diego Peroni, Irene Celino. Filtering and Windowing Mobile Traffic Time Series for Territorial Land Use Classification. COMCOM, 2016
  • 42. FROM SPATIAL ANALYTICS TO GEO-ONTOLOGY ENGINEERING • OpenStreetMap collects information about points of interest (POI) • Spatial distribution and conglomeration of specific POIs can give hints about the geographical space • Re-engineering of spatial features through comparison between areas: same POI type shows different distribution → evidence for different semantics (e.g. what is a pub in Milano vs. London) • Semantic specification of spatial neighbourhoods: • Emerging neighbourhoods from spatial clustering of POIs (as opposed to administrative divisions) • Spatial version of tf-idf to compare between different areas (e.g. central or peripheral areas in different cities) and to characterise neighbourhoods (e.g. shopping district) Gloria Re Calegari, Emanuela Carlino, Irene Celino, Diego Peroni. Supporting Geo-Ontology Engineering through Spatial Data Analytics. 13th Extended Semantic Web Conference, 2016
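One possible reading of a "spatial version of tf-idf" treats each area as a document and each POI type as a term. The slide does not give the formula used in the cited paper, so the definition below (relative frequency of a POI type in an area, times the log of the inverse fraction of areas containing that type) is an assumption borrowed from textbook tf-idf, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def spatial_tf_idf(pois_by_area: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Score how characteristic each POI type is for each area.

    `pois_by_area` maps an area name to the list of POI types found in it
    (one entry per POI, so repeated types are expected).
    """
    n_areas = len(pois_by_area)

    # Document frequency: in how many areas does each POI type appear?
    area_count = Counter()
    for pois in pois_by_area.values():
        area_count.update(set(pois))

    scores: Dict[str, Dict[str, float]] = {}
    for area, pois in pois_by_area.items():
        counts = Counter(pois)
        total = len(pois)
        scores[area] = {
            poi_type: (count / total) * math.log(n_areas / area_count[poi_type])
            for poi_type, count in counts.items()
        }
    return scores
```

A POI type present in every area (e.g. pubs everywhere) scores 0 and does not characterise any neighbourhood, while a type concentrated in one area (e.g. shops in a shopping district) gets the highest score there.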
  • 43. Thanks for your attention! Any questions? • Irene Celino, Knowledge Technologies, Digital Interaction Division, irene.celino@cefriel.com • Cefriel.com • MILANO: viale Sarca 226, 20126, Milano, Italy • LONDON: 4th floor, 57 Rathbone Place, London W1T 1JU, UK • NEW YORK: One Liberty Plaza, 165 Broadway, 23rd Floor, New York City, New York, 10006 USA