3. … and after a while it all looks the same
it is difficult to form a global picture of a topic
4. … thus, content without context is difficult to process
events can help create context around content
5. …, but events are not easy to deal with
• Events are vague
• Event semantics are difficult
• Events can be viewed and interpreted from multiple perspectives
e.g., a participant interpretation: The mayor of the city called the celebration a success.
• Events can be presented at different levels of granularity
e.g., spatial disagreement: The celebration took place in every city in the Netherlands.
• People are not consistent in the way they talk about or use events
e.g.: The celebration took place last week, fireworks shows were held everywhere.
6. … a lot of ground truth is needed to learn event specifics
• Traditional ground truth collection doesn’t scale:
• there is not really ‘one type of expert’ when it comes to events
• the annotation guidelines for events are difficult to define
• the annotation of events can be a tedious process
• all of the above can result in high inter-annotator disagreement
• Crowdsourcing could be an alternative
• but is still not a robust & replicable approach
7. … let’s look at some examples
According to department policy prosecutors must make
a strong showing that lawyers' fees came from assets
tainted by illegal profits before any attempts at seizure
are made.
The unit makes intravenous pumps used by hospitals
and had more than $110 million in sales last year
according to Advanced Medical.
8. … here is what experts annotate on these sentences
[According] to department policy prosecutors must make
a strong [showing] that lawyers' fees [came] from assets
tainted by illegal profits before any [attempts] at [seizure]
are [made].
The unit makes intravenous pumps used by hospitals
and [had] more than $110 million in [sales] last year
according to Advanced Medical.
9. … here is what the crowd annotates on them
According to department policy prosecutors must make
a [strong [showing]] that lawyers' fees [[came] from
assets] [tainted] by illegal profits before any [attempts] at
[seizure] are [made].
The unit [makes] intravenous pumps [used] by hospitals
and [[had] more than $110 million in [sales]] last year
according to Advanced Medical.
10. … here is what the machines can detect
According to department policy prosecutors must [make]
a strong showing that lawyers' fees [came] from assets
[tainted] by illegal profits before any attempts at seizure
are made.
The unit [makes] intravenous pumps [used] by hospitals
and [had] more than $110 million in sales last year
according to Advanced Medical.
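To make the gap on this slide concrete, here is a minimal sketch of a naive machine baseline that treats every verb (plus a small, hand-picked whitelist of eventive nouns) as a candidate event trigger. This is purely illustrative, not the extraction pipeline evaluated in this work; the spaCy model and the noun list are assumptions.

```python
# Illustrative only: a naive event-trigger baseline, not the system behind slide 10.
# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical whitelist of eventive nouns that a verb-only tagger would miss.
EVENTIVE_NOUNS = {"seizure", "attempt", "sale", "showing", "celebration"}

def candidate_events(sentence: str) -> list[str]:
    """Return candidate event triggers: all verbs plus whitelisted eventive nouns."""
    doc = nlp(sentence)
    return [tok.text for tok in doc
            if tok.pos_ == "VERB" or tok.lemma_.lower() in EVENTIVE_NOUNS]

print(candidate_events(
    "The unit makes intravenous pumps used by hospitals and had more than "
    "$110 million in sales last year according to Advanced Medical."
))
```

A verb-centric baseline of this kind typically surfaces ‘makes’, ‘used’ and ‘had’ but misses nominal events such as ‘sales’ unless they are whitelisted, which mirrors the difference between slides 8-10.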
11. Research Questions
• Can crowdsourcing help in improving event detection?
• Can we provide reliable crowdsourced training data?
• Can we optimize the crowdsourcing process by using results from
NLP tools?
• Can we achieve a replicable data collection process across different
data types and use cases?
13. Preliminary Results - Crowd vs. Experts
● 200 news snippets from TimeBank
● 3019 tweets published in 2014
● potentially relevant tweets for events such as ‘whaling’ and ‘Davos 2014’, among others
The CrowdTruth approach outperforms state-of-the-art crowdsourcing approaches such as single annotator and majority vote.
The crowd performs almost as well as the experts; the remaining gap is due to the highly specialized linguistic guidelines given to expert annotators.
17. Approach: Disagreement is Signal
Principles for disagreement-based
crowdsourcing
• Do not enforce agreement
• Capture a multitude of views
• Take advantage of existing
tools, reuse their functionality
This results in teaching machines to reason in
the disagreement space
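These principles can be made concrete with a small sketch: keep the full distribution of worker annotations per unit and derive per-annotation scores from it, instead of collapsing everything to a single label. The code below is a simplified, unweighted stand-in for the CrowdTruth unit-annotation and unit-clarity metrics; the data and function names are illustrative.

```python
from collections import Counter

def unit_annotation_scores(worker_annotations: list[set[str]]) -> dict[str, float]:
    """Fraction of workers choosing each annotation on one unit (sentence).
    A simplified, unweighted stand-in for the CrowdTruth unit-annotation score."""
    vec = Counter()
    for ann in worker_annotations:
        vec.update(ann)               # each worker contributes a binary vector
    n = len(worker_annotations)
    return {a: c / n for a, c in vec.items()}

def unit_clarity(worker_annotations: list[set[str]]) -> float:
    """Highest unit-annotation score: low values flag genuinely ambiguous units,
    rather than being treated as worker error."""
    scores = unit_annotation_scores(worker_annotations)
    return max(scores.values(), default=0.0)

# Hypothetical crowd data: five workers marking event triggers in one sentence.
workers = [{"came", "seizure"}, {"came", "attempts", "seizure"},
           {"made", "came"}, {"came", "seizure", "showing"}, {"came"}]
print(unit_annotation_scores(workers))   # e.g. 'came' -> 1.0, 'seizure' -> 0.6
print(unit_clarity(workers))             # 1.0: at least one trigger is uncontested
```

Keeping these graded scores, rather than forcing agreement, is what allows a machine to learn in the disagreement space.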
18. Overall Methodology
1. Instantiate the research methodology with specific data, domain
• Video synopsis, news
2. Identify state-of-the-art IE approaches that can be used
• NER tools for identifying events and their participating entities in the video synopsis
3. Evaluate IE approaches and identify their drawbacks
• Poor performance in extracting events
4. Combine IE with crowdsourcing tasks in a complementary way
• Use crowdsourcing for identifying the events and linking them with their participating entities
5. Evaluate crowdsourcing results with CrowdTruth disagreement-first approach
• Evaluate the input unit, the workers and the annotations
6. Instantiate the same workflow with different data and/or different domain
• Tweets, Twitter
7. Perform cross-domain analysis
• Event extraction in video synopsis vs. event extraction in tweets
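Steps 2-4 of this workflow can be read as a simple pre-annotation pipeline: run an off-the-shelf tool, turn its (incomplete) output into a crowdsourcing micro-task, and let the crowd fill the gaps. The sketch below only illustrates that hand-off under assumed names; it is not the actual CrowdTruth task template or the NER tool used in the experiments.

```python
# Illustrative hand-off from an IE tool to a crowdsourcing micro-task
# (assumed field names and model; not the actual CrowdTruth templates).
import spacy

nlp = spacy.load("en_core_web_sm")

def build_microtask(synopsis: str) -> dict:
    """Show machine-extracted entities to workers and ask them to mark events
    and link each event to its participating entities (steps 2 and 4)."""
    doc = nlp(synopsis)
    return {
        "text": synopsis,
        "machine_entities": [{"text": e.text, "type": e.label_} for e in doc.ents],
        "question_events": "Highlight all events mentioned in the text.",
        "question_links": "Link each event to the entities that take part in it.",
    }

print(build_microtask("The mayor opened the celebration in Amsterdam last week."))
```

The cross-domain step (7) then re-runs the same pipeline on tweets instead of video synopses and compares the resulting ground truth.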
Massive amount of information
One of the main characteristics of today's world is the massive, even overwhelming, amount of information around us.
Just think of all the videos, images, and the endless web pages and tweets that you get as search results when you want to learn about a topic.
However, this inconceivable amount of information starts to ‘look all the same’ to users, who are then unable to properly consume it and get an overview of the topic,
and this happens because content without context is difficult to process.
But events can help create context around content.
Experts can be inconsistent, despite the traditional belief that they are always right.
The crowd overlaps with the experts by 88%, i.e., it detects almost the same events as the experts.
But the added value is that the crowd finds even more events and is more specific.
Another point is that the crowd seems to be more consistent :-)
And note how little the machines are able to detect from this: they need to learn more, so more training data is needed for them.
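The 88% overlap mentioned above reduces to simple set arithmetic once both annotation sets are represented as sets of event triggers. A minimal sketch, with made-up sets standing in for the real TimeBank annotations:

```python
def coverage(crowd: set[str], expert: set[str]) -> float:
    """Share of expert-annotated events that the crowd also found."""
    return len(crowd & expert) / len(expert) if expert else 0.0

# Hypothetical trigger sets for one sentence (not the real TimeBank data).
expert = {"showing", "came", "attempts", "seizure", "made"}
crowd = {"showing", "came", "tainted", "attempts", "seizure", "made"}
print(coverage(crowd, expert))   # 1.0 for this sentence; ~0.88 averaged over the corpus
```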
Majority vote: keep the answers picked by the majority of the workers, i.e., by at least half of the total number of workers.
Single annotator: one answer set randomly sampled from the set of workers annotating the unit; used to show that having more annotators generates better-quality data.
CrowdTruth scores consistently above the majority-vote and single-annotator baselines, and its performance is also comparable to that of domain experts.
Crowdsourcing tasks where workers choose annotations from a fixed number of options (e.g., Twitter event extraction) perform better at higher thresholds, whereas open annotation tasks (event extraction) perform better when the threshold is at its lowest, thus ensuring that the most diverse opinions are accounted for in the resulting ground truth.
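The baselines and the threshold behaviour described above can be illustrated with a small sketch that aggregates the same worker annotations by majority vote, by a single random annotator, and by sweeping a support threshold, then scores each against a gold set with F1. All data and threshold values here are made up for illustration.

```python
import random
from collections import Counter

def f1(pred: set[str], gold: set[str]) -> float:
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def support(workers: list[set[str]]) -> Counter:
    counts = Counter()
    for ann in workers:
        counts.update(ann)
    return counts

def majority_vote(workers: list[set[str]]) -> set[str]:
    """Keep every annotation chosen by at least half of the workers."""
    return {a for a, c in support(workers).items() if c >= len(workers) / 2}

def single_annotator(workers: list[set[str]]) -> set[str]:
    """One randomly sampled worker: the weakest baseline."""
    return random.choice(workers)

def thresholded(workers: list[set[str]], threshold: float) -> set[str]:
    """Keep annotations whose share of workers reaches the threshold; low
    thresholds keep diverse opinions, high thresholds keep only consensus."""
    n = len(workers)
    return {a for a, c in support(workers).items() if c / n >= threshold}

gold = {"came", "seizure", "attempts", "made", "showing"}   # hypothetical expert set
workers = [{"came", "seizure"}, {"came", "attempts", "seizure"},
           {"made", "came"}, {"came", "seizure", "showing"}, {"came"}]

for t in (0.2, 0.5, 0.8):
    print(f"threshold {t}: F1 = {f1(thresholded(workers, t), gold):.2f}")
print("majority vote:", round(f1(majority_vote(workers), gold), 2))
print("single annotator:", round(f1(single_annotator(workers), gold), 2))
```

With this toy data the pattern from the results shows up directly: the lowest threshold recovers the full expert set, while higher thresholds keep only the most agreed-upon triggers.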
Message of the results
Data on which the experiments were performed
We have two hypotheses for this
Experts are inconsistent
Automatic tools detect less; difficult to see what is the focus
The crowd is much more specific than the experts
The crowd overlaps a lot with the experts
Experts have some difficult events