Machine-Crowd Annotation Workflow
for Event Understanding
across Collections & Domains
Oana Inel Extended Semantic Web Conference
PhD Symposium
May 30th 2016
Too much information ...
e.g., if you are interested in the topic of “whaling”
… and after a while it all looks the same
it is difficult to form a global picture on a topic
… thus, content without context is difficult to process
events can help create context around content
…, but events are not easy to deal with
• Events are vague
• Event semantics are difficult
• Events can be viewed and interpreted from multiple perspectives
e.g., a participant's interpretation: The mayor of the city called the celebration a success.
• Events can be presented at different levels of granularity
e.g., spatial disagreement: The celebration took place in every city in the Netherlands.
• People are not consistent in the way they talk about or use events
e.g.: The celebration took place last week, fireworks shows were held everywhere.
… a lot of ground truth is needed to learn event specifics
• Traditional ground truth collection doesn’t scale:
• there is not really ‘one type of expert’ when it comes to events
• the annotation guidelines for events are difficult to define
• the annotation of events can be a tedious process
• all of the above can result in high inter-annotator disagreement
• Crowdsourcing could be an alternative
• but it is not yet a robust & replicable approach
… let’s look at some examples
According to department policy prosecutors must make
a strong showing that lawyers' fees came from assets
tainted by illegal profits before any attempts at seizure
are made.
The unit makes intravenous pumps used by hospitals
and had more than $110 million in sales last year
according to Advanced Medical.
… here is what experts annotate on these sentences
[According] to department policy prosecutors must make
a strong [showing] that lawyers' fees [came] from assets
tainted by illegal profits before any [attempts] at [seizure]
are [made].
The unit makes intravenous pumps used by hospitals
and [had] more than $110 million in [sales] last year
according to Advanced Medical.
… here is what the crowd annotates on them
According to department policy prosecutors must make
a [strong [showing]] that lawyers' fees [[came] from
assets] [tainted] by illegal profits before any [attempts] at
[seizure] are [made].
The unit [makes] intravenous pumps [used] by hospitals
and [[had] more than $110 million in [sales]] last year
according to Advanced Medical.
… here is what the machines can detect
According to department policy prosecutors must [make]
a strong showing that lawyers' fees [came] from assets
[tainted] by illegal profits before any attempts at seizure
are made.
The unit [makes] intravenous pumps [used] by hospitals
and [had] more than $110 million in sales last year
according to Advanced Medical.
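The three annotation views of the first example sentence can be compared with a small sketch. The token sets below are copied from the slides; for nested crowd spans such as [strong [showing]] only the innermost token is kept, which is a simplifying assumption. Using the expert annotation as the reference is also only illustrative, and the helper name `overlap_scores` is hypothetical.

```python
# Event mentions for the first example sentence, copied from the slides.
# Assumption: for nested crowd spans, only the innermost token is kept.
expert = {"According", "showing", "came", "attempts", "seizure", "made"}
crowd = {"showing", "came", "tainted", "attempts", "seizure", "made"}
machine = {"make", "came", "tainted"}


def overlap_scores(candidate, reference):
    """Return (precision, recall) of candidate mentions against reference mentions."""
    shared = candidate & reference
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(reference) if reference else 0.0
    return precision, recall


crowd_vs_expert = overlap_scores(crowd, expert)
machine_vs_expert = overlap_scores(machine, expert)

print("crowd vs. expert:   P=%.2f R=%.2f" % crowd_vs_expert)
print("machine vs. expert: P=%.2f R=%.2f" % machine_vs_expert)
```

On this single sentence the crowd shares five of six mentions with the experts, while the machine output shares only "came". One sentence is far too little to draw conclusions from; the sketch only makes the comparison explicit.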
Research Questions
• Can crowdsourcing help in improving event detection?
• Can we provide reliable crowdsourced training data?
• Can we optimize the crowdsourcing process by using results from
NLP tools?
• Can we achieve a replicable data collection process across different
data types and use cases?
Current Hypothesis:
A disagreement-based approach to crowdsourcing ground truth
is reliable and produces quality results
Preliminary Results - Crowd vs. Experts
● 200 news snippets from TimeBank
● 3019 tweets published in 2014
● potentially relevant tweets for events such as ‘whaling’ and
‘Davos 2014’, among others
The CrowdTruth approach outperforms state-of-the-art
crowdsourcing approaches such as single annotator and
majority vote
The crowd performs almost as well as the experts, due to
highly specialized linguistic guidelines for expert annotators
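The difference between majority vote and a disagreement-aware score can be sketched as follows. The worker judgments are hypothetical, and the score is a simplified cosine-based measure in the spirit of the CrowdTruth metrics, not the CrowdTruth implementation itself.

```python
import math

# Hypothetical judgments: the tokens each worker marked as events in one sentence.
judgments = {
    "w1": {"came", "made"},
    "w2": {"came", "tainted"},
    "w3": {"came", "made", "seizure"},
    "w4": {"came"},
    "w5": {"made", "tainted"},
}

# Unit vector: how many workers selected each token.
counts = {}
for tokens in judgments.values():
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1


def majority_vote(counts, n_workers):
    """Keep only the tokens picked by more than half of the workers."""
    return {t for t, c in counts.items() if c > n_workers / 2}


def annotation_scores(counts):
    """Cosine similarity between the unit vector and each token's one-hot vector."""
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


majority = majority_vote(counts, len(judgments))
scores = annotation_scores(counts)

print("majority vote:", sorted(majority))
for token, score in sorted(scores.items(), key=lambda item: -item[1]):
    print("%-8s %.2f" % (token, score))
```

Majority vote returns a binary decision and discards "tainted" and "seizure" entirely, whereas the graded score keeps them as weaker candidates that a downstream threshold or learner can still use.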
Current Hypothesis:
A disagreement-based approach to crowdsourcing ground truth
can be optimised by using results from NLP tools
Preliminary Results - Hybrid Workflow
• ENTITY EXTRACTION
• EVENTS CROWDSOURCING AND LINKING TO CONCEPTS
• SEGMENTATION & KEYFRAMES
• LINKING EVENTS AND CONCEPTS TO KEYFRAMES
diveplus.beeldengeluid.nl
Preliminary Results - Hybrid Workflow Outcome
diveplus.beeldengeluid.nl
Approach: Disagreement is Signal
Principles for disagreement-based
crowdsourcing
• Do not enforce agreement
• Capture a multitude of views
• Take advantage of existing
tools, reuse their functionality
This results in teaching machines to reason in
the disagreement space
Overall Methodology
1. Instantiate the research methodology with a specific data type and domain
• Video synopsis, news
2. Identify state-of-the-art IE approaches that can be used
• NER tools for identifying events and their participating entities in the video synopsis
3. Evaluate IE approaches and identify their drawbacks
• Poor performance in extracting events
4. Combine IE with crowdsourcing tasks in a complementary way
• Use crowdsourcing for identifying the events and linking them with their participating entities
5. Evaluate crowdsourcing results with the CrowdTruth disagreement-first approach
• Evaluate the input unit, the workers and the annotations
6. Instantiate the same workflow with different data and/or different domain
• Tweets, Twitter
7. Perform cross-domain analysis
• Event extraction in video synopsis vs. event extraction in tweets
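Step 5 evaluates the input unit, the workers and the annotations. A minimal sketch of the unit and worker side, assuming cosine similarity over token-selection vectors as a simplified stand-in for the CrowdTruth metrics; the judgments, worker ids and function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def unit_vector(judgments, skip=None):
    """Sum the worker vectors of one input unit, optionally leaving one worker out."""
    total = {}
    for worker, vector in judgments.items():
        if worker == skip:
            continue
        for token, value in vector.items():
            total[token] = total.get(token, 0) + value
    return total


# Hypothetical judgments on one sentence; w4 selects tokens nobody else picks.
judgments = {
    "w1": {"came": 1, "made": 1},
    "w2": {"came": 1, "made": 1, "seizure": 1},
    "w3": {"came": 1},
    "w4": {"policy": 1, "profits": 1},
}

# Worker view: agreement of each worker with the rest of the crowd on this unit.
worker_agreement = {
    worker: cosine(vector, unit_vector(judgments, skip=worker))
    for worker, vector in judgments.items()
}

# Unit view: clarity as the highest annotation score on the unit.
full = unit_vector(judgments)
unit_clarity = max(cosine({token: 1}, full) for token in full)

for worker, value in worker_agreement.items():
    print(worker, round(value, 2))
print("unit clarity:", round(unit_clarity, 2))
```

A worker who consistently scores near zero across many units is a candidate low-quality worker, while low clarity on a unit whose workers otherwise agree elsewhere points to an ambiguous sentence rather than to bad workers.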
Project Websites
http://CrowdTruth.org
http://diveproject.beeldengeluid.nl
Tools & Code
http://dev.CrowdTruth.org
http://github.com/CrowdTruth
http://diveplus.beeldengeluid.nl
Data
http://data.crowdtruth.org
http://data.dive.beeldengeluid.nl
ESWC - PhD Symposium 2016

  • 1. Machine-Crowd Annotation Workflow for Event Understanding across Collections & Domains Oana Inel Extended Semantic Web Conference PhD Symposium May 30th 2016
  • 2. Too much information ... e.g., if you are interested in the topic of “whaling” 2
  • 3. … and after a while it all looks the same it is difficult to form a global picture on a topic 3
  • 4. … thus, content without context is difficult to process events can help create context around content 4
  • 5. …, but events are not easy to deal with
      • Events are vague
      • Event semantics are difficult
      • Events can be viewed and interpreted from multiple perspectives
        e.g. of participants' interpretation: The mayor of the city called the celebration a success.
      • Events can be presented at different levels of granularity
        e.g. of spatial disagreement: The celebration took place in every city in the Netherlands.
      • People are not consistent in the way they talk about or use events
        e.g.: The celebration took place last week, fireworks shows were held everywhere.
  • 6. … a lot of ground truth is needed to learn event specifics
      • Traditional ground truth collection doesn’t scale:
          • there is not really ‘one type of expert’ when it comes to events
          • the annotation guidelines for events are difficult to define
          • the annotation of events can be a tedious process
          • all of the above can result in high inter-annotator disagreement
      • Crowdsourcing could be an alternative
          • but is still not a robust & replicable approach
  • 7. … let’s look at some examples
      According to department policy prosecutors must make a strong showing that lawyers' fees came from assets tainted by illegal profits before any attempts at seizure are made.
      The unit makes intravenous pumps used by hospitals and had more than $110 million in sales last year according to Advanced Medical.
  • 8. … here is what experts annotate on these sentences
      [According] to department policy prosecutors must make a strong [showing] that lawyers' fees [came] from assets tainted by illegal profits before any [attempts] at [seizure] are [made].
      The unit makes intravenous pumps used by hospitals and [had] more than $110 million in [sales] last year according to Advanced Medical.
  • 9. … here is what the crowd annotates on them
      According to department policy prosecutors must make a [strong [showing]] that lawyers' fees [[came] from assets] [tainted] by illegal profits before any [attempts] at [seizure] are [made].
      The unit [makes] intravenous pumps [used] by hospitals and [[had] more than $110 million in [sales]] last year according to Advanced Medical.
  • 10. … here is what the machines can detect
      According to department policy prosecutors must [make] a strong showing that lawyers' fees [came] from assets [tainted] by illegal profits before any attempts at seizure are made.
      The unit [makes] intravenous pumps [used] by hospitals and [had] more than $110 million in sales last year according to Advanced Medical.
  • 11. Research Questions
      • Can crowdsourcing help in improving event detection?
      • Can we provide reliable crowdsourced training data?
      • Can we optimize the crowdsourcing process by using results from NLP tools?
      • Can we achieve a replicable data collection process across different data types and use cases?
  • 12. Current Hypothesis: Disagreement-based approach to crowdsource ground truth is reliable and produces quality results
  • 13. Preliminary Results - Crowd vs. Experts
      ● 200 news snippets from TimeBank
      ● 3019 tweets published in 2014
      ● potentially relevant tweets for events such as ‘whaling’, ‘Davos 2014’, among others
      The CrowdTruth approach outperforms state-of-the-art crowdsourcing approaches such as single annotator and majority vote.
      The crowd performs almost as well as the experts, due to very linguistically specialized guidelines for expert annotators.
  • 14. Current Hypothesis: Disagreement-based approach to crowdsource ground truth can be optimised by using results from NLP tools
  • 15. Preliminary Results - Hybrid Workflow (diveplus.beeldengeluid.nl)
      [Workflow diagram; labels: Entity extraction · Events crowdsourcing and linking to concepts · Segmentation & keyframes · Linking events and concepts to keyframes]
  • 16. Preliminary Results - Hybrid Workflow Outcome (diveplus.beeldengeluid.nl)
  • 17. Approach: Disagreement is Signal
      Principles for disagreement-based crowdsourcing
      • Do not enforce agreement
      • Capture a multitude of views
      • Take advantage of existing tools, reuse their functionality
      This results in teaching machines to reason in the disagreement space
  • 18. Overall Methodology
      1. Instantiate the research methodology with specific data, domain
          • Video synopsis, news
      2. Identify state-of-the-art IE approaches that can be used
          • NER tools for identifying events and their participating entities in the video synopsis
      3. Evaluate IE approaches and identify their drawbacks
          • Poor performance in extracting events
      4. Combine IE with crowdsourcing tasks in a complementary way
          • Use crowdsourcing for identifying the events and linking them with their participating entities
      5. Evaluate crowdsourcing results with CrowdTruth disagreement-first approach
          • Evaluate the input unit, the workers and the annotations
      6. Instantiate the same workflow with different data and/or different domain
          • Tweets, Twitter
      7. Perform cross-domain analysis
          • Event extraction in video synopsis vs. event extraction in tweets
  • 19. Project Websites: http://CrowdTruth.org, http://diveproject.beeldengeluid.nl
      Tools & Code: http://dev.CrowdTruth.org, http://github.com/CrowdTruth, http://diveplus.beeldengeluid.nl
      Data: http://data.crowdtruth.org, http://data.dive.beeldengeluid.nl
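
The expert, crowd and machine annotations on slides 8 to 10 can be compared programmatically. The following is a minimal sketch, not the evaluation used in the talk: it assumes that only the innermost bracketed spans count as event mentions (so the crowd's wider spans such as "strong showing" are ignored), and the helper names `innermost_spans` and `overlap` are hypothetical.

```python
import re

# Annotated example sentences, copied from slides 8, 9 and 10.
EXPERTS = (
    "[According] to department policy prosecutors must make a strong [showing] "
    "that lawyers' fees [came] from assets tainted by illegal profits before any "
    "[attempts] at [seizure] are [made]. The unit makes intravenous pumps used by "
    "hospitals and [had] more than $110 million in [sales] last year according to "
    "Advanced Medical."
)
CROWD = (
    "According to department policy prosecutors must make a [strong [showing]] "
    "that lawyers' fees [[came] from assets] [tainted] by illegal profits before "
    "any [attempts] at [seizure] are [made]. The unit [makes] intravenous pumps "
    "[used] by hospitals and [[had] more than $110 million in [sales]] last year "
    "according to Advanced Medical."
)
MACHINES = (
    "According to department policy prosecutors must [make] a strong showing "
    "that lawyers' fees [came] from assets [tainted] by illegal profits before "
    "any attempts at seizure are made. The unit [makes] intravenous pumps [used] "
    "by hospitals and [had] more than $110 million in sales last year according "
    "to Advanced Medical."
)


def innermost_spans(text):
    """Return the set of innermost bracketed spans (assumption: nesting is ignored)."""
    return set(re.findall(r"\[([^\[\]]+)\]", text))


def overlap(reference, other):
    """Fraction of reference spans that also appear in the other annotation."""
    ref, oth = innermost_spans(reference), innermost_spans(other)
    return len(ref & oth) / len(ref)


if __name__ == "__main__":
    print("crowd vs. experts:", overlap(EXPERTS, CROWD))
    print("machines vs. experts:", overlap(EXPERTS, MACHINES))
    print("found only by the crowd:",
          sorted(innermost_spans(CROWD) - innermost_spans(EXPERTS)))
```

On these two sentences, the crowd recovers 7 of the 8 expert spans (all except "According") and adds "makes", "tainted" and "used", while the machine output shares only "came" and "had" with the experts.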

Editor's Notes

  1. Massive amount of information. One of the main characteristics of today is the massive, even overwhelming, amount of information around us. Just think of all the videos, images and the infinite number of web pages and tweets that you get as search results when you want to learn about a topic.
  2. However, this inconceivable amount of information starts to ‘look all the same’ to users, and they are not able to properly consume the information and get an overview of the topic.
  3. This happens because content without context is difficult to process. But events can help create context around content.
  4. Experts can be inconsistent, despite the traditional belief that they are always right.
  5. The crowd overlaps with the experts by 88%, i.e. it detects almost the same events as the experts. The added value is that the crowd finds even more events and is more specific. Another point is that the crowd seems to be more consistent :-)
  6. The machines are able to detect very little of this, so they need to learn more, and for that they need more training data.
  7. Majority vote: the answer that was picked by the majority of the workers, plus all the answers that were picked by at least half of the total number of workers. Single annotator: an annotation randomly sampled from the set of workers annotating the unit, to show that having more annotators generates better quality data. CrowdTruth (CT) scores consistently above majority vote and single annotator, and its performance is also comparable to that of domain experts. Crowdsourcing tasks where workers choose annotations from a fixed number of options (e.g. Twitter event extraction) perform better at higher thresholds, whereas open annotation tasks (event extraction) perform better when the threshold is at its lowest, thus ensuring that the most diverse opinions are accounted for in the resulting ground truth.
  8. Message of the results. Data on which the experiments were performed.
  9. There are two hypotheses for this.
  10. Experts are inconsistent.
  11. Automatic tools detect less; it is difficult to see what the focus is. The crowd is much more specific than the experts. The crowd overlaps a lot with the experts. Experts have some difficult events. Experts are not consistent.
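
The aggregation strategies compared in note 7 can be sketched as follows. This is an illustration under stated assumptions rather than the CrowdTruth implementation: the per-annotation score here is simply the fraction of workers who selected the annotation, used as a simplified stand-in for the CrowdTruth metrics, and the worker judgments and function names are made up for the example.

```python
import random
from collections import Counter


def selection_rates(judgments):
    """Fraction of workers who selected each annotation on one input unit.

    Assumption: this ratio is a simplified stand-in, not the CrowdTruth score.
    """
    counts = Counter(a for worker in judgments for a in set(worker))
    return {a: c / len(judgments) for a, c in counts.items()}


def majority_vote(judgments):
    """Keep annotations picked by at least half of the workers."""
    return {a for a, r in selection_rates(judgments).items() if r >= 0.5}


def single_annotator(judgments, rng=random):
    """Keep the annotations of one randomly sampled worker."""
    return set(rng.choice(judgments))


def threshold_selection(judgments, threshold):
    """Keep annotations whose selection rate reaches the given threshold."""
    return {a for a, r in selection_rates(judgments).items() if r >= threshold}


# Hypothetical judgments from five workers on one sentence (illustrative only).
JUDGMENTS = [
    {"showing", "came", "seizure"},
    {"came", "seizure", "tainted"},
    {"came", "attempts"},
    {"came", "seizure", "made"},
    {"tainted", "came"},
]

if __name__ == "__main__":
    print("majority vote:", sorted(majority_vote(JUDGMENTS)))
    print("single annotator:", sorted(single_annotator(JUDGMENTS)))
    for t in (0.2, 0.4, 0.6):
        print(f"threshold {t}:", sorted(threshold_selection(JUDGMENTS, t)))
```

With a low threshold, every annotation proposed by at least one of the five workers is kept, which mirrors the point about open annotation tasks benefiting from the most diverse opinions; raising the threshold narrows the result toward the majority-vote set.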