Anca Dumitrache, Lora Aroyo, Chris Welty
http://CrowdTruth.org
Measures for Language Ambiguity
Medical Relation Extraction
Linked Data for Information Extraction @ ISWC2015
#CrowdTruth @anouk_anca @laroyo @cawelty #LD4IE2015
Background
•  Most knowledge is in text, but it's not structured
•  Linked Data sources are a good start, but incomplete
•  Goal (Distant Supervision):
–  extract LD triples from text
–  given existing tuples, find sentences that mention both args
–  use resulting sentences as TP to train a classifier
•  But can sometimes be wrong
–  <PALPATION> location <CHEST>
–  feeling the way CHEST expands (PALPATION), can identify areas of lung that are full of fluid
•  Standard approach: Expert Annotation
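The distant supervision steps on the Background slide can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `distant_supervision`, the plain substring matching, and the two example sentences are assumptions made for the sketch.

```python
# Minimal sketch of distant supervision: every sentence that mentions both
# arguments of a known triple becomes a candidate positive example.
# The matching rule (case-insensitive substring) is an assumption.

def distant_supervision(triples, sentences):
    """Return (sentence, subject, relation, object) candidates."""
    examples = []
    for subj, rel, obj in triples:
        for sent in sentences:
            text = sent.lower()
            if subj.lower() in text and obj.lower() in text:
                examples.append((sent, subj, rel, obj))
    return examples

# Illustrative inputs based on the slide's PALPATION / CHEST example.
triples = [("palpation", "location", "chest")]
sentences = [
    "Feeling the way the chest expands (palpation) can identify areas of lung that are full of fluid.",
    "Antibiotics are the first line treatment for typhus.",
]
candidates = distant_supervision(triples, sentences)
```

The first sentence is picked up as a candidate for the location relation simply because it mentions both terms, which is the kind of noisy match the slide warns about.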
Human Annotation Myth: Experts know best
•  Human annotators with domain knowledge provide better annotated data, e.g.
–  if you want medical texts annotated for medical relations you need medical experts
•  But experts are expensive & don't scale
•  Multiple perspectives on data can be useful, beyond what experts believe is salient or correct
What if the CROWD IS BETTER?

Experts Know Best?
What is the relation between the highlighted terms?
He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY.
experts: cause
crowd: no relation
Crowd reads text literally, so it provides better examples to the machine

Experts Know Best?
experts vs. crowd?
What is the (medical) relation between the highlighted (medical) terms?
•  91% of expert annotations covered by the crowd
•  expert annotators reach agreement only in 30%
•  most popular crowd vote covers 95% of this expert annotation agreement

Human Annotation Myth: Disagreement is Bad
•  traditionally, disagreement is considered a measure of poor quality in the annotation task because:
–  task is poorly defined or
–  annotators lack training
•  rather than accepting disagreement as a natural property of semantic interpretation
This makes the elimination of disagreement a goal
What if it is GOOD?

Disagreement Bad?
Does each sentence express the TREAT relation?
ANTIBIOTICS are the first line treatment for indications of TYPHUS.
→ agreement 95%
Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects.
→ agreement 80%
With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS.
→ agreement 50%
Disagreement can reflect the degree of clarity in a sentence

CrowdTruth
•  Annotator disagreement is signal, not noise.
•  It is indicative of the variation in human semantic interpretation of signs
•  It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality

CrowdTruth for medical relation extraction
•  Goal:
–  collect a Medical RelEx Gold Standard
–  improve the performance of a RelEx Classifier
•  Approach:
–  crowdsource 900 medical sentences
–  measure disagreement with CrowdTruth Metrics
–  train & evaluate classifier with CrowdTruth SRS Score

RelEx TASK in CrowdFlower
Patients with ACUTE FEVER and nausea could be suffering from INFLUENZA AH1N1
Is ACUTE FEVER – related to → INFLUENZA AH1N1?

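Worker Vector
[Figure: one worker's annotation of the sentence as a vector over the relations, with a 1 for each of the three relations selected]
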
Sentence Vector
[Figure: the worker vectors stacked as rows; the column positions of the individual 1s were lost in extraction]
Sum of the worker vectors: 0 1 1 0 0 4 3 0 0 5 1 0
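A sentence vector of this kind can be sketched as an element-wise sum of the binary worker vectors. The relation names and the three worker vectors below are hypothetical and much smaller than the 12-column example on the slide; only the summation itself is taken from the slide.

```python
# Hypothetical relation set and worker vectors (illustrative only).
RELATIONS = ["treats", "causes", "symptom", "none"]

def sentence_vector(worker_vectors):
    """Sum binary worker vectors element-wise into one sentence vector."""
    return [sum(column) for column in zip(*worker_vectors)]

worker_vectors = [
    [1, 0, 1, 0],  # worker 1 selected "treats" and "symptom"
    [0, 0, 1, 0],  # worker 2 selected "symptom"
    [1, 0, 0, 0],  # worker 3 selected "treats"
]
vec = sentence_vector(worker_vectors)  # [2, 0, 2, 0]
```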
  	
  
Sentence Clarity
Unclear relationship between the two arguments reflected in the disagreement

Sentence Clarity
Clearly expressed relation between the two arguments reflected in the agreement

Sentence-Relation Score (SRS)
Measures how clearly a sentence expresses a relation
Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0
Unit vector for relation R6
Cosine = .55
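The cosine on this slide can be reproduced with a few lines of Python. The sketch assumes that relation R6 corresponds to the sixth column of the sentence vector (index 5, value 4); the function names are illustrative and not part of any CrowdTruth library.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def sentence_relation_score(sentence_vec, relation_index):
    """Cosine between the sentence vector and the unit vector of one relation."""
    unit = [0] * len(sentence_vec)
    unit[relation_index] = 1
    return cosine(sentence_vec, unit)

sentence_vec = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
# Assumption: R6 is the sixth column, i.e. index 5.
srs_r6 = sentence_relation_score(sentence_vec, 5)
```

Under that assumption the score is 4 / sqrt(53), which rounds to the .55 shown on the slide.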
  	
  
Annotation Quality of Expert vs. Crowd Annotations
[Figure: annotation quality F1 of crowd vs. expert; labeled values: 0.907 (p = 0.007) and 0.844]
In the [0.6 - 0.8] range the crowd significantly outperforms the expert, with a maximum of 0.907 F1 at the 0.7 threshold
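A threshold such as 0.7 turns sentence-relation scores into binary labels, which can then be scored with F1. The sketch below assumes that a sentence counts as positive when its SRS is greater than or equal to the threshold and that some reference labeling is available; the scores and reference labels are made up for illustration.

```python
def f1_at_threshold(srs_scores, reference, threshold):
    """F1 of the labels obtained by thresholding SRS, against reference labels."""
    # Assumption: a sentence is positive when its score reaches the threshold.
    predicted = [s >= threshold for s in srs_scores]
    tp = sum(p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum((not p) and r for p, r in zip(predicted, reference))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical scores and reference labels.
srs_scores = [0.95, 0.80, 0.55, 0.30, 0.10]
reference = [True, True, False, True, False]
sweep = {t: f1_at_threshold(srs_scores, reference, t) for t in (0.2, 0.5, 0.7)}
```

Sweeping the threshold this way gives one F1 value per threshold, which is the shape of the comparison summarized on the slide.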
  	
  
Weighted Precision*
•  Normally P = TP/(TP+FP)
•  Intuition:
–  some sentences make better examples
–  more important to get the clear cases right
–  but P normally treats all examples as equal
•  We propose:
–  weight P with sentence-relation score (SRS)
$$P_W = \frac{\sum_i (TP_i \times SRS_i)}{\sum_i (TP_i \times SRS_i) + \sum_i (FP_i \times SRS_i)}$$
*and similarly for F1, Recall, and Accuracy
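The weighted precision on this slide can be sketched as follows, treating TP_i and FP_i as 0/1 indicators for sentence i. The function name and the four example sentences are assumptions made for illustration.

```python
def weighted_precision(predicted, reference, srs):
    """Precision where each sentence counts with its sentence-relation score."""
    tp_w = sum(w for p, r, w in zip(predicted, reference, srs) if p and r)
    fp_w = sum(w for p, r, w in zip(predicted, reference, srs) if p and not r)
    total = tp_w + fp_w
    return tp_w / total if total else 0.0

# Hypothetical predictions, reference labels and SRS values.
predicted = [True, True, True, False]
reference = [True, True, False, False]
srs = [0.9, 0.8, 0.2, 0.5]

unweighted = weighted_precision(predicted, reference, [1.0] * 4)  # plain TP/(TP+FP)
weighted = weighted_precision(predicted, reference, srs)
```

In this toy example the single false positive falls on a low-SRS (unclear) sentence, so it costs less under the weighted measure than under plain precision.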
  	
  
CrowdTruth SRS Score as a Weight for Annotation Quality F1
            Unweighted   Weighted
Crowd@.5    0.8382       0.9329
Crowd@.7    0.9074       0.9626
Expert      0.8444       0.8611
Single      0.6637       0.7344
Baseline    0.6559       0.6891
the sentences with a lot of disagreement weigh less

RelEx CAUSE Classifier for Crowd & Expert
Weighted vs. Unweighted F1 Score
[Figure: weighted vs. unweighted classifier F1 for the Crowd and Expert series; labeled values: 0.658 and 0.638]
weighted F1 scores higher at any given threshold

RelEx CAUSE Classifier F1 for Crowd vs. Expert Annotations
[Figure: classifier F1 for crowd vs. expert annotations; labeled values: 0.642 (p = 0.016) and 0.638]
crowd provides training data that is at least as good as, if not better than, experts

Worker-Sentence Disagreement
Measured per worker
Worker's sentence vector
Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0
AVG (Cosine)
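One way to read "AVG (Cosine)" is as an average, over the sentences a worker annotated, of a cosine-based comparison between the worker's vector and the sentence vector. The sketch below makes two assumptions that are not stated on the slide: the worker's own vote is subtracted from the sentence vector before comparing, and disagreement is taken as 1 minus the cosine similarity.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def worker_sentence_disagreement(annotations):
    """annotations: list of (worker_vector, sentence_vector) pairs for one worker."""
    distances = []
    for worker_vec, sentence_vec in annotations:
        # Assumption: remove the worker's own vote before comparing.
        rest = [s - w for s, w in zip(sentence_vec, worker_vec)]
        # Assumption: disagreement = 1 - cosine similarity.
        distances.append(1 - cosine(worker_vec, rest))
    return sum(distances) / len(distances)

sentence_vec = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
# Hypothetical workers: A picked the most popular relation, B a rare one.
worker_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
worker_b = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
dis_a = worker_sentence_disagreement([(worker_a, sentence_vec)])
dis_b = worker_sentence_disagreement([(worker_b, sentence_vec)])
```

With these inputs the worker who sides with the majority gets a lower disagreement value than the worker whose choice nobody else made.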
  	
  
Experiments showed:
•  crowd can build a ground truth
•  performs just as well as medical experts
•  crowd is also cheaper
•  crowd is always available
•  crowd can be used as a weight
•  improved F1 scores for crowd and expert ground truths
•  CrowdTruth = a solution to Clinical NLP Challenge:
•  lack of ground truth for training & benchmarking

CrowdTruth.org
http://data.CrowdTruth.org/medical-relex
#CrowdTruth @anouk_anca @laroyo @cawelty #LD4IE2015 #ISWC2015

Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

#CrowdTruth: Linked Data for Information Extraction @ISWC2015

  • 1. Anca Dumitrache, Lora Aroyo, Chris Welty http://CrowdTruth.org Measures for Language Ambiguity Medical Relation Extraction Linked Data for Information Extraction @ ISWC2015 #CrowdTruth @anouk_anca @laroyo @cawelty #LD4IE2015
  • 2. Background • Most knowledge is in text, but it's not structured • Linked Data sources are a good start, but incomplete • Goal (Distant Supervision): – extract LD triples from text – given existing tuples, find sentences that mention both args – use resulting sentences as TP to train a classifier • But can sometimes be wrong: – <PALPATION> location <CHEST> – "feeling the way CHEST expands (PALPATION), can identify areas of lung that are full of fluid" • Standard approach: Expert Annotation
  • 3. Human Annotation Myth: Experts know best • Human annotators with domain knowledge provide better annotated data, e.g. – if you want medical texts annotated for medical relations you need medical experts • But experts are expensive & don't scale • Multiple perspectives on data can be useful, beyond what experts believe is salient or correct • What if the CROWD IS BETTER?
  • 4. Experts Know Best? What is the relation between the highlighted terms? "He was the first physician to identify the relationship between HEMOPHILIA and HEMOPHILIC ARTHROPATHY." • experts: cause • crowd: no relation • The crowd reads the text literally and so provides better examples to the machine
  • 5. Experts Know Best? Experts vs. crowd: what is the (medical) relation between the highlighted (medical) terms? • 91% of expert annotations are covered by the crowd • expert annotators reach agreement only in 30% • the most popular crowd vote covers 95% of this expert annotation agreement
  • 6. Human Annotation Myth: Disagreement is Bad • Traditionally, disagreement is considered a measure of poor quality in the annotation task because: – the task is poorly defined or – annotators lack training • This makes the elimination of disagreement a goal, rather than accepting disagreement as a natural property of semantic interpretation • What if it is GOOD?
  • 7. Disagreement Bad? Does each sentence express the TREAT relation? • "ANTIBIOTICS are the first line treatment for indications of TYPHUS." → agreement 95% • "Patients with TYPHUS who were given ANTIBIOTICS exhibited side-effects." → agreement 80% • "With ANTIBIOTICS in short supply, DDT was used during WWII to control the insect vectors of TYPHUS." → agreement 50% • Disagreement can reflect the degree of clarity in a sentence
  • 8. CrowdTruth • Annotator disagreement is signal, not noise • It is indicative of the variation in human semantic interpretation of signs • It can indicate ambiguity, vagueness, similarity, over-generality, etc., as well as quality
  • 9. CrowdTruth for medical relation extraction • Goal: – collect a Medical RelEx Gold Standard – improve the performance of a RelEx Classifier • Approach: – crowdsource 900 medical sentences – measure disagreement with CrowdTruth Metrics – train & evaluate classifier with CrowdTruth SRS Score
  • 10. RelEx TASK in CrowdFlower • "Patients with ACUTE FEVER and nausea could be suffering from INFLUENZA AH1N1" • Is ACUTE FEVER – related to → INFLUENZA AH1N1?
  • 11. Worker Vector (figure: one worker's vector, with three positions marked 1)
  • 12. Sentence Vector (figure: the 15 individual marks from the worker vectors add up, position by position, to the sentence vector): 0 1 1 0 0 4 3 0 0 5 1 0
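The aggregation on slide 12 can be sketched as an element-wise sum of worker vectors. This is a minimal illustration, not the authors' implementation: the function name `sentence_vector` and the three 4-relation worker vectors are hypothetical, chosen only to keep the example small (the slide's own worker vectors are not recoverable from the transcript).

```python
from typing import List


def sentence_vector(worker_vectors: List[List[int]]) -> List[int]:
    """Sum binary worker vectors position by position (one position per relation)."""
    if not worker_vectors:
        return []
    width = len(worker_vectors[0])
    if any(len(v) != width for v in worker_vectors):
        raise ValueError("all worker vectors must have the same length")
    # zip(*...) groups the votes for each relation across all workers.
    return [sum(votes) for votes in zip(*worker_vectors)]


# Hypothetical data: three workers, four candidate relations.
workers = [
    [0, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
print(sentence_vector(workers))  # [1, 3, 0, 1]
```

Each position in the result counts how many workers selected that relation for the sentence.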
  • 13. Sentence Clarity (figure): an unclear relationship between the two arguments is reflected in the disagreement
  • 14. Sentence Clarity (figure): a clearly expressed relation between the two arguments is reflected in the agreement
  • 15. Sentence-Relation Score (SRS): measures how clearly a sentence expresses a relation • Sentence Vector: 0 1 1 0 0 4 3 0 0 5 1 0 • Cosine between the sentence vector and the unit vector for relation R6 = .55
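The SRS on slide 15 can be reproduced with a few lines of Python. This is a sketch under one assumption: that R6 corresponds to the sixth position of the sentence vector (index 5 when counting from zero). The function names are illustrative and not taken from the CrowdTruth code base.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_relation_score(sentence_vec: Sequence[float], relation_index: int) -> float:
    """Cosine between the sentence vector and the unit vector of one relation."""
    unit = [0] * len(sentence_vec)
    unit[relation_index] = 1
    return cosine(sentence_vec, unit)


# Sentence vector from the slide; index 5 is assumed to be relation R6.
slide_vector = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
print(round(sentence_relation_score(slide_vector, 5), 2))  # 0.55
```

With that indexing the score is 4 / sqrt(53), which rounds to the .55 shown on the slide.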
  • 16. Annotation Quality of Expert vs. Crowd Annotations (figure; labeled F1 values: crowd 0.907, p = 0.007; expert 0.844)
  • 17. Annotation Quality of Expert vs. Crowd Annotations (figure; labeled F1 values: crowd 0.907, p = 0.007; expert 0.844) • In the [0.6 - 0.8] threshold interval the crowd significantly outperforms the expert, with a maximum F1 of 0.907 at the 0.7 threshold
  • 18. Weighted Precision* • Normally $P = TP/(TP+FP)$ • Intuition: – some sentences make better examples – more important to get the clear cases right – but P normally treats all examples as equal • We propose: – weight P with the sentence-relation score (SRS): $P_W = \frac{\sum_i (TP_i \times SRS_i)}{\sum_i (TP_i \times SRS_i) + \sum_i (FP_i \times SRS_i)}$ • *and similarly for F1, Recall, and Accuracy
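The weighted precision on slide 18 can be sketched as follows. The sketch reads TP_i and FP_i as 0/1 indicators for whether example i is a true or false positive, which is an interpretation of the slide's notation; the function name and the four-example data set are hypothetical.

```python
from typing import Sequence


def weighted_precision(
    predicted: Sequence[int],
    actual: Sequence[int],
    srs: Sequence[float],
) -> float:
    """Precision where each positive prediction counts with its SRS weight."""
    # Weighted true positives: predicted positive and actually positive.
    tp_weight = sum(w for p, a, w in zip(predicted, actual, srs) if p and a)
    # Weighted false positives: predicted positive but actually negative.
    fp_weight = sum(w for p, a, w in zip(predicted, actual, srs) if p and not a)
    total = tp_weight + fp_weight
    return tp_weight / total if total > 0 else 0.0


# Hypothetical data: the one false positive falls on a low-SRS (unclear) sentence.
predicted = [1, 1, 1, 0]
actual = [1, 0, 1, 1]
srs = [0.9, 0.2, 0.8, 0.5]

print(round(weighted_precision(predicted, actual, [1, 1, 1, 1]), 3))  # 0.667 (unweighted)
print(round(weighted_precision(predicted, actual, srs), 3))           # 0.895 (SRS-weighted)
```

Setting every weight to 1 recovers ordinary precision, so the weighted score differs only to the extent that errors fall on ambiguous sentences.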
  • 19. CrowdTruth SRS Score as a Weight for Annotation Quality F1
                 Unweighted   Weighted
      Crowd@.5   0.8382       0.9329
      Crowd@.7   0.9074       0.9626
      Expert     0.8444       0.8611
      Single     0.6637       0.7344
      Baseline   0.6559       0.6891
    The sentences with a lot of disagreement weigh less.
  • 20. RelEx CAUSE Classifier for Crowd & Expert: Weighted vs. Unweighted F1 Score (figure; series: Crowd, Expert; labeled values: 0.658, 0.638) • Weighted F1 scores are higher at any given threshold
  • 21. RelEx CAUSE Classifier F1 for Crowd vs. Expert Annotations (figure; labeled F1 values: crowd 0.642, p = 0.016; expert 0.638)
  • 22. RelEx CAUSE Classifier F1 for Crowd vs. Expert Annotations (figure; labeled F1 values: crowd 0.642, p = 0.016; expert 0.638) • The crowd provides training data that is at least as good as, if not better than, that of experts
  • 23. Worker-Sentence Disagreement: measured per worker • AVG (Cosine) between the worker's sentence vector and the Sentence Vector (example sentence vector: 0 1 1 0 0 4 3 0 0 5 1 0)
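The per-worker measure on slide 23 can be sketched as an average cosine over the sentences a worker annotated. Several details here are assumptions that the slide does not state: the sketch returns a cosine similarity (a distance would be 1 minus this value), and by default it subtracts the worker's own vector from the sentence vector before comparing, so a worker is not credited for agreeing with themselves. All names and the single-sentence example are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def worker_sentence_score(
    worker_vectors: List[List[int]],
    sentence_vectors: List[List[int]],
    exclude_worker: bool = True,
) -> float:
    """Average cosine between a worker's vectors and the matching sentence vectors."""
    scores = []
    for worker_vec, sentence_vec in zip(worker_vectors, sentence_vectors):
        if exclude_worker:
            # Assumption: compare against the votes of all *other* workers.
            reference = [s - w for s, w in zip(sentence_vec, worker_vec)]
        else:
            reference = list(sentence_vec)
        scores.append(cosine(worker_vec, reference))
    return sum(scores) / len(scores) if scores else 0.0


# One sentence (the vector from the slide) and a hypothetical worker
# who selected only the sixth relation.
sentence = [0, 1, 1, 0, 0, 4, 3, 0, 0, 5, 1, 0]
worker = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(worker_sentence_score([worker], [sentence]), 3))
print(round(worker_sentence_score([worker], [sentence], exclude_worker=False), 3))
```

A low average suggests a worker who consistently disagrees with the rest of the crowd, which is the signal the slide uses to assess workers.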
  • 24. Experiments showed: • crowd can build a ground truth • performs just as well as medical experts • crowd is also cheaper • crowd is always available • crowd can be used as a weight • improved F1 scores for crowd and expert ground truths • CrowdTruth = a solution to the Clinical NLP Challenge: lack of ground truth for training & benchmarking