SlideShare a Scribd company logo
1 of 41
Download to read offline
Data Excellence:
Better Data for Better AI
Crowd Science Workshop
@ NeurIPS2020
Lora Aroyo
http://lora-aroyo.org
@laroyo
By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN
0-8109-6720-0) by Justin Foote (talk)., Fair use,
https://en.wikipedia.org/w/index.php?curid=3955850
http://lora-aroyo.org @laroyo
TAKE HOME MESSAGE
2
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability (xRR),
significance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)
http://lora-aroyo.org @laroyo
TAKE HOME MESSAGE
3
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability (xRR),
significance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)
http://lora-aroyo.org @laroyo
TAKE HOME MESSAGE
4
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability (xRR),
significance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)
http://lora-aroyo.org @laroyo
TAKE HOME MESSAGE
5
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability (xRR),
significance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)
http://lora-aroyo.org @laroyo 6
The Rise of the Machines
“AI Winter”
lab experiments
Expert Systems
small scale
experiments
http://lora-aroyo.org @laroyo 7
The Rise of the Machines
“AI Winter” → “AI Breakthroughs in Games”
IBM Watson Jeopardy
DeepMind AlphaGo
beat the humans
http://lora-aroyo.org @laroyo 8
The Rise of the Machines
“AI Winter” → “AI Breakthroughs in Games” → “Real World Tasks”
Health diagnostics
Flue prediction
Weather prediction
Text, Image and Video classification
Text Generation
Text Translation
Conversational AI
support the humans
http://lora-aroyo.org @laroyo 9
Mainstream Deployment of AI
“Real World Tasks” deployed in the wild → Unintended behaviors
Microsoft Tay bot
IBM Watson Oncology
Amazon Rekognition
Google Photos
Apple Face ID
Facebook chat bots
Various Speech Assistants
http://lora-aroyo.org @laroyo 10
data quality is essential for
guiding AI away from
unintended behaviours
getting computers to “see”
the diversity of data
Data is the compass for AI
http://lora-aroyo.org @laroyo 11
The Life of AI Data
“It exists!”
bootstrapping AI with data
Caltech101
LabelMe
Berkley-3D
https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
http://lora-aroyo.org @laroyo 12
The Life of AI Data
“It exists!” → “It is bigger!”
data hungry AI
ImageNet
SIFT10M
OpenImages
COCO
Web 1T 5-Gram
https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
http://lora-aroyo.org @laroyo 13
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
http://lora-aroyo.org @laroyo 14
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
it got worse ...
http://lora-aroyo.org @laroyo 15
Unintended Behaviors in AI
Adapted from “AI in the Open World: Discovering Blind Spots of AI”, SafeAI 2020, Ece Kumar
http://lora-aroyo.org @laroyo 16
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
but before it got better ...
reactive
data improvement
http://lora-aroyo.org @laroyo 17
The Life of AI Data
“It exists!” → “It is bigger!” → “It is better!”
to reach here
we need proactive
data improvement
http://lora-aroyo.org @laroyo 18
The Life of AI Data
Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24, 2 (2009)
In the decade since then, the research community have done a lot
with quantity, but quality has been left behind
http://lora-aroyo.org @laroyo 19
datasets are not easy to debug
low data quality is typically not a result of:
- software bugs, or
- just human errors
achieving excellent data requires change in the way
we gather data and use it to evaluate AI systems:
- ensure that the datasets represent the
problems they are intended for
- ensure that the human annotation process for
these datasets allows to capture the natural
variance & bias in the problem
- ensure that the metrics used to measure the AI
quality utilize the variance & bias in the human
responses
But ...Data Quality is not easy ...
http://lora-aroyo.org @laroyo
it is not easy to give Y/N answer
for most of our AI tasks
20
Do these images depict a GUITAR ?
Data Quality is not only human error
✓
✓ ✓
✘
✘
✘✘✓
✓
http://lora-aroyo.org @laroyo 21
Do these images depict NEW ZEALAND ?
Data Quality should consider context of use
it is not easy to give Y/N answer
for most of our AI tasks
the answer typically depends on
the context, on the task, on the
usage, etc
✓ ✘
✓ ✓ ✘
✘
http://lora-aroyo.org @laroyo 22
Do these images depict a WEDDING ?
Data Quality should include real world diversity
it is not easy to give Y/N answer
for most of our AI tasks
the answer typically depends on
the context, on the task, on the
usage, etc
disagreement is signal for
natural diversity and variance
in human annotations and
should be included in AI training
✓
✘
✓
✓
✘
✓
http://lora-aroyo.org @laroyo 23
Does the sentence express TREATS relation between Chloroquine, Malaria?
Data Quality is difficult even with experts
For prevention of malaria, use only in individuals traveling to malarious
areas where CHLOROQUINE resistant P. falciparum MALARIA
has not been reported.
Rheumatoid arthritis and MALARIA have been treated
with CHLOROQUINE for decades.
Among 56 subjects reporting to a clinic with symptoms of MALARIA
53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.
✓
✘
✓
“Crowdsourcing Ground Truth for Medical Relation Extraction”, ACM TiiS Special Issue on Human-Centered Machine Learning, 2018. A. Dumitrache, L. Aroyo, C. Welty
http://lora-aroyo.org @laroyo
DISAGREEMENT IS SIGNAL
There are different source of disagreement
http://lora-aroyo.org @laroyo 25
Three sides of human interpretation
CROWDTRUTH Disagreement provides
guidance in task analysis:
● items with poor semantics
● items with salient terms
● items difficult to classify
● items that are ambiguous
● subjective annotations
● time-sensitive annotations
● difficult annotation tasks
● mis-translated annotations
● users with/without
specific knowledge
● communities of thought
● spammers
“Three Sides of CrowdTruth”, Human Computation Journal, 2014, L. Aroyo, C. Welty
http://lora-aroyo.org @laroyo 26
One truth: knowledge acquisition typically assumes one
correct interpretation for every example
Experts rule: knowledge is captured from domain experts
One is enough: single expert’s knowledge is sufficient
Disagreement bad: when people disagree, they must not
understand the problem
Detailed explanations help: if examples cause
disagreement - adding instructions should help
Once done, forever valid: knowledge is not updated; new
data not aligned with old
All examples are created equal: triples are triples, one is
not more important than another, they are all either true or
false
… but in current practices disagreement is continuously
avoided and the world is forced into a binary setting
7 Myths about Human Annotation
“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
http://lora-aroyo.org @laroyo
TAKE HOME MESSAGE
27
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability,
significance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
novel method for crowdsourcing adverse test
sets for ML models (CATS4ML)
http://lora-aroyo.org @laroyo
Your AI model is as good as
your evaluation data
… but is your evaluation data
missing relevant examples?
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
Your AI model is as good as
your evaluation data
… but is your evaluation data
missing relevant examples?
How can we find such
examples, especially if they are
AI blindspots
(i.e. unknown unknowns)? cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
offers a crowdsourced solution for finding
blindspots of your AI models
inspired by prior research in human computation
Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns” (Attenberg, Ipeirotis, Provost, 2014)
Vulnerability Reward Program (Google)
CATS4ML Challenge
Crowdsourcing Adverse Test Sets for ML
http://lora-aroyo.org @laroyo
offers a crowdsourced red team
for finding blindspots of your AI models
CATS4ML Challenge
Crowdsourcing Adverse Test Sets for ML
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
In this first version of the CATS4ML challenge
participants will discover AI blindspots in the Open Images Dataset
Lipstick?
Airplane?
Car?
Construction worker?
Thanksgiving?
These AI blindspots are real images with visual
patterns that confuse AI models
in ways humans might find meaningful
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
image1 label1
image2 label2
…
challenge
participants
labels comparison
Open
Images
dataset
submission
image 1: label 1: lipstick
image 2: label 2: thanksgiving
...
human-in-the-loop verification
image1-label1
image2-label2
...
image1-label1
image2-label2
...
submission scoring
vs.
The Challenge Process
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
cats4ml.humancomputation.com
http://lora-aroyo.org @laroyo
cats4ml.humancomputation.com
The challenge will run until 30 April, 2021
Winners and all participants will be invited to the final
(virtual) event in 2021
submission quota
http://lora-aroyo.org @laroyo
cats4ml.humancomputation.com
Join us now!
● build a team & join
● contribute some creative
adverse examples
● spread the word to your
external research community
● give us feedback
http://lora-aroyo.org @laroyo 39
The Team
Praveen Paritosh Ka Wong
Lora Aroyo Devi Krishna
ˈl ɪ k ə r t
http://lora-aroyo.org @laroyo
TAKE HOME MESSAGE
40
data is the compass for AI - AI advances where
there is data
data quality must be addressed in AI practices
especially in the way we evaluate AI
improving evaluation of AI must consider ways to
measure variance and capture bias to bring us
one step closer to data excellence
to address variance in AI evaluation we propose a
number of novel metrics for reliability (xRR),
significance (metrology for AI) and disagreement
(CrowdTruth)
to address bias in AI evaluation we propose a
method for crowdsourcing adverse test sets for
ML models (CATS4ML)
Data Excellence:
Better Data for Better AI
Crowd Science Workshop
@ NeurIPS2020
Lora Aroyo
http://lora-aroyo.org
@laroyo
By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN
0-8109-6720-0) by Justin Foote (talk)., Fair use,
https://en.wikipedia.org/w/index.php?curid=3955850

More Related Content

More from Lora Aroyo

FAIRview: Responsible Video Summarization @NYCML'18
FAIRview: Responsible Video Summarization @NYCML'18FAIRview: Responsible Video Summarization @NYCML'18
FAIRview: Responsible Video Summarization @NYCML'18Lora Aroyo
 
Understanding bias in video news & news filtering algorithms
Understanding bias in video news & news filtering algorithmsUnderstanding bias in video news & news filtering algorithms
Understanding bias in video news & news filtering algorithmsLora Aroyo
 
StorySourcing: Telling Stories with Humans & Machines
StorySourcing: Telling Stories with Humans & MachinesStorySourcing: Telling Stories with Humans & Machines
StorySourcing: Telling Stories with Humans & MachinesLora Aroyo
 
Data Science with Humans in the Loop
Data Science with Humans in the LoopData Science with Humans in the Loop
Data Science with Humans in the LoopLora Aroyo
 
Digital Humanities Benelux 2017: Keynote Lora Aroyo
Digital Humanities Benelux 2017: Keynote Lora AroyoDigital Humanities Benelux 2017: Keynote Lora Aroyo
Digital Humanities Benelux 2017: Keynote Lora AroyoLora Aroyo
 
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...Lora Aroyo
 
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017Crowdsourcing ambiguity aware ground truth - collective intelligence 2017
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017Lora Aroyo
 
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort ZoneMy ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort ZoneLora Aroyo
 
Data Science with Human in the Loop @Faculty of Science #Leiden University
Data Science with Human in the Loop @Faculty of Science #Leiden UniversityData Science with Human in the Loop @Faculty of Science #Leiden University
Data Science with Human in the Loop @Faculty of Science #Leiden UniversityLora Aroyo
 
SXSW2017 @NewDutchMedia Talk: Exploration is the New Search
SXSW2017 @NewDutchMedia Talk: Exploration is the New SearchSXSW2017 @NewDutchMedia Talk: Exploration is the New Search
SXSW2017 @NewDutchMedia Talk: Exploration is the New SearchLora Aroyo
 
Europeana GA 2016: Harnessing Crowds, Niches & Professionals in the Digital Age
Europeana GA 2016: Harnessing Crowds, Niches & Professionals  in the Digital AgeEuropeana GA 2016: Harnessing Crowds, Niches & Professionals  in the Digital Age
Europeana GA 2016: Harnessing Crowds, Niches & Professionals in the Digital AgeLora Aroyo
 
"Video Killed the Radio Star": From MTV to Snapchat
"Video Killed the Radio Star": From MTV to Snapchat"Video Killed the Radio Star": From MTV to Snapchat
"Video Killed the Radio Star": From MTV to SnapchatLora Aroyo
 
UMAP 2016 Opening Ceremony
UMAP 2016 Opening CeremonyUMAP 2016 Opening Ceremony
UMAP 2016 Opening CeremonyLora Aroyo
 
Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Cr...
Crowdsourcing & Nichesourcing: Enriching Cultural Heritagewith Experts & Cr...Crowdsourcing & Nichesourcing: Enriching Cultural Heritagewith Experts & Cr...
Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Cr...Lora Aroyo
 
Stitch by Stitch: Annotating Fashion at the Rijksmuseum
Stitch by Stitch: Annotating Fashion at the RijksmuseumStitch by Stitch: Annotating Fashion at the Rijksmuseum
Stitch by Stitch: Annotating Fashion at the RijksmuseumLora Aroyo
 
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...Lora Aroyo
 
Keynote @Final NWO CATCH Program Event
Keynote @Final NWO CATCH Program EventKeynote @Final NWO CATCH Program Event
Keynote @Final NWO CATCH Program EventLora Aroyo
 
Closing Event - Watson Innovation Course
Closing Event - Watson Innovation CourseClosing Event - Watson Innovation Course
Closing Event - Watson Innovation CourseLora Aroyo
 
CrowdTruth Games @NLeSc eHumanities day 2015
CrowdTruth Games @NLeSc eHumanities day 2015CrowdTruth Games @NLeSc eHumanities day 2015
CrowdTruth Games @NLeSc eHumanities day 2015Lora Aroyo
 
Semantic Digital Humanities Workshop 2015 @Oxford
Semantic Digital Humanities Workshop 2015 @OxfordSemantic Digital Humanities Workshop 2015 @Oxford
Semantic Digital Humanities Workshop 2015 @OxfordLora Aroyo
 

More from Lora Aroyo (20)

FAIRview: Responsible Video Summarization @NYCML'18
FAIRview: Responsible Video Summarization @NYCML'18FAIRview: Responsible Video Summarization @NYCML'18
FAIRview: Responsible Video Summarization @NYCML'18
 
Understanding bias in video news & news filtering algorithms
Understanding bias in video news & news filtering algorithmsUnderstanding bias in video news & news filtering algorithms
Understanding bias in video news & news filtering algorithms
 
StorySourcing: Telling Stories with Humans & Machines
StorySourcing: Telling Stories with Humans & MachinesStorySourcing: Telling Stories with Humans & Machines
StorySourcing: Telling Stories with Humans & Machines
 
Data Science with Humans in the Loop
Data Science with Humans in the LoopData Science with Humans in the Loop
Data Science with Humans in the Loop
 
Digital Humanities Benelux 2017: Keynote Lora Aroyo
Digital Humanities Benelux 2017: Keynote Lora AroyoDigital Humanities Benelux 2017: Keynote Lora Aroyo
Digital Humanities Benelux 2017: Keynote Lora Aroyo
 
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...
DH Benelux 2017 Panel: A Pragmatic Approach to Understanding and Utilising Ev...
 
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017Crowdsourcing ambiguity aware ground truth - collective intelligence 2017
Crowdsourcing ambiguity aware ground truth - collective intelligence 2017
 
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort ZoneMy ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
My ESWC 2017 keynote: Disrupting the Semantic Comfort Zone
 
Data Science with Human in the Loop @Faculty of Science #Leiden University
Data Science with Human in the Loop @Faculty of Science #Leiden UniversityData Science with Human in the Loop @Faculty of Science #Leiden University
Data Science with Human in the Loop @Faculty of Science #Leiden University
 
SXSW2017 @NewDutchMedia Talk: Exploration is the New Search
SXSW2017 @NewDutchMedia Talk: Exploration is the New SearchSXSW2017 @NewDutchMedia Talk: Exploration is the New Search
SXSW2017 @NewDutchMedia Talk: Exploration is the New Search
 
Europeana GA 2016: Harnessing Crowds, Niches & Professionals in the Digital Age
Europeana GA 2016: Harnessing Crowds, Niches & Professionals  in the Digital AgeEuropeana GA 2016: Harnessing Crowds, Niches & Professionals  in the Digital Age
Europeana GA 2016: Harnessing Crowds, Niches & Professionals in the Digital Age
 
"Video Killed the Radio Star": From MTV to Snapchat
"Video Killed the Radio Star": From MTV to Snapchat"Video Killed the Radio Star": From MTV to Snapchat
"Video Killed the Radio Star": From MTV to Snapchat
 
UMAP 2016 Opening Ceremony
UMAP 2016 Opening CeremonyUMAP 2016 Opening Ceremony
UMAP 2016 Opening Ceremony
 
Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Cr...
Crowdsourcing & Nichesourcing: Enriching Cultural Heritagewith Experts & Cr...Crowdsourcing & Nichesourcing: Enriching Cultural Heritagewith Experts & Cr...
Crowdsourcing & Nichesourcing: Enriching Cultural Heritage with Experts & Cr...
 
Stitch by Stitch: Annotating Fashion at the Rijksmuseum
Stitch by Stitch: Annotating Fashion at the RijksmuseumStitch by Stitch: Annotating Fashion at the Rijksmuseum
Stitch by Stitch: Annotating Fashion at the Rijksmuseum
 
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...
Museums & the Web 2016 Presentation: Enriching Collections with Expert Knowle...
 
Keynote @Final NWO CATCH Program Event
Keynote @Final NWO CATCH Program EventKeynote @Final NWO CATCH Program Event
Keynote @Final NWO CATCH Program Event
 
Closing Event - Watson Innovation Course
Closing Event - Watson Innovation CourseClosing Event - Watson Innovation Course
Closing Event - Watson Innovation Course
 
CrowdTruth Games @NLeSc eHumanities day 2015
CrowdTruth Games @NLeSc eHumanities day 2015CrowdTruth Games @NLeSc eHumanities day 2015
CrowdTruth Games @NLeSc eHumanities day 2015
 
Semantic Digital Humanities Workshop 2015 @Oxford
Semantic Digital Humanities Workshop 2015 @OxfordSemantic Digital Humanities Workshop 2015 @Oxford
Semantic Digital Humanities Workshop 2015 @Oxford
 

Recently uploaded

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Data Excellence: Better Data for Better AI

  • 1. Data Excellence: Better Data for Better AI Crowd Science Workshop @ NeurIPS2020 Lora Aroyo http://lora-aroyo.org @laroyo By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN 0-8109-6720-0) by Justin Foote (talk)., Fair use, https://en.wikipedia.org/w/index.php?curid=3955850
  • 2. http://lora-aroyo.org @laroyo TAKE HOME MESSAGE 2 data is the compass for AI - AI advances where there is data data quality must be addressed in AI practices especially in the way we evaluate AI improving evaluation of AI must consider ways to measure variance and capture bias to bring us one step closer to data excellence to address variance in AI evaluation we propose a number of novel metrics for reliability (xRR), significance (metrology for AI) and disagreement (CrowdTruth) to address bias in AI evaluation we propose a novel method for crowdsourcing adverse test sets for ML models (CATS4ML)
  • 3. http://lora-aroyo.org @laroyo TAKE HOME MESSAGE 3 data is the compass for AI - AI advances where there is data data quality must be addressed in AI practices especially in the way we evaluate AI improving evaluation of AI must consider ways to measure variance and capture bias to bring us one step closer to data excellence to address variance in AI evaluation we propose a number of novel metrics for reliability (xRR), significance (metrology for AI) and disagreement (CrowdTruth) to address bias in AI evaluation we propose a novel method for crowdsourcing adverse test sets for ML models (CATS4ML)
  • 4. http://lora-aroyo.org @laroyo TAKE HOME MESSAGE 4 data is the compass for AI - AI advances where there is data data quality must be addressed in AI practices especially in the way we evaluate AI improving evaluation of AI must consider ways to measure variance and capture bias to bring us one step closer to data excellence to address variance in AI evaluation we propose a number of novel metrics for reliability (xRR), significance (metrology for AI) and disagreement (CrowdTruth) to address bias in AI evaluation we propose a novel method for crowdsourcing adverse test sets for ML models (CATS4ML)
  • 5. http://lora-aroyo.org @laroyo TAKE HOME MESSAGE 5 data is the compass for AI - AI advances where there is data data quality must be addressed in AI practices especially in the way we evaluate AI improving evaluation of AI must consider ways to measure variance and capture bias to bring us one step closer to data excellence to address variance in AI evaluation we propose a number of novel metrics for reliability (xRR), significance (metrology for AI) and disagreement (CrowdTruth) to address bias in AI evaluation we propose a novel method for crowdsourcing adverse test sets for ML models (CATS4ML)
  • 6. http://lora-aroyo.org @laroyo 6 The Rise of the Machines “AI Winter” lab experiments Expert Systems small scale experiments
  • 7. http://lora-aroyo.org @laroyo 7 The Rise of the Machines “AI Winter” → “AI Breakthroughs in Games” IBM Watson Jeopardy DeepMind AlphaGo beat the humans
  • 8. http://lora-aroyo.org @laroyo 8 The Rise of the Machines “AI Winter” → “AI Breakthroughs in Games” → “Real World Tasks” Health diagnostics Flue prediction Weather prediction Text, Image and Video classification Text Generation Text Translation Conversational AI support the humans
  • 9. http://lora-aroyo.org @laroyo 9 Mainstream Deployment of AI “Real World Tasks” deployed in the wild → Unintended behaviors Microsoft Tay bot IBM Watson Oncology Amazon Rekognition Google Photos Apple Face ID Facebook chat bots Various Speech Assistants
  • 10. http://lora-aroyo.org @laroyo 10 data quality is essential for guiding AI away from unintended behaviours getting computers to “see” the diversity of data Data is the compass for AI
  • 11. http://lora-aroyo.org @laroyo 11 The Life of AI Data “It exists!” bootstrapping AI with data Caltech101 LabelMe Berkley-3D https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
  • 12. http://lora-aroyo.org @laroyo 12 The Life of AI Data “It exists!” → “It is bigger!” data hungry AI ImageNet SIFT10M OpenImages COCO Web 1T 5-Gram https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
  • 13. http://lora-aroyo.org @laroyo 13 The Life of AI Data “It exists!” → “It is bigger!” → “It is better!” but before it got better ...
  • 14. http://lora-aroyo.org @laroyo 14 The Life of AI Data “It exists!” → “It is bigger!” → “It is better!” but before it got better ... it got worse ...
  • 15. http://lora-aroyo.org @laroyo 15 Unintended Behaviors in AI Adapted from “AI in the Open World: Discovering Blind Spots of AI”, SafeAI 2020, Ece Kumar
  • 16. http://lora-aroyo.org @laroyo 16 The Life of AI Data “It exists!” → “It is bigger!” → “It is better!” but before it got better ... reactive data improvement
  • 17. http://lora-aroyo.org @laroyo 17 The Life of AI Data “It exists!” → “It is bigger!” → “It is better!” to reach here we need proactive data improvement
  • 18. http://lora-aroyo.org @laroyo 18 The Life of AI Data Alon Halevy, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24, 2 (2009) In the decade since then, the research community have done a lot with quantity, but quality has been left behind
  • 19. http://lora-aroyo.org @laroyo 19 datasets are not easy to debug low data quality is typically not a result of: - software bugs, or - just human errors achieving excellent data requires change in the way we gather data and use it to evaluate AI systems: - ensure that the datasets represent the problems they are intended for - ensure that the human annotation process for these datasets allows to capture the natural variance & bias in the problem - ensure that the metrics used to measure the AI quality utilize the variance & bias in the human responses But ...Data Quality is not easy ...
  • 20. http://lora-aroyo.org @laroyo it is not easy to give Y/N answer for most of our AI tasks 20 Do these images depict a GUITAR ? Data Quality is not only human error ✓ ✓ ✓ ✘ ✘ ✘✘✓ ✓
  • 21. http://lora-aroyo.org @laroyo 21 Do these images depict NEW ZEALAND ? Data Quality should consider context of use it is not easy to give Y/N answer for most of our AI tasks the answer typically depends on the context, on the task, on the usage, etc ✓ ✘ ✓ ✓ ✘ ✘
  • 22. http://lora-aroyo.org @laroyo 22 Do these images depict a WEDDING ? Data Quality should include real world diversity it is not easy to give Y/N answer for most of our AI tasks the answer typically depends on the context, on the task, on the usage, etc disagreement is signal for natural diversity and variance in human annotations and should be included in AI training ✓ ✘ ✓ ✓ ✘ ✓
  • 23. http://lora-aroyo.org @laroyo 23 Does the sentence express TREATS relation between Chloroquine, Malaria? Data Quality is difficult even with experts For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported. Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades. Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood. ✓ ✘ ✓ “Crowdsourcing Ground Truth for Medical Relation Extraction”, ACM TiiS Special Issue on Human-Centered Machine Learning, 2018. A. Dumitrache, L. Aroyo, C. Welty
  • 24. http://lora-aroyo.org @laroyo DISAGREEMENT IS SIGNAL There are different source of disagreement
  • 25. http://lora-aroyo.org @laroyo 25 Three sides of human interpretation CROWDTRUTH Disagreement provides guidance in task analysis: ● items with poor semantics ● items with salient terms ● items difficult to classify ● items that are ambiguous ● subjective annotations ● time-sensitive annotations ● difficult annotation tasks ● mis-translated annotations ● users with/without specific knowledge ● communities of thought ● spammers “Three Sides of CrowdTruth”, Human Computation Journal, 2014, L. Aroyo, C. Welty
  • 26. http://lora-aroyo.org @laroyo 26 One truth: knowledge acquisition typically assumes one correct interpretation for every example Experts rule: knowledge is captured from domain experts One is enough: single expert’s knowledge is sufficient Disagreement bad: when people disagree, they must not understand the problem Detailed explanations help: if examples cause disagreement - adding instructions should help Once done, forever valid: knowledge is not updated; new data not aligned with old All examples are created equal: triples are triples, one is not more important than another, they are all either true or false … but in current practices disagreement is continuously avoided and the world is forced into a binary setting 7 Myths about Human Annotation “Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty
  • 27. http://lora-aroyo.org @laroyo TAKE HOME MESSAGE 27 data is the compass for AI - AI advances where there is data data quality must be addressed in AI practices especially in the way we evaluate AI improving evaluation of AI must consider ways to measure variance and capture bias to bring us one step closer to data excellence to address variance in AI evaluation we propose a number of novel metrics for reliability, significance (metrology for AI) and disagreement (CrowdTruth) to address bias in AI evaluation we propose a novel method for crowdsourcing adverse test sets for ML models (CATS4ML)
  • 28. http://lora-aroyo.org @laroyo Your AI model is as good as your evaluation data … but is your evaluation data missing relevant examples? cats4ml.humancomputation.com
  • 29. http://lora-aroyo.org @laroyo Your AI model is as good as your evaluation data … but is your evaluation data missing relevant examples? How can we find such examples, especially if they are AI blindspots (i.e. unknown unknowns)? cats4ml.humancomputation.com
  • 30. http://lora-aroyo.org @laroyo offers a crowdsourced solution for finding blindspots of your AI models inspired by prior research in human computation Beat the Machine: Challenging Humans to Find a Predictive Model's “Unknown Unknowns” (Attenberg, Ipeirotis, Provost, 2014) Vulnerability Reward Program (Google) CATS4ML Challenge Crowdsourcing Adverse Test Sets for ML
  • 31. http://lora-aroyo.org @laroyo offers a crowdsourced red team for finding blindspots of your AI models CATS4ML Challenge Crowdsourcing Adverse Test Sets for ML cats4ml.humancomputation.com
  • 32. http://lora-aroyo.org @laroyo In this first version of the CATS4ML challenge participants will discover AI blindspots in the Open Images Dataset Lipstick? Airplane? Car? Construction worker? Thanksgiving? These AI blindspots are real images with visual patterns that confuse AI models in ways humans might find meaningful cats4ml.humancomputation.com
  • 33. http://lora-aroyo.org @laroyo image1 label1 image2 label2 … challenge participants labels comparison Open Images dataset submission image 1: label 1: lipstick image 2: label 2: thanksgiving ... human-in-the-loop verification image1-label1 image2-label2 ... image1-label1 image2-label2 ... submission scoring vs. The Challenge Process cats4ml.humancomputation.com
  • 37. http://lora-aroyo.org @laroyo cats4ml.humancomputation.com The challenge will run until 30 April, 2021 Winners and all participants will be invited to the final (virtual) event in 2021 submission quota
  • 38. http://lora-aroyo.org @laroyo cats4ml.humancomputation.com Join us now! ● build a team & join ● contribute some creative adverse examples ● spread the word to your external research community ● give us feedback
  • 39. http://lora-aroyo.org @laroyo 39 The Team Praveen Paritosh Ka Wong Lora Aroyo Devi Krishna ˈl ɪ k ə r t
  • 40. http://lora-aroyo.org @laroyo TAKE HOME MESSAGE 40 data is the compass for AI - AI advances where there is data data quality must be addressed in AI practices especially in the way we evaluate AI improving evaluation of AI must consider ways to measure variance and capture bias to bring us one step closer to data excellence to address variance in AI evaluation we propose a number of novel metrics for reliability (xRR), significance (metrology for AI) and disagreement (CrowdTruth) to address bias in AI evaluation we propose a method for crowdsourcing adverse test sets for ML models (CATS4ML)
  • 41. Data Excellence: Better Data for Better AI Crowd Science Workshop @ NeurIPS2020 Lora Aroyo http://lora-aroyo.org @laroyo By Scanned from The Magic of M. C. Escher. (Harry N. Abrams, Inc. ISBN 0-8109-6720-0) by Justin Foote (talk)., Fair use, https://en.wikipedia.org/w/index.php?curid=3955850