UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (IRAT)


Joint work with Hyun Joon Jung describing our submission to this year's IRAT, presented at the NIST TREC conference (November 8, 2012).


TREC 2012 Crowdsourcing Track

Becoming IRATE: UT Austin’s Image Relevance Assessment Task Enthusiasm!

Hyun Joon Jung (hyunJoon@utexas.edu)
Matthew Lease (ml@ischool.utexas.edu)

@mattlease

Key Points

• Interface design for efficient, cohesive judging

– Collected 44K labels for $40

• Off-the-shelf worker scoring metric (Raykar & Yu)

• Completely unsupervised (no training or tuning)

• Online label analysis (cf. Welinder & Perona’10)

• Personalized error reports for workers

• … and all in 3 weeks!

Interface Design
[Figure not recoverable]

Scoring and Incentivizing Workers

V. Raykar, S. Yu, L. Zhao, G. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.

Past Work: Offline Crowdsourcing
e.g., Jung & Lease, HCOMP 2012


Here: Online Crowdsourcing
Unsupervised, Incremental, Iterative data collection

Diagram labels: Label Collection, Worker Evaluation, Trusted workers, Confident, Ambiguous, Pseudo-Ground Truth, Iterative

Welinder & Perona. Online crowdsourcing: Rating annotators and obtaining cost-effective labels. CVPR’10 Workshops.

Collecting Labels
• Partition examples into subsets
• For each example in the current partition
– Collect 2k labels for the example
– If Jaccard agreement & high confidence
• Declare aggregate label as “pseudo-gold”
– Else if within budget and trusted workers exist
• Collect another label and re-test for pseudo-gold
– Else
• Give up, output best guess aggregate label
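The collection loop above can be sketched in Python. This is only an illustration under stated assumptions, not the submission's actual code: each label is treated as a set, agreement is the mean pairwise Jaccard similarity, "confidence" is the majority label's vote share, and the thresholds, the `request_label` callback, the initial batch of 2 labels, and the cap of 5 labels are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(labels):
    """Average Jaccard similarity over all pairs of collected labels."""
    pairs = list(combinations(labels, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def aggregate(labels):
    """Majority vote over whole labels, plus the winner's vote share."""
    counts = Counter(frozenset(label) for label in labels)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(labels)


def collect_example(request_label, initial=2, max_labels=5,
                    min_agreement=0.8, min_confidence=0.6,
                    trusted_available=True):
    """Return (aggregate_label, is_pseudo_gold) for one example.

    request_label is a hypothetical callback that returns one new label
    (a set) from a worker. All thresholds are illustrative assumptions.
    """
    labels = [request_label() for _ in range(initial)]
    while True:
        label, confidence = aggregate(labels)
        if (mean_pairwise_jaccard(labels) >= min_agreement
                and confidence >= min_confidence):
            return label, True            # declare pseudo-gold
        if len(labels) < max_labels and trusted_available:
            labels.append(request_label())  # one more label, then re-test
        else:
            return label, False           # give up: best-guess aggregate
```

In this sketch the extra label would be requested from the trusted pool; `trusted_available` stands in for the "within budget and trusted workers exist" check.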


Identifying Trusted Workers

• For a subset of pseudo-gold examples
– Collect 2k labels for the example
• For each worker
– If spammer score > 0.5 over >= 100 examples
• Add worker to trusted pool
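A minimal sketch of the trusted-worker test, assuming binary relevance labels and the Raykar & Yu spammer score in its binary form, (sensitivity + specificity − 1)², computed against pseudo-gold. The function names, the data layout, and the handling of the degenerate case are assumptions made for illustration; only the 0.5 threshold and the 100-example minimum come from the slide.

```python
def spammer_score(worker, gold):
    """(sensitivity + specificity - 1) ** 2 for paired binary labels.

    Returns 0.0 when gold has no positives or no negatives, because the
    score cannot be estimated then (an assumption of this sketch).
    """
    pos = [w for w, g in zip(worker, gold) if g]
    neg = [w for w, g in zip(worker, gold) if not g]
    if not pos or not neg:
        return 0.0
    sensitivity = sum(pos) / len(pos)
    specificity = sum(not w for w in neg) / len(neg)
    return (sensitivity + specificity - 1) ** 2


def trusted_pool(judgments, pseudo_gold, min_score=0.5, min_examples=100):
    """Workers scoring above min_score on at least min_examples examples.

    judgments: {worker_id: {example_id: bool}}  (hypothetical layout)
    pseudo_gold: {example_id: bool}
    """
    trusted = set()
    for worker, labels in judgments.items():
        shared = [ex for ex in labels if ex in pseudo_gold]
        if len(shared) < min_examples:
            continue  # not enough evidence to score this worker
        score = spammer_score([labels[ex] for ex in shared],
                              [pseudo_gold[ex] for ex in shared])
        if score > min_score:
            trusted.add(worker)
    return trusted
```

Under this definition a score near 0 indicates random labeling and a score near 1 an informative worker. A worker who consistently inverts labels also scores high; the slides do not say how that case was handled.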

Personalized Error Reports
[Figure not recoverable]

Number of labels & Cost Breakdown

# of workers per judgment (pie chart)
• 2 workers: 15,757 judgments (80%)
• 3 workers: 3,821 judgments (19%)
• 4 workers: 182 judgments (1%)
• 5 workers: 40 judgments (0%)

• 80% of judgments: labeled only twice.
• 99% of judgments: labeled at most three times.
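As a quick consistency check on the chart, the per-judgment counts can be tallied in Python. The mapping of counts to worker numbers (15,757 with 2 workers, 3,821 with 3, 182 with 4, 40 with 5) is read off the pie chart and is an assumption for the two smallest slices.

```python
# Judgments grouped by how many workers labeled them (mapping assumed
# from the pie chart; the two smallest slices are the least certain).
counts = {2: 15_757, 3: 3_821, 4: 182, 5: 40}

judgments = sum(counts.values())
labels = sum(workers * n for workers, n in counts.items())
shares = {workers: round(100 * n / judgments) for workers, n in counts.items()}
at_most_three = round(100 * (counts[2] + counts[3]) / judgments)

print(judgments)      # 19800 judgments in total
print(labels)         # 43905 labels, i.e. roughly 44K
print(shares)         # {2: 80, 3: 19, 4: 1, 5: 0}
print(at_most_three)  # 99
```

Under that mapping the label total comes to about 44K, in line with the figure quoted on the Key Points slides.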

Cost breakdown

• Label Collection: $22 (44,000 labels / 100 labels per HIT × $0.05 per HIT)
• Worker Evaluation: $5 (10,000 labels / 100 labels per HIT × $0.05 per HIT)
• Bonus: $10 to 4 trusted workers based on our policy
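The cost arithmetic can be reproduced as follows. The sketch works in cents to avoid floating-point rounding; the helper name `hit_cost_cents` is hypothetical, and the figures are the ones stated in the breakdown above.

```python
def hit_cost_cents(n_labels, labels_per_hit=100, cents_per_hit=5):
    """Cost in cents of collecting n_labels at a fixed price per HIT."""
    hits = -(-n_labels // labels_per_hit)  # ceiling division
    return hits * cents_per_hit


label_collection = hit_cost_cents(44_000)   # 440 HITs
worker_evaluation = hit_cost_cents(10_000)  # 100 HITs
bonus = 10 * 100                            # $10 bonus, in cents
total = label_collection + worker_evaluation + bonus

print(f"${label_collection / 100:.2f}")   # $22.00
print(f"${worker_evaluation / 100:.2f}")  # $5.00
print(f"${total / 100:.2f}")              # $37.00
```

The itemized total is $37, while the slides quote $40 overall; the source does not explain the difference.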

Effectiveness
[Figure not recoverable]

Key Points
• Some interesting ideas to explore further

– Interface design

– Online label analysis (cf. Welinder & Perona’10)

– Personalized error reports for workers

• Some nice properties

– Unsupervised, 44K labels for $40, rapid development

• Preliminary results, more analysis needed…

Thanks!

NIST: Ellen & Ian

Track Org: Gabriella & Mark

ir.ischool.utexas.edu/crowd
Support
– Temple Fellowship

Matt Lease - ml@ischool.utexas.edu - @mattlease

