TREC 2012 Crowdsourcing Track




Becoming IRATE: UT Austin’s Image Relevance Assessment Task Enthusiasm!

Hyun Joon Jung (hyunJoon@utexas.edu)
Matthew Lease (ml@ischool.utexas.edu, @mattlease)
Key Points

• Interface design for efficient, cohesive judging

   – Collected 44K labels for $40

• Off-the-shelf worker scoring metric (Raykar & Yu)

• Completely unsupervised (no training or tuning)

• Online label analysis (cf. Welinder & Perona’10)

• Personalized error reports for workers

• … and all in 3 weeks!
Interface Design




Scoring and Incentivizing Workers




  V. Raykar, S. Yu, L. Zhao, G. Valadez, C. Florin, L. Bogoni, and L. Moy. Learning
  from crowds. Journal of Machine Learning Research, 11:1297–1322, 2010.
Past Work: Offline Crowdsourcing
e.g., Jung & Lease, HCOMP 2012




Here: Online Crowdsourcing
Unsupervised, Incremental, Iterative data collection



[Diagram labels: Label Collection; Confident; Ambiguous; Pseudo-Ground Truth; Worker Evaluation; Trusted workers; Iterative]

Welinder & Perona. Online crowdsourcing: Rating annotators and obtaining cost-effective labels. CVPR’10 Workshops.
Collecting Labels
• Partition examples into subsets
• For each example in the current partition
   – Collect 2k labels for the example
   – If Jaccard agreement & high confidence
     • Declare aggregate label as “pseudo-gold”
   – Else if within budget and trusted workers exist
     • Collect another label and re-test for pseudo-gold
   – Else
     • Give up, output best-guess aggregate label
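
The loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the slides do not define the Jaccard agreement or confidence test, so the sketch substitutes a plain majority-vote fraction, and `request_label`, `initial_labels`, `max_labels` (standing in for the budget), `min_agreement`, and `has_trusted_workers` are all hypothetical names and values.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple


def collect_labels(
    example: Hashable,
    request_label: Callable[[Hashable], int],  # hypothetical: asks one worker for a label
    initial_labels: int = 2,                    # assumed starting number of labels
    max_labels: int = 5,                        # assumed per-example budget
    min_agreement: float = 0.6,                 # assumed threshold, not from the slides
    has_trusted_workers: bool = True,           # assumed flag for "trusted workers exist"
) -> Tuple[int, bool, List[int]]:
    """Return (aggregate_label, is_pseudo_gold, labels) for one example."""
    labels = [request_label(example) for _ in range(initial_labels)]
    while True:
        # Aggregate by majority vote (a stand-in for the slides' agreement test).
        winner, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            return winner, True, labels      # declare pseudo-gold
        if len(labels) >= max_labels or not has_trusted_workers:
            return winner, False, labels     # give up, output best guess
        labels.append(request_label(example))  # collect one more label, then re-test


# Usage: two workers disagree, a third label breaks the tie.
scripted = iter([1, 0, 0])
print(collect_labels("image-42", lambda ex: next(scripted)))
```

Stopping as soon as the agreement test passes is what keeps most examples at two or three labels; raising `min_agreement` trades cost for stricter pseudo-gold.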


Identifying Trusted Workers

• For a subset of pseudo-gold examples
  – Collect 2k labels for the example
• For each worker
  – If spammer score > 0.5 over >= 100 examples
      • Add worker to trusted pool
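
The worker filter above might look like the sketch below. It assumes binary labels and the Raykar & Yu spammer score in its binary form, (sensitivity + specificity − 1)², measured against pseudo-gold; the slides do not spell out the formula, so treat that choice, and the names `spammer_score` and `trusted_pool`, as assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each judgment is (worker_label, pseudo_gold_label), both 0 or 1.
Judgment = Tuple[int, int]


def spammer_score(pairs: List[Judgment]) -> float:
    """Assumed binary score: (sensitivity + specificity - 1) ** 2.

    Near 0 means the labels carry no information about pseudo-gold;
    near 1 means they are highly informative.
    """
    on_positive = [w for w, g in pairs if g == 1]
    on_negative = [w for w, g in pairs if g == 0]
    if not on_positive or not on_negative:
        return 0.0  # cannot estimate both rates; treat as uninformative
    sensitivity = sum(1 for w in on_positive if w == 1) / len(on_positive)
    specificity = sum(1 for w in on_negative if w == 0) / len(on_negative)
    return (sensitivity + specificity - 1.0) ** 2


def trusted_pool(
    judgments: Dict[str, List[Judgment]],
    min_score: float = 0.5,      # threshold from the slide
    min_examples: int = 100,     # minimum evidence from the slide
) -> List[str]:
    """Workers whose score exceeds min_score over at least min_examples examples."""
    return sorted(
        worker
        for worker, pairs in judgments.items()
        if len(pairs) >= min_examples and spammer_score(pairs) > min_score
    )


# Usage: a careful worker, a worker who always answers 1, and a newcomer.
careful = [(1, 1)] * 50 + [(0, 0)] * 45 + [(1, 0)] * 5
always_one = [(1, 1)] * 50 + [(1, 0)] * 50
newcomer = [(1, 1)] * 5 + [(0, 0)] * 5
print(trusted_pool({"careful": careful, "always_one": always_one, "newcomer": newcomer}))
```

Because the score is squared, a worker who consistently inverts labels also scores high; a real deployment would likely check the sign of the unsquared term as well.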




Personalized Error Reports




Number of labels & Cost Breakdown
  Number of workers per judgment

   •   2 workers: 15,757 judgments (80%)
   •   3 workers: 3,821 judgments (19%)
   •   4 workers: 182 judgments (1%)
   •   5 workers: 40 judgments (0%)

   •   80% of judgments were labeled only twice.
   •   99% of judgments were labeled at most three times.
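
As a quick consistency check on the chart counts, the short script below recomputes the percentages and the implied label total. It assumes the four pie slices correspond to 2, 3, 4, and 5 workers in decreasing order of size, which matches the "80% labeled only twice" bullet but is not stated explicitly on the slide.

```python
# Judgments grouped by how many workers labeled them (assumed slice mapping).
judgments_by_workers = {2: 15757, 3: 3821, 4: 182, 5: 40}

total_judgments = sum(judgments_by_workers.values())
total_labels = sum(k * n for k, n in judgments_by_workers.items())
percent_share = {
    k: round(100 * n / total_judgments) for k, n in judgments_by_workers.items()
}

print(total_judgments)  # 19800
print(total_labels)     # 43905, i.e. roughly the 44K labels quoted in the deck
print(percent_share)    # {2: 80, 3: 19, 4: 1, 5: 0}
```

Under that mapping the label total lands within about 100 of the 44K figure, which supports reading the slices this way.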




  Cost breakdown


  •    Label Collection: $22 (44,000 labels / 100 labels per HIT × $0.05 per HIT)
  •    Worker Evaluation: $5 (10,000 labels / 100 labels per HIT × $0.05 per HIT)
  •    Bonus: $10 to 4 trusted workers based on our policy
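
The arithmetic behind these figures can be reproduced as below. This is only a sketch: it assumes 0.05 means US$0.05 paid per HIT, works in integer cents to avoid floating-point rounding, and uses the hypothetical helper name `hit_cost_cents`.

```python
def hit_cost_cents(num_labels: int, labels_per_hit: int = 100, cents_per_hit: int = 5) -> int:
    """Cost in cents of collecting num_labels, rounding partial HITs up."""
    num_hits = -(-num_labels // labels_per_hit)  # ceiling division
    return num_hits * cents_per_hit


label_collection = hit_cost_cents(44_000)   # 440 HITs
worker_evaluation = hit_cost_cents(10_000)  # 100 HITs
bonus = 10 * 100                            # $10 bonus, in cents

total = label_collection + worker_evaluation + bonus
print(label_collection / 100, worker_evaluation / 100, total / 100)  # 22.0 5.0 37.0
```

The three listed items sum to $37; the deck quotes roughly $40 overall and does not itemize the difference.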



Effectiveness




Key Points
• Some interesting ideas to explore further

   – Interface design

   – Online label analysis (cf. Welinder & Perona’10)

   – Personalized error reports for workers

• Some nice properties

   – Unsupervised, 44K labels for $40, rapid development

• Preliminary results, more analysis needed…
Thanks!

NIST: Ellen & Ian

Track Org: Gabriella & Mark


                              ir.ischool.utexas.edu/crowd
Support
  – Temple Fellowship



         Matt Lease - ml@ischool.utexas.edu -   @mattlease
