The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track


This paper describes the participation of the uc3m team in both tasks of the TREC 2011 Crowdsourcing Track. For the first task we submitted three runs that used Amazon Mechanical Turk: one where workers made relevance judgments on a 3-point scale, and two similar runs where workers provided an explicit ranking of documents. All three runs implemented a quality control mechanism at the task level based on a simple reading comprehension test. For the second task we also submitted three runs: one with a stepwise execution of the GetAnotherLabel algorithm, and two others with a rule-based and an SVM-based model. According to the NIST gold labels, our runs performed very well in both tasks, ranking at the top for most measures.


In a Nutshell
3 runs, Amazon Mechanical Turk, External HITs
One HIT for each set of 5 documents = 435 HITs (2175 judgments)
$0.20 per HIT = $0.04 per document
Run 3
Stepwise execution of the GetAnotherLabel algorithm.
Hypothesis: bad workers for one type of topics are not necessarily bad for others.
For each worker wi compute expected quality qi on all topics and quality qij on each topic type tj. For topics in tj, use only workers with qij>qi.
Topic categorization: TREC category (closed, advice, navigational, etc.), topic subject (politics, shopping, etc.) and rarity of the topic words.
Runs 1 & 2
Train rule-based and SVM-based ML models. Features:
•Worker confusion matrix from GetAnotherLabel:
•For all workers, average posterior probability of relevant/nonrelevant
•For all workers, average correct-to-incorrect ratio when saying relevant or not
•For the document, relevant-to-nonrelevant ratio
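The worker selection rule for run 3 (for topics of type tj, keep only workers with qij>qi) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name select_workers, the dictionary layout and the quality values are hypothetical, and the qualities are assumed to have been estimated beforehand (for example by GetAnotherLabel).

```python
from typing import Dict, List


def select_workers(overall: Dict[str, float],
                   per_type: Dict[str, Dict[str, float]],
                   topic_type: str) -> List[str]:
    """Return the workers whose quality on topic_type (q_ij) exceeds
    their expected quality over all topics (q_i)."""
    selected = []
    for worker, q_i in overall.items():
        q_ij = per_type.get(worker, {}).get(topic_type)
        # Workers with no estimate for this topic type are left out (assumption).
        if q_ij is not None and q_ij > q_i:
            selected.append(worker)
    return sorted(selected)


# Hypothetical quality estimates, for illustration only.
overall = {"w1": 0.70, "w2": 0.80, "w3": 0.60}
per_type = {
    "w1": {"navigational": 0.85, "advice": 0.55},
    "w2": {"navigational": 0.75, "advice": 0.90},
    "w3": {"navigational": 0.60},
}

navigational_workers = select_workers(overall, per_type, "navigational")
advice_workers = select_workers(overall, per_type, "advice")
print(navigational_workers, advice_workers)
```

Note that the comparison is strict, so a worker whose quality on a topic type merely equals the overall quality is not used for that type.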
The University Carlos III of Madrid at TREC 2011 Crowdsourcing Track
Julián Urbano, Mónica Marrero, Diego Martín, Jorge Morato, Karina Robles and Juan Lloréns
Gaithersburg, USA November 16th, 2011
                                     run 1       run 2       run 3
Hours to complete                    8.5         38          20.5
HITs submitted (overhead)            438 (+1%)   535 (+23%)  448 (+3%)
Submitted workers (just previewers)  29 (102)    83 (383)    30 (163)
Average documents per worker         76          32          75
Total cost (including fees)          $95.7       $95.7       $95.7
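The figures above can be cross-checked with a few lines of arithmetic. In this sketch the 10% fee rate is an assumption inferred from the numbers (435 HITs at $0.20 come to $87.00, and $95.70 is 10% more); the poster only says "including fees". The overhead is computed relative to the 435 planned HITs.

```python
PLANNED_HITS = 435
DOCS_PER_HIT = 5
PAY_PER_HIT_CENTS = 20
ASSUMED_FEE_RATE = 0.10  # assumption: inferred from the totals, not stated on the poster

judgments = PLANNED_HITS * DOCS_PER_HIT
cents_per_document = PAY_PER_HIT_CENTS / DOCS_PER_HIT
base_cost = PLANNED_HITS * PAY_PER_HIT_CENTS / 100           # dollars paid to workers
total_cost = round(base_cost * (1 + ASSUMED_FEE_RATE), 2)    # dollars including the assumed fee

# HITs actually submitted per run, taken from the table.
submitted = {"run 1": 438, "run 2": 535, "run 3": 448}
overhead_pct = {
    run: round(100 * (n - PLANNED_HITS) / PLANNED_HITS)
    for run, n in submitted.items()
}

print(judgments, cents_per_document, total_cost, overhead_pct)
```

The extra HITs per run (3, 100 and 13) equal the numbers of rejected HITs reported under Rejecting & Blocking.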
much better control of the whole process
fair for most workers (previous trials)
2. Display Modes
•With images
•Black & white, same layout but no images
Topic key terms (run 3)
3. Task focus: keywords (runs 1 & 2) or relevance (run 3)
4. Tabbed design
5. Quality Control
Worker Level
50 HITs at most, at least 100 approved and 95% approval (98% in run 3)
Implicit Task Level: Work Time
At least 4.5 s/document (preview+work)
Explicit Task Level: Comprehension
What set of keywords better describes the document?
•Correct: top 3 by TF + 2 from next 5
•Incorrect: 5 random in last 25
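The construction of the two keyword sets can be sketched as below. This is only an illustration under stated assumptions: the function name keyword_options is hypothetical, the document is assumed to be already tokenized (with stop words removed), ties in term frequency are broken by first occurrence, and "last 25" is read as the 25 terms with the lowest frequency.

```python
import random
from collections import Counter
from typing import List, Tuple


def keyword_options(tokens: List[str],
                    rng: random.Random) -> Tuple[List[str], List[str]]:
    """Build the correct and incorrect keyword sets for one document."""
    # Terms ranked by decreasing term frequency (TF).
    ranked = [term for term, _ in Counter(tokens).most_common()]
    if len(ranked) < 33:
        # 8 top terms + 25 bottom terms are needed to keep both sets disjoint.
        raise ValueError("document has too few distinct terms")
    correct = ranked[:3] + rng.sample(ranked[3:8], 2)   # top 3 + 2 from the next 5
    incorrect = rng.sample(ranked[-25:], 5)             # 5 random from the last 25
    return correct, incorrect


# Hypothetical document: term t0 is the most frequent, t39 the least.
tokens = [f"t{i}" for i in range(40) for _ in range(40 - i)]
correct, incorrect = keyword_options(tokens, random.Random(0))
print(correct, incorrect)
```

Passing an explicit random.Random instance keeps the example reproducible without fixing the global seed.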
some folks work while previewing
subjects always recognize top 1-2 by TF
Rejecting & Blocking
Action           Failure   run 1     run 2      run 3
Reject           Keyword   1         0          1
Reject           Time      2         1          1
Block            Keyword   1         1          1
Block            Time      2         1          1
HITs rejected              3 (1%)    100 (23%)  13 (3%)
Workers blocked            0 (0%)    40 (48%)   4 (13%)
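One possible reading of the rejecting and blocking thresholds is sketched below. Several points are assumptions rather than facts from the poster: the numbers are interpreted as the count of failures tolerated before the action is taken, the 4.5 s/document minimum is applied to the total time on a 5-document HIT, and the scope over which failures accumulate is left to the caller. The names Policy, time_failure and decide are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

MIN_SECONDS_PER_DOC = 4.5
DOCS_PER_HIT = 5


@dataclass(frozen=True)
class Policy:
    reject_keyword: int  # keyword failures tolerated before rejecting (assumed meaning)
    reject_time: int     # time failures tolerated before rejecting
    block_keyword: int   # keyword failures tolerated before blocking
    block_time: int      # time failures tolerated before blocking


# Threshold values copied from the Rejecting & Blocking table.
POLICIES = {
    "run 1": Policy(1, 2, 1, 2),
    "run 2": Policy(0, 1, 1, 1),
    "run 3": Policy(1, 1, 1, 1),
}


def time_failure(seconds_on_hit: float) -> bool:
    """True if the HIT took less than 4.5 s per document (preview + work)."""
    return seconds_on_hit < MIN_SECONDS_PER_DOC * DOCS_PER_HIT


def decide(policy: Policy, keyword_failures: int,
           time_failures: int) -> Tuple[bool, bool]:
    """Return (reject, block) for the given failure counts."""
    reject = (keyword_failures > policy.reject_keyword
              or time_failures > policy.reject_time)
    block = (keyword_failures > policy.block_keyword
             or time_failures > policy.block_time)
    return reject, block


print(decide(POLICIES["run 2"], keyword_failures=1, time_failures=0))
```

Under this reading run 2 is the strictest on keyword failures, which would be consistent with its much higher share of rejected HITs.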
7. Relevance Labels
Binary
•run 1: bad = 0, fair or good = 1
•runs 2 & 3: normalize slider range in [0-1]. If value > 0.4 then 1, else 0
Ranking
•run 1: order by relevance, then by failures in keywords and then by time spent
•runs 2 & 3: explicit in sliders
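The label rules above can be sketched as follows. The raw slider range (0 to 100 here) is an assumption, since the poster only states that values are normalized to [0-1]. The sort directions for run 1 are also assumptions: higher relevance first, then fewer keyword failures, then more time spent. All function names are hypothetical.

```python
from typing import List, Tuple


def graded_to_binary(label: str) -> int:
    """Run 1: bad = 0, fair or good = 1."""
    if label not in {"bad", "fair", "good"}:
        raise ValueError(f"unknown label: {label}")
    return 0 if label == "bad" else 1


def slider_to_binary(value: float, lo: float, hi: float,
                     threshold: float = 0.4) -> int:
    """Runs 2 & 3: normalize the slider value to [0, 1], then threshold at 0.4."""
    normalized = (value - lo) / (hi - lo)
    return 1 if normalized > threshold else 0


def rank_run1(docs: List[Tuple[str, int, int, float]]) -> List[str]:
    """Run 1 ranking over (doc_id, relevance, keyword_failures, seconds_spent)."""
    ordered = sorted(docs, key=lambda d: (-d[1], d[2], -d[3]))
    return [doc_id for doc_id, _, _, _ in ordered]


# Hypothetical judgments for one set of documents.
docs = [("d1", 2, 0, 30.0), ("d2", 2, 1, 40.0),
        ("d3", 1, 0, 50.0), ("d4", 2, 0, 45.0)]
ranking = rank_run1(docs)
print(ranking, slider_to_binary(41, 0, 100), graded_to_binary("fair"))
```

Because the threshold comparison is strict, a slider value that normalizes to exactly 0.4 is labeled 0.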
Task I
         Acc.   Rec.   Prec.  Spec.  AP     NDCG
Median   .623   .729   .773   .536   .931   .922
run 1    .748   .802   .841   .632   .922   .958
run 2    .690   .720   .821   .607   .889   .935
run 3    .731   .737   .857   .728   .894   .932

Task II
         Acc.   Rec.   Prec.  Spec.  AP     NDCG
Median   .640   .754   .625   .560   .111   .359
run 1    .699   .754   .679   .644   .166   .415
run 2    .714   .750   .700   .678   .082   .331
run 3    .571   .659   .560   .484   .060   .299
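For reference, the four label-based measures in these tables (accuracy, recall, precision and specificity) follow from a binary confusion matrix, as in this small sketch. The gold and predicted labels are made-up values for illustration; AP and NDCG depend on the ranking and are not covered here.

```python
from typing import Dict, Sequence


def binary_metrics(gold: Sequence[int],
                   predicted: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, precision and specificity for 0/1 labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    tn = sum(1 for g, p in pairs if g == 0 and p == 0)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Returns 0.0 for an empty denominator (a convention chosen for this sketch).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "recall": ratio(tp, tp + fn),
        "precision": ratio(tp, tp + fp),
        "specificity": ratio(tn, tn + fp),
    }


# Made-up labels: 1 = relevant, 0 = nonrelevant.
gold = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 1, 1, 0, 1]
metrics = binary_metrics(gold, predicted)
print(metrics)
```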
according to Wordnet
unbiased majority voting
1. Document Preprocessing
Cleanup for smooth loading and safe rendering: remove everything unrelated to style or layout
6. Relevance: run 1, run 2, run 3
* Unofficial, as per NIST gold labels


