Crowdsourcing search relevance evaluation at eBay
Brian Johnson
September 28, 2011
  
                       	
  
Agenda
•  Why
•  What
•  How
•  Cost
•  Quality
•  Measurement
  
Why Ask Real Humans
•  They’re our customers
    –  Sometimes asking is the best way to find out what you want to know
    –  Provide ground truth for automated metrics
•  Provide data for
    –  Experimental Evaluation
         •  complements A/B testing, surveys
    –  Query Diagnosis
    –  Judged Test Corpus
         •  Machine Learning
         •  Offline evaluation
    –  Production Quality Control
  
Why Crowdsourcing
•  Fast
    –  1-3 days
•  Low Cost
    –  pennies per judgment
•  High Quality
    –  Multiple workers
    –  Worker evaluation (test questions & inter-worker agreement)
•  Flexible
    –  Ask anything
  
Judgment Volume by Day
  
Cost

Judgments          Cost
        1         $0.01
       10         $0.10
      100         $1.00
    1,000        $10.00
   10,000       $100.00
  100,000     $1,000.00
1,000,000    $10,000.00
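
The table is linear at one cent per judgment. A minimal sketch of that arithmetic follows. The job shape (queries × items per query × workers per item) is a hypothetical illustration, not something the slides specify.

```python
from decimal import Decimal

# Price per judgment taken from the cost table (1 judgment = $0.01).
PRICE_PER_JUDGMENT = Decimal("0.01")


def judgment_cost(judgments: int, price: Decimal = PRICE_PER_JUDGMENT) -> Decimal:
    """Total cost of a number of judgments at a flat per-judgment price."""
    return judgments * price


def job_judgments(queries: int, items_per_query: int, workers_per_item: int) -> int:
    """Judgments needed for a hypothetical job shape (an assumption, not from the slides)."""
    return queries * items_per_query * workers_per_item


if __name__ == "__main__":
    # Reproduce the rows of the cost table.
    for n in (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,}  ${judgment_cost(n):,.2f}")
```

Decimal is used instead of float so that currency totals stay exact.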
  	
  
Who are these workers
•  Crowdflower
   –  Mechanical Turk
   –  Gambit/Facebook
   –  TrialPay
   –  SamaSource
•  LiveOps
•  CloudCrowd
   –  Facebook
  
What Can We Evaluate
•  Search Ranking
    –  Query > Item
•  Item/Image Similarity
    –  Item > Item
•  Merchandising
    –  Query > Item
    –  Category > Item
    –  Item > Item
•  Product Tagging
    –  Item > Product
•  Category Recommendations
    –  Item (Title) > Category
  
Crowdsourced Search Relevance Evaluation
•  What are we measuring
   –  Relevance
•  What are we not measuring
   –  Value
   –  Purchase metrics
   –  Revenue
  
Industry Standard Sample
•  As in the original DCG formulation, we’ll be using a four-point scale for relevance assessment:
•  Irrelevant document (0)
•  Marginally relevant document (1)
•  Fairly relevant document (2)
•  Highly relevant document (3)

     http://www.sigir.org/forum/2008D/papers/2008d_sigirforum_alonso.pdf
  
eBay Search Relevance Crowdsourcing
Great Match
Good Match
Not Matching
  
Quality
•  Testing
   –  Train/test workers before they start
   –  Mix test questions into the work mix
   –  Discard data from unreliable workers
•  Redundancy
   –  Cost is low > Ask multiple workers
   –  Monitor inter-worker agreement
   –  Have trusted workers monitor new workers
   –  Track worker “feedback” over time
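
The testing and redundancy steps above can be sketched as a small aggregation routine: score each worker on the test (gold) questions, drop workers below a threshold, and take a majority vote over the remaining judgments. The tuple layout, the function names, and the 0.7 accuracy threshold are all assumptions made for illustration; the slides do not describe eBay's actual pipeline.

```python
from collections import Counter, defaultdict


def worker_accuracy(judgments, gold):
    """Fraction of gold (test) questions each worker answered correctly.

    judgments: iterable of (worker_id, item_id, label)
    gold: dict mapping item_id -> known correct label
    """
    correct = defaultdict(int)
    seen = defaultdict(int)
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            correct[worker] += int(label == gold[item])
    return {worker: correct[worker] / seen[worker] for worker in seen}


def aggregate(judgments, gold, min_accuracy=0.7):
    """Majority vote per non-gold item, using only workers who pass the gold check.

    Workers who never saw a gold question have no accuracy score and are
    treated as untrusted. min_accuracy is a hypothetical threshold.
    """
    accuracy = worker_accuracy(judgments, gold)
    trusted = {w for w, acc in accuracy.items() if acc >= min_accuracy}
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in trusted and item not in gold:
            votes[item][label] += 1
    # Ties go to the label counted first; a real job might request more judgments.
    return {item: counts.most_common(1)[0][0] for item, counts in votes.items()}
```

Per-item agreement among trusted workers (the share of votes won by the top label) could be read from the same `votes` counters as a simple inter-worker agreement signal.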
  
eBay @ SIGIR ’10

Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution

John Le, Andy Edmonds, Vaughn Hester, Lukas Biewald

The use of crowdsourcing platforms like Amazon Mechanical Turk for evaluating the relevance of search results has become an effective strategy that yields results quickly and inexpensively. One approach to ensure quality of worker judgments is to include an initial training period and subsequent sporadic insertion of predefined gold standard data (training data). Workers are notified or rejected when they err on the training data, and trust and quality ratings are adjusted accordingly. In this paper, we assess how this type of dynamic learning environment can affect the workers' results in a search relevance evaluation task completed on Amazon Mechanical Turk. Specifically, we show how the distribution of training set answers impacts training of workers and aggregate quality of worker results. We conclude that in a relevance categorization task, a uniform distribution of labels across training data labels produces optimal peaks in 1) individual worker precision and 2) majority voting aggregate result accuracy.

SIGIR ’10, July 19-23, 2010, Geneva, Switzerland
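
The abstract's conclusion favours a uniform distribution of labels in the training (gold) data. One way to act on that is to draw the same number of gold questions for every label, even when the labelled pool is skewed. The sketch below is an illustration under assumed names and data layout; it is not the procedure used in the paper.

```python
import random
from collections import defaultdict


def balanced_gold_sample(pool, per_label, seed=0):
    """Pick the same number of gold questions for every relevance label.

    pool: list of (item_id, label) pairs with known answers (hypothetical layout).
    per_label: how many gold questions to draw for each label.
    """
    by_label = defaultdict(list)
    for item_id, label in pool:
        by_label[label].append(item_id)

    rng = random.Random(seed)  # seeded so the selection is repeatable
    sample = []
    for label in sorted(by_label):
        items = by_label[label]
        if len(items) < per_label:
            raise ValueError(f"only {len(items)} gold items for label {label!r}")
        sample.extend((item_id, label) for item_id in rng.sample(items, per_label))
    rng.shuffle(sample)  # avoid presenting labels in a predictable order
    return sample
```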
  
Metrics
•  There are standard industry metrics
•  Designed to measure value to the end user
•  Older metrics
    –  Precision & recall (binary relevance, no notion of position)
•  Current metrics
    –  Cumulative Gain (overall value of results on a non-binary relevance scale)
    –  Discounted (adjusted for position value)
    –  Normalized (common 0-1 scale)
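
To make the "no notion of position" point concrete, here is a minimal sketch of set-based precision and recall. The function name and inputs are illustrative assumptions. Because both measures are computed over sets, reordering the result list leaves them unchanged, which is the gap the discounted metrics address.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall with binary relevance.

    retrieved: list of item ids returned for a query (order is ignored)
    relevant: set of item ids judged relevant
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    relevant = {"a", "d", "e"}
    best_first = ["a", "d", "b", "c"]
    worst_first = ["c", "b", "d", "a"]
    # Both orderings score the same, although one ranks the relevant items on top.
    print(precision_recall(best_first, relevant))
    print(precision_recall(worst_first, relevant))
```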
  
Judgment Scale Granularity

Binary          Web Search    SIGIR                  3 Point             4 Point
                Offensive                            -1  Spam            -2  Spam
                                                     -1  Off Topic       -2  Off Topic
0  Irrelevant   Off Topic     Irrelevant              0  Not Matching    -1  Not Matching
                Relevant      Marginally Relevant
1  Relevant     Useful        Fairly Relevant         1  Matching         1  Good Match
                Vital         Highly Relevant                             2  Great Match
  
Rank Discount
[Chart: rank discount d = 1/r^constant (y-axis, 0.00 to 1.00) by rank r (x-axis, 1 to 10)]
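
A minimal sketch of the power-law discount named on the chart. The slide does not state the constant; the default c = 0.91 below is an assumption, chosen only because it reproduces the rounded discounts listed in the Cumulative Gain Metrics table (for example 0.53 at rank 2 and 0.12 at rank 10).

```python
def rank_discount(rank: int, c: float = 0.91) -> float:
    """Power-law rank discount d = 1 / r^c.

    c is an assumed constant (not given in the slides); larger values
    discount lower-ranked results more steeply.
    """
    if rank < 1:
        raise ValueError("rank is 1-based")
    return 1.0 / rank ** c


if __name__ == "__main__":
    for r in range(1, 11):
        print(f"{r:>2}  {rank_discount(r):.2f}")
```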
  
Cumulative Gain Metrics

Column   Meaning                                               Formula
r        Rank
j        Human Judgment                                        0-1
cg       Cumulative Gain                                       += j
d        Rank Discount                                         1 / r^c
dcg      Discounted Cumulative Gain                            dcg(n-1) + j * d
io       Ideal Rank Order, Observed                            sort(j)
idcgo    Ideal DCG, Observed                                   idcgo(n-1) + io * d
ndcgo    Normalized Discounted Cumulative Gain, Observed       dcg(n) / idcgo(n)
it       Ideal Rank Order, Theoretical                         1
idcgt    Ideal DCG, Theoretical                                idcgt(n-1) + it * d
ndcgt    Normalized Discounted Cumulative Gain, Theoretical    dcg(n) / idcgt(n)

 r     j     cg      d    dcg     io  idcgo  ndcgo     it  idcgt  ndcgt
 1   1.0   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
 2   1.0   2.00   0.53   1.53   1.00   1.53   1.00   1.00   1.53   1.00
 3   0.8   2.80   0.37   1.83   1.00   1.90   0.96   1.00   1.90   0.96
 4   0.0   2.80   0.28   1.83   1.00   2.18   0.84   1.00   2.18   0.84
 5   1.0   3.80   0.23   2.06   0.80   2.37   0.87   1.00   2.41   0.85
 6   0.2   4.00   0.20   2.10   0.50   2.47   0.85   1.00   2.61   0.80
 7   0.2   4.20   0.17   2.13   0.20   2.50   0.85   1.00   2.78   0.77
 8   0.5   4.70   0.15   2.21   0.20   2.53   0.87   1.00   2.93   0.75
 9   1.0   5.70   0.14   2.34   0.00   2.53   0.93   1.00   3.07   0.76
10   0.0   5.70   0.12   2.34   0.00   2.53   0.93   1.00   3.19   0.73
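
The columns of this table can be recomputed from the ten human judgments alone. The sketch below does so with running sums. The discount constant c = 0.91 is an assumption (the slides only say 1/r^c), and the function names are illustrative. "Observed" normalizes against the judgments re-sorted best-first, while "theoretical" normalizes against a list in which every result has the maximum gain of 1.

```python
from itertools import accumulate

# Human judgments (column j) for ranks 1..10, copied from the table.
JUDGMENTS = [1.0, 1.0, 0.8, 0.0, 1.0, 0.2, 0.2, 0.5, 1.0, 0.0]


def rank_discount(rank, c=0.91):
    """d = 1 / r^c; c = 0.91 is an assumed constant, not stated in the slides."""
    return 1.0 / rank ** c


def dcg_prefixes(gains, c=0.91):
    """Running DCG after each rank: dcg(n) = dcg(n-1) + gain(n) * d(n)."""
    return list(accumulate(g * rank_discount(r, c) for r, g in enumerate(gains, start=1)))


def ndcg_table(judgments, c=0.91, max_gain=1.0):
    """Return (dcg, idcgo, ndcgo, idcgt, ndcgt) as lists indexed by rank - 1."""
    dcg = dcg_prefixes(judgments, c)
    # Observed ideal: the same judgments, sorted best-first.
    idcgo = dcg_prefixes(sorted(judgments, reverse=True), c)
    # Theoretical ideal: every position holds a maximum-gain result.
    idcgt = dcg_prefixes([max_gain] * len(judgments), c)
    ndcgo = [a / b if b else 0.0 for a, b in zip(dcg, idcgo)]
    ndcgt = [a / b if b else 0.0 for a, b in zip(dcg, idcgt)]
    return dcg, idcgo, ndcgo, idcgt, ndcgt


if __name__ == "__main__":
    rows = zip(*ndcg_table(JUDGMENTS))
    for rank, row in enumerate(rows, start=1):
        print(f"{rank:>2}  " + "  ".join(f"{value:.2f}" for value in row))
```

The observed variant always reaches 1.0 when the list is already in its best possible order, whereas the theoretical variant also penalizes a result set that simply contains few good items.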
  
Continuous Production Evaluation
•  Daily query sampling/scraping to facilitate ongoing monitoring, QA, triage, and post-hoc business analysis

[Chart: NDCG over time, by site, category, query …]
  
Human Judgment > Query List

Best Match Variant Comparison
  
Measuring a Ranked List

Huan Liu, Lei Tang and Nitin Agarwal. Tutorial on Community Detection and Behavior Study for Social Computing.
Presented in The 1st IEEE International Conference on Social Computing (SocialCom’09), 2009.
http://www.iisocialcom.org/conference/socialcom2009/download/SocialCom09-tutorial.pdf
  
Ranking Evaluation

http://research.microsoft.com/en-us/um/people/kevynct/files/ECIR-2010-ML-Tutorial-FinalToPrint.pdf
  
NDCG - Example

Huan Liu, Lei Tang and Nitin Agarwal. Tutorial on Community Detection and Behavior Study for Social Computing.
Presented in The 1st IEEE International Conference on Social Computing (SocialCom’09), 2009.
http://www.iisocialcom.org/conference/socialcom2009/download/SocialCom09-tutorial.pdf
  
Open Questions
•  Discrete vs. Continuous relevance scale
•  # of workers
•  Distribution of test questions
•  Generation of test questions
•  Qualification (demographics, interests, region)
•  Dynamic worker assignment based on qualification
•  Mobile workers (untapped pool)
  
References
•  Discounted Cumulative Gain
   –  http://en.wikipedia.org/wiki/Discounted_cumulative_gain
•  http://crowdflower.com/
•  http://www.cloudcrowd.com/
•  http://www.trialpay.com
  

Más contenido relacionado

Similar a 2011 Crowdsourcing Search Evaluation

UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...Matthew Lease
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Md. Main Uddin Rony
 
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInRecruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInDaria Sorokina
 
Metrics As A Learn And Change Agent
Metrics As A Learn And Change AgentMetrics As A Learn And Change Agent
Metrics As A Learn And Change AgentGaetano Mazzanti
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4Khadija Atiya
 
Nondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of UsNondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of UsTomer Gabel
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetCrossing Minds
 
How to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - HaystackHow to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - HaystackSease
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku
 
Florian Douetteau @ Dataiku
Florian Douetteau @ DataikuFlorian Douetteau @ Dataiku
Florian Douetteau @ DataikuPAPIs.io
 
SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...
SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...
SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...UK Government Digital Service
 
RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...
RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...
RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...Mario Rodriguez
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajSri Ambati
 
Taguchi Quality Engineering.ppt
Taguchi Quality Engineering.pptTaguchi Quality Engineering.ppt
Taguchi Quality Engineering.pptsorb888
 
Managing Software Debt - Quality Debt Focus - QASIG Kirkland
Managing Software Debt - Quality Debt Focus - QASIG KirklandManaging Software Debt - Quality Debt Focus - QASIG Kirkland
Managing Software Debt - Quality Debt Focus - QASIG KirklandChris Sterling
 
Dollars and Dates are Killing Agile
Dollars and Dates are Killing AgileDollars and Dates are Killing Agile
Dollars and Dates are Killing AgileRally Software
 
Dollars and dates are killing agile final
Dollars and dates are killing agile finalDollars and dates are killing agile final
Dollars and dates are killing agile finaldrewz lin
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationSunghoon Joo
 

Similar a 2011 Crowdsourcing Search Evaluation (20)

UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
UT Austin @ TREC 2012 Crowdsourcing Track: Image Relevance Assessment Task (I...
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Aqm Programme Six Sigma
Aqm Programme   Six SigmaAqm Programme   Six Sigma
Aqm Programme Six Sigma
 
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedInRecruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn
 
Metrics As A Learn And Change Agent
Metrics As A Learn And Change AgentMetrics As A Learn And Change Agent
Metrics As A Learn And Change Agent
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
 
SQC Guest Lecture- Starbucks
SQC Guest Lecture- StarbucksSQC Guest Lecture- Starbucks
SQC Guest Lecture- Starbucks
 
Nondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of UsNondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of Us
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
How to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - HaystackHow to Build your Training Set for a Learning To Rank Project - Haystack
How to Build your Training Set for a Learning To Rank Project - Haystack
 
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex ChallengeDataiku at SF DataMining Meetup - Kaggle Yandex Challenge
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge
 
Florian Douetteau @ Dataiku
Florian Douetteau @ DataikuFlorian Douetteau @ Dataiku
Florian Douetteau @ Dataiku
 
SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...
SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...
SPRINT 13 Workshop 1 Agile working methods - Department for Transport, GDS, M...
 
RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...
RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...
RecSys 2012 Dublin Conference Slides - Multiple Objective Optimization in Rec...
 
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian BharadwajH2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
H2O World - Solving Customer Churn with Machine Learning - Julian Bharadwaj
 
Taguchi Quality Engineering.ppt
Taguchi Quality Engineering.pptTaguchi Quality Engineering.ppt
Taguchi Quality Engineering.ppt
 
Managing Software Debt - Quality Debt Focus - QASIG Kirkland
Managing Software Debt - Quality Debt Focus - QASIG KirklandManaging Software Debt - Quality Debt Focus - QASIG Kirkland
Managing Software Debt - Quality Debt Focus - QASIG Kirkland
 
Dollars and Dates are Killing Agile
Dollars and Dates are Killing AgileDollars and Dates are Killing Agile
Dollars and Dates are Killing Agile
 
Dollars and dates are killing agile final
Dollars and dates are killing agile finalDollars and dates are killing agile final
Dollars and dates are killing agile final
 
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector QuantizationPR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
PR-272: Accelerating Large-Scale Inference with Anisotropic Vector Quantization
 

Más de Brian Johnson

Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail
Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail
Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail Brian Johnson
 
eBay Search Query Intent
eBay Search Query IntenteBay Search Query Intent
eBay Search Query IntentBrian Johnson
 
2015-04 eBay Statistics
2015-04 eBay Statistics2015-04 eBay Statistics
2015-04 eBay StatisticsBrian Johnson
 
eBay Search Science, IEEE Big Data, April 3rd, 2015
eBay Search Science, IEEE Big Data, April 3rd, 2015eBay Search Science, IEEE Big Data, April 3rd, 2015
eBay Search Science, IEEE Big Data, April 3rd, 2015Brian Johnson
 
CloudCon Data Mining Presentation
CloudCon Data Mining PresentationCloudCon Data Mining Presentation
CloudCon Data Mining PresentationBrian Johnson
 
2011 x.commerce Innovate Data Alchemy
2011 x.commerce Innovate Data Alchemy2011 x.commerce Innovate Data Alchemy
2011 x.commerce Innovate Data AlchemyBrian Johnson
 
Treemaps: Visualizing Hierarchical and Categorical Data
Treemaps: Visualizing Hierarchical and Categorical DataTreemaps: Visualizing Hierarchical and Categorical Data
Treemaps: Visualizing Hierarchical and Categorical DataBrian Johnson
 
11 964 181 System And Method For Providi
11 964 181 System And Method For Providi11 964 181 System And Method For Providi
11 964 181 System And Method For ProvidiBrian Johnson
 
11 641 262 Proprietor Currency Assignmen
11 641 262 Proprietor Currency Assignmen11 641 262 Proprietor Currency Assignmen
11 641 262 Proprietor Currency AssignmenBrian Johnson
 
10 977 279 Method And System For Categor
10 977 279 Method And System For Categor10 977 279 Method And System For Categor
10 977 279 Method And System For CategorBrian Johnson
 
11 869 290 Electronic Publication System
11 869 290 Electronic Publication System11 869 290 Electronic Publication System
11 869 290 Electronic Publication SystemBrian Johnson
 
2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & AcronymsBrian Johnson
 

Más de Brian Johnson (12)

Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail
Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail
Graph Walks & Vector Embeddings: Exploiting the head and exploring the tail
 
eBay Search Query Intent
eBay Search Query IntenteBay Search Query Intent
eBay Search Query Intent
 
2015-04 eBay Statistics
2015-04 eBay Statistics2015-04 eBay Statistics
2015-04 eBay Statistics
 
eBay Search Science, IEEE Big Data, April 3rd, 2015
eBay Search Science, IEEE Big Data, April 3rd, 2015eBay Search Science, IEEE Big Data, April 3rd, 2015
eBay Search Science, IEEE Big Data, April 3rd, 2015
 
CloudCon Data Mining Presentation
CloudCon Data Mining PresentationCloudCon Data Mining Presentation
CloudCon Data Mining Presentation
 
2011 x.commerce Innovate Data Alchemy
2011 x.commerce Innovate Data Alchemy2011 x.commerce Innovate Data Alchemy
2011 x.commerce Innovate Data Alchemy
 
Treemaps: Visualizing Hierarchical and Categorical Data
Treemaps: Visualizing Hierarchical and Categorical DataTreemaps: Visualizing Hierarchical and Categorical Data
Treemaps: Visualizing Hierarchical and Categorical Data
 
11 964 181 System And Method For Providi
11 964 181 System And Method For Providi11 964 181 System And Method For Providi
11 964 181 System And Method For Providi
 
11 641 262 Proprietor Currency Assignmen
11 641 262 Proprietor Currency Assignmen11 641 262 Proprietor Currency Assignmen
11 641 262 Proprietor Currency Assignmen
 
10 977 279 Method And System For Categor
10 977 279 Method And System For Categor10 977 279 Method And System For Categor
10 977 279 Method And System For Categor
 
11 869 290 Electronic Publication System
11 869 290 Electronic Publication System11 869 290 Electronic Publication System
11 869 290 Electronic Publication System
 
2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms2011 Search Query Rewrites - Synonyms & Acronyms
2011 Search Query Rewrites - Synonyms & Acronyms
 

Último

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 


2011 Crowdsourcing Search Evaluation

  • 1. Crowdsourcing search relevance evaluation at eBay
      Brian Johnson
      September 28, 2011
  • 2. Agenda
      • Why
      • What
      • How
      • Cost
      • Quality
      • Measurement
  • 3. Why Ask Real Humans
      • They’re our customers
          – Sometimes asking is the best way to find out what you want to know
          – Provide ground truth for automated metrics
      • Provide data for
          – Experimental Evaluation
              • complements A/B testing, surveys
          – Query Diagnosis
          – Judged Test Corpus
              • Machine Learning
              • Offline evaluation
          – Production Quality Control
  • 4. Why Crowdsourcing
      • Fast
          – 1-3 days
      • Low Cost
          – pennies per judgment
      • High Quality
          – Multiple workers
          – Worker evaluation (test questions & inter-worker agreement)
      • Flexible
          – Ask anything
  • 6. Cost
        Judgments         Cost
                1        $0.01
               10        $0.10
              100        $1.00
            1,000       $10.00
           10,000      $100.00
          100,000    $1,000.00
        1,000,000   $10,000.00
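The cost table is linear at one cent per judgment. A minimal sketch of that arithmetic follows; the function name and the `workers_per_item` redundancy parameter are illustrative assumptions, not something the slides define.

```python
def estimated_cost_dollars(items, workers_per_item=1, cents_per_judgment=1):
    """Estimate total cost, assuming a flat price per judgment.

    items: number of things to be judged (hypothetical parameter)
    workers_per_item: how many workers judge each item (hypothetical)
    cents_per_judgment: 1 cent, the rate implied by the cost table
    """
    judgments = items * workers_per_item
    return judgments * cents_per_judgment / 100


if __name__ == "__main__":
    for n in (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,} judgments  ${estimated_cost_dollars(n):,.2f}")
```

Asking several workers about each item multiplies the bill by the redundancy factor, which stays small in absolute terms at this price.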
  • 7. Who are these workers
      • Crowdflower
          – Mechanical Turk
          – Gambit/Facebook
          – TrialPay
          – SamaSource
      • LiveOps
      • CloudCrowd
          – Facebook
  • 8. What Can We Evaluate
      • Search Ranking
          – Query > Item
      • Item/Image Similarity
          – Item > Item
      • Merchandising
          – Query > Item
          – Category > Item
          – Item > Item
      • Product Tagging
          – Item > Product
      • Category Recommendations
          – Item (Title) > Category
  • 9. Crowdsourced Search Relevance Evaluation
      • What are we measuring
          – Relevance
      • What are we not measuring
          – Value
          – Purchase metrics
          – Revenue
  • 10. Industry Standard Sample
      • As in the original DCG formulation, we’ll be using a four-point scale for relevance assessment:
      • Irrelevant document (0)
      • Marginally relevant document (1)
      • Fairly relevant document (2)
      • Highly relevant document (3)
      http://www.sigir.org/forum/2008D/papers/2008d_sigirforum_alonso.pdf
  • 11. eBay Search Relevance Crowdsourcing
  • 15. Quality
      • Testing
          – Train/test workers before they start
          – Mix test questions into the work mix
          – Discard data from unreliable workers
      • Redundancy
          – Cost is low > Ask multiple workers
          – Monitor inter-worker agreement
          – Have trusted workers monitor new workers
          – Track worker “feedback” over time
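The testing and redundancy steps above can be sketched as follows. This is only an illustration under stated assumptions: the `(worker, question, label)` tuple layout, the function names, and the 0.7 accuracy threshold are hypothetical and do not come from the slides or from any crowdsourcing platform's API.

```python
from collections import Counter, defaultdict


def trusted_workers(gold_answers, judgments, min_accuracy=0.7):
    """Return workers whose accuracy on test (gold) questions meets the threshold.

    gold_answers: {question_id: correct_label} for the mixed-in test questions
    judgments: iterable of (worker, question_id, label) tuples
    min_accuracy: assumed cut-off; a real deployment would tune this
    """
    seen = defaultdict(int)
    correct = defaultdict(int)
    for worker, question, label in judgments:
        if question in gold_answers:
            seen[worker] += 1
            correct[worker] += int(label == gold_answers[question])
    return {w for w in seen if correct[w] / seen[w] >= min_accuracy}


def aggregate(judgments, trusted, gold_answers):
    """Majority vote per real question, using trusted workers only.

    Returns {question_id: (winning_label, agreement)} where agreement is the
    share of trusted votes that went to the winning label. Ties resolve to
    the label that was seen first.
    """
    votes = defaultdict(Counter)
    for worker, question, label in judgments:
        if worker in trusted and question not in gold_answers:
            votes[question][label] += 1
    result = {}
    for question, counter in votes.items():
        label, count = counter.most_common(1)[0]
        result[question] = (label, count / sum(counter.values()))
    return result
```

Low agreement on a question is a hint that either the question is ambiguous or more workers are needed for it.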
  • 16. eBay @ SIGIR ’10
      Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution
      John Le, Andy Edmonds, Vaughn Hester, Lukas Biewald
      The use of crowdsourcing platforms like Amazon Mechanical Turk for evaluating the relevance of search results has become an effective strategy that yields results quickly and inexpensively. One approach to ensure quality of worker judgments is to include an initial training period and subsequent sporadic insertion of predefined gold standard data (training data). Workers are notified or rejected when they err on the training data, and trust and quality ratings are adjusted accordingly. In this paper, we assess how this type of dynamic learning environment can affect the workers' results in a search relevance evaluation task completed on Amazon Mechanical Turk. Specifically, we show how the distribution of training set answers impacts training of workers and aggregate quality of worker results. We conclude that in a relevance categorization task, a uniform distribution of labels across training data labels produces optimal peaks in 1) individual worker precision and 2) majority voting aggregate result accuracy.
      SIGIR ’10, July 19-23, 2010, Geneva, Switzerland
  • 17. Metrics
      • There are standard industry metrics
      • Designed to measure value to the end user
      • Older metrics
          – Precision & recall (binary relevance, no notion of position)
      • Current metrics
          – Cumulative Gain (overall value of results on a non-binary relevance scale)
          – Discounted (adjusted for position value)
          – Normalized (common 0-1 scale)
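To make the "no notion of position" point concrete, here is a minimal sketch of precision and recall over binary judgments. The function name and the example lists are illustrative assumptions.

```python
def precision_recall(ranked_relevance, total_relevant):
    """Precision and recall for one result list with binary judgments.

    ranked_relevance: list of 0/1 judgments in ranked order
    total_relevant: number of relevant items that exist for the query
    """
    hits = sum(ranked_relevance)
    precision = hits / len(ranked_relevance) if ranked_relevance else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    good_order = [1, 1, 0, 0]  # relevant items ranked first
    bad_order = [0, 0, 1, 1]   # same items, relevant ones ranked last
    # Both orderings score identically: position is ignored.
    print(precision_recall(good_order, total_relevant=4))
    print(precision_recall(bad_order, total_relevant=4))
```

Both orderings produce the same pair of numbers, which is the gap the discounted, graded metrics are meant to close.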
  • 18. Judgment Scale Granularity
      Binary:      0 Irrelevant; 1 Relevant
      Web Search:  Offensive; Off Topic; Relevant; Useful; Vital
      SigIR:       Irrelevant; Marginally Relevant; Fairly Relevant; Highly Relevant
      3 Point:     -1 Spam; -1 Off Topic; 0 Not Matching; 1 Matching
      4 Point:     -2 Spam; -2 Off Topic; -1 Not Matching; 1 Good Match; 2 Great Match
  • 19. Rank Discount
      [Chart: rank discount d = 1/r^constant (y-axis, 0.00 to 1.00) against rank r (x-axis, 1 to 10)]
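The slide gives the discount only as 1/r^constant. As a rough sketch, the exponent can be narrowed down from the two-decimal d column of the Cumulative Gain Metrics table on slide 20. The search range, step size, and function names below are assumptions made for illustration; the result is an inference, not a value stated in the deck.

```python
# d column of the slide 20 table for ranks 1..10, in hundredths
TABLE_D_HUNDREDTHS = [100, 53, 37, 28, 23, 20, 17, 15, 14, 12]


def rank_discount(r, c):
    """Discount for rank r (1-based) with exponent c: 1 / r^c."""
    return 1.0 / r ** c


def consistent_exponents(table_hundredths, lo=800, hi=1000):
    """Exponents c = k/1000, for k in lo..hi, that reproduce every table entry
    once the discount is rounded to two decimals."""
    matches = []
    for k in range(lo, hi + 1):
        c = k / 1000
        if all(
            round(rank_discount(r, c) * 100) == target
            for r, target in enumerate(table_hundredths, start=1)
        ):
            matches.append(c)
    return matches


if __name__ == "__main__":
    print(consistent_exponents(TABLE_D_HUNDREDTHS))
```

Under these assumptions only a narrow band of exponents a little above 0.9 reproduces the whole column, so a value such as 0.91 is a reasonable stand-in when recomputing the table.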
  • 20. Cumulative Gain Metrics
      Columns (symbol = name; formula):
        r     = Rank
        j     = Human Judgment; 0-1
        cg    = Cumulative Gain; += j
        d     = Rank Discount; 1 / r^c
        dcg   = Discounted Cumulative Gain; dcg(n-1) + j * d
        io    = Ideal Rank Order, Observed; sort(j)
        idcgo = Ideal DCG, Observed; dcg(n-1) + io * d
        ndcgo = Normalized Discounted Cumulative Gain, Observed; dcg(n) / idcgo(n)
        it    = Ideal Rank Order, Theoretical; 1
        idcgt = Ideal DCG, Theoretical; dcg(n-1) + it * d
        ndcgt = Normalized Discounted Cumulative Gain, Theoretical; dcg(n) / idcgt(n)

         r    j     cg    d     dcg   io    idcgo  ndcgo  it    idcgt  ndcgt
         1    1.0   1.00  1.00  1.00  1.00  1.00   1.00   1.00  1.00   1.00
         2    1.0   2.00  0.53  1.53  1.00  1.53   1.00   1.00  1.53   1.00
         3    0.8   2.80  0.37  1.83  1.00  1.90   0.96   1.00  1.90   0.96
         4    0.0   2.80  0.28  1.83  1.00  2.18   0.84   1.00  2.18   0.84
         5    1.0   3.80  0.23  2.06  0.80  2.37   0.87   1.00  2.41   0.85
         6    0.2   4.00  0.20  2.10  0.50  2.47   0.85   1.00  2.61   0.80
         7    0.2   4.20  0.17  2.13  0.20  2.50   0.85   1.00  2.78   0.77
         8    0.5   4.70  0.15  2.21  0.20  2.53   0.87   1.00  2.93   0.75
         9    1.0   5.70  0.14  2.34  0.00  2.53   0.93   1.00  3.07   0.76
        10    0.0   5.70  0.12  2.34  0.00  2.53   0.93   1.00  3.19   0.73
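The columns of this table can be recomputed with a short running-sum loop. The sketch below assumes an exponent of c = 0.91, which is not stated in the slides but is consistent with the two-decimal d column; the function and key names are illustrative. Each ideal DCG is accumulated from its own previous value, which is how the tabulated numbers behave.

```python
def gain_table(judgments, c=0.91):
    """Recompute cg, d, dcg and both NDCG variants for a ranked list.

    judgments: human judgments on a 0-1 scale, in ranked order
    c: rank-discount exponent (assumed value, see lead-in)
    """
    ideal_observed = sorted(judgments, reverse=True)  # io: best order of what was seen
    cg = dcg = idcg_obs = idcg_theo = 0.0
    rows = []
    for r, (j, io) in enumerate(zip(judgments, ideal_observed), start=1):
        d = 1.0 / r ** c          # rank discount
        cg += j                   # cumulative gain
        dcg += j * d              # discounted cumulative gain
        idcg_obs += io * d        # ideal DCG from the observed judgments
        idcg_theo += 1.0 * d      # ideal DCG if every result scored 1
        rows.append({
            "r": r, "j": j, "cg": cg, "d": d, "dcg": dcg,
            "io": io, "idcgo": idcg_obs, "ndcgo": dcg / idcg_obs,
            "it": 1.0, "idcgt": idcg_theo, "ndcgt": dcg / idcg_theo,
        })
    return rows


if __name__ == "__main__":
    judged = [1.0, 1.0, 0.8, 0.0, 1.0, 0.2, 0.2, 0.5, 1.0, 0.0]
    for row in gain_table(judged):
        print(" ".join(f"{row[k]:6.2f}" for k in
                       ("r", "j", "cg", "d", "dcg", "io",
                        "idcgo", "ndcgo", "it", "idcgt", "ndcgt")))
```

The observed variant normalizes against the best possible ordering of the judgments actually collected, while the theoretical variant normalizes against a list of perfect results, so it is never higher.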
  • 21. Continuous Production Evaluation
      • Daily query sampling/scraping to facilitate ongoing monitoring, QA, triage, and post-hoc business analysis
      [Chart: NDCG over Time, By Site, Category, Query …]
  • 22. Human Judgment > Query List
  • 23. Best Match Variant Comparison
  • 24. Best Match Variant Comparison
  • 25. Measuring a Ranked List
      Huan Liu, Lei Tang and Nitin Agarwal. Tutorial on Community Detection and Behavior Study for Social Computing. Presented in The 1st IEEE International Conference on Social Computing (SocialCom’09), 2009.
      http://www.iisocialcom.org/conference/socialcom2009/download/SocialCom09-tutorial.pdf
  • 27. NDCG - Example
      Huan Liu, Lei Tang and Nitin Agarwal. Tutorial on Community Detection and Behavior Study for Social Computing. Presented in The 1st IEEE International Conference on Social Computing (SocialCom’09), 2009.
      http://www.iisocialcom.org/conference/socialcom2009/download/SocialCom09-tutorial.pdf
  • 28. Open Questions
      • Discrete vs. Continuous relevance scale
      • # of workers
      • Distribution of test questions
      • Generation of test questions
      • Qualification (demographics, interests, region)
      • Dynamic worker assignment based on qualification
      • Mobile workers (untapped pool)
  • 29. References
      • Discounted Cumulative Gain
          – http://en.wikipedia.org/wiki/Discounted_cumulative_gain
      • http://crowdflower.com/
      • http://www.cloudcrowd.com/
      • http://www.trialpay.com