SlideShare una empresa de Scribd logo
1 de 51
Descargar para leer sin conexión
Crowdsourcing using Mechanical Turk:
          Quality Management and Scalability



                                                        Panos Ipeirotis

                                               New York University & oDesk



                                                          Twitter: @ipeirotis
Joint work with: Jing Wang, Foster Provost,
Josh Attenberg, and Victor Sheng; Special     “A Computer Scientist in a Business School”
thanks to AdSafe Media                             http://behind-the-enemy-lines.com
Brand advertising not fully embraced
Internet advertising yet…

Afraid of improper brand placement
3   Gabrielle Giffords Shooting, Tucson, AZ, Jan 2011
4
5
New Classification Models Needed
    within days

       Pharmaceutical firm does not want ads to appear:
        –   In pages that discuss swine flu (FDA prohibited pharmaceutical
            company to display drug ad in pages about swine flu)


       Big fast-food chain does not want ads to appear:
        –   In pages that discuss the brand (99% negative sentiment)
        –   In pages discussing obesity, diabetes, cholesterol, etc


       Airline company does not want ads to appear:
        –   In pages with crashes, accidents, …
        –   In pages with discussions of terrorist plots against airlines

6
Need to build models fast

       Traditionally, modeling teams have invested
        substantial internal resources in data collection,
        extraction, cleaning, and other preprocessing
             No time for such things…
       However, now, we can outsource preprocessing tasks,
        such as labeling, feature extraction, verifying
        information extraction, etc.
        –   using Mechanical Turk, oDesk, etc.
        –   quality may be lower than expert labeling (much?)
        –   but low costs can allow massive scale


7
Amazon Mechanical Turk
Example: Build an “Adult Web Site” Classifier


    Need a large number of hand-labeled sites
    Get people to look at sites and classify them as:
   G (general audience) PG (parental guidance) R (restricted) X (porn)



Cost/Speed Statistics
 Undergrad intern: 200 websites/hr, cost: $15/hr
 Mechanical Turk: 2500 websites/hr, cost: $12/hr
Bad news: Spammers!




            Worker ATAMRO447HWJQ
labeled X (porn) sites as G (general audience)
Redundant votes, infer quality

 Look at our lazy friend ATAMRO447HWJQ
 together with other 9 workers




 Using redundancy, we can compute error rates
  for each worker
Algorithm of (Dawid & Skene, 1979)
      [and many recent variations on the same theme]


     Iterative process to estimate worker error rates
1. Initialize“correct” label for each object (e.g., use majority vote)
2. Estimate error rates for workers (using “correct” labels)
3. Estimate “correct” labels (using error rates, weight worker
   votes according to quality)
4. Go to Step 2 and iterate until convergence



      Error rates for ATAMRO447HWJQ
                                                   Our friend ATAMRO447HWJQ
      P[G → G]=99.947%      P[G → X]=0.053%
                                                   marked almost all sites as G.
      P[X → G]=99.153%      P[X → X]=0.847%
                                                      Clickety clickey click…
Challenge: From Confusion
Matrixes to Quality Scores


  Confusion Matrix for ATAMRO447HWJQ
   P[X → X]=0.847%   P[X → G]=99.153%
   P[G → X]=0.053%   P[G → G]=99.947%



   How to check if a worker is a spammer
         using the confusion matrix?
         (hint: error rate not enough)
Challenge 1:
       Spammers are lazy and smart!
Confusion matrix for spammer        Confusion matrix for good worker
   P[X → X]=0% P[X → G]=100%          P[X → X]=80%     P[X → G]=20%
   P[G → X]=0% P[G → G]=100%          P[G → X]=20%     P[G → G]=80%


         Spammers figure out how to fly under the radar…

         In reality, we have 85% G sites and 15% X sites

         Error rate of spammer = 0% * 85% + 100% * 15% = 15%
         Error rate of good worker = 85% * 20% + 85% * 20% = 20%

      False negatives: Spam workers pass as legitimate
Challenge 2:
     Humans are biased!
Error rates for CEO of AdSafe
   P[G → G]=20.0%       P[G → P]=80.0%   P[G → R]=0.0%     P[G → X]=0.0%
   P[P → G]=0.0%        P[P → P]=0.0%    P[P → R]=100.0%   P[P → X]=0.0%
   P[R → G]=0.0%        P[R → P]=0.0%    P[R → R]=100.0%   P[R → X]=0.0%
   P[X → G]=0.0%        P[X → P]=0.0%    P[X → R]=0.0%     P[X → X]=100.0%




     We have 85% G sites, 5% P sites, 5% R sites, 5% X sites

     Error rate of spammer (all G) = 0% * 85% + 100% * 15% = 15%
     Error rate of biased worker = 80% * 85% + 100% * 5% = 73%


    False positives: Legitimate workers appear to be spammers
           (important note: bias is not just a matter of “ordered” classes)
Solution: Reverse errors first,
      compute error rate afterwards
Error Rates for CEO of AdSafe
    P[G → G]=20.0%       P[G → P]=80.0%   P[G → R]=0.0%     P[G → X]=0.0%
    P[P → G]=0.0%        P[P → P]=0.0%    P[P → R]=100.0%   P[P → X]=0.0%
    P[R → G]=0.0%        P[R → P]=0.0%    P[R → R]=100.0%   P[R → X]=0.0%
    P[X → G]=0.0%        P[X → P]=0.0%    P[X → R]=0.0%     P[X → X]=100.0%


            When biased worker says G, it is 100% G
            When biased worker says P, it is 100% G
            When biased worker says R, it is 50% P, 50% R
            When biased worker says X, it is 100% X

         Small ambiguity for “R-rated” votes but other than that, fine!
Solution: Reverse errors first,
     compute error rate afterwards
Error Rates for spammer: ATAMRO447HWJQ
   P[G → G]=100.0%          P[G → P]=0.0%   P[G → R]=0.0%   P[G → X]=0.0%
   P[P → G]=100.0%          P[P → P]=0.0%   P[P → R]=0.0%   P[P → X]=0.0%
   P[R → G]=100.0%          P[R → P]=0.0%   P[R → R]=0.0%   P[R → X]=0.0%
   P[X → G]=100.0%          P[X → P]=0.0%   P[X → R]=0.0%   P[X → X]=0.0%




       When spammer says G, it is 25% G, 25% P, 25% R, 25% X
       When spammer says P, it is 25% G, 25% P, 25% R, 25% X
       When spammer says R, it is 25% G, 25% P, 25% R, 25% X
       When spammer says X, it is 25% G, 25% P, 25% R, 25% X
    [note: assume equal priors]


    The results are highly ambiguous. No information provided!
Expected Misclassification Cost

• High cost: probability spread across classes
• Low cost: “probability mass concentrated in one class

Assigned Label      Corresponding “Soft” Label               Expected
                                                             Label Cost
Spammer: G          <G: 25%, P: 25%, R: 25%, X: 25%>         0.75
Good worker: P      <G: 100%, P: 0%, R: 0%, X: 0%>           0.0

 [***Assume misclassification cost equal to 1, solution generalizes]
Quality Score
    Quality Score: A scalar measure of quality

   • A spammer is a worker who always assigns labels
     randomly, regardless of what the true class is.
                             ExpCost ( Worker)
QualityScore( Worker)  1 
                            ExpCost (Spammer)

   • Scalar score, useful for the purpose of ranking workers




                                                     HCOMP 2010
Instead of blocking: Quality-sensitive Payment

• Threshold-ing rewards gives wrong incentives:
   • Decent (but still useful) workers get fired
   • Uncertainty near the decision threshold
• Instead: Estimate payment level based on quality
   • Set acceptable quality (e.g., 99% accuracy)
   • For workers above quality specs: Pay full price
   • For others: Estimate level of redundancy to reach
     acceptable quality (e.g., Need 5 workers with 90% accuracy or
     13 workers with 80% accuracy to reach 99% accuracy;)
   • Pay full price divided by level of redundancy
Simple example:
     Redundancy and Quality

        Ask multiple labelers, keep majority label as “true” label
        Quality is probability of being correct
                                                   1
                                                        P=1.0
                                                  0.9
                                                        P=0.9
                                                  0.8   P=0.8
                             Integrated quality




     P is probability                             0.7
                                                        P=0.7
     of individual labeler                        0.6   P=0.6
     being correct                                0.5
                                                        P=0.5
                                                  0.4
     P=1.0: perfect                                     P=0.4
                                                  0.3
     P=0.5: random
     P=0.4: adversarial                           0.2
                                                         1      3   5       7        9   11   13
21                                                                  Number of labelers
Implementation

              Open source implementation available at:
             http://code.google.com/p/get-another-label/
                and demo at http://qmturk.appspot.com/
   Input:
    –   Labels from Mechanical Turk
    –   [Optional] Some “gold” labels from trusted labelers
    –   Cost of incorrect classifications (e.g., XG costlier than GX)
   Output:
    –   Corrected labels
    –   Worker error rates
    –   Ranking of workers according to their quality
    –   [Coming soon] Quality-sensitive payment
    –   [Coming soon] Risk-adjusted quality-sensitive payment
Example: Build an “Adult Web Site” Classifier


  Get people to look at sites and classify them as:
 G (general audience) PG (parental guidance) R (restricted) X (porn)


But we are not going to label the whole Internet…
Expensive
Slow
Quality and Classification Performance

     Noisy labels lead to degraded task performance
     Labeling quality increases  classification quality increases
                                                                                                         Quality = 100%
             100
                                                                                                         Quality = 80%
             90
             80
       AUC




                                                                                                          Quality = 60%
             70
             60
                                                                                                          Quality = 50%
             50
             40
                                            100

                                                  120

                                                        140

                                                              160

                                                                    180

                                                                          200

                                                                                220

                                                                                      240

                                                                                            260

                                                                                                  280

                                                                                                        300
                   1

                       20

                             40

                                  60

                                       80




                            Number of examples ("Mushroom" data set)                                    Single-labeler quality
24                                                                                                      (probability of assigning
                                                                                                        correctly a binary label)
Tradeoffs: More data or better data?

        Get more examples  Improve classification
        Get more labels  Improve label quality  Improve classification

                                                                         Quality = 100%
                    100
                                                                         Quality = 80 %
                    90

                    80
         Accuracy




                    70                                                   Quality = 60%

                    60

                    50                                                    Quality = 50%
                    40
                                            0

                                            0

                                            0
                                            0

                                            0
                                            0

                                            0

                                            0
                                            0

                                            0

                                            0
                      1
                          20

                               40
                                    60

                                          80




                                                                                  KDD 2008,
                                         10
                                         12

                                         14

                                         16
                                         18

                                         20

                                         22

                                         24

                                         26
                                         28

                                         30
25                                       Number of examples (Mushroom)            Best paper
                                                                                  runner-up
Summary of Basic Results


     We want to follow the direction that has the highest
     “learning gradient”
      –   Estimate improvement with more data (cross-validation)
      –   Estimate sensitivity to data quality (introduce noise and
          measure degradation in quality)
            Rule-of-thumb results:
            With high quality labelers (85% and above):
            Get more data (One worker per example)
            With low quality labelers (~60-70%):
            Improve quality (Multiple workers per example)
26
Selective Repeated-Labeling

        We do not need to label everything the same way


        Key observation: we have additional information to
         guide selection of data for repeated labeling
            the current multiset of labels
            the current model built from the data


        Example: {+,-,+,-,-,+} vs. {+,+,+,+,+,+}
         –   Will skip details in the talk, see “Repeated Labeling” paper,
             for targeting using item difficulty, and other techniques
27
Selective labeling strategy:
     Model Uncertainty (MU)

        Learning models of the data additional
         source of information about label certainty
        Model uncertainty: get more labels for
         instances that cause model uncertainty in             Examples

         training data (i.e., irregularities!)

                                                                Models
                       +           - -- -
                 +       +                       --    Self-healing process
                 +
                   +   +
                         +
                             +
                               +   - -- - -+- -
                                        -
                   +   +     +      - - - -- - - - -         examines
                         +
                   +   + + +        ---- -----
                                            -            irregularities in
                         + +                               training data
                   +       + +            --
                            +        -- ----
                              +                        This is NOT active
                                                --
28                                                          learning
+       - -- -
                                               +     +
                                                 + + +
                                               + ++ +      - -- - -+- -
                                                                -
     Why does                                    + + +
                                                   + + +
                                                            - - - -- - -
                                                            ---- ----
     Model Uncertainty (MU) work?                   ++
                                                      +
                                                       +
                                                                  --
                                                             --

                                            Self-healing   Examples

       Self-healing MU                      process

                                                            Models
                     “active learning” MU




29
Adult content classification




                                 Round robin

                     Selective




30
Improving worker participation

        With just labeling, workers are passively
         labeling the data that we give them

        But this can be wasteful when positive cases
         are sparse

        Why not asking the workers to search
         themselves and find training data
31
Guided Learning

     Ask workers to find
     example web pages
     (great for “sparse” content)



     After collecting enough
     examples, easy to build
     and test web page
                             http://url-collector.appspot.com/allTopics.jsp
     classifier
32                                                               KDD 2009
Limits of Guided Learning

        No incentives for workers to find “new” content


        After a while, submitted web pages similar to
         already submitted ones


        No improvement for classifier



33
The result? Blissful ignorance…

        Classifier seems great: Cross-validation tests
         show excellent performance



        Alas, classifier fails: The “unknown unknowns” ™
                                  No similar training data in training set

                                  “Unknown unknowns” = classifier
                                  fails with high confidence



34
Beat the Machine!

           Ask humans to find URLs that
              the classifier will classify incorrectly
              another human will classify correctly




                   http://adsafe-beatthemachine.appspot.com/
                               Example:
35    Find hate speech pages that the machine will classify as benign
Probes     Successes




       Error rate for probes significantly higher
     than error rate on (stratified) random data
      (10x to 100x higher than base error rate)
36
Structure of Successful Probes

        Now, we identify errors much
         faster (and proactively)


        Errors not random outliers:
         We can “learn” the errors

        Could not, however, incorporate
         errors into existing classifier without
         degrading performance




37
Unknown unknowns  Known unknowns

        Once humans find the holes, they keep probing
         (e.g., multilingual porn  )


        However, we can learn what we do not know
         (“unknown unknowns”  “known unknowns”)


        We now know the areas where we are likely to be
         wrong

38
Reward Structure for Humans

        High reward higher when:
         –   Classifier confident (but wrong) and
         –   We do not know it will be an error
        Medium reward when:
         –   Classifier confident (but wrong) and
         –   We do know it will be an error
        Low reward when:
39
         –   Classifier already uncertain about outcome
Current Directions

        Learn how to best incorporate knowledge to
         improve classifier


        Measure prevalence of newly identified errors
         on the web (“query by document”)
         –   Increase rewards for errors prevalent in the
             “generalized” case




40
Workers reacting to bad rewards/scores


     Score-based feedback leads to strange interactions:

     The “angry, has-been-burnt-too-many-times” worker:
      “F*** YOU! I am doing everything correctly and you know
       it! Stop trying to reject me with your stupid ‘scores’!”

     The overachiever worker:
      “What am I doing wrong?? My score is 92% and I want to
       have 100%”

41
An unexpected connection at the
     NAS “Frontiers of Science” conf.

                       Your bad
                       workers behave
                       like my mice!




42
An unexpected connection at the
     NAS “Frontiers of Science” conf.

                          Your bad
                          workers behave
                          like my mice!




                    Eh?

43
An unexpected connection at the
     NAS “Frontiers of Science” conf.

                           Your bad workers want
                           to engage their brain
                           only for motor skills,
                           not for cognitive skills




             Yeah, makes
             sense…
44
An unexpected connection at the
     NAS “Frontiers of Science” conf.

                       And here is how
                       I train my mice
                       to behave…




45
An unexpected connection at the
     NAS “Frontiers of Science” conf.


                                    Confuse motor skills!
                                    Reward cognition!




            I should try this the
            moment that I get
            back to my room

46
Implicit Feedback using Frustration


      Punish bad answers with frustration of motor
       skills (e.g., add delays between tasks)
       –   “Loading image, please wait…”
       –   “Image did not load, press here to reload”
       –   “404 error. Return the HIT and accept again”
      Reward good answers by rewarding the
       cognitive part of the brain (e.g, introduce
       variety/novelty, return results fast)

47 →Make this probabilistic to keep feedback implicit
48
First result


        Spammer workers quickly abandon
        Good workers keep labeling

        Bad: Spammer bots unaffected
        How to frustrate a bot?
         –   Give it a CAPTHCA 



49
Second result (more impressive)

        Remember, scheme was for training the mice…

        15% of the spammers start submitting good work!
        Putting cognitive effort is more beneficial (?)

        Key trick: Learn to test workers on-the-fly and
         estimate their quality over streaming data (code
         and paper coming soon…)
50
Thanks!

Q & A?

Más contenido relacionado

La actualidad más candente

SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)
SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)
SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)SSII
 
ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)
ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)
ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)Kaku Okuda
 
応用行動分析家のための科学論文リーディング
応用行動分析家のための科学論文リーディング応用行動分析家のための科学論文リーディング
応用行動分析家のための科学論文リーディングSoichiro Matsuda
 
医療データ解析者へ向けた Git・GitHub 入門
医療データ解析者へ向けた Git・GitHub 入門医療データ解析者へ向けた Git・GitHub 入門
医療データ解析者へ向けた Git・GitHub 入門Yui Tomo
 
ObserverパターンからはじめるUniRx
ObserverパターンからはじめるUniRx ObserverパターンからはじめるUniRx
ObserverパターンからはじめるUniRx torisoup
 
2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調
2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調
2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調智啓 出川
 
Gazebo/ROSで力覚センサプラグインを使う
Gazebo/ROSで力覚センサプラグインを使うGazebo/ROSで力覚センサプラグインを使う
Gazebo/ROSで力覚センサプラグインを使うHDeanK
 
ニューロテクノロジーの課題と未来
ニューロテクノロジーの課題と未来ニューロテクノロジーの課題と未来
ニューロテクノロジーの課題と未来Hiro Hamada
 
Deep learning for_extreme_multi-label_text_classification
Deep learning for_extreme_multi-label_text_classificationDeep learning for_extreme_multi-label_text_classification
Deep learning for_extreme_multi-label_text_classificationJunya Kamura
 
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understandingToru Tamaki
 
【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLM【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLMDeep Learning JP
 
【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい
【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい
【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたいTakuji Tahara
 
居場所を隠すために差分プライバシーを使おう
居場所を隠すために差分プライバシーを使おう居場所を隠すために差分プライバシーを使おう
居場所を隠すために差分プライバシーを使おうHiroshi Nakagawa
 
Learning to Resize Images for Computer Vision Tasks
Learning to Resize Images for Computer Vision TasksLearning to Resize Images for Computer Vision Tasks
Learning to Resize Images for Computer Vision Tasksharmonylab
 
CycleGANで顔写真をアニメ調に変換する
CycleGANで顔写真をアニメ調に変換するCycleGANで顔写真をアニメ調に変換する
CycleGANで顔写真をアニメ調に変換するmeownoisy
 
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築NVIDIA Japan
 
Python nlp handson_20220225_v5
Python nlp handson_20220225_v5Python nlp handson_20220225_v5
Python nlp handson_20220225_v5博三 太田
 
Paper: Objects as Points(CenterNet)
Paper: Objects as Points(CenterNet)Paper: Objects as Points(CenterNet)
Paper: Objects as Points(CenterNet)Yusuke Fujimoto
 
論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」
論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」
論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」Kaoru Nasuno
 
文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted WindowsToru Tamaki
 

La actualidad más candente (20)

SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)
SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)
SSII2020 [OS2] 限られたデータからの深層学習 (オーガナイザーによる冒頭の導入)
 
ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)
ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)
ゲーム屋ですがこんな風にXD使ってます(Xd勉強会20180316)
 
応用行動分析家のための科学論文リーディング
応用行動分析家のための科学論文リーディング応用行動分析家のための科学論文リーディング
応用行動分析家のための科学論文リーディング
 
医療データ解析者へ向けた Git・GitHub 入門
医療データ解析者へ向けた Git・GitHub 入門医療データ解析者へ向けた Git・GitHub 入門
医療データ解析者へ向けた Git・GitHub 入門
 
ObserverパターンからはじめるUniRx
ObserverパターンからはじめるUniRx ObserverパターンからはじめるUniRx
ObserverパターンからはじめるUniRx
 
2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調
2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調
2015年度先端GPGPUシミュレーション工学特論 第15回 CPUとGPUの協調
 
Gazebo/ROSで力覚センサプラグインを使う
Gazebo/ROSで力覚センサプラグインを使うGazebo/ROSで力覚センサプラグインを使う
Gazebo/ROSで力覚センサプラグインを使う
 
ニューロテクノロジーの課題と未来
ニューロテクノロジーの課題と未来ニューロテクノロジーの課題と未来
ニューロテクノロジーの課題と未来
 
Deep learning for_extreme_multi-label_text_classification
Deep learning for_extreme_multi-label_text_classificationDeep learning for_extreme_multi-label_text_classification
Deep learning for_extreme_multi-label_text_classification
 
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
動画像理解のための深層学習アプローチ Deep learning approaches to video understanding
 
【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLM【DL輪読会】大量API・ツールの扱いに特化したLLM
【DL輪読会】大量API・ツールの扱いに特化したLLM
 
【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい
【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい
【LT資料】 Neural Network 素人なんだけど何とかご機嫌取りをしたい
 
居場所を隠すために差分プライバシーを使おう
居場所を隠すために差分プライバシーを使おう居場所を隠すために差分プライバシーを使おう
居場所を隠すために差分プライバシーを使おう
 
Learning to Resize Images for Computer Vision Tasks
Learning to Resize Images for Computer Vision TasksLearning to Resize Images for Computer Vision Tasks
Learning to Resize Images for Computer Vision Tasks
 
CycleGANで顔写真をアニメ調に変換する
CycleGANで顔写真をアニメ調に変換するCycleGANで顔写真をアニメ調に変換する
CycleGANで顔写真をアニメ調に変換する
 
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
TensorRT Inference Serverではじめる、 高性能な推論サーバ構築
 
Python nlp handson_20220225_v5
Python nlp handson_20220225_v5Python nlp handson_20220225_v5
Python nlp handson_20220225_v5
 
Paper: Objects as Points(CenterNet)
Paper: Objects as Points(CenterNet)Paper: Objects as Points(CenterNet)
Paper: Objects as Points(CenterNet)
 
論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」
論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」
論文輪読資料「FaceNet: A Unified Embedding for Face Recognition and Clustering」
 
文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
文献紹介:Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows
 

Similar a Crowdsourcing using Mechanical Turk: Quality Management and Scalability

New York Mechanical Turk Meetup
New York Mechanical Turk MeetupNew York Mechanical Turk Meetup
New York Mechanical Turk MeetupPanos Ipeirotis
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...OpenSource Connections
 
presentation_python_11_1569171345_375360.pptx
presentation_python_11_1569171345_375360.pptxpresentation_python_11_1569171345_375360.pptx
presentation_python_11_1569171345_375360.pptxGAURAVRATHORE86
 
Counterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning modelsCounterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning modelsMichael Manapat
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideMegan Verbakel
 
Causal Inference in R
Causal Inference in RCausal Inference in R
Causal Inference in RAna Daglis
 
Csci101 lect08a matlab_programs
Csci101 lect08a matlab_programsCsci101 lect08a matlab_programs
Csci101 lect08a matlab_programsElsayed Hemayed
 
An Introduction to boosting
An Introduction to boostingAn Introduction to boosting
An Introduction to boostingbutest
 
Conversion Conference Berlin
Conversion Conference BerlinConversion Conference Berlin
Conversion Conference BerlinTom Capper
 
Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01宁 梅
 
Cs291 assignment solution
Cs291 assignment solutionCs291 assignment solution
Cs291 assignment solutionKuntal Bhowmick
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learnedweka Content
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedDataminingTools Inc
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep LearningCloudxLab
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learningknowbigdata
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep LearningShubhWadekar
 
Statistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference LondonStatistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference LondonTom Capper
 
Airbnb offline experiments
Airbnb offline experimentsAirbnb offline experiments
Airbnb offline experimentsElena Grewal
 

Similar a Crowdsourcing using Mechanical Turk: Quality Management and Scalability (20)

New York Mechanical Turk Meetup
New York Mechanical Turk MeetupNew York Mechanical Turk Meetup
New York Mechanical Turk Meetup
 
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...
 
presentation_python_11_1569171345_375360.pptx
presentation_python_11_1569171345_375360.pptxpresentation_python_11_1569171345_375360.pptx
presentation_python_11_1569171345_375360.pptx
 
Counterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning modelsCounterfactual evaluation of machine learning models
Counterfactual evaluation of machine learning models
 
Assessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's GuideAssessing Model Performance - Beginner's Guide
Assessing Model Performance - Beginner's Guide
 
Causal Inference in R
Causal Inference in RCausal Inference in R
Causal Inference in R
 
Csci101 lect08a matlab_programs
Csci101 lect08a matlab_programsCsci101 lect08a matlab_programs
Csci101 lect08a matlab_programs
 
An Introduction to boosting
An Introduction to boostingAn Introduction to boosting
An Introduction to boosting
 
Conversion Conference Berlin
Conversion Conference BerlinConversion Conference Berlin
Conversion Conference Berlin
 
Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01Ning_Mei.ASSIGN01
Ning_Mei.ASSIGN01
 
Cs291 assignment solution
Cs291 assignment solutionCs291 assignment solution
Cs291 assignment solution
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Statistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference LondonStatistics for CRO - Conversion Conference London
Statistics for CRO - Conversion Conference London
 
Predictive Testing
Predictive TestingPredictive Testing
Predictive Testing
 
Naive Bayes
Naive Bayes Naive Bayes
Naive Bayes
 
Airbnb offline experiments
Airbnb offline experimentsAirbnb offline experiments
Airbnb offline experiments
 

Más de Panos Ipeirotis

Quizz: Targeted Crowdsourcing with a Billion (Potential) Users
Quizz: Targeted Crowdsourcing with a Billion (Potential) UsersQuizz: Targeted Crowdsourcing with a Billion (Potential) Users
Quizz: Targeted Crowdsourcing with a Billion (Potential) UsersPanos Ipeirotis
 
Humanities and Technology Unite
Humanities and Technology UniteHumanities and Technology Unite
Humanities and Technology UnitePanos Ipeirotis
 
The Market for Intellect: Discovering economically-rewarding education paths
The Market for Intellect: Discovering economically-rewarding education pathsThe Market for Intellect: Discovering economically-rewarding education paths
The Market for Intellect: Discovering economically-rewarding education pathsPanos Ipeirotis
 
On Mice and Men: The Role of Biology in Crowdsourcing
On Mice and Men: The Role of Biology in CrowdsourcingOn Mice and Men: The Role of Biology in Crowdsourcing
On Mice and Men: The Role of Biology in CrowdsourcingPanos Ipeirotis
 
Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...
Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...
Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...Panos Ipeirotis
 
Crowdsourcing: Lessons from Henry Ford
Crowdsourcing: Lessons from Henry FordCrowdsourcing: Lessons from Henry Ford
Crowdsourcing: Lessons from Henry FordPanos Ipeirotis
 
Managing Crowdsourced Human Computation: A Tutorial
Managing Crowdsourced Human Computation: A TutorialManaging Crowdsourced Human Computation: A Tutorial
Managing Crowdsourced Human Computation: A TutorialPanos Ipeirotis
 

Más de Panos Ipeirotis (7)

Quizz: Targeted Crowdsourcing with a Billion (Potential) Users
Quizz: Targeted Crowdsourcing with a Billion (Potential) UsersQuizz: Targeted Crowdsourcing with a Billion (Potential) Users
Quizz: Targeted Crowdsourcing with a Billion (Potential) Users
 
Humanities and Technology Unite
Humanities and Technology UniteHumanities and Technology Unite
Humanities and Technology Unite
 
The Market for Intellect: Discovering economically-rewarding education paths
The Market for Intellect: Discovering economically-rewarding education pathsThe Market for Intellect: Discovering economically-rewarding education paths
The Market for Intellect: Discovering economically-rewarding education paths
 
On Mice and Men: The Role of Biology in Crowdsourcing
On Mice and Men: The Role of Biology in CrowdsourcingOn Mice and Men: The Role of Biology in Crowdsourcing
On Mice and Men: The Role of Biology in Crowdsourcing
 
Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...
Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...
Big Data, Stupid Decisions / Strata Jumpstart 2011 / Panos Ipeirotis / http:/...
 
Crowdsourcing: Lessons from Henry Ford
Crowdsourcing: Lessons from Henry FordCrowdsourcing: Lessons from Henry Ford
Crowdsourcing: Lessons from Henry Ford
 
Managing Crowdsourced Human Computation: A Tutorial
Managing Crowdsourced Human Computation: A TutorialManaging Crowdsourced Human Computation: A Tutorial
Managing Crowdsourced Human Computation: A Tutorial
 

Último

How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityEric T. Tung
 
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGpr788182
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistanvineshkumarsajnani12
 
Arti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfArti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfwill854175
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...ssuserf63bd7
 
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165meghakumariji156
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGpr788182
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with CultureSeta Wicaksana
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Availablepr788182
 
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur DubaiUAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubaijaehdlyzca
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizharallensay1
 
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 MonthsSEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 MonthsIndeedSEO
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAITim Wilson
 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwaitdaisycvs
 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...meghakumariji156
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon investment
 
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service AvailableBerhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Availablepr788182
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptxRoofing Contractor
 
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowKalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowranineha57744
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...daisycvs
 

Último (20)

How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
 
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
 
Arti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdfArti Languages Pre Seed Teaser Deck 2024.pdf
Arti Languages Pre Seed Teaser Deck 2024.pdf
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
 
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur DubaiUAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
 
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 MonthsSEO Case Study: How I Increased SEO Traffic & Ranking by 50-60%  in 6 Months
SEO Case Study: How I Increased SEO Traffic & Ranking by 50-60% in 6 Months
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
 
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai KuwaitThe Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
The Abortion pills for sale in Qatar@Doha [+27737758557] []Deira Dubai Kuwait
 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
 
Falcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business PotentialFalcon Invoice Discounting: Unlock Your Business Potential
Falcon Invoice Discounting: Unlock Your Business Potential
 
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service AvailableBerhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
Berhampur Call Girl Just Call 8084732287 Top Class Call Girl Service Available
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptx
 
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowKalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 

Crowdsourcing using Mechanical Turk: Quality Management and Scalability

  • 1. Crowdsourcing using Mechanical Turk: Quality Management and Scalability Panos Ipeirotis New York University & oDesk Twitter: @ipeirotis Joint work with: Jing Wang, Foster Provost, Josh Attenberg, and Victor Sheng; Special “A Computer Scientist in a Business School” thanks to AdSafe Media http://behind-the-enemy-lines.com
  • 2. Brand advertising not fully embraced Internet advertising yet… Afraid of improper brand placement
  • 3. 3 Gabrielle Giffords Shooting, Tucson, AZ, Jan 2011
  • 4. 4
  • 5. 5
  • 6. New Classification Models Needed within days  Pharmaceutical firm does not want ads to appear: – In pages that discuss swine flu (FDA prohibited pharmaceutical company to display drug ad in pages about swine flu)  Big fast-food chain does not want ads to appear: – In pages that discuss the brand (99% negative sentiment) – In pages discussing obesity, diabetes, cholesterol, etc  Airline company does not want ads to appear: – In pages with crashes, accidents, … – In pages with discussions of terrorist plots against airlines 6
  • 7. Need to build models fast  Traditionally, modeling teams have invested substantial internal resources in data collection, extraction, cleaning, and other preprocessing No time for such things…  However, now, we can outsource preprocessing tasks, such as labeling, feature extraction, verifying information extraction, etc. – using Mechanical Turk, oDesk, etc. – quality may be lower than expert labeling (much?) – but low costs can allow massive scale 7
  • 9. Example: Build an “Adult Web Site” Classifier  Need a large number of hand-labeled sites  Get people to look at sites and classify them as: G (general audience) PG (parental guidance) R (restricted) X (porn) Cost/Speed Statistics  Undergrad intern: 200 websites/hr, cost: $15/hr  Mechanical Turk: 2500 websites/hr, cost: $12/hr
  • 10. Bad news: Spammers! Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
  • 11. Redundant votes, infer quality Look at our lazy friend ATAMRO447HWJQ together with other 9 workers  Using redundancy, we can compute error rates for each worker
  • 12. Algorithm of (Dawid & Skene, 1979) [and many recent variations on the same theme] Iterative process to estimate worker error rates 1. Initialize“correct” label for each object (e.g., use majority vote) 2. Estimate error rates for workers (using “correct” labels) 3. Estimate “correct” labels (using error rates, weight worker votes according to quality) 4. Go to Step 2 and iterate until convergence Error rates for ATAMRO447HWJQ Our friend ATAMRO447HWJQ P[G → G]=99.947% P[G → X]=0.053% marked almost all sites as G. P[X → G]=99.153% P[X → X]=0.847% Clickety clickey click…
  • 13. Challenge: From Confusion Matrixes to Quality Scores Confusion Matrix for ATAMRO447HWJQ  P[X → X]=0.847% P[X → G]=99.153%  P[G → X]=0.053% P[G → G]=99.947% How to check if a worker is a spammer using the confusion matrix? (hint: error rate not enough)
  • 14. Challenge 1: Spammers are lazy and smart! Confusion matrix for spammer Confusion matrix for good worker  P[X → X]=0% P[X → G]=100%  P[X → X]=80% P[X → G]=20%  P[G → X]=0% P[G → G]=100%  P[G → X]=20% P[G → G]=80%  Spammers figure out how to fly under the radar…  In reality, we have 85% G sites and 15% X sites  Error rate of spammer = 0% * 85% + 100% * 15% = 15%  Error rate of good worker = 85% * 20% + 85% * 20% = 20% False negatives: Spam workers pass as legitimate
  • 15. Challenge 2: Humans are biased! Error rates for CEO of AdSafe P[G → G]=20.0% P[G → P]=80.0% P[G → R]=0.0% P[G → X]=0.0% P[P → G]=0.0% P[P → P]=0.0% P[P → R]=100.0% P[P → X]=0.0% P[R → G]=0.0% P[R → P]=0.0% P[R → R]=100.0% P[R → X]=0.0% P[X → G]=0.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=100.0%  We have 85% G sites, 5% P sites, 5% R sites, 5% X sites  Error rate of spammer (all G) = 0% * 85% + 100% * 15% = 15%  Error rate of biased worker = 80% * 85% + 100% * 5% = 73% False positives: Legitimate workers appear to be spammers (important note: bias is not just a matter of “ordered” classes)
  • 16. Solution: Reverse errors first, compute error rate afterwards Error Rates for CEO of AdSafe P[G → G]=20.0% P[G → P]=80.0% P[G → R]=0.0% P[G → X]=0.0% P[P → G]=0.0% P[P → P]=0.0% P[P → R]=100.0% P[P → X]=0.0% P[R → G]=0.0% P[R → P]=0.0% P[R → R]=100.0% P[R → X]=0.0% P[X → G]=0.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=100.0%  When biased worker says G, it is 100% G  When biased worker says P, it is 100% G  When biased worker says R, it is 50% P, 50% R  When biased worker says X, it is 100% X Small ambiguity for “R-rated” votes but other than that, fine!
  • 17. Solution: Reverse errors first, compute error rate afterwards Error Rates for spammer: ATAMRO447HWJQ P[G → G]=100.0% P[G → P]=0.0% P[G → R]=0.0% P[G → X]=0.0% P[P → G]=100.0% P[P → P]=0.0% P[P → R]=0.0% P[P → X]=0.0% P[R → G]=100.0% P[R → P]=0.0% P[R → R]=0.0% P[R → X]=0.0% P[X → G]=100.0% P[X → P]=0.0% P[X → R]=0.0% P[X → X]=0.0%  When spammer says G, it is 25% G, 25% P, 25% R, 25% X  When spammer says P, it is 25% G, 25% P, 25% R, 25% X  When spammer says R, it is 25% G, 25% P, 25% R, 25% X  When spammer says X, it is 25% G, 25% P, 25% R, 25% X [note: assume equal priors] The results are highly ambiguous. No information provided!
  • 18. Expected Misclassification Cost • High cost: probability spread across classes • Low cost: “probability mass concentrated in one class Assigned Label Corresponding “Soft” Label Expected Label Cost Spammer: G <G: 25%, P: 25%, R: 25%, X: 25%> 0.75 Good worker: P <G: 100%, P: 0%, R: 0%, X: 0%> 0.0 [***Assume misclassification cost equal to 1, solution generalizes]
  • 19. Quality Score Quality Score: A scalar measure of quality • A spammer is a worker who always assigns labels randomly, regardless of what the true class is. ExpCost ( Worker) QualityScore( Worker)  1  ExpCost (Spammer) • Scalar score, useful for the purpose of ranking workers HCOMP 2010
  • 20. Instead of blocking: Quality-sensitive Payment • Threshold-ing rewards gives wrong incentives: • Decent (but still useful) workers get fired • Uncertainty near the decision threshold • Instead: Estimate payment level based on quality • Set acceptable quality (e.g., 99% accuracy) • For workers above quality specs: Pay full price • For others: Estimate level of redundancy to reach acceptable quality (e.g., Need 5 workers with 90% accuracy or 13 workers with 80% accuracy to reach 99% accuracy;) • Pay full price divided by level of redundancy
  • 21. Simple example: Redundancy and Quality  Ask multiple labelers, keep majority label as “true” label  Quality is probability of being correct 1 P=1.0 0.9 P=0.9 0.8 P=0.8 Integrated quality P is probability 0.7 P=0.7 of individual labeler 0.6 P=0.6 being correct 0.5 P=0.5 0.4 P=1.0: perfect P=0.4 0.3 P=0.5: random P=0.4: adversarial 0.2 1 3 5 7 9 11 13 21 Number of labelers
  • 22. Implementation Open source implementation available at: http://code.google.com/p/get-another-label/ and demo at http://qmturk.appspot.com/  Input: – Labels from Mechanical Turk – [Optional] Some “gold” labels from trusted labelers – Cost of incorrect classifications (e.g., XG costlier than GX)  Output: – Corrected labels – Worker error rates – Ranking of workers according to their quality – [Coming soon] Quality-sensitive payment – [Coming soon] Risk-adjusted quality-sensitive payment
  • 23. Example: Build an “Adult Web Site” Classifier  Get people to look at sites and classify them as: G (general audience) PG (parental guidance) R (restricted) X (porn) But we are not going to label the whole Internet… Expensive Slow
  • 24. Quality and Classification Performance Noisy labels lead to degraded task performance Labeling quality increases  classification quality increases Quality = 100% 100 Quality = 80% 90 80 AUC Quality = 60% 70 60 Quality = 50% 50 40 100 120 140 160 180 200 220 240 260 280 300 1 20 40 60 80 Number of examples ("Mushroom" data set) Single-labeler quality 24 (probability of assigning correctly a binary label)
  • 25. Tradeoffs: More data or better data?  Get more examples  Improve classification  Get more labels  Improve label quality  Improve classification Quality = 100% 100 Quality = 80 % 90 80 Accuracy 70 Quality = 60% 60 50 Quality = 50% 40 0 0 0 0 0 0 0 0 0 0 0 1 20 40 60 80 KDD 2008, 10 12 14 16 18 20 22 24 26 28 30 25 Number of examples (Mushroom) Best paper runner-up
  • 26. Summary of Basic Results We want to follow the direction that has the highest “learning gradient” – Estimate improvement with more data (cross-validation) – Estimate sensitivity to data quality (introduce noise and measure degradation in quality) Rule-of-thumb results: With high quality labelers (85% and above): Get more data (One worker per example) With low quality labelers (~60-70%): Improve quality (Multiple workers per example) 26
  • 27. Selective Repeated-Labeling  We do not need to label everything the same way  Key observation: we have additional information to guide selection of data for repeated labeling  the current multiset of labels  the current model built from the data  Example: {+,-,+,-,-,+} vs. {+,+,+,+,+,+} – Will skip details in the talk, see “Repeated Labeling” paper, for targeting using item difficulty, and other techniques 27
  • 28. Selective labeling strategy: Model Uncertainty (MU)  Learning models of the data additional source of information about label certainty  Model uncertainty: get more labels for instances that cause model uncertainty in Examples training data (i.e., irregularities!) Models + - -- - + + -- Self-healing process + + + + + + - -- - -+- - - + + + - - - -- - - - - examines + + + + + ---- ----- - irregularities in + + training data + + + -- + -- ---- + This is NOT active -- 28 learning
  • 29. + - -- - + + + + + + ++ + - -- - -+- - - Why does + + + + + + - - - -- - - ---- ---- Model Uncertainty (MU) work? ++ + + -- -- Self-healing Examples Self-healing MU process Models “active learning” MU 29
  • 30. Adult content classification Round robin Selective 30
  • 31. Improving worker participation  With just labeling, workers are passively labeling the data that we give them  But this can be wasteful when positive cases are sparse  Why not asking the workers to search themselves and find training data 31
  • 32. Guided Learning Ask workers to find example web pages (great for “sparse” content) After collecting enough examples, easy to build and test web page http://url-collector.appspot.com/allTopics.jsp classifier 32 KDD 2009
  • 33. Limits of Guided Learning  No incentives for workers to find “new” content  After a while, submitted web pages similar to already submitted ones  No improvement for classifier 33
  • 34. The result? Blissful ignorance…  Classifier seems great: Cross-validation tests show excellent performance  Alas, classifier fails: The “unknown unknowns” ™ No similar training data in training set “Unknown unknowns” = classifier fails with high confidence 34
  • 35. Beat the Machine! Ask humans to find URLs that  the classifier will classify incorrectly  another human will classify correctly http://adsafe-beatthemachine.appspot.com/ Example: 35 Find hate speech pages that the machine will classify as benign
  • 36. Probes Successes Error rate for probes significantly higher than error rate on (stratified) random data (10x to 100x higher than base error rate) 36
  • 37. Structure of Successful Probes  Now, we identify errors much faster (and proactively)  Errors not random outliers: We can “learn” the errors  Could not, however, incorporate errors into existing classifier without degrading performance 37
  • 38. Unknown unknowns  Known unknowns  Once humans find the holes, they keep probing (e.g., multilingual porn  )  However, we can learn what we do not know (“unknown unknowns”  “known unknowns”)  We now know the areas where we are likely to be wrong 38
  • 39. Reward Structure for Humans  High reward higher when: – Classifier confident (but wrong) and – We do not know it will be an error  Medium reward when: – Classifier confident (but wrong) and – We do know it will be an error  Low reward when: 39 – Classifier already uncertain about outcome
  • 40. Current Directions  Learn how to best incorporate knowledge to improve classifier  Measure prevalence of newly identified errors on the web (“query by document”) – Increase rewards for errors prevalent in the “generalized” case 40
  • 41. Workers reacting to bad rewards/scores Score-based feedback leads to strange interactions: The “angry, has-been-burnt-too-many-times” worker:  “F*** YOU! I am doing everything correctly and you know it! Stop trying to reject me with your stupid ‘scores’!” The overachiever worker:  “What am I doing wrong?? My score is 92% and I want to have 100%” 41
  • 42. An unexpected connection at the NAS “Frontiers of Science” conf. Your bad workers behave like my mice! 42
  • 43. An unexpected connection at the NAS “Frontiers of Science” conf. Your bad workers behave like my mice! Eh? 43
  • 44. An unexpected connection at the NAS “Frontiers of Science” conf. Your bad workers want to engage their brain only for motor skills, not for cognitive skills Yeah, makes sense… 44
  • 45. An unexpected connection at the NAS “Frontiers of Science” conf. And here is how I train my mice to behave… 45
  • 46. An unexpected connection at the NAS “Frontiers of Science” conf. Confuse motor skills! Reward cognition! I should try this the moment that I get back to my room 46
  • 47. Implicit Feedback using Frustration  Punish bad answers with frustration of motor skills (e.g., add delays between tasks) – “Loading image, please wait…” – “Image did not load, press here to reload” – “404 error. Return the HIT and accept again”  Reward good answers by rewarding the cognitive part of the brain (e.g, introduce variety/novelty, return results fast) 47 →Make this probabilistic to keep feedback implicit
  • 48. 48
  • 49. First result  Spammer workers quickly abandon  Good workers keep labeling  Bad: Spammer bots unaffected  How to frustrate a bot? – Give it a CAPTHCA  49
  • 50. Second result (more impressive)  Remember, scheme was for training the mice…  15% of the spammers start submitting good work!  Putting cognitive effort is more beneficial (?)  Key trick: Learn to test workers on-the-fly and estimate their quality over streaming data (code and paper coming soon…) 50