Thousands of Online Observers is Just the Beginning

Nathan Moroney, HP Labs

Human Vision and Electronic Imaging XIV
Session 2: Social Software, Internet Experiments and New Paradigms for the Web
Monday, January 19, 2009, 1:00-1:30 PM
Outline
• Brief History of Crowd-Sourcing
• Online Experiments
   − Unconstrained color naming
   − Color name comparison
   − Color difference description
   − Image quality description
   − World Wide Gamma
• Online Tools
   − Color Thesaurus, Color Zeitgeist & Italian Color Thesaurus
• Eight considerations


1/27/2009                                                     2
Brief History of Crowdsourcing: Part 1

   “Since the beginning, it was
   just the same. The only
   difference, the crowds are
   bigger now.”
                           Elvis




Brief History of Crowdsourcing: Part 2

        “The future belongs to crowds.”
        Don DeLillo, Mao II

            (Left as an exercise for the audience to do an Elvis – DeLillo mash-up)

Online Experiments
• Basic pieces
   − Experimental design – unconstrained text
   − Software, a server – JavaScript
   − Communication network – World Wide Web
   − Participants – volunteers
• Results
   − Direct Data
   − Usage Data
• Optional but useful – lab data for validation

Unconstrained Color Naming
• Seven colored patches
• Randomly selected
   − 6x6x6 RGB sampling
• Text field for names
• Provide the “best” name
• Optional comments
• Started in 2002
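
As a rough illustration of the patch selection, the sketch below draws seven colors from a 6x6x6 RGB grid. The channel levels (0, 51, ..., 255) and sampling without replacement are assumptions; the slide gives only the grid size and the patch count.

```python
import random

# Assumption: six evenly spaced levels per channel; the slide does not
# state the actual levels used in the experiment.
LEVELS = [0, 51, 102, 153, 204, 255]
GRID = [(r, g, b) for r in LEVELS for g in LEVELS for b in LEVELS]  # 216 colors


def pick_patches(n=7, rng=random):
    """Return n distinct grid colors chosen uniformly at random."""
    return rng.sample(GRID, n)


if __name__ == "__main__":
    # Print one participant's seven patches as hex color codes.
    for r, g, b in pick_patches():
        print(f"#{r:02x}{g:02x}{b:02x}")
```

Passing a seeded `random.Random` instance as `rng` makes a session reproducible, which may help when comparing against lab data.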




On-Line vs. Berlin & Kay
[Figure: CIECAM02 hue angle, Berlin & Kay (y-axis) vs. On-Line/Web (x-axis), both axes 0 to 360. Linear fit: y = 0.9971x + 28.986, R² = 0.9859.]

Color Name Comparison
• Text only
• Eleven color names
• Non-repeating random walk
• Eleven triads
   − Which color is least like the other two?
• Collect additional demographic data
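
The sketch below shows one possible way to build eleven triads from eleven names. It is not the experiment's actual code. It assumes the names are Berlin & Kay's eleven basic color terms (the slide does not list them) and reads "non-repeating random walk" as a single shuffled pass in which each name is grouped with the next two.

```python
import random

# Assumption: Berlin & Kay's eleven basic color terms.
NAMES = ["white", "black", "red", "green", "yellow", "blue",
         "brown", "purple", "pink", "orange", "gray"]


def make_triads(names, rng=random):
    """Build len(names) triads from one shuffled pass over the names.

    Each name starts exactly one triad and is grouped with the next two
    names in the shuffled order, wrapping around at the end.
    """
    order = list(names)
    rng.shuffle(order)
    n = len(order)
    return [(order[i], order[(i + 1) % n], order[(i + 2) % n]) for i in range(n)]


if __name__ == "__main__":
    for triad in make_triads(NAMES):
        print("Which color is least like the other two?", ", ".join(triad))
```

With this construction every name appears in exactly three triads per participant, so no single term dominates a session.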


Clustering Nominal Comparisons




Color Difference Description
• Five pairs of colored patches
• Best describe the difference
• Text field per pair
   − Unconstrained description
• Randomly sample RGB cube
   − Constrained RGB offsets
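
A minimal sketch of how such pairs might be generated: pick a random point in the RGB cube, then add a bounded offset. The bound `MAX_OFFSET` and the uniform per-channel offsets are hypothetical, since the slide does not say how the offsets were constrained.

```python
import random

MAX_OFFSET = 40  # hypothetical bound; the slide does not give the constraint


def clamp(value):
    """Keep a channel value inside the 8-bit range."""
    return max(0, min(255, value))


def make_pair(rng=random, max_offset=MAX_OFFSET):
    """Return (base, shifted): a random RGB color and a nearby variant."""
    base = tuple(rng.randrange(256) for _ in range(3))
    while True:
        offset = [rng.randint(-max_offset, max_offset) for _ in range(3)]
        shifted = tuple(clamp(c + d) for c, d in zip(base, offset))
        if shifted != base:  # retry if the offset vanished after clamping
            return base, shifted


if __name__ == "__main__":
    for _ in range(5):  # five pairs per participant, as on the slide
        print(make_pair())
```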




Frequencies of Words
• ‘More’ is six times as frequent as ‘less’
• ‘Darker’ is twice as frequent as ‘lighter’,
   − same for ‘dark’ and ‘light’
• Lime and magenta are not in the top 100 terms –
   − But they are in the top 10 of unconstrained naming.

   Proportion   Word
   0.048        right
   0.045        more
   0.031        left
   0.028        one
   0.018        color
   0.017        green
   0.017        darker
   0.015        blue
   0.012        than
   0.012        saturated
   0.011        patch
   0.011        first
   0.010        purple
   0.009        lighter
   0.009        second
   0.008        dark
   0.007        less
   0.007        brown
   0.007        red
   0.006        different
   0.006        yellow
   0.006        difference
   0.006        brighter
   0.006        hue
   0.005        pink

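
Word proportions like these can presumably be obtained by dividing each word's count by the total number of word tokens. The sketch below does that under simple assumptions (lowercasing, letters and apostrophes only). The tokenization rule and the sample responses are invented for illustration and are not taken from the experiment.

```python
import re
from collections import Counter


def token_proportions(descriptions):
    """Map each lowercase word to its share of all word tokens."""
    tokens = []
    for text in descriptions:
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
    counts = Counter(tokens)
    total = sum(counts.values())
    # most_common() orders words from most to least frequent.
    return {word: n / total for word, n in counts.most_common()}


if __name__ == "__main__":
    # Invented responses, for illustration only.
    sample = ["The right one is darker", "Right is more green"]
    for word, share in token_proportions(sample).items():
        print(f"{share:.3f}  {word}")
```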
Image Quality Description
• Overall and specific description of image quality
• Demographic questions

   Proportion vs. Token
   Proportion   Token
   0.089        the
   0.033        of
   0.032        is
   0.031        and
   0.021        color(s)
   0.017        to
   0.016        good
   0.014        on
   0.014        a
   0.013        in

Opt-In Demographics: n=338
• Gender: Female 56%, Male 44%
• English proficiency: Native 65%, Non-Native 35%
• Age (years): < 20 23%, 20-40 59%, 40-60 17%, > 60 1%
• Color vision (self-described): Normal 89%, Don’t Know 9%, Maybe Color Blind 1%, Definitely Color Blind 1%

World Wide Gamma
• Lightness partitioning task, benchmark to a nominal
 display and existing lightness scales, such as L*.

[Figure: task display, labeled “Before” and “After”.]

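
For reference, the L* scale mentioned above is the CIE 1976 lightness function of relative luminance. The sketch below evaluates it for gray levels on an idealized power-law display. The gamma of 2.2 is only an assumed stand-in for the "nominal display", which the slide does not specify.

```python
def lightness(y):
    """CIE 1976 L* for relative luminance y, where white is 1.0."""
    delta = 6 / 29
    if y > delta ** 3:
        f = y ** (1 / 3)
    else:
        f = y / (3 * delta ** 2) + 4 / 29  # linear segment near black
    return 116 * f - 16


def display_lightness(level, gamma=2.2):
    """L* of a gray level in [0, 1] on an idealized power-law display.

    gamma=2.2 is an assumption, not a value from the slides.
    """
    return lightness(level ** gamma)


if __name__ == "__main__":
    for step in range(0, 11, 2):
        level = step / 10
        print(f"level {level:.1f} -> L* {display_lightness(level):.1f}")
```

Observer partitions of the gray scale can then be compared against these nominal L* values to estimate how far a typical web display departs from the assumed gamma.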
World Wide Gamma
• Red is >600 participants
• Black is current results
• Specific experimental feedback
• Offset for darkest levels but quite linear

Online Color Thesaurus
• Interface to the underlying database of color names
• Largest number of users

Color Zeitgeist
• Usage data – tool use creates data, which in turn
 creates another tool

Italian Color Thesaurus
• Italian data < English data
• Adaptive tools
   − Qualification through ratings
   − Quantity through instance-based harvesting, collect new
     data only for missing colors
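
One way to read "collect new data only for missing colors" is to keep a tally of names per color and request names only where the tally is short. The sketch below is that reading, not the thesaurus implementation. The function name, the `min_names` threshold and the toy data are hypothetical.

```python
def colors_needing_names(all_colors, names_by_color, min_names=1):
    """Return the colors that have fewer than min_names collected names."""
    return [c for c in all_colors
            if len(names_by_color.get(c, [])) < min_names]


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    grid = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
    collected = {(255, 0, 0): ["rosso"], (0, 0, 255): []}
    print(colors_needing_names(grid, collected))
```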




Consideration 1: Scale
• Yes, online experiments mean bigger crowds
   − Larger & more diverse pool of possible participants
   − Logarithmic scale of participation

[Figure: logarithmic participation scale from 1 to 100M. Reference populations, in increasing order: Department, HP Labs, Stanford (under), Palo Alto, HP, San Jose, California. Experiments and tools, in increasing order: Lab Prototypes & Experiments, English Web-based Color naming experiment, Color Thesaurus, Application Based Color Picker, OS Based Color Picker.]

Observers per Experiment by Year
[Figure: log of the number of observers (1 to 10000) vs. year of experiment (1990 to 2010). Slide note: “These should also have error bars and connecting lines…”]

Consideration 2: Distributed Design
• Minimize the effort from any single participant
   − Increase volunteer participation rate?
   − Minimize impact of a single, systematically disruptive
     participant
• A ‘knob’ that can be used to dial the target “time to
 completion” for any given web participant
• Applicable to even relatively complex tasks
   − Triadic comparison
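
To make the ‘knob’ concrete, the arithmetic below assumes the full design is every possible triad of the eleven color names (the slides do not state the full design). Eleven names give 165 triads, so at eleven triads per participant one complete pass needs fifteen participants.

```python
import math


def sessions_for_full_coverage(n_items=11, group_size=3, trials_per_session=11):
    """Return (total groups, sessions needed to show each group once).

    trials_per_session is the 'knob': fewer trials mean a shorter task
    for each participant but more participants per complete pass.
    """
    total = math.comb(n_items, group_size)
    return total, math.ceil(total / trials_per_session)


if __name__ == "__main__":
    total, sessions = sessions_for_full_coverage()
    print(f"{total} triads, {sessions} sessions of 11 triads each")
```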


Consideration 3: Ambiguity
• Lack of constraints is a trade-off
   − May make the task more difficult for observers
   − May enable a different set of questions
   − General bias is towards unconstrained tasks
   − Implicitly include real world variability
• Sources of variability are vast; robustness comes
 from scale, and from a focus on categories, not thresholds

                    “wasn’t sure whether you wanted
                    accurate or poetic names.”
                                             Anonymous Comment
                                                   June 8, 2002

Consideration 4: Hypotheses vs Training
• Thresholds versus Categories
• Individual performance versus collective capability
• Numbers versus Words

Pixel by pixel machine color naming – see ‘Lexical Image Processing’, CIC 16

Consideration 5: Simplicity
• In both tasks and tools
• The simpler the task – likely the less confusion over
 instructions, higher the volunteer participation rate
• The simpler the tools – lowest common denominator
 infrastructure, minimum number of versions over the
 years, likely widest audience

Consideration 6: Global & Open-Ended
• Global scale for participation
• Effort is front loaded – once uploaded no
 real penalty to indefinite data collection
• Data ‘evolves’ as it changes scale
• Especially true for
   − inter-related experiments,
   − variations in experimental designs and
   − results that are in pursuit of an aggregate
     property
   − results that change over time

[Figure: log of the number of observers (1 to 10000) vs. year of experiment (1990 to 2010).]

Consideration 7: Usage as Data
• Any online interaction creates data
• The boundary between experiments and tools is
 potentially fuzzy
• Useful experiments can be formatted as a useful
 tool, and the more useful the tool the greater the
 potential data.
• An important implication and possible advantage is
 that a tool defines context for the task; the
 pragmatics are inherent.

Consideration 8: Mutual Bootstrapping
• Mutual bootstrapping – machine learning applied to training
 data gathered online, which in turn creates processed data
 which can enable human learning.
• Social data can be educational.

[Figure labeled “Chartreuse”.]

• Revisiting approaches to laboratory experiments – if the
 goals are simplicity, categorization, ambiguity, larger scale
 and so on, how are the designs different?

Questions?

             Elvis’s favorite color?

             That would be blue.





Ei09 Thousands Observers

  • 1. Thousands of Online Observers is Just the Beginning Nathan Moroney, HP Labs Human Vision and Electronic Imaging XIV Session 2: Social Software, Internet Experiments and New Paradigms for the Web Monday, January 19, 2009, 1:00-1:30 PM
  • 2. Outline • Brief History of Crowd-Sourcing • Online Experiments − Unconstrained color naming − Color name comparison − Color difference description − Image quality description − World Wide Gamma • Online Tools − Color Thesaurus, Color Zeitgeist & Italian Color Thesaurus • Eight considerations 1/27/2009 2
  • 3. Brief History of Crowdsourcing: Part 1 “Since the beginning, it was just the same. The only difference, the crowds are bigger now.” Elvis 1/27/2009 3
  • 4. Brief History of Crowdsourcing: Part 2 “The future belongs to crowds.” Mao II Don Delillo (Left as an exercise for the audience to do an Elvis – Delillo mash-up) 1/27/2009 4
  • 5. Online Experiments • Basic pieces − Experimental design – unconstrained text − Software, a server – JavaScript − Communication network –World Wide Web − Participants - volunteers • Results − Direct Data − Usage Data • Optional but useful – lab data for validation 1/27/2009 5
  • 6. Unconstrained Color Naming • Seven colored patches • Randomly selected − 6x6x6 RGB sampling • Text field for names • Provide the “best” name • Optional comments • Started in 2002 1/27/2009 6
  • 7. On-Line vs. Berlin & Kay CIECAM02Hue Angle CIECAM02 hue angle y = 0.9971x + 28.986 360 2 R = 0.9859 270 Berlin & Kay 180 90 0 0 90 180 270 360 On-Line Web 1/27/2009 7
  • 8. Color Name Comparison • Text only • Eleven color names • Non-repeating random walk • Eleven triads − Which color is least like the other two? • Collect additional demographic data 1/27/2009 8
  • 10. Color Difference Description • Five pairs of colored patches. • Best describe the difference • Text field per pair − Unconstrained description • Randomly sample RGB cube − Constrained RGB offsets 1/27/2009 10
  • 11. Frequencies of Words 0.048 right 0.045 more 0.031 left is six times as frequent • ‘More’ 0.028 one 0.018 color as ‘less’ 0.017 green 0.017 darker • ‘Darker’ is twice as frequent 0.015 blue 0.012 than as ‘lighter’, 0.012 saturated 0.011 patch − same for ‘dark’ and ‘light’ 0.011 first 0.010 purple • Lime and magenta are not in 0.009 lighter 0.009 second the top 100 terms – 0.008 dark 0.007 less − But they are in the top 10 of 0.007 brown unconstrained naming. 0.007 red 0.006 different 0.006 yellow 0.006 difference 0.006 brighter 0.006 hue 0.005 pink 1/27/2009 11
  • 12. Image Quality Description Overall and specific • description of image quality Demographic questions • Proportion vs. Token 0.089 the 0.033 of 0.032 is 0.031 and color(s) 0.021 0.017 to 0.016 good 0.014 on 0.014 a 0.013 in 1/27/2009 12
  • 13. Opt-In Demographics: n=338 Non-Native Male 35% 44% Female Native 56% English Gender 65% Proficiency Maybe >60 1% Color Blind 40-60 < 20 1% Don’t Know 9% Definitely 17% 1% 23% Color Blind Color Age Vision (years) (self-described) 59% 89% Normal 20-40 1/27/2009 13
  • 14. World Wide Gamma • Lightness partitioning task, benchmark to a nominal display and existing lightness scales, such as L*. After Before 1/27/2009 14
  • 15. World Wide Gamma • Red is >600 participants • Black is current results • Specific experimental feedback • Offsetfor darkest levels but quite linear 1/27/2009 15
  • 16. Online Color Thesaurus • Interface to the underlying database of color names • Largest number of users
  • 17. Color Zeitgeist • Usage data: tool use creates data, which in turn creates another tool
  • 18. Italian Color Thesaurus • Italian data < English data • Adaptive tools − Qualification through ratings − Quantity through instance-based harvesting: collect new data only for missing colors
  • 19. Consideration 1: Scale • Yes, online experiments mean bigger crowds − Larger & more diverse pool of possible participants − Logarithmic scale of participation • Figure: participation axis from 1 to 100M (1, 10, 100, 1K, 10K, 100K, 1M, 10M, 100M); labels along the axis: Department, Stanford (under), HP Labs, Palo Alto, San Jose, HP, California, English, Lab Prototypes & Experiments, Color Thesaurus, Web-based Color Naming Experiment, Application Based Color Picker, OS Based Color Picker
  • 20. Observers per Experiment by Year • Figure: log of the number of observers (1 to 10000) vs. experiment year (1990 to 2010) • These should also have error bars and connecting lines…
  • 21. Consideration 2: Distributed Design • Minimize the effort from any single participant − Increase volunteer participation rate? − Minimize impact of any single, systematically disruptive participant • A ‘knob’ that can be used to dial the target “time to completion” for any given web participant • Applicable to even relatively complex tasks − Triadic comparison vs.
  • 22. Consideration 3: Ambiguity • Lack of constraints is a trade-off − May make the task more difficult for observers − May enable a different set of questions − General bias is towards unconstrained tasks − Implicitly include real world variability • Sources of variability are vast; robustness comes from scale and a focus on categories, not thresholds • “wasn’t sure whether you wanted accurate or poetic names.” (Anonymous comment, June 8, 2002)
  • 23. Consideration 4: Hypotheses vs Training • Thresholds versus Categories • Individual performance versus collective capability • Numbers versus Words • Pixel-by-pixel machine color naming: see ‘Lexical Image Processing’, CIC 16
  • 24. Consideration 5: Simplicity • In both tasks and tools • The simpler the task, the less confusion over instructions and the higher the volunteer participation rate are likely to be • The simpler the tools: lowest common denominator infrastructure, minimum number of versions over the years, likely widest audience
  • 25. Consideration 6: Global & Open-Ended • Global scale for participation • Effort is front-loaded: once uploaded, no real penalty to indefinite data collection • Data ‘evolves’ as it changes scale • Especially true for − inter-related experiments, − variations in experimental designs and − results that are in pursuit of an aggregate property − results that change over time • Figure (repeated from slide 20): log of the number of observers vs. experiment year
  • 26. Consideration 7: Usage as Data • Any online interaction creates data • The boundary between experiments and tools is potentially fuzzy • Useful experiments can be formatted as a useful tool, and the more useful the tool, the greater the potential data • An important implication and possible advantage is that a tool defines context for the task; the pragmatics is inherent
  • 27. Consideration 8: Mutual Bootstrapping • Mutual bootstrapping: machine learning applied to training data gathered online, which in turn creates processed data which can enable human learning • Social data can be educational (figure label: Chartreuse) • Revisiting approaches to laboratory experiments: if the goals are simplicity, categorization, ambiguity, larger scale and so on, how are the designs different?
  • 28. Questions? Elvis’s favorite color? That would be blue.