SlideShare una empresa de Scribd logo
1 de 1
Descargar para leer sin conexión
One of These Things is Not Like the Other:
      Crowdsourcing Semantic Similarity of Multimedia Files
                                        Raynor    Vliegendhart *,                     Martha               Larson *,   and Johan                   Pouwelse**

                                      Multimedia Information Retrieval Lab*                                  Parallel and Distributed Systems Group**
                                         Delft University of Technology                                          Delft University of Technology

 Problem                                                                                                      HIT Design
 ● Problem: What constitutes a near duplicate?                                                                Amazon Mechanical Turk (AMT) is a crowdsourcing platform
    For example: Are these two files the same? Why (not)?                                                     to which Human Intelligence Tasks (HITs) can be submitted.
                                                                                                              Phrasing in our HIT is important in order to elicit serious judgments:
                                                                                                              ● “Imagine that you download the three items in the list and that
                                                                                                                you view them.”

      Chrono Cross - 'Dream of the                  Chrono Cross Dream of the
       Shore Near Another World'                    Shore Near Another World                                                 Harry Potter and the Sorcerers Stone Audio
                                                                                                                             Book (478 MB)
          Violin/Piano Cover                             Violin and Piano
                                                                                                                             Harry Potter and the Sorcerer s Stone
             (YouTube: IQYNEj51EUI)                      (YouTube: Iuh3YrJtK3M)                                              (2001)(ENG GER NL) 2Lions- (4.36 GB)

                                                                                                                             Harry Potter.And.The.Sorcerer.Stone.DVDR.
    Yes:   It’s the same song.                                                                                               NTSC.SKJACK.Universal.S (4.46 GB)

    No:    These are different performances by different performers.

    Definition:                                                                                               ● Don’t force workers to make a contrast, and
    Functional near-duplicate multimedia items are items that fulfill the
                                                                                                              ● Explain the definition of functional similarity.
    same purpose for the user. Once the user has one of these items,
    there is no additional need for another.
                                                                                                                   o The items are comparable. They are for all practical purposes the
                                                                                                                     same. Someone would never really need all three of these.
 ● Task: Discovering new notions of user-perceived similarity between                                              o Each item can be considered unique. I can imagine that someone
   multimedia files in a file-sharing setting.                                                                       might really want to download all three of these items.

 ● Motivation: Clustering items in search results.                                                                 o One item is not like the other two. (Please mark that item in the list.)
                                                                                                                     The other two items are comparable.




                                                                                                              Experiments
                                                                                                              ● Dataset:
                                                                                                                   ● Popular file-sharing site: The Pirate Bay (thepiratebay.se).
                                                              Screenshots from Tribler 5.4 (tribler.org)

                                                                                                                   ● 75 queries derived from Top 100 list.
                                                                                                                   ● 32,773 filenames and metadata.

 Approach                                                                                                          ● 1000 random triads sampled from search results.
                                                                                                              ● Crowdsourcing Experiment:
 ● Idea: Point the odd one out, inspired by Sesame Street’s
   “one of these things is not like the other”.                                                                    ● Recruitment HIT and Main HIT run concurrently on AMT.
                                                                                                                   ● 8 out of 14 qualified workers produced free-text judgments
                                                                                                                     for 308 triads within 36 hours.
                                                                                                              ● Card Sort:
                                                                                                                   ● Group similar judgments into piles,
                                                                                                                     merge piles iteratively, and, finally
                                                                                                                     label each pile.
                                                                                                                   ● End result: 44 user-perceived
                                                                                                                     dimensions of similarity discovered.

 ● Crowdsourcing Task:
      ● 3 multimedia files displayed as search results
                                                                                                              Conclusion
      ● Worker points the odd one out and justifies why.
                                                                                                              ● Wealth of user-perceived dimensions of similarity discovered.
 ● Challenge: Eliciting serious judgments                                                                     ● Quick results due to interesting crowdsourcing task.




Contact: R.Vliegendhart@tudelft.nl                                                                                                        ICT.OPEN 2012, Rotterdam, The Netherlands, 2012
         @ShinNoNoir

Más contenido relacionado

Más de CUbRIK Project

Matching Game Mechanics and Human Computation Tasks in Games with a Purpose
Matching Game Mechanics and Human Computation Tasks in Games with a PurposeMatching Game Mechanics and Human Computation Tasks in Games with a Purpose
Matching Game Mechanics and Human Computation Tasks in Games with a PurposeCUbRIK Project
 
Humanist machine interaction with histoGraph
Humanist machine interaction with histoGraphHumanist machine interaction with histoGraph
Humanist machine interaction with histoGraphCUbRIK Project
 
histoGraph presented to MMSP 2013
histoGraph presented to MMSP 2013histoGraph presented to MMSP 2013
histoGraph presented to MMSP 2013CUbRIK Project
 
histoGraph for historians
histoGraph for historianshistoGraph for historians
histoGraph for historiansCUbRIK Project
 
histoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital HumanitieshistoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital HumanitiesCUbRIK Project
 
CUbRIK research on social aspects
CUbRIK research on social aspectsCUbRIK research on social aspects
CUbRIK research on social aspectsCUbRIK Project
 
Building a social graph for the history of Europe: the CUbRIK histoGraph
Building a social graph for the history of Europe: the CUbRIK histoGraphBuilding a social graph for the history of Europe: the CUbRIK histoGraph
Building a social graph for the history of Europe: the CUbRIK histoGraphCUbRIK Project
 
The CUbRIK histoGraph Factsheet
The CUbRIK histoGraph FactsheetThe CUbRIK histoGraph Factsheet
The CUbRIK histoGraph FactsheetCUbRIK Project
 
CUbRIK Fashion Trend Analysis: a Business Intelligence Application
CUbRIK Fashion Trend Analysis: a Business Intelligence ApplicationCUbRIK Fashion Trend Analysis: a Business Intelligence Application
CUbRIK Fashion Trend Analysis: a Business Intelligence ApplicationCUbRIK Project
 
CUbRIK Social Graph Visual Interface
CUbRIK Social Graph Visual InterfaceCUbRIK Social Graph Visual Interface
CUbRIK Social Graph Visual InterfaceCUbRIK Project
 
Mining Emotions in Short Films: User Comments or Crowdsourcing?
Mining Emotions in Short Films: User Comments or Crowdsourcing?Mining Emotions in Short Films: User Comments or Crowdsourcing?
Mining Emotions in Short Films: User Comments or Crowdsourcing?CUbRIK Project
 
CUbRIK and gaming experience@Qualinet
CUbRIK and gaming experience@QualinetCUbRIK and gaming experience@Qualinet
CUbRIK and gaming experience@QualinetCUbRIK Project
 
CUbRIK: Open Box. Multimedia and Human Computation approach
CUbRIK: Open Box. Multimedia and Human Computation approachCUbRIK: Open Box. Multimedia and Human Computation approach
CUbRIK: Open Box. Multimedia and Human Computation approachCUbRIK Project
 
ICT 2013: Better Society: empowering Horizon 2020 with trustable social media
ICT 2013: Better Society: empowering Horizon 2020 with trustable social mediaICT 2013: Better Society: empowering Horizon 2020 with trustable social media
ICT 2013: Better Society: empowering Horizon 2020 with trustable social mediaCUbRIK Project
 
How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...
How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...
How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...CUbRIK Project
 
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...CUbRIK Project
 
CUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a Purpose
CUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a PurposeCUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a Purpose
CUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a PurposeCUbRIK Project
 
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human ComputationCUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human ComputationCUbRIK Project
 
Semantic schema for geonames
Semantic schema for geonamesSemantic schema for geonames
Semantic schema for geonamesCUbRIK Project
 

Más de CUbRIK Project (20)

Matching Game Mechanics and Human Computation Tasks in Games with a Purpose
Matching Game Mechanics and Human Computation Tasks in Games with a PurposeMatching Game Mechanics and Human Computation Tasks in Games with a Purpose
Matching Game Mechanics and Human Computation Tasks in Games with a Purpose
 
Humanist machine interaction with histoGraph
Humanist machine interaction with histoGraphHumanist machine interaction with histoGraph
Humanist machine interaction with histoGraph
 
histoGraph presented to MMSP 2013
histoGraph presented to MMSP 2013histoGraph presented to MMSP 2013
histoGraph presented to MMSP 2013
 
histoGraph for historians
histoGraph for historianshistoGraph for historians
histoGraph for historians
 
histoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital HumanitieshistoGraph: a case study in Digital Humanities
histoGraph: a case study in Digital Humanities
 
SMILA in CUbRIK
SMILA in CUbRIKSMILA in CUbRIK
SMILA in CUbRIK
 
CUbRIK research on social aspects
CUbRIK research on social aspectsCUbRIK research on social aspects
CUbRIK research on social aspects
 
Building a social graph for the history of Europe: the CUbRIK histoGraph
Building a social graph for the history of Europe: the CUbRIK histoGraphBuilding a social graph for the history of Europe: the CUbRIK histoGraph
Building a social graph for the history of Europe: the CUbRIK histoGraph
 
The CUbRIK histoGraph Factsheet
The CUbRIK histoGraph FactsheetThe CUbRIK histoGraph Factsheet
The CUbRIK histoGraph Factsheet
 
CUbRIK Fashion Trend Analysis: a Business Intelligence Application
CUbRIK Fashion Trend Analysis: a Business Intelligence ApplicationCUbRIK Fashion Trend Analysis: a Business Intelligence Application
CUbRIK Fashion Trend Analysis: a Business Intelligence Application
 
CUbRIK Social Graph Visual Interface
CUbRIK Social Graph Visual InterfaceCUbRIK Social Graph Visual Interface
CUbRIK Social Graph Visual Interface
 
Mining Emotions in Short Films: User Comments or Crowdsourcing?
Mining Emotions in Short Films: User Comments or Crowdsourcing?Mining Emotions in Short Films: User Comments or Crowdsourcing?
Mining Emotions in Short Films: User Comments or Crowdsourcing?
 
CUbRIK and gaming experience@Qualinet
CUbRIK and gaming experience@QualinetCUbRIK and gaming experience@Qualinet
CUbRIK and gaming experience@Qualinet
 
CUbRIK: Open Box. Multimedia and Human Computation approach
CUbRIK: Open Box. Multimedia and Human Computation approachCUbRIK: Open Box. Multimedia and Human Computation approach
CUbRIK: Open Box. Multimedia and Human Computation approach
 
ICT 2013: Better Society: empowering Horizon 2020 with trustable social media
ICT 2013: Better Society: empowering Horizon 2020 with trustable social mediaICT 2013: Better Society: empowering Horizon 2020 with trustable social media
ICT 2013: Better Society: empowering Horizon 2020 with trustable social media
 
How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...
How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...
How Do We Deep-Link? Leveraging User-Contributed Time-Links for Non-Linear Vi...
 
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
CUbRIK Research at CIKM 2012: Efficient Jaccard-based Diversity Analysis of L...
 
CUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a Purpose
CUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a PurposeCUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a Purpose
CUbRIK Tutorial at ICWE 2013: part 2 - Introduction to Games with a Purpose
 
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human ComputationCUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
CUbRIK tutorial at ICWE 2013: part 1 Introduction to Human Computation
 
Semantic schema for geonames
Semantic schema for geonamesSemantic schema for geonames
Semantic schema for geonames
 

Último

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 

Último (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

ICT.OPEN2012 - One of These Things is Not Like the Other: Crowdsourcing Semantic Similarity of Multimedia Files

  • 1. One of These Things is Not Like the Other: Crowdsourcing Semantic Similarity of Multimedia Files Raynor Vliegendhart *, Martha Larson *, and Johan Pouwelse** Multimedia Information Retrieval Lab* Parallel and Distributed Systems Group** Delft University of Technology Delft University of Technology Problem HIT Design ● Problem: What constitutes a near duplicate? Amazon Mechanical Turk (AMT) is a crowdsourcing platform For example: Are these two files the same? Why (not)? to which Human Intelligence Tasks (HITs) can be submitted. Phrasing in our HIT is important in order to elicit serious judgments: ● “Imagine that you download the three items in the list and that you view them.” Chrono Cross - 'Dream of the Chrono Cross Dream of the Shore Near Another World' Shore Near Another World Harry Potter and the Sorcerers Stone Audio Book (478 MB) Violin/Piano Cover Violin and Piano Harry Potter and the Sorcerer s Stone (YouTube: IQYNEj51EUI) (YouTube: Iuh3YrJtK3M) (2001)(ENG GER NL) 2Lions- (4.36 GB) Harry Potter.And.The.Sorcerer.Stone.DVDR. Yes: It’s the same song. NTSC.SKJACK.Universal.S (4.46 GB) No: These are different performances by different performers. Definition: ● Don’t force workers to make a contrast, and Functional near-duplicate multimedia items are items that fulfill the ● Explain the definition of functional similarity. same purpose for the user. Once the user has one of these items, there is no additional need for another. o The items are comparable. They are for all practical purposes the same. Someone would never really need all three of these. ● Task: Discovering new notions of user-perceived similarity between o Each item can be considered unique. I can imagine that someone multimedia files in a file-sharing setting. might really want to download all three of these items. ● Motivation: Clustering items in search results. o One item is not like the other two. (Please mark that item in the list.) The other two items are comparable. Experiments ● Dataset: ● Popular file-sharing site: The Pirate Bay (thepiratebay.se). Screenshots from Tribler 5.4 (tribler.org) ● 75 queries derived from Top 100 list. ● 32,773 filenames and metadata. Approach ● 1000 random triads sampled from search results. ● Crowdsourcing Experiment: ● Idea: Point the odd one out, inspired by Sesame Street’s “one of these things is not like the other”. ● Recruitment HIT and Main HIT run concurrently on AMT. ● 8 out of 14 qualified workers produced free-text judgments for 308 triads within 36 hours. ● Card Sort: ● Group similar judgments into piles, merge piles iteratively, and, finally label each pile. ● End result: 44 user-perceived dimensions of similarity discovered. ● Crowdsourcing Task: ● 3 multimedia files displayed as search results Conclusion ● Worker points the odd one out and justifies why. ● Wealth of user-perceived dimensions of similarity discovered. ● Challenge: Eliciting serious judgments ● Quick results due to interesting crowdsourcing task. Contact: R.Vliegendhart@tudelft.nl ICT.OPEN 2012, Rotterdam, The Netherlands, 2012 @ShinNoNoir