Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine

Much computer vision research has focused on natural images, but technical documents typically consist of abstract images, such as charts, drawings, diagrams, and schematics. How well do general web search engines discover abstract images? Recent advancements in computer vision and machine learning have led to the rise of reverse image search engines. Where conventional search engines accept a text query and return a set of document results, including images, a reverse image search accepts an image as a query and returns a set of images as results. This paper evaluates how well common reverse image search engines discover abstract images. We conducted an experiment leveraging images from Wikimedia Commons, a website known to be well indexed by Baidu, Bing, Google, and Yandex. We measure how difficult an image is to find again (retrievability), what percentage of images returned are relevant (precision), and the average number of results a visitor must review before finding the submitted image (mean reciprocal rank). When trying to discover the same image again among similar images, Yandex performs best. When searching for pages containing a specific image, Google and Yandex outperform the others when discovering photographs, with precision scores ranging from 0.8191 to 0.8297. In both of these cases, Google and Yandex perform better with natural images than with abstract ones, with a difference in retrievability as high as 54% between images in these categories. These results affect anyone applying common web search engines to search for technical documents that use abstract images.
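
The three measures named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code. It assumes each query's results have already been reduced to an ordered list of booleans (True where a result is the same image as the query). It also assumes that retrievability is aggregated as the share of queries found within the cutoff, and that mean reciprocal rank follows the conventional definition (the mean of 1/rank of the first match, counting 0 when nothing matches). All function names are hypothetical.

```python
from typing import Optional, Sequence

Results = Sequence[bool]  # one query: True where a result matches the query image


def precision_at_k(matches: Results, k: int) -> float:
    """Fraction of the first k results that are the same image as the query.

    Assumption: missing results (fewer than k returned) count as non-matches.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(matches[:k]) / k


def first_match_rank(matches: Results) -> Optional[int]:
    """1-based rank of the first matching result, or None if nothing matches."""
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            return rank
    return None


def retrievability(all_matches: Sequence[Results], cutoff: int) -> float:
    """Share of query images found at least once within the first `cutoff` results."""
    if not all_matches:
        raise ValueError("at least one query is required")
    found = sum(1 for matches in all_matches if any(matches[:cutoff]))
    return found / len(all_matches)


def mean_reciprocal_rank(all_matches: Sequence[Results]) -> float:
    """Mean of 1/rank of the first match; queries with no match contribute 0."""
    if not all_matches:
        raise ValueError("at least one query is required")
    total = 0.0
    for matches in all_matches:
        rank = first_match_rank(matches)
        if rank is not None:
            total += 1.0 / rank
    return total / len(all_matches)
```

For three queries whose first matches sit at rank 1, at rank 3, and nowhere, the mean reciprocal rank is (1 + 1/3 + 0) / 3, and retrievability at a cutoff of 2 is 1/3 because only the first query finds its image that early.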

  1. Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine. Shawn M. Jones (@shawnmjones) & Diane Oyen, Information Sciences (CCS-3). 2022/10/24. LA-UR-22-30888. Managed by Triad National Security, LLC, for the U.S. Department of Energy’s NNSA.
  2. There are few computer vision research papers focused on querying and retrieving abstract, technical drawings. • Technical documents typically contain abstract images. • Many reasons exist to search for abstract images online: protect intellectual property; build datasets; find evidence for legal cases; establish scholarly evidence; justify funding through image reuse. Image sources: https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png
  3. Major search engines now support reverse image search: Baidu, Bing, Google, and Yandex. Screenshot sources: https://image.baidu.com https://www.bing.com/ https://images.google.com https://yandex.com/images
  4. With each service, a user can upload a query image and receive different types of results: pages-with results (pages containing the uploaded image) and similar-to results (images similar to the uploaded image). Uploaded image source: https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg Screenshot from: https://www.bing.com
  5. Research Question: When using the reverse image search capability of general web search engines, are natural images more easily discovered than abstract images?
  6. To collect query images, we submitted terms to Wikimedia Commons’ API. The terms “diagram” and “schematic” supplied abstract images; the terms “photo” and “photograph” supplied natural images. The four queries returned 100, 100, 100, and 99 images. Previous studies have shown that Wikipedia content has high retrievability. Image sources: • https://commons.wikimedia.org/wiki/File:Galileo_Diagram.jpg • https://commons.wikimedia.org/wiki/File:Complete_neuron_cell_diagram_en.svg • https://commons.wikimedia.org/wiki/File:Bicycle_diagram-es.svg • https://commons.wikimedia.org/wiki/File:Systems_Engineering_V_diagram.jpg Image sources: • https://commons.wikimedia.org/wiki/File:Hvdc_bipolar_schematic.svg • https://commons.wikimedia.org/wiki/File:Beve_gear_schematic.png • https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png • https://commons.wikimedia.org/wiki/File:Carriage-house-2.jpg Image sources: • https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg • https://commons.wikimedia.org/wiki/File:Frank_W._Micklethwaite_photo_of_downtown_Toronto,_1890_-2.jpg • https://commons.wikimedia.org/wiki/File:James_Abram_Garfield,_photo_portrait_seated.jpg • https://commons.wikimedia.org/wiki/File:Wtc-photo.jpg Image sources: • https://commons.wikimedia.org/wiki/File:Adams_The_Tetons_and_the_Snake_River.jpg • https://commons.wikimedia.org/wiki/File:Photographing_sunrise_1745.jpg • https://commons.wikimedia.org/wiki/File:FEMA_-_5399_-_Photograph_by_Andrea_Booher_taken_on_09-28-2001_in_New_York.jpg • https://commons.wikimedia.org/wiki/File:Photographing_a_model.jpg
  7. We then submitted each image to every reverse image search engine, then repeated the process with the next image, and so on. Image sources: https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png Screenshot sources: https://images.google.com https://www.bing.com/ https://image.baidu.com https://yandex.com/images
  8. Using ImageHash’s pHash and GoFigure’s VisHash, we evaluated how often the same image existed in the results. pHash was designed to compare photographs via Discrete Cosine Transforms (DCT). VisHash was designed to compare diagrams and technical drawings by finding shapes in the image. Uploaded images: https://commons.wikimedia.org/wiki/File:Manatee_photo.jpg https://commons.wikimedia.org/wiki/File:Interspiro_DCSC_loop_schematic.png Screenshots source: https://yandex.com/images
  9. Precision differs based on pages-with or similar-to results, with Yandex performing best. Chart legend: blue = abstract images, green = natural images. Precision@k: What percentage of images in the results are the same as the query image if we stop at k results? S. M. Jones and D. Oyen. 2022. “Abstract Images Have Different Levels of Retrievability Per Reverse Image Search Engine,” Proceedings of the 2nd Drawings and abstract Imagery: Representation and Analysis (DIRA) Workshop. (Tel Aviv, Israel).
  10. After reviewing 10 pages-with results, Google shows a retrievability difference of up to 54% between images from the photograph and diagram categories. Chart legend: blue = abstract images, green = natural images. Retrievability: Given a query image, was it retrieved within the cutoff c?
  11. For similar-to results, Yandex consistently provides a high MRR (0.8) for natural images. Google does well with pages-with results. MRR: How many results, on average, across all queries, must a visitor review before finding the same image again?
  12. Key Takeaways • We submitted abstract and natural images from Wikimedia Commons to four major reverse image search engines. • When they do return results, Bing and Baidu do not perform well. • Google does not perform well for similar-to results, likely indicating that its definition of similar-to differs from that of other search engines. • Yandex performs best in all cases. • Yandex and Google consistently perform better for natural images in pages-with results.
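
The slides describe pHash as comparing images via Discrete Cosine Transforms. The sketch below illustrates that general idea in pure Python: keep the low-frequency DCT coefficients, threshold them at their median to get a bit string, and compare two bit strings by Hamming distance. It is a simplified illustration, not ImageHash's implementation and not the authors' pipeline. It assumes the image has already been converted to a square grid of grayscale values (libraries such as ImageHash also handle loading, grayscale conversion, and resizing), it uses an unnormalized DCT-II, and the function names and the `max_distance` threshold of 10 are hypothetical choices. VisHash is not covered.

```python
import math
from statistics import median
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # square grid of grayscale values


def dct_low_frequencies(pixels: Grid, keep: int) -> List[List[float]]:
    """Top-left keep x keep block of the unnormalized 2-D DCT-II of a square grid."""
    n = len(pixels)
    if keep > n or any(len(row) != n for row in pixels):
        raise ValueError("pixels must be a square grid with side >= keep")
    # basis[f][p] = cos(pi * (2p + 1) * f / (2n)) for frequency f and position p
    basis = [[math.cos(math.pi * (2 * p + 1) * f / (2 * n)) for p in range(n)]
             for f in range(keep)]
    # The 2-D DCT is separable: transform along each row, then along each column.
    by_row = [[sum(row[x] * basis[v][x] for x in range(n)) for v in range(keep)]
              for row in pixels]
    return [[sum(by_row[y][v] * basis[u][y] for y in range(n)) for v in range(keep)]
            for u in range(keep)]


def dct_hash(pixels: Grid, keep: int = 8) -> List[bool]:
    """One bit per low-frequency coefficient: True where it exceeds the median."""
    coeffs = [c for row in dct_low_frequencies(pixels, keep) for c in row]
    threshold = median(coeffs)
    return [c > threshold for c in coeffs]


def hamming_distance(a: Sequence[bool], b: Sequence[bool]) -> int:
    """Number of bit positions where two hashes disagree."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def same_image(a: Grid, b: Grid, max_distance: int = 10) -> bool:
    """Treat two images as the same when their hashes differ in few bits.

    The default threshold is an illustrative assumption, not a value from the slides.
    """
    return hamming_distance(dct_hash(a), dct_hash(b)) <= max_distance
```

Because a uniform brightness change only affects the lowest-frequency coefficient, a brightened copy of an image should hash almost identically to the original, which is the kind of tolerance needed when checking whether a search result is the same image as the query.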
