
A framework for visual search in broadcasting companies' multimedia archives


Media Management Commission Seminar 2017 in Lugano. Day 2 - presentation 8


  1. A framework for visual search in broadcast archives
     Speakers: Federico Maria Pandolfi, Davide Desirello (Rai Teche)
  2. Introduction
     • Proper organization and management of content is important
     • Efficient search and retrieval methodologies are a must
     • Typical MAM (media asset management) systems: text-based queries, search over textual information and metadata
       • Pros: reliability, robustness
       • Cons: metadata extraction is expensive and time-consuming, and metadata may not be available for every entry
     • No semantic or analytical representation of content
     • No query-by-example or near-duplicate detection
  3. Case study: numbers
     Rai's digital archives include (videos and images as of end of 2015):
     • 1,540,032 hours of video material
     • 102,300 music sheets and documents
     • 18,720 photos of scenic costumes
     • 1,700 photos of set furniture
     • 1,552 photos of Centro Elettronico Rai
     The archive grows by approximately 130,000 hours per year, from both new material and old material that is being digitized.
     Only about 46% is annotated.
  4. Case study: possible IR scenarios
     Archives
     • Correlate non-annotated material with similar pre-annotated content (video-to-video search)
     • Retrieve a specific video or image in the multimedia archive from a clip, a single frame or a similar image (image/video-to-image/video search)
     News
     • Link an edited news item or reportage to its raw footage and vice versa (video-to-video search)
     Web
     • Find a specific show from an image or clip (image/video-to-video search)
  5. The importance of CBIR
     • Content-Based Image Retrieval (CBIR) solutions are necessary
     • Images are represented by features extracted automatically from the content itself; no annotation is needed
     • A large number of CBIR solutions are available
     • Highly customizable to address specific needs (e.g. global/local/DCNN features, many efficient indexing and retrieval options, etc.)
  6. Our approach
     • Issue: many options for image search, few for video-to-video search
     • Issue: cutting-edge solutions offer solid absolute performance but rely on complex systems and/or non-patent-free algorithms
       • Expensive and difficult-to-maintain platforms: not ideal in an enterprise environment
     • Solution: a new framework based on ready-to-use solutions compatible with Rai's enterprise infrastructure
     • Solution: a first approach based on a simpler, open-source solution
     • LIRe (Lucene Image Retrieval):
       • CBIR platform with strong community support
       • Easy to integrate with Apache Solr (widely used in Rai)
       • Easy distributed search, index replication and scalability
  7. Proposed workflow: modularity
     Modules composing the framework (and their implementation):
     • Listener (custom file and folder manager)
     • Scene detector/key-frame extractor (FFmpeg)
     • Feature extractor (CEDD, LIRESOLR Plugin)
     • Indexer (LIRESOLR Plugin)
     • Retriever (LIRESOLR Plugin)
     Goal: implement the workflow as independent logic blocks in order to:
     • Develop code in parallel
     • Easily replace blocks with better or more efficient solutions
     • Allow faster debugging and maintenance
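The modular design above can be illustrated with a minimal Python sketch in which each block is a plain callable that can be swapped independently. The stage signatures and the `Pipeline` class are hypothetical; the slides name the modules but do not specify their interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stage signatures (assumptions, not taken from the slides).
KeyframeExtractor = Callable[[str], List[str]]    # video path -> key-frame image paths
FeatureExtractor = Callable[[str], List[float]]   # image path -> descriptor vector
Indexer = Callable[[str, List[float]], None]      # (image path, descriptor) -> stored


@dataclass
class Pipeline:
    """Chains the three indexing stages; any stage can be replaced on its own."""
    extract_keyframes: KeyframeExtractor
    extract_features: FeatureExtractor
    index: Indexer

    def run(self, video_path: str) -> int:
        """Index every key-frame of one video and return how many were indexed."""
        count = 0
        for frame in self.extract_keyframes(video_path):
            self.index(frame, self.extract_features(frame))
            count += 1
        return count
```

Because each stage is only a function with a fixed signature, a stub can stand in for FFmpeg or for the feature extractor during debugging, and a different descriptor can later replace CEDD without touching the other blocks.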
  8. Proposed workflow: Listener and indexing
     • The chain starts by indexing reference videos in the database, with various entry points:
       • Shared folder
       • RESTful APIs
     • Listener:
       • Is a background process watching a shared folder (container) with files to be indexed
       • Manages the whole flow by giving specific commands to the various components
       • Manages the folder structure
       • Is triggered by a JSON token file containing the file list and the parameters for the indexing process
     • The framework is targeted at image search on video files
       • Scene detection and key-frame extraction with FFmpeg
       • Generation of a CEDD feature descriptor (light, low computational cost) for each key-frame
     • Entries are indexed in Solr, in two cores:
       • ImageCore (ID, URI, Descriptor)
       • MetaCore (other available metadata)
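As a rough sketch of the first two steps, the Python below parses a JSON token and builds an FFmpeg command that keeps only frames at scene changes. The token layout (`files`, `params.scene_threshold`), the default threshold of 0.4 and the output naming are assumptions made for illustration; the slides do not describe the actual token schema or the FFmpeg options used. The `select` filter with the `scene` score and the `-vsync vfr` option are standard FFmpeg features (newer FFmpeg releases prefer `-fps_mode vfr`).

```python
import json
from pathlib import Path
from typing import Dict, List


def read_token(token_text: str) -> Dict:
    """Parse a JSON token (hypothetical layout) into a file list and a threshold."""
    token = json.loads(token_text)
    params = token.get("params", {})
    return {
        "files": list(token["files"]),
        # Assumed default; the real framework's parameters are not documented here.
        "scene_threshold": float(params.get("scene_threshold", 0.4)),
    }


def keyframe_command(video: str, out_dir: str, scene_threshold: float) -> List[str]:
    """Build (but do not run) an FFmpeg command that writes one JPEG per detected scene change."""
    pattern = str(Path(out_dir) / (Path(video).stem + "_%05d.jpg"))
    return [
        "ffmpeg", "-i", video,
        # Keep only frames whose scene-change score exceeds the threshold.
        "-vf", f"select='gt(scene,{scene_threshold})'",
        # Emit frames at a variable rate so dropped frames are not duplicated.
        "-vsync", "vfr",
        pattern,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` by a listener process once a token file appears in the watched folder.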
  9. Proposed workflow: modularity
     [Diagram. Labels: sources (TV, Radio, Tape, Other); Video + JSON; Watch Folder; JSON Listener; Keyframe Extractor; Feature Extractor; Indexer; Index (Solr)]
  10. Proposed workflow: retrieval
     Simple retrieval algorithm:
     1. Computation of the query image descriptor
     2. A descriptor-specific distance is evaluated for each entry in the database
     3. LIRe tweakable parameters:
        • Accuracy
        • Number of candidates
     4. Results are sorted by relevance, using the distance as score
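The retrieval steps above amount to a brute-force nearest-neighbour search, sketched below in Python. The Tanimoto distance is used here as the descriptor-specific distance because it is commonly paired with CEDD; that pairing, the function names and the in-memory dictionary index are assumptions for illustration and do not reproduce the LIRe or Solr implementation (in particular, the accuracy and candidate-count parameters are not modelled).

```python
from typing import Dict, List, Sequence, Tuple


def tanimoto_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - Tanimoto coefficient; 0.0 means identical descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        return 0.0  # both descriptors are all zeros, treat them as identical
    return 1.0 - dot / denom


def search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every indexed descriptor against the query and return the closest top_k."""
    scored = [(doc_id, tanimoto_distance(query, desc)) for doc_id, desc in index.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more relevant
    return scored[:top_k]
```

A query descriptor that is already in the index comes back first with distance 0, which matches the behaviour expected when the query shot itself has been indexed.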
  11. Preliminary evaluation
     How to test the framework?
     • There is a lack of copyright-free datasets and evaluation frameworks that target our specific use case (to use as reference)
     • It is impossible to perform image search on the whole of Rai’s archive, so two datasets were selected (not annotated):
       • TG Leonardo (2200 episodes, approx. 360 hours): thematic, science-focused newscast, suitable for news/reportage and raw footage retrieval
       • Medita (2000 episodes, approx. 2000 hours): educational show, suitable for testing pure image search and tagging-aid capabilities
     • Query images were extracted from the indexed videos using different techniques:
       • FFmpeg shot detection
       • Rai’s Shotfinder
  12. Preliminary evaluation
     The best match is not always found among the very first results
     • CEDD is a very compact descriptor; images with similar colours and textures may have very similar descriptors
     • Raising the accuracy parameter increases retrieval time and gives only slightly better results
     It is difficult to evaluate precision and recall for query images that differ from the indexed images (datasets not annotated yet)
     • If the query shot is indexed, P@1 ≃ 1; otherwise the distance increases substantially
     • This might be good enough for the raw footage/final edit matching use case
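For reference, precision at rank k (written P@k, so P@1 is the fraction of queries whose first result is correct) can be computed as in the Python sketch below. The function name and the example identifiers are hypothetical; the slides report no per-query data, so none is reproduced here.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the first k retrieved ids that belong to the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k
```

With un-annotated datasets the only ground truth available is the shot from which the query frame was taken, which is why the measure is meaningful only when the query shot has been indexed.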
  13. Conclusions
     • The framework is not yet able to find instances of the same objects within different videos and under different conditions (e.g. different video quality, framing, etc.); no semantic search
       • This might be because of CEDD and, in general, global descriptors
       • Compact global descriptors may be good for specific tasks, but a more semantic approach is required
     • The quantitative tests presented are not mature yet
       • Making a proper dataset requires time, and our framework is still at an early stage of development
       • We plan to build our own annotated dataset using the company’s archive material
  14. Future work
     Creation of a new annotated dataset containing raw and edited material
     Evaluation of better key-frame extraction and shot detection algorithms:
     • Reduce the number of extracted key-frames
     • Weight key-frames according to their relevance within the related sequence
     • Improve retrieval performance, decrease index size, reduce disk usage and speed up search times
     Evaluation of more sophisticated feature extraction algorithms (local features, BoVW, DCNN feature vectors, ...)
     • In some cases a semantic search (based on image content) might be more useful
  15. Thank you for watching
     F. M. Pandolfi, D. Desirello (Rai Teche)