Automated metadata generation projects at Yle - 2017 Selkala, Elina

Media Management Commission Seminar 2017 in Lugano. Day 2 - presentation 2

Published in: Data & Analytics

  1. Automated metadata generation projects at Yle
     Elina Selkälä, Manager, archive publishing and metadata, Yle Archives (elina.selkala@yle.fi)
     FIAT/IFTA Media Management Seminar, Lugano, 8.-9.6.2017
  2. Agenda
     • Yle in a nutshell
     • Yle Archives, collections and materials
     • Production of metadata at Yle
     • What we experimented on: examples of automatic content analysis projects
     • What we learned
     • What is happening next
     • What is the role of the information professional in the age of AI?
  3. This is Yle
     • Public service broadcasting company
     • 3 nationwide television and 6 radio channels, 24 regional radio stations
     • Extensive online presence: yle.fi, svenska.yle.fi, Yle Areena, Yle Elävä arkisto
     • In addition to Finnish and Swedish, broadcasts in 11 languages, e.g. Sami, English and Russian
     • National programming hours per year:
       • 50,000 hours of radio programming
       • 20,000 hours of TV programming
       • 5,000 hours of audio content online
       • 15,000 hours of video content online
  4. Yle Archives
     • Archives and catalogues the radio and TV programmes produced and co-produced by Yle
     • Fosters and curates the archive collections of Yle
     • Offers information services and training for Yle staff
     • Publishes archive material online
     Collections
     • TV and radio materials, photographs, sound effects and music
     • Archived in the Media Asset Management System "Metro" (Avid)
     • Represent an important part of Finnish cultural heritage
     • The archive also holds sheet music, books and online resources, e.g. papers, magazines, databases
  5. Radio and TV Archive collections
     TV materials
     • TV programmes and raw material from 1957 onwards, and film materials from 1906 onwards
     • The collection consists of around 700,000 programmes and clips
     • All Yle productions and co-productions have been systematically archived since 1984
     • Archiving in native digital form since 2009
     • Around 10,000 hours of video content are archived per year
     • Relatively good metadata
     Radio materials
     • Programmes and raw material produced by Yle; the oldest surviving clip is from 1935
     • The collection consists of around 2 million programmes and clips
     • Currently around 10% of radio transmissions are archived (e.g. news and works of art)
     • Archiving in native digital form since the beginning of the 2000s
     • Around 20,000 hours of audio content are archived per year
     • Metadata of varying quality
  6. Metadata production at Yle: archived radio and TV programmes
     • Yle's archive materials are widely reused, both as whole programmes (reruns) and as clips
     • Metadata is incomplete or insufficient for many reasons → hinders findability and safe re-use
     • Alongside the digitization of tape collections, the related programme metadata is updated and improved
     • This is a huge endeavour, so prioritization is needed (most-used material, customer orders)
     • Descriptive metadata is produced manually
     • The work is done by the Archives' information specialists (about 15 people)
  7. Metadata production at Yle: new audio and video content
     • Metadata production is decentralized
       • Metadata is added and stored throughout the production and publishing process
       • Some metadata comes from production and publishing systems; descriptive metadata is filled out manually
       • Done by Yle staff: production coordinators, editors, producers, etc.
     • Company-wide Archiving Policy
       • Defines the responsibilities, the contents to be archived, metadata and formats
     • Growing amount of published content
     • Metadata is used for archiving and reuse purposes, as well as for reporting
     • New needs for metadata: improve discoverability and visibility on
  8. Automated content analysis projects at Yle (Fall 2016)
     • Automated content analysis (virtual) team with participants from different parts of Yle
     • Improve discoverability on web services (Yle Areena)
     • Improve discoverability in archive databases
     • New ways to subtitle video content
     • Management of raw materials and versions
     • The team's goals were to:
       • Learn about AI, machine learning and automatic content analysis methods in theory and practice
       • Carry out pilot projects (PoCs) with some companies
       • Find solutions for automated metadata production in practice
  9. Case 1: Automatic content analysis of TV programmes (1/2)
     Pilot project with Valossa Labs
     Goal
     • Test and evaluate the quality and suitability of automatically produced (descriptive) metadata in Yle's metadata production
     Tested methods
     • Text analysis of subtitles → tagging, annotation
     • Image recognition: object and face recognition
     • OCR of captions
     • Automatic segmentation
  10. Case 1: Automatic content analysis of TV programmes (2/2)
      Results
      • Face recognition works well; object recognition is somewhat unreliable and too detailed
      • Subtitles could also be used for content analysis
      • Automatic segmentation (scenes, inserts) works well
      • The test period was too short, so no experience was gained of the system's learning capabilities
      • Speech recognition alongside image recognition would probably be beneficial, but the tested application did not support this feature
  11. Case 2: Automatic content analysis of audio content (1/2)
      Pilot project with Lingsoft
      Goal
      • Test and evaluate the quality of speech and music recognition and automatic annotation
      Tested methods
      • Speech recognition → textual data for text analysis
      • Automatic annotation and indexing
      • Music recognition (distinguishing music from speech)
  12. Case 2: Automatic content analysis of audio content (2/2)
      Results
      • Audio quality and the speaker's manner of speaking have a significant impact
      • The accuracy of the transcription is sufficient for annotation → relevant keywords, tags
      • Music recognition works fairly well
      • Speaker recognition would be useful, but the tested service did not support this feature
  13. Case 3: Automatic content analysis of Yle Areena content (1/3)
      Pilot project with Qvik, Valossa Labs and Aalto University
      Goal
      • Improve findability and usability of audio and video content in the Yle Areena online service
      Three experiments
      • Speech recognition: time-code based transcriptions of audio files
      • Image / structure recognition: fast forward past opening and closing credits, inserts
      • Text analysis: automatic annotation
      [Diagram labels: Yle Areena content; New functionalities for the end user; Automatic content analysis; Media Metadata]
  14. Case 3: Automatic content analysis of Yle Areena content (2/3)
      Speech-to-text and text analysis
      • Time-coded transcription and automatic annotation of audio and video content
      Results
      • Transcriptions were added to the Yle Areena web page and search engines were able to index them → searching the spoken content became possible
      • Identification of relevant concepts was successful
  15. Case 3: Automatic content analysis of Yle Areena content (3/3)
      Identifying the structure of the content
      • Automatic segmentation and identification of recurrent elements (opening and closing credits)
      • Object recognition
      Results
      • Recurring elements (based on images) and topics (based on subtitling) can be identified → intelligent fast forward is possible (Demo)
      • Object recognition is somewhat unreliable
  16. Lessons learned
      Define needs, requirements, and goals
      • What is needed and who needs it
      • Costs and benefits
      Define how success is measured
      • Evaluation criteria
      Plan how the projects will be carried out
      • Time and other resources
      Cooperation with outside partners
      • Ready-made test material packages
      Contract and copyright issues
      Share your information
  17. Ongoing projects
      Production
      • Robot journalism, Voitto-robotti (pilot project)
      • Automatic annotation of Yle's web articles (in production)
      Publishing
      • Automatic metadata production by speech recognition and image recognition (PoC)
      • Speech recognition in subtitling (PoC)
      Consumption / use
      • Recommendation for Yle Areena content (in production)
      • Yle Uutisvahti application, recommendation engine (in production)
      • Automatic moderation of web discussions (PoC)
      • Deduction of customer demographics (in production)
  18. The information professional's changing role
      What is the role of information professionals in the age of AI?
      • The machine's teacher
      • Quality assessor, quality control manager
      • Curator and valuer of metadata
      • Customer value assessor
      • Publisher of (archived) content
      New skills are needed
      • Understanding of the methods, in order to assess the opportunities available
      • Technical know-how
      The information professional and the machine need to coexist
