This paper presents the results of LinkedTV's first participation in the Search and Hyperlinking task of the MediaEval 2013 challenge. We used textual information (transcripts, subtitles and metadata) and tested its combination with automatically detected visual concepts. We therefore submitted several runs to compare different approaches and to measure the improvement obtained by adding visual information.
LinkedTV @ MediaEval 2013 Search and Hyperlinking Task
1. Television Linked To The Web
LinkedTV @ MediaEval
Search and Hyperlinking
M. Sahuguet1, B. Huet1, B. Cervenková2, E. Apostolidis4, V. Mezaris4, D. Stein3,
S. Eickeler3, J.L. Redondo Garcia1, R. Troncy1, and L. Pikora2
MediaEval 2013 Workshop
Barcelona, Catalunya, Spain, 18-19 October 2013.
www.linkedtv.eu
2. LinkedTV ― Television Linked To the Web
LinkedTV: interweaving Web and TV into a single experience
Second screen scenario for enriching television content and achieving interaction between user and content
Web: http://www.linkedtv.eu
LinkedTV @ MediaEval Search and Hyperlinking 2013
10/18/2013
3. LinkedTV@MediaEval
MediaEval Search & Hyperlinking:
an overview of LinkedTV’s enrichment process
Brainstorming
Pre-processing (BBC dataset)
Video segmentation
Indexing data in Lucene
From visual cues to detected concepts
Search task
Hyperlinking task
Conclusion
4. Brainstorming
Brainstorming meeting: Tasks and Dataset analysis
Shots are too short to return to the user
Typos in the queries
Duplicate videos in the dataset
Visual concepts are not usable as such
Visual cues may not be helpful
Visual cues can also help as search terms
Maybe we can segment the videos differently?
Can we use speaker information?
Name of show/channel may appear in the query
Actor/character names may appear too
What analysis can we further apply on videos?
5. Brainstorming
Brainstorming meeting: Tasks and Dataset analysis
Search:
Getting the right video is possible
Need to extract segment with good timing
Segmentation level is of major importance
Shots are too short
We want to be as close as possible to the viewer
Visual cues: not always helpful
<visualQueues>2 men sitting opposite each other</visualQueues>
<visualQueues>stands out and grabs your attention</visualQueues>
Need to design a framework to use Visual Cues
How can the LinkedTV media analysis tools be used?
6. Pre-processing dataset
Processing ~1697 h of BBC video data
Visual concept detection (151 concepts): 20 days on 100 cores
Scene segmentation (CERTH): 2 days on 6 cores
OCR (Fraunhofer): 1 day on 10 cores
Keyword extraction (Fraunhofer): 5 hours
Named entity extraction (Eurecom): 4 days
Face detection and tracking (CERTH, Eurecom): 4 days on 160 cores
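To put these figures in proportion, the sketch below converts wall-clock time and core counts into CPU core-hours per hour of video. It assumes every listed core was busy for the whole duration, which the slide does not state; keyword and named entity extraction are left out because no core count is given.

```python
# Figures taken from the slide; the "all cores busy" assumption is ours.
VIDEO_HOURS = 1697  # approximate size of the BBC collection (~1697 h)

# task -> (wall-clock days, number of cores)
JOBS = {
    "visual concept detection": (20, 100),
    "scene segmentation": (2, 6),
    "OCR": (1, 10),
    "face detection and tracking": (4, 160),
}


def core_hours_per_video_hour(days: float, cores: int,
                              video_hours: float = VIDEO_HOURS) -> float:
    """CPU core-hours spent per hour of processed video."""
    return days * 24 * cores / video_hours


if __name__ == "__main__":
    for task, (days, cores) in JOBS.items():
        print(f"{task}: {core_hours_per_video_hour(days, cores):.2f} core-hours per video hour")
```

Under this assumption, concept detection costs roughly 28 core-hours per video hour, about three times the face detection budget (roughly 9).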
7. Video Segmentation
Shots (provided by Task Organisers)
Scenes: groups of adjacent shots
Visual similarity
Temporal consistency
P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, and I. Trancoso. Temporal Video Segmentation to Scenes Using High-Level Audiovisual Features. IEEE Transactions on Circuits and Systems for Video Technology, 2011.
Sliding windows: inspired by M. Eskevich, G. Jones, C. Wartena, M. Larson, R. Aly, T. Verschoor, and R. Ordelman. Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search. 10th International Workshop on Content-Based Multimedia Indexing (CBMI), 2012.
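The sliding-window segmentation can be sketched as cutting a timed transcript into fixed-length, overlapping windows. This is a minimal illustration, not the LinkedTV code: the `Word`/`Segment` types are hypothetical, and the 60 s window and 30 s step are placeholder values, since the slides do not give the parameters used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    start: float  # seconds from the beginning of the video
    text: str


@dataclass
class Segment:
    start: float
    end: float
    text: str


def sliding_windows(words: List[Word], duration: float,
                    window: float = 60.0, step: float = 30.0) -> List[Segment]:
    """Cut a timed transcript into overlapping windows of `window` seconds.

    `window` and `step` are hypothetical defaults; a new window starts every
    `step` seconds and the last one is clipped to the video duration.
    """
    segments: List[Segment] = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        # a word belongs to every window in which it starts
        text = " ".join(w.text for w in words if start <= w.start < end)
        segments.append(Segment(start, end, text))
        if end >= duration:
            break
        start += step
    return segments
```

Because windows overlap, a passage near a window boundary is still fully contained in a neighbouring window, which is the usual motivation for overlapping rather than disjoint segments.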
8. Indexing data in Lucene
Lucene engine for indexing the data
Index at different temporal granularities:
Video level (pre-filtering)
Scenes level
Shot level
Sliding windows segments level
Index different features at each temporal granularity:
Text (transcripts, subtitles)
Metadata (title, synopsis, cast, etc)
OCR
Visual concepts values (floating point fields)
Design a framework for querying the indexes and returning video segments for a given query
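A toy, in-memory stand-in for the multi-granularity indexing can make the idea concrete: every temporal unit (video, scene, shot, window) becomes a document carrying the same fields, and a query targets one granularity at a time. This is a sketch only; the class and field names are hypothetical, and the raw term-count ranking replaces Lucene's TF-IDF scoring for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class SegmentDoc:
    video_id: str
    granularity: str     # "video", "scene", "shot" or "window"
    start: float         # seconds
    end: float
    transcript: str = ""
    metadata: str = ""   # title, synopsis, cast, ...
    ocr: str = ""
    concepts: Dict[str, float] = field(default_factory=dict)  # concept -> confidence


class MultiGranularityIndex:
    """Toy stand-in for one Lucene index per temporal granularity."""

    def __init__(self) -> None:
        self.indexes: Dict[str, List[SegmentDoc]] = {}

    def add(self, doc: SegmentDoc) -> None:
        self.indexes.setdefault(doc.granularity, []).append(doc)

    def search(self, granularity: str, terms: Iterable[str],
               video_ids: Optional[Set[str]] = None) -> List[SegmentDoc]:
        """Rank one granularity's documents by raw term counts (not TF-IDF)."""
        query = [t.lower() for t in terms]
        hits: List[Tuple[int, SegmentDoc]] = []
        for doc in self.indexes.get(granularity, []):
            if video_ids is not None and doc.video_id not in video_ids:
                continue  # optional restriction to pre-selected videos
            tokens = f"{doc.transcript} {doc.metadata} {doc.ocr}".lower().split()
            score = sum(tokens.count(t) for t in query)
            if score > 0:
                hits.append((score, doc))
        hits.sort(key=lambda hit: hit[0], reverse=True)  # stable for ties
        return [doc for _, doc in hits]
```

Keeping the field layout identical across granularities is what lets the same query code run against scenes, shots or windows.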
9. From visual cues to detected concepts
Text search is straightforward (Lucene's default TF-IDF scoring)
Need to incorporate visual information into the search
10. From visual cues to detected concepts
Which concepts are present in the query?
semantic word distance based on WordNet synsets
mapping between keywords (extracted from the visual cues query) and visual concepts
<visualQueues>animals, kenya wildlife reserve, marathon</visualQueues>
mapped visual concepts: Athlete, Dogs, Horse, Animal
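The keyword-to-concept mapping can be illustrated with a Wu-Palmer similarity over an is-a hierarchy. The slides only say that a WordNet synset distance was used; the exact measure is our assumption here, and to keep the example self-contained a tiny hand-made hierarchy (hypothetical) stands in for WordNet, so it does not reproduce the slide's mapping exactly. Keywords are assumed to be already lemmatized ("animals" becomes "animal").

```python
from typing import Dict, List, Optional

# Hypothetical miniature is-a hierarchy standing in for WordNet hypernym paths.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "organism": "entity",
    "animal": "organism",
    "dog": "animal",
    "horse": "animal",
    "person": "organism",
    "athlete": "person",
    "event": "entity",
    "race": "event",
    "marathon": "race",
}

# Detector label -> node in the hierarchy (labels from the slide example).
CONCEPTS = {"Animal": "animal", "Dogs": "dog", "Horse": "horse", "Athlete": "athlete"}


def path_to_root(node: str) -> List[str]:
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path  # node first, root last


def depth(node: str) -> int:
    return len(path_to_root(node))  # the root has depth 1


def wu_palmer(a: str, b: str) -> float:
    """2 * depth(LCS) / (depth(a) + depth(b)), LCS = deepest common ancestor."""
    ancestors_b = set(path_to_root(b))
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def map_keywords(keywords: List[str], threshold: float = 0.8) -> List[str]:
    """Select every concept that is close enough to at least one keyword."""
    return [label for label, node in CONCEPTS.items()
            if any(k in PARENT and wu_palmer(k, node) >= threshold for k in keywords)]
```

With this toy hierarchy, `map_keywords(["animal", "marathon"])` selects Animal, Dogs and Horse (similarity 1.0, 6/7 and 6/7), while Athlete stays below the hypothetical 0.8 threshold.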
11. From visual cues to detected concepts
Integration of detected visual concepts into the Lucene search:
Concept filtering
12. From visual cues to detected concepts
Concept filtering, first results (correct detection rate over the first 100 results):
- threshold at 0.5
- normalize confidence, then threshold at 0.7
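The two filtering settings can be sketched as follows: either threshold the raw detector confidences at 0.5, or first rescale each concept's scores to [0, 1] over the collection and threshold at 0.7. Both the min-max normalization and the rule "keep a segment if at least one selected concept passes" are assumptions; the slide does not specify either.

```python
from typing import Dict, List


def min_max_normalize(scores: List[float]) -> List[float]:
    """Rescale one concept's scores over the collection to [0, 1] (assumed scheme)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def filter_segments(concept_scores: Dict[str, List[float]], concepts: List[str],
                    threshold: float, normalize: bool) -> List[int]:
    """Indices of segments where at least one selected concept reaches the threshold."""
    keep = set()
    for concept in concepts:
        scores = concept_scores[concept]
        if normalize:
            scores = min_max_normalize(scores)
        keep |= {i for i, s in enumerate(scores) if s >= threshold}
    return sorted(keep)
```

For example, with Animal scores `[0.1, 0.3, 0.6, 0.9]` and Horse scores `[0.2, 0.2, 0.2, 0.8]`, the raw 0.5 threshold keeps segments 2 and 3, while normalization plus the 0.7 threshold keeps only segment 3.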
13. From visual cues to detected concepts
Integration of detected visual concepts into the Lucene search:
Concept selection
Designing an enriched query: both textual (text query) and visual information (range query)
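An enriched query of this kind can be written in Lucene's classic query-parser syntax, where `+` marks a required clause and `field:[a TO b]` is an inclusive range. In this sketch the field names (`transcript`, `concept_<Label>`) are hypothetical, and making the text clause mandatory while the concept ranges only add to the score is our design assumption. In a real Lucene index, float fields usually need a numeric range query built through the API rather than parsed from a string, and user text should be escaped (e.g. with the classic `QueryParser.escape`).

```python
from typing import List


def enriched_query(text: str, concepts: List[str], threshold: float = 0.7,
                   text_field: str = "transcript") -> str:
    """Mandatory text clause plus one optional range clause per selected concept.

    Field names are hypothetical; `text` is assumed to be already escaped.
    """
    clauses = [f"+{text_field}:({text})"]  # '+' makes the text clause required
    for concept in concepts:
        # inclusive range on the concept's confidence field
        clauses.append(f"concept_{concept}:[{threshold} TO 1.0]")
    return " ".join(clauses)
```

For instance, `enriched_query("odd cars", ["Animal", "Horse"])` yields `+transcript:(odd cars) concept_Animal:[0.7 TO 1.0] concept_Horse:[0.7 TO 1.0]`.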
14. Search task
Search videos at different temporal granularities
Concatenation of the textual and visual queries for text search
<queryText>Odd cars, Fake MacLaren, </queryText>
<visualQueues>Jeremy Clarkson, Richard Hammond, James May, Ferrari 430 Scuderia</visualQueues>
Visual cues can be found in queryText too
If a TV channel is mentioned, filter on it:
<visualQueues>Cannabis on BBC ONE</visualQueues>
The same should also be done for show titles (for next year?)
For some runs, filter at video level first
Run a text query on the video-level index
Use the top 20 videos for segment search
Focused search
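The query handling above can be sketched in three steps: concatenate `<queryText>` and `<visualQueues>` into one text query, detect a channel name, and keep only segments from the top 20 videos of a video-level search (restricted to the channel when one is mentioned). The `<top>` wrapper element, the channel list and the function names are hypothetical; only the two query tags and the top-20 cut come from the slides.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hypothetical list of channel names to look for in queries.
CHANNELS = ["BBC ONE", "BBC TWO", "BBC THREE", "BBC FOUR"]

Segment = Tuple[str, float, float]  # (video_id, start_s, end_s)


def build_text_query(query_xml: str) -> str:
    """Concatenate <queryText> and <visualQueues> into one text query."""
    root = ET.fromstring(query_xml)
    parts = [root.findtext(tag, default="") for tag in ("queryText", "visualQueues")]
    return " ".join(p.strip(" ,") for p in parts if p.strip(" ,"))


def detect_channel(text: str) -> Optional[str]:
    """Return the first known channel name mentioned in the text, if any."""
    for channel in CHANNELS:
        if re.search(r"\b" + re.escape(channel) + r"\b", text, flags=re.IGNORECASE):
            return channel
    return None


def focused_search(ranked_videos: List[str], ranked_segments: List[Segment],
                   video_channel: Dict[str, str], channel: Optional[str] = None,
                   top_n: int = 20) -> List[Segment]:
    """Keep segment hits whose video is among the top_n video-level hits."""
    if channel is not None:
        ranked_videos = [v for v in ranked_videos if video_channel.get(v) == channel]
    allowed = set(ranked_videos[:top_n])
    return [seg for seg in ranked_segments if seg[0] in allowed]
```

The segment ranking itself is left untouched; the video-level pass only removes segments from videos that are unlikely to be relevant.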
15. Search task
Different granularities:
scenes
partial scenes (begin at the matching shot, end at the end of the corresponding scene)
temporally clustered shots (inside a video)
sliding windows
Different textual data (transcript/ASR)
With/without visual concepts
With/without synonyms
9 runs
Goal: compare approaches and features
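Two of these granularities are easy to state in code: a partial scene runs from a matching shot to the end of the scene that contains it, and temporal clustering merges matching shots of one video that lie close together. A minimal sketch; the 10-second `max_gap` is a hypothetical value, as the slides do not say how shots were clustered.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def partial_scene(shot: Interval, scenes: List[Interval]) -> Interval:
    """Segment starting at the matching shot and ending where its scene ends."""
    start = shot[0]
    for scene_start, scene_end in scenes:
        if scene_start <= start < scene_end:
            return (start, scene_end)
    return shot  # shot not covered by any scene: fall back to the shot itself


def cluster_shots(shots: List[Interval], max_gap: float = 10.0) -> List[Interval]:
    """Merge matching shots of one video separated by at most max_gap seconds."""
    clusters: List[Interval] = []
    for start, end in sorted(shots):
        if clusters and start - clusters[-1][1] <= max_gap:
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters
```

For example, matching shots at 0-10 s, 15-20 s, 50-60 s and 100-110 s are grouped into three segments: 0-20 s, 50-60 s and 100-110 s.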
17. Hyperlinking Task
Re-use of the search component
Shot clustering approach
Scene approach
Create a query from the anchor!
Get the subtitles and shots aligned with the anchor
Text query: extract keywords using Alchemy API (higher weight for the anchor than for the context)
Visual cues query: for each concept, the highest score over all shots
Sliding window approach
Create temporary documents from the anchor!
Use of the “MoreLikeThis” (MLT) feature in Lucene, combined with THD
THD = Targeted Hypernym Discovery (UEP): returns semantic annotations and synonyms
MLT: finds documents similar to an input document
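Building a query from the anchor can be sketched as follows: the visual part keeps, for each concept, its maximum score over the anchor's shots, and the text part merges keyword relevances with a larger weight for the anchor than for its context. The keyword dictionaries are assumed to come from a keyword extractor (the slides used Alchemy API, whose call is not shown); the 2:1 weights and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def visual_query(shot_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """For each concept, keep its highest score over all shots of the anchor."""
    best: Dict[str, float] = {}
    for scores in shot_scores:
        for concept, value in scores.items():
            best[concept] = max(value, best.get(concept, float("-inf")))
    return best


def weighted_keywords(anchor: Dict[str, float], context: Dict[str, float],
                      anchor_weight: float = 2.0,
                      context_weight: float = 1.0) -> Dict[str, float]:
    """Merge keyword relevances, weighting the anchor above its context."""
    merged: Dict[str, float] = defaultdict(float)
    for keyword, relevance in anchor.items():
        merged[keyword] += anchor_weight * relevance
    for keyword, relevance in context.items():
        merged[keyword] += context_weight * relevance
    return dict(merged)


def to_lucene_boosts(weights: Dict[str, float]) -> str:
    """Render keywords as boosted phrase clauses, e.g. '"ferrari"^2.30'."""
    ranked = sorted(weights.items(), key=lambda kv: -kv[1])
    return " ".join(f'"{keyword}"^{weight:.2f}' for keyword, weight in ranked)
```

A keyword found in both the anchor and the context accumulates weight from both, so the terms central to the anchor end up with the strongest boosts.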
18. Hyperlinking results
Run             MAP     P-5     P-10    P-20
LA clustering   0.0577  0.4467  0.3200  0.2067
LA SW MLT       0.1201  0.4200  0.4200  0.3217
LA scenes       0.1770  0.6867  0.5867  0.4167
LC clustering   0.0823  0.5733  0.4833  0.2767
LC SW MLT       0.1820  0.5667  0.5667  0.4300
LC scenes       0.2523  0.8133  0.7300  0.5283
LA: LA condition (anchor only); LC: LC condition (anchor + context); SW MLT: sliding window with MoreLikeThis. The scenes search runs score highest in both conditions.
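For reference, the metrics in the table follow the standard definitions sketched below: precision at rank k, and mean average precision over queries. This is an illustration only; here a result is relevant when its identifier appears in the ground truth, whereas the official MediaEval evaluation judges relevance of time segments and may apply additional overlap rules not described in these slides.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top k results that are relevant (divides by k)."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at the rank of each relevant hit."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over a list of (ranked results, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because average precision divides by the number of relevant items, relevant segments that are never retrieved lower MAP even when the top ranks are precise, which is consistent with MAP values well below P-5 in the table.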
19. Conclusions
Major findings
Scene segmentation approach performs best
Improvement when using visual concepts
when carefully employed
Future work
Improve scene detection
Follow human perception more closely
Improve the link between query and visual concepts
Use named entities
Thank you
Questions?
Speaker notes
Input from Daniel regarding the progress in audio analysis and video OCR:
- adoption of a new video OCR
- speech processing: preparation of new paradigms:
  - deep neural networks (automatic speech recognition)
  - i-vectors + SVMs using a cosine kernel (speaker recognition)