These are the slides of the keynote talk I gave at CBMI 2019 (on September 4, 2019, in Dublin, Ireland) about the Video Browser Showdown (VBS) competition.
1. Introduction
2. VBS Teams and Tools
3. Performance Evaluation at VBS
4. Experiences and Challenges
5. Conclusion
Outline
Klaus Schoeffmann CBMI 2019 2
• Live evaluation platform for interactive video search
• Content-based search in videos
• Evaluates several types of search (KIS, AVS)
• Competitive setup
• Large dataset and challenging queries
• Direct competition with other systems (for several hours)
• Reveals both search performance and usability
• Sophisticated scoring
• Expert and novice session
• Entertaining event
• Part of the welcome reception at the MMM conference
• Showcases video retrieval for conference participants
The Video Browser Showdown (VBS)
Jakub Lokoč, Gregor Kovalčík, Bernd Münzer, Klaus Schöffmann, Werner Bailer, Ralph Gasser, Stefanos Vrochidis, Phuong Anh Nguyen, Sitapa Rujikietgumjorn, and Kai Uwe Barthel. 2019. Interactive Search or Sequential Browsing? A Detailed Analysis of the Video
Browser Showdown 2018. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1, Article 29 (February 2019), 18 pages. DOI: https://doi.org/10.1145/3295663
VBS 2019 - Video
http://videolectures.net/multimediamodeling2019_video_browser_showdown/
• Known-Item Search (KIS)
• We want to find one specific target scene (20s duration)
• Target scene is known to the searcher, but its location/position in the collection is not
• Two different types
• Visual KIS
• Simulates that the searcher knows what the scene looks like
• VBS: repetitive presentation of target scene (max 5 mins)
• Textual KIS
• Simulates that the searcher knows a description of the target scene
• VBS: text description with increasing details (max 8 mins)
Evaluated Tasks at VBS
"Shots of a factory hall from above. Workers transporting gravel with
wheelbarrows. Other workers putting steel bars in place. The hall has
Cooperativa Agraria written in red letters on the roof. There are 1950s style
American cars and trucks visible in one shot."
• Ad-hoc Video Search (AVS)
• Since 2017 in collaboration with TRECVID AVS (Georges Quénot, George Awad)
• We want to find many scenes for a specific content class/topic
• For example:
1. "an adult person running in a city street"
2. "a chef or cook in a kitchen"
3. "outdoor shots with snow or ice conditions"
Evaluated Tasks at VBS
Teams should solve KIS and AVS tasks as
quickly and accurately as possible
(max 5 mins per topic)
Awad, G., Butt, A., Curtis, K., Lee, Y., Fiscus, J., Godil, A., ... & Kraaij, W. (2018, November). TRECVID 2018: Benchmarking video activity detection, video captioning and matching, video storytelling linking and video search.
• Provide a platform for comparable evaluation of video search tools
• as an alternative to user studies and user simulations
• same queries, same dataset, same conditions, at the same time!
• standardized interaction logging
• Push research on video content search tools that are
• highly interactive
• efficient in terms of search time and accuracy
• flexible in terms of queries
• easy to use
Research Goals of the VBS
          KIS visual   KIS textual   AVS
Experts       x             x         x
Novices       x                       x
L. Rossetto, R. Gasser, J. Lokoc, W. Bailer, K. Schoeffmann, B. Muenzer, T. Soucek, P. A. Nguyen, P. Bolettieri, A. Leibetseder, S. Vrochidis, "Interactive Video Retrieval in the Age of Deep Learning - Findings from the Video Browser Showdown 2019",
in IEEE Transactions on Multimedia, in review (2019).
• V3C1 (Vimeo Creative Commons Collection)
• 1,000 hours of video content
• 7,475 video files and additional data
• Metadata from Vimeo (including
category and content description)
• Download information
• Master shot reference with
1,082,659 segments (keyframes,
thumbs, segmentation info)
• Several different languages
• Around 25% English
• Varying duration (3 min to 1h)
Video Dataset (Since 2019)
Rossetto, L., Schuldt, H., Awad, G., & Butt, A. A. (2019, January). V3C–A Research Video Collection. In International Conference on Multimedia Modeling (pp. 349-360). Springer, Cham.
Berns, F., Rossetto, L., Schoeffmann, K., Beecks, C., & Awad, G. (2019, June). V3C1 Dataset: An Evaluation of Content Characteristics. In Proceedings of the 2019 on International Conference on Multimedia Retrieval (pp. 334-338). ACM.
• 2012 (KIS in single videos)
• 30 videos, 38 hours, visual (v)
• 2013 (KIS in single videos)
• 10 videos, 10 hours, visual (v) + textual (t)
• 2014 (KIS in single videos & collection)
• 76 videos, 30 hours for collection search, v + t
• 2015 (KIS in collection)
• 153 videos, 100 hours, v + t
• 2016 (KIS in collection)
• 442 videos, 250 hours, v + t
• 2017-2018 (KIS and AVS in collection)
• 4573 videos, 600 hours, v + t
• AVS partly from TRECVID
• 2019-2020 (KIS and AVS in collection)
• 7475 videos, 1000 hours, v + t
• 1.08 million segments
VBS 2012-2019: Dataset Size
[Bar chart: Hours of Video Content per year — 2012: 2, 2013: 1, 2014: 30, 2015: 100, 2016: 250, 2017: 600, 2018: 600, 2019: 1000, 2020: 1000]
Data source:
• 2012-2014: EBU SCAIE (and NHK), TOSCA-MP EU project
• 2015-2016: BBC MediaEval (Search & Hyperlinking task)
• 2017-2018: IACC.3 (Internet Archive Creative Commons)
• 2019-2020: V3C1 (Vimeo Creative Commons Collection)
VBS Teams
Team Organization Country 2012 2013 2014 2015 2016 2017 2018 2019
ITEC Institute of Information Technology, Klagenfurt University Austria x x x x x x xx x
vitrivr University of Basel Switzerland x xx x x x
VIRET Charles University, Prague Czech Republic x x x x x x
VIREO Hong Kong City University Hong Kong x x x
VISIONE Institute of Information Science and Technologies, Pisa Italy x
VERGE Information Technologies Institute/CERTH, Thessaloniki Greece x x x x x x
NII-UIT NII, Tokyo / University of Information Technology, VNUHCM, Ho Chi Minh City Japan / Vietnam x x x x x
NECTEC NECTEC, Pathum Thani Thailand x
HTW HTW Berlin, University of Applied Sciences Germany x x x
DCU Dublin City University Ireland x x x x x
JR JOANNEUM RESEARCH, Graz Austria x x x x
UU Utrecht University The Netherlands x x
NUS National University of Singapore Singapore x x
VideoCycle University of Mons Belgium x
FRTRD France Telecom Research & Development, Beijing China x
OVIDIUS ARTEMIS Department, Télécom SudParis, Evry Cedex France x
ISYS Institute of Information Systems, Klagenfurt University Austria x
UPC Technical University of Catalonia, Barcelona Spain x
18 unique teams from 17 countries (corresponding author only)
1st Video Browser Showdown (MMM 2012)
Search tasks only in single video files (about 1-2h duration; visual KIS)
Winner's System (Klagenfurt University)
K. Schoeffmann, "A User-Centric Media Retrieval Competition: The Video Browser Showdown 2012-2014," in IEEE MultiMedia, vol. 21, no. 4, pp. 8-13, Oct.-Dec. 2014.
A Few More Current VBS Tools
7th Video Browser Showdown (MMM 2018)
https://www.youtube.com/watch?v=CA5kr2pO5b
• Content analysis and retrieval methods
• Partly custom shot detection approaches
• Keyframe-based concept detection
• with several different networks (Inception v3, VGG, ResNet, NasNet, GRU)
• trained on different datasets (ImageNet 1000/21k, TRECVID SIN-345, Places-205, …)
• Google Vision API
• Object detection
• R-CNN, YOLO, or DHSNet (Deep Hierarchical Saliency Net)
• Color analysis (different descriptors; e.g., FS, CL/HistMap, etc.) with filtering, clustering, SOM, and/or query-by-sketch
• Motion analysis
• ASR and OCR analysis (e.g., with TesseractOCR)
• Different descriptors for similarity search
• typically weights from deep CNN layers
• BoVW, VLAD, …
• Ranking with weighting models (and different distance models),
TF/IDF-based retrieval or similar (e.g., Lucene or Solr) and/or Boolean filtering (AND, OR, NOT)
• Indexing methods
• Own special filters (linear, m-trees, pivot-tables, …)
• Hierarchical or clustering-based search (e.g., kNN)
• Relational databases and own extensions (ADAMpro)
Content Analysis and Indexing
Jakub Lokoč, Gregor Kovalčík, Bernd Münzer, Klaus Schöffmann, Werner Bailer, Ralph Gasser, Stefanos Vrochidis, Phuong Anh Nguyen, Sitapa Rujikietgumjorn, and Kai Uwe Barthel. 2019. Interactive Search or Sequential Browsing? A Detailed Analysis of the Video
Browser Showdown 2018. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1, Article 29 (February 2019), 18 pages. DOI: https://doi.org/10.1145/3295663
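As a toy illustration of the last ranking point — Boolean filtering combined with TF-IDF-style ranking — consider the following sketch. The index, concept labels, and function names are made up for illustration; real VBS tools use far more elaborate machinery (e.g., Lucene or Solr):

```python
# Toy illustration: Boolean AND filter over concept labels,
# then TF-IDF-style ranking of the remaining keyframes.
import math

index = {  # hypothetical keyframe -> detected concept labels
    "kf1": ["dog", "grass", "dog"],
    "kf2": ["dog", "street"],
    "kf3": ["car", "street"],
}

def idf(term):
    """Inverse document frequency over the keyframe index."""
    df = sum(term in labels for labels in index.values())
    return math.log(len(index) / df) if df else 0.0

def search(must_have, query_terms):
    """Boolean AND filter on must_have, then rank by TF-IDF score."""
    results = []
    for kf, labels in index.items():
        if all(t in labels for t in must_have):
            score = sum(labels.count(t) * idf(t) for t in query_terms)
            results.append((kf, score))
    return sorted(results, key=lambda r: -r[1])

print(search(must_have=["dog"], query_terms=["dog", "grass"]))
# kf1 ranks first: it matches "dog" twice and the rarer term "grass" once
```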
• In general, a flexible set of search features
• Try to find unique content of 20s segment
and filter for it
• Something specific that quickly narrows
down the search, e.g.,
• A specific number of faces
• Temporal filtering (shop→door→shop)
• Specific dominant color ("green")
• Particular content class ("military")
• Appearing objects ("snowmobile", "dolphin")
• Written or spoken text
• In worst case a more general concept
combined with fast interactive browsing
• "wood", "forest", "tree"
• "black and white content"
• "music", "concert"
How Do VBS Teams Search?
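The temporal-filtering idea above (shop→door→shop) can be sketched as a subsequence match over shot-level concept labels. All data and names below are illustrative assumptions, not any team's actual code:

```python
# Hypothetical sketch of temporal filtering: keep only videos whose
# shot-level concept labels contain the query concepts in temporal order.

def matches_sequence(shot_labels, pattern):
    """True if pattern occurs as a subsequence of the shots' label sets."""
    i = 0
    for labels in shot_labels:
        if i < len(pattern) and pattern[i] in labels:
            i += 1
    return i == len(pattern)

videos = {  # video id -> per-shot concept labels, in temporal order
    "v1": [{"street"}, {"shop"}, {"door"}, {"shop", "person"}],
    "v2": [{"shop"}, {"forest"}, {"shop"}],
}
hits = [v for v, shots in videos.items()
        if matches_sequence(shots, ["shop", "door", "shop"])]
print(hits)  # ['v1']
```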
Interaction Features (VBS 2019)
[Figure: interaction heatmap]
L. Rossetto, R. Gasser, J. Lokoc, W. Bailer, K. Schoeffmann, B. Muenzer, T. Soucek, P. A. Nguyen, P. Bolettieri, A. Leibetseder, S. Vrochidis, "Interactive Video Retrieval in the Age of Deep Learning - Findings from the Video Browser Showdown 2019",
in IEEE Transactions on Multimedia, in review (2019).
• Tasks are presented and evaluated via the VBS evaluation server
• Query presentation (clip or text)
• Shows remaining time
• Evaluates incoming submissions
• correct/wrong
• search time
• score
• Computes scores
• Shows statistics/ranking
VBS Evaluation Server
https://github.com/klschoef/vbsserver
VBS Setup

[Photos of the VBS setup at the venues from 2012 to 2019, and a schematic room layout: team stations (Team 1-9) arranged around the audience, facing the evaluation server shown on a projector wall, with two judges and a moderator]
VBS 2012-2019: Tasks/Sessions
Year Location Dataset Content Teams KIS v KIS t KIS v N AVS AVS N
2012 Klagenfurt EBU SCAIE/NHK 38h* 11 8 6
2013 Huang Shan EBU SCAIE/NHK 10h* 6 10 6
2014 Dublin EBU SCAIE/NHK 30h 7 10+10 10 10+10
2015 Sydney BBC/MediaEval 100h 9 10 6 4 (+2 t N)
2016 Miami BBC/MediaEval 250h 9 10 10 6
2017 Reykjavik IACC.3 600h 6 7 7 7
2018 Bangkok IACC.3 600h 9 4 4+10 4 4 4
2019 Thessaloniki V3C1 1000h 6 10 8 5 5 6
2020 Daejeon V3C1 1000h
*…total collection size; in 2012-2013 each task was limited to a single video file
Search Time (VBS 2015-2017)

[Charts: search-time distributions for 2015 (100 hours), 2016 (250 hours), and 2017 (600 hours)]
• Visual KIS typically quite fast
• Textual KIS much harder to solve
• Experts faster than novices
• AVS easier than KIS
J. Lokoč, W. Bailer, K. Schoeffmann, B. Muenzer and G. Awad, "On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015–2017," in IEEE Transactions on Multimedia, vol. 20, no. 12, pp. 3361-3376, Dec. 2018.
Search Time (VBS 2018 and 2019)

[Charts: search-time distributions for 2018 (600 hours) and 2019 (1000 hours)]
• In 2019, only one textual KIS task was not solved (within the time limit)
• Novices are very fast at AVS too (first submission)
J. Lokoč, W. Bailer, K. Schoeffmann, B. Muenzer and G. Awad, "On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015–2017," in IEEE Transactions on Multimedia, vol. 20, no. 12, pp. 3361-3376, Dec. 2018.
YOY Improvement?

[Bar chart: Average Search Time in Seconds (±1 S.D.), 2016-2019 (dataset sizes 250/600/600/1000 hours), for KIS Visual, KIS Textual, and KIS Visual Novice; plotted values: 162.56, 135.24, 106.32, 111.13, 186.58, 184.02, 246.98, 256.12, 154.42, 154.09, 137.60]
YOY Improvement?

[Bar chart: Average Task Solve Ratio per Team (0-100%), 2016-2019 (dataset sizes 250/600/600/1000 hours), for KIS Visual, KIS Textual, and KIS Visual Novice]
YOY Improvement?

[Bar chart: Task Solve Ratio (Overall, 0-100%), 2016-2019, for KIS Visual, KIS Textual, and KIS Visual Novice]
• General goals
• Reward for (1) solving a task and for (2) being fast
• Fair scoring and penalty for wrong submissions
• Known-Item Search
VBS Scoring
• s_C is a time-independent reward for solving a task i (e.g., 50)
• f_TS is a linearly decreasing function of the search time t
• g is a guarantee interval between the last accepted correct submission and the first potential late correct submission (e.g., 30s) – i.e., the time limit is extended by g
Visual KIS: 5 min
Textual KIS: 7/8 min
J. Lokoč, W. Bailer, K. Schoeffmann, B. Muenzer and G. Awad, "On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015–2017," in IEEE Transactions on Multimedia, vol. 20, no. 12, pp. 3361-3376, Dec. 2018.
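Putting these pieces together, the KIS score for a correct submission at search time t has roughly this shape (a sketch only: s_T, the maximum time-dependent reward, and the exact form of f_TS are assumptions here; the precise definition is in the cited paper):

```latex
s_i(t) = s_C + f_{TS}(t),
\qquad
f_{TS}(t) = s_T \cdot \max\!\left(0,\ 1 - \frac{t}{T_i + g}\right)
```

where T_i is the task's time limit (300s for visual KIS, 420/480s for textual KIS).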
• Ad-hoc Video Search
• Scoring based on Precision and Recall according to
• correct and incorrect submissions of the team (C and I)
• pool of correct shot submissions of all teams for the task (P)
• quantization function q that merges temporally close correct shots (into ranges; since
VBS2018 ranges are fixed static non-overlapping segments of 180s duration)
• We mitigate impact of incorrect submissions to reduce penalty in case of
ambiguous topic descriptions
VBS Scoring
Jakub Lokoč, Gregor Kovalčík, Bernd Münzer, Klaus Schöffmann, Werner Bailer, Ralph Gasser, Stefanos Vrochidis, Phuong Anh Nguyen, Sitapa Rujikietgumjorn, and Kai Uwe Barthel. 2019. Interactive Search or Sequential Browsing? A Detailed Analysis of the Video
Browser Showdown 2018. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1, Article 29 (February 2019), 18 pages. DOI: https://doi.org/10.1145/3295663
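The quantization and precision/recall ideas above can be sketched in a few lines. This is an illustration under assumptions, not the official formula; the exact weighting — including the mitigated penalty for incorrect submissions — is in the cited paper:

```python
# Illustrative sketch of AVS scoring: correct shots are quantized into
# fixed 180s ranges, and the score combines precision with range recall.
# The real formula softens the penalty term for incorrect submissions.

RANGE_LEN = 180  # seconds; fixed non-overlapping segments since VBS 2018

def quantize(seconds):
    """Map a shot timestamp to its 180s range index."""
    return int(seconds // RANGE_LEN)

def avs_score(team_correct, team_incorrect, pool_correct):
    """team_correct/pool_correct: (video_id, seconds) tuples;
    team_incorrect: number of wrong submissions by the team."""
    if not team_correct or not pool_correct:
        return 0.0
    team_ranges = {(v, quantize(t)) for v, t in team_correct}
    pool_ranges = {(v, quantize(t)) for v, t in pool_correct}
    precision = len(team_correct) / (len(team_correct) + team_incorrect)
    range_recall = len(team_ranges) / len(pool_ranges)
    return 100.0 * precision * range_recall

# Two correct shots in the same 180s range count as one range:
print(round(avs_score([("v1", 10.0), ("v1", 170.0)], 1,
                      [("v1", 10.0), ("v1", 170.0), ("v2", 400.0)]), 2))
# → 33.33
```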
• Final score for a team j is the average score over all five categories,
normalized by the corresponding maximum of each category/session c
• Visual KIS expert
• Textual KIS expert
• Visual KIS novice
• AVS expert
• AVS novice
VBS Scoring
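Spelled out as a formula (a sketch consistent with the description above; s_{j,c} denotes team j's score in category/session c, and the scaling factor 100 is an assumption):

```latex
S_j = \frac{100}{5} \sum_{c=1}^{5} \frac{s_{j,c}}{\max_{k}\, s_{k,c}}
```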
• Since 2017 we share some tasks but differ in search modality
• TRECVID AVS F – fully-automatic search (based on text query)
• TRECVID AVS M – manual search (user can change query once)
• VBS AVS – completely interactive (user-based search)
• How does the performance compare?
• No direct comparison possible
• Both competitions have incomplete ground truth
• TRECVID results are ranked; AVS submissions at VBS are not
• We use different evaluation metrics (xInfAP and Precision * Recall)
• However, at least we can assess
• Precision --> much higher at VBS
• Simulated AP at VBS, which is "averaged AP" over all permutations of AVS submissions
TRECVID AVS vs. VBS AVS
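The "simulated AP" idea above can be sketched as follows. Since VBS AVS submissions are unranked, AP is averaged over permutations of the submission set; the actual computation may average over all permutations, while this hedged sketch approximates it by sampling:

```python
# Monte Carlo sketch of "simulated AP" for an unranked submission set.
import random

def average_precision(ranked_relevance, n_relevant):
    """Standard AP over a ranked list of True/False relevance flags."""
    hits, ap = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            ap += hits / rank
    return ap / n_relevant if n_relevant else 0.0

def simulated_ap(n_correct, n_incorrect, trials=5000, seed=0):
    """Average AP over randomly sampled orderings of the submissions."""
    rng = random.Random(seed)
    items = [True] * n_correct + [False] * n_incorrect
    total = 0.0
    for _ in range(trials):
        rng.shuffle(items)
        total += average_precision(items, n_correct)
    return total / trials

# All-correct submissions always yield AP = 1, regardless of order:
print(simulated_ap(10, 0, trials=10))  # → 1.0
print(round(simulated_ap(8, 2), 2))    # roughly 0.8, varies with the seed
```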
TRECVID AVS 2017 vs. VBS AVS 2018
J. Lokoč, W. Bailer, K. Schoeffmann, B. Muenzer and G. Awad, "On Influential Trends in Interactive Video Retrieval: Video Browser Showdown 2015–2017," in IEEE Transactions on Multimedia, vol. 20, no. 12, pp. 3361-3376, Dec. 2018.
The Power of Human Computation
Example from the Video Browser Showdown 2015:
VERGE: shot and scene detection, concept detection (SIFT, VLAD, CNNs), similarity search.
UU: tiny thumbnails only, powerful user. Outperformed VERGE and was finally ranked 3rd!
• Visual KIS
• Originally we simply repeated the 20s clip until the task ended
• In order to simulate a fading memory, we now incrementally blur the clip
• At VBS2020 we will even start with a blurred and color-less presentation
• Textual KIS
• Incrementally revealed details to simulate that a person remembers and tells more
and more details to the searcher…
• This makes this kind of task much harder
VBS 2012-2019: Task Presentation
0s: "Shots of a factory hall from above. Workers transporting gravel with
wheelbarrows. Other workers putting steel bars in place."
100s: "The hall has Cooperativa Agraria written in red letters on the roof."
200s: "There are 1950s style American cars and trucks visible in one shot."
• Quite challenging to evaluate
• No complete ground truth (neither at VBS nor at TRECVID)
• Our solution: use judges to perform live evaluation of submissions for which we do not have g.t. yet
AVS Tasks
[Bar chart: Average Number of Submissions per AVS Task, 2017-2019, for AVS and AVS Novice; values include 21, 23, 87, 41, 13]
• 1,848 shots judged live (2018: 2,780 shots)
• About 40% of submitted shots were not in TRECVID g.t.
• Verification experiment
• 1,383 shots were judged again later
• Judgements diverged for 23% of the shots; in 88% of those cases the live judgement was "incorrect"
• Judges seem to make incorrect decisions when in doubt
• While the ground truth for later use is biased, conditions were still the same for all teams in the room
• Need to set up clear rules for live judges
• Like those used by NIST for TRECVID annotations
Evaluation of AVS Tasks at VBS 2017
[Example keyframes from the same video with diverging judgements: Judge 1: false / Judge 2: true; Judge 1: true; Judge 1: false]
AVS Tasks at VBS 2019
Server issues: for one (too simple) AVS task, the server received 1000 submissions (and got stuck due to locking/scoring issues)…
• The VBS is an extensive evaluation platform for interactive video search
• Search in large datasets
• Advanced scoring model
• Sophisticated and flexible tools
• It has shown that we are quite efficiently able to interactively find very specific
content in reasonably large collections
• Search times tend to decrease, even though the collection keeps growing
• There are many different approaches that work well; a few have proven to be very effective
at narrowing the search
• Hierarchical color-based filtering/refinement
• Temporal filtering (temporal color sketches)
• Text-based filtering (ASR and OCR)
• It seems to be very advantageous to have a tool with flexible search features for different
use cases of search (e.g., sketch, concept, color/motion/object filter, etc.)
• However, if there are too many options, users often do not know where to start…
Conclusion
• We want to continue the VBS and further increase its challenge…
• More realistic task presentation
• e.g., in 2020 we will use a blurry, color-less visual KIS presentation
• This also simulates a fuzzy memory
• Increasing the dataset
• We will add additional parts of the V3C collection
(e.g., V3C2 and V3C3 have another 1,300 and 1,500 hours of content)
• We are reaching the computing limits of indexing
• At least for laptops
• Server-based or distributed and collaborative approaches will help
• With larger datasets
• The tools integrate more and more automatic retrieval and filtering features
• Still, the user seems to be very important
• See example of ITEC for “Visual KIS Novice” category in 2018
• The newest evaluation of VBS 2019 shows that users spend most of their time browsing results…
Conclusion
• V3C1 dataset (with shot segmentation and keyframes) freely available
• Just need to sign data agreement form
• https://videobrowsershowdown.org/call-for-papers/
• Analysis data available
• https://github.com/klschoef/V3C1Analysis
• Inception v3 classifications for 21k ImageNet classes
• Filters for keyframes with 11 dominant colors, 1/2/3/4/many faces, text
• Global features: color layout, edge histogram
• Metadata (bitrate, resolution, segment duration, upload date, …)
• Luca Rossetto et al. 2019. The V3C1 Dataset: Advancing the State of the Art in Video Retrieval. ACM
SIGMM Records, Issue 2
• The winners of VBS 2019 provide their software component
• Cineast, ADAMpro, and vitrivr UI
• https://vitrivr.org/vitrivr.html
• VBS server with test tasks available
• https://videobrowsershowdown.org/call-for-papers/vbs-server/
Join the VBS!
Demo paper submission
deadline for VBS2020 is
September 16, 2019
Berns, F., Rossetto, L., Schoeffmann, K., Beecks, C., & Awad, G. (2019, June). V3C1 Dataset: An Evaluation of Content Characteristics. In Proceedings of the 2019 on International Conference on Multimedia Retrieval (pp. 334-338). ACM.
VBS is a Collaborative Effort – Thanks!
…in particular to all moderators, judges, and the teams that
participated in the last 8 years!
Werner Bailer, Jakub Lokoc, Cathal Gurrin, Klaus Schoeffmann
VBS Organization Team
Luca Rossetto, George Awad
Contributors and Collaborators
Bernd Muenzer, Marco Hudelist, Andreas Leibetseder, Sabrina Kletz, Jürgen Primus, Claudiu Cobarzan
Colleagues from Klagenfurt University
VBS 2019 Participants