SlideShare a Scribd company logo
1 of 43
Download to read offline
TRECVID 2019 INSTANCE RETRIEVAL
INTRODUCTION AND TASK OVERVIEW
Wessel Kraaij
Leiden University; Netherlands
Organisation for Applied Scientific Research (TNO)
George Awad
Georgetown University; National Institute of Standards and
Technology
Keith Curtis
National Institute of Standards and Technology
Disclaimer
The identification of any commercial product or trade name does not imply endorsement or
recommendation by the National Institute of Standards and Technology.
TRECVID 2019
Table of contents
• Task Definition
• Data
• Topics (Queries)
• Participating teams
• Evaluation & results
• General observation
2
TRECVID 2019
Task
From 2013 – 2015
• The task asked systems to find a specific object, person or location
in any context using a small set of image and video examples.
From 2016 - 2018
• A different query type was used: find a specific person in a specific
location.
In 2019 - 2021
• A new query type is being used: find a specific person doing a
specific action.
System task:
▪ Given a topic with:
▪ 4 example images of the target person
▪ 4 Region of Interest (ROI)-masked images of the target person
▪ 4 to 6 video examples of a specific action
▪ Return a list of up to 1000 shots ranked by likelihood that they contain
the target person doing the target action
▪ Automatic or interactive runs are accepted
3
TRECVID 2019
Data …
• The British Broadcasting Corporation (BBC) and the Access
to Audiovisual Archives (AXES) project made 464 h of the
BBC soap opera EastEnders available for research
• 244 weekly “omnibus” files (MPEG-4) from 5 years of broadcasts
• 471527 shots
• Average shot length: 3.5 seconds
• Transcripts from BBC
• Per-file metadata
• Represents a “small world” with a slowly changing set of:
• People (several dozen)
• Locales: homes, workplaces, pubs, cafes, open-air market, clubs
• Objects: clothes, cars, household goods, personal possessions,
pets, etc
• Views: various camera positions, times of year, times of day,
• Use of fan community metadata allowed, if documented
5
TRECVID 20196
Majority of episodes filmed at Elstree
studios. Sometimes filmed on ‘location’.
EastEnders’ world
TRECVID 20197
Topic creation procedure @ NIST
• Viewed several videos to develop a list of recurring people,
actions and their overlapping.
• Listed in order the most frequent actions and most frequent
person’s performing them
• Created ≈90 topics targeting recurring specific persons doing
specific actions.
• Chose 50 topics as a representative sample, including 30
unique topics for 2019 and 20 common topics for 2019 -
2021. Each topic includes images for target persons and
example videos of the specific actions.
• Filtered example shots from the submissions if it satisfies the
topic.
TRECVID 20198
Global test condition: type of training
data
Effect of examples – 2 conditions:
• A – one or more provided images – no video
• E - video examples (+ optional image examples)
Sources of Training Data:
• A – Only sample video 0
• B - Other external data only
• C – Only provided images/videos in the official query
• D - Sample video 0 AND provided images/videos in the
official query (A+C)
• E – External data AND NIST provided data (sample
video 0 OR official query images/videos)
TRECVID 20199
Topics – segmented “person” example
images
Bradley Denise
Dot Heather
TRECVID 201910
Topics – segmented “person” example
images
Ian Jack
Jane Max
TRECVID 201911
Topics – segmented “person” example
images
Phil Sean
Shirley Stacey
TRECVID 201912
Sample Actions
Open door & enter Sit on couch
TRECVID 201913
Sample Actions
Eating Hugging
TRECVID 201914
30 Unique Queries – 2019
Max Pat Ian Denise Phil Jane Dot Bradley Jack Stacey
Holding glass x x x x
Sit on couch x x x
Holding phone x x x
Drinking x x x
Open door & enter x x
Open door & leave x x
Shouting x x x
Eating x x
Crying x x
Laughing x x
Go up / down stairs x x
Carrying bag x x
30 x unique queries : find {Max, Pat, Ian, Denise, Phil, Jane, Dot, Bradley, Jack, Stacey} doing
{Holding glass, Sit on couch, Holding phone, Drinking, Eating, Crying, Laughing, Shouting,
Open door & leave, Open door & enter, Go up / down stairs, Carrying bag}
TRECVID 201915
20 Common Queries – 2019-2021
Sean Max Denise Phil Dot Heather Jack Shirley Stacey
Kissing x x
Sit on couch x x
Holding phone x x
Drinking x x
Open door & enter x x
Open door & leave x x
Shouting x x
Hugging x x
Close door without
leaving
x x
Stand & talk at door x x
20 x common queries : find {Sean, Max, Denise, Phil, Dot, Heather, Jack,
Shirley, Stacey} doing {Kissing, Sit on couch, Holding phone, Drinking,
Shouting, Hugging, Open door & leave, Open door & enter, Close door
without leaving, Stand & talk at door}
TRECVID 201916
Team Organization Run Types
Submitted
F: automatic, I:
Interactive
BUPT_MCPRL Beijing University of Posts and Telecommunications F_E (2), I_E (1)
HSMW_TUC Chemnitz University of Technology, University of Applied Sciences Mittweida F_E (4)
Inf Monash University, Renmin University, Shandong University F_E (3)
WHU_NERCMS National Engineering Research Center for Multimedia Software,
Wuhan University
F_E (3)
NII_Hitachi_UIT National Institute of Informatics, Japan (NII); Hitachi, Ltd; University of
Information Technology, VNU-HCM
F_A (4), F_E(4)
PKU_ICST Peking University F_A (3), F_E (3),
I_E (1)
INS 2019: 6 Finishers (out of 12)
TRECVID 201917
Evaluation
For each topic the submissions were pooled and judged down to
max rank 520, resulting in 141599 judged shots (≈ 473 person-h).
• 10 NIST assessors played the clips and determined if they
contained the topic target or not.
• 6 592 clips (avg. 220 / topic) contained the topic target (4.66
%)
• True positives per topic: min 29 med 187 max 575
• The task is treated as a form of ranking and thus the
trec_eval_video tool was used to calculate average precision,
recall, precision, etc.
• To measure efficiency, speed was also measured.
• In total, 26 automatic and 2 interactive runs were submitted.
TRECVID 201918
Results by team
TRECVID 201919
# Query
9258 Find Pat Drinking
9256 Find Phil Holding phone
9253 Find Pat Sit on couch
9257 Find Jane Holding phone
9274 Find Jack Shouting
9255 Find Ian Holding phone
9275 Find Stacey Crying
9273 Find Jack Drinking
9265 Find Max Crying
9269 Find Jack Sit on couch
9254 Find Denise Sit on couch
9266 Find Jane Laughing
9272 Find Stacey Drinking
9278 Find Stacey Go up/down stairs
9252 Find Denise Holding Cup/Glass
9251 Find Pat Holding Cup/Glass
9249 Find Max Holding Cup/Glass
9268 Find Phil Go up/down stairs
9261 Find Max Shouting
9262 Find Phil Shouting
9263 Find Jane Eating
9250 Find Ian Holding Cup/Glass
9277 Find Jack Open door & leave
9264 Find Dot Eating
9260 Find Dot Open door & enter
9267 Find Dot Open door & leave
9270 Find Stacey Carrying bag
9271 Find Bradley Carrying bag
9259 Find Ian Open door & enter
9276 Find Bradley Laughing
Results by topics - automatic
*Mean score of AP per action only
Shouting has avg. high scores, but high median scores
Holding phone (0.1252)* easier to find
Open door & enter (0.0166)* hard to find
Open door & leave (0.0201)* hard to find
Carrying bag (0.0228)* hard to find
TRECVID 201920
Some observations..
• Poor results for topics involving Dot and Bradley could
indicate that they are hard people to find.
• However - previous iterations of the INS task showed them
to be among the easiest people to find. What gives?
• Actions involving Dot consistently score poorly, whether it is
Dot or another character involved. Seems to be more a case
of hard actions to recognise.
• Bradley laughing - very poor results - but looking at frequent
false positives on this topic reveal lots of instances of
contrived laughter from Bradley. Obvious instances of
exaggerated faked / contrived laughter do not count as
laughing.
TRECVID 201921
Easier Topics
# Query
9274 Find Jack Shouting
9262 Find Phil Shouting
9261 Find Max Shouting
9254 Find Denise Sit on couch
9255 Find Ian Holding phone
9273 Find Jack Drinking
9272 Find Stacey Drinking
9252 Find Denise Holding Cup/Glass
9253 Find Pat Sit on couch
9266 Find Jane Laughing
9278 Find Stacey Go up/down stairs
9275 Find Stacey Crying
9269 Find Jack Sit on couch
9257 Find Jane Holding phone
9249 Find Max Holding Cup/Glass
9258 Find Pat Drinking
9251 Find Pat Holding Cup/Glass
9256 Find Phil Holding phone
9250 Find Ian Holding Cup/Glass
9263 Find Jane Eating
9260 Find Dot Open door & enter
9264 Find Dot Eating
9265 Find Max Crying
9268 Find Phil Go up/down stairs
9277 Find Jack Open door & leave
9270 Find Stacey Carrying bag
9267 Find Dot Open door & leave
9271 Find Bradley Carrying bag
Number of runs with AP above 0.06
TRECVID 201922
Hard Topics
# Query
9259 Find Ian Open door & enter
9276 Find Bradley Laughing
9267 Find Dot Open door & leave
9271 Find Bradley Carrying bag
9277 Find Jack Open door & leave
9270 Find Stacey Carrying bag
9263 Find Jane Eating
9264 Find Dot Eating
9260 Find Dot Open door & enter
9265 Find Max Crying
9268 Find Phil Go up/down stairs
9266 Find Jane Laughing
9275 Find Stacey Crying
9278 Find Stacey Go up/down stairs
9269 Find Jack Sit on couch
9249 Find Max Holding Cup/Glass
9257 Find Jane Holding phone
9251 Find Pat Holding Cup/Glass
9258 Find Pat Drinking
9250 Find Ian Holding Cup/Glass
9256 Find Phil Holding phone
9255 Find Ian Holding phone
9273 Find Jack Drinking
9272 Find Stacey Drinking
9252 Find Denise Holding Cup/Glass
9253 Find Pat Sit on couch
9254 Find Denise Sit on couch
9262 Find Phil Shouting
9261 Find Max Shouting
9274 Find Jack Shouting
Number of runs with AP below 0.06
TRECVID 201923
Some observations..
• From the previous two bar charts we can safely say that
shouting is the easiest topic to find. This was not obvious
from the boxplot of results by topics.
• Drinking, sitting on couch, and holding phone are also
among the easiest topics to find.
• Open door & leave, open door & enter, and carrying bag are
among the hardest topics to find.
TRECVID 201924
Some Frequent False Positives
Jack sit on couch Bradley carrying bag
Jack is sitting on an armchair - a single
seating structure. Topic specifies a couch - a
comfortable seating structure which seats
more than one person.
Bradley is seated next to Stacey. Dot comes
into the picture carrying a bag.
TRECVID 201925
Some Frequent False Positives
Dot open door & leave Stacey Drinking
Dot opens the door to let people in and then
closes the door again. Does not leave the
room / house.
In this shot we see Stacey holding a glass.
Later in the shot Mo is seen drinking. Topic
specifies that the person must be seen moving
the glass/cup to their mouth and performing a
sipping or drinking action.
TRECVID 201926
Some Frequent False Positives
Stacey crying Jack shouting
Stacey appears to have hurt herself, with
blood around her left eye. She is rubbing her
eye but does not appear to be crying.
This appears to be a frank discussion between
Jack and another man. The other person
appears to be angry and raises his voice at
Jack, however Jack does not raise his voice.
TRECVID 201927
Some Frequent False Positives
Phil holding phone Pat drinking
Phil is seen singing into a microphone, along
with Garry. Another person is holding a phone
recording them.
Pat is seen clapping hands. Later in the shot
she turns her head to look over at someone.
Another person moves a glass to their mouth.
TRECVID 201928
Further observations from viewing most frequent
false positives of worst performing topics
• Open door & enter - Systems tended to classify any shots
with target person and a doorway as a positive detection.
More work needed on training systems to classify the action
itself.
• Open door & leave - Same as above, systems classifying any
shots with target person and a doorway as positive
detection. More work needed on training systems to classify
the action of opening a door and leaving.
• Carrying bag - Fewer conclusions can be drawn. Many
instances where a shot is classified as a positive detection if
the target person appears in the shot and a different person
is carrying a bag, however, many other shots contain the
target person with no bag visible in the shot at all.
TRECVID 201929
Automatic Run results + Randomization testing
= > > > > > > >
= > > > > > > >
= > > > > > >
= > > > > > >
= > > > > >
= > > > >
= > > >
= > >
=
=
1 2 3 4 5 6 7 8 9 10
> p < 0.05
0.242 F_A_PKU_ICST_4*^
0.239 F_E_PKU_ICST_1*⤒
0.235 F_A_PKU_ICST_3^⤉
0.230 F_E_PKU_ICST_5⤒⤉
0.201 F_E_PKU_ICST_6
0.198 F_A_PKU_ICST_7
0.119 F_E_BUPT_MCPRL_1
0.116 F_E_BUPT_MCPRL_2
0.024 F_E_NII_Hitachi_UIT_2⍏
0.024 F_A_NII_Hitachi_UIT_3⍏
MAP
Top 10 runs across all teams (automatic)
p = probability the row run scored better than the column run due to chance
*^⤒⤉⍏ = difference not
statistically significant
TRECVID 201930
Mean Average Precision vs. per run clock processing time
(automatic)
PKU-ICST
TRECVID 201931
Interactive Run results + Randomization testing
= >
=
1 2
> p < 0.05
0.360 I_E_PKU_ICST_2
0.212 I_E_BUPT_MCPRL_4
MAP
Runs across all teams (interactive)
p = probability the row run scored better than the column run due to chance
TRECVID 201932
Results by example set (A/E) - automatic
TRECVID 201933
Results by Data Source
TRECVID 201934
Some general observations
about the task
• Slight decrease in number of participants and finishers, but
higher % of participants finished the task.
• Many more teams now using E condition - training with video
examples. Perhaps more necessary now with action
recognition. But - Results from teams using both show little
difference between image & video and image only!
• Interactive search task:
• Limited participation - only two interactive runs this year.
• First year of updated task - results cannot be compared in
any way to previous years - in subsequent years we can
compare using the common topics.
TRECVID 201935
Some general observations
about the task – Data Source
•Best results by far achieved using external data
plus NIST provided data.
•Huge gap to results from systems trained using
only external data or using only sample video 0.
TRECVID 201936
Further Conclusions
•Person recognition has been a feature of the INS
task since 2013 and is very mature by this stage.
Very few frequent false positives misidentify the
person.
•Action recognition is a new feature of INS task.
The much increased difficulty of the new INS task
is due to this. Requires much more work to reach
an acceptable level of maturity.
TRECVID 201937
Further Conclusions
•Visual Concepts very important.
•Easier tasks mostly those with obvious visual
context (sit on couch, hold phone, hold glass,
etc.).
•Harder tasks tend to be more independent from
obvious visual context (crying, laughing, eating,
different actions involving a doorway hard to
isolate from others).
TRECVID 201938
Inf
• Person Search:
• MTCNN Model to detect faces
• VGG-Face2 for Feature Selection
• Cosine Distance to measure similarity.
• Action Search:
• Faster RCNN model
• Tracklets generated by DeepSort
• Fine tune RGB benchmark of I3D model
• Cosine similarity for action reranking
• Reranking:
• Person Search based
• Action search based
• Fusion based
• Submitted 3 automatic runs
TRECVID 201939
NII-Hitachi-UIT
• Person Search:
• VGG-Face2 for face representation
• Cosine Similarity for matching and reranking
• Action Search:
• Audio type: VGGish
• Visual type: C3D and Semantic Features from VGG-1k
• Cosine Similarity for matching and reranking
• Fusion of Similarity Scores and person and action for final
rank list.
• Submitted 8 Automatic runs:
• 4 using video example set
• 4 using no-video
TRECVID 201940
PKU-ICST
• Action Recognition:
• Frame-level action recognition
• Video-level action recognition trained using Kinetics-400
• Object Detection pre-trained on MS-COCO
• Facial Expression Recognition
• Person Specific Recognition:
• Query Augmentation by super resolution.
• Face Recognition with 2 Deep models + top N query expansion
• Instance Score Fusion:
• Score Fusion strategy to mine common information from action
specific and person specific recognition.
• Submitted 6 Automatic runs and 1 interactive run
TRECVID 201941
WHU-NERCMS
• Scheme 1:
• Retrieve person and action respectively first then fuse
• Person: Adopted face recognition model
• Action: Adopted 3D Convolutional networks
• Fusion: Weighting based and person identity based filter to
combine results
• Scheme 2:
• Retrieve and track specific people then retrieve actions
• Adopted face recognition to determine face ID and bind track ID
of the track detected by the object tracking.
• Adopt action recognition of tracked person target frames.
• Submitted 3 Automatic runs
TRECVID 201942
Overview of submissions (1)
• 6 out of 6 teams described INS runs for the TV
notebook
• 2 teams will present their INS experiments
2:15 - 2:45, (BUPT_MCPRL Team– Beijing University of Posts and
Telecommunications)
2:45 - 3:15, (HSMW_TUC Team– University of Applied Sciences Mittweida)
3:15 - 3:35, INS Discussion
TRECVID 201943
INS 2019 Discussion
• What do teams think of the new task (query type)?
• Are the selected actions important in real life
applications?
• What is the main challenge in the new query type?
- No enough training data?
- actions are difficult?
- fusion of persons + action detection results?
• Is the task still of an ad-hoc nature? Or converting
to a supervised learning?
• Do we need additional run categories?
TRECVID 201944
2019 - 2021 Progress Runs
• 20 common topics.
• Evaluate progress of participating teams 2019-
2021 using a set of common topics.
• 12 runs submitted by 3 separate teams in 2019.
Additional teams can still submit progress runs in
2020 on the 10 topics to be evaluated in 2021.
• 10 common topics will be evaluated in 2020.
• 10 remaining common topics evaluated in 2021.

More Related Content

Similar to TRECVID 2019 INSTANCE RETRIEVAL INTRODUCTION AND TASK OVERVIEW

Deep sec talk - Addressing the skills gap
Deep sec talk - Addressing the skills gapDeep sec talk - Addressing the skills gap
Deep sec talk - Addressing the skills gapColin McLean
 
[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video Search[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video SearchGeorge Awad
 
Data Literacy Presentation at Madison Tableau User Group
Data Literacy Presentation at Madison Tableau User GroupData Literacy Presentation at Madison Tableau User Group
Data Literacy Presentation at Madison Tableau User GroupBen Jones
 
How Developers Self-sabotage Their Learning (and what to do instead)
How Developers Self-sabotage Their Learning (and what to do instead)How Developers Self-sabotage Their Learning (and what to do instead)
How Developers Self-sabotage Their Learning (and what to do instead)All Things Open
 
TRECVID 2016 : Instance Search
TRECVID 2016 : Instance SearchTRECVID 2016 : Instance Search
TRECVID 2016 : Instance SearchGeorge Awad
 
Agile Development - Are you building the right thing ? (Follow the value)
Agile Development - Are you building the right thing ? (Follow the value)Agile Development - Are you building the right thing ? (Follow the value)
Agile Development - Are you building the right thing ? (Follow the value)Martin Nymann Vinther
 
BeaconsAI engr 245 lean launchpad stanford 2019
BeaconsAI engr 245 lean launchpad stanford 2019BeaconsAI engr 245 lean launchpad stanford 2019
BeaconsAI engr 245 lean launchpad stanford 2019Stanford University
 
How To Start Agile Competence Center
How To Start Agile Competence CenterHow To Start Agile Competence Center
How To Start Agile Competence CenterJovan Vidić
 
Entrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextEntrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextP&CO
 

Similar to TRECVID 2019 INSTANCE RETRIEVAL INTRODUCTION AND TASK OVERVIEW (10)

Deep sec talk - Addressing the skills gap
Deep sec talk - Addressing the skills gapDeep sec talk - Addressing the skills gap
Deep sec talk - Addressing the skills gap
 
[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video Search[TRECVID 2018] Ad-hoc Video Search
[TRECVID 2018] Ad-hoc Video Search
 
Data Literacy Presentation at Madison Tableau User Group
Data Literacy Presentation at Madison Tableau User GroupData Literacy Presentation at Madison Tableau User Group
Data Literacy Presentation at Madison Tableau User Group
 
How Developers Self-sabotage Their Learning (and what to do instead)
How Developers Self-sabotage Their Learning (and what to do instead)How Developers Self-sabotage Their Learning (and what to do instead)
How Developers Self-sabotage Their Learning (and what to do instead)
 
Making Social Data Sing
Making Social Data SingMaking Social Data Sing
Making Social Data Sing
 
TRECVID 2016 : Instance Search
TRECVID 2016 : Instance SearchTRECVID 2016 : Instance Search
TRECVID 2016 : Instance Search
 
Agile Development - Are you building the right thing ? (Follow the value)
Agile Development - Are you building the right thing ? (Follow the value)Agile Development - Are you building the right thing ? (Follow the value)
Agile Development - Are you building the right thing ? (Follow the value)
 
BeaconsAI engr 245 lean launchpad stanford 2019
BeaconsAI engr 245 lean launchpad stanford 2019BeaconsAI engr 245 lean launchpad stanford 2019
BeaconsAI engr 245 lean launchpad stanford 2019
 
How To Start Agile Competence Center
How To Start Agile Competence CenterHow To Start Agile Competence Center
How To Start Agile Competence Center
 
Entrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider contextEntrepreneurial ecosystem- Wider context
Entrepreneurial ecosystem- Wider context
 

More from George Awad

Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022George Awad
 
Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022George Awad
 
Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022George Awad
 
Multimedia Understanding at TRECVID 2022
Multimedia Understanding at TRECVID 2022Multimedia Understanding at TRECVID 2022
Multimedia Understanding at TRECVID 2022George Awad
 
Video Search at TRECVID 2022
Video Search at TRECVID 2022Video Search at TRECVID 2022
Video Search at TRECVID 2022George Awad
 
Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022George Awad
 
TRECVID 2019 Video to Text
TRECVID 2019 Video to TextTRECVID 2019 Video to Text
TRECVID 2019 Video to TextGeorge Awad
 
TRECVID 2019 introduction slides
TRECVID 2019 introduction slidesTRECVID 2019 introduction slides
TRECVID 2019 introduction slidesGeorge Awad
 
[TRECVID 2018] Video to Text
[TRECVID 2018] Video to Text[TRECVID 2018] Video to Text
[TRECVID 2018] Video to TextGeorge Awad
 
Instance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVIDInstance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVIDGeorge Awad
 
Activity detection at TRECVID
Activity detection at TRECVIDActivity detection at TRECVID
Activity detection at TRECVIDGeorge Awad
 
The Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVIDThe Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVIDGeorge Awad
 
Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)George Awad
 
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)George Awad
 
TRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event DetectionTRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event DetectionGeorge Awad
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationGeorge Awad
 
TRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionTRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionGeorge Awad
 
TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search George Awad
 
TRECVID 2016 workshop
TRECVID 2016 workshopTRECVID 2016 workshop
TRECVID 2016 workshopGeorge Awad
 

More from George Awad (19)

Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022Movie Summarization at TRECVID 2022
Movie Summarization at TRECVID 2022
 
Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022Disaster Scene Indexing at TRECVID 2022
Disaster Scene Indexing at TRECVID 2022
 
Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022Activity Detection at TRECVID 2022
Activity Detection at TRECVID 2022
 
Multimedia Understanding at TRECVID 2022
Multimedia Understanding at TRECVID 2022Multimedia Understanding at TRECVID 2022
Multimedia Understanding at TRECVID 2022
 
Video Search at TRECVID 2022
Video Search at TRECVID 2022Video Search at TRECVID 2022
Video Search at TRECVID 2022
 
Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022Video Captioning at TRECVID 2022
Video Captioning at TRECVID 2022
 
TRECVID 2019 Video to Text
TRECVID 2019 Video to TextTRECVID 2019 Video to Text
TRECVID 2019 Video to Text
 
TRECVID 2019 introduction slides
TRECVID 2019 introduction slidesTRECVID 2019 introduction slides
TRECVID 2019 introduction slides
 
[TRECVID 2018] Video to Text
[TRECVID 2018] Video to Text[TRECVID 2018] Video to Text
[TRECVID 2018] Video to Text
 
Instance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVIDInstance Search (INS) task at TRECVID
Instance Search (INS) task at TRECVID
 
Activity detection at TRECVID
Activity detection at TRECVIDActivity detection at TRECVID
Activity detection at TRECVID
 
The Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVIDThe Video-to-Text task evaluation at TRECVID
The Video-to-Text task evaluation at TRECVID
 
Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)Introduction about TRECVID (The Video Retreival benchmark)
Introduction about TRECVID (The Video Retreival benchmark)
 
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
TRECVID ECCV2018 Tutorial (Ad-hoc Video Search)
 
TRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event DetectionTRECVID 2016 : Multimedia event Detection
TRECVID 2016 : Multimedia event Detection
 
TRECVID 2016 : Concept Localization
TRECVID 2016 : Concept LocalizationTRECVID 2016 : Concept Localization
TRECVID 2016 : Concept Localization
 
TRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text DescriptionTRECVID 2016 : Video to Text Description
TRECVID 2016 : Video to Text Description
 
TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search TRECVID 2016 : Ad-hoc Video Search
TRECVID 2016 : Ad-hoc Video Search
 
TRECVID 2016 workshop
TRECVID 2016 workshopTRECVID 2016 workshop
TRECVID 2016 workshop
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

TRECVID 2019 INSTANCE RETRIEVAL INTRODUCTION AND TASK OVERVIEW

  • 1. TRECVID 2019 INSTANCE RETRIEVAL INTRODUCTION AND TASK OVERVIEW Wessel Kraaij Leiden University; Netherlands Organisation for Applied Scientific Research (TNO) George Awad Georgetown University; National Institute of Standards and Technology Keith Curtis National Institute of Standards and Technology Disclaimer The identification of any commercial product or trade name does not imply endorsement or recommendation by the National Institute of Standards and Technology.
  • 2. TRECVID 2019 Table of contents • Task Definition • Data • Topics (Queries) • Participating teams • Evaluation & results • General observation 2
  • 3. TRECVID 2019 Task From 2013 – 2015 • The task asked systems to find a specific object, person or location in any context using a small set of image and video examples. From 2016 - 2018 • A different query type was used: find a specific person in a specific location. In 2019 - 2021 • A new query type is being used: find a specific person doing a specific action. System task: ▪ Given a topic with: ▪ 4 example images of the target person ▪ 4 Region of Interest (ROI)-masked images of the target person ▪ 4 to 6 video examples of a specific action ▪ Return a list of up to 1000 shots ranked by likelihood that they contain the target person doing the target action ▪ Automatic or interactive runs are accepted 3
  • 4. TRECVID 2019 Data … • The British Broadcasting Corporation (BBC) and the Access to Audiovisual Archives (AXES) project made 464 h of the BBC soap opera EastEnders available for research • 244 weekly “omnibus” files (MPEG-4) from 5 years of broadcasts • 471527 shots • Average shot length: 3.5 seconds • Transcripts from BBC • Per-file metadata • Represents a “small world” with a slowly changing set of: • People (several dozen) • Locales: homes, workplaces, pubs, cafes, open-air market, clubs • Objects: clothes, cars, household goods, personal possessions, pets, etc • Views: various camera positions, times of year, times of day, • Use of fan community metadata allowed, if documented 5
  • 5. TRECVID 20196 Majority of episodes filmed at Elstree studios. Sometimes filmed on ‘location’. EastEnders’ world
  • 6. TRECVID 20197 Topic creation procedure @ NIST • Viewed several videos to develop a list of recurring people, actions and their overlapping. • Listed in order the most frequent actions and most frequent person’s performing them • Created ≈90 topics targeting recurring specific persons doing specific actions. • Chose 50 topics as a representative sample, including 30 unique topics for 2019 and 20 common topics for 2019 - 2021. Each topic includes images for target persons and example videos of the specific actions. • Filtered example shots from the submissions if it satisfies the topic.
  • 7. TRECVID 20198 Global test condition: type of training data Effect of examples – 2 conditions: • A – one or more provided images – no video • E - video examples (+ optional image examples) Sources of Training Data: • A – Only sample video 0 • B - Other external data only • C – Only provided images/videos in the official query • D - Sample video 0 AND provided images/videos in the official query (A+C) • E – External data AND NIST provided data (sample video 0 OR official query images/videos)
  • 8. TRECVID 20199 Topics – segmented “person” example images Bradley Denise Dot Heather
  • 9. TRECVID 201910 Topics – segmented “person” example images Ian Jack Jane Max
  • 10. TRECVID 201911 Topics – segmented “person” example images Phil Sean Shirley Stacey
  • 11. TRECVID 201912 Sample Actions Open door & enter Sit on couch
  • 13. TRECVID 201914 30 Unique Queries – 2019 Max Pat Ian Denise Phil Jane Dot Bradley Jack Stacey Holding glass x x x x Sit on couch x x x Holding phone x x x Drinking x x x Open door & enter x x Open door & leave x x Shouting x x x Eating x x Crying x x Laughing x x Go up / down stairs x x Carrying bag x x 30 x unique queries : find {Max, Pat, Ian, Denise, Phil, Jane, Dot, Bradley, Jack, Stacey} doing {Holding glass, Sit on couch, Holding phone, Drinking, Eating, Crying, Laughing, Shouting, Open door & leave, Open door & enter, Go up / down stairs, Carrying bag}
  • 14. TRECVID 201915 20 Common Queries – 2019-2021 Sean Max Denise Phil Dot Heather Jack Shirley Stacey Kissing x x Sit on couch x x Holding phone x x Drinking x x Open door & enter x x Open door & leave x x Shouting x x Hugging x x Close door without leaving x x Stand & talk at door x x 20 x common queries : find {Sean, Max, Denise, Phil, Dot, Heather, Jack, Shirley, Stacey} doing {Kissing, Sit on couch, Holding phone, Drinking, Shouting, Hugging, Open door & leave, Open door & enter, Close door without leaving, Stand & talk at door}
  • 15. TRECVID 201916 Team Organization Run Types Submitted F: automatic, I: Interactive BUPT_MCPRL Beijing University of Posts and Telecommunications F_E (2), I_E (1) HSMW_TUC Chemnitz University of Technology, University of Applied Sciences Mittweida F_E (4) Inf Monash University, Renmin University, Shandong University F_E (3) WHU_NERCMS National Engineering Research Center for Multimedia Software, Wuhan University F_E (3) NII_Hitachi_UIT National Institute of Informatics, Japan (NII); Hitachi, Ltd; University of Information Technology, VNU-HCM F_A (4), F_E(4) PKU_ICST Peking University F_A (3), F_E (3), I_E (1) INS 2019: 6 Finishers (out of 12)
  • 16. TRECVID 201917 Evaluation For each topic the submissions were pooled and judged down to max rank 520, resulting in 141599 judged shots (≈ 473 person-h). • 10 NIST assessors played the clips and determined if they contained the topic target or not. • 6 592 clips (avg. 220 / topic) contained the topic target (4.66 %) • True positives per topic: min 29 med 187 max 575 • The task is treated as a form of ranking and thus the trec_eval_video tool was used to calculate average precision, recall, precision, etc. • To measure efficiency, speed was also measured. • In total, 26 automatic and 2 interactive runs were submitted.
  • 18. TRECVID 201919 # Query 9258 Find Pat Drinking 9256 Find Phil Holding phone 9253 Find Pat Sit on couch 9257 Find Jane Holding phone 9274 Find Jack Shouting 9255 Find Ian Holding phone 9275 Find Stacey Crying 9273 Find Jack Drinking 9265 Find Max Crying 9269 Find Jack Sit on couch 9254 Find Denise Sit on couch 9266 Find Jane Laughing 9272 Find Stacey Drinking 9278 Find Stacey Go up/down stairs 9252 Find Denise Holding Cup/Glass 9251 Find Pat Holding Cup/Glass 9249 Find Max Holding Cup/Glass 9268 Find Phil Go up/down stairs 9261 Find Max Shouting 9262 Find Phil Shouting 9263 Find Jane Eating 9250 Find Ian Holding Cup/Glass 9277 Find Jack Open door & leave 9264 Find Dot Eating 9260 Find Dot Open door & enter 9267 Find Dot Open door & leave 9270 Find Stacey Carrying bag 9271 Find Bradley Carrying bag 9259 Find Ian Open door & enter 9276 Find Bradley Laughing Results by topics - automatic *Mean score of AP per action only Shouting has avg. high scores, but high median scores Holding phone (0.1252)* easier to find Open door & enter (0.0166)* hard to find Open door & leave (0.0201)* hard to find Carrying bag (0.0228)* hard to find
  • 19. TRECVID 201920 Some observations.. • Poor results for topics involving Dot and Bradley could indicate that they are hard people to find. • However - previous iterations of the INS task showed them to be among the easiest people to find. What gives? • Actions involving Dot consistently score poorly, whether it is Dot or another character involved. Seems to be more a case of hard actions to recognise. • Bradley laughing - very poor results - but looking at frequent false positives on this topic reveal lots of instances of contrived laughter from Bradley. Obvious instances of exaggerated faked / contrived laughter do not count as laughing.
  • 20. TRECVID 201921 Easier Topics # Query 9274 Find Jack Shouting 9262 Find Phil Shouting 9261 Find Max Shouting 9254 Find Denise Sit on couch 9255 Find Ian Holding phone 9273 Find Jack Drinking 9272 Find Stacey Drinking 9252 Find Denise Holding Cup/Glass 9253 Find Pat Sit on couch 9266 Find Jane Laughing 9278 Find Stacey Go up/down stairs 9275 Find Stacey Crying 9269 Find Jack Sit on couch 9257 Find Jane Holding phone 9249 Find Max Holding Cup/Glass 9258 Find Pat Drinking 9251 Find Pat Holding Cup/Glass 9256 Find Phil Holding phone 9250 Find Ian Holding Cup/Glass 9263 Find Jane Eating 9260 Find Dot Open door & enter 9264 Find Dot Eating 9265 Find Max Crying 9268 Find Phil Go up/down stairs 9277 Find Jack Open door & leave 9270 Find Stacey Carrying bag 9267 Find Dot Open door & leave 9271 Find Bradley Carrying bag Number of runs with AP above 0.06
  • 21. TRECVID 201922 Hard Topics # Query 9259 Find Ian Open door & enter 9276 Find Bradley Laughing 9267 Find Dot Open door & leave 9271 Find Bradley Carrying bag 9277 Find Jack Open door & leave 9270 Find Stacey Carrying bag 9263 Find Jane Eating 9264 Find Dot Eating 9260 Find Dot Open door & enter 9265 Find Max Crying 9268 Find Phil Go up/down stairs 9266 Find Jane Laughing 9275 Find Stacey Crying 9278 Find Stacey Go up/down stairs 9269 Find Jack Sit on couch 9249 Find Max Holding Cup/Glass 9257 Find Jane Holding phone 9251 Find Pat Holding Cup/Glass 9258 Find Pat Drinking 9250 Find Ian Holding Cup/Glass 9256 Find Phil Holding phone 9255 Find Ian Holding phone 9273 Find Jack Drinking 9272 Find Stacey Drinking 9252 Find Denise Holding Cup/Glass 9253 Find Pat Sit on couch 9254 Find Denise Sit on couch 9262 Find Phil Shouting 9261 Find Max Shouting 9274 Find Jack Shouting Number of runs with AP below 0.06
  • 22. TRECVID 201923 Some observations.. • From the previous two bar charts we can safely say that shouting is the easiest topic to find. This was not obvious from the boxplot of results by topics. • Drinking, sitting on couch, and holding phone are also among the easiest topics to find. • Open door & leave, open door & enter, and carrying bag are among the hardest topics to find.
  • 23. TRECVID 201924 Some Frequent False Positives Jack sit on couch Bradley carrying bag Jack is sitting on an armchair - a single seating structure. Topic specifies a couch - a comfortable seating structure which seats more than one person. Bradley is seated next to Stacey. Dot comes into the picture carrying a bag.
  • 24. TRECVID 201925 Some Frequent False Positives Dot open door & leave Stacey Drinking Dot opens the door to let people in and then closes the door again. Does not leave the room / house. In this shot we see Stacey holding a glass. Later in the shot Mo is seen drinking. Topic specifies that the person must be seen moving the glass/cup to their mouth and performing a sipping or drinking action.
  • 25. TRECVID 201926 Some Frequent False Positives Stacey crying Jack shouting Stacey appears to have hurt herself, with blood around her left eye. She is rubbing her eye but does not appear to be crying. This appears to be a frank discussion between Jack and another man. The other person appears to be angry and raises his voice at Jack, however Jack does not raise his voice.
  • 26. TRECVID 201927 Some Frequent False Positives Phil holding phone Pat drinking Phil is seen singing into a microphone, along with Garry. Another person is holding a phone recording them. Pat is seen clapping hands. Later in the shot she turns her head to look over at someone. Another person moves a glass to their mouth.
  • 27. TRECVID 201928 Further observations from viewing most frequent false positives of worst performing topics • Open door & enter - Systems tended to classify any shots with target person and a doorway as a positive detection. More work needed on training systems to classify the action itself. • Open door & leave - Same as above, systems classifying any shots with target person and a doorway as positive detection. More work needed on training systems to classify the action of opening a door and leaving. • Carrying bag - Fewer conclusions can be drawn. Many instances where a shot is classified as a positive detection if the target person appears in the shot and a different person is carrying a bag, however, many other shots contain the target person with no bag visible in the shot at all.
  • 28. TRECVID 201929 Automatic Run results + Randomization testing = > > > > > > > = > > > > > > > = > > > > > > = > > > > > > = > > > > > = > > > > = > > > = > > = = 1 2 3 4 5 6 7 8 9 10 > p < 0.05 0.242 F_A_PKU_ICST_4*^ 0.239 F_E_PKU_ICST_1*⤒ 0.235 F_A_PKU_ICST_3^⤉ 0.230 F_E_PKU_ICST_5⤒⤉ 0.201 F_E_PKU_ICST_6 0.198 F_A_PKU_ICST_7 0.119 F_E_BUPT_MCPRL_1 0.116 F_E_BUPT_MCPRL_2 0.024 F_E_NII_Hitachi_UIT_2⍏ 0.024 F_A_NII_Hitachi_UIT_3⍏ MAP Top 10 runs across all teams (automatic) p = probability the row run scored better than the column run due to chance *^⤒⤉⍏ = difference not statistically significant
  • 29. TRECVID 201930 Mean Average Precision vs. per run clock processing time (automatic) PKU-ICST
  • 30. TRECVID 201931 Interactive Run results + Randomization testing = > = 1 2 > p < 0.05 0.360 I_E_PKU_ICST_2 0.212 I_E_BUPT_MCPRL_4 MAP Runs across all teams (interactive) p = probability the row run scored better than the column run due to chance
  • 31. TRECVID 201932 Results by example set (A/E) - automatic
  • 33. TRECVID 201934 Some general observations about the task • Slight decrease in number of participants and finishers, but higher % of participants finished the task. • Many more teams now using E condition - training with video examples. Perhaps more necessary now with action recognition. But - Results from teams using both show little difference between image & video and image only! • Interactive search task: • Limited participation - only two interactive runs this year. • First year of updated task - results cannot be compared in any way to previous years - in subsequent years we can compare using the common topics.
  • 34. TRECVID 201935 Some general observations about the task – Data Source •Best results by far achieved using external data plus NIST provided data. •Huge gap to results from systems trained using only external data or using only sample video 0.
  • 35. TRECVID 201936 Further Conclusions •Person recognition has been a feature of the INS task since 2013 and is very mature by this stage. Very few frequent false positives misidentify the person. •Action recognition is a new feature of INS task. The much increased difficulty of the new INS task is due to this. Requires much more work to reach an acceptable level of maturity.
  • 36. TRECVID 201937 Further Conclusions •Visual Concepts very important. •Easier tasks mostly those with obvious visual context (sit on couch, hold phone, hold glass, etc.). •Harder tasks tend to be more independent from obvious visual context (crying, laughing, eating, different actions involving a doorway hard to isolate from others).
  • 37. TRECVID 201938 Inf • Person Search: • MTCNN Model to detect faces • VGG-Face2 for Feature Selection • Cosine Distance to measure similarity. • Action Search: • Faster RCNN model • Tracklets generated by DeepSort • Fine tune RGB benchmark of I3D model • Cosine similarity for action reranking • Reranking: • Person Search based • Action search based • Fusion based • Submitted 3 automatic runs
  • 38. TRECVID 201939 NII-Hitachi-UIT • Person Search: • VGG-Face2 for face representation • Cosine Similarity for matching and reranking • Action Search: • Audio type: VGGish • Visual type: C3D and Semantic Features from VGG-1k • Cosine Similarity for matching and reranking • Fusion of Similarity Scores and person and action for final rank list. • Submitted 8 Automatic runs: • 4 using video example set • 4 using no-video
  • 39. TRECVID 201940 PKU-ICST • Action Recognition: • Frame-level action recognition • Video-level action recognition trained using Kinetics-400 • Object Detection pre-trained on MS-COCO • Facial Expression Recognition • Person Specific Recognition: • Query Augmentation by super resolution. • Face Recognition with 2 Deep models + top N query expansion • Instance Score Fusion: • Score Fusion strategy to mine common information from action specific and person specific recognition. • Submitted 6 Automatic runs and 1 interactive run
  • 40. TRECVID 201941 WHU-NERCMS • Scheme 1: • Retrieve person and action respectively first then fuse • Person: Adopted face recognition model • Action: Adopted 3D Convolutional networks • Fusion: Weighting based and person identity based filter to combine results • Scheme 2: • Retrieve and track specific people then retrieve actions • Adopted face recognition to determine face ID and bind track ID of the track detected by the object tracking. • Adopt action recognition of tracked person target frames. • Submitted 3 Automatic runs
  • 41. TRECVID 201942 Overview of submissions (1) • 6 out of 6 teams described INS runs for the TV notebook • 2 teams will present their INS experiments 2:15 - 2:45, (BUPT_MCPRL Team– Beijing University of Posts and Telecommunications) 2:45 - 3:15, (HSMW_TUC Team– University of Applied Sciences Mittweida) 3:15 - 3:35, INS Discussion
  • 42. TRECVID 201943 INS 2019 Discussion • What do teams think of the new task (query type)? • Are the selected actions important in real life applications? • What is the main challenge in the new query type? - No enough training data? - actions are difficult? - fusion of persons + action detection results? • Is the task still of an ad-hoc nature? Or converting to a supervised learning? • Do we need additional run categories?
  • 43. TRECVID 201944 2019 - 2021 Progress Runs • 20 common topics. • Evaluate progress of participating teams 2019- 2021 using a set of common topics. • 12 runs submitted by 3 separate teams in 2019. Additional teams can still submit progress runs in 2020 on the 10 topics to be evaluated in 2021. • 10 common topics will be evaluated in 2020. • 10 remaining common topics evaluated in 2021.