3. Box-level post-processing
*Feature-level learning
• Flow-Guided Feature Aggregation for Video Object Detection
• Deep Feature Flow for Video Recognition
• Towards High Performance Video Object Detection
• Fully Motion-Aware Network for Video Object Detection
Paper | Date | Base detector | Backbone | Tracking? | Optical flow? | Online? | mAP (%) | FPS (Titan X)
Seq-NMS | Feb 2016 | R-FCN | ResNet101 | no | no | no | 76.8 | 2.3
T-CNN | Apr 2016 | RCNN | DeepIDNet+CRAFT | yes | no | no | 73.8 | -
DFF | Nov 2016 | R-FCN | ResNet101 | no | yes | yes | 73.0 | 29
TPN | Feb 2017 | TPN | GoogLeNet | yes | no | no | 68.4 | -
FGFA | Mar 2017 | R-FCN | ResNet101 | no | yes | yes | 76.3 | 1.4
FGFA + Seq-NMS | 29 Mar 2017 | R-FCN | ResNet101 | no | yes | no | 78.4 | 1.14
D&T | Oct 2017 | R-FCN (15 anchors) | ResNet101 | yes | no | no | 79.8 | 7.09
STMN | Dec 2017 | R-FCN | ResNet101 | no | no | no | 80.5 | -
Scale-time-lattice | 16 Apr 2018 | Faster RCNN (15 anchors) | ResNet101 | no | no | no | 79.6 | 20
Scale-time-lattice | Apr 2018 | Faster RCNN (15 anchors) | ResNet101 | no | no | no | 79.0 | 62
SSN (per-frame baseline for STSN) | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 76.0 | -
STSN | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | yes | 78.9 | -
STSN+Seq-NMS | Mar 2018 | R-FCN | Deformable ResNet101 | no | no | no | 80.4 | -
MANet | Sep 2018 | R-FCN | ResNet101 | no | yes | yes | 78.1 | 5
MANet+Seq-NMS | Sep 2018 | R-FCN | ResNet101 | no | yes | no | 80.3 | -
Tracklet-Conditioned Detection | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 78.1 | -
Tracklet-Conditioned Detection+DCNv2 | Nov 2018 | R-FCN | ResNet101 | yes | no | yes | 82.0 | -