Nov 16, 2020 - Jan 4, 2021 https://www.kaggle.com/c/nfl-impact-detection/overview
Kazuyuki Miyazawa
Group Leader
AI R&D Group 2
AI System Dept.
Mobility Technologies Co., Ltd.
Past Work Experience
April 2019 - March 2020
AI Research Engineer@DeNA Co., Ltd.
April 2010 - March 2019
Research Scientist@Mitsubishi Electric Corp.
Education
PhD in Information Science@Tohoku University
@kzykmyzw
■ Detect helmet impacts that happen in NFL games using videos from sideline and endzone, and player
tracking data
■ For training, 9947 still images are provided for helmet detection, and 60 video pairs (sideline and
endzone) and player tracking data are provided for helmet impact detection
■ Video frame rate is 59.94 fps, and each video lasts around 10 seconds
[Figure: provided data. images + helmet bboxes; videos from endzone + helmet bboxes w/ player ID and impact information; videos from sideline + helmet bboxes w/ player ID and impact information; player tracking data + player positions w/ player ID, players' speed, acceleration, etc. Annotations in the figure: time-synced; available in training; available in training and test.]
■ Labels for images
■ image: the image file name.
■ label: the label type (Helmet, Helmet-Blurred, Helmet-Difficult, Helmet-Sideline, Helmet-Partial).
■ [left/width/top/height]: the specification of the bounding box of the label, with left=0 and top=0 being the top left corner.
■ Labels for videos
■ gameKey: the ID code for the game.
■ playID: the ID code for the play.
■ view: the camera orientation.
■ video: the filename of the associated video.
■ frame: the frame number for this play.
■ label: the associated player's number.
■ [left/width/top/height]: the specification of the bounding box of the prediction.
■ impact: an indicator (1 = helmet impact) for bounding boxes associated with helmet impacts
■ confidence: 1 = Possible, 2 = Definitive, 3 = Definitive and Obvious
■ visibility: 0 = Not Visible from View, 1 = Minimum, 2 = Visible, 3 = Clearly Visible
■ impactType: a description of the type of helmet impact: helmet, shoulder, body, ground, etc.
For the purposes of evaluation, definitive helmet impacts are defined as meeting three criteria:
● impact = 1
● confidence > 1
● visibility > 0
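The three criteria above translate directly into a row filter. The following is a minimal sketch, assuming each label row is available as a dictionary keyed by the column names listed earlier; the example rows are made up for illustration and are not taken from the competition data.

```python
def is_definitive_impact(row):
    """Return True if a label row meets all three evaluation criteria."""
    return row["impact"] == 1 and row["confidence"] > 1 and row["visibility"] > 0


# Hypothetical rows, only to exercise the filter.
rows = [
    {"frame": 10, "impact": 1, "confidence": 3, "visibility": 2},  # definitive
    {"frame": 11, "impact": 1, "confidence": 1, "visibility": 2},  # only "Possible"
    {"frame": 12, "impact": 1, "confidence": 2, "visibility": 0},  # not visible
    {"frame": 13, "impact": 0, "confidence": 0, "visibility": 0},  # no impact
]
definitive = [r for r in rows if is_definitive_impact(r)]
print([r["frame"] for r in definitive])
```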
[Figure: example frames showing helmets w/o impact and helmets w/ impact]
■ gameKey: the ID code for the game.
■ playID: the ID code for the play.
■ player: the player's ID code.
■ time: timestamp at 10 Hz.
■ x: player position along the long axis of the field.
■ y: player position along the short axis of the field.
■ s: speed in yards/second.
■ a: acceleration in yards/second^2.
■ dis: distance traveled from prior time point, in yards.
■ o: orientation of player (deg).
■ dir: angle of player motion (deg).
■ event: game events like a snap, whistle, etc.
In test, we cannot directly map players’ positions in given tracking data to detected players in videos.
Visualized by @hidehisaarai1213’s notebook
■ gameKey: the ID code for the game.
■ playID: the ID code for the play.
■ view: the camera orientation.
■ video: the filename of the associated video.
■ frame: the frame number for this play.
■ [left/width/top/height]: the specification of the bounding box of the prediction.
■ F1 score at an IoU threshold of 0.35
■ For a given ground truth impact, a prediction within +/- 4 frames (9 frames total) within the same play can
be accepted as valid without necessarily degrading the score.
■ If one or more predictions are assigned to more than one ground truth boxes, the metric will optimize for
the assignments between the prediction(s) and the ground truth boxes that lead to the highest total
number of True Positives (thereby maximizing the F1 score). At most one prediction will be assigned to
any ground truth box and vice versa.
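The matching rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the official scoring code: boxes are (left, width, top, height) tuples, all entries belong to the same video, a pair is treated as matchable when its IoU is at least 0.35, and the one-to-one assignment that maximizes true positives is found with a simple augmenting-path bipartite matching. The function names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (left, width, top, height)."""
    al, aw, at, ah = a
    bl, bw, bt, bh = b
    iw = min(al + aw, bl + bw) - max(al, bl)
    ih = min(at + ah, bt + bh) - max(at, bt)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def f1_score(gts, preds, iou_thr=0.35, frame_tol=4):
    """gts / preds: lists of (frame, box) for a single video."""
    # candidates[i] = indices of predictions that may be assigned to gts[i]
    candidates = [
        [j for j, (pf, pb) in enumerate(preds)
         if abs(pf - gf) <= frame_tol and iou(gb, pb) >= iou_thr]
        for gf, gb in gts
    ]
    match_of_pred = {}  # prediction index -> ground-truth index

    def try_assign(i, seen):
        # Augmenting-path step: give gts[i] a prediction, re-routing
        # earlier assignments when that frees one up.
        for j in candidates[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match_of_pred or try_assign(match_of_pred[j], seen):
                match_of_pred[j] = i
                return True
        return False

    tp = sum(try_assign(i, set()) for i in range(len(gts)))
    fp = len(preds) - tp
    fn = len(gts) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


box = (0, 10, 0, 10)
gts = [(10, box), (12, box)]
preds = [(11, (1, 10, 0, 10)), (14, box), (40, (50, 10, 50, 10))]
print(f1_score(gts, preds))  # 2 TP, 1 FP, 0 FN
```

Because at most one prediction is assigned to each ground truth box, emitting the same impact on several neighboring frames only adds false positives, which is why the solutions below all include some form of temporal NMS.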
https://www.kaggle.com/c/nfl-impact-detection/discussion/197672
■ Only use videos and images
■ 2 stage pipeline (detection + classification)
■ NMS using tracking results
[Figure: pipeline. Helmet detection on every frame → classification on every bbox (using ± n frames) → post processing → results]
■ Use DetectoRS (no strong reason other than it’s implemented in MMDetection and is easy to use)
■ Train using images + 80% of training video and validate using the other 20%
■ Detection performance seems to be quite high, so I did not pursue further accuracy through ensembling or TTA
■ Crop every bbox with expansion
■ Also crop bboxes at the same position from ±N frames (2N + 1 bboxes total)
■ 2N + 1 bboxes are concatenated in channel direction and fed into ResNet-50
■ ResNet outputs the probability that the input bboxes include an impact
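The crop geometry can be sketched as follows. This is a minimal illustration, assuming the square window of side 4 x max(w, h) shown on the slide is centered on the bbox, that frames are numbered from 1, and that frame numbers are clamped at the ends of the video; the clamping and the function names are assumptions, not details from the slides.

```python
def crop_window(left, width, top, height, scale=4.0):
    """Square window of side scale * max(w, h), centered on the bbox.

    Returns (x0, y0, side) of the window in image coordinates.
    """
    side = scale * max(width, height)
    cx = left + width / 2
    cy = top + height / 2
    return cx - side / 2, cy - side / 2, side


def input_frames(frame, n, num_frames):
    """Frame numbers of the 2N + 1 crops (clamping at the ends is an assumption)."""
    return [min(max(f, 1), num_frames) for f in range(frame - n, frame + n + 1)]


def input_channels(n):
    """Number of channels after stacking 2N + 1 RGB crops along the channel axis."""
    return 3 * (2 * n + 1)


print(crop_window(100, 20, 50, 10))        # window for a 20 x 10 bbox
print(input_frames(2, 2, 600))             # near the start of the video
print(input_channels(2), input_channels(4))
```

The same window is applied to every one of the 2N + 1 frames, so the network sees how the surroundings of the helmet change over time rather than a re-centered helmet in each frame.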
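[Figure: each bbox of size w x h is expanded to a 4 x max(w, h) square, cropped from frames -N to +N, resized to 224 x 224, and stacked into a 224 x 224 x 3 x (2N + 1) input for ResNet-50, which outputs impact or not]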
■ Split endzone-sideline video pairs into 80% (48 pairs) train and 20% (12 pairs) val
■ Since positive samples (bboxes which have impact labels) are only 0.18% of all samples, I over-sampled positives to balance positive and negative samples
■ Augmentation: LR flip, color jitter, bbox position jitter, bbox size jitter
■ No cross validation because of time and computational resource limitations
■ Score calculation by @nvnn’s notebook
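The over-sampling mentioned above can be sketched as drawing positive samples with replacement until they match the number of negatives. This is one common way to do it and is an assumption about the details; the slides do not say how the balancing was implemented, and `oversample` is a hypothetical helper.

```python
import random


def oversample(samples, rng):
    """samples: list of (sample_id, is_positive) pairs.

    Repeats positives (drawn with replacement) until both classes are the
    same size, then shuffles the result.
    """
    pos = [s for s in samples if s[1]]
    neg = [s for s in samples if not s[1]]
    if not pos or not neg:
        return list(samples)
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    balanced = pos + extra + neg
    rng.shuffle(balanced)
    return balanced


rng = random.Random(0)
samples = [(i, i < 2) for i in range(1000)]  # 2 positives out of 1000 (made-up data)
balanced = oversample(samples, rng)
print(len(balanced), sum(1 for s in balanced if s[1]))
```

With so few distinct positives, the augmentations listed above matter: they keep the repeated positive crops from being pixel-identical.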
■ Train two ResNets for different types of labels
■ Type-I: Assign TRUE labels only for the bboxes of impact timing
■ Type-II: Assign TRUE labels for the bboxes of impact timing and ±4 consecutive bboxes
■ Type-I uses ±2 bboxes as inputs, and Type-II uses ±4 bboxes as inputs (N = 2 and N = 4 in P.18)
■ ResNet-Type-I and -Type-II achieve high precision and high recall, respectively, so their ensembling leads
to performance gain (Type-I: 0.40 + Type-II: 0.38 → 0.46)
■ Tried adding other models such as EfficientNet, but settled on the two ResNets based on local validation
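The two labeling schemes differ only in how far the TRUE label extends around the impact frame. A minimal sketch, assuming one helmet track is given as a list of frame numbers and that `margin` (a hypothetical parameter) is 0 for Type-I and 4 for Type-II:

```python
def frame_labels(track_frames, impact_frames, margin):
    """Label each frame of one helmet track.

    margin=0 marks only the impact frame (Type-I);
    margin=4 also marks the 4 frames before and after it (Type-II).
    """
    return [any(abs(f - i) <= margin for i in impact_frames) for f in track_frames]


frames = list(range(1, 21))
type1 = frame_labels(frames, [10], margin=0)
type2 = frame_labels(frames, [10], margin=4)
print(sum(type1), sum(type2))  # number of TRUE labels per scheme
```

The narrower Type-I labels push the model toward firing only at the exact impact frame, while the wider Type-II labels tolerate timing error, which is consistent with the high-precision versus high-recall behavior reported above.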
[Figure: TRUE/FALSE label assignment around the impact frame for Type-I and Type-II, with recall, precision, and F1 of each model]
■ NMS in temporal domain is necessary since classifying all the detected bboxes produces a lot of false
positives
■ Employ IoU-based tracking and pick the bbox which has the highest confidence value in each track
■ Remove bboxes which could not be tracked
■ Remove bboxes whose confidence value is below the threshold
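The three post-processing rules above can be sketched as a greedy frame-to-frame IoU tracker followed by per-track selection. This is an illustration only: the linking rule (consecutive frames, highest IoU above a threshold), the threshold values, and the function names are assumptions, not the settings used in the actual solution.

```python
def iou(a, b):
    """IoU of two boxes given as (left, width, top, height)."""
    al, aw, at, ah = a
    bl, bw, bt, bh = b
    iw = min(al + aw, bl + bw) - max(al, bl)
    ih = min(at + ah, bt + bh) - max(at, bt)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def temporal_nms(dets, iou_thr=0.3, conf_thr=0.5):
    """dets: list of (frame, box, conf). Returns one detection per kept track."""
    tracks = []  # each track is a list of detections in frame order
    for det in sorted(dets, key=lambda d: d[0]):
        frame, box, _ = det
        best, best_iou = None, iou_thr
        for track in tracks:
            last_frame, last_box, _ = track[-1]
            if frame - last_frame == 1:  # link only to the previous frame
                overlap = iou(box, last_box)
                if overlap >= best_iou:
                    best, best_iou = track, overlap
        if best is None:
            tracks.append([det])
        else:
            best.append(det)

    kept = []
    for track in tracks:
        if len(track) < 2:
            continue  # bbox that could not be tracked
        top = max(track, key=lambda d: d[2])
        if top[2] >= conf_thr:
            kept.append(top)  # highest-confidence bbox of the track
    return kept


a, b, c = (0, 10, 0, 10), (50, 10, 50, 10), (100, 10, 100, 10)
dets = [(1, a, 0.2), (2, a, 0.9), (3, a, 0.4),   # track with a confident peak
        (1, b, 0.3), (2, b, 0.4),                # track below the threshold
        (5, c, 0.95)]                            # isolated detection
print(temporal_nms(dets))
```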
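[Figure: confidence over time t along a track; the max-confidence bbox is kept when max conf > threshold, and the track is dropped when max conf < threshold]
[Chart: Score for single model w/o post processing, single model w/ post processing, ensemble (2~4 models), and finetune w/ val data (2~3 models)]
[Figure: GT vs. Pred examples]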
[Pipeline: detection → classification (16 frames) → pp → results, with tracking]
■ Detection: 1-class YOLOv5
■ Tracking: Track helmets and estimate the average helmet velocity over a few surrounding frames by optical flow.
■ Classification input: Normalize the size of helmet crops to 128 x 128 x 3 x 16. Correct helmet movement by optical flow to differentiate (i) a helmet at constant velocity and (ii) a helmet during acceleration.
■ Classification: Ensemble of EfficientNet B0-B3, ResNet-18, and ResNet-34 with TSM (Temporal Shift Module). Mark 3 frames around the impact as positive and use 5 or 10% positive samples. Add the false positive predictions from a few undertrained detection models. Average predictions of multiple models from 4 folds.
■ pp: NMS in temporal direction using tracking results
https://www.kaggle.com/c/nfl-impact-detection/discussion/209403
■ Can be inserted into a 2D CNN backbone to enable joint spatial-temporal modeling at no additional cost.
■ Shifts part of the channels along the temporal dimension, facilitating information exchange among neighboring frames.
■ Supports both offline (bi-directional) and online (uni-directional) video recognition.
https://arxiv.org/abs/1811.08383
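The shift operation itself is small enough to sketch without a deep learning framework. The version below is a plain-Python illustration of the offline (bi-directional) case, assuming features are indexed as x[t][c] and that the boundaries are zero-padded; `fold_div`, which sets the fraction of channels shifted in each direction, and the scalar features are assumptions for illustration.

```python
def temporal_shift(x, fold_div=8):
    """Bi-directional temporal shift on features indexed as x[t][c].

    The first 1/fold_div of the channels take their value from the next
    frame, the second 1/fold_div from the previous frame, and the rest are
    left in place. Positions with no source frame are zero-filled.
    """
    t_len, c_len = len(x), len(x[0])
    fold = c_len // fold_div
    out = [[0] * c_len for _ in range(t_len)]
    for t in range(t_len):
        for c in range(c_len):
            if c < fold:
                src = t + 1      # value comes from the next frame
            elif c < 2 * fold:
                src = t - 1      # value comes from the previous frame
            else:
                src = t          # unshifted channels
            if 0 <= src < t_len:
                out[t][c] = x[src][c]
    return out


# 3 frames, 8 channels; each value encodes 10 * frame + channel.
x = [[10 * t + c for c in range(8)] for t in range(3)]
for row in temporal_shift(x, fold_div=4):
    print(row)
```

After the shift, an ordinary 2D convolution applied frame by frame mixes channels that now originate from neighboring frames, which is how temporal information enters without extra parameters.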
After the competition, I evaluated TSM in my own pipeline, and it performed better than the 2D CNN (0.380 → 0.436)
detection classification pp results
9 frames
1-class YOLOv5
Crop 2x width and height of the original bbox.
Ensemble of 6 different EfficientNets (B3 and B5) + horizontal flip TTA.
Replace the first 2D conv layers in the inverted residual blocks of EfficientNet with 3D conv layers.
Predict the different impact types for the center frame as output variable (no impact, helmet, shoulder, body,
ground impact) and optimize a softmax loss with class weights split 0.8:0.2 (non-impact : impact).
Select all of the positive impact samples and a random sample of negative impact samples according to a
specified ratio (0.99:0.01 non-impact:impact) at each epoch.
https://www.kaggle.com/c/nfl-impact-detection/discussion/208979
Thresholding using stage 1 score.
Filter out any frame earlier than 25.
NMS based on IoU to filter out duplicate boxes in
subsequent frames.
Consider the top 19 predicted boxes based on their stage
2 score and remove boxes below a threshold of 0.15.
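The filtering steps above, excluding the IoU-based NMS, can be sketched as a short function. This is an illustration under assumptions: the stage 1 threshold value is not given on the slide, the top-19 selection is assumed to be applied per video, and the dictionary keys and function name are hypothetical.

```python
def postprocess(boxes, stage1_thr=0.5, min_frame=25, top_k=19, stage2_thr=0.15):
    """boxes: list of dicts with 'frame', 'stage1' and 'stage2' scores for one video."""
    # Threshold on the detector score and drop frames earlier than min_frame.
    kept = [b for b in boxes if b["stage1"] >= stage1_thr and b["frame"] >= min_frame]
    # Keep the top_k boxes by classifier score, then drop low-scoring ones.
    kept.sort(key=lambda b: b["stage2"], reverse=True)
    return [b for b in kept[:top_k] if b["stage2"] >= stage2_thr]


# Made-up boxes, only to exercise the function.
boxes = [{"frame": 30 + i, "stage1": 0.9, "stage2": 0.2 + 0.01 * i} for i in range(25)]
boxes.append({"frame": 10, "stage1": 0.9, "stage2": 0.99})   # too early
boxes.append({"frame": 60, "stage1": 0.1, "stage2": 0.99})   # weak detection
result = postprocess(boxes)
print(len(result), result[0]["frame"])
```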
detection classification pp results
9 frames
Ensemble of 7 EfficientDet models (WBF)
Detect helmet w/ impact and w/o impact
separately
Recall is around 0.97
Detected helmets with impact can be candidates.
Crop 3x width and height of the original bbox.
Convert to grayscale.
Ensemble of 18 EfficientNets and ReXNets.
Regard the impact bbox and its ±1 frame corresponding
bbox as positive.
https://www.kaggle.com/c/nfl-impact-detection/discussion/208787
Tune separate thresholds depending on the predictions in the other view. For
instance, the threshold for a certain Endzone frame depends on whether there
is a predicted bbox in the Sideline view within ±1 frame. If yes, the threshold
is lower (say 0.25); if not, the threshold is higher (say 0.45).
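The cross-view rule can be sketched as follows, using the example values 0.25 and 0.45 quoted above. How the Sideline predictions that count as support are themselves selected is not stated, so the list of supporting frames is taken as given; the function names are hypothetical.

```python
def threshold_for(frame, other_view_frames, low=0.25, high=0.45, window=1):
    """Lower threshold when the other view has a prediction within +/- window frames."""
    has_support = any(abs(frame - f) <= window for f in other_view_frames)
    return low if has_support else high


def filter_view(preds, other_view_frames):
    """preds: list of (frame, score) for one view of a play."""
    return [(f, s) for f, s in preds if s >= threshold_for(f, other_view_frames)]


endzone_preds = [(30, 0.3), (50, 0.3), (70, 0.5)]
sideline_frames = [31]
print(filter_view(endzone_preds, sideline_frames))
```

Because the two videos of a play are time-synced, an impact seen from one camera is corroborating evidence for a weak prediction at nearly the same frame in the other camera.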
NMS based on IoU to filter out duplicate boxes in subsequent frames.
detection
classification pp
results
20 frames
https://www.kaggle.com/c/nfl-impact-detection/discussion/208947
1-class Faster R-CNN
classification pp
20 frames
A sequence of full frames is fed to the 3D CNN
input, a feature map is calculated, and using
the ROIAlign operation, features for the ROIs
are extracted and classified.
A sequence of frames is cropped around the target box,
the sequence of crops is fed to the 3D CNN input,
and the impact probability is calculated directly.
Use 5 input channels instead of 3 RGB channels. The
first additional channel is the heatmap of the center of
the helmet of interest. The second additional channel is
the heatmap of the centers of all helmets.
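The two extra channels can be sketched as Gaussian heatmaps rendered at helmet centers. The Gaussian shape, the `sigma` value, and combining overlapping bumps with a maximum are assumptions for illustration; the slide only says that the channels are heatmaps of the centers.

```python
import math


def center_heatmap(height, width, centers, sigma=2.0):
    """Heatmap with a Gaussian bump at each (x, y) center, combined with max."""
    heat = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                v = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                heat[y][x] = max(heat[y][x], v)
    return heat


def extra_channels(height, width, target_center, all_centers):
    """Channel 4: the helmet of interest. Channel 5: all helmets in the frame."""
    return (center_heatmap(height, width, [target_center]),
            center_heatmap(height, width, all_centers))


ch4, ch5 = extra_channels(8, 8, (2, 3), [(2, 3), (6, 1)])
print(ch4[3][2], ch5[1][6], round(ch4[1][6], 3))
```

The first channel tells the network which helmet the prediction is about, and the second tells it where the other helmets are, which a crop-free full-frame input would otherwise leave ambiguous.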
NMS based on IoU to
filter out duplicate boxes
in subsequent frames.
https://www.kaggle.com/c/nfl-impact-detection/discussion/209235
I3D FPN
8 frames
Make patches of 224 x 224 using
a grid with a constant step.
3D feature maps from
second to fifth blocks of the
3D CNN are passed to FPN.
FPN produces a 6 x 56 x 56 grid.
1 : presence of helmet
2-5 : position and size of helmet
6 : presence of impact
1st stage
training
2nd stage
training
20 frames
I3D FPN
Freeze
pp results
NMS based on IoU to filter out duplicate boxes in
subsequent frames.
Remove predictions with low confidence if there are no
predictions in the other view.
https://www.kaggle.com/c/nfl-impact-detection/discussion/208833
detection classification pp results
9 frames
1-class EfficientDet-D5
Crop 3x width and height of the original bbox.
Resize to 112 x 112
Ensemble of 2D ResNet-18 and 3D ResNet-18
Augmentation: HorizontalFlip, RandomBrightness, RandomContrast,
one of(MotionBlur,MedianBlur,GaussianBlur,GaussNoise),
HueSaturationValue, ShiftScaleRotate, Cutout, Bbox jitter
If helmets are detected in the same position within 4
frames, only the middle frame is kept.
Ignore the first and last 10 frames of the video because
few collisions are expected there.
https://www.kaggle.com/c/nfl-impact-detection/discussion/208851
detection classification pp
Combine 4 folds with TTA for CenterNet.
8 consecutive frames are passed through the encoder individually,
then the intermediate features are concatenated and fed through a UNet-like decoder to
produce the output heatmap and impact map for the 8 frames.
results
3D ResNet-50
8 frames
NMS based on IoU to
filter out duplicate boxes
in subsequent frames.
https://www.kaggle.com/c/nfl-impact-detection/discussion/209012
detection classification pp results
9 frames
(every 2 frames)
Detect helmet w/ impact and w/o impact
separately using DetectoRS (train 1-class
detector as warm-up)
Crop 1.22x width and height of the original bbox.
3D CNN (I3D, SlowFast)
Different thresholds for Endzone and Sideline views.
Different thresholds over time.
Use an IoU threshold and a frame-difference threshold to
cluster detections (across multiple frames) which belong
to the same player and remove false positives.
Impacts that are detected with a confidence lower than T
are removed if no impact is found in the other view.
https://www.kaggle.com/c/nfl-impact-detection/discussion/208773
detection pp
Stack multiple CenterNet heads on top of the feature extraction
block (EfficientNet-B5). Each head is responsible for predicting helmets for
one frame.
Calculate the loss independently for the 2 classes (helmet w/ impact and
helmet w/o impact) and then take a weighted sum of the 2 losses.
results
15 frames
NMS based on IoU to filter out duplicate boxes in
subsequent frames.
Dynamic confidence threshold: a low threshold for frames 30-80 (impacts are most
likely to happen in this period), after which the threshold slowly increases.
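A frame-dependent threshold of this kind can be sketched as a piecewise function. Only the 30-80 frame window comes from the slide; the threshold values, the linear ramp, its length, and the use of the high threshold before frame 30 are all assumptions for illustration.

```python
def dynamic_threshold(frame, low=0.4, high=0.7, start=30, end=80, ramp=100):
    """Confidence threshold as a function of the frame number.

    Low inside [start, end]; afterwards it rises linearly to `high` over
    `ramp` frames. Frames before `start` use `high` (an assumption).
    """
    if start <= frame <= end:
        return low
    if frame < start:
        return high
    return min(high, low + (high - low) * (frame - end) / ramp)


for f in (10, 50, 130, 500):
    print(f, round(dynamic_threshold(f), 3))
```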
https://www.kaggle.com/c/nfl-impact-detection/discussion/208767
MAYBE YES
https://hrmos.co/pages/mo-t/jobs
Please refrain from reproducing or copying the text, images, and other content without permission.

Devil is in the Edges: Learning Semantic Boundaries from Noisy Annotations
 

Recently uploaded

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Recently uploaded (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

kaggle NFL 1st and Future - Impact Detection

  • 1. Nov 16, 2020 - Jan 4, 2021 https://www.kaggle.com/c/nfl-impact-detection/overview
  • 2. Kazuyuki Miyazawa Group Leader AI R&D Group 2 AI System Dept. Mobility Technologies Co., Ltd. Past Work Experience April 2019 - March 2020 AI Research Engineer@DeNA Co., Ltd. April 2010 - March 2019 Research Scientist@Mitsubishi Electric Corp. Education PhD in Information Science@Tohoku University @kzykmyzw
  • 3.
  • 4.
  • 5. ■ Detect helmet impacts that happen in NFL games using videos from sideline and endzone, and player tracking data ■ For training, 9947 still images are provided for helmet detection, and 60 video pairs (sideline and endzone) and player tracking data are provided for helmet impact detection ■ Video frame rate is 59.94 fps, and duration is around 10 seconds ■ [Diagram: images (+ helmet bboxes); videos from endzone (+ helmet bboxes w/ player ID, impact information); videos from sideline (+ helmet bboxes w/ player ID, impact information); player tracking data (+ player positions w/ player ID, players' speed, acceleration, etc.). Legend: time-synced; available in training; available in training and test]
  • 6. ■ Labels for images ■ image: the image file name. ■ label: the label type (Helmet, Helmet-Blurred, Helmet-Difficult, Helmet-Sideline, Helmet-Partial). ■ [left/width/top/height]: the specification of the bounding box of the label, with left=0 and top=0 being the top left corner. ■ Labels for videos ■ gameKey: the ID code for the game. ■ playID: the ID code for the play. ■ view: the camera orientation. ■ video: the filename of the associated video. ■ frame: the frame number for this play. ■ label: the associated player's number. ■ [left/width/top/height]: the specification of the bounding box of the prediction. ■ impact: an indicator (1 = helmet impact) for bounding boxes associated with helmet impacts. ■ confidence: 1 = Possible, 2 = Definitive, 3 = Definitive and Obvious. ■ visibility: 0 = Not Visible from View, 1 = Minimum, 2 = Visible, 3 = Clearly Visible. ■ impactType: a description of the type of helmet impact: helmet, shoulder, body, ground, etc. ■ For the purposes of evaluation, definitive helmet impacts are defined as meeting three criteria: impact = 1, confidence > 1, and visibility > 0.
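The three evaluation criteria on the slide above can be written as a simple row filter. This is a minimal sketch: the dictionary rows and the helper name `is_definitive_impact` are illustrative assumptions, and only the column names (`impact`, `confidence`, `visibility`) come from the label description.

```python
def is_definitive_impact(row):
    """True when a label row meets all three evaluation criteria."""
    return row["impact"] == 1 and row["confidence"] > 1 and row["visibility"] > 0


# Hypothetical label rows, made up for illustration only.
rows = [
    {"frame": 10, "impact": 1, "confidence": 3, "visibility": 2},  # definitive
    {"frame": 11, "impact": 1, "confidence": 1, "visibility": 2},  # only "Possible"
    {"frame": 12, "impact": 1, "confidence": 2, "visibility": 0},  # not visible
    {"frame": 13, "impact": 0, "confidence": 0, "visibility": 0},  # no impact
]

definitive_frames = [r["frame"] for r in rows if is_definitive_impact(r)]
print(definitive_frames)
```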
  • 9. ■ gameKey: the ID code for the game. ■ playID: the ID code for the play. ■ player: the player's ID code. ■ time: timestamp at 10 Hz. ■ x: player position along the long axis of the field. ■ y: player position along the short axis of the field. ■ s: speed in yards/second. ■ a: acceleration in yards/second^2. ■ dis: distance traveled from prior time point, in yards. ■ o: orientation of player (deg). ■ dir: angle of player motion (deg). ■ event: game events like a snap, whistle, etc. In test, we cannot directly map players’ positions in given tracking data to detected players in videos.
  • 11. ■ gameKey: the ID code for the game. ■ playID: the ID code for the play. ■ view: the camera orientation. ■ video: the filename of the associated video. ■ frame: the frame number for this play. ■ [left/width/top/height]: the specification of the bounding box of the prediction.
  • 12. ■ F1 score at an IoU threshold of 0.35 ■ For a given ground truth impact, a prediction within +/- 4 frames (9 frames total) within the same play can be accepted as valid without necessarily degrading the score. ■ If one or more predictions can be assigned to more than one ground truth box, the metric will optimize for the assignments between the prediction(s) and the ground truth boxes that lead to the highest total number of True Positives (thereby maximizing the F1 score). At most one prediction will be assigned to any ground truth box and vice versa. ■ https://www.kaggle.com/c/nfl-impact-detection/discussion/197672
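A rough sketch of the building blocks of this metric follows. It checks whether one prediction may be matched to one ground truth box and computes F1 from counts. It does not implement the optimal one-to-one assignment described on the slide, and treating the IoU threshold as inclusive (`>=`) is an assumption. All function names are hypothetical.

```python
def iou(a, b):
    """IoU of two boxes given as (left, width, top, height)."""
    inter_w = max(0, min(a[0] + a[1], b[0] + b[1]) - max(a[0], b[0]))
    inter_h = max(0, min(a[2] + a[3], b[2] + b[3]) - max(a[2], b[2]))
    inter = inter_w * inter_h
    union = a[1] * a[3] + b[1] * b[3] - inter
    return inter / union if union > 0 else 0.0


def can_match(gt_frame, gt_box, pred_frame, pred_box, iou_thr=0.35, frame_tol=4):
    """A prediction is a candidate match if it is close in time and overlaps enough."""
    close_in_time = abs(gt_frame - pred_frame) <= frame_tol
    return close_in_time and iou(gt_box, pred_box) >= iou_thr


def f1_score(tp, fp, fn):
    """F1 from true positive, false positive, and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


box = (0, 10, 0, 10)
print(can_match(50, box, 54, box))  # within +/- 4 frames
print(can_match(50, box, 55, box))  # 5 frames away
```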
  • 13.
  • 14.
  • 15. ■ Only use videos and images ■ 2 stage pipeline (detection + classification) ■ NMS using tracking results ■ [Pipeline diagram: helmet detection on every frame; classification on every bbox (± n frames); post processing; results]
  • 16. ■ Use DetectoRS (no strong reason other than it's implemented in MMDetection and is easy to use) ■ Train using images + 80% of training videos and validate using the other 20% ■ Detection performance seems to be quite high, so I didn't pursue further accuracy gains through ensembling or TTA
  • 17.
  • 18.
  • 19. ■ Crop every bbox with expansion ■ Also crop bboxes at the same position from ±N frames (2N + 1 bboxes total) ■ The 2N + 1 bboxes are concatenated in the channel direction and fed into ResNet-50 ■ ResNet outputs the probability that the input bboxes include an impact ■ [Diagram: a w x h bbox is expanded to a 4 x max(w, h) by 4 x max(w, h) crop, taken from frames -N to +N, resized to 224 x 224 x 3 x (2N + 1), and fed into ResNet-50, which outputs impact or not]
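The crop geometry on this slide can be sketched as follows. The slide gives the crop side as 4 x max(w, h) and the input size as 224 x 224 x 3 x (2N + 1). Centring the crop on the bbox and the function names are assumptions, and clipping at the image border is left out.

```python
def expanded_crop(left, width, top, height, scale=4):
    """Square crop window of side scale * max(width, height).

    Assumes the window is centred on the bbox centre; returns
    (crop_left, crop_top, side) in pixels, without clipping to the image.
    """
    side = scale * max(width, height)
    center_x = left + width / 2
    center_y = top + height / 2
    return (center_x - side / 2, center_y - side / 2, side)


def input_shape(n, size=224):
    """Shape after resizing 2N + 1 RGB crops and stacking them along channels."""
    return (size, size, 3 * (2 * n + 1))


print(expanded_crop(100, 20, 50, 10))
print(input_shape(2), input_shape(4))
```

The same window is applied to frames -N through +N, so the network sees the surroundings of one fixed image position over time rather than a tracked helmet.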
  • 20. ■ Split endzone-sideline video pairs into 80% (48 pairs) train and 20% (12 pairs) val ■ Since positive samples (bboxes which have impact labels) are only 0.18% of total samples, employed over-sampling to balance positive and negative samples ■ Augmentation: LR flip, color jitter, bbox position jitter, bbox size jitter ■ No cross validation because of time and computational resource limitations ■ Score calculation by @nvnn's notebook
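One common way to over-sample a rare class is to draw training samples with class-balanced weights. The sketch below illustrates that idea with the standard library only. It is not necessarily the scheme used in this pipeline, and the label counts are invented to approximate the 0.18% positive rate.

```python
import random


def balanced_weights(labels):
    """Per-sample weights giving each class half of the total sampling mass.

    Assumes both classes are present in `labels`.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return [0.5 / n_pos if y else 0.5 / n_neg for y in labels]


# Hypothetical dataset: 18 positives out of 10,000 bboxes (0.18%).
labels = [1] * 18 + [0] * 9982
weights = balanced_weights(labels)

rng = random.Random(0)
batch = rng.choices(range(len(labels)), weights=weights, k=1000)
pos_fraction = sum(labels[i] for i in batch) / len(batch)
print(round(pos_fraction, 2))  # close to 0.5 instead of 0.0018
```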
  • 21. ■ Train two ResNets for different types of labels ■ Type-I: Assign TRUE labels only for the bboxes of impact timing ■ Type-II: Assign TRUE labels for the bboxes of impact timing and ±4 consecutive bboxes ■ Type-I uses ±2 bboxes as inputs, and Type-II uses ±4 bboxes as inputs (N = 2 and N = 4 in P.18) ■ ResNet-Type-I and -Type-II achieve high precision and high recall, respectively, so their ensembling leads to performance gain (Type-I: 0.40 + Type-II: 0.38 → 0.46) ■ Tried adding other models such as EfficientNet, but finally employed the two ResNets based on local val ■ [Figure: TRUE/FALSE labels around the impact for each label type; chart of Recall, Precision, F1]
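The two labeling schemes differ only in how many frames around the impact are marked TRUE. A minimal sketch, assuming labels are stored per frame for a single helmet and using a hypothetical helper `make_labels`:

```python
def make_labels(num_frames, impact_frames, margin):
    """Mark each impact frame and `margin` frames on either side as True."""
    labels = [False] * num_frames
    for impact in impact_frames:
        start = max(0, impact - margin)
        stop = min(num_frames, impact + margin + 1)
        for t in range(start, stop):
            labels[t] = True
    return labels


# One hypothetical impact at frame 10 in a 20-frame clip.
type1 = make_labels(20, [10], margin=0)  # Type-I: impact frame only
type2 = make_labels(20, [10], margin=4)  # Type-II: impact frame and +/- 4 frames
print(sum(type1), sum(type2))
```

With wider positive labels, Type-II has more positive samples to learn from, which is consistent with the higher recall reported on the slide.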
  • 22. ■ NMS in the temporal domain is necessary since classifying all the detected bboxes produces a lot of false positives ■ Employ IoU-based tracking and pick the bbox which has the highest confidence value in a track ■ Remove bboxes which couldn't be tracked ■ Remove bboxes whose confidence value is less than a threshold ■ [Figure: two tracks over time t, one with max conf > threshold and one with max conf < threshold]
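The post-processing steps above can be sketched as a greedy IoU tracker followed by per-track selection. The linking rule (consecutive frames only), the thresholds, and the minimum track length are all assumptions made for illustration, since the slide does not give these details.

```python
def iou(a, b):
    """IoU of two boxes given as (left, width, top, height)."""
    inter_w = max(0, min(a[0] + a[1], b[0] + b[1]) - max(a[0], b[0]))
    inter_h = max(0, min(a[2] + a[3], b[2] + b[3]) - max(a[2], b[2]))
    inter = inter_w * inter_h
    union = a[1] * a[3] + b[1] * b[3] - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(detections, iou_thr=0.3, conf_thr=0.5, min_len=2):
    """detections: list of (frame, box, conf). Returns one detection per kept track."""
    tracks = []
    for det in sorted(detections, key=lambda d: d[0]):
        frame, box, _ = det
        for track in tracks:
            last_frame, last_box, _ = track[-1]
            # Link to a track that ended on the previous frame and overlaps enough.
            if frame - last_frame == 1 and iou(last_box, box) >= iou_thr:
                track.append(det)
                break
        else:
            tracks.append([det])

    kept = []
    for track in tracks:
        if len(track) < min_len:
            continue  # bbox could not be tracked
        best = max(track, key=lambda d: d[2])
        if best[2] >= conf_thr:
            kept.append(best)  # highest-confidence bbox of the track
    return kept


box_a, box_b, box_c = (0, 10, 0, 10), (50, 10, 50, 10), (100, 10, 100, 10)
detections = [
    (1, box_a, 0.2), (1, box_b, 0.1),
    (2, box_a, 0.9), (2, box_b, 0.3),
    (3, box_a, 0.4),
    (5, box_c, 0.95),  # isolated detection, never tracked
]
result = temporal_nms(detections)
print(result)
```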
  • 23. [Chart: Score for single, w/o post processing; single, w/ post processing; ensemble (2~4 models); finetune w/ val data (2~3 models)]
  • 28.
  • 29.
  • 30. [Pipeline diagram: detection, tracking, classification, pp, results; 16 frames] ■ Detection: 1-class YOLOv5 ■ Track helmets and estimate the average helmet velocity over a few surrounding frames by optical flow ■ Normalize the helmet size to 128 x 128 x 3 x 16 ■ Correct helmet movement by optical flow to differentiate (i) a helmet at constant velocity and (ii) a helmet during acceleration ■ Ensemble of EfficientNet B0-B3, ResNet-18, and ResNet-34 with TSM (Temporal Shift Module) ■ Mark 3 frames around the impact as positive and use 5 or 10% positive samples ■ Add the false positive predictions from a few undertrained detection models ■ Average predictions of multiple models from 4 folds ■ NMS in temporal direction using tracking results ■ https://www.kaggle.com/c/nfl-impact-detection/discussion/209403
  • 31. ■ Can be inserted into a 2D CNN backbone to enable joint spatial-temporal modeling at no additional cost. ■ Shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. ■ Supports both offline (bi-directional) and online (uni-directional) video recognition. ■ https://arxiv.org/abs/1811.08383 ■ After the competition, I evaluated TSM in my own pipeline, and it showed better performance compared to the 2D CNN (0.380 → 0.436)
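The shift operation itself is small enough to sketch in plain Python. Here each channel holds a single number standing in for a feature map, and `fold_div=8` (one eighth of the channels shifted in each direction) is an assumed setting, not something the slide states. This shows the offline, bi-directional variant with zero padding at the clip boundaries.

```python
def temporal_shift(x, fold_div=8):
    """Bi-directional temporal shift on x[t][c] (T frames, C channels).

    The first C // fold_div channels take their value from the next frame,
    the following C // fold_div channels from the previous frame, and the
    remaining channels are left in place. Out-of-range frames are zero.
    """
    num_frames, num_channels = len(x), len(x[0])
    fold = num_channels // fold_div
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # pull from the future frame
            elif c < 2 * fold:
                src = t - 1   # pull from the past frame
            else:
                src = t       # unshifted channel
            if 0 <= src < num_frames:
                out[t][c] = x[src][c]
    return out


# 3 frames x 8 channels; value 10 * t + c makes the source frame easy to read off.
x = [[float(10 * t + c) for c in range(8)] for t in range(3)]
shifted = temporal_shift(x)
print(shifted[1])
```

Because the shift only moves existing values between frames, it adds no parameters and no multiply-accumulate operations, which is the "no additional cost" point on the slide.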
  • 32. [Pipeline diagram: detection, classification, pp, results; 9 frames] ■ Detection: 1-class YOLOv5 ■ Crop 2x width and height of the original bbox ■ Ensemble of 6 different EfficientNets (B3 and B5) + horizontal flip TTA ■ Replace the first 2D conv layers in the inverted residual blocks of EfficientNet with 3D conv layers ■ Predict the different impact types for the center frame as output variable (no impact, helmet, shoulder, body, ground impact) and optimize a softmax loss with class weights split 0.8:0.2 (non-impact : impact) ■ Select all of the positive impact samples and a random sample of negative impact samples according to a specified ratio (0.99:0.01 non-impact:impact) at each epoch ■ https://www.kaggle.com/c/nfl-impact-detection/discussion/208979 ■ Thresholding using stage 1 score ■ Filter out any frame earlier than 25 ■ NMS based on IoU to filter out duplicate boxes in subsequent frames ■ Consider the top 19 predicted boxes based on their stage 2 score and remove boxes below a threshold of 0.15
  • 33. [Pipeline diagram: detection, classification, pp, results; 9 frames] ■ Ensemble of 7 EfficientDet models (WBF) ■ Detect helmet w/ impact and w/o impact separately ■ Recall is around 0.97 ■ Detected helmets with impact can be candidates ■ Crop 3x width and height of the original bbox ■ Convert to grayscale ■ Ensemble of 18 EfficientNets and ReXNets ■ Regard the impact bbox and its ±1 frame corresponding bbox as positive ■ https://www.kaggle.com/c/nfl-impact-detection/discussion/208787 ■ Tune separate thresholds depending on the predictions in the other view. For instance, the threshold for a certain Endzone frame depends on whether there is a predicted bbox in the Sideline view within ±1 frame. If yes, the threshold is lower (say 0.25); if not, the threshold is higher (say 0.45) ■ NMS based on IoU to filter out duplicate boxes in subsequent frames
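The view-dependent threshold described on this slide can be sketched as below. The values 0.25 and 0.45 and the ±1 frame window are the illustrative numbers from the slide; the function name and data layout are assumptions.

```python
def view_threshold(frame, other_view_frames, low=0.25, high=0.45, window=1):
    """Pick a confidence threshold based on support from the other camera view.

    other_view_frames: frames in which the other view has a predicted bbox.
    """
    has_support = any(abs(frame - f) <= window for f in other_view_frames)
    return low if has_support else high


# Hypothetical Sideline predictions at frames 31 and 80.
sideline_frames = [31, 80]
print(view_threshold(30, sideline_frames))  # supported by frame 31
print(view_threshold(40, sideline_frames))  # no nearby Sideline prediction
```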
  • 34. [Pipeline diagram: detection, classification, pp, results, with two classification paths of 20 frames each] ■ https://www.kaggle.com/c/nfl-impact-detection/discussion/208947 ■ Detection: 1-class Faster R-CNN ■ Classification, first path: a sequence of full frames is fed to the 3D CNN input, a feature map is calculated, and using the ROIAlign operation, features for the ROIs are extracted and classified ■ Classification, second path: a sequence of frames is cropped around the target box, then the sequence of crops is fed to the 3D CNN input, and the impact probability is calculated directly ■ Use 5 input channels instead of 3 RGB channels. The first additional channel is the heatmap of the center of the helmet of interest. The second additional channel is the heatmap of the centers of all helmets ■ NMS based on IoU to filter out duplicate boxes in subsequent frames
  • 35. https://www.kaggle.com/c/nfl-impact-detection/discussion/209235 ■ [Diagram: 1st stage training with I3D + FPN on 8 frames; 2nd stage training with I3D + FPN on 20 frames, with a Freeze marker; then pp and results] ■ Make patches of 224 x 224 using a grid with a constant step ■ 3D feature maps from the second to fifth blocks of the 3D CNN are passed to FPN ■ FPN produces a 6 x 56 x 56 grid (1: presence of helmet; 2-5: position and size of helmet; 6: presence of impact) ■ NMS based on IoU to filter out duplicate boxes in subsequent frames ■ Remove predictions with low confidence if there are no predictions in the other view
  • 36. https://www.kaggle.com/c/nfl-impact-detection/discussion/208833 ■ [Pipeline diagram: detection, classification, pp, results; 9 frames] ■ Detection: 1-class EfficientDet-D5 ■ Crop 3x width and height of the original bbox ■ Resize to 112 x 112 ■ Ensemble of 2D ResNet-18 and 3D ResNet-18 ■ Augmentation: HorizontalFlip, RandomBrightness, RandomContrast, one of (MotionBlur, MedianBlur, GaussianBlur, GaussNoise), HueSaturationValue, ShiftScaleRotate, Cutout, Bbox jitter ■ If helmets are detected in the same position within 4 frames, only the middle frame is kept ■ Ignore the first and last 10 frames of the video because few collisions are expected there
  • 37. https://www.kaggle.com/c/nfl-impact-detection/discussion/208851 ■ [Pipeline diagram: detection, classification, pp, results; 3D ResNet-50; 8 frames] ■ Combine 4 folds with TTA for CenterNet ■ 8 consecutive frames are passed through the encoder individually, then the intermediate outputs are concatenated and fed through a UNet-like decoder to produce an output heatmap and impact map for the 8 frames ■ NMS based on IoU to filter out duplicate boxes in subsequent frames
  • 38. https://www.kaggle.com/c/nfl-impact-detection/discussion/209012 ■ [Pipeline diagram: detection, classification, pp, results; 9 frames (every 2 frames)] ■ Detect helmet w/ impact and w/o impact separately using DetectoRS (train 1-class detector as warm-up) ■ Crop 1.22x width and height of the original bbox ■ 3D CNN (I3D, SlowFast) ■ Different thresholds for Endzone and Sideline views ■ Different thresholds over time ■ Use an IoU threshold and a frame-difference threshold to cluster detections (through multiple frames) which belong to the same player and remove FP ■ Impacts that are detected with a confidence lower than T are removed if no impact is found in the other view
  • 39. https://www.kaggle.com/c/nfl-impact-detection/discussion/208773 ■ [Pipeline diagram: detection, pp, results; 15 frames] ■ Stack multiple CenterNet heads on top of the feature extraction block (EfficientNet-B5). Each head is responsible for predicting helmets for one frame ■ Calculate the loss independently for the 2 classes (helmet w/ impact and helmet w/o impact), then use a weighted sum of the 2 losses ■ NMS based on IoU to filter out duplicate boxes in subsequent frames ■ Dynamic confidence threshold: low threshold for frames 30 to 80 (impacts are most likely to happen in this period), then slowly increase the threshold
  • 40.