Deep Moving Object Recognition:
Research Project on IBM
POWER9
Research Team: Lav Kush Kumar, Santosh Kumar Vipparthi
Vision Intelligence Lab, Malaviya National Institute of Technology Jaipur, India
Murari Mandal
Postdoctoral Researcher, NUS Singapore
Agenda
• Moving Object Recognition (MOR) in Regular View
▪ MotionRec: A Unified Deep Framework for Moving Object
Recognition, WACV-2020 [M. Mandal, L. K. Kumar, M. S. Saran,
S. K. Vipparthi]
• MOR in Aerial View
▪ MOR-UAV: A Benchmark Dataset and Baselines for Moving Object
Recognition in UAV Videos, ACM Multimedia-2020 [M. Mandal,
L.K. Kumar, S. K. Vipparthi]
Introduction
• Moving Object Recognition (MOR)?
• Simultaneous localization and classification of moving
objects in videos.
• Fundamental task for many computer vision and video
processing applications.
Intelligent Visual Surveillance for Intrusion Detection | Traffic Monitoring | Maritime Surveillance
source: www.google.com
Search & Rescue | Disaster Response | Remote Monitoring
source: https://visionintelligence.github.io/Datasets.html
Challenges: Regular View MOR
• MOR in Different Weather Conditions
• Background Changes and Camera Jitters
• Illumination Changes
• Variable Foreground Motion Speed
• Shadow, Camouflage and Occlusion
• Speed
Challenges: Regular View MOR
Challenges: Aerial View MOR
• Intra- and Inter-class Variations
• Insufficient Annotated Data
• Real-time Challenges
• Locating Motion Clues
• Variable Object Density
• Small and Large Object Shapes
• Sporadic Camera Motion
• Changes in the Aerial View
Challenges: Aerial View MOR
Motivation
Object Detection | Moving Object Detection | Moving Object Recognition
MotionRec: MOR in Regular View
• Current Systems:
▪ Object Detection
▪ Moving Object Detection
• Proposed System:
▪ A novel deep learning framework to perform online
moving object recognition (MOR) in streaming videos.
▪ First attempt at simultaneous localization and classification of
moving objects in a video, i.e. MOR, in a single-stage deep learning
framework.
MotionRec
The proposed MotionRec framework.
Preliminary Concepts
• ResNet:
▪ Deep feature extraction through a deep stack of layers with
"identity shortcut connections".
• Anchors:
▪ Predefined bounding boxes at different scales and aspect ratios
(see the sketch below).
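As a rough illustration of the anchor idea (not the exact MotionRec settings), the sketch below generates anchors at a few scales and aspect ratios and tiles them over a small feature-map grid; the base size, scales, ratios and stride are illustrative assumptions.

import numpy as np

def generate_anchors(base_size=32, scales=(1.0, 1.26, 1.59),
                     aspect_ratios=(0.5, 1.0, 2.0)):
    # Anchor boxes as (x1, y1, x2, y2), centred at the origin.
    anchors = []
    for scale in scales:
        for ratio in aspect_ratios:
            area = (base_size * scale) ** 2
            w = np.sqrt(area / ratio)      # ratio = height / width
            h = w * ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

# One anchor set per feature-map location: shift the base set over a stride-8 grid.
base = generate_anchors()                              # (9, 4)
shifts = np.array([[x * 8, y * 8, x * 8, y * 8]
                   for y in range(4) for x in range(4)])
all_anchors = (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)
print(all_anchors.shape)                               # (144, 4)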
Preliminary Concepts
• Feature Pyramid Network (FPN)
▪ Rich multi-scale feature pyramid from a single-resolution input image.
▪ The bottom-up pathway computes feature maps at different scales.
▪ The top-down pathway and lateral connections construct higher-resolution
layers from a semantically rich layer (see the sketch below).
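A minimal Keras sketch of the top-down pathway with lateral connections; the channel count and the backbone stage shapes (C3-C5 of a 608×608 input) are generic FPN assumptions, not the exact MotionRec configuration.

import tensorflow as tf
from tensorflow.keras import layers

def fpn_top_down(c3, c4, c5, channels=256):
    # Lateral 1x1 convs plus top-down upsampling and addition.
    p5 = layers.Conv2D(channels, 1, padding="same")(c5)
    p4 = layers.Add()([layers.UpSampling2D()(p5),
                       layers.Conv2D(channels, 1, padding="same")(c4)])
    p3 = layers.Add()([layers.UpSampling2D()(p4),
                       layers.Conv2D(channels, 1, padding="same")(c3)])
    # 3x3 convs smooth the aliasing introduced by upsampling.
    p3, p4, p5 = [layers.Conv2D(channels, 3, padding="same")(p)
                  for p in (p3, p4, p5)]
    return p3, p4, p5

# Toy backbone feature maps at strides 8, 16 and 32 of a 608x608 input.
c3 = layers.Input((76, 76, 512))
c4 = layers.Input((38, 38, 1024))
c5 = layers.Input((19, 19, 2048))
model = tf.keras.Model([c3, c4, c5], list(fpn_top_down(c3, c4, c5)))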
Preliminary Concepts
• Intersection over Union (IoU):
▪ Area of overlap (intersection) divided by area of union.
▪ An IoU greater than a threshold indicates that the object is present
in the anchor box.
• Non-Maximum Suppression (NMS):
▪ Keep the highest-scoring candidate box as a prediction and suppress
overlapping candidates (see the sketch below).
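A minimal NumPy sketch of both operations on corner-format boxes (x1, y1, x2, y2); the 0.5 IoU threshold in the NMS helper is an illustrative default.

import numpy as np

def iou(box, boxes):
    # IoU between one box and an array of boxes, all as (x1, y1, x2, y2).
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop overlapping candidates.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps < iou_thresh]
    return keep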
TDR Block: Background Estimation
Visualization: TDR Block
Visualization: Motion Saliency
Network Configurations
• MotionRec takes two tensors of shape 608×608×T (past temporal history)
and 608×608×3 (current frame) as input and returns the spatial
coordinates with class labels for moving object instances.
• While training MotionRec, we use the ResNet50 backbone
pretrained over the ImageNet dataset.
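A minimal Keras sketch of the two-input setup with an ImageNet-pretrained ResNet50 backbone, assuming T = 10. The tdr_stub layer is only a hypothetical placeholder for the TDR block, which this deck describes at slide level only; it is not the actual MotionRec design.

import tensorflow as tf
from tensorflow.keras import layers

T = 10  # number of past temporal-history frames

history = layers.Input(shape=(608, 608, T), name="temporal_history")
current = layers.Input(shape=(608, 608, 3), name="current_frame")

# ImageNet-pretrained ResNet50 backbone applied to the current frame.
backbone = tf.keras.applications.ResNet50(include_top=False,
                                           weights="imagenet",
                                           input_shape=(608, 608, 3))
current_feat = backbone(current)

# Hypothetical stand-in for the TDR block: a learnable 1x1 convolution
# that reduces the T-frame history to a 3-channel motion representation.
motion = layers.Conv2D(3, 1, activation="relu", name="tdr_stub")(history)
motion_feat = backbone(motion)

fused = layers.Concatenate()([current_feat, motion_feat])
model = tf.keras.Model([history, current], fused)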
Network Configurations
• For regression and classification, smooth L1 and focal loss functions
are used, respectively (minimal sketches of both follow below).
• The training loss is the sum of the two losses above. The loss
gradients are backpropagated through the TDR blocks as well.
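Minimal sketches of the two loss terms, assuming sigmoid classification outputs; the beta, alpha and gamma values below are common defaults, not necessarily those used for MotionRec.

import tensorflow as tf

def smooth_l1(y_true, y_pred, beta=1.0 / 9.0):
    # Smooth L1 (Huber-style) regression loss on box offsets.
    diff = tf.abs(y_true - y_pred)
    loss = tf.where(diff < beta,
                    0.5 * tf.square(diff) / beta,
                    diff - 0.5 * beta)
    return tf.reduce_sum(loss, axis=-1)

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0):
    # Focal loss: down-weights easy negatives in dense classification.
    y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
    p_t = tf.where(tf.equal(y_true, 1.0), y_pred, 1.0 - y_pred)
    alpha_t = tf.where(tf.equal(y_true, 1.0), alpha, 1.0 - alpha)
    return tf.reduce_sum(-alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t),
                         axis=-1)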
Implementation Details
• Model Training:
• MotionRec forms a single-stage fully convolutional network, which
ensures online operability and fast speed.
• The entire framework is implemented in Keras with a TensorFlow backend.
• Training is performed with batch size = 1 on a Titan V GPU in the
IBM POWER9 system.
Implementation Details
• We use the Adam optimizer with the initial learning rate set to
1×10^-5 (see the configuration sketch below).
• All models are trained for approximately 500k iterations.
• We only use horizontal image flipping for data augmentation.
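A small sketch of these settings; the augment helper is hypothetical and only illustrates flipping the input frames and mirroring the box x-coordinates consistently.

import numpy as np
import tensorflow as tf

# Adam with the reported initial learning rate of 1e-5.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)

def augment(history, current, boxes, width=608):
    # Horizontal flip applied consistently to the frames and the box annotations.
    if np.random.rand() < 0.5:
        history = history[:, ::-1, :]      # flip along the width axis
        current = current[:, ::-1, :]
        x1, x2 = boxes[:, 0].copy(), boxes[:, 2].copy()
        boxes[:, 0] = width - x2           # mirror x-coordinates
        boxes[:, 2] = width - x1
    return history, current, boxes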
Implementation Details
• Inference:
• Similar to training, inference involves simply giving the current
frame and the recent T temporal-history frames as input to the network.
• Only a few past frames (T = 10/20/30) are required, enabling online
moving object recognition (see the inference sketch below).
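A sketch of the online inference call, assuming grayscale history frames and a detector with three outputs (boxes, scores, labels) in the style of RetinaNet-like Keras implementations; the exact I/O of the released MotionRec code may differ.

import numpy as np

T = 10  # length of the temporal history

def infer(model, frame_buffer, current_frame):
    # frame_buffer: list of the last T grayscale frames, each 608x608.
    # current_frame: the current RGB frame, 608x608x3.
    history = np.stack(frame_buffer[-T:], axis=-1)      # 608x608xT
    history = history[None, ...].astype(np.float32)     # add batch dimension
    current = current_frame[None, ...].astype(np.float32)
    boxes, scores, labels = model.predict([history, current])
    return boxes, scores, labels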
Dataset Description
• Due to the lack of available benchmark datasets with labelled bounding
boxes for MOR, we created a new set of ground truths by annotating
42,614 objects (14,814 cars and 27,800 persons) in 24,923 video frames
from CDnet 2014.
• We selected 16 video sequences having 21,717 frames and 38,827 objects
(13,442 cars and 25,385 persons) for training.
• For testing, 3 video sequences with 3,206 frames and 3,787 objects
(1,372 cars and 2,415 persons) were chosen.
Dataset Description
• We created axis-aligned bounding box annotations for
moving object instances in all the frames.
• We define the baseline train and test divisions for qualitative
and quantitative evaluation.
Dataset Description
Quantitative Results
Speed and Efficiency Analysis
Performance Analysis
Regular vs. Aerial View
Regular View | Aerial View
Expected Features for UAV Applications
• Resource-Efficient Model.
• Memory – The model must have a very small memory footprint.
• Compute – The model must operate even with minimal computational
support.
• Accuracy – The model must offer reasonably accurate results.
• Real-time – The model must offer scope for real-time inference.
MOR in Aerial View?
• Variable sizes of vehicles (small, medium and large).
• High/low density of vehicles and complex backgrounds in the camera's
field of view.
• Moreover, aerial scenes in an urban setup usually comprise a variety
of object types, leading to excessive inter-class object similarities.
• No existing dataset for MOR analysis.
MOR-UAV: MOR in Aerial View
• Our Contribution:
▪ We introduce MOR-UAV, a large-scale video dataset for
moving object recognition (MOR) in aerial videos.
▪ A novel deep learning framework to perform online
MOR in streaming videos.
▪ Simultaneous localization and classification of moving objects, i.e.
MOR, in a single-stage deep learning framework.
• Dataset Details
▪ 30 videos
▪ 89,783 moving object instances
▪ 10,948 frames
▪ Avg. bounding box (BB) height = 29.01, Avg. BB width = 17.64
▪ Min. BB height = 6, Min BB width = 6
▪ Max. BB height = 181, Max. BB width = 106
▪ Avg. video sequence length = 364.93, Min. video sequence length =
64, Max. video sequence length = 1,146
• Dataset Attributes
▪ Variable object density
▪ Small and large object shapes
▪ Sporadic camera motion
▪ Changes in the aerial view
MOR-UAV Dataset
The bounding-box (BB) height-width scatter-plot of all the object instances in
MOR-UAV along with the complete dataset description
MOR-UAV Dataset
Sample video frames from MOR-UAV dataset
MOR-UAV Dataset
Comparison of MOR-UAV with other large-scale UAV datasets
Det: Detection, T: Visual tracking, Act: Action
recognition, MOR: Moving object recognition
MOR-UAV Dataset
MOR-UAVNet Framework
MOR-UAVNet Framework
• Schematic illustration of the proposed MOR-UAVNet framework for MOR
in UAV videos.
• The motion saliency is estimated through cascaded optical flow
computation at multiple stages in the temporal-history frames.
• In this figure, the optical flow between the current frame and the
last (OF-1), third-last (OF-3) and fifth-last (OF-5) frames is computed
(see the sketch below).
• We then assimilate the salient motion features with the current frame.
These assimilated features are forwarded through the ResNet backbone to
extract spatial- and temporal-dimension-aware features.
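A sketch of the cascaded optical flow step using OpenCV's Farnebäck method (the estimator cited later under Network Configuration); the numeric parameters are typical defaults rather than the exact values used in MOR-UAVNet.

import cv2
import numpy as np

def cascaded_flow(frames, current_idx, lags=(1, 3, 5)):
    # frames: list of 8-bit grayscale frames; current_idx indexes the current
    # frame and must be at least max(lags).
    # Returns the flows stacked along the channel axis (H x W x 2*len(lags)).
    current = frames[current_idx]
    flows = []
    for lag in lags:
        past = frames[current_idx - lag]
        flow = cv2.calcOpticalFlowFarneback(past, current, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow)                 # each flow is H x W x 2
    return np.concatenate(flows, axis=-1)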
MOR-UAVNet Framework
• Moreover, the base features from the current frame are also
extracted to reinforce the semantic context of the object
instances.
• These two feature maps are concatenated at matching scales to produce
a feature map for motion encoding (see the sketch below).
• Afterward, multi-level feature pyramids are generated. The
dense bounding box and category scores are generated at
each level of the pyramid.
• We use 5 pyramid levels in our experiments.
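A minimal sketch of fusing the motion-assimilated features with the current-frame base features at one matching scale; the 1×1 projection used here for motion encoding is an assumption.

from tensorflow.keras import layers

def fuse_at_scale(motion_feat, base_feat, channels=256):
    # Concatenate the two feature maps channel-wise at a matching spatial
    # scale, then project with a 1x1 convolution to form the motion-encoded map.
    fused = layers.Concatenate()([motion_feat, base_feat])
    return layers.Conv2D(channels, 1, padding="same", activation="relu")(fused)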
Visualization
Network Configuration
• We resize all the video frames in the MOR-UAV dataset to 608×608×3 for
a uniform setting in training and evaluation.
• We compute dense optical flow¹; the following values of T are used in
our experiments:
• T = 3 (C_OF = 1-3-5), T = 2 (C_OF = 1-3), T = 2 (C_OF = 1-5),
T = 1 (C_OF = 1).
1Gunnar Farnebäck. 2003. Two-frame motion estimation based on polynomial expansion. In
Scandinavian conference on Image analysis. Springer, 363–370.
Model Training
• The one-stage MOR-UAVNet network is trained end-to-end
with multiple input layers.
• The complete framework is implemented in Keras with a TensorFlow
backend.
• Training is performed with batch size = 1 on a Titan V GPU in IBM
POWER9 systems.
Model Training
• The network is optimized with the Adam optimizer and an initial
learning rate of 10^-5. All models are trained for approximately
250-300k iterations.
• For regression and classification, L1 and focal loss functions are
used, respectively.
Model Inference
• Similar to training, inference involves simply giving the
current frame and cascaded optical flow maps computed
from past history frames as input to the network.
• Only a few optical flow maps (T = 1/2/3) are required, enabling online
moving object recognition for real-time analysis.
Experimental Results
Training and testing set description
Experimental Results
Speed and efficiency analysis
Performance Analysis
mAP of MOR-UAVNetv1 across different IoU thresholds over (a) vid14, (b) vid15, (c) vid16, (d) vid17
Failure cases
Discussion
• Our dataset caters to real-world demands, with diverse samples
collected from numerous unconstrained circumstances.
• We believe this benchmark dataset can support promising research
trends in UAV-based vehicular technology.
• Research directions for exploration:
▪ Real-time challenges
▪ Locating motion clues
• Acknowledgement
▪ IBM POWER9
• Contact us for any queries:
▪ http://visionintelligence.github.io/
▪ https://github.com/murari023
▪ Email: murarimandal.cv@gmail.com
• Source Code
▪ https://github.com/lav-kush/MotionRec
Thank You!