1. Fisheye/Omnidirectional View in Autonomous Driving III, Yu Huang
2. Outline
• DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing
• The OmniScape Dataset (ICRA 2020)
• Universal Semantic Segmentation for Fisheye Urban Driving Images
• Vehicle Re-ID for Surround-view Camera System
• SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
• Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
3. DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing
• This paper proposes a network adaptation framework to achieve Panoramic Annular Semantic Segmentation (PASS), which allows conventional pinhole-view image datasets to be re-used, enabling modern segmentation networks to adapt comfortably to panoramic images.
• Specifically, the proposed SwaftNet is adapted to enhance sensitivity to details by implementing attention-based lateral connections between the detail-critical encoder layers and the context-critical decoder layers. The paper benchmarks the performance of efficient segmenters on panoramic segmentation with an extended PASS dataset, demonstrating that the proposed real-time SwaftNet outperforms state-of-the-art efficient networks.
• Furthermore, the authors assess real-world performance by deploying the Detail-Sensitive PASS (DS-PASS) system on a mobile robot and an instrumented vehicle, as well as the benefit of panoramic semantics for visual odometry, showing the robustness and the potential to support diverse navigational applications.
4. DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing
Panoramic annular semantic segmentation. Left: raw annular image. First row on the right: unfolded panorama. Second row: panoramic segmentation of the baseline method, where the classification heatmap for pedestrians is blurry. Third row: detail-sensitive panoramic segmentation of the proposed method, where the heatmap and semantic map are detail-preserving.
5. DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing
The proposed framework for panoramic annular semantic segmentation. Each feature model (corresponding to the encoder in conventional architectures) is responsible for predicting the semantically meaningful high-level feature map of one panorama segment while interacting with its neighbors through cross-segment padding (indicated by the dotted arrows; a toy sketch follows below). The fusion model incorporates the feature maps and completes the panoramic segmentation. The proposed architecture follows the single-scale model of SwiftNet, based on a U-shape structure like U-Net and LinkNet.
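To make the cross-segment padding idea concrete, here is a toy PyTorch sketch; it is an assumption about the general mechanism rather than the authors' code: each segment is padded with the adjacent columns of its ring neighbors instead of zeros, so convolutions see continuous features across segment borders of the 360° panorama.

```python
# Toy sketch of cross-segment padding (an assumed mechanism, not the DS-PASS code):
# pad each panorama segment with the edge columns of its ring neighbors so that
# features stay continuous across segment borders of the 360-degree image.
import torch

def cross_segment_pad(segments, pad=1):
    """segments: list of (B, C, H, W) feature tensors ordered around the ring."""
    n = len(segments)
    padded = []
    for i, seg in enumerate(segments):
        left = segments[(i - 1) % n][..., -pad:]   # right edge of the left neighbor
        right = segments[(i + 1) % n][..., :pad]   # left edge of the right neighbor
        padded.append(torch.cat([left, seg, right], dim=-1))  # pad along width
    return padded
```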
6. DS-PASS: Detail-Sensitive Panoramic Annular Semantic Segmentation through SwaftNet for Surrounding Sensing
The proposed architecture with attention-based lateral connections to blend semantically rich deep layers with spatially detailed shallow layers. The down-sampling path with the SPP module (encoder) corresponds to the feature model in the previous figure, while the up-sampling path (decoder) corresponds to the fusion model. A rough sketch of such a connection follows.
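As a rough illustration of what an attention-based lateral connection can look like (the gating design below is an assumption, not the exact DS-PASS layer), a gate computed from the semantically rich deep feature reweights the spatially detailed shallow feature before fusion in the decoder:

```python
# Rough sketch of an attention-gated lateral (skip) connection; the gating
# design is an assumption, not the exact DS-PASS layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLateral(nn.Module):
    def __init__(self, ch_deep, ch_shallow):
        super().__init__()
        # 1x1 conv + sigmoid turns the deep feature into per-pixel gate weights
        self.gate = nn.Sequential(nn.Conv2d(ch_deep, ch_shallow, 1), nn.Sigmoid())

    def forward(self, deep, shallow):
        up = F.interpolate(deep, size=shallow.shape[-2:], mode="bilinear",
                           align_corners=False)   # match the shallow resolution
        return shallow * self.gate(up)            # attended skip feature for the decoder
```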
8. The OmniScape Dataset
• Despite the utility and benefits of omnidirectional images in robotics and automotive applications, there are no datasets of omnidirectional images available with semantic segmentation, depth maps, and dynamic properties.
• This is due to the time cost and human effort required to annotate ground-truth images.
• This paper presents a framework for generating omnidirectional images using images acquired from a virtual environment.
• For this purpose, it demonstrates the relevance of the proposed framework on two well-known simulators: the CARLA simulator, an open-source simulator for autonomous driving research, and Grand Theft Auto V (GTA V), a very high-quality video game.
• It explains in detail the generated OmniScape dataset, which includes stereo fisheye and catadioptric images acquired from the two front sides of a motorcycle, together with semantic segmentation, depth maps, intrinsic parameters of the cameras, and the dynamic parameters of the motorcycle.
• It is worth noting that the case of two-wheeled vehicles is more challenging than cars due to the specific dynamics of these vehicles.
13. Universal Semantic Segmentation for Fisheye Urban Driving Images
• When performing semantic image segmentation, a wider field of view (FoV), as offered by fisheye cameras, helps to obtain more information about the surrounding environment, making automated driving safer and more reliable.
• However, large public fisheye datasets are not available, and the images captured by a fisheye camera with a large FoV come with large distortion, so commonly used semantic segmentation models cannot be utilized directly.
• In this paper, a seven-DoF augmentation method is proposed to transform rectilinear images into fisheye images in a more comprehensive way.
• In training, rectilinear images are transformed into fisheye images in seven DoF, which simulates fisheye images from different positions, orientations, and focal lengths.
• The results show that training with the seven-DoF augmentation improves the model's accuracy and robustness against differently distorted fisheye data.
• This seven-DoF augmentation provides a universal semantic segmentation solution for fisheye cameras in different autonomous driving applications.
• The code and configurations are released at https://github.com/Yaozhuwa/FisheyeSeg.
14. Universal Semantic Segmentation for Fisheye Urban Driving Images
Projection model of the fisheye camera. PW is a point on a rectilinear image placed on the x-y plane of the world coordinate system. θ is the angle of incidence of the point relative to the fisheye camera. P is the imaging point of PW on the fisheye image, with |OP| = fθ. The relative rotation and translation between the world coordinate system and the camera coordinate system contribute six degrees of freedom; the focal length f is the seventh. A minimal sketch of this projection follows.
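Here is a minimal sketch of this projection under the equidistant model |OP| = fθ (assuming an ideal principal point (cx, cy); the function name is illustrative). The 6-DoF rigid transform (R, t) plus the focal length f give the seven degrees of freedom of the augmentation.

```python
# Minimal sketch of the equidistant fisheye projection |OP| = f * theta.
# Assumptions: ideal principal point (cx, cy), no tangential distortion.
import numpy as np

def project_fisheye(P_w, R, t, f, cx, cy):
    """Project a 3D world point P_w onto the fisheye image."""
    P_c = R @ P_w + t                                     # world -> camera (6 DoF)
    theta = np.arctan2(np.hypot(P_c[0], P_c[1]), P_c[2])  # angle of incidence
    r = f * theta                                         # radial image distance
    phi = np.arctan2(P_c[1], P_c[0])                      # azimuth around the optical axis
    return np.array([cx + r * np.cos(phi), cy + r * np.sin(phi)])
```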
15. Universal Semantic Segmentation for Fisheye Urban Driving Images
The six-DoF augmentation. Except for the first row, every image is transformed using a virtual fisheye camera with a focal length of 300 pixels. The letter in brackets indicates which axis the camera is panning along or rotating around.
16. Universal Semantic Segmentation for Fisheye Urban Driving Images
Synthetic fisheye images with different focal lengths f.
17. Universal Semantic Segmentation for Fisheye Urban Driving Images
Seven-DoF augmentation: ablation configurations.
1. Base Aug: random cropping + random flip + color jitter + z-aug with fixed focal length
2. RandF Aug: Base Aug + random focal length
3. RandR Aug: Base Aug + random rotation
4. RandT Aug: Base Aug + random translation
5. RandFR Aug: Base Aug + random focal length + random rotation
6. RandFT Aug: Base Aug + random focal length + random translation
7. Six-DoF Aug: Base Aug + random rotation + random translation
8. Seven-DoF Aug: Base Aug + random focal length + random rotation + random translation
A hypothetical sampler for the full seven-DoF configuration is sketched below.
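A hypothetical sampler for the full configuration (all ranges are illustrative assumptions, not the paper's values) draws the seven parameters per training image; they can then be fed to a virtual fisheye projection such as the one sketched earlier.

```python
# Hypothetical seven-DoF parameter sampler; all ranges are illustrative
# assumptions, not the values used in the paper.
import numpy as np

def sample_seven_dof(rng: np.random.Generator):
    f = rng.uniform(200.0, 400.0)             # random focal length, in pixels
    rot = rng.uniform(-0.2, 0.2, size=3)      # random rotation about x/y/z, in rad
    trans = rng.uniform(-0.5, 0.5, size=3)    # random translation along x/y/z, in m
    return f, rot, trans
```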
18. Vehicle Re-ID for Surround-view Camera System
• Vehicle re-identification (Re-ID) plays a critical role in the perception system of autonomous driving and has attracted increasing attention in recent years.
• However, no complete solution exists for the surround-view system mounted on a vehicle.
• There are two main challenges in this scenario: i) in a single-camera view, it is difficult to recognize the same vehicle across past image frames due to fisheye distortion, occlusion, truncation, etc.; ii) in a multi-camera view, the appearance of the same vehicle varies greatly across different camera viewpoints.
• Thus, an integral vehicle Re-ID solution is proposed to address these problems.
• Specifically, a quality evaluation mechanism is introduced to balance the effects of tracking-box drift and target consistency.
• Besides, a Re-ID network based on an attention mechanism is combined with a spatial constraint strategy to further boost the performance across different cameras.
• The code and annotated fisheye dataset will be released for the benefit of the community.
19. Vehicle Re-ID for Surround-view Camera System
The 360° surround-view camera system. Each arrow points to an image captured by the corresponding camera.
20. Vehicle Re-ID for Surround-view Camera System
Vehicles in a single fisheye camera view. (a) Features of the same vehicle change dramatically in consecutive frames, and vehicles tend to obscure each other. (b) Matching errors are caused by tracking results. (c) The vehicle center indicated by the orange box is stable, while the IoU of consecutive frames indicated by the yellow box decreases with movement; a toy quality check built on this observation is sketched below.
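A toy version of such a check (the threshold and function name are assumptions) gates a track on normalized center drift rather than on IoU alone, since the center stays stable while the IoU decays with motion:

```python
# Toy quality check motivated by the figure: the object center is stable in
# consecutive frames even when box IoU drops, so gate tracks on center drift.
# The threshold is an illustrative assumption.
import numpy as np

def track_is_reliable(prev_box, cur_box, max_drift=0.3):
    """Boxes given as (x, y, w, h) in pixels."""
    pc = np.array([prev_box[0] + prev_box[2] / 2.0, prev_box[1] + prev_box[3] / 2.0])
    cc = np.array([cur_box[0] + cur_box[2] / 2.0, cur_box[1] + cur_box[3] / 2.0])
    scale = max(prev_box[2], prev_box[3])     # normalize drift by the box size
    return float(np.linalg.norm(cc - pc)) / scale < max_drift
```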
21. Vehicle Re-ID for Surround-view Camera System
The overall framework of vehicle Re-ID in a single camera. Each object is assigned its own tracker to realize Re-ID in a single channel. Tracking templates are initialized with object detection results. All tracking outputs are post-processed by the quality evaluation module to deal with distorted or occluded objects.
22. Vehicle Re-ID for Surround-view Camera System
The overall framework of vehicle Re-ID in multiple cameras. For a new target, the Re-ID model first extracts features, then distance metrics are computed between this feature and the features in the gallery. Besides, the spatial constraint strategy is adopted to improve the association; an illustrative matching step is sketched below.
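As an illustration of this matching step (the cosine metric, threshold, and the boolean plausibility mask are assumptions, not the paper's exact procedure), a new target's feature is compared against the gallery, with the spatial constraint masking out implausible cross-camera candidates:

```python
# Illustrative gallery matching with a spatial constraint; metric, threshold,
# and the plausibility mask are assumptions, not the paper's exact procedure.
import numpy as np

def match_to_gallery(feat, gallery, spatially_plausible, max_dist=0.4):
    """feat: (D,), gallery: (N, D), spatially_plausible: (N,) bool mask."""
    feat = feat / np.linalg.norm(feat)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    dist = 1.0 - g @ feat                      # cosine distance to every entry
    dist[~spatially_plausible] = np.inf        # spatial constraint strategy
    best = int(np.argmin(dist))
    return best if dist[best] < max_dist else -1   # -1: register a new identity
```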
23. Vehicle Re-ID for Surround-view Camera System
Samples captured by different cameras. (a) The appearance of the same vehicle captured by different cameras varies greatly; the same color represents the same object. (b) Objects with a similar appearance may appear in the same camera view, as shown by the two black vehicles in green boxes.
24. Vehicle Re-ID for Surround-view Camera System
Illustration of the multi-camera Re-ID network, a two-branch parallel structure. The top branch is employed to make the network pay more attention to object regions, and the other extracts global features.
25. Vehicle Re-ID for Surround-view Camera System
Projection uncertainty of key points. Ellipse 1 and ellipse 2 are the uncertainty ranges of the front and left (right) cameras, respectively.
27. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
• This paper introduces a novel multi-task learning strategy to improve self-supervised monocular distance estimation on fisheye and pinhole camera images.
• The contributions of this work are threefold:
• Firstly, it introduces a novel distance estimation network architecture using a self-attention based encoder coupled with robust semantic feature guidance to the decoder, which can be trained in a one-stage fashion.
• Secondly, it integrates a generalized robust loss function, which improves performance significantly while removing the need for hyperparameter tuning of the reprojection loss (a sketch of such a loss follows below).
• Finally, it reduces the artifacts caused by dynamic objects violating the static-world assumption by using a semantic masking strategy.
• As there is limited prior work on fisheye cameras, the method is also evaluated on KITTI using a pinhole model.
• It achieves state-of-the-art performance among self-supervised methods without requiring an external scale estimation.
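The slide does not spell the loss out, but a generalized robust loss in this spirit is Barron's general and adaptive robust function; whether SynDistNet uses exactly this form is an assumption. For α ∉ {0, 2} it reads ρ(x, α, c) = (|α−2|/α)·[((x/c)²/|α−2| + 1)^(α/2) − 1]:

```python
# Sketch of a generalized robust loss (Barron-style); whether SynDistNet uses
# exactly this variant is an assumption. Valid for alpha not in {0, 2}; those
# limits (Cauchy-like and L2) need special-casing.
import torch

def general_robust_loss(x, alpha=1.0, c=1.0):
    """rho(x): approaches L2 as alpha->2, Charbonnier at alpha=1, Cauchy as alpha->0."""
    a = torch.as_tensor(float(alpha))
    b = torch.abs(a - 2.0)
    return (b / a) * (((x / c) ** 2 / b + 1.0) ** (a / 2.0) - 1.0)
```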
28. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
Overview of the joint prediction of distance D̂_t and semantic segmentation M_t from a single input image I_t. Compared to previous approaches, the semantically guided distance estimation produces sharper depth edges and reasonable distance estimates for dynamic objects.
29. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
• The self-supervised depth and distance estimation is developed within a self-supervised monocular structure-from-motion (SfM) framework, which requires two networks aiming at learning:
• 1. a monocular depth/distance model g_D : I_t → D̂_t, predicting a scale-ambiguous depth or distance (the equivalent of depth for general image geometries) D̂_t = g_D(I_t)(ij) per pixel ij in the target image I_t;
• 2. an ego-motion predictor g_T : (I_t, I_t') → T_{t→t'}, predicting a set of 6 degrees of freedom that implement a rigid transformation T_{t→t'} ∈ SE(3) between the target image I_t and the set of reference images I_t'. Typically t' ∈ {t + 1, t − 1}, i.e. the frames I_{t−1} and I_{t+1} are used as reference images, although using a larger window is possible. A minimal sketch of the resulting view-synthesis objective follows.
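Below is a minimal sketch of the view-synthesis objective (a pinhole model with intrinsics K is assumed for clarity; the paper generalizes depth to fisheye distance, and all names are illustrative): g_D's depth and g_T's pose warp the reference image into the target frame, and the photometric error supervises both networks.

```python
# Minimal sketch of the self-supervised SfM objective: synthesize the target
# view from a reference view using predicted depth and pose, then penalize the
# photometric error. A pinhole model with intrinsics K is assumed for clarity.
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift each pixel (u, v) to a 3D point X = D(u, v) * K^-1 [u, v, 1]^T."""
    b, _, h, w = depth.shape
    v, u = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)]).float().reshape(3, -1)
    return depth.reshape(b, 1, -1) * (K_inv @ pix)         # (b, 3, h*w)

def synthesize_target(ref_img, depth_t, T_t2r, K):
    """Warp the reference image I_t' into the target frame via D_t and T_{t->t'}."""
    b, _, h, w = ref_img.shape
    pts = backproject(depth_t, torch.linalg.inv(K))        # points in the target camera
    R, t = T_t2r[:, :3, :3], T_t2r[:, :3, 3:]              # rigid motion in SE(3)
    proj = K @ (R @ pts + t)                               # project into the reference view
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    uv = torch.stack([2 * uv[:, 0] / (w - 1) - 1,          # normalize for grid_sample
                      2 * uv[:, 1] / (h - 1) - 1], dim=-1).reshape(b, h, w, 2)
    return F.grid_sample(ref_img, uv, align_corners=True)

def photometric_loss(target_img, ref_img, depth_t, T_t2r, K):
    """L1 photometric error between I_t and the synthesized view."""
    return (target_img - synthesize_target(ref_img, depth_t, T_t2r, K)).abs().mean()
```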
30. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
Overview of the proposed framework for the joint prediction of distance and semantic segmentation. The upper part (blue blocks) describes the individual steps for the depth estimation, while the green blocks describe the individual steps needed for the prediction of the semantic segmentation.
31. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
Visualization of the proposed network architecture to semantically guide the depth estimation. It utilizes a self-attention-based encoder and a semantically guided decoder using pixel-adaptive convolutions.
32. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
Quantitative performance comparison of the network with other self-supervised monocular methods for depths up to 80 m on KITTI. "Original" uses raw depth maps for evaluation, and "Improved" uses annotated depth maps. At test time, all methods except FisheyeDistanceNet, PackNet-SfM, and this method scale the estimated depths using the median ground-truth LiDAR depth.
33. SynDistNet: Self-Supervised Monocular Fisheye Camera Distance Estimation Synergized with Semantic Segmentation for Autonomous Driving
Qualitative comparison on the fisheye WoodScape dataset between the baseline model without the contributions and the proposed SynDistNet. SynDistNet can recover the distance of dynamic objects (left images), which eventually solves the infinite-distance issue. In the 3rd and 4th columns, one can see that semantic guidance helps to recover thin structures and resolve the distance of homogeneous areas, outputting sharp distance maps on raw fisheye images.
34. Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
• A multi-modal 360° framework for 3D object detection and tracking for autonomous vehicles is presented.
• The process is divided into four main stages.
• First, images are fed into a CNN to obtain instance segmentation of the surrounding road participants.
• Second, LiDAR-to-image association is performed for the estimated mask proposals.
• Then, the isolated points of every object are processed by a PointNet ensemble to compute their corresponding 3D bounding boxes and poses.
• Finally, a tracking stage based on the Unscented Kalman Filter is used to track the agents over time.
• The solution, based on a sensor-fusion configuration, provides accurate and reliable road-environment detection.
• A wide variety of tests of the system, deployed in an autonomous vehicle, have successfully assessed the suitability of the proposed perception stack in a real autonomous driving application.
35. Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
• The following sensors are employed:
• Five CMOS cameras equipped with 85° HFOV lenses.
• A 32-layer LiDAR scanner featuring a minimum vertical resolution of 0.33° and a range of 200 m (Velodyne Ultra Puck).
• Accurate synchronization and calibration between sensors are of paramount importance.
• Hence, all sensors are synchronized with the clock provided by a GPS receiver, and the cameras are externally triggered at a 10 Hz rate.
• Regarding calibration, the cameras' intrinsic parameters are obtained through the checkerboard-based approach by Zhang (sketched below), and the extrinsic parameters representing the relative position between sensors are estimated through a monocular-ready variant of the velo2cam method.
• The result of this automatic procedure is further validated by visual inspection.
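For reference, checkerboard-based intrinsic calibration can be reproduced with OpenCV's implementation of Zhang's method; the board geometry and square size below are illustrative assumptions, not the authors' setup.

```python
# Sketch of checkerboard intrinsic calibration via OpenCV's implementation of
# Zhang's method; the board size and square size are illustrative assumptions.
import cv2
import numpy as np

def calibrate_intrinsics(images, board=(9, 6), square_m=0.025):
    # 3D corner coordinates of the board in its own plane (z = 0)
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square_m
    obj_pts, img_pts, size = [], [], None
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
    # Closed-form homography-based init + nonlinear refinement (Zhang's method)
    rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return K, dist, rms
```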
36. Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
• The proposed solution is based on three pillars.
• First, visual data is employed to perform detection and instance-level semantic segmentation.
• Then, the LiDAR points whose image projection falls within each obstacle's bounding polygon are employed to estimate its 3D pose (see the association sketch below).
• Finally, the tracking stage provides consistency, thus mitigating occasional misdetections and enabling trajectory prediction.
• The combination of these three stages allows accurate and robust identification of the dynamic agents surrounding the vehicle.
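A minimal sketch of this LiDAR-to-image association (assuming known LiDAR-to-camera extrinsics (R, t) and pinhole intrinsics K; names are illustrative): keep the points whose projection lands inside a detection's instance mask.

```python
# Minimal sketch of LiDAR-to-image association: project LiDAR points with the
# calibrated extrinsics/intrinsics and keep those falling inside an instance
# mask. Names and the pinhole assumption are illustrative.
import numpy as np

def points_in_mask(points_lidar, R, t, K, mask):
    """points_lidar: (N, 3); mask: (H, W) boolean instance mask."""
    P_c = points_lidar @ R.T + t               # LiDAR frame -> camera frame
    front = P_c[:, 2] > 0                      # discard points behind the camera
    uvw = P_c[front] @ K.T
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)
    h, w = mask.shape
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    hit = np.zeros(len(points_lidar), dtype=bool)
    idx = np.flatnonzero(front)[ok]            # indices of in-image points
    hit[idx[mask[uv[ok, 1], uv[ok, 0]]]] = True
    return points_lidar[hit]                   # candidate points for F-PointNet
```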
37. Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
System overview. Images from all the cameras are processed by individual instances of Mask R-CNN, which provide detections endowed with a semantic mask. LiDAR points in these regions are used as input to several F-PointNets responsible for estimating a 3D bounding box and its position with respect to the car. Then, 3D detections from each camera are fused using an NMS procedure. A subsequent tracking stage provides consistency across frames and avoids temporary misdetections.
38. Towards Autonomous Driving: a Multi-Modal 360° Perception Proposal
Qualitative results of the proposed system on some typical traffic scenarios. From top to bottom: 3D detections in the rear-left, front-left, front, front-right, and rear-right cameras, and a bird's-eye-view representation.