1
DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
PV-RCNN: Point-Voxel Feature Set Abstraction
for 3D Object Detection
Kohei Nishimura, DeepX
2
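Summary
• Proposes new modules for 3D object detection from point clouds
– A module that extracts keypoint feature vectors from Voxel CNN feature vectors and PointNet feature vectors
– A pooling module that extracts RoI feature vectors
• Confirms that a detector built with the proposed modules achieves state-of-the-art results on 3D object detection tasks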
3
Supplement: 3D representations
4
Bibliographic information
• title: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
• authors: Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang,
Hongsheng Li
• institutes: CUHK-SenseTime Joint Laboratory/The Chinese University of Hong Kong, SenseTime
Research, NLPR/CASIA, CSE/CUHK
• publication: CVPR 2020
• paper url: https://arxiv.org/pdf/1912.13192
• code: https://github.com/jhultman/PV-RCNN
Note: unless otherwise noted, the figures in these slides are taken from the paper.
5
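Related work: Voxel CNN-based methods
• Convert the point cloud into voxels and extract features with a CNN
• Point clouds can be processed efficiently, but voxelizing them so that convolutions can be applied loses information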
VoxelNet: End-to-End Learning for Point Cloud Based 3D
Object Detection
https://arxiv.org/abs/1711.06396
SECOND: Sparsely Embedded Convolutional Detection
https://www.mdpi.com/1424-8220/18/10/3337/pdf
6
Related work: PointNet-based methods
• Feed the raw point cloud directly into the network
• More information can be extracted, but processing becomes more expensive
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a
Metric Space
https://arxiv.org/pdf/1706.02413.pdf
Deep Hough Voting for 3D Object Detection in Point Clouds
http://openaccess.thecvf.com/content_ICCV_2019/papers/Qi_Deep_Hough_Voting_for_3D_Object_Detection_in_Point_Clouds_ICCV_2019_paper.pdf
7
Overview
8
Overview
Voxel-to-keypoint Scene Encoding
Keypoint-to-grid RoI Feature Abstraction
9
Overview
Voxel-to-keypoint Scene Encoding
Keypoint-to-grid RoI Feature Abstraction
10
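Voxel-to-keypoint Scene Encoding
• Goal
– Obtain a good representation of the point cloud
• Overview
– Combine a Voxel CNN with keypoints to extract a representation of the point cloud
• Modules (on the slide, red text marks what this paper proposes)
– 3D Voxel CNN
– Keypoints Sampling Module
• Furthest Point Sampling
– Extended Voxel Set Abstraction Module
– Predicted Keypoint Weighting Module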
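The Furthest Point Sampling step used by the Keypoints Sampling Module can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation (real pipelines usually run a GPU kernel); the function name, the fixed `start_index`, and the toy points are assumptions made for the example.

```python
import math

def furthest_point_sampling(points, n_samples, start_index=0):
    """Greedily pick n_samples point indices that are far apart from each other."""
    if not 0 < n_samples <= len(points):
        raise ValueError("n_samples must be in 1..len(points)")
    selected = [start_index]
    # min_dist[j] = distance from point j to its nearest already-selected point
    min_dist = [math.dist(p, points[start_index]) for p in points]
    while len(selected) < n_samples:
        # The next keypoint is the point furthest from the current selection.
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(next_index)
        for j, p in enumerate(points):
            min_dist[j] = min(min_dist[j], math.dist(p, points[next_index]))
    return selected

# Hypothetical toy point cloud (x, y, z) in meters.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0),
          (5.0, 5.0, 0.0), (0.0, 5.1, 0.0)]
keypoint_indices = furthest_point_sampling(points, 3)
```

Each iteration adds the point that is furthest from everything chosen so far, so the keypoints spread over the whole scene instead of clustering in dense regions.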
11
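Notation
• $i$: keypoint index
• $n$: number of keypoints
• $F$: set of voxel-wise feature vectors $f$ produced by the voxel CNN
• $V$: set of 3D voxel coordinates $v$
• $k$: index of the voxel CNN layer (level)
• $pv_k$: superscript marking a feature vector taken from the $k$-th layer
• $N_k$: number of voxels that contain points (non-empty voxels) at the $k$-th layer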
12
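3D Voxel CNN
• 3D voxel CNN
– Divide the point cloud into $L \times W \times H$ voxels and feed them into 3D sparse convolution layers with 3 × 3 × 3 kernels
– Downsample over four levels and treat the features at each level as voxel-wise feature vectors
• The downsampling factors are 1×, 2×, 4×, 8×
• 3D proposal generation
– Stack the feature vectors of the final (most downsampled) level of the 3D voxel CNN along the Z axis to form a 2D feature map
(bird's-eye-view features, hereafter BEV)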
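As a rough illustration of stacking along the Z axis, the sketch below folds the height dimension of a dense feature volume into the channel dimension. It assumes the sparse output has already been converted to a dense nested list indexed as `[channel][z][y][x]`; that layout and the function name are assumptions for the example only.

```python
def stack_height_into_channels(volume):
    """Reshape a dense (C, D, H, W) nested list into a (C * D, H, W) BEV map."""
    return [height_slice for channel in volume for height_slice in channel]

# Hypothetical tiny volume: C=2 channels, D=2 height bins, H=1 row, W=3 columns.
volume = [
    [[[1, 2, 3]], [[4, 5, 6]]],
    [[[7, 8, 9]], [[10, 11, 12]]],
]
bev = stack_height_into_channels(volume)
```

After this step every BEV cell carries the features of all height bins above it, so ordinary 2D convolutions can be applied to generate proposals.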
13
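Voxel Set Abstraction Module
• A module for embedding the features of each 3D CNN layer into the keypoints
– Built on the set abstraction operation proposed in PointNet++
– PointNet++ aggregates its own point-wise feature vectors rather than Voxel CNN feature vectors
• The feature vector $f_i^{(pv_k)}$ of keypoint $i$ from the $k$-th 3D CNN layer is
$$f_i^{(pv_k)} = \max\left\{ G\left( M\left( S_i^{(l_k)} \right) \right) \right\}$$
– $M(\cdot)$ randomly samples at most $T_k$ elements from the neighboring set $S_i^{(l_k)}$
– $G(\cdot)$ is an MLP that encodes the voxel-wise feature vectors and relative coordinates
– $S_i^{(l_k)}$ is the set of 3D CNN feature vectors of the voxels within distance $r_k$ of keypoint $i$, each concatenated with the voxel's coordinates relative to the keypoint
• The feature vector of keypoint $i$ is obtained by concatenating the feature vectors from all layers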
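The aggregation for a single keypoint and a single layer can be sketched as below. This is a simplified illustration under stated assumptions: one linear layer with ReLU stands in for the MLP G, a plain Euclidean-distance test defines the neighborhood, a zero vector is returned when no voxel is in range, and all names and toy values are hypothetical.

```python
import math
import random

def voxel_set_abstraction(keypoint, voxel_centers, voxel_features, radius,
                          max_neighbors, weights, rng):
    """Max-pooled feature of one keypoint from the voxels within `radius`."""
    # S_i: [voxel feature ; relative coordinate] for every voxel inside the ball.
    neighbors = [
        list(f) + [v[d] - keypoint[d] for d in range(3)]
        for v, f in zip(voxel_centers, voxel_features)
        if math.dist(v, keypoint) < radius
    ]
    # M(.): keep at most max_neighbors randomly chosen elements.
    if len(neighbors) > max_neighbors:
        neighbors = rng.sample(neighbors, max_neighbors)
    if not neighbors:
        return [0.0] * len(weights)  # assumption: zero feature for empty balls
    # G(.): a single linear layer + ReLU standing in for the MLP.
    encoded = [
        [max(0.0, sum(w * x for w, x in zip(row, s))) for row in weights]
        for s in neighbors
    ]
    # max(.): channel-wise max-pooling over the neighbors.
    return [max(column) for column in zip(*encoded)]

# Hypothetical toy inputs: three voxels with 1-dim features, one of them out of range.
centers = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (3.0, 0.0, 0.0)]
features = [[1.0], [2.0], [9.0]]
weights = [[1, 0, 0, 0], [0, 1, 0, 0]]  # 2 output channels, 4 inputs (1 feature + 3 offsets)
feature = voxel_set_abstraction((0.0, 0.0, 0.0), centers, features, 0.4, 16,
                                weights, random.Random(0))
```

Running the same function once per layer (and per radius) and concatenating the results gives the multi-scale keypoint feature described above.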
14
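Extended Voxel Set Abstraction Module
• Extends Voxel Set Abstraction (VSA) to capture more information
– The feature vector $f_i^{(p)}$ of keypoint $i$ is the concatenation of the following three feature vectors
• VSA feature vector: $f_i^{(pv)}$
• Raw point cloud feature vector: $f_i^{(raw)}$
• 2D bird's-eye-view feature vector: $f_i^{(bev)}$
– Raw point cloud feature vector $f_i^{(raw)}$: compensates for the information lost during voxelization
• The paper states that it is computed in the same way as $f_i^{(pv)}$, but how exactly are the keypoint feature vectors computed?
– 2D bird's-eye-view feature vector $f_i^{(bev)}$: has a larger receptive field than the voxels, so it can capture global features
• Computed from the feature map described under 3D Voxel CNN, using the keypoint's coordinates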
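One way to read a BEV feature at a keypoint's coordinates is 2D bilinear interpolation, which is what the paper describes for this step. The sketch below is a single-channel illustration; the pixel convention (cell origin at the range minimum, no half-pixel offset) and the function names are assumptions. With the KITTI settings given later in the slides (0.05m voxels, 8× downsampling), one BEV cell would cover 0.4m.

```python
def bev_pixel_coords(x, y, x_min, y_min, cell_size):
    """Map metric (x, y) to continuous BEV pixel coordinates (column, row)."""
    return (x - x_min) / cell_size, (y - y_min) / cell_size

def bilinear_sample(feature_map, col, row):
    """Bilinearly interpolate a single-channel (H, W) map at (col, row)."""
    height, width = len(feature_map), len(feature_map[0])
    # Clamp to the map so the four surrounding cells always exist.
    col = min(max(col, 0.0), width - 1.0)
    row = min(max(row, 0.0), height - 1.0)
    c0, r0 = int(col), int(row)
    c1, r1 = min(c0 + 1, width - 1), min(r0 + 1, height - 1)
    tc, tr = col - c0, row - r0
    top = (1.0 - tc) * feature_map[r0][c0] + tc * feature_map[r0][c1]
    bottom = (1.0 - tc) * feature_map[r1][c0] + tc * feature_map[r1][c1]
    return (1.0 - tr) * top + tr * bottom

# Hypothetical 2 x 2 single-channel BEV map.
bev_map = [[0.0, 10.0], [20.0, 30.0]]
center_value = bilinear_sample(bev_map, 0.5, 0.5)
col, row = bev_pixel_coords(2.0, -38.0, 0.0, -40.0, 0.4)
```

For a multi-channel map the same weights are applied to every channel, giving the vector $f_i^{(bev)}$.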
15
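Predicted Keypoint Weighting Module (PKW)
• Assigns an importance weight to each keypoint to improve the accuracy of the proposed object regions
– Foreground keypoints are more important than background keypoints
• With PKW, the importance-weighted feature vector of keypoint $i$ is
$$\tilde{f}_i^{(p)} = A\left(f_i^{(p)}\right) \cdot f_i^{(p)}$$
– $A(\cdot)$ is a three-layer MLP that predicts from the feature vector, as a value in [0, 1], whether the keypoint is in the foreground
– Training details of $A(\cdot)$
• The foreground flag of a keypoint is derived from
segmentation labels
– Whether the keypoint lies inside a 3D ground-truth
box
• The training loss is focal loss (default parameters)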
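The reweighting and its training loss can be sketched as follows. This is an illustrative sketch only: a precomputed logit stands in for the output of the MLP A, and "default parameters" is assumed to mean the defaults from the focal loss paper (alpha = 0.25, gamma = 2).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reweight_keypoint_feature(feature, foreground_logit):
    """Scale a keypoint feature by its predicted foreground confidence in [0, 1]."""
    weight = sigmoid(foreground_logit)
    return [weight * value for value in feature]

def focal_loss(prob, is_foreground, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one keypoint; prob is the predicted foreground probability."""
    p_t = prob if is_foreground else 1.0 - prob
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    alpha_t = alpha if is_foreground else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

weighted = reweight_keypoint_feature([2.0, 4.0], 0.0)  # sigmoid(0) = 0.5
easy, hard = focal_loss(0.9, True), focal_loss(0.1, True)
```

The `(1 - p_t) ** gamma` factor shrinks the loss of well-classified keypoints, which matters here because background keypoints far outnumber foreground ones.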
16
Overview
Voxel-to-keypoint Scene Encoding
Keypoint-to-grid RoI Feature Abstraction
17
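Keypoint-to-grid RoI Feature Abstraction
• Goal
– Obtain a good feature representation of each RoI to improve detection accuracy
• Overview
– Compute RoI feature vectors from the keypoint feature vectors and refine the detections
• Modules (on the slide, red text marks what this paper proposes)
– RoI-grid Pooling
– 3D Proposal Refinement and Confidence Prediction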
18
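RoI-grid Pooling
• Uses the keypoint feature vectors to obtain a good RoI feature vector
and improve detection accuracy
– Lets the pooled features take in information from keypoints outside the RoI
as well
• The RoI feature vector is the concatenation of the feature vectors of the grid points
inside that RoI
– Grid points are points sampled uniformly within the RoI
• The feature vector $f_i^{(g)}$ of grid point $g_i$ in the RoI
is
$$f_i^{(g)} = \max\left\{ G\left( M\left( \Psi_i \right) \right) \right\}$$
– $M(\cdot)$ randomly samples at most $T_k$ elements from the neighboring set $\Psi_i$
– $G(\cdot)$ is a PointNet block that encodes the feature vectors and relative
coordinates
– $\Psi_i$ is built by concatenating the feature vector of each keypoint $p_j$ within distance $r$
of $g_i$ with its position relative to $g_i$:
$$\Psi_i = \left\{ \left[ \tilde{f}_j^{(p)};\ p_j - g_i \right]^T \ \middle|\ \left\| p_j - g_i \right\|^2 < r \right\}$$
– $f_i^{(g)}$
is computed for several values of $r$ and the results are concatenated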
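Generating the uniformly spaced grid points of one RoI can be sketched as below. This is a minimal illustration assuming the box is given as center, size (length, width, height) and a heading angle about the Z axis, and that grid points sit at the centers of equal cells; the function name and the toy box are hypothetical.

```python
import math

def roi_grid_points(center, size, heading, grid_size):
    """Centers of grid_size^3 equal cells inside a box rotated by `heading` about Z."""
    cx, cy, cz = center
    length, width, height = size
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Cell-center offsets in (-0.5, 0.5), as a fraction of each box dimension.
    fractions = [(i + 0.5) / grid_size - 0.5 for i in range(grid_size)]
    points = []
    for fx in fractions:
        for fy in fractions:
            for fz in fractions:
                lx, ly, lz = fx * length, fy * width, fz * height
                # Rotate the local offset about the Z axis, then translate.
                points.append((cx + lx * cos_h - ly * sin_h,
                               cy + lx * sin_h + ly * cos_h,
                               cz + lz))
    return points

# Hypothetical proposal: 4m x 2m x 2m box at (10, 0, 0), heading 0, 2 x 2 x 2 grid.
grid = roi_grid_points((10.0, 0.0, 0.0), (4.0, 2.0, 2.0), 0.0, 2)
```

Each grid point then gathers the keypoints within radius $r$ around it and max-pools their encoded features, so keypoints just outside the box can still contribute.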
19
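3D Proposal Refinement and Confidence Prediction
• The refinement network refines each RoI and predicts its confidence
– Input: the RoI feature vector after RoI-grid pooling
– Output: the size and location of the detected box, and its confidence
– Structure: a two-layer MLP with two branches
• Confidence prediction
– The prediction target $y_k$ is
$$y_k = \min\left(1,\ \max\left(0,\ 2\,\mathrm{IoU}_k - 0.5\right)\right)$$
• $\mathrm{IoU}_k$ is the IoU of the $k$-th RoI with respect to its ground-truth box
– The loss is the cross-entropy between the predicted confidence and this normalized IoU value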
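The normalized IoU target and its cross-entropy loss can be written out as a small sketch. The target follows the paper's formula, y = min(1, max(0, 2·IoU − 0.5)); the function names and the clamping constant `eps` are assumptions added for numerical safety.

```python
import math

def confidence_target(iou):
    """Map IoU in [0, 1] to a soft target: 0 below 0.25, 1 above 0.75, linear between."""
    return min(1.0, max(0.0, 2.0 * iou - 0.5))

def confidence_loss(predicted, iou, eps=1e-7):
    """Binary cross-entropy between predicted confidence and the soft IoU target."""
    y = confidence_target(iou)
    p = min(max(predicted, eps), 1.0 - eps)  # avoid log(0)
    return -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)

targets = [confidence_target(v) for v in (0.1, 0.25, 0.5, 0.75, 0.9)]
```

Because the target is soft rather than a 0/1 label, the predicted confidence is encouraged to track localization quality, not only object presence.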
20
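Training Losses
• The total loss is the sum of the following three losses
– $L_{rpn}$: loss of the region proposals
– $L_{seg}$: keypoint segmentation loss (described under PKW)
– $L_{rcnn}$: refinement loss
• $L_{rpn}$: loss of the region proposals
– $L_{cls}$: focal loss for classification
– $L_{smooth\text{-}L1}$: smooth-L1 error between the boxes predicted from the 3D Voxel CNN and the ground truth (used to learn anchor box regression)
• $L_{rcnn}$: refinement loss
– $L_{iou}$: confidence prediction loss
– $L_{smooth\text{-}L1}$: smooth-L1 error between the refinement predicted by the refinement network and its ground-truth target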
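The smooth-L1 term and the unweighted sum of the three losses can be sketched as below. The threshold `beta=1.0` and the equal weighting of the three terms are assumptions for the example; actual implementations may weight the terms differently.

```python
def smooth_l1(prediction, target, beta=1.0):
    """Quadratic near zero, linear for large errors."""
    diff = abs(prediction - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def box_regression_loss(predicted_residuals, target_residuals):
    """Sum of smooth-L1 errors over the box parameters (e.g. x, y, z, l, w, h, heading)."""
    return sum(smooth_l1(p, t) for p, t in zip(predicted_residuals, target_residuals))

def total_loss(l_rpn, l_seg, l_rcnn):
    """Overall training loss as the plain sum of its three parts."""
    return l_rpn + l_seg + l_rcnn

small_error = smooth_l1(0.5, 0.0)
large_error = smooth_l1(3.0, 1.0)
```

The linear tail keeps a few badly localized boxes from dominating the gradient, while the quadratic part gives smooth updates close to the target.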
21
Experiment
• Dataset
– KITTI
– Waymo Open
• Evaluation Metrics
– KITTI
• mean Average Precision with 40 recall positions
– Waymo Open
• mean Average Precision
• mean Average Precision weighted by heading
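For reference, average precision with 40 recall positions can be sketched as the mean of interpolated precision at recall 1/40, 2/40, ..., 1. This is a simplified illustration of the idea, not the official KITTI evaluation code; the function name and the toy curve are assumptions.

```python
def average_precision_r40(recalls, precisions):
    """Mean interpolated precision over the 40 recall positions 1/40 ... 40/40."""
    total = 0.0
    for i in range(1, 41):
        threshold = i / 40
        # Interpolated precision: best precision at any recall >= threshold.
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        total += max(candidates, default=0.0)
    return total / 40

# Hypothetical two-point precision-recall curve.
ap = average_precision_r40([0.5, 1.0], [1.0, 0.5])
```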
22
Results in KITTI
• PV-RCNN outperformed existing methods in most of the experiments
23
Results in Waymo
• PV-RCNN outperformed existing methods in all of the experiments
24
Ablation Studies: Voxel-to-keypoint scene encoding & RoI-grid pooling
• Confirmed that both voxel-to-keypoint scene encoding and RoI-grid pooling are effective
25
Ablation Studies: VSA module
• Confirmed that detection performance improves when the VSA module uses all of the feature vectors
26
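Ablation Studies: PKW & RoI-grid pooling module
• Confirmed that the PKW and RoI-grid pooling modules are effective
– RoI-aware pooling, the baseline, extracts features by combining max pooling and average pooling over the feature vectors of the points inside the RoI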
27
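Conclusion
• Proposes new modules for 3D object detection from point clouds
– A module that extracts keypoint feature vectors from Voxel CNN feature vectors and PointNet feature vectors
– A pooling module that extracts RoI feature vectors
• Confirms that a detector built with the proposed modules achieves state-of-the-art results on 3D object detection tasks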
28
Appendix
29
Implementation Details
• Keypoints Sampling
– n = 2048 in KITTI
– n = 4096 in Waymo
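• VSA module
– two neighboring radii for each level
• (0.4m, 0.8m), (0.8m, 1.2m), (1.2m, 2.4m), (2.4m, 4.8m)
– the neighborhood radii of set abstraction for raw points are (0.4m, 0.8m)
• RoI-grid pooling operations
– 6 × 6 × 6 grid points are sampled uniformly in each 3D proposal
– the two neighboring radii of each grid point are (0.8m, 1.6m)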
30
Dataset Details
• KITTI
– detection range
• [0, 70.4]m for the X axis,
• [−40, 40]m for the Y axis
• [−3, 1]m for the Z axis
– voxel size: (0.05m, 0.05m, 0.1m) along the X, Y and Z axes
• Waymo Open dataset
– detection range
• [−75.2, 75.2]m for the X and Y axes
• [−2,4]m for the Z axis
– voxel size: (0.1m, 0.1m, 0.15m)
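The detection ranges and voxel sizes above determine the voxel grid resolution by simple division. The short sketch below works that arithmetic out; the function name is an assumption, and the results are derived only from the numbers listed on this slide.

```python
def grid_shape(range_min, range_max, voxel_size):
    """Number of voxels along X, Y, Z for a detection range and voxel size."""
    return tuple(round((hi - lo) / size)
                 for lo, hi, size in zip(range_min, range_max, voxel_size))

kitti_grid = grid_shape((0.0, -40.0, -3.0), (70.4, 40.0, 1.0), (0.05, 0.05, 0.1))
waymo_grid = grid_shape((-75.2, -75.2, -2.0), (75.2, 75.2, 4.0), (0.1, 0.1, 0.15))
```

With 8× downsampling at the last level, the KITTI grid of 1408 × 1600 cells shrinks to a 176 × 200 bird's-eye-view map.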
31
Training Details
• optimizer: Adam
– cosine annealing learning rate
• KITTI
– batch size: 24
– learning rate: 0.01
– epoch: 80
– GPU: 8 GTX 1080Ti
– training time: 5 hours
• Waymo Open
– batch size: 64
– learning rate: 0.01
– epochs: 50
– GPU: 32 GTX 1080Ti
– training time: 25 hours
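A plain cosine annealing schedule for the listed base learning rate of 0.01 can be sketched as follows. The slide only names the strategy, so the minimum learning rate of 0 and the absence of warm-up are assumptions; the actual training code may use a variant.

```python
import math

def cosine_annealing_lr(step, total_steps, base_lr=0.01, min_lr=0.0):
    """Learning rate that decays from base_lr to min_lr along half a cosine wave."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Learning rate at the start, middle and end of an 80-epoch run.
schedule = [cosine_annealing_lr(epoch, 80) for epoch in (0, 40, 80)]
```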
32
Training Details 2
• For the proposal refinement stage, we randomly sample 128 proposals with 1:1 ratio for positive
and negative proposals, where a proposal is considered as a positive proposal for box refinement
branch if it has at least 0.55 3D IoU with the ground-truth boxes, otherwise it is treated as a
negative proposal.
• Data Augmentation
– random flipping along the X axis,
– global scaling with a random scaling factor sampled from [0.95, 1.05]
– global rotation around the Z axis with a random angle sampled from [-pi/4, pi/4]
– randomly “paste” some new ground-truth objects from other scenes to the current training
scenes, for simulating objects in various environments
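The three global augmentations can be sketched on raw points as below. This is a minimal illustration with several assumptions: flipping along the X axis is taken to mean negating the Y coordinate with probability 0.5, the transforms are applied in the order flip, rotate, scale, and the ground-truth boxes (which must receive the same transform) and the ground-truth pasting step are left out.

```python
import math
import random

def augment_points(points, rng):
    """Random flip, Z-axis rotation and global scaling of (x, y, z) points."""
    flip = rng.random() < 0.5
    scale = rng.uniform(0.95, 1.05)
    angle = rng.uniform(-math.pi / 4, math.pi / 4)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    augmented = []
    for x, y, z in points:
        if flip:
            y = -y  # mirror across the X axis (assumed meaning of the flip)
        # Rotate about the Z axis, then scale all coordinates.
        x, y = x * cos_a - y * sin_a, x * sin_a + y * cos_a
        augmented.append((x * scale, y * scale, z * scale))
    return augmented, (flip, scale, angle)

augmented, (flip, scale, angle) = augment_points([(3.0, 4.0, 1.0)], random.Random(0))
```

Flips and rotations preserve distances from the sensor origin, so after augmentation each point's range changes only by the sampled scale factor.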
33
References
• PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
– https://arxiv.org/abs/1912.13192v1
• VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
– https://arxiv.org/abs/1711.06396
• SECOND: Sparsely Embedded Convolutional Detection
– https://www.mdpi.com/1424-8220/18/10/3337/pdf
• PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
– https://arxiv.org/pdf/1706.02413.pdf
• Deep Hough Voting for 3D Object Detection in Point Clouds
– http://openaccess.thecvf.com/content_ICCV_2019/papers/Qi_Deep_Hough_Voting_for_3D_Object_Detection_in_Point_Clouds_ICCV_2019_paper.pdf

【DL輪読会】Towards Understanding Ensemble, Knowledge Distillation and Self-Distil...
 
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
【DL輪読会】VIP: Towards Universal Visual Reward and Representation via Value-Impl...
 

Último

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 

Último (20)

The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 

[DL Reading Group] PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

  • 1. DEEP LEARNING JP [DL Papers] http://deeplearning.jp/
    PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
    Kohei Nishimura, DeepX
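  • 2. Summary
    • Proposes new modules for 3D object detection from point clouds
      – A module that extracts keypoint feature vectors from Voxel CNN feature vectors and PointNet feature vectors
      – A pooling module that extracts RoI feature vectors
    • Using the proposed modules, the method was confirmed to reach state-of-the-art results on 3D object detection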
  • 4. Bibliographic information
    • title: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
    • authors: Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
    • institutes: CUHK-SenseTime Joint Laboratory/The Chinese University of Hong Kong, SenseTime Research, NLPR/CASIA, CSE/CUHK
    • publication: CVPR 2020
    • paper url: https://arxiv.org/pdf/1912.13192
    • code: https://github.com/jhultman/PV-RCNN
    Note: unless otherwise noted, the figures in these slides are taken from the paper.
  • 5. Related work: Voxel CNN-based methods
    • Convert the point cloud into voxels and extract features with a CNN
    • Processes point clouds efficiently, but voxelizing the points so that convolutions can be applied loses information
    • VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
      https://arxiv.org/abs/1711.06396
    • SECOND: Sparsely Embedded Convolutional Detection
      https://www.mdpi.com/1424-8220/18/10/3337/pdf
  • 6. Related work: PointNet-based methods
    • Feed the raw point cloud directly into the network
    • Extracts more information, but the processing becomes heavier
    • PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
      https://arxiv.org/pdf/1706.02413.pdf
    • Deep Hough Voting for 3D Object Detection in Point Clouds
      http://openaccess.thecvf.com/content_ICCV_2019/papers/Qi_Deep_Hough_Voting_for_3D_Object_Detection_in_Point_Clouds_ICCV_2019_paper.pdf
  • 10. Voxel-to-keypoint Scene Encoding
    • Goal
      – Obtain a good representation of the point cloud
    • Overview
      – Combine a Voxel CNN with keypoints to extract the point cloud representation
    • Modules (red text on the slide marks what this paper proposes)
      – 3D Voxel CNN
      – Keypoints Sampling Module
        • Furthest Point-Sampling
      – Extended Voxel Set Abstraction Module
      – Predicted Keypoint Weighting Module
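The Furthest Point-Sampling step named on this slide can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation; the function name `farthest_point_sampling` and the `start_index` parameter are assumptions made for the example.

```python
import math


def farthest_point_sampling(points, n_keypoints, start_index=0):
    """Return indices of n_keypoints points chosen greedily so that each
    new point is the one farthest from all points chosen so far."""
    if not 0 < n_keypoints <= len(points):
        raise ValueError("n_keypoints must be in 1..len(points)")
    selected = [start_index]
    # min_dist[j]: squared distance from point j to its nearest selected point
    min_dist = [math.inf] * len(points)
    while len(selected) < n_keypoints:
        last = points[selected[-1]]
        for j, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[j]:
                min_dist[j] = d
        # The next keypoint is the point with the largest nearest-selected distance
        selected.append(max(range(len(points)), key=min_dist.__getitem__))
    return selected
```

Each iteration costs O(N), so sampling n keypoints costs O(nN); practical implementations run this on the GPU.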
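  • 11. Notation
    • $i$: keypoint index
    • $n$: number of keypoints
    • $F$: set of voxel-wise CNN feature vectors $f$
    • $V$: set of 3D voxel coordinates $v$
    • $k$: index of the voxel CNN layer
    • $pv_k$: feature vector from the $k$-th layer of the network
    • $N_k$: number of voxels that contain points (non-empty voxels)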
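  • 12. 3D Voxel CNN
    • 3D voxel CNN
      – Divide the point cloud into $L \times W \times H$ voxels and feed them to 3 × 3 × 3 3D sparse convolution layers
      – Downsample over 4 levels and treat the feature vectors of each level as voxel feature vectors
        • Downsampling factors are 1x, 2x, 4x, 8x
    • 3D proposal generation
      – Stack the feature vectors of the final downsampled level of the 3D voxel CNN along the z axis to form a 2D feature map (bird-view feature, "bev" below)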
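The voxelization that precedes the sparse convolutions can be sketched as follows. This is a minimal illustration only: the helper name `voxelize` is hypothetical, and using the mean of the points inside a voxel as its initial feature is an assumption of this sketch, not something stated on the slide.

```python
import math
from collections import defaultdict


def voxelize(points, range_min, voxel_size, grid_shape):
    """Map each point to an integer voxel index and return
    {voxel_index: mean of the points inside} for non-empty voxels only."""
    buckets = defaultdict(list)
    for p in points:
        idx = tuple(math.floor((c - lo) / s)
                    for c, lo, s in zip(p, range_min, voxel_size))
        # Drop points that fall outside the detection range
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            buckets[idx].append(p)
    return {idx: tuple(sum(axis) / len(pts) for axis in zip(*pts))
            for idx, pts in buckets.items()}
```

Only non-empty voxels are stored, which is the property sparse convolution relies on.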
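  • 13. Voxel Set Abstraction Module
    • A module that embeds the features of each 3D CNN level into the keypoints
      – The set abstraction operation was proposed in PointNet++
      – PointNet++ applied it to PointNet++ feature vectors rather than Voxel CNN features
    • The feature vector $f_i^{(pv_k)}$ of keypoint $i$ at the $k$-th 3D CNN level is built from:
      – $M(\cdot)$: randomly samples $T_k$ neighbors from $S_i$
      – $G(\cdot)$: an MLP that embeds the voxel feature vectors and relative coordinates
      – $S_i$: the 3D CNN feature vectors of the voxels within distance $r_k$ of keypoint $i$, together with their coordinates relative to the keypoint
    • The feature vector of keypoint $i$ is obtained by concatenating the feature vectors of all levels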
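The equation is not part of this transcript. As I read the paper, it has the form $f_i^{(pv_k)} = \max\{G(M(S_i^{(l_k)}))\}$, that is, a channel-wise max-pool over the encoded neighbors. The sketch below follows that reading for a single keypoint and a single level; the function name, the caller-supplied `mlp` callable, and returning `None` when no voxel is in range are all assumptions of the example.

```python
import random


def voxel_set_abstraction(keypoint, voxel_centers, voxel_feats,
                          radius, max_neighbors, mlp, rng=random):
    """Aggregate voxel features around one keypoint at one CNN level."""
    # S_i: [voxel feature ; voxel position relative to the keypoint]
    neighbors = []
    for center, feat in zip(voxel_centers, voxel_feats):
        rel = [a - b for a, b in zip(center, keypoint)]
        if sum(d * d for d in rel) < radius ** 2:
            neighbors.append(list(feat) + rel)
    if not neighbors:
        return None  # assumption: caller decides how to handle empty balls
    # M(.): random subsample so at most max_neighbors entries remain
    if len(neighbors) > max_neighbors:
        neighbors = rng.sample(neighbors, max_neighbors)
    # G(.): shared encoder applied to every neighbor, then channel-wise max
    encoded = [mlp(x) for x in neighbors]
    return [max(channel) for channel in zip(*encoded)]
```

Running this once per level and concatenating the results gives the multi-level keypoint feature described in the last bullet.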
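  • 14. Extended Voxel Set Abstraction Module
    • Extends Voxel Set Abstraction (VSA) to capture more information
      – The feature vector $f_i^{(p)}$ of keypoint $i$ is the concatenation of three feature vectors
        • VSA feature vector: $f_i^{(pv)}$
        • Raw point cloud feature vector: $f_i^{(raw)}$
        • 2D bird-view feature vector: $f_i^{(bev)}$
      – Raw point cloud feature vector $f_i^{(raw)}$: compensates for the information lost during voxelization
        • The paper says it is computed in the same way as $f_i^{(pv)}$, but how exactly is the keypoint feature vector computed?
      – 2D bird-view feature vector $f_i^{(bev)}$: has a larger receptive field than the voxels, so it can capture global features
        • Computed from the feature map described under 3D Voxel CNN, using the keypoint coordinates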
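  • 15. Predicted Keypoint Weighting Module (PKW)
    • Assigns an importance to each keypoint to improve the accuracy of the object region proposals
      – Foreground keypoints are more important than background keypoints
    • The importance-weighted feature vector of keypoint $i$ under PKW uses:
      – $A(\cdot)$: a 3-layer MLP that predicts from the feature vector, as a value in [0, 1], whether the keypoint is in the foreground
      – Training details for $A(\cdot)$
        • The foreground flag of a keypoint is derived from segmentation labels
          – Whether the keypoint lies inside a 3D ground-truth box
        • The training loss is focal loss (default parameters)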
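A minimal sketch of the reweighting and its training loss, assuming the weighted feature is the predicted foreground confidence multiplied into the feature vector, and assuming "default parameters" means the commonly used focal-loss values alpha = 0.25 and gamma = 2. The 3-layer MLP is replaced here by a single precomputed logit; all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reweight_keypoint_feature(feature, foreground_logit):
    """Scale a keypoint feature by its predicted foreground confidence."""
    weight = sigmoid(foreground_logit)  # A(f) in [0, 1]
    return [weight * x for x in feature]


def focal_loss(prob, label, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one keypoint; label is 1 inside a GT box, else 0."""
    prob = min(max(prob, eps), 1.0 - eps)
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights keypoints that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Because most keypoints are background, the focal term keeps the many easy negatives from dominating the segmentation loss.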
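  • 17. Keypoint-to-grid RoI Feature Abstraction
    • Goal
      – Obtain a good RoI feature representation to improve detection accuracy
    • Overview
      – Compute RoI feature vectors from the keypoint feature vectors and refine the detections
    • Modules (red text on the slide marks what this paper proposes)
      – RoI-grid Pooling
      – 3D Proposal Refinement and Confidence Prediction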
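  • 18. RoI-grid Pooling
    • Uses the keypoint feature vectors to obtain a good RoI feature vector and improve detection accuracy
      – Lets the keypoint feature vectors bring in information from outside the RoI
    • The RoI feature vector is the concatenation of the feature vectors of the grid points inside that RoI
      – Grid points are points sampled uniformly from the RoI
    • The feature vector $f_i^{(g)}$ of grid point $g_i$ inside the RoI is built from:
      – $M(\cdot)$: randomly samples $T_k$ neighbors from $\Psi_i$
      – $G(\cdot)$: a PointNet block that embeds the feature vectors and relative coordinates
      – $\Psi_i$: the feature vectors of the keypoints $p_j$ within distance $r$ of $g_i$, each concatenated with its position relative to $g_i$
      – $f_i^{(g)}$ is computed for several radii $r$ and the results are concatenated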
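Uniform grid-point sampling inside a 3D proposal can be sketched as below. The box parameterization (center, size, yaw about the z axis), placing grid points at cell centers, and the default of 6 points per axis are assumptions of this example; the function name `roi_grid_points` is hypothetical.

```python
import math


def roi_grid_points(center, size, yaw, grid=6):
    """Return grid**3 points spread uniformly inside a rotated 3D box."""
    cx, cy, cz = center
    length, width, height = size
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    points = []
    for ix in range(grid):
        for iy in range(grid):
            for iz in range(grid):
                # Cell centers in box-local coordinates
                x = ((ix + 0.5) / grid - 0.5) * length
                y = ((iy + 0.5) / grid - 0.5) * width
                z = ((iz + 0.5) / grid - 0.5) * height
                # Rotate about z by yaw, then translate to the box center
                points.append((cx + x * cos_y - y * sin_y,
                               cy + x * sin_y + y * cos_y,
                               cz + z))
    return points
```

Each grid point then gathers nearby keypoint features with the same ball-query and max-pool pattern used for the keypoints, so keypoints just outside the box can still contribute.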
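  • 19. 3D Proposal Refinement and Confidence Prediction
    • The refinement network refines each RoI and predicts its confidence
      – Input: the RoI feature vector after RoI-grid pooling
      – Output: the size and position of the detected box, and its confidence
      – Architecture: a 2-layer MLP with two branches
    • Confidence prediction
      – The prediction target $y_k$ is defined from $IoU_k$
        • $IoU_k$ is the IoU of the $k$-th RoI with its ground-truth box
      – The loss is the cross-entropy against this normalized IoU value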
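The target formula is not in this transcript. As I recall it from the paper, it is $y_k = \min(1, \max(0, 2\,IoU_k - 0.5))$, so the sketch below should be read with that assumption in mind; the function names are hypothetical.

```python
import math


def confidence_target(iou):
    """Normalized IoU target: 0 below IoU 0.25, 1 above 0.75, linear between."""
    return min(1.0, max(0.0, 2.0 * iou - 0.5))


def confidence_loss(predicted, iou, eps=1e-7):
    """Cross-entropy between the predicted confidence and the soft target."""
    y = confidence_target(iou)
    p = min(max(predicted, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
```

Using a soft target instead of a hard 0/1 label lets the predicted confidence track how well the box is localized.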
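  • 20. Training Losses
    • The total loss is the sum of three losses
      – $L_{rpn}$: region proposal loss
      – $L_{seg}$: keypoint segmentation loss (described under PKW)
      – $L_{rcn}$: refinement loss
    • $L_{rpn}$: region proposal loss
      – $L_{cls}$: focal loss for object classification
      – $L_{smooth\text{-}L1}$: smooth-L1 error between the boxes predicted by the 3D Voxel CNN and the ground truth (to learn anchor box regression)
    • $L_{rcn}$: refinement loss
      – $L_{iou}$: confidence prediction loss
      – $L_{smooth\text{-}L1}$: smooth-L1 error between the refinement predicted by the refinement network and its ground truth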
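A minimal sketch of how the three terms combine, assuming an unweighted sum as the slide states and the usual smooth-L1 definition with a threshold `beta` (the value 1.0 is an assumption, not taken from the slides).

```python
def smooth_l1(diff, beta=1.0):
    """Quadratic near zero, linear for large residuals."""
    a = abs(diff)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def total_loss(l_rpn, l_seg, l_rcn):
    """Overall training loss: plain sum of the three terms."""
    return l_rpn + l_seg + l_rcn
```

The linear tail of smooth-L1 keeps a few badly regressed boxes from dominating the gradient.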
  • 21. Experiment
    • Dataset
      – KITTI
      – Waymo Open
    • Evaluation Metrics
      – KITTI
        • mean Average Precision with 40 recall positions
      – Waymo Open
        • mean Average Precision
        • mean Average Precision weighted by heading
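Average precision over 40 recall positions can be sketched as below, given a precision-recall curve that has already been computed. This only illustrates the averaging step; the official KITTI evaluation also defines how detections are matched to ground truth and filtered by difficulty, which is not reproduced here. The function name is hypothetical.

```python
def average_precision_r40(recalls, precisions):
    """Mean interpolated precision at recall 1/40, 2/40, ..., 1."""
    total = 0.0
    for i in range(1, 41):
        r = i / 40
        # Interpolated precision: best precision at any recall >= r
        candidates = [p for rec, p in zip(recalls, precisions)
                      if rec >= r - 1e-9]
        total += max(candidates) if candidates else 0.0
    return total / 40
```

For example, a detector with precision 1.0 up to recall 0.5 and precision 0.5 at recall 1.0 scores (20 × 1.0 + 20 × 0.5) / 40 = 0.75.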
  • 22. Results in KITTI
    • PV-RCNN outperformed existing methods in most experiments
  • 23. Results in Waymo
    • PV-RCNN outperformed existing methods in all experiments
  • 24. Ablation Studies: Voxel-to-keypoint scene encoding & RoI-grid pooling
    • Confirmed that both Voxel-to-keypoint encoding and RoI-grid pooling are effective
  • 25. Ablation Studies: VSA module
    • Confirmed that detection performance of the VSA module improves when all feature vectors are used
  • 26. Ablation Studies: PKW & RoI-grid pooling module
    • Confirmed that the PKW and RoI-grid pooling modules are effective
      – RoI-aware Pooling, the baseline, extracts features by combining MaxPooling and AvgPooling over the feature vectors of the points inside the RoI
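  • 27. Conclusion
    • Proposes new modules for 3D object detection from point clouds
      – A module that extracts keypoint feature vectors from Voxel CNN feature vectors and PointNet feature vectors
      – A pooling module that extracts RoI feature vectors
    • Using the proposed modules, the method was confirmed to reach state-of-the-art results on 3D object detection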
  • 29. Implementation Details
    • Keypoints Sampling
      – n = 2048 in KITTI
      – n = 4096 in Waymo
    • VSA module
      – two neighboring radii at each level
        • (0.4m, 0.8m), (0.8m, 1.2m), (1.2m, 2.4m), (2.4m, 4.8m)
    • RoI grid pooling operations
      – the neighboring radii of set abstraction for raw points are (0.4m, 0.8m)
  • 30. Dataset Details
    • KITTI
      – detection range
        • [0, 70.4]m for the X axis
        • [−40, 40]m for the Y axis
        • [−3, 1]m for the Z axis
      – voxel size: (0.05m, 0.05m, 0.1m) along the X, Y, Z axes
    • Waymo Open dataset
      – detection range
        • [−75.2, 75.2]m for the X and Y axes
        • [−2, 4]m for the Z axis
      – voxel size: (0.1m, 0.1m, 0.15m)
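The detection ranges and voxel sizes above fix the resolution of the input voxel grid. The short calculation below derives it; the helper name `grid_shape` is an assumption for the example, and `round` is used only to absorb floating-point error in the division.

```python
def grid_shape(range_min, range_max, voxel_size):
    """Number of voxels along each axis for a given range and voxel size."""
    return tuple(round((hi - lo) / s)
                 for lo, hi, s in zip(range_min, range_max, voxel_size))


# Values taken from the slide (X, Y, Z order)
kitti = grid_shape((0.0, -40.0, -3.0), (70.4, 40.0, 1.0), (0.05, 0.05, 0.1))
waymo = grid_shape((-75.2, -75.2, -2.0), (75.2, 75.2, 4.0), (0.1, 0.1, 0.15))
print(kitti)  # (1408, 1600, 40)
print(waymo)  # (1504, 1504, 40)
```

The KITTI grid alone has 1408 × 1600 × 40 = 90,112,000 cells, most of them empty, which is why the backbone uses sparse convolutions.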
  • 31. Training Details
    • optimizer: Adam
      – cosine annealing learning rate
    • KITTI
      – batch size: 24
      – learning rate: 0.01
      – epochs: 80
      – GPU: 8 GTX 1080Ti
      – training time: 5 hours
    • Waymo Open
      – batch size: 64
      – learning rate: 0.01
      – epochs: 50
      – GPU: 32 GTX 1080Ti
      – training time: 25 hours
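A plain cosine annealing schedule can be sketched as follows. The slide does not give warm-up, minimum learning rate, or per-iteration versus per-epoch stepping, so `lr_min = 0` and per-epoch stepping are assumptions of this example, as is the function name.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Learning rate that decays from lr_max to lr_min along a half cosine."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# KITTI setting from the slide: base learning rate 0.01 over 80 epochs
schedule = [cosine_annealing_lr(e, 80, 0.01) for e in range(81)]
```

The rate starts at 0.01, passes through 0.005 at epoch 40, and reaches the minimum at epoch 80.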
  • 32. Training Details 2
    • For the proposal refinement stage, we randomly sample 128 proposals with 1:1 ratio for positive and negative proposals, where a proposal is considered as a positive proposal for box refinement branch if it has at least 0.55 3D IoU with the ground-truth boxes, otherwise it is treated as a negative proposal.
    • Data Augmentation
      – random flipping along the X axis
      – global scaling with a random scaling factor sampled from [0.95, 1.05]
      – global rotation around the Z axis with a random angle sampled from [−π/4, π/4]
      – randomly "paste" some new ground-truth objects from other scenes to the current training scenes, for simulating objects in various environments
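The three global augmentations listed above can be sketched on a list of points as below. Interpreting "flipping along the X axis" as negating the y coordinate and using a 50% flip probability are assumptions of this example. Ground-truth boxes need the same transform, and the ground-truth "paste" augmentation is not shown. Function names are hypothetical.

```python
import math
import random


def sample_augmentation(rng=random):
    """Draw one set of global augmentation parameters."""
    return {
        "flip": rng.random() < 0.5,                        # assumed 50% chance
        "scale": rng.uniform(0.95, 1.05),
        "angle": rng.uniform(-math.pi / 4, math.pi / 4),
    }


def augment_points(points, flip, scale, angle):
    """Apply flip, rotation about the Z axis, and global scaling to (x, y, z) points."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        if flip:
            y = -y  # mirror across the X axis
        xr = x * cos_a - y * sin_a
        yr = x * sin_a + y * cos_a
        out.append((scale * xr, scale * yr, scale * z))
    return out
```

A typical call is `augment_points(points, **sample_augmentation())`, applied once per training scene.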
  • 33. References
    • PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
      – https://arxiv.org/abs/1912.13192v1
    • VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
      – https://arxiv.org/abs/1711.06396
    • SECOND: Sparsely Embedded Convolutional Detection
      – https://www.mdpi.com/1424-8220/18/10/3337/pdf
    • PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
      – https://arxiv.org/pdf/1706.02413.pdf
    • Deep Hough Voting for 3D Object Detection in Point Clouds
      – http://openaccess.thecvf.com/content_ICCV_2019/papers/Qi_Deep_Hough_Voting_for_3D_Object_Detection_in_Point_Clouds_ICCV_2019_paper.pdf

Editor's notes

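  1. Is the index on Ψ missing? The proposed RoI-grid pooling is designed so that the keypoints acquire a good representation. Compared with averaging the feature vectors inside the RoI, or using uninformative values as the feature vector, it can extract a better 3D representation.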