4. 4
Bibliographic information
• title: PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
• authors: Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang,
Hongsheng Li
• institutes: CUHK-SenseTime Joint Laboratory/The Chinese University of Hong Kong, SenseTime
Research, NLPR/CASIA, CSE/CUHK
• publication: CVPR 2020
• paper url: https://arxiv.org/pdf/1912.13192
• code: https://github.com/jhultman/PV-RCNN
Note: Unless otherwise noted, the figures in these slides are taken from the paper.
5. 5
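Related work: Voxel CNN-based methods
• Convert the point cloud into voxels and extract features with a CNN
• Point clouds can be processed efficiently, but voxelizing them so that convolutions can be applied loses information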
VoxelNet: End-to-End Learning for Point Cloud Based 3D
Object Detection
https://arxiv.org/abs/1711.06396
SECOND: Sparsely Embedded Convolutional Detection
https://www.mdpi.com/1424-8220/18/10/3337/pdf
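As a rough illustration of why voxelization is efficient but lossy, the sketch below bins points into a regular grid and keeps a single mean point per occupied voxel. The function name `voxelize` and the toy points are hypothetical, and the plain mean is a simplification: VoxelNet and SECOND use learned per-voxel encoders.

```python
from collections import defaultdict
import math

def voxelize(points, voxel_size):
    """Group (x, y, z) points by voxel index and return the mean point per voxel."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis (floor also handles negative coordinates).
        idx = tuple(math.floor(c / s) for c, s in zip(p, voxel_size))
        buckets[idx].append(p)
    # One centroid per occupied voxel: points sharing a voxel become indistinguishable.
    return {
        idx: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for idx, pts in buckets.items()
    }

# Toy input: the first two points fall into the same voxel and are merged.
points = [(0.01, 0.02, 0.03), (0.04, 0.01, 0.05), (1.00, 1.00, 0.50)]
voxels = voxelize(points, voxel_size=(0.05, 0.05, 0.1))
print(len(points), "points ->", len(voxels), "voxels")
```

The output grid is regular, which is what makes convolution cheap, but the exact positions of the merged points are no longer recoverable.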
6. 6
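Related work: PointNet-based methods
• Feed the raw point cloud directly into the network, without voxelization
• More information can be extracted, but processing becomes more expensive.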
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a
Metric Space
https://arxiv.org/pdf/1706.02413.pdf
Deep Hough Voting for 3D Object Detection in Point Clouds
http://openaccess.thecvf.com/content_ICCV_2019/papers/Qi_Deep_Hough_Voting_for_3D_Object_Detection_in_Point_Clouds_ICCV_2019_paper.pdf
21. 21
Experiment
• Dataset
– KITTI
– Waymo Open
• Evaluation Metrics
– KITTI
• mean Average Precision with 40 recall positions
– Waymo Open
• mean Average Precision
• mean Average Precision weighted by heading
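The KITTI metric above can be sketched as follows, assuming the usual interpolated definition: precision is sampled at the 40 recall positions 1/40, 2/40, ..., 1, and at each position the best precision at any equal or higher recall is used. The function name and the toy precision-recall values are hypothetical and are not results from the paper. The heading-weighted Waymo variant is not covered here.

```python
def average_precision_r40(recalls, precisions):
    """Interpolated AP averaged over recall positions 1/40, 2/40, ..., 1."""
    total = 0.0
    for i in range(1, 41):
        r = i / 40
        # Interpolated precision: best precision at any recall >= r (0 if none).
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r - 1e-9]
        total += max(candidates, default=0.0)
    return total / 40

# Toy precision-recall curve (made-up numbers for illustration only).
recalls = [0.25, 0.5, 0.75, 1.0]
precisions = [1.0, 0.8, 0.6, 0.4]
print(average_precision_r40(recalls, precisions))
```

The mean over object classes (or difficulty levels) of this per-class value gives the mAP.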
29. 29
Implementation Details
• Keypoints Sampling
– n = 2048 in KITTI
– n = 4096 in Waymo
• VSA module
– two neighboring radii of each level
• (0.4m, 0.8m), (0.8m, 1.2m), (1.2m, 2.4m), (2.4m, 4.8m)
• RoI grid pooling operations
– the neighborhood radii of set abstraction for raw points are (0.4m, 0.8m)
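The paper samples its keypoints with furthest point sampling (FPS). A minimal pure-Python sketch of that idea is shown below. The function name, the random toy cloud, and the small `n` are assumptions for illustration; practical implementations run on the GPU.

```python
import random

def farthest_point_sampling(points, n, start=0):
    """Greedily pick n indices so each new point is farthest from those already chosen."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    chosen = [start]
    # Squared distance from every point to its nearest chosen keypoint so far.
    min_d = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < min(n, len(points)):
        nxt = max(range(len(points)), key=min_d.__getitem__)
        chosen.append(nxt)
        min_d = [min(d, sq_dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

random.seed(0)
# Toy cloud drawn uniformly inside the KITTI detection range (hypothetical data).
cloud = [(random.uniform(0, 70.4), random.uniform(-40, 40), random.uniform(-3, 1))
         for _ in range(500)]
# The slides use n = 2048 (KITTI) and n = 4096 (Waymo); 16 keeps the demo fast.
keypoints = farthest_point_sampling(cloud, n=16)
print(len(keypoints), len(set(keypoints)))
```

Because each step picks the point farthest from the current set, the keypoints spread over the whole scene instead of clustering where the LiDAR returns are dense.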
30. 30
Dataset Details
• KITTI
– detection range
• [0, 70.4]m for the X axis
• [−40, 40]m for the Y axis
• [−3, 1]m for the Z axis
– voxel size: (0.05m, 0.05m, 0.1m) along the X, Y, and Z axes
• Waymo Open dataset
– detection range
• [−75.2, 75.2]m for the X and Y axes
• [−2, 4]m for the Z axis
– voxel size: (0.1m, 0.1m, 0.15m) along the X, Y, and Z axes
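The ranges and voxel sizes above determine the resolution of the input voxel grid. The short calculation below works it out; `grid_shape` is a hypothetical helper name, and rounding is used only to absorb floating-point error in the division.

```python
def grid_shape(ranges, voxel_size):
    """Number of voxels along each axis for [min, max] ranges and a per-axis voxel size."""
    return tuple(round((hi - lo) / s) for (lo, hi), s in zip(ranges, voxel_size))

kitti = grid_shape([(0.0, 70.4), (-40.0, 40.0), (-3.0, 1.0)], (0.05, 0.05, 0.1))
waymo = grid_shape([(-75.2, 75.2), (-75.2, 75.2), (-2.0, 4.0)], (0.1, 0.1, 0.15))
print(kitti)  # (1408, 1600, 40)
print(waymo)  # (1504, 1504, 40)
```

Both settings give 40 voxels along Z. The coarser Waymo voxels keep the X and Y grid sizes comparable to KITTI despite the much larger detection range.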
32. 32
Training Details 2
• For the proposal refinement stage, 128 proposals are randomly sampled with a 1:1 ratio of positive
to negative proposals. A proposal is treated as positive for the box refinement branch if it has
at least 0.55 3D IoU with a ground-truth box; otherwise it is treated as negative.
• Data Augmentation
– random flipping along the X axis
– global scaling with a random scaling factor sampled from [0.95, 1.05]
– global rotation around the Z axis with a random angle sampled from [−π/4, π/4]
– randomly “paste” some new ground-truth objects from other scenes to the current training
scenes, for simulating objects in various environments
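The proposal sampling and the first three augmentations above can be sketched as below. Several details are assumptions rather than statements from the slides: the function names, the 50% flip probability, reading "flipping along the X axis" as mirroring y to −y, the (x, y, z, dx, dy, dz, heading) box layout, and filling the batch with negatives when positives are scarce. Ground-truth pasting is omitted because it needs a database of objects from other scenes.

```python
import math
import random

def sample_proposals(ious, num=128, pos_thresh=0.55, rng=random):
    """Pick up to `num` proposal indices, aiming for a 1:1 positive/negative split."""
    pos = [i for i, v in enumerate(ious) if v >= pos_thresh]
    neg = [i for i, v in enumerate(ious) if v < pos_thresh]
    n_pos = min(len(pos), num // 2)
    n_neg = min(len(neg), num - n_pos)  # assumption: fill with negatives if positives are scarce
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)

def augment(points, boxes, rng=random):
    """Apply random flip, global scaling, and global Z rotation to points and boxes.

    points: list of (x, y, z); boxes: list of (x, y, z, dx, dy, dz, heading).
    """
    flip = rng.random() < 0.5  # assumption: flip half of the scenes
    scale = rng.uniform(0.95, 1.05)
    angle = rng.uniform(-math.pi / 4, math.pi / 4)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def transform_xyz(x, y, z):
        if flip:  # mirror across the X axis: y -> -y
            y = -y
        x, y = x * cos_a - y * sin_a, x * sin_a + y * cos_a
        return x * scale, y * scale, z * scale

    new_points = [transform_xyz(*p) for p in points]
    new_boxes = []
    for x, y, z, dx, dy, dz, heading in boxes:
        if flip:
            heading = -heading
        cx, cy, cz = transform_xyz(x, y, z)
        new_boxes.append((cx, cy, cz, dx * scale, dy * scale, dz * scale, heading + angle))
    return new_points, new_boxes

rng = random.Random(0)
ious = [0.9] * 100 + [0.1] * 100  # toy IoU values, not from the paper
picked = sample_proposals(ious, rng=rng)
pts, bxs = augment([(1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0, 2.0, 1.0, 1.0, 0.0)], rng=rng)
print(len(picked), pts[0], bxs[0])
```

Applying the same flip, rotation, and scale to both the points and the boxes keeps the labels aligned with the transformed scene.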
33. 33
References
• PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
– https://arxiv.org/abs/1912.13192v1
• VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
– https://arxiv.org/abs/1711.06396
• SECOND: Sparsely Embedded Convolutional Detection
– https://www.mdpi.com/1424-8220/18/10/3337/pdf
• PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
– https://arxiv.org/pdf/1706.02413.pdf
• Deep Hough Voting for 3D Object Detection in Point Clouds
– http://openaccess.thecvf.com/content_ICCV_2019/papers/Qi_Deep_Hough_Voting_for_3D_Object_Detection_in_Point_Clouds_ICCV_2019_paper.pdf