PhD Thesis Defence
Sunando Sengupta
Oxford Brookes University
Semantic Mapping of Road
Scenes
1
Supervisors – Prof. Philip Torr and Prof. David Duce
16/06/2014
Outline
 Introduction
 The Labelling problem
 Dense Semantic Map (chap. 3)
 Dense 3D Semantic Modelling (chap. 4)
 Mesh Based Inference (chap. 5)
 Hierarchical CRF on an Octree Graph (chap. 6)
 Conclusion
2
Objective
 Holy grail of computer vision
 What are the objects present in the scene
 Where are they located
 Biological vision performs these two activities through human
visual perception.
 Computers (or humans through them) try to solve the same problem through an information processing route.
 Gather sensor data (images, gps, imu,…)
 Represent them into a map
 Recognise objects in the map
 This thesis examines this very problem and proposes solutions towards addressing it.
3
Can happen simultaneously or
sequentially
Chap 1, Sec 1.2
Objective - Visually
 Input image of a street scene, person cleaning, some cars in the
background, and buildings in the horizon.
 Place the appropriate objects at right distance from camera in correct size.
4
Chap 1, Sec 1.2
Image courtesy: Antonio Torralba,
http://6.869.csail.mit.edu/fa13/
Why it is important
5
 Numerous applications from robotics, entertainment,
engineering, medical…
 Self driving cars
 Engineering
 Robots for manipulation
 Humanoids
 Assistive vision for impaired
 Entertainment
 Aim for a vision based system to produce a semantically
consistent scene from visual inputs
Chap 1, Sec 1.2
Essentially a hard problem
6
 Large variation in image formation
 Scene Variation
 Varying scene type and geometry
 Object level variation
 Large number of object classes
 Individual Object location and orientation
 Object shape and appearance
 Depth/occlusions
 Illumination
 Shadows
 Motion blur
Chap 1, Sec 1.2
Thesis - Contributions
7
 This thesis provides solutions for large scale outdoor
urban semantic mapping.
 Large scale Dense overhead semantic mapping.
 Semantic from local images fused to
form a global ground plane map
 First attempt to generate such a map.
 ~15km of semantic mapping
 One of the first large scale semantic maps
 Presented as oral in IEEE IROS 2012
Chap 1, Sec 1.3
Thesis - Contributions
8
 Dense semantic reconstruction
 Dense 3D semantic reconstruction from kms of
stereo images.
 Online sequential volumetric reconstruction to
accommodate arbitrarily long road scenes.
 Presented as oral in IEEE ICRA 2013.
 Mesh based inference for scene labelling
 Improved labelling accuracy and consistency.
 Depth sensitive classifier fusion.
 25x faster in inference time (than image labelling).
 Presented as poster in CVPR 2013.
Chap 1, Sec 1.3
Thesis - Contributions
9
 Hierarchical CRF on an Octree Graph
 Unified framework to determine free and
occupied regions in a scene along with
object class labels.
 Robust P^N potential over octree volumes
 Datasets (available online)
 Yotta labelled dataset: multiview street images (urban, rural,
highway) containing 8000+ images, with object class labellings
 Kitti Labelled dataset: Object class labelling for publicly available
KITTI dataset
Chap 1, Sec 1.3
Publications
10
 Related to Thesis
 S. Sengupta, P. Sturgess, L. Ladicky, P. H. S. Torr: Automatic dense visual semantic mapping from street-
level imagery. IEEE/RSJ IROS 2012 (Chapter 3 )
 S. Sengupta, E. Greveson, A. Shahrokni, P. H.S. Torr: Urban 3D Semantic Modelling Using Stereo Vision, IEEE
ICRA, 2013 (Chapter 4 )
 S. Sengupta*, J. Valentin*, J. Warrell, A. Shahrokni, P. H.S. Torr: Mesh Based Semantic Modelling for Indoor
and Outdoor Scenes, IEEE CVPR, 2013. ( *Joint first authors, Chapter 5.)
 S. Sengupta*, J. Valentin*, J. Warrell, A. Shahrokni, P. H.S. Torr: Mesh Based Semantic Modelling for Indoor
and Outdoor Scenes. SUNw: Scene Understanding Workshop. Held in conjunction with CVPR , 2013.
(*Joint first authors, Invited paper )
 Datasets
 Yotta Labeled road scene dataset.
 KITTI object labelling. (Datasets available at http://www.robots.ox.ac.uk/~tvg/projects )
 Other publications
 Z. Zhang, P. Sturgess, S. Sengupta, N. Crook, P. H.S. Torr: Efficient discriminative learning of parametric
nearest neighbor classifiers, IEEE CVPR, 2012
 L. Ladicky, P. Sturgess, C. Russell, S. Sengupta, Y. Bastanlar, W. F. Clocksin, P. H. S. Torr: Joint Optimization
for Object Class Segmentation and Dense Stereo Reconstruction. IJCV 2012 (Invited paper)
 L. Ladicky, P. Sturgess, C. Russell, S. Sengupta, Y. Bastanlar, W. F. Clocksin, P. H. S. Torr : Joint Optimisation
for Object Class Segmentation and Dense Stereo Reconstruction. BMVC 2010 (BMVA Best science paper )
Chap 1, Sec 1.4
 Multiple computer vision tasks can be modelled as a labelling problem
 Assign each site in a discrete set of sites a label from a label set
 E.g. each pixel is associated with an object class label
The labelling problem
11
Chap 2, Sec 2.1
12
What are the Labels
 Discrete or continuous
 Discrete
 Image pixels assigned to object classes like Cars, humans, buildings, pavement,
trees etc.
 Foreground/background labels
 Indoor/outdoor labels…
 Continuous range
 Depth: Pixels can take a set of disparity labels
 Optical flow
Chap 2, Sec 2.1
13
CRF-Framework
 Set of random variables $X = \{x_1, x_2, \ldots, x_N\}$ corresponding to each pixel, and the label set
 Aim is to associate every random variable with a label
 The conditional probability of the labelling x given the data D,
 Gibbs free energy is given as
 MAP labelling x* of the random field is defined by
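A sketch of the standard CRF formulation that the three bullets above refer to. The partition function $Z$, the clique set $C$ and the clique potentials $\psi_c$ are notation assumed here for illustration; they follow the usual convention in the CRF literature and are not taken from the slide itself.

```latex
\begin{align}
P(\mathbf{x} \mid D) &= \frac{1}{Z} \exp\Big(-\sum_{c \in C} \psi_c(\mathbf{x}_c)\Big) \\
E(\mathbf{x}) &= -\log P(\mathbf{x} \mid D) - \log Z = \sum_{c \in C} \psi_c(\mathbf{x}_c) \\
\mathbf{x}^* &= \arg\max_{\mathbf{x}} P(\mathbf{x} \mid D) = \arg\min_{\mathbf{x}} E(\mathbf{x})
\end{align}
```

Under this convention, maximising the posterior and minimising the energy are the same problem, which is why the later slides speak only of energy minimisation.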
Chap 2, Sec 2.2
14
• The pixel labelling problem can be formulated as a pairwise/higher-order CRF problem whose energy is
• The image is represented as a graph: G = {V,E}
• V is the total set of nodes of the graph
• Ni represents the neighbourhood of the node i
• The unary potential measures the cost of assigning
particular label to the pixel
• Generated using the result of a boosted classifier over a
region about each pixel
CRF modelling for image labelling
Chap 2, Sec 2.2
15
• The pairwise (smoothness) term depends on the inter-pixel observations and should be discontinuity preserving across object boundaries
• Takes Potts form
• where
• Higher order potentials are defined on groups of pixels that are conditionally dependent on each other.
• Robust P^N, Hierarchical P^N models [1]
• Final labelling obtained through minimising the energy E
CRF modelling for image labelling
Chap 2, Sec 2.2
[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for
object class image segmentation,” in ICCV, 2009.
16
Quite hard
 The energy minimization is quite hard (large number of
random variables with interconnections).
 Possible solutions include simulated annealing and ICM, but these are slow.
 Approximate algorithms exist for certain energy functions in the multi-label problem.
 Move-making algorithms [1]
 α-expansion: for each α, allow the random variables to retain their existing label or change to the label α, using graph cuts.
 αβ-swap: considers a pair of labels at each iteration, allowing the pixels currently labelled α or β to swap between these two labels, using graph cuts.
Chap 2, Sec 2.2
[1] Boykov et al. Fast Approximate Energy Minimization via Graph Cuts, ICCV
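To make the minimisation problem concrete, here is a minimal sketch of ICM (iterated conditional modes), one of the simple baselines named on this slide. It is not the graph-cut based α-expansion used in the thesis. The grid layout, the function names `potts_energy` and `icm`, and the single Potts weight are assumptions made for illustration.

```python
def potts_energy(labels, unary, weight):
    """Energy of a labelling: unary costs plus a Potts cost per disagreeing 4-neighbour pair."""
    height, width = len(labels), len(labels[0])
    energy = 0.0
    for r in range(height):
        for c in range(width):
            energy += unary[r][c][labels[r][c]]
            if r + 1 < height and labels[r][c] != labels[r + 1][c]:
                energy += weight
            if c + 1 < width and labels[r][c] != labels[r][c + 1]:
                energy += weight
    return energy


def icm(unary, weight, max_sweeps=10):
    """Greedy coordinate descent: unary[r][c][l] is the cost of label l at pixel (r, c)."""
    height, width, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Initialise every pixel with its cheapest unary label.
    labels = [[min(range(n_labels), key=lambda l: unary[r][c][l])
               for c in range(width)] for r in range(height)]
    for _ in range(max_sweeps):
        changed = False
        for r in range(height):
            for c in range(width):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < height and 0 <= cc < width]

                def local_cost(l):
                    # Unary cost plus a Potts penalty for each disagreeing neighbour.
                    return unary[r][c][l] + weight * sum(n != l for n in neighbours)

                best = min(range(n_labels), key=local_cost)
                # Only accept strict improvements so the energy never increases.
                if local_cost(best) < local_cost(labels[r][c]):
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels
```

ICM only changes one variable at a time and so stops in poor local minima on hard instances; move-making algorithms change many variables per step, which is the motivation for α-expansion.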
Stereo
 Early attempts to explain depth perception began in the Renaissance
 Essentially the images subtended at the left and right eyes can
be used to obtain a disparity/depth map
17
Stereo sketch by Jacopo Chimenti da Empoli,
Italy , around 1600 AD
Leonardo da Vinci, Optical Studies
on Binocular vision
Chap 2, Sec 2.3
Depth from Sequence of images
18
 Structure from motion for sparse 3D reconstruction [1]
 Visual hull/silhouette based volume carving [2]
 Elevation/height/2.5D maps [3]
 TSDF/voxel based fusion [4]
Chap 2, Sec 2.3
[1] Sameer Agarwal et al. Building Rome in a day. Commun. ACM, 2011.
[2] Y. Furukawa et al. Carved visual hulls for image-based modeling. IJCV, 2009.
[3] Friedrich Erbs et al. Stixmentation - probabilistic stixel based traffic scene labeling. BMVC 2012.
[4] Richard A. Newcombe et al. KinectFusion: Real-time dense surface mapping and tracking. In IEEE ISMAR 2011.
Dense Semantic Mapping
 Generate an overhead view of an urban region.
 Every pixel in the map view is associated with an object class label
Object classes: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post
19
Chap 3, Sec 3.1
 Street images captured inexpensively from a vehicle with multiple mounted cameras [1].
[1] Yotta. DCL, “Yotta dcl case studies,” Available: http://www.yottadcl.com/surveys/case-studies/
20
Dense Semantic Mapping
Semantic Mapping Framework
 The semantic mapping framework comprises two stages:
 Semantic image segmentation at street level.
 Ground plane labelling at a global level.
 First attempt at overhead mapping from street-level images.
[Figure: street-level image acquisition → image segmentation → ground plane labelling]
Chap 3, Sec 3.3
Street-level Image Segmentation
 Label every pixel in the image with an object class label
[Figure: raw input image → automatic labeller → labelled output image, coloured by object class label]
Chap 3, Sec 3.3.1
Street-level Image Segmentation
25
 CRF based image labeller
 Each pixel is a node in a grid graph G = (V,E).
 Each node is a random variable x taking a label from
label set.
[Figure: input image → CRF construction → final segmentation]
Semantic Image Segmentation - CRF
26
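 Total energy
$$E(\mathbf{x}) = \underbrace{\sum_{i \in V} \psi_i(x_i)}_{E_{pix}} + \underbrace{\sum_{i \in V,\, j \in N_i} \psi_{ij}(x_i, x_j)}_{E_{pair}} + \underbrace{\sum_{c \in C} \psi_c(\mathbf{x}_c)}_{E_{region}}$$
 Optimal labelling given as
$$\mathbf{x}^* = \arg\min_{\mathbf{x}} E(\mathbf{x})$$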
 Total energy E = Epix + Epair + Eregion
 Epix - Model individual pixel’s cost of taking a label.
 Computed via the dense boosting approach
 Multi feature variant of texton boost[1]
Semantic Image Segmentation - CRF
27
[Figure: example label costs for a pixel x: Car 0.2, Road 0.3]
[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for
object class image segmentation,” in ICCV, 2009.
 Total energy E = Epix + Epair + Eregion
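 Epair - Models each pixel's neighbourhood interactions.
 Encourages label consistency in adjacent pixels
 Sensitive to edges in images.
 Contrast sensitive Potts model
[Figure: Potts cost for a pixel pair (xi, xj): 0 when both take the same label (e.g. Car, Car), g(i,j) when the labels differ (e.g. Car, Road)]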
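A minimal sketch of a contrast sensitive Potts cost. The slide does not give the form of g(i,j); the version below assumes the commonly used form g(i,j) = θ_p + θ_v · exp(−θ_β · ‖I_i − I_j‖²), and the parameter names and default values are hypothetical.

```python
import math


def contrast_sensitive_potts(label_i, label_j, colour_i, colour_j,
                             theta_p=1.0, theta_v=4.0, theta_beta=0.01):
    """Pairwise cost for two neighbouring pixels.

    Returns 0 when the labels agree. When they differ, the cost g(i, j) is
    high for similar colours and lower across a strong image edge.
    The theta_* values are illustrative assumptions, not thesis settings.
    """
    if label_i == label_j:
        return 0.0
    squared_diff = sum((a - b) ** 2 for a, b in zip(colour_i, colour_j))
    return theta_p + theta_v * math.exp(-theta_beta * squared_diff)
```

Because the penalty shrinks where neighbouring colours differ strongly, label changes are cheapest along image edges, which is what makes the smoothness term discontinuity preserving.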
Semantic Image Segmentation - CRF
28
 Total energy E = Epix + Epair + Eregion
 Eregion - Model behaviour of a group of pixels.
 Classify a region
 Encourages all the pixels in a region
to take the same label.
 Group of pixels given by multiple meanshift segmentations
Semantic Image Segmentation - CRF
29
Semantic Image Segmentation - CRF
 Energy minimisation using the alpha-expansion algorithm [1]
[Figure sequence: input image with successive Road, Building, Sky and Pavement expansions, followed by the final solution]
[1] Yuri Boykov et al. Fast Approximate Energy Minimization via Graph Cuts. ICCV 99
Ground Plane Labelling
 Combine many labellings from street level imagery.
[Figure: street-level labellings (input) → automatic labeller → labelled ground plane (output)]
Ground Plane CRF
 A CRF defined over the ground plane.
 Each ground plane pixel (zi) is a random variable taking a
label from the label set.
 Energy for ground plane CRF is
$$E^g(Z) = E^g_{pix} + E^g_{pair}$$
Chap 3, Sec 3.3.2
Ground Plane Pixel Cost
 We assume a flat world.
 A ground plane region is estimated.
• Each point in the image projects to a unique point on the ground plane, creating a homography.
• The image labelling is mapped to the ground plane via the homography.
• Labels projected from many views are combined in a histogram.
• The normalised histogram gives the naïve probability of the ground plane pixel taking a label.
[Figure: homography between the image and the ground plane (labels K, X, Z); projected Road, Pavement and Post/Pole labels accumulate into ground plane pixel histograms]
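A minimal sketch of the histogram step described above, assuming the image labels have already been projected onto ground plane cells through the homography. The function names, the cell indexing and the use of a negative log of the normalised histogram as the unary cost are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def accumulate_votes(projected_labels):
    """Build one label histogram per ground plane cell.

    projected_labels: iterable of (cell, label) pairs gathered from all views.
    """
    histograms = defaultdict(Counter)
    for cell, label in projected_labels:
        histograms[cell][label] += 1
    return histograms


def unary_costs(histograms, label_set, eps=1e-6):
    """Turn normalised histograms into per-label costs (assumed here to be -log probability)."""
    costs = {}
    for cell, counts in histograms.items():
        total = sum(counts.values())
        costs[cell] = {label: -math.log(max(counts[label] / total, eps))
                       for label in label_set}
    return costs
```

A label that many views agree on receives a low cost, while a label never observed at a cell receives a large but finite cost because of the `eps` floor.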
Ground Plane labelling
 Histogram is built for every ground plane pixel, giving $E^g_{pix}$
 Pairwise cost ($E^g_{pair}$) added to induce smoothness
 Contrast sensitive Potts model
Ground Plane labelling
 Final CRF solution obtained using alpha expansion.
[Figure sequence: map initialised to Void, followed by Road, Building and Pavement expansions, then the final solution]
Experiments - Dataset
 Subset of the images captured by the van
 ~15 km of track, 8000 images from each camera.
 Pixel-level labelled ground truth images. Dataset
available[1].
 13 object categories: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post
 Training - 44 images, testing - 42 images.
[1] http://www.robots.ox.ac.uk/~tvg/projects/SemanticMap/index.php
49
Chap 3, Sec 3.4.1
SIS Results
 Input Images, output of our image level CRF, ground truths.
 Used Automatic Labelling environment[1]
[1] The Automatic Labelling Environment, L Ladicky, PHS Torr. Code available
http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm
[Figure: input images, semantic segmentation and ground truth]
Semantic Map Results
51
Semantic map of Pembroke city
Chap 3, Sec 3.4.2
Ground plane Map Evaluation
• We back-project the ground plane map into the image domain and evaluate the results.
• Global pixel accuracy of 83%
[Figure: street images, back-projected map results and ground truth]
Results - video
53
Chapter Summary
 Presented a method to generate overhead-view semantic maps.
 Experiments on large tracks (~15km), which can be scaled up to country-wide mapping
 Dataset available[1].
 However a flat world assumption
does not represent the 3D scene
properly – our aim is to perform a
semantic metric reconstruction of
the world.
[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php
54
Urban 3D Semantic Modelling Using Stereo Vision
55
[Figure: input stereo image sequence → dense 3D semantic model]
 Given a sequence of stereo images we generate a
dense 3D, semantic model
Chap 4, Sec 4.1
Pipeline – Semantic Reconstruction
 Stereo images
 Camera pose estimation and individual depth map generation
 Surface reconstruction
 Semantic labelling of street view images
 Semantic model generation
Chap 4, Sec 4.3
Camera Estimation
61
 Feature tracking using left-right pair and consecutive
frames
Chap 4, Sec 4.3.1
Camera Estimation
 Use the feature tracks to
estimate camera poses.
 Use bundle adjustment
[a] Andreas Geiger et al. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. CVPR 2012
62
Bundle Results
63
 Bundler results after 10, 20, 30 and 40 frames
Sparse Reconstructions
64
 But our target is to
obtain a large scale
dense 3D world
representation.
Depth-Map Estimation
 Semiglobal block matching[1] for disparity estimation
 Per-pixel depth computed as z = B × f / d, where B is the baseline, f the focal length and d the pixel disparity
[1] H. Hirschmueller, Stereo Processing by Semi-Global Matching and Mutual Information. PAMI 2008.
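The depth formula can be sketched directly in code. The function name and the handling of invalid disparities are assumptions; the numbers in the usage note are illustrative values, not calibration data from the thesis.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Depth z = B * f / d for a rectified stereo pair.

    disparity_px: disparity d in pixels
    baseline_m:   baseline B in metres
    focal_px:     focal length f in pixels
    Returns None for non-positive disparities, which carry no depth estimate.
    """
    if disparity_px <= 0:
        return None
    return baseline_m * focal_px / disparity_px
```

For example, with an assumed baseline of 0.54 m and focal length of 720 px, a disparity of 36 px corresponds to a depth of about 10.8 m. Depth resolution degrades with distance because distant points have small disparities.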
65
Depth Fusion
 Depth estimates are fused using
camera poses.
 Fused into truncated signed
distance (TSDF) volumetric
representation[1].
 Surface mesh generated through the marching tetrahedra algorithm.
[1] Brian Curless and Marc Levoy, A Volumetric Method for Building
Complex Models from Range Images Siggraph 96.
Chap 4, Sec 4.3.2
66
Depth fusion using TSDF Volume [1]
 The entire space is divided into a grid of voxels.
 For each voxel, compute the truncated signed distance to the surface:
 positive, increasing with distance, when the voxel lies in free space
 negative when it lies behind the surface
 zero when it lies on the surface
 Performed for all depth maps.
[1] Brian Curless and Marc Levoy, A Volumetric Method for Building
Complex Models from Range Images Siggraph 96.
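A minimal sketch of the per-voxel TSDF computation and its fusion across depth maps, following the sign convention listed above. The running weighted average is the fusion rule of the cited Curless and Levoy method; the function names, the scaling to [-1, 1] and the unit weights are assumptions made for illustration.

```python
def truncated_sdf(surface_depth, voxel_depth, truncation):
    """Signed distance of a voxel along a camera ray, scaled and clamped to [-1, 1].

    Positive in free space (voxel in front of the surface), zero on the
    surface, negative behind it.
    """
    sdf = (surface_depth - voxel_depth) / truncation
    return max(-1.0, min(1.0, sdf))


def fuse(tsdf, weight, new_value, new_weight=1.0):
    """Running weighted average of TSDF observations for one voxel."""
    total = weight + new_weight
    return (tsdf * weight + new_value * new_weight) / total, total
```

Each new depth map contributes one observation per visible voxel, so noise in individual depth maps averages out as more views are fused. The surface is then the zero crossing of the fused values.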
67
TSDF Volume
-1 -.8 -.3 .2 .8 1 1 1
-1 -.9 -.4 .1 .5 1 1 1
-1 -1 -.8 -.2 .1 1 1 1
-1 -1 -.9 -.3 .2 .8 1 1
-1 -1 -.9 -.4 .3 .9 1 1
-1 -1 -.8 -.3 .3 .9 1 1
-1 -1 -.9 -.5 .2 .8 1 1
-1 -1 -.6 .1 .7 1 1 1
[Figure labels: camera, TSDF volume, actual surface]
69
Fusing multiple depth maps
70
 Increased number of depth maps results in smooth
surface generation
Chap 4, Sec 4.3.2
Incremental Volume Update
 Road scenes are generally described through arbitrarily long image sequences.
 A 3x3x1 volume of voxel grids is initialised.
 Volumes are added incrementally as the vehicle moves out of the region.
 This allows arbitrarily long sequences to be mapped, which is important for outdoor scenes.
[Figure: vehicle path, ~1km]
Large scale dense reconstruction
73
 Textured reconstruction.
Semantic Model Generation
 We use a conditional random field (CRF) framework
74
• Each pixel is a node in a grid graph G = (V,E) having a random
variable x taking a label from label set.
• Total energy E = Epix + Epair + Eregion
• Epix - Model individual pixel’s cost of taking a label.
[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for
object class image segmentation,” in ICCV, 2009.
[Figure: input image → CRF construction [1] → image segmentation]
Chap 4, Sec 4.4.1
[Figure: example label costs for a pixel x: Fence 0.2, Road 0.3]
Semantic Image Segmentation
 Epair - Models each pixel's neighbourhood interactions.
 Encourages label consistency in adjacent pixels and is sensitive to edges.
 Contrast sensitive Potts model
 Both colour and depth images are used
 Eregion - Models behaviour of a group of pixels
 Groupings through superpixels
[Figure: Potts cost for a pixel pair (xi, xj): 0 when both take the same label (Fence, Fence or Road, Road), g(i,j) when the labels differ]
Semantic Image Segmentation - Results
 Input Images, output of our image level CRF, ground
truths.
76
Mesh Face Labelling
 A histogram of labels is
built for each mesh face
(Zf ), by projecting the
points from the face into
labelled images.
 Majority label is
considered as the label of
the face.
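The majority vote over a face's label histogram can be sketched as follows, assuming the labels seen by the face's projected points have already been collected. The function name and the tie-breaking behaviour (first label encountered among the most frequent) are assumptions.

```python
from collections import Counter


def face_label(projected_labels):
    """Return the most frequent label among those collected for one mesh face.

    projected_labels: labels read from the labelled images at the pixels
    that the face's points project to. Returns None if the face was never seen.
    """
    if not projected_labels:
        return None
    return Counter(projected_labels).most_common(1)[0][0]
```

Faces that are not visible in any labelled image receive no label here; how such faces are treated is left open.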
Chap 4, Sec 4.4.2
77
Semantic Model
Top: Left – Surface reconstruction, Right – Semantic model
Bottom: Left - input image, Right- object label set
78
Evaluation
 KITTI Object Labelled Datasets: Manually labelled images for object
class training (available for download). [1]
 The Model is projected back using the estimated camera poses to
create labelled images.
 The points in the model far away from the camera are ignored in
the projection.
[1] http://www.robots.ox.ac.uk/~tvg/projects/SemanticUrbanModelling/index.php
Chap 4, Sec 4.5
79
Evaluation
 Metrics
 Recall = tp/(tp+fn)
 Intersection over Union = tp/(tp+fn+fp)
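Both metrics can be computed per class from flattened predicted and ground truth label lists. The function name and the convention of returning 0.0 when a class is absent are assumptions made for illustration.

```python
def class_metrics(predicted, truth, label):
    """Per-class recall tp/(tp+fn) and intersection over union tp/(tp+fn+fp)."""
    tp = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p == label and t == label:
            tp += 1          # correctly labelled pixel of this class
        elif p == label:
            fp += 1          # labelled as this class, but belongs to another
        elif t == label:
            fn += 1          # pixel of this class that was missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fn + fp) if tp + fn + fp else 0.0
    return recall, iou
```

Intersection over union also penalises false positives, so it is never higher than recall for the same class.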
80
Video
Long Sequence
82
 1km dense reconstruction overlaid on a google map.
Path of the vehicle.
Chapter Conclusion
 Large scale dense semantic reconstruction
 Sequential volume update for accommodating long sequences
 Labelled dataset released.
 Labelling is performed at image level, which results in semantic inconsistency, redundant labelling and a slow overall inference process.
 Object layout in the scene helps in labelling
83
Chapter 5 - Mesh Based Scene Labelling
84
 Motivation
 Redundancy: individual street level image labelling requires processing 0.5 million pixels per image (a scene of 100-150 images ~ 75 million pixels), which is slow
 Inconsistency in labelling
 Utilizing structure through mesh connectivity.
 Solution: Perform labelling on mesh
Chap 5, Sec 5.1
Mesh labelling Framework
85
 Depth maps fused into mesh.
 Every mesh location associated
with set of image pixels across a
set of images.
 Obtain a combined appearance
score from these pixels through
a depth sensitive fusion of
scores.
 Define CRF on mesh and perform inference on the structure.
[Figure: mesh based labelling framework]
CRF over Scene Mesh
86
 We use a conditional random field (CRF) framework defined over the mesh locations.
• Each mesh vertex is a node in a graph G = (V,E), where E is
defined according to mesh neighbourhood.
• Each node is a random variable x taking a label from label set.
Chap 5, Sec 5.3
Unary Score
87
 Total energy
 Pixel class-wise classifier score given as , which are
combined as:
 ‘f’ can take ‘max’, ‘average’ or ‘weighted’.
 ‘weighted’ – weigh inversely the class scores by 3D distance of
the pixel from respective camera centre.
[Figure: mesh vertex xi registered to its image pixel set from K images]
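A minimal sketch of the three fusion choices for f named above. The slide's equations are not reproduced here, so the data layout, the function name and the normalisation of the inverse-distance weights are assumptions made for illustration.

```python
def fuse_scores(pixel_scores, distances, mode="weighted"):
    """Combine per-class classifier scores from the pixels registered to one vertex.

    pixel_scores: one list of class scores per registered pixel
    distances:    3D distance of each pixel's point from its camera centre
    mode:         'max', 'average' or 'weighted' (inverse-distance weighting)
    """
    n_classes = len(pixel_scores[0])
    if mode == "max":
        return [max(s[k] for s in pixel_scores) for k in range(n_classes)]
    if mode == "average":
        return [sum(s[k] for s in pixel_scores) / len(pixel_scores)
                for k in range(n_classes)]
    if mode == "weighted":
        # Closer observations get larger weights; weights are normalised to sum to 1.
        weights = [1.0 / d for d in distances]
        total = sum(weights)
        return [sum(w * s[k] for w, s in zip(weights, pixel_scores)) / total
                for k in range(n_classes)]
    raise ValueError("mode must be 'max', 'average' or 'weighted'")
```

The weighted mode trusts nearby views more, on the reasoning that classifier responses are usually more reliable where the surface is seen at higher resolution.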
Chap 5, Sec 5.3.1
 Pairwise term defined on the mesh connectivity.
 Takes the Potts form, with the cost for differing labels depending on Zi and Zj, the 3D locations of mesh vertices i and j.
 Thus mesh locations close to each other are encouraged to take the same label.
Pairwise
88
Experiments and results
89
 Mesh segmentation
with the corresponding
images of the scene
Chap 5, Sec 5.4
Results - video
90
Evaluation
91
 Created ground truth mesh for evaluation [1].
[1] http://www.robots.ox.ac.uk/~tvg/projects/
Observations
92
 Improved accuracy for mesh based inference over image
based labelling and projecting the labels
 The pairwise connection respecting mesh connectivity
improves labelling
[Figure: image, ground truth, unary only, unary + pairwise]
Timing performance
93
 Labelling over the mesh improves performance in the inference stage.
 Scene of 150 images of resolution 1281x376 ≈ 75 million pixels
 Mesh: 704K vertices and 1.27 million faces
 25x speedup in inference at our operating point
 Further speedup is possible by computing the classifier response only for pixels registered to the mesh.
Inference Time with varying mesh size
94
 Mesh created for the same scene with finer granularity.
 Note: a ground truth mesh is generated for each granularity
 Finer mesh granularity produces smaller mesh faces and affects the pairwise cost
Accuracy with varying mesh granularity
95
Scene editing
96
 Labelling in 3D structure can help to categorize the 3D
regions.
 Some active scene editing ,e.g. vehicle moving on the
road.
Chap 5, Sec 5.4
Scene edit - dynamic
97
Chapter Conclusions
98
 Present a mesh based inference for scene labelling.
 Inference on mesh provides an accurate and faster approach
towards scene labelling.
 Presented a classifier score combination method which
improves accuracy.
 Up to 25x faster in inference stage for outdoor scenes.
 Applications – scene editing can be performed once scene is
labelled.
 However the mesh representation is limiting for various
robotic tasks, which we try to overcome in next chapter.
Chapter 6 - Hierarchical CRF on an Octree Graph
99
 Computer vision – scene recognition has been studied extensively.
 Robotics – efficient/accurate 3D representation of the scene for various robotic tasks, but little work on understanding semantics.
 Aim - bring the two together towards recognition in an efficient representation, and present a method which
 Jointly performs recognition and infers occupancy.
 Uses hierarchical constraints to perform scene labelling
 Uses an efficient 3D representation for determining occupied, free and unknown areas.
Chap 6, Sec 6.1
Good 3D representation
100
 Why
 Needed for further processing tasks
 Robotics domain – mapping, grasping/manipulation, navigation
 Graphics domain – efficient rendering over graphics processing unit and
visualization
 What
 Should map accurately
 Occupied: Objects present in the world,
 Free: required for collision avoidance, path planning.
 Unmapped: unknown areas in the scene need to be avoided.
 Efficiency: Any 3D volume requires to be identified as
free/occupied/unmapped efficiently.
Existing 3d representation
101
 Storing 3D measurements from sensors as point clouds: cannot map free and unknown areas
 Mesh: same limitations as point clouds
 Stixels/Height maps/2.5D: one height value per cell in a 2D grid, but free area is not accurately mapped
 Fixed-size grid of voxels: voxels are not indexed, which makes it inefficient
 Octree-based volumetric representation: introduced more than three decades ago, accurately represents 3D space, with efficient indexing of volume
Octomap - representation
102
 Octree representation
 Every voxel/volume is divided into 8 subvolumes, allowing fast indexing of voxels
 Advantageous in comparison to point clouds, surface maps, elevation/2.5D representations
 Used widely across computer science
 Hardware friendly (CPU, GPU, FPGA)
 OctoMap [a] proposed in 2013
 Probabilistic representation of occupied, free and unknown regions
 Based on an octree-based 3D representation
 Demonstrated to map large areas through fusion of depth estimates.
[a] Armin Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots, 2013.
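To illustrate why the 8-way subdivision gives fast indexing, here is a minimal sketch that finds the leaf voxel containing a point by descending the tree, choosing one of eight children per level. It is an illustration only and not the OctoMap API; the function name, the cubic root volume and the bit layout of the child index are assumptions.

```python
def octree_path(point, origin, root_size, depth):
    """Return the child index (0-7) chosen at each level down to the leaf containing `point`.

    origin:    minimum corner of the cubic root volume
    root_size: edge length of the root volume
    depth:     number of subdivision levels
    Bit 0 of each index encodes the x half, bit 1 the y half, bit 2 the z half.
    """
    x, y, z = (p - o for p, o in zip(point, origin))
    if not all(0 <= v < root_size for v in (x, y, z)):
        raise ValueError("point lies outside the root volume")
    path = []
    size = root_size
    for _ in range(depth):
        size /= 2.0
        index = 0
        if x >= size:
            index |= 1
            x -= size
        if y >= size:
            index |= 2
            y -= size
        if z >= size:
            index |= 4
            z -= size
        path.append(index)
    return path
```

A lookup costs one comparison per axis per level, so it scales with the tree depth rather than with the number of voxels. Every prefix of the path names an internal node, which is what later provides a natural grouping of leaf voxels.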
Multi-resolution approaches in Computer vision
103
 Multi-resolution approaches are used for recognition, classification and detection
 Information at pixel level, pairs of pixels or groups of pixels is combined together
 Robust P^N model [1] - penalises label inconsistency over a group of pixels.
 Grouping determined through unsupervised image segmentation
 Here we extend the multi-resolution image-based classification approach to a 3D volume indexed through an octree
[1] P. Kohli et al. Robust Higher Order Potentials for Enforcing Label Consistency
Semantic Octree - framework
 Input stereo images
 Generate point clouds and class hypotheses for every pixel
 Fuse into an octree using the estimated camera poses
 Octree – each volume subdivided into 8 sub-volumes
 Leaf nodes (xi) are the smallest sized voxels
 Any internal node (xc) gives a natural grouping of 3D space
 Perform inference over 3D voxels to give a labelled scene.
Chap 6, Sec 6.3
CRF graph on Octree voxels
 Octree divides the space into subvolumes indexed through tree
with nodes
 τint : Internal nodes in the tree (xc)
 τleaf : leaf level voxels (xi)
 Random variable for every leaf voxel
 Every internal node is associated with a set of leaf voxels
resulting in a clique
 Label set defined as
 Final energy :
108
Chap 6, Sec 6.3
 Octree Volume update
 All voxels are initially set to unknown, with occupancy probability P(xi) = 0.5 and the corresponding log odds
 For each 3D point (obtained from stereo pairs), the voxels' log odds are updated in a ray casting manner
 Log odds are updated for all 3D points from every stereo pair
 Final occupancy probability obtained as
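A minimal sketch of a log-odds occupancy update of the kind described above, in the style of the cited OctoMap framework. The conversion formulas L = log(P / (1 − P)) and P = 1 − 1 / (1 + exp(L)) are the standard log-odds definitions. The function names, the dictionary storage and the increments `l_free` and `l_occ` are assumptions for illustration, and the ray traversal that produces `ray_voxels` is not shown.

```python
import math


def log_odds(p):
    """Log odds L = log(p / (1 - p)); P = 0.5 (unknown) maps to L = 0."""
    return math.log(p / (1.0 - p))


def probability(l):
    """Inverse mapping P = 1 - 1 / (1 + exp(L))."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))


def update_ray(voxel_log_odds, ray_voxels, hit_voxel, l_free=-0.4, l_occ=0.85):
    """Update log odds along one ray cast from the camera to a 3D point.

    ray_voxels: voxels the ray passes through before the end point (observed free)
    hit_voxel:  voxel containing the 3D point (observed occupied)
    Voxels missing from the dictionary are unknown and start at log odds 0.
    """
    for voxel in ray_voxels:
        voxel_log_odds[voxel] = voxel_log_odds.get(voxel, 0.0) + l_free
    voxel_log_odds[hit_voxel] = voxel_log_odds.get(hit_voxel, 0.0) + l_occ
```

Working in log odds turns each measurement into an addition, so repeated observations from many stereo pairs accumulate cheaply before the final conversion back to a probability.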
Unary score for leaf voxels
109
Chap 6, Sec 6.3.1
Unary score for leaf voxels
 Each occupied voxel xi is associated with a set of 3D pts
 The corresponding image pixels denoted as
 Pixel scores combined together
 Given the initial occupancy P(xi), the unary is given as:
 Thus, every voxel initially estimated as occupied has a low cost for the free label and vice versa
110
Chap 6, Sec 6.3.1
Hierarchical tree potential
 Robust P^N potential applied over hierarchical groupings of voxels
 Penalises label inconsistency within the grouping of voxels
 Takes the form
 Maximum cost truncated to γ_max
 Groupings of voxels correspond to internal nodes in the octree
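A minimal sketch of a Robust P^N style potential over one grouping of voxels, assuming the form given in the cited Kohli et al. work: the cost grows linearly with the number of voxels that disagree with the best label and is truncated at γ_max. The function name, the per-label offsets `gamma` and the truncation parameter `q` are assumptions made for illustration; the thesis's exact parameterisation may differ.

```python
from collections import Counter


def robust_pn(clique_labels, label_set, gamma_max, q, gamma=None):
    """Higher-order cost for the voxels grouped under one internal octree node.

    clique_labels: current labels of the leaf voxels in the grouping
    gamma_max:     maximum (truncated) cost
    q:             number of disagreeing voxels at which the cost saturates
    gamma:         optional per-label base cost (defaults to 0 for every label)
    """
    gamma = gamma or {label: 0.0 for label in label_set}
    counts = Counter(clique_labels)
    n = len(clique_labels)
    cost = gamma_max
    for label in label_set:
        slope = (gamma_max - gamma[label]) / q
        disagreeing = n - counts[label]
        cost = min(cost, gamma[label] + slope * disagreeing)
    return cost
```

Unlike a strict consistency constraint, this form lets a few voxels in a group take a different label at a small cost, which matters when an octree cell straddles an object boundary.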
111
Chap 6, Sec 6.3.2
Experiments
112
 Octree defined with 16 levels
 Smallest voxel resolution = (8 x 8 x 8) cm^3
 Maximum mapped volume = (2^16 x 8 cm)^3 ≈ (5.24 km)^3
 Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 considered
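As a worked check of the mapped extent, assuming each of the 16 levels halves the edge length down to the 8 cm leaf voxel:

```latex
\begin{align}
2^{16} \times 8\,\text{cm} &= 65536 \times 0.08\,\text{m} = 5242.88\,\text{m} \approx 5.24\,\text{km} \\
(5.24\,\text{km})^3 &\approx 144\,\text{km}^3
\end{align}
```

So the figure of 5.24 km is the edge length of the maximum mapped cube, not its volume.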
Results
113
 Hierarchical grouping during inference vs leaf-level voxel labelling (much sparser)
Chap 6, Sec 6.4
 Quantitative evaluation :
 Performed by projecting into image domain
 Observations
 Small objects tend to get decimated due to octree quantization hence reduced
accuracy
 Mesh based representation better in representing surface.
 Non-uniform Grouping of volumes (k-d tree) can be used to improve results
Results
114
Occupancy mapping
115
 Grouping of voxels hierarchically increases the occupied
volume reducing the sparsity
Chapter Conclusion
116
 Proposed a method to jointly infer object class labels and occupancy mapping
 Efficient representation of 3D space for further operations like navigation and manipulation
 Octree introduces a quantization error, which could be addressed by grouping volumes through a k-d tree
Thesis - Conclusions
117
 This thesis covered the aspects of scene understanding
and proposed solutions for dense semantic mapping and
reconstruction
 Chapter 3 – Large scale Dense semantic mapping
 Overhead semantic view of an urban
region
 Experiments to generate ~15km map
 One of the first large scale semantic maps
 Presented as oral in IEEE IROS 2012
Chap 7, Sec 7.1
Thesis - Conclusions
118
 Chapter 4 – Dense semantic reconstruction
 Dense semantic reconstruction from kms of
stereo images.
 Online volumetric reconstruction to
accommodate arbitrarily long road scenes.
 Presented as oral in IEEE ICRA 2013
 Chapter 5 – Mesh based inference for scene labelling
 Improved labelling accuracy (pairwise connections
respect mesh connectivity) and consistency.
 Depth sensitive classifier fusion.
 25x faster in inference time
 Presented as poster in CVPR 2013
Conclusions
119
 Chapter 6 – Hierarchical CRF on an Octree Graph
 Unified framework to determine 3D volume occupancy along with object class labels in the scene.
 Efficient representation
 Robust P^N potential over octree volumes
 Datasets (available publicly)
 Yotta labelled dataset: multiview street images (urban, rural,
highway) containing 8000+ images, with object class labellings
 Kitti Labelled dataset: Object class labelling for publicly available
KITTI dataset
Way forward
120
 Transfer learning – there are many datasets with many different labellings. We should aim to learn from multiple sources and apply this to test cases.
 Lifelong learning – an agent needs to identify objects irrespective of changes in the environment
 Exploit high-level attributes
 Investigate an end-to-end real-time pipeline for dense recognition and reconstruction
 Exploit scene dynamics – DVS (dynamic vision sensors) report only the pixels that change.
Chap 7, sec 7.2
Thank you
 Acknowledgements
 Supervisors: Philip Torr and David Duce
 Thesis Examiners: Gabriel Brostow and Nigel Crook
 Collaborators: Paul Sturgess, Lubor Ladicky, Ali Shahrokni, Eric
Greeveson, Julien Valentin, Ziming Zhang, Johnathan Warrell, Chris
Russell, Yalin Bastanlar, William Clocksin, Vibhav Vineet, Mike Sapi.
References
 Lubor Ladicky et al. Associative hierarchical CRFs for object class image
segmentation. ICCV 2009, PAMI 13
 Pushmeet Kohli et al. Robust higher order potentials for enforcing label
consistency. IJCV 09
 Paul Sturgess et al. Combining appearance and structure from motion
features for road scene understanding. BMVC 09
 Lubor Ladicky et al. Joint optimisation for object class segmentation and
dense stereo reconstruction. BMVC 2010, IJCV 12
 Richard A. Newcombe et al. KinectFusion: Real-time dense surface mapping
and tracking. IEEE ISMAR 2011
Semantic Mapping of Road Scenes

  • 1. PhD Thesis Defence. Sunando Sengupta, Oxford Brookes University. Semantic Mapping of Road Scenes. Supervisors: Prof. Philip Torr and Prof. David Duce. 16/06/2014
  • 2. Outline • Introduction • The Labelling problem • Dense Semantic Map (chap. 3) • Dense 3D Semantic Modelling (chap. 4) • Mesh Based Inference (chap. 5) • Hierarchical CRF on an Octree Graph (chap. 6) • Conclusion
  • 3. Objective • Holy grail of computer vision • What objects are present in the scene • Where they are located • Biological vision performs these two activities through human visual perception. • Computers (or humans through them) try to solve the same problem through an information processing route. • Gather sensor data (images, GPS, IMU, …) • Represent them in a map • Recognise objects in the map • Can happen simultaneously or sequentially • This thesis examines this problem and proposes solutions towards addressing it. Chap 1, Sec 1.2
  • 4. Objective - Visually • Input image of a street scene: a person cleaning, some cars in the background, and buildings on the horizon. • Place the appropriate objects at the right distance from the camera and at the correct size. Chap 1, Sec 1.2 Image courtesy: Antonio Torralba, http://6.869.csail.mit.edu/fa13/
  • 5. Why it is important 5  Numerous applications from robotics, entertainment, engineering, medical…  Self driving cars  Engineering  Robots for manipulation  Humanoids  Assistive vision for impaired  Entertainment  Aim for a vision based system to produce a semantically consistent scene from visual inputs Chap 1, Sec 1.2
  • 6. Essentially a hard problem 6  Large variation in the image formulation  Scene Variation  Varying scene type and geometry  Object level variation  Large number of object classes  Individual Object location and orientation  Object shape and appearance  Depth/occlusions  Illumination  Shadows  Motion blur Chap 1, Sec 1.2
  • 7. Thesis - Contributions • This thesis provides solutions for large scale outdoor urban semantic mapping. • Large scale dense overhead semantic mapping. • Semantics from local images fused to form a global ground plane map • First attempt to generate such a map. • ~15km of semantic mapping • One of the first large scale semantic maps • Presented as oral in IEEE IROS 2012 Chap 1, Sec 1.3
  • 8. Thesis - Contributions 8  Dense semantic reconstruction  Dense 3D semantic reconstruction from kms of stereo images.  Online sequential volumetric reconstruction to accommodate arbitrarily long road scenes.  Presented as oral in IEEE ICRA 2013.  Mesh based inference for scene labelling  Improved labelling accuracy and consistency.  Depth sensitive classifier fusion.  25x faster in inference time (than image labelling).  Presented as poster in CVPR 2013. Chap 1, Sec 1.3
  • 9. Thesis - Contributions 9  Hierarchical CRF on an Octree Graph  Unified framework to determine free and occupied regions in a scene along with object class labels.  Robust PN potential over octree volumes  Datasets (available online)  Yotta labelled dataset: multiview street images (urban, rural, highway) containing 8000+ images, with object class labellings  Kitti Labelled dataset: Object class labelling for publicly available KITTI dataset Chap 1, Sec 1.3
  • 10. Publications 10  Related to Thesis  S. Sengupta, P. Sturgess, L. Ladicky, P. H. S. Torr: Automatic dense visual semantic mapping from street- level imagery. IEEE/RSJ IROS 2012 (Chapter 3 )  S. Sengupta, E. Greveson, A. Shahrokni, P. H.S. Torr: Urban 3D Semantic Modelling Using Stereo Vision, IEEE ICRA, 2013 (Chapter 4 )  S. Sengupta*, J. Valentin*, J. Warrell, A. Shahrokni, P. H.S. Torr: Mesh Based Semantic Modelling for Indoor and Outdoor Scenes, IEEE CVPR, 2013. ( *Joint first authors, Chapter 5.)  S. Sengupta*, J. Valentin*, J. Warrell, A. Shahrokni, P. H.S. Torr: Mesh Based Semantic Modelling for Indoor and Outdoor Scenes. SUNw: Scene Understanding Workshop. Held in conjunction with CVPR , 2013. (*Joint first authors, Invited paper )  Datasets  Yotta Labeled road scene dataset.  KITTI object labelling. (Datasets available at http://www.robots.ox.ac.uk/~tvg/projects )  Other publications  Z. Zhang, P. Sturgess, S. Sengupta, N. Crook, P. H.S. Torr: Efficient discriminative learning of parametric nearest neighbor classifiers, IEEE CVPR, 2012  L. Ladicky, P. Sturgess, C. Russell, S. Sengupta, Y. Bastanlar, W. F. Clocksin, P. H. S. Torr: Joint Optimization for Object Class Segmentation and Dense Stereo Reconstruction. IJCV 2012 (Invited paper)  L. Ladicky, P. Sturgess, C. Russell, S. Sengupta, Y. Bastanlar, W. F. Clocksin, P. H. S. Torr : Joint Optimisation for Object Class Segmentation and Dense Stereo Reconstruction. BMVC 2010 (BMVA Best science paper ) Chap 1, Sec 1.4
  • 11. The labelling problem • Multiple computer vision tasks can be modelled as a labelling problem • Assign each of a discrete set of sites a label from a label set • E.g. each pixel is associated with an object class label Chap 2, Sec 2.1
  • 12. 12 What are the Labels  Discrete or continuous  Discrete  Image pixels assigned to object classes like Cars, humans, buildings, pavement, trees etc.  Foreground/background labels  Indoor/outdoor labels…  Continuous range  Depth: Pixels can take a set of disparity labels  Optical flow Chap 2, Sec 2.1
  • 13. CRF-Framework • Set of random variables $X = \{x_1, x_2, \ldots, x_N\}$ corresponding to each pixel and the label set • Aim is to associate every random variable with a label • The conditional probability of the labelling x given the data D, • Gibbs free energy is given as • MAP labelling x* of the random field is defined by Chap 2, Sec 2.2
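The three equations on this slide did not survive in the transcript. As a hedged reconstruction, the standard CRF formulation from the cited literature (Ladicky et al.) would read as follows. The clique potentials $\psi_c$, the clique set $C$ and the partition function $Z$ are the usual notation and are assumed here rather than read off the slide:

```latex
P(\mathbf{x} \mid D) = \frac{1}{Z} \exp\Big(-\sum_{c \in C} \psi_c(\mathbf{x}_c)\Big)
\qquad
E(\mathbf{x}) = -\log P(\mathbf{x} \mid D) - \log Z = \sum_{c \in C} \psi_c(\mathbf{x}_c)
\qquad
\mathbf{x}^* = \arg\max_{\mathbf{x}} P(\mathbf{x} \mid D) = \arg\min_{\mathbf{x}} E(\mathbf{x})
```

Under this form, maximising the posterior and minimising the energy are the same problem, which is why the later slides speak only of energy minimisation.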
  • 14. CRF modelling for image labelling • The pixel labelling problem can be formulated as a pairwise/higher-order CRF problem whose energy is • The image is represented as a graph: G = {V,E} • V is the total set of nodes of the graph • Ni represents the neighbourhood of the node i • The unary potential measures the cost of assigning a particular label to the pixel • Generated using the result of a boosted classifier over a region about each pixel Chap 2, Sec 2.2
  • 15. CRF modelling for image labelling • The pairwise term or the smoothness term depends on the inter-pixel observations and should be discontinuity preserving across object boundaries • Takes Potts form • where • Higher order potentials defined on a group of pixels conditionally dependent on each other. • Robust PN, Hierarchical PN models [1] • Final labelling obtained through minimising the energy E Chap 2, Sec 2.2 [1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for object class image segmentation,” in ICCV, 2009.
  • 16. Quite hard • The energy minimization is quite hard (large number of random variables with interconnections). • Possible solutions: simulated annealing, ICM, but slow. • Approximate algorithms exist for certain energy functions for a multi-label problem. • Move-making algorithms [1] • α-expansion: for each α, allow the random variables to retain their existing label or change to the label α, using graph cuts. • αβ-swap: considers a pair of labels at each iteration, such that pixels currently labelled α or β may swap between the two labels through graph cuts. Chap 2, Sec 2.2 [1] Boykov et al., Fast Approximate Energy Minimization via Graph Cuts, ICCV
  • 17. Stereo • Early attempts to explain depth began in the Renaissance • Essentially, the images formed at the left and right eyes can be used to obtain a disparity/depth map (figures: stereo sketch by Jacopo Chimenti da Empoli, Italy, around 1600 AD; Leonardo da Vinci, Optical Studies on Binocular Vision) Chap 2, Sec 2.3
  • 18. Depth from Sequence of images • Structure from motion for sparse 3D reconstruction [1] • Visual hull/silhouette based volume carving [2] • Elevation/Height/2.5D maps [3] • TSDF/Voxel based fusion [4] Chap 2, Sec 2.3 [1] Sameer A. et al. Building Rome in a day. Commun. ACM, 2011. [2] Y. Furukawa et al. Carved visual hulls for image-based modeling. IJCV, 2009. [3] Friedrich E. et al. Stixmentation - probabilistic stixel based traffic scene labeling. BMVC 12. [4] Richard N. et al. KinectFusion: Real-time dense surface mapping and tracking. In IEEE ISMAR 2011.
  • 19. Dense Semantic Mapping • Generate an overhead view of an urban region. • Every pixel in the map view is associated with an object class label. Labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post Chap 3, Sec 3.1
  • 20. Dense Semantic Mapping • Street images captured inexpensively from a vehicle with multiple mounted cameras [1]. [1] Yotta DCL, “Yotta dcl case studies,” Available: http://www.yottadcl.com/surveys/case-studies/
  • 21. Semantic Mapping Framework • Semantic mapping framework comprises two stages (figure: street level image acquisition) Chap 3, Sec 3.3
  • 22. Semantic Mapping Framework • Semantic mapping framework comprises two stages • Semantic image segmentation at street level. (figure: street level image acquisition, image segmentation)
  • 23. Semantic Mapping Framework • Semantic mapping framework comprises two stages • Semantic image segmentation at street level. • Ground plane labelling at a global level. • First attempt to do an overhead mapping from street level images. (figure: street level image acquisition, image segmentation, ground plane labelling)
  • 24. Street-level Image Segmentation • Label every pixel in the image with an object class label. Labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post (figure: raw input image, automatic labeller, output labelled image) Chap 3, Sec 3.3.1
  • 25. Street-level Image Segmentation 25  CRF based image labeller  Each pixel is a node in a grid graph G = (V,E).  Each node is a random variable x taking a label from label set. CRF construction Final SegmentationInput Image
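  • 26. Semantic Image Segmentation - CRF • Total energy $E(\mathbf{x}) = \sum_{i \in V} \psi_i(x_i) + \sum_{i \in V, j \in N_i} \psi_{ij}(x_i, x_j) + \sum_{c \in C} \psi_c(\mathbf{x}_c)$, where the three sums are Epix, Epair and Eregion respectively • Optimal labelling given as $\mathbf{x}^* = \arg\min_{\mathbf{x}} E(\mathbf{x})$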
  • 27.  Total energy E = Epix + Epair + Eregion  Epix - Model individual pixel’s cost of taking a label.  Computed via the dense boosting approach  Multi feature variant of texton boost[1] Semantic Image Segmentation - CRF 27 x Car 0.2 Road 0.3 [1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for object class image segmentation,” in ICCV, 2009.
  • 28. Semantic Image Segmentation - CRF • Total energy E = Epix + Epair + Eregion • Epair - Models each pixel's neighbourhood interactions. • Encourages label consistency in adjacent pixels • Sensitive to edges in images. • Contrast sensitive Potts model: for neighbouring pixels xi and xj the cost is 0 when both take the same label (e.g. Car/Car) and g(i,j) when they differ (e.g. Car/Road) [1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for object class image segmentation,” in ICCV, 2009.
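As a minimal sketch of the contrast sensitive Potts cost described on this slide: the slide only states that the cost is 0 for equal labels and g(i,j) otherwise. The exponential form of g(i,j) and the parameters `theta_p`, `theta_v` and `theta_beta` below are assumptions taken from the common formulation in the segmentation literature, not values from the thesis.

```python
import math


def contrast_sensitive_potts(label_i, label_j, colour_i, colour_j,
                             theta_p=1.0, theta_v=4.0, theta_beta=0.01):
    """Pairwise cost for two neighbouring pixels.

    Returns 0 when the labels agree, otherwise g(i, j), which is large for
    similar colours and small across strong image edges.
    The theta_* values are illustrative assumptions.
    """
    if label_i == label_j:
        return 0.0
    # Squared colour difference between the two pixels.
    sq_diff = sum((a - b) ** 2 for a, b in zip(colour_i, colour_j))
    return theta_p + theta_v * math.exp(-theta_beta * sq_diff)
```

Because the penalty shrinks where the colour difference is large, a label change is cheapest along image edges, which is the discontinuity preserving behaviour the pairwise term is meant to have.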
  • 29.  Total energy E = Epix + Epair + Eregion  Eregion - Model behaviour of a group of pixels.  Classify a region  Encourages all the pixels in a region to take the same label.  Group of pixels given by multiple meanshift segmentations Semantic Image Segmentation - CRF 29 [1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for object class image segmentation,” in ICCV, 2009.
  • 30. Semantic Image Segmentation - CRF • Energy minimisation using alpha-expansion algorithm [1] (figure: input image, Road expansion) [1] Yuri Boykov et al., Fast Approximate Energy Minimization via Graph Cuts. ICCV 99
  • 31. Semantic Image Segmentation - CRF • Solved using alpha-expansion algorithm [1] (figure: input image, Building expansion) [1] Yuri Boykov et al., Fast Approximate Energy Minimization via Graph Cuts. ICCV 99
  • 32. Semantic Image Segmentation - CRF • Solved using alpha-expansion algorithm [1] (figure: input image, Sky expansion) [1] Yuri Boykov et al., Fast Approximate Energy Minimization via Graph Cuts. ICCV 99
  • 33. Semantic Image Segmentation - CRF • Solved using alpha-expansion algorithm [1] (figure: input image, Pavement expansion) [1] Yuri Boykov et al., Fast Approximate Energy Minimization via Graph Cuts. ICCV 99
  • 34. Semantic Image Segmentation - CRF • Solved using alpha-expansion algorithm [1] (figure: input image, final solution) [1] Yuri Boykov et al., Fast Approximate Energy Minimization via Graph Cuts. ICCV 99
  • 35. Ground Plane Labelling • Combine many labellings from street level imagery. (figure: input street level labellings, automatic labeller, output labelled ground plane)
  • 36. Ground Plane CRF • A CRF defined over the ground plane. • Each ground plane pixel (zi) is a random variable taking a label from the label set. • Energy for ground plane CRF is $E^g(Z) = E^g_{pix} + E^g_{pair}$ Chap 3, Sec 3.3.2
  • 37. Ground Plane Pixel Cost • We assume a flat world. (figure labels: K, X, Z)
  • 38. Ground Plane Pixel Cost • A ground plane region is estimated. (figure labels: K, X, Z; homography; Road, Pavement, Post/Pole)
  • 39. Ground Plane Pixel Cost • Each point in the image projects to a unique point on the ground plane, creating a homography. (figure labels: K, X, Z; homography; Road, Pavement, Post/Pole)
  • 40. Ground Plane Pixel Cost • The image labelling is mapped to the ground plane via the homography. (figure labels: K, X, Z; homography; ground plane pixel histograms; Road, Pavement, Post/Pole)
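The flat-world projection can be illustrated with a small sketch. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, a horizontal optical axis, an image y-axis pointing down and a known camera height above the ground. None of these specifics are given on the slides, so this shows the idea rather than the thesis's calibration.

```python
def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Intersect the viewing ray of pixel (u, v) with a flat ground plane.

    Returns (X, Z) in metres in the camera frame (X to the right, Z forward),
    or None when the pixel lies on or above the horizon and never hits the
    ground. All parameters are hypothetical calibration values.
    """
    # Ray direction through the pixel, normalised so its forward component is 1.
    dx = (u - cx) / fx
    dy = (v - cy) / fy
    if dy <= 0:
        return None
    # The ground lies cam_height below the camera (y points down).
    t = cam_height / dy
    return (t * dx, t)
```

For example, with `fx = fy = 500`, principal point `(320, 240)` and a camera 2 m above the ground, the pixel `(320, 740)` maps to the ground point 2 m straight ahead of the camera.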
  • 41. Ground Plane Pixel Cost • Labels projected from many views are combined in a histogram. • The normalised histogram gives the naïve probability of the ground plane pixel taking a label. (figure labels: K, X, Z; homography; ground plane pixel histograms; Road, Pavement, Post/Pole)
  • 42. Ground Plane Pixel Cost • Labels projected from many views are combined in a histogram. • The normalised histogram gives the naïve probability of the ground plane pixel taking a label. (figure labels: K, X, Z; homography; ground plane pixel histograms; Road, Pavement, Post/Pole) Chap 3, Sec 3.3.2
  • 43. Ground Plane labelling • A histogram is built for every ground plane pixel, giving $E^g_{pix}$ • Pairwise cost ($E^g_{pair}$) added to induce smoothness • Contrast sensitive Potts model
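A minimal sketch of turning the per-pixel label histogram into a unary cost. The slides state only that the normalised histogram gives a naïve label probability; taking the negative log of that probability as the cost, and the floor `eps` for unseen labels, are assumptions made here for illustration.

```python
import math
from collections import Counter


def ground_pixel_unary(projected_labels, label_set, eps=1e-6):
    """Unary cost per label for one ground plane pixel.

    projected_labels: labels projected onto this pixel from all views.
    Returns {label: cost}; frequently observed labels get a low cost.
    """
    counts = Counter(projected_labels)
    total = sum(counts.values())
    costs = {}
    for label in label_set:
        # Normalised histogram entry; uniform when nothing was projected.
        p = counts[label] / total if total else 1.0 / len(label_set)
        # Negative log-probability, floored so unseen labels stay finite.
        costs[label] = -math.log(max(p, eps))
    return costs
```

With three "Road" votes and one "Pavement" vote, "Road" receives the lowest cost, "Pavement" a higher one, and a label that was never observed receives the largest finite cost.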
  • 44. Ground Plane labelling  Final CRF solution obtained using alpha expansion. Void 44
  • 45. Ground Plane labelling Road expansion  Final CRF solution obtained using alpha expansion. 45
  • 46. Ground Plane labelling Building expansion 46  Final CRF solution obtained using alpha expansion.
  • 47. Ground Plane labelling Pavement expansion 47  Final CRF solution obtained using alpha expansion.
  • 48. Ground Plane Labelling Final Solution 48  Final CRF solution obtained using alpha expansion.
  • 49. Experiments - Dataset • Subset of the images captured by the van • ~15 km of track, 8000 images from each camera. • Pixel-level labelled ground truth images. Dataset available [1]. • 13 object categories: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post • Training - 44 images, testing - 42 images. [1] http://www.robots.ox.ac.uk/~tvg/projects/SemanticMap/index.php Chap 3, Sec 3.4.1
  • 50. SIS Results  Input Images, output of our image level CRF, ground truths.  Used Automatic Labelling environment[1] [1] The Automatic Labelling Environment, L Ladicky, PHS Torr. Code available http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm BuildingRoadTreeVegetation FenceSignage SkyPavement Car Pedestrian Bollard Shop Sign Post 50 Input Semantic segmentation Ground Truth
  • 51. Semantic Map Results 51 Semantic map of Pembroke city Chap 3, Sec 3.4.2
  • 52. Ground plane Map Evaluation • We back-project the ground plane map into the image domain and evaluate the results. • Global pixel accuracy of 83% (figure panels: street images, back-projected map results, ground truth)
  • 54. Chapter Summary  Presented a method to generate overhead view semantic mapping.  Experiments on large tracks (~15km) which can be scaled up to country wide mapping  Dataset available[1].  However a flat world assumption does not represent the 3D scene properly – our aim is to perform a semantic metric reconstruction of the world. [1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php 54
  • 55. Urban 3D Semantic Modelling Using Stereo Vision 55 [1] Input Stereo image Sequence Dense 3D Semantic Model  Given a sequence of stereo images we generate a dense 3D, semantic model Chap 4, Sec 4.1
  • 56. Pipeline –Semantic Reconstruction 56  Stereo images Chap 4, Sec 4.3
  • 57. Pipeline –Semantic Reconstruction 57  Stereo images  Camera pose estimation and individual depth map generation
  • 59. Pipeline –Semantic Reconstruction 59  Semantic labelling of street view images
  • 60. Pipeline –Semantic Reconstruction 60  Semantic model generation
  • 61. Camera Estimation 61  Feature tracking using left-right pair and consecutive frames Chap 4, Sec 4.3.1
  • 62. Camera Estimation  Use the feature tracks to estimate camera poses.  Use bundle adjustment [a]Andreas Geiger et. Al. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite CVPR 2012 62
  • 63. Bundle Results 63  Bundler results after 10, 20, 30 and 40 frames
  • 64. Sparse Reconstructions 64  But our target is to obtain a large scale dense 3D world representation.
  • 65. Depth-Map Estimation • Semi-global block matching [1] for disparity estimation • Per-pixel depth computed as z = B × f / d, where B is the baseline, f the focal length and d the pixel disparity [1] H. Hirschmueller, Stereo Processing by Semi-Global Matching and Mutual Information. PAMI 2008.
  • 66. Depth Fusion • Depth estimates are fused using camera poses. • Fused into a truncated signed distance function (TSDF) volumetric representation [1]. • Surface mesh generated through the marching tetrahedra algorithm. [1] Brian Curless and Marc Levoy, A Volumetric Method for Building Complex Models from Range Images. Siggraph 96. Chap 4, Sec 4.3.2
  • 67. Depth fusion using TSDF Volume [1] • Entire space divided into grids of voxels. • For each voxel compute the truncated signed distance: • positive, increasing with distance, when the voxel lies in free space, • negative when it lies behind the surface, • zero when it lies on the surface. • Performed for all depth maps. [1] Brian Curless and Marc Levoy, A Volumetric Method for Building Complex Models from Range Images. Siggraph 96.
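The depth relation z = B × f / d can be written as a one-line helper. The function name, the units, and the choice to return `None` for non-positive disparities are assumptions for illustration.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Depth in metres from a stereo disparity, z = B * f / d.

    disparity_px: pixel disparity d; baseline_m: baseline B in metres;
    focal_px: focal length f in pixels. Returns None for invalid
    (zero or negative) disparities, which carry no depth information.
    """
    if disparity_px <= 0:
        return None
    return baseline_m * focal_px / disparity_px
```

For instance, a disparity of 10 pixels with a 0.5 m baseline and a 700 pixel focal length gives a depth of 35 m. Depth is inversely proportional to disparity, so distant points are the most sensitive to disparity error.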
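The sign convention above can be sketched for a single voxel on a camera ray. The normalisation to [-1, 1] and the weighted running average follow the general Curless and Levoy scheme. The function names, the truncation band `trunc` and the unit weights are assumptions, not the thesis's settings.

```python
def truncated_sdf(voxel_depth, surface_depth, trunc):
    """Truncated signed distance of a voxel along one camera ray.

    voxel_depth: distance of the voxel from the camera along the ray.
    surface_depth: measured depth of the surface on that ray.
    Positive in free space (in front of the surface), negative behind it,
    zero on the surface; clamped to [-1, 1] after dividing by trunc.
    """
    sdf = surface_depth - voxel_depth
    return max(-1.0, min(1.0, sdf / trunc))


def fuse(tsdf_old, weight_old, tsdf_new, weight_new=1.0):
    """Weighted running average used to merge a new depth map into a voxel."""
    weight = weight_old + weight_new
    return (tsdf_old * weight_old + tsdf_new * weight_new) / weight, weight
```

Averaging over many depth maps is what makes the fused surface smoother: each individual measurement is noisy, but the zero crossing of the averaged values is more stable.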
  • 68. TSDF Volume -.8 -.4 .1 .5 1 1 1 Camera Actual surfaceTSDF volume 68
  • 69. TSDF Volume -1 -.8 -.3 .2 .8 1 1 1 -1 -.9 -.4 .1 .5 1 1 1 -1 -1 -.8 -.2 .1 1 1 1 -1 -1 -.9 -.3 .2 .8 1 1 -1 -1 -.9 -.4 .3 .9 1 1 -1 -1 -.8 -.3 .3 .9 1 1 -1 -1 -.9 -.5 .2 .8 1 1 -1 -1 -.6 .1 .7 1 1 1 Camera TSDF volume Actual surface 69
  • 70. Fusing multiple depth maps 70  Increased number of depth maps results in smooth surface generation Chap 4, Sec 4.3.2
  • 71. Incremental Volume Update • Road scenes are generally described through arbitrarily long image sequences. • 3x3x1 volume of voxel grids initialised (figure: vehicle path, ~1km)
  • 72. Incremental Volume Update • Need to map long sequences • 3x3x1 volume of voxel grids initialised • Volumes are added incrementally as the vehicle moves out of the region • Allows arbitrarily long sequences to be mapped • Important for outdoor scenes (figure: vehicle path, ~1km)
  • 73. Large scale dense reconstruction 73  Textured reconstruction.
  • 74. Semantic Model Generation  We use conditional random field framework (CRF) 74 • Each pixel is a node in a grid graph G = (V,E) having a random variable x taking a label from label set. • Total energy E = Epix + Epair + Eregion • Epix - Model individual pixel’s cost of taking a label. [1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical crfs for object class image segmentation,” in ICCV, 2009. CRF construction[1] Image SegmentationInput Image Chap 4, Sec 4.4.1 x Fence 0.2 Road 0.3
  • 75. Semantic Image Segmentation • Epair - Models each pixel's neighbourhood interactions. • Encourages label consistency in adjacent pixels and is sensitive to edges. • Contrast sensitive Potts model: for neighbouring pixels xi and xj the cost is 0 when both take the same label (e.g. Fence/Fence) and g(i,j) when they differ (e.g. Fence/Road) • Both colour and depth images are used • Eregion - Models the behaviour of a group of pixels • Groupings through superpixels
  • 76. Semantic Image Segmentation - Results  Input Images, output of our image level CRF, ground truths. 76
  • 77. Mesh Face Labelling  A histogram of labels is built for each mesh face (Zf ), by projecting the points from the face into labelled images.  Majority label is considered as the label of the face. Chap 4, Sec 4.4.2 77
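A minimal sketch of the majority vote for one mesh face. The function name and the input format (a flat list of labels read from the labelled images at the face's projected points) are assumptions for illustration.

```python
from collections import Counter


def face_label(observed_labels):
    """Majority label for one mesh face.

    observed_labels: labels found at the pixels that the face's points
    project to, across all labelled images. Returns None when the face
    was not observed in any image.
    """
    if not observed_labels:
        return None
    # Build the label histogram and take its most frequent entry.
    return Counter(observed_labels).most_common(1)[0][0]
```

A face seen as "Road" in two views and "Car" in one is labelled "Road". How ties are broken is not specified in the slides, so this sketch simply returns whichever tied label `Counter` lists first.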
  • 78. Semantic Model Top: Left – Surface reconstruction, Right – Semantic model Bottom: Left - input image, Right- object label set 78
  • 79. Evaluation  KITTI Object Labelled Datasets: Manually labelled images for object class training (available for download). [1]  The Model is projected back using the estimated camera poses to create labelled images.  The points in the model far away from the camera are ignored in the projection. [1] http://www.robots.ox.ac.uk/~tvg/projects/SemanticUrbanModelling/index.php Chap 4, Sec 4.5 79
  • 80. Evaluation  Metrics  Recall = tp/(tp+fn)  Intersection vs Union = tp/(tp+fn+fp) 80
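The two metrics can be computed per class from paired predicted and ground truth labels. The helper names and the flat-list input format are assumptions. The formulas are the ones on the slide, with tp, fn and fp the true positive, false negative and false positive counts for the class.

```python
def class_counts(predicted, truth, cls):
    """True positives, false negatives and false positives for one class."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == cls and t == cls for p, t in pairs)
    fn = sum(p != cls and t == cls for p, t in pairs)
    fp = sum(p == cls and t != cls for p, t in pairs)
    return tp, fn, fp


def recall(tp, fn):
    """Recall = tp / (tp + fn); 0 when the class never occurs."""
    return tp / (tp + fn) if tp + fn else 0.0


def intersection_vs_union(tp, fn, fp):
    """Intersection vs union = tp / (tp + fn + fp)."""
    denom = tp + fn + fp
    return tp / denom if denom else 0.0
```

Unlike recall, the intersection vs union measure also penalises false positives, so a class cannot score well simply by being over-predicted.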
  • 81. Video
  • 82. Long Sequence 82  1km dense reconstruction overlaid on a google map. Path of the vehicle.
  • 83. Chapter Conclusion • Large scale dense semantic reconstruction • Sequential volume update for accommodating long sequences • Labelled dataset released. • Labelling performed at image level, which results in semantic inconsistency, redundant labelling and a slow overall inference process. • Object layout in the scene helps in labelling
  • 84. Chapter 5 - Mesh Based Scene Labelling • Motivation • Redundancy: individual street level image labelling has 0.5 million pixels per image to process (a scene of 100-150 images is ~75 million pixels), which is slow • Inconsistency in labelling • Utilising structure through mesh connectivity. • Solution: perform labelling on the mesh Chap 5, Sec 5.1
  • 85. Mesh labelling Framework 85  Depth maps fused into mesh.  Every mesh location associated with set of image pixels across a set of images.  Obtain a combined appearance score from these pixels through a depth sensitive fusion of scores.  Define CRF on mesh and perform inference on the structure. Mesh based labelling framework
  • 86. CRF over Scene Mesh 86  We use conditional random field framework (CRF) defined over the mesh locations. • Each mesh vertex is a node in a graph G = (V,E), where E is defined according to mesh neighbourhood. • Each node is a random variable x taking a label from label set. Chap 5, Sec 5.3
  • 87. Unary Score • Total energy • Pixel class-wise classifier score given as , which are combined as: • ‘f’ can take ‘max’, ‘average’ or ‘weighted’. • ‘weighted’: weight the class scores inversely by the 3D distance of the pixel from the respective camera centre. (figure: mesh vertex xi registered to an image pixel set from K images) Chap 5, Sec 5.3.1
  • 88. Pairwise • Pairwise term defined on the mesh connectivity. • Takes the form of a Potts model, where Zi and Zj are the 3D locations of the mesh vertices i and j. • Thus mesh locations close to each other are encouraged to take the same label.
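The three fusion choices for f can be sketched as below. The data layout (one score dictionary per registered pixel) and the normalisation of the inverse-distance weights are assumptions. The slide states only that the 'weighted' option weights class scores inversely by the 3D distance to the camera centre.

```python
def fuse_scores(pixel_scores, distances, mode="weighted"):
    """Combine per-pixel class scores into one score per class for a vertex.

    pixel_scores: list of {label: score} dicts, one per registered pixel.
    distances: 3D distance of each pixel's point from its camera centre.
    mode: 'max', 'average' or 'weighted' (inverse-distance weighting).
    """
    labels = list(pixel_scores[0].keys())
    if mode == "max":
        return {l: max(s[l] for s in pixel_scores) for l in labels}
    if mode == "average":
        n = len(pixel_scores)
        return {l: sum(s[l] for s in pixel_scores) / n for l in labels}
    if mode == "weighted":
        # Closer observations get larger weights; weights sum to one.
        weights = [1.0 / d for d in distances]
        norm = sum(weights)
        return {l: sum(w * s[l] for w, s in zip(weights, pixel_scores)) / norm
                for l in labels}
    raise ValueError("unknown fusion mode: " + mode)
```

With one observation at 1 m voting for "Car" and one at 4 m voting for "Road", the weighted mode favours "Car" by 0.8 to 0.2, whereas the plain average would leave the two tied.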
  • 89. Experiments and results 89  Mesh segmentation with the corresponding images of the scene Chap 5, Sec 5.4
  • 91. Evaluation 91  Created ground truth mesh for evaluation [1]. [1] http://www.robots.ox.ac.uk/~tvg/projects/
  • 92. Observations 92  Improved accuracy for mesh based inference over image based labelling and projecting the labels  The pairwise connection respecting mesh connectivity improves labelling Ground Truth Unary only Unary + Pair Image
  • 93. Timing performance • Labelling over the mesh improves performance in the inference stage. • Scene of 150 images of resolution 1281x376 ≅ 75 million pixels • Mesh of 704K vertices and 1.27M faces • 25x speedup in inference at our operating point • Further speedup possible by computing the classifier response only for pixels registered to the mesh.
  • 94. Inference Time with varying mesh size 94  Mesh created for the same scene with finer granularity.
  • 95. Accuracy with varying mesh granularity • Note: ground truth mesh generated for each granularity • Finer mesh granularity produces smaller mesh faces, which affects the pairwise cost
  • 96. Scene editing 96  Labelling in 3D structure can help to categorize the 3D regions.  Some active scene editing ,e.g. vehicle moving on the road. Chap 5, Sec 5.4
  • 97. Scene edit - dynamic 97
  • 98. Chapter Conclusions • Presented a mesh based inference method for scene labelling. • Inference on the mesh provides a more accurate and faster approach to scene labelling. • Presented a classifier score combination method which improves accuracy. • Up to 25x faster in the inference stage for outdoor scenes. • Applications: scene editing can be performed once the scene is labelled. • However the mesh representation is limiting for various robotic tasks, which we try to overcome in the next chapter.
  • 99. Chapter 6 - Hierarchical CRF on an Octree Graph • Computer vision: attempts to recognise scenes have been studied exhaustively. • Robotics: efficient/accurate 3D representation of the scene for various robotic tasks, but little work on understanding semantics. • Aim: bring the two together towards recognition in an efficient representation, and present a method which • Jointly performs recognition and infers occupancy. • Uses hierarchical constraints to perform scene labelling • Uses an efficient 3D representation for determining occupied, free and unknown areas. Chap 6, Sec 6.1
  • 100. Good 3D representation 100  Why  Needed for further processing tasks  Robotics domain – mapping, grasping/manipulation, navigation  Graphics domain – efficient rendering over graphics processing unit and visualization  What  Should map accurately  Occupied: Objects present in the world,  Free: required for collision avoidance, path planning.  Unmapped: unknown areas in the scene need to be avoided.  Efficiency: Any 3D volume requires to be identified as free/occupied/unmapped efficiently.
  • 101. Existing 3D representations • Storing 3D measurements from sensors through point clouds: cannot map free and unknown areas • Mesh: same limitations as point clouds • Stixels/Height maps/2.5D: one height value in a 2D grid, but free area not accurately mapped • Fixed sized grid of voxels: voxels not indexed, which makes it inefficient • Octree based volumetric representation: introduced more than three decades ago, accurately represents 3D space, efficient indexing of volume
  • 102. Octomap - representation • Octree representation • Every voxel/volume divided into 8 sub-volumes, allowing fast indexing of voxels • Advantageous in comparison to point clouds, surface maps, elevation/2.5D representations • Used widely across computer science • Hardware friendly (CPU, GPU, FPGA) • OctoMap [a] proposed in 2013 • Probabilistic representation of occupied, free and unknown regions • Based on octree based 3D representation • Demonstrated to map large areas through fusion of depth estimates. [a] Armin Hornung et al., OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Autonomous Robots, 2013.
  • 103. Multi-resolution approaches in Computer vision • Multi-resolution approach used for recognition, classification, detection • Information at pixel level, pairs of pixels or groups of pixels combined together • Robust PN model [1]: penalises label inconsistency over a group of pixels. • Grouping determined through unsupervised image segmentation • Here we extend the multi-resolution image based classification approach to a 3D volume indexed through an octree [1] P. Kohli et al., Robust Higher Order Potentials for Enforcing Label Consistency
  • 104. Semantic Octree - framework 104  Input stereo images Chap 6, Sec 6.3
  • 105. Semantic Octree - framework 105  Generate point clouds and class hypothesis for every pixel Chap 6, Sec 6.3
  • 106. Semantic Octree - framework 106  Fuse into an octree through estimated camera  Octree – each volume subdivided in 8 sub-volumes  Leaf- nodes (xi) are the smallest sized voxels  Any internal node (xc) gives a natural grouping of 3D space Chap 6, Sec 6.3
  • 107.  Perform inference over 3D voxels to give labelled scene. Semantic Octree - framework 107 Chap 6, Sec 6.3
  • 108. CRF graph on Octree voxels  Octree divides the space into subvolumes indexed through tree with nodes  τint : Internal nodes in the tree (xc)  τleaf : leaf level voxels (xi)  Random variable for every leaf voxel  Every internal node is associated with a set of leaf voxels resulting in a clique  Label set defined as  Final energy : 108 Chap 6, Sec 6.3
  • 109. Unary score for leaf voxels • Octree volume update • All voxels initially set unknown, with occupancy probability P(xi) = 0.5 (log odds of 0) • For each 3D point (obtained from stereo pairs), the voxels' log odds are updated in a ray casting manner • Log odds are updated for all 3D points for every stereo pair • Final occupancy probability obtained as Chap 6, Sec 6.3.1
  • 110. Unary score for leaf voxels • Each occupied voxel xi is associated with a set of 3D points • The corresponding image pixels denoted as • Pixel scores combined together • Given the initial occupancy P(xi), the unary is given as: • Thus, voxels initially estimated as occupied have a low cost for object labels and a high cost for the free label, and vice versa Chap 6, Sec 6.3.1
  • 111. Hierarchical tree potential • Robust PN potential applied over hierarchical groupings of voxels • Penalises label inconsistency within the grouping of voxels • Takes the form • Maximum cost truncated to γmax • Groupings of voxels correspond to internal nodes in the octree Chap 6, Sec 6.3.2
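The log-odds update can be sketched as follows. The conversion formulas between probability and log odds are standard. The increments `l_hit` and `l_miss`, the dictionary-based voxel store and the function names are illustrative assumptions, not the settings used in the thesis.

```python
import math


def log_odds(p):
    """Log odds of an occupancy probability; 0.5 maps to 0 (unknown)."""
    return math.log(p / (1.0 - p))


def probability(l):
    """Occupancy probability recovered from accumulated log odds."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))


def update_ray(voxel_log_odds, ray_voxels, hit_voxel, l_hit=0.85, l_miss=-0.4):
    """Update voxels along one ray cast from the camera to a 3D point.

    ray_voxels: voxels traversed before the end point (observed free).
    hit_voxel: voxel containing the 3D point (observed occupied).
    Voxels missing from the dict are unknown, i.e. log odds 0.
    """
    for v in ray_voxels:
        voxel_log_odds[v] = voxel_log_odds.get(v, 0.0) + l_miss
    voxel_log_odds[hit_voxel] = voxel_log_odds.get(hit_voxel, 0.0) + l_hit
```

Working in log odds turns each measurement into a simple addition, which is cheap to repeat for every 3D point of every stereo pair. The probability is only recovered once at the end.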
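The exact potential is missing from the transcript. As a hedged illustration, the sketch below follows a simplified form of the Robust PN model of Kohli et al.: the cost grows linearly with the number of voxels that disagree with the dominant label and is truncated at γmax. The truncation parameter `truncation` and the label-independent `gamma_max` are assumptions.

```python
from collections import Counter


def robust_pn_cost(leaf_labels, gamma_max, truncation):
    """Simplified Robust P^N cost for one group of leaf voxels.

    leaf_labels: labels of the leaf voxels under one internal octree node.
    The cost is 0 when all labels agree, rises linearly with the number
    of voxels not taking the dominant label, and saturates at gamma_max
    once that number reaches `truncation`.
    """
    dominant_count = Counter(leaf_labels).most_common(1)[0][1]
    inconsistent = len(leaf_labels) - dominant_count
    return min(inconsistent * gamma_max / truncation, gamma_max)
```

The truncation is what makes the potential robust: a few disagreeing voxels are tolerated at a modest cost instead of forcing the whole group to a single label.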
  • 112. Experiments • Octree defined with 16 levels • Smallest voxel resolution = (8 x 8 x 8) cm³ • Maximum mapped volume (2^16 x 8 cm)³ ≈ (5.243 km)³ • Hierarchical groupings of voxels corresponding to internal nodes at levels 13-15 considered
  • 113. Results • Hierarchical grouping during inference vs leaf level voxel labelling (much sparser) Chap 6, Sec 6.4
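The mapped extent follows directly from the tree depth and the leaf size, and the same integer leaf coordinates give the path from the root to a leaf. In the sketch below the function names and the bit layout of the child index are assumptions. Only the 16 levels and the 8 cm leaf size come from the slide.

```python
def octree_extent_m(levels, leaf_size_m):
    """Side length in metres covered by an octree with the given depth."""
    return (2 ** levels) * leaf_size_m


def child_index(ix, iy, iz, depth, levels):
    """Child (0-7) to descend into at `depth` (0 = root) for a leaf.

    ix, iy, iz: integer leaf coordinates in [0, 2**levels).
    One bit of each coordinate is consumed per level, most significant
    bit first, so a leaf is reached in exactly `levels` steps.
    """
    shift = levels - 1 - depth
    return ((((ix >> shift) & 1) << 2)
            | (((iy >> shift) & 1) << 1)
            | ((iz >> shift) & 1))
```

With 16 levels and 0.08 m leaves, `octree_extent_m(16, 0.08)` gives about 5242.88 m per side, matching the figure of roughly 5.243 km quoted on the slide.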
  • 114.  Quantitative evaluation :  Performed by projecting into image domain  Observations  Small objects tend to get decimated due to octree quantization hence reduced accuracy  Mesh based representation better in representing surface.  Non-uniform Grouping of volumes (k-d tree) can be used to improve results Results 114
  • 115. Occupancy mapping 115  Grouping of voxels hierarchically increases the occupied volume reducing the sparsity
  • 116. Chapter Conclusion • Proposed a method to jointly infer object class labels and occupancy • Efficient representation of 3D space for further operations like navigation and manipulation • The octree introduces a quantization error, which could be addressed by grouping volumes through a k-d tree
  • 117. Thesis - Conclusions • This thesis covered aspects of scene understanding and proposed solutions for dense semantic mapping and reconstruction • Chapter 3: Large scale dense semantic mapping • Overhead semantic view of an urban region • Experiments to generate a ~15km map • One of the first large scale semantic maps • Presented as oral in IEEE IROS 2012 Chap 7, Sec 7.1
  • 118. Thesis - Conclusions
      ◦ Chapter 4: Dense semantic reconstruction
          ◦ Dense semantic reconstruction from kilometres of stereo images
          ◦ Online volumetric reconstruction to accommodate arbitrarily long road scenes
          ◦ Presented as an oral at IEEE ICRA 2013
      ◦ Chapter 5: Mesh-based inference for scene labelling
          ◦ Improved labelling accuracy (pairwise connections respect mesh connectivity) and consistency
          ◦ Depth-sensitive classifier fusion
          ◦ 25x faster inference
          ◦ Presented as a poster at CVPR 2013
  • 119. Conclusions
      ◦ Chapter 6: Hierarchical CRF on an Octree Graph
          ◦ Unified framework to determine 3D volume occupancy together with the object class labels in the scene
          ◦ Efficient representation
          ◦ Robust P^N potential over octree volumes
      ◦ Datasets (publicly available)
          ◦ Yotta labelled dataset: multiview street images (urban, rural, highway), 8000+ images with object class labellings
          ◦ KITTI labelled dataset: object class labellings for the publicly available KITTI dataset
  • 120. Way forward
      ◦ Transfer learning: many datasets exist with many different labellings. We should aim to learn from multiple sources and apply the result to test cases.
      ◦ Lifelong learning: an agent needs to identify an object irrespective of changes in the environment
      ◦ Exploit high-level attributes
      ◦ Investigate an end-to-end real-time pipeline for dense recognition and reconstruction
      ◦ Exploit scene dynamics: DVS (dynamic vision sensors) report only the pixels that have changed, through efficient sensing
      Chap 7, Sec 7.2
  • 121. Thank you
      ◦ Acknowledgements
          ◦ Supervisors: Philip Torr and David Duce
          ◦ Thesis Examiners: Gabriel Brostow and Nigel Crook
          ◦ Collaborators: Paul Sturgess, Lubor Ladicky, Ali Shahrokni, Eric Greeveson, Julien Valentin, Ziming Zhang, Johnathan Warrell, Chris Russell, Yalin Bastanlar, William Clocksin, Vibhav Vineet, Mike Sapi.
  • 122. References
      ◦ Lubor Ladicky et al. Associative hierarchical CRFs for object class image segmentation. ICCV 2009; PAMI 2013.
      ◦ Pushmeet Kohli et al. Robust higher order potentials for enforcing label consistency. IJCV 2009.
      ◦ Paul Sturgess et al. Combining appearance and structure from motion features for road scene understanding. BMVC 2009.
      ◦ Lubor Ladicky et al. Joint optimisation for object class segmentation and dense stereo reconstruction. BMVC 2010; IJCV 2012.
      ◦ Richard A. Newcombe et al. KinectFusion: Real-time dense surface mapping and tracking. IEEE ISMAR 2011.