Video Object Segmentation in Videos

Presenter: 고영준 (Ph.D. student, Korea University)
Presentation date: June 2017

This talk presents algorithms for segmenting objects in a video sequence.
First, I introduce a primary object segmentation algorithm based on region augmentation and reduction. Second, I present collaborative detection, tracking, and segmentation for online multiple object segmentation.

Published in: Technology

  1. 1. Video Object Segmentation (고영준, Korea University)
  2. 2. Segmentation
  3. 3. Segmentation
      • Divide data into meaningful segments
      • (Figure panels: superpixel, image segmentation, video segmentation, video object segmentation)
  4. 4. Video Object Segmentation
      • Semi-supervised video object segmentation
      • Primary object segmentation
      • Multiple object segmentation
  5. 5. Semi-supervised Video Object Segmentation
      • Track and segment a target object that a user annotates in the first frame
      • (Figure: first frame with user annotation; resulting segment track)
  6. 6. Primary Object Segmentation
      • Segment a primary object in a video automatically
      • (Figure examples: primary object is a diver; primary object is a tennis player)
  7. 7. Multiple Object Segmentation
      • Extract as many segment tracks as possible
  8. 8. Primary Object Segmentation
  9. 9. Primary Object Segmentation
      • Primary object segmentation
          • Initial region estimation
              • Motion boundaries
              • Object proposals
              • Saliency maps
          • Refinement
              • Construct models for the primary object and the background, e.g., Gaussian mixture models (GMMs)
              • Proposed: augmentation and reduction process (ARP)
  10. 10. Primary Object Segmentation in Videos Based on Region Augmentation and Reduction
      • Overview
          • Input: a set of consecutive video frames
          • Output: a set of pixel-wise segments that delineate the primary object
  11. 11. Candidate Region Generation
      • Candidate regions
          • Ultrametric contour map (UCM)
          • Obtain color-based and motion-based UCMs
          • Each region in a UCM becomes a superpixel
  12. 12. Candidate Region Generation
      • Candidate regions
          • Generate candidate regions by merging neighboring superpixels
              • Determine the pair, $s_m$ and $s_n$, sharing the weakest boundary
              • Merge $s_m$ and $s_n$ into a single superpixel
              • Repeat this process until only one superpixel remains
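
The merging loop on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `generate_candidates`, the dictionary of boundary strengths, and the rule that a merged region keeps the weakest boundary toward each neighbor are assumptions made for the example. The slides only state that the pair sharing the weakest boundary is merged until one superpixel remains.

```python
def generate_candidates(superpixels, boundary):
    """Greedy hierarchical merging of superpixels.

    superpixels: list of integer ids (hypothetical input format).
    boundary: dict mapping a pair (a, b) of neighboring ids to a boundary strength.
    Returns every region seen during merging, each as a frozenset of ids.
    """
    regions = {frozenset([s]) for s in superpixels}
    strength = {frozenset({frozenset([a]), frozenset([b])}): w
                for (a, b), w in boundary.items()}
    candidates = [frozenset([s]) for s in superpixels]

    while len(regions) > 1 and strength:
        # Pair of neighboring regions that share the weakest boundary.
        pair = min(strength, key=strength.get)
        s_m, s_n = tuple(pair)
        merged = s_m | s_n

        # Re-attach the neighbors of s_m and s_n to the merged region.
        # Assumption: the merged region keeps the weakest of the old boundaries.
        updated = {}
        for key, w in strength.items():
            if key == pair:
                continue
            a, b = tuple(key)
            a = merged if a in pair else a
            b = merged if b in pair else b
            new_key = frozenset({a, b})
            updated[new_key] = min(w, updated.get(new_key, w))
        strength = updated

        regions = (regions - {s_m, s_n}) | {merged}
        candidates.append(merged)
    return candidates
```

In this sketch every intermediate region is kept, so three superpixels in a chain yield five candidate regions: the three originals, one merged pair, and the full region.
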
  13. 13. Candidate Region Generation
      • Foreground confidence
          • Measure the foreground confidence of each candidate region: $c_i^{(t)} = \phi_i^{(t)} + \psi_i^{(t)}$
          • Appearance confidence $\phi_i^{(t)}$
              • Obtain a saliency map using the technique in [1]
              • Average the saliency values within the candidate region
          • Edge confidence $\psi_i^{(t)}$
              • Combine the color-based edge map and the motion-based edge map
      [1] W.-D. Jang, C. Lee, and C.-S. Kim, “Primary object segmentation in videos via alternate convex optimization of foreground and background distributions,” CVPR, 2016.
  14. 14. Candidate Region Generation
      • Foreground confidence
          • Select the top 20 candidate regions
          • Warp the selected candidate regions to neighboring frames
          • Rearrange the set of candidate regions: $\mathcal{Q}^{(t)} = \{q_1^{(t)}, q_2^{(t)}, \ldots, q_N^{(t)}\}$
      • Feature description
          • Describe the feature $\mathbf{f}_i^{(t)}$ of each candidate region $q_i^{(t)}$ using the bag-of-visual-words approach
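
The confidence score and the top-20 selection can be illustrated with a short Python sketch. The helper names `foreground_confidence` and `select_top` are hypothetical, and the edge confidence is passed in as a precomputed number because the slides do not specify how the color-based and motion-based edge maps are combined.

```python
def foreground_confidence(saliency_values, edge_confidence):
    """c = phi + psi for one candidate region.

    saliency_values: saliency of every pixel inside the region.
    edge_confidence: psi, assumed to be computed elsewhere.
    """
    # Appearance confidence phi: average saliency within the region.
    phi = sum(saliency_values) / len(saliency_values)
    return phi + edge_confidence


def select_top(regions, confidences, k=20):
    """Keep the k regions with the highest foreground confidence."""
    order = sorted(range(len(regions)),
                   key=lambda i: confidences[i], reverse=True)
    return [regions[i] for i in order[:k]]
```
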
  15. 15. Initial Region Estimation
      • Selecting initial primary object regions
          • Choose the main region $q_\delta^{(t)}$ among the candidate regions
          • Exploit the recurrence property: a primary object appears repeatedly in a video sequence
      • (Figure: input frames, candidate region generation, initial region estimation)
  16. 16. Initial Region Estimation
      • Selecting initial primary object regions
          • Assume that the feature of the main region $q_\delta^{(t)}$ is similar to the features of the main regions in the other frames
          • $\mathbf{p}^{(\tau)}$ denotes the feature of the main region in frame $I^{(\tau)}$
          • $\delta = \arg\min_i \sum_{\tau=1,\,\tau\neq t}^{T} d_\chi\big(\mathbf{f}_i^{(t)}, \mathbf{p}^{(\tau)}\big)$
  17. 17. Initial Region Estimation
      • Selecting initial primary object regions
          • Initialization of $\mathbf{p}^{(\tau)}$
              • Superpose the features of all candidate regions in $\mathcal{Q}^{(\tau)}$
              • Combine the features of the candidate regions, $\mathbf{F}^{(\tau)} = [\mathbf{f}_1^{(\tau)}, \ldots, \mathbf{f}_N^{(\tau)}]$, using the foreground confidence vector $\mathbf{c}^{(\tau)} = [c_1^{(\tau)}, \ldots, c_N^{(\tau)}]^{\top}$: $\mathbf{p}^{(\tau)} = \mathbf{F}^{(\tau)} \mathbf{c}^{(\tau)}$
              • Obtain the main region $q_\delta^{(t)}$ in each frame by applying $\mathbf{p}^{(\tau)}$: $\delta = \arg\min_i \sum_{\tau=1,\,\tau\neq t}^{T} d_\chi\big(\mathbf{f}_i^{(t)}, \mathbf{p}^{(\tau)}\big)$
          • Alternate update of the main regions
              • Update $\mathbf{p}^{(t)}$ for each frame by $\mathbf{p}^{(t)} \leftarrow \mathbf{f}_\delta^{(t)}$
              • Choose the main region again using the updated features
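
The initialization and alternate update can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: $d_\chi$ is taken to be the common chi-square histogram distance $\tfrac{1}{2}\sum_k (a_k-b_k)^2/(a_k+b_k)$, which the slides do not define, and the names `chi_square`, `init_prototypes`, `select_main_regions`, `alternate_update`, and the iteration cap `max_iters` are hypothetical.

```python
def chi_square(f, g, eps=1e-12):
    """Chi-square distance between two histograms (assumed form of d_chi)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(f, g))


def init_prototypes(features, confidences):
    """p^(tau) = F^(tau) c^(tau): confidence-weighted sum of candidate features."""
    protos = []
    for F, c in zip(features, confidences):
        dim = len(F[0])
        protos.append([sum(c_i * f_i[d] for f_i, c_i in zip(F, c))
                       for d in range(dim)])
    return protos


def select_main_regions(features, protos):
    """For each frame t, pick the candidate closest to the prototypes of all other frames."""
    T = len(features)
    deltas = []
    for t in range(T):
        costs = [sum(chi_square(f, protos[tau]) for tau in range(T) if tau != t)
                 for f in features[t]]
        deltas.append(min(range(len(costs)), key=costs.__getitem__))
    return deltas


def alternate_update(features, confidences, max_iters=10):
    """features[t][i] is the feature of candidate i in frame t."""
    protos = init_prototypes(features, confidences)
    deltas = select_main_regions(features, protos)
    for _ in range(max_iters):
        # Replace each prototype with the feature of the currently selected main region.
        protos = [features[t][d] for t, d in enumerate(deltas)]
        new_deltas = select_main_regions(features, protos)
        if new_deltas == deltas:
            break
        deltas = new_deltas
    return deltas
```

The sketch stops when the selection no longer changes or after `max_iters` rounds; the slides do not state a stopping rule, so both are assumptions.
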
  18. 18. Primary Object Region Refinement
      • Refinement of primary object regions
          • Initial regions may exclude parts of primary objects or include noisy regions (background or other objects)
          • Attempt to refine the initial regions
              • Augment initial regions with missing regions
              • Reduce initial regions by removing noisy regions
  19. 19. Primary Object Region Refinement
      • Augmented regions
          • Augment the initial region $q_\delta^{(t)}$ with a candidate region $q_i^{(t)}$ in $\mathcal{Q}^{(t)}$: $r_i^{(t)} = q_\delta^{(t)} \cup q_i^{(t)}$
      • Reduced regions
          • Reduce the initial region $q_\delta^{(t)}$ using a candidate region $q_j^{(t)}$ in $\mathcal{Q}^{(t)}$: $r_j^{(t)} = q_\delta^{(t)} \cap q_j^{(t)}$
  20. 20. Primary Object Region Refinement
      • Augmentation and reduction process (ARP)
          • Determine whether to augment or reduce $q_\delta^{(t)}$ using a cost function: $C\big(r_i^{(t)}\big) = C_{\mathrm{data}}\big(r_i^{(t)}\big) + \gamma \cdot C_{\mathrm{seg}}\big(r_i^{(t)}\big)$
          • Data cost
              • Constrain the refined region $r_i^{(t)}$ to be similar to the initial regions in all frames
              • $C_{\mathrm{data}}\big(r_i^{(t)}\big) = \frac{1}{T} \sum_{\tau=1}^{T} d_\chi\big(\mathbf{f}_{\mathrm{r},i}^{(t)}, \mathbf{f}_\delta^{(\tau)}\big)$
          • Segmentation cost
              • Make the refined region $r_i^{(t)}$ as dissimilar from its nearby background as possible
              • $C_{\mathrm{seg}}\big(r_i^{(t)}\big) = -d_\chi\big(\mathbf{f}_{\mathrm{r},i}^{(t)}, \mathbf{f}_{\mathrm{b},i}^{(t)}\big)$
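
The cost on this slide can be written as a small Python function. This is only a sketch: $d_\chi$ is assumed to be the common chi-square histogram distance, the default `gamma=1.0` is a placeholder rather than a value from the slides, and the names `chi_square` and `arp_cost` are hypothetical.

```python
def chi_square(f, g, eps=1e-12):
    """Chi-square distance between two histograms (assumed form of d_chi)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(f, g))


def arp_cost(f_region, f_background, main_features, gamma=1.0):
    """C = C_data + gamma * C_seg for one refined region.

    f_region: feature of the refined region.
    f_background: feature of the background near the refined region.
    main_features: features of the initial (main) regions, one per frame.
    """
    # Data cost: average distance to the initial regions of all frames.
    c_data = sum(chi_square(f_region, f) for f in main_features) / len(main_features)
    # Segmentation cost: negative distance, so dissimilarity lowers the cost.
    c_seg = -chi_square(f_region, f_background)
    return c_data + gamma * c_seg
```

A region whose feature matches the main regions and differs strongly from its surrounding background gets a low, possibly negative, cost.
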
  21. 21. Primary Object Region Refinement
      • Augmentation and reduction process (ARP)
          • Minimize the cost function to find the optimal refined region: $r_*^{(t)} = \arg\min_{r_i^{(t)}} C\big(r_i^{(t)}\big)$
          • Perform ARP iteratively
              • Construct the set of augmented and reduced regions again, using $r_*^{(t)}$ as the initial region
              • Find the optimal $r_*^{(t)}$ by minimizing $C\big(r_i^{(t)}\big)$
              • Repeat until $r_*^{(t)}$ is unchanged
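
The iteration can be sketched as a loop over set operations. Regions are modeled as Python sets of pixel or superpixel ids, augmentation as set union, and reduction as set intersection, following the formulas on the earlier slide. The function name `arp`, the callable `cost`, the cap `max_iters`, and the choice to keep the current region among the options (so the loop stops as soon as nothing improves) are assumptions for illustration.

```python
def arp(initial, candidates, cost, max_iters=100):
    """Iteratively augment or reduce a region until the best region is unchanged.

    initial: iterable of ids forming the initial region.
    candidates: iterable of candidate regions (each an iterable of ids).
    cost: callable mapping a frozenset region to a number (lower is better).
    """
    best = frozenset(initial)
    for _ in range(max_iters):
        # Assumption: the current region competes with its refinements.
        options = [best]
        for q in candidates:
            q = frozenset(q)
            options.append(best | q)        # augmented region
            reduced = best & q              # reduced region
            if reduced:
                options.append(reduced)
        new_best = min(options, key=cost)
        if new_best == best:
            break
        best = new_best
    return best
```

With a toy cost that counts the ids that differ from a known target, for example `lambda r: len(r ^ {1, 2, 3})`, the loop first adds the missing id and then drops the spurious one.
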
  22. 22. Primary Object Region Refinement
      • Augmentation and reduction process (ARP)
  23. 23. Experimental results
      • DAVIS dataset [2]
          • 50 video sequences (3,455 annotated frames)
      • Performance measures
          • Region similarity $\mathcal{J}$: intersection over union
          • Contour accuracy $\mathcal{F}$: F-measure, the harmonic mean of the contour precision and recall rates
      [2] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” CVPR, 2016.
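
Both measures reduce to short formulas, sketched here in Python. Masks are represented as sets of pixel coordinates, the contour precision and recall are assumed to be computed elsewhere, and the function names and the convention for two empty masks are assumptions of this example, not part of the benchmark code.

```python
def region_similarity(pred_mask, gt_mask):
    """J: intersection over union of two masks given as sets of (x, y) pixels."""
    pred_mask, gt_mask = set(pred_mask), set(gt_mask)
    union = pred_mask | gt_mask
    if not union:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return len(pred_mask & gt_mask) / len(union)


def contour_accuracy(precision, recall):
    """F: harmonic mean of contour precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```
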
  24. 24. Experimental results
      • Impacts of ARP
          • Compare ARP with the conventional refinement techniques [20, 36]
          • Apply the refinement techniques to our initial regions (IR)
      [20] A. Papazoglou and V. Ferrari, “Fast object segmentation in unconstrained video,” ICCV, 2013.
      [36] D. Zhang, O. Javed, and M. Shah, “Video object segmentation through spatially accurate and temporally dense extraction of primary object regions,” CVPR, 2013.
  25. 25. Experimental results
      • Quantitative comparison
          • Semi-supervised: human annotation at the first frame
          • Multiple VOS: outputs multiple objects
          • POS: outputs the primary object
  26. 26. Experimental results
      • Qualitative results
  27. 27. Multiple Object Segmentation
  28. 28. Multiple Object Segmentation
      • Multiple object segmentation
          • Motion segmentation
              • Cluster point trajectories in a video
          • Video object proposal
              • Proposal matching
              • Proposal clustering
          • Segmentation guided by object detection and tracking
  29. 29. CDTS: Collaborative Detection, Tracking, and Segmentation for Online Multiple Object Segmentation in Videos
      • Overview
          • Input: a set of consecutive video frames
          • Output: multiple segment tracks
      • (Figure: input frames; object track generation by joint detection and tracking; detection and tracking results; ASE segmentation)
  30. 30. Object Track Generation
      • Joint detection and tracking
          • Detector [3]
              • Find object locations without manual annotations
              • Some objects may remain undetected
          • Tracker [4]
              • Boost the recall rate of objects using temporal correlations
          • Three cases
              • Both detection and tracking boxes
              • Only a detection box
              • Only a tracking box
      [3] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks,” NIPS, 2016.
      [4] H.-U. Kim, D.-Y. Lee, J.-Y. Sim, and C.-S. Kim, “SOWP: Spatially ordered and weighted patch descriptor for visual tracking,” ICCV, 2015.
  31. 31. Object Track Generation
      • Joint detection and tracking
          • Both detection and tracking boxes
              • Match detection and tracking boxes using the Hungarian algorithm
              • Choose the more accurate box for each matching pair
              • Link the selected box to the corresponding object track
          • Unmatched detection box
              • Regard it as a newly appearing object
          • Unmatched tracking box
              • Link it to the corresponding object track
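
The three cases can be illustrated with a self-contained Python sketch. Two things here are assumptions, not details from the slides: boxes are scored by intersection over union with a hypothetical threshold `min_iou`, and the assignment is found by exhaustive search over permutations instead of the Hungarian algorithm. The exhaustive search returns the same optimum but is only practical for a handful of boxes; `scipy.optimize.linear_sum_assignment` is one standard alternative for larger inputs.

```python
from itertools import permutations


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(detections, tracks, min_iou=0.5):
    """Return (matched pairs, unmatched detections, unmatched tracking boxes)."""
    swap = len(detections) > len(tracks)
    small, large = (tracks, detections) if swap else (detections, tracks)

    # Brute-force stand-in for the Hungarian algorithm: maximize total IoU.
    best_score, best_perm = -1.0, ()
    for perm in permutations(range(len(large)), len(small)):
        score = sum(box_iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm

    pairs = []
    for i, j in enumerate(best_perm):
        d, t = (j, i) if swap else (i, j)
        # Assumed rule: a pair counts as matched only above the IoU threshold.
        if box_iou(detections[d], tracks[t]) >= min_iou:
            pairs.append((d, t))

    matched_d = {d for d, _ in pairs}
    matched_t = {t for _, t in pairs}
    new_objects = [d for d in range(len(detections)) if d not in matched_d]
    track_only = [t for t in range(len(tracks)) if t not in matched_t]
    return pairs, new_objects, track_only
```

Matched pairs correspond to the first case, `new_objects` to detections that start a new object track, and `track_only` to tracking boxes that simply extend their existing track. How the "more accurate" box of a matched pair is chosen is not specified on the slide and is left out of the sketch.
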
  32. 32. ASE Segmentation
      • Alternate shrinking and expansion (ASE)
          • Over-segment each frame into superpixels
          • Dichotomize each superpixel within and near the box into either the foreground or the background class
  33. 33. ASE Segmentation
      • Over-segmentation
          • Obtain superpixels using UCM
      • Preliminary classification
          • Exploit the overlap ratio between the box and each superpixel
      • Refine preliminary foreground regions
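
A minimal Python sketch of the preliminary classification, assuming the overlap ratio is the fraction of a superpixel's pixels that fall inside the box and that a superpixel is labeled foreground when this ratio reaches a threshold. The threshold value of 0.5 and the function names are placeholders; the slides do not give the exact rule.

```python
def overlap_ratio(superpixel, box):
    """Fraction of a superpixel's (x, y) pixels inside box = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    inside = sum(1 for (x, y) in superpixel if x1 <= x < x2 and y1 <= y < y2)
    return inside / len(superpixel)


def preliminary_classification(superpixels, box, threshold=0.5):
    """Map each superpixel id to True (foreground) or False (background)."""
    return {sid: overlap_ratio(pixels, box) >= threshold
            for sid, pixels in superpixels.items()}
```
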
  34. 34. ASE Segmentation
      • Intra-frame refinement
          • Constrain foreground regions to have intense edge strengths
          • Boundary cost: $C_{\mathrm{bnd}}\big(F_i^{(t)}\big) = -\sum_{\mathbf{x} \in \partial F_i^{(t)}} U^{(t)}(\mathbf{x})$
          • Shrink foreground regions by removing superpixels to minimize the boundary cost in a greedy manner
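
The greedy shrinking step can be sketched on a small label map. In this illustration the boundary of a foreground region is taken to be the foreground pixels that have a 4-connected neighbor outside the region, `edge` plays the role of the edge strength map $U^{(t)}$, and the function names are hypothetical. The slides do not specify the boundary definition or the stopping rule, so both are assumptions.

```python
def boundary_cost(labels, edge, fg):
    """Negative sum of edge strengths over the boundary pixels of the foreground.

    labels: 2D list of superpixel ids; edge: 2D list of edge strengths;
    fg: set of superpixel ids currently labeled foreground.
    """
    h, w = len(labels), len(labels[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            if labels[y][x] not in fg:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] not in fg:
                    total += edge[y][x]   # count each boundary pixel once
                    break
    return -total


def shrink(labels, edge, fg):
    """Greedily remove the superpixel whose removal lowers the cost the most."""
    fg = set(fg)
    cost = boundary_cost(labels, edge, fg)
    while len(fg) > 1:
        trials = [(boundary_cost(labels, edge, fg - {s}), s) for s in sorted(fg)]
        best_cost, best_s = min(trials)
        if best_cost >= cost:
            break                          # no removal improves the cost
        fg.remove(best_s)
        cost = best_cost
    return fg
```
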
  35. 35. ASE Segmentation
      • Inter-frame refinement
          • Constrain the refined region to be similar to the segmentation results in previous frames
          • Cost function: $C_{\mathrm{inter}}\big(F_i^{(t)}, \mathcal{B}_i^{(t)}\big) = \alpha \cdot C_{\mathrm{tmp}}\big(F_i^{(t)}\big) + C_{\mathrm{seg}}\big(F_i^{(t)}, \mathcal{B}_i^{(t)}\big) + C_{\mathrm{bnd}}\big(F_i^{(t)}\big)$
          • Expand foreground regions by adding superpixels
          • Perform shrinking in a similar way
  36. 36. ASE Segmentation
  37. 37. Experimental Results
      • YouTube-Objects dataset
          • Contains 126 videos for 10 object classes
      • Performance measure
          • Intersection over union (IoU)
      [34] Y.-H. Tsai, G. Zhong, and M.-H. Yang, “Semantic cosegmentation in videos,” ECCV, 2016.
      [42] Y. Zhang, X. Chen, J. Li, C. Wang, and C. Xia, “Semantic object segmentation via detection in weakly labeled video,” CVPR, 2015.
  38. 38. Experimental results
      • Qualitative results
  39. 39. Q&A • Thank you