Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
Computer Vision Crash Course
Jia-Bin Huang
University of Illinois, Urbana-Champaign
www.jiabinhuang.com Jan 12, 2016
Overview
• A little about me
• Introduction to Computer Vision
=== Intermission ===
• Fundamentals and Applications
• Reso...
About me
• Born in Kaohsiung, raised in Taipei
About me
National Chiao-Tung University
B.S. in EE
IIS, Academia Sinica
Research Assistant
UIUC
Ph.D. in ECE
(Expected sum...
My research: Computational Photography
Image Completion Image super-resolution
My Research: Video-based Bird Migration Monitoring
My Research: Visual Tracking
Object Tracking Multi-face Tracking
What is Computer Vision?
• Make computers understand images and videos.
• What kind of scene?
• Where are the cars?
• How ...
What is Computer Vision?
• Make computers understand images and videos.
• What are they doing?
• Why is this happening?
• ...
Computer Vision and Nearby Fields
Images (2D)
Geometry (3D)
Shape
Photometry
Appearance
Digital Image Processing
Computati...
Visual data on the Internet
• Flickr
• 10+ billion photographs
• 60 million images uploaded a month
• Facebook
• 250 billi...
Too big for humans
• Need automatic tools to access and analyze visual data!
http://www.petittube.com/
Slide credit: Alexei A. Efros
Vision is Really Hard
• Vision is an amazing feature of natural intelligence
• Visual cortex occupies about 50% of Macaque...
Why is Computer Vision Hard?
Why is Computer Vision Hard?
What did you see?
• Where this picture was taken?
• How many people are there?
• What are they doing?
• What object the pe...
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Why is Computer Vision Hard?
Computer: okay, it’s a funny picture
Computer Vision Matters
Safety Health Security
Comfort AccessFun
History of Computer Vision
Marvin Minsky, MIT
Turing award, 1969
“In 1966, Minsky hired a first-year
undergraduate student...
Half a century later,
we're still working on it.
History of Computer Vision
Marvin Minsky, MIT
Turing award, 1969
Gerald Sussman, MIT
“You’ll notice that Sussman never wor...
1960’s: interpretation of synthetic worlds
Larry Roberts
“Father of Computer Vision”
Larry Roberts PhD Thesis, MIT, 1963,
...
1970’s: some progress on interpreting selected images
The representation and matching of pictorial structures
Fischler and...
1970’s: some progress on interpreting selected images
The representation and matching of pictorial structures
Fischler and...
1980’s: ANNs come and go; shift toward geometry
and increased mathematical rigor
Image credit: Rick Szeliski
Goodbye
science
1990’s: face recognition; statistical analysis in vogue
Image credit: Rick Szeliski
2000’s: broader recognition; large annotated
datasets available; video processing starts
Image credit: Rick Szeliski
2010’s: resurgence of deep learning
[AlexNet NIPS 2012] [DeepFace CVPR 2014]
[DeepPose CVPR 2014] [Show, Attend and Tell I...
2020’s: autonomous vehicles
2030’s: robot uprising?
5-mins break
Examples of Computer Vision Applications
• How is computer vision used today?
Face detection
• Most digital cameras and smart phones detect faces (and more)
• Canon, Sony, Fuji, …
• For smart focus, e...
Face recognition
Facebook face auto-tagging
Face Landmark Alignment
TAAZ.com: Virtual makeover Face morphing
Jia-Bin Huang
Miss Daegu 2013 Contestants Face Morphing
Face Landmark Alignment- Pose Estimation
Pitch: ~ 15 degree
Yaw: ~ +20, -20 degree
Jia-Bin Huang What's the Best Pose for ...
Face Landmark Alignment – 3D Persona
What Makes Tom Hanks Look Like Tom Hanks ICCV 2015
Video-based face replacement
Smile Detection
Sony Cyber-shot® T70 Digital Still Camera Slide credit: Steve Seitz
Eye contact detection
Gaze locking, UIST 2013
Vision-based Biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story wikipedia
Slide credit: S...
Vision-based Biometrics
Optical Character Recognition (OCR)
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
License plate readers
...
Computer vision in sports
Hawk-Eye: helping/improving referee decisions
Computer vision in sports
SportVision: improving viewer experiences
Computer vision in sports
Replay Technologies: improving viewer experiences
Computer vision in sports
Play tracking
Computer vision in sports
Second Spectrum: visual analytics
Visual recognition for photo organization
Google photo
Matching street clothing photos in online Shops
Street2shop, ICCV 2015
3D scanner from a mobile camera
MobileFusion
Earth viewers (3D modeling)
Image from Microsoft’s Virtual Earth
(see also: Google Earth)
Slide credit: Steve Seitz
3D from thousands of images
Building Rome in a Day: Agarwal et al. 2009
3D from thousands of images
[Furukawa et al. CVPR 2010]
Microsoft PhotoSynth: Photo Tourism
MS PhotoSynth in CSI
Indoor Scene Reconstruction
Structured Indoor Modeling, ICCV2015
Tour into picture
Adobe Affer Effect
First-person Hyperlapse Videos
[Kopf et al. SIGGRAPH 2014]
3D Time-lapse from Internet Photos
3D Time-lapse from Internet Photos, ICCV 2015
Special effects: matting and composition
Kylie Minogue - Come Into My World
Style transfer
A Neural Algorithm of Artistic Style [Gatys et al. 2015]
Target image (Content)Source image (Style) Output ...
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Special effects: shape capture
Slide credit: Steve Seitz
Pirates of the Carribean, Industrial Light and Magic
Special effects: motion capture
Slide credit: Steve Seitz
Google cars
Google in talks with Ford, Toyota and Volkswagen to realise driverless cars
http://www.theatlantic.com/technol...
Interactive Games: Kinect
• Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www....
Vision in space
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detectio...
Industrial robots
Vision-guided robots position nut runners on wheels
http://www.automationworld.com/computer-vision-oppor...
Mobile robots
http://www.robocup.org/NASA’s Mars Spirit Rover
Saxena et al. 2008
STAIR at Stanford
http://www.youtube.com/...
Medical imaging
Image guided surgery
Grimson et al., MIT3D imaging
MRI, CT
Computer vision for healthcare
BiliCam, Ubicomp 2014
Computer vision for healthcare
Autism Screening
Computer vision for healthcare
Video magnification
Computer vision for the mass
Counting cells
Analyzing social effects of green space
20-mins coffee break
Fundamentals of Computer Vision
• Light
– What an image records
• Matching
– How to measure the similarity of two regions
...
Image Formation
Digital camera
A digital camera replaces film with a sensor array
• Each cell in the array is light-sensitive diode that c...
Sensor Array
CMOS sensor
CCD sensor
0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0....
Perception of Intensity
from Ted Adelson
• A is brighter?
• B is brighter?
• They are equally bright
Perception of Intensity
from Ted Adelson
Digital Color Images
Color Image R
G
B
Image filtering
• Linear filtering: function is a weighted sum/difference of pixel
values
• Really important!
• Enhance im...
111
111
111
Slide credit: David Lowe (UBC)
],[g 
Example: box filter
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 9...
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 9...
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 9...
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 9...
0 10 20 30 30
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 ...
0 10 20 30 30
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 ...
0 10 20 30 30
50
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 ...
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 90 90 90 90 90 0 0
0 0 0 9...
What does it do?
• Replaces each pixel with an
average of its neighborhood
• Achieve smoothing effect
(remove sharp featur...
Smoothing with box filter
Practice with linear filters
000
010
000
Original
?
Source: D. Lowe
Practice with linear filters
000
010
000
Original Filtered
(no change)
Source: D. Lowe
Practice with linear filters
000
100
000
Original
?
Source: D. Lowe
Practice with linear filters
000
100
000
Original Shifted left
By 1 pixel
Source: D. Lowe
Practice with linear filters
Original
111
111
111
000
020
000
- ?
(Note that filter sums to 1)
Source: D. Lowe
Practice with linear filters
Original
111
111
111
000
020
000
-
Sharpening filter
- Accentuates differences with local
ave...
Sharpening
Source: D. Lowe
Other filters
-101
-202
-101
Vertical Edge
(absolute value)
Sobel
Other filters
-1-2-1
000
121
Horizontal Edge
(absolute value)
Sobel
Basic gradient filters
000
10-1
000
010
000
0-10
10-1
or
Horizontal Gradient Vertical Gradient
1
0
-1
or
Demo - image filtering
Correspondence and alignment
• Correspondence: matching points, patches, edges, or regions across
images
≈
Correspondence and alignment
• Alignment: solving the transformation that makes two things match
better
A
Example: fitting an 2D shape template
Slide from Silvio Savarese
Example: fitting a 3D object model
Slide from Silvio Savarese
Example: estimating “fundamental matrix” that
corresponds two views
Slide from Silvio Savarese
Example: tracking points
frame 0 frame 22 frame 49
x x
x
Overview of Keypoint Matching
K. Grauman, B. Leibe
Af Bf
A1
A2 A3
Tffd BA ),(
1. Find a set of
distinctive key-points
3. ...
Goals for Keypoints
Detect points that are repeatable and distinctive
Key trade-offs
More Repeatable More Points
A1
A2 A3
Detection
More Distinctive More Flexible
Description
Robust to occlusi...
Choosing interest points
Where would you tell your
friend to meet you?
Choosing interest points
Where would you tell your
friend to meet you?
Local Descriptors: SIFT Descriptor
[Lowe, ICCV 1999]
Histogram of oriented
gradients
• Captures important texture
informat...
Examples of recognized objects
Demo – interest point detection
Single-view Geometry
How tall is this woman?
Which ball is closer?
How high is the camera?
What is the camera
rotation?
Wh...
Single-view Geometry
Mapping between image and world coordinates
• Pinhole camera model
• Projective geometry
• Vanishing ...
Image formation
Let’s design a camera
– Idea 1: put a piece of film in front of an object
– Do we get a reasonable image?
...
Pinhole camera
Idea 2: add a barrier to block off most of the rays
• This reduces blurring
• The opening known as the aper...
Pinhole camera
Figure from David Forsyth
f
f = focal length
c = center of the camera
c
Figures © Stephen E. Palmer, 2002
Dimensionality Reduction Machine (3D to 2D)
Point of observation
3D world 2D image
Projection can be tricky…
Slide source: Seitz
Projection can be tricky…
Slide source: Seitz
Making of 3D sidewalk art: http://www.youtube.com/watch?v=3SNYtd0Ayt0
Vanishing points and lines
Parallel lines in the world intersect in the image at a “vanishing point”
Vanishing points and lines
Vanishing
point
Vanishing
line
Vanishing
point
Vertical vanishing
point
(at infinity)
Slide fro...
Comparing heights
Vanishing
Point
Slide by Steve Seitz
142
Measuring height
1
2
3
4
5
5.3
2.8
3.3
Camera height
Slide by Steve Seitz
143
Image Stitching
• Combine two or more overlapping images to make one larger image
Add example
Slide credit: Vaibhav Vaish
Illustration
Camera Center
RANSAC for Homography
Initial Matched Points
Final Matched Points
RANSAC for Homography
RANSAC for Homography
Bundle Adjustment
• New images initialised with rotation, focal length of best matching
image
Bundle Adjustment
• New images initialised with rotation, focal length of best matching
image
Details to make it look good
• Choosing seams
• Blending
Choosing seams
• Better method: dynamic program to find seam along well-matched
regions
Illustration: http://en.wikipedia....
Gain compensation
• Simple gain adjustment
• Compute average RGB intensity of each image in overlapping region
• Normalize...
Multi-band Blending
• Burt & Adelson 1983
• Blend frequency bands over range  l
Blending comparison (IJCV 2007)
Demo – feature matching and image stitching
Depth from Stereo
• Goal: recover depth by finding image coordinate x’ that corresponds
to x
f
x x’
Baseline
B
z
C C’
X
f
...
Correspondence Problem
• We have two images taken from cameras with different intrinsic and extrinsic
parameters
• How do ...
Potential matches for x have to lie on the corresponding line l’.
Potential matches for x’ have to lie on the correspondin...
Basic stereo matching algorithm
•If necessary, rectify the two stereo images to transform epipolar lines into
scanlines
•F...
800 most confident matches among
10,000+ local features.
Epipolar lines
Keep only the matches at are “inliers” with
respect to the “best” fundamental matrix
The Fundamental Matrix Song
5-mins break
What do you see in this image?
Can I put stuff in it?
Forest
Trees
Bear
Man
Rabbit Grass
Camera
Describe, predict, or interact with the
object based on visual cues
Is it alive?
Is it dangerous?
How fast does it run? Is...
Why do we care about categories?
• From an object’s category, we can make predictions about its behavior in the future,
be...
Image categorization: Fine-grained recognition
Visipedia Project
Place recognition
Places Database [Zhou et al. NIPS 2014]
Image style recognition
[Karayev et al. BMVC 2014]
Training phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Trained
Classifier
Testing phase
Training
Labels
Training
Images
Classifier
Training
Training
Image
Features
Image
Features
Testing
Test Imag...
Q: What are good features for…
• recognizing a beach?
Q: What are good features for…
• recognizing cloth fabric?
Q: What are good features for…
• recognizing a mug?
What are the right features?
Depend on what you want to know!
• Object: shape
• Local shape info, shading, shadows, textur...
“Shallow” vs. “deep” architectures
Hand-designed
feature extraction
Trainable
classifier
Image/ Video
Pixels Object
Class
...
Object Detection
Search by Sliding Window Detector
• May work well for rigid objects
• Key idea: simple alignment for simp...
Object Detection
Search by Parts-based model
• Key idea: more flexible alignment for articulated objects
• Defined by mode...
Demo – face detection and alignment
Vision as part of an intelligent system
3D Scene
Feature
Extraction
Interpretation
Action
Texture Color
Optical
Flow
Stere...
Well-Established
(patch matching)
Major Progress
(pattern matching++)
New Opportunities
(interpretation/tasks)
Multi-view ...
This course has provided fundamentals
• What is computer vision?
• Examples of computer vision applications
• Basic princi...
How do you learn more?
Explore!
Resources – where to learn more
• Awesome Computer Vision
• A curated list of awesome computer vision resources
• Computer...
Thank You!
Image: www.clubtuzki.com
Computer Vision Crash Course
Computer Vision Crash Course
Computer Vision Crash Course
Computer Vision Crash Course
Computer Vision Crash Course
Próxima SlideShare
Cargando en…5
×

Computer Vision Crash Course

14.394 visualizaciones

Publicado el

電腦視覺一二三

電腦視覺旨在發展演算法使得電腦能理解影像的內容,近年來電腦視覺相關的技術已廣泛應用於我們生活中,舉凡物件偵測,識別,追蹤,三維重建,多媒體分析以及檢索,監控系統,醫療影像,以及電視電影中的許多視覺效果都可以看到電腦視覺技術的應用。

這場演講的目的在於介紹電腦視覺中的基本觀念和核心技術,透過大量實際的範例幫助大家快速了解這些技術如何被應用在日常生活中,以期讓聽眾有效率地了解這個領域,最新的發展以及未來展望。

Publicado en: Tecnología
  • The #1 Woodworking Resource With Over 16,000 Plans, Download 50 FREE Plans... ▲▲▲ http://tinyurl.com/y3hc8gpw
       Responder 
    ¿Estás seguro?    No
    Tu mensaje aparecerá aquí
  • There are over 16,000 woodworking plans that comes with step-by-step instructions and detailed photos, Click here to take a look  http://tinyurl.com/yy9yh8fu
       Responder 
    ¿Estás seguro?    No
    Tu mensaje aparecerá aquí
  • http://applite.eu/page-views-or-unique-visitors-whats-more-important-817.html
       Responder 
    ¿Estás seguro?    No
    Tu mensaje aparecerá aquí

Computer Vision Crash Course

  1. 1. Computer Vision Crash Course Jia-Bin Huang University of Illinois, Urbana-Champaign www.jiabinhuang.com Jan 12, 2016
  2. 2. Overview • A little about me • Introduction to Computer Vision === Intermission === • Fundamentals and Applications • Resources
  3. 3. About me • Born in Kaohsiung, raised in Taipei
  4. 4. About me National Chiao-Tung University B.S. in EE IIS, Academia Sinica Research Assistant UIUC Ph.D. in ECE (Expected summer 2016) UC, Merced Visiting Student Microsoft Research Research Intern 2012, 2013 Disney Research Research Intern 2014
  5. 5. My research: Computational Photography Image Completion Image super-resolution
  6. 6. My Research: Video-based Bird Migration Monitoring
  7. 7. My Research: Visual Tracking Object Tracking Multi-face Tracking
  8. 8. What is Computer Vision? • Make computers understand images and videos. • What kind of scene? • Where are the cars? • How far is the building?
  9. 9. What is Computer Vision? • Make computers understand images and videos. • What are they doing? • Why is this happening? • What is important? • What will I see?
  10. 10. Computer Vision and Nearby Fields Images (2D) Geometry (3D) Shape Photometry Appearance Digital Image Processing Computational Photography Computer Graphics Computer Vision Machine learning: Vision = Machine learning applied to visual data
  11. 11. Visual data on the Internet • Flickr • 10+ billion photographs • 60 million images uploaded a month • Facebook • 250 billion+ • 300 million a day • Instagram • 55 million a day • YouTube • 100 hours uploaded every minute 90% of net traffic will be visual! Mostly about cats
  12. 12. Too big for humans • Need automatic tools to access and analyze visual data! http://www.petittube.com/
  13. 13. Slide credit: Alexei A. Efros
  14. 14. Vision is Really Hard • Vision is an amazing feature of natural intelligence • Visual cortex occupies about 50% of Macaque brain • More human brain devoted to vision than anything else Is that a queen or a bishop?
  15. 15. Why is Computer Vision Hard?
  16. 16. Why is Computer Vision Hard?
  17. 17. What did you see? • Where this picture was taken? • How many people are there? • What are they doing? • What object the person on the left standing on? • Why this is a funny picture?
  18. 18. Why is Computer Vision Hard?
  19. 19. Why is Computer Vision Hard?
  20. 20. Why is Computer Vision Hard?
  21. 21. Why is Computer Vision Hard?
  22. 22. Why is Computer Vision Hard?
  23. 23. Why is Computer Vision Hard?
  24. 24. Computer: okay, it’s a funny picture
  25. 25. Computer Vision Matters Safety Health Security Comfort AccessFun
  26. 26. History of Computer Vision Marvin Minsky, MIT Turing award, 1969 “In 1966, Minsky hired a first-year undergraduate student and assigned him a problem to solve over the summer: connect a camera to a computer and get the machine to describe what it sees.” Crevier 1993, pg. 88
  27. 27. Half a century later, we're still working on it.
  28. 28. History of Computer Vision Marvin Minsky, MIT Turing award, 1969 Gerald Sussman, MIT “You’ll notice that Sussman never worked in vision again!” – Berthold Horn
  29. 29. 1960’s: interpretation of synthetic worlds Larry Roberts “Father of Computer Vision” Larry Roberts PhD Thesis, MIT, 1963, Machine Perception of Three-Dimensional Solids Input image 2x2 gradient operator computed 3D model rendered from new viewpoint Slide credit: Steve Seitz
  30. 30. 1970’s: some progress on interpreting selected images The representation and matching of pictorial structures Fischler and Elschlager, 1973
  31. 31. 1970’s: some progress on interpreting selected images The representation and matching of pictorial structures Fischler and Elschlager, 1973
  32. 32. 1980’s: ANNs come and go; shift toward geometry and increased mathematical rigor Image credit: Rick Szeliski
  33. 33. Goodbye science
  34. 34. 1990’s: face recognition; statistical analysis in vogue Image credit: Rick Szeliski
  35. 35. 2000’s: broader recognition; large annotated datasets available; video processing starts Image credit: Rick Szeliski
  36. 36. 2010’s: resurgence of deep learning [AlexNet NIPS 2012] [DeepFace CVPR 2014] [DeepPose CVPR 2014] [Show, Attend and Tell ICML 2015]
  37. 37. 2020’s: autonomous vehicles
  38. 38. 2030’s: robot uprising?
  39. 39. 5-mins break
  40. 40. Examples of Computer Vision Applications • How is computer vision used today?
  41. 41. Face detection • Most digital cameras and smart phones detect faces (and more) • Canon, Sony, Fuji, … • For smart focus, exposure compensation, and cropping Slide credit: Steve Seitz
  42. 42. Face recognition Facebook face auto-tagging
  43. 43. Face Landmark Alignment TAAZ.com: Virtual makeover Face morphing Jia-Bin Huang Miss Daegu 2013 Contestants Face Morphing
  44. 44. Face Landmark Alignment- Pose Estimation Pitch: ~ 15 degree Yaw: ~ +20, -20 degree Jia-Bin Huang What's the Best Pose for a Selfie?
  45. 45. Face Landmark Alignment – 3D Persona What Makes Tom Hanks Look Like Tom Hanks ICCV 2015
  46. 46. Video-based face replacement
  47. 47. Smile Detection Sony Cyber-shot® T70 Digital Still Camera Slide credit: Steve Seitz
  48. 48. Eye contact detection Gaze locking, UIST 2013
  49. 49. Vision-based Biometrics “How the Afghan Girl was Identified by Her Iris Patterns” Read the story wikipedia Slide credit: Steve Seitz
  50. 50. Vision-based Biometrics
  51. 51. Optical Character Recognition (OCR) Digit recognition, AT&T labs http://www.research.att.com/~yann/ License plate readers http://en.wikipedia.org/wiki/Automatic_number_plate_recognition • Technology to convert scanned docs to text • If you have a scanner, it probably came with OCR software Slide credit: Steve Seitz
  52. 52. Computer vision in sports Hawk-Eye: helping/improving referee decisions
  53. 53. Computer vision in sports SportVision: improving viewer experiences
  54. 54. Computer vision in sports Replay Technologies: improving viewer experiences
  55. 55. Computer vision in sports Play tracking
  56. 56. Computer vision in sports Second Spectrum: visual analytics
  57. 57. Visual recognition for photo organization Google photo
  58. 58. Matching street clothing photos in online Shops Street2shop, ICCV 2015
  59. 59. 3D scanner from a mobile camera MobileFusion
  60. 60. Earth viewers (3D modeling) Image from Microsoft’s Virtual Earth (see also: Google Earth) Slide credit: Steve Seitz
  61. 61. 3D from thousands of images Building Rome in a Day: Agarwal et al. 2009
  62. 62. 3D from thousands of images [Furukawa et al. CVPR 2010]
  63. 63. Microsoft PhotoSynth: Photo Tourism
  64. 64. MS PhotoSynth in CSI
  65. 65. Indoor Scene Reconstruction Structured Indoor Modeling, ICCV2015
  66. 66. Tour into picture Adobe Affer Effect
  67. 67. First-person Hyperlapse Videos [Kopf et al. SIGGRAPH 2014]
  68. 68. 3D Time-lapse from Internet Photos 3D Time-lapse from Internet Photos, ICCV 2015
  69. 69. Special effects: matting and composition Kylie Minogue - Come Into My World
  70. 70. Style transfer A Neural Algorithm of Artistic Style [Gatys et al. 2015] Target image (Content)Source image (Style) Output (deepart)
  71. 71. The Matrix movies, ESC Entertainment, XYZRGB, NRC Special effects: shape capture Slide credit: Steve Seitz
  72. 72. Pirates of the Carribean, Industrial Light and Magic Special effects: motion capture Slide credit: Steve Seitz
  73. 73. Google cars Google in talks with Ford, Toyota and Volkswagen to realise driverless cars http://www.theatlantic.com/technology/archive/2014/05/all-the-world-a-track- the-trick-that-makes-googles-self-driving-cars-work/370871/
  74. 74. Interactive Games: Kinect • Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o • Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg • 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A • Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
  75. 75. Vision in space Vision systems (JPL) used for several tasks • Panorama stitching • 3D terrain modeling • Obstacle detection, position tracking • For more, read “Computer Vision on Mars” by Matthies et al. NASA'S Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
  76. 76. Industrial robots Vision-guided robots position nut runners on wheels http://www.automationworld.com/computer-vision-opportunity-or-threat
  77. 77. Mobile robots http://www.robocup.org/NASA’s Mars Spirit Rover Saxena et al. 2008 STAIR at Stanford http://www.youtube.com/w atch?v=DF39Ygp53mQ
  78. 78. Medical imaging Image guided surgery Grimson et al., MIT3D imaging MRI, CT
  79. 79. Computer vision for healthcare BiliCam, Ubicomp 2014
  80. 80. Computer vision for healthcare Autism Screening
  81. 81. Computer vision for healthcare Video magnification
  82. 82. Computer vision for the mass Counting cells Analyzing social effects of green space
  83. 83. 20-mins coffee break
  84. 84. Fundamentals of Computer Vision • Light – What an image records • Matching – How to measure the similarity of two regions • Alignment – How to align points/patches – How to recover transformation parameters based on matched points • Geometry – How to relate world coordinates and image coordinates • Grouping – What points/regions/lines belong together? • Categorization – What similarities are important? Slide credit: Derek Hoiem
  85. 85. Image Formation
  86. 86. Digital camera A digital camera replaces film with a sensor array • Each cell in the array is light-sensitive diode that converts photons to electrons • Two common types: • Charge Coupled Device (CCD): larger yet slower, better quality • Complementary Metal Oxide Semiconductor (CMOS): high bandwidth, lower quality • http://electronics.howstuffworks.com/digital-camera.htm Slide by Steve Seitz
  87. 87. Sensor Array CMOS sensor CCD sensor
  88. 88. 0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99 0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91 0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92 0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95 0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85 0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33 0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74 0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93 0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99 0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97 0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
  89. 89. Perception of Intensity from Ted Adelson • A is brighter? • B is brighter? • They are equally bright
  90. 90. Perception of Intensity from Ted Adelson
  91. 91. Digital Color Images
  92. 92. Color Image R G B
  93. 93. Image filtering • Linear filtering: function is a weighted sum/difference of pixel values • Really important! • Enhance images • Denoise, smooth, increase contrast, etc. • Extract information from images • Texture, edges, distinctive points, etc. • Detect patterns • Template matching
  94. 94. 111 111 111 Slide credit: David Lowe (UBC) ],[g  Example: box filter
  95. 95. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk   [.,.]h[.,.]f Image filtering 111 111 111 ],[g 
  96. 96. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  97. 97. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  98. 98. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  99. 99. 0 10 20 30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  100. 100. 0 10 20 30 30 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ? ],[],[],[ , lnkmflkgnmh lk  
  101. 101. 0 10 20 30 30 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ? ],[],[],[ , lnkmflkgnmh lk  
  102. 102. 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 90 0 90 90 90 0 0 0 0 0 90 90 90 90 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 20 30 30 30 20 10 0 20 40 60 60 60 40 20 0 30 60 90 90 90 60 30 0 30 50 80 80 90 60 30 0 30 50 80 80 90 60 30 0 20 30 50 50 60 40 20 10 20 30 30 30 30 20 10 10 10 10 0 0 0 0 0 [.,.]h[.,.]f Image filtering 111 111 111 ],[g  Credit: S. Seitz ],[],[],[ , lnkmflkgnmh lk  
  103. 103. What does it do? • Replaces each pixel with an average of its neighborhood • Achieve smoothing effect (remove sharp features) 111 111 111 Slide credit: David Lowe (UBC) ],[g  Box Filter
  104. 104. Smoothing with box filter
  105. 105. Practice with linear filters 000 010 000 Original ? Source: D. Lowe
  106. 106. Practice with linear filters 000 010 000 Original Filtered (no change) Source: D. Lowe
  107. 107. Practice with linear filters 000 100 000 Original ? Source: D. Lowe
  108. 108. Practice with linear filters 000 100 000 Original Shifted left By 1 pixel Source: D. Lowe
  109. 109. Practice with linear filters Original 111 111 111 000 020 000 - ? (Note that filter sums to 1) Source: D. Lowe
  110. 110. Practice with linear filters Original 111 111 111 000 020 000 - Sharpening filter - Accentuates differences with local average Source: D. Lowe
  111. 111. Sharpening Source: D. Lowe
  112. 112. Other filters -101 -202 -101 Vertical Edge (absolute value) Sobel
  113. 113. Other filters -1-2-1 000 121 Horizontal Edge (absolute value) Sobel
  114. 114. Basic gradient filters 000 10-1 000 010 000 0-10 10-1 or Horizontal Gradient Vertical Gradient 1 0 -1 or
  115. 115. Demo - image filtering
  116. 116. Correspondence and alignment • Correspondence: matching points, patches, edges, or regions across images ≈
  117. 117. Correspondence and alignment • Alignment: solving the transformation that makes two things match better
  118. 118. A Example: fitting an 2D shape template Slide from Silvio Savarese
  119. 119. Example: fitting a 3D object model Slide from Silvio Savarese
  120. 120. Example: estimating “fundamental matrix” that corresponds two views Slide from Silvio Savarese
  121. 121. Example: tracking points frame 0 frame 22 frame 49 x x x
  122. 122. Overview of Keypoint Matching K. Grauman, B. Leibe Af Bf A1 A2 A3 Tffd BA ),( 1. Find a set of distinctive key-points 3. Extract and normalize the region content 2. Define a region around each keypoint 4. Compute a local descriptor from the normalized region 5. Match local descriptors
  123. 123. Goals for Keypoints Detect points that are repeatable and distinctive
  124. 124. Key trade-offs More Repeatable More Points A1 A2 A3 Detection More Distinctive More Flexible Description Robust to occlusion Works with less texture Minimize wrong matches Robust to expected variations Maximize correct matches Robust detection Precise localization
  125. 125. Choosing interest points Where would you tell your friend to meet you?
  126. 126. Choosing interest points Where would you tell your friend to meet you?
  127. 127. Local Descriptors: SIFT Descriptor [Lowe, ICCV 1999] Histogram of oriented gradients • Captures important texture information • Robust to small translations / affine deformations K. Grauman, B. Leibe
  128. 128. Examples of recognized objects
  129. 129. Demo – interest point detection
  130. 130. Single-view Geometry How tall is this woman? Which ball is closer? How high is the camera? What is the camera rotation? What is the focal length of the camera?
  131. 131. Single-view Geometry Mapping between image and world coordinates • Pinhole camera model • Projective geometry • Vanishing points and lines • Projection matrix
  132. 132. Image formation Let’s design a camera – Idea 1: put a piece of film in front of an object – Do we get a reasonable image? Slide credit: Steve Seitz
  133. 133. Pinhole camera Idea 2: add a barrier to block off most of the rays • This reduces blurring • The opening known as the aperture Slide credit: Steve Seitz
  134. 134. Pinhole camera Figure from David Forsyth f f = focal length c = center of the camera c
  135. 135. Figures © Stephen E. Palmer, 2002 Dimensionality Reduction Machine (3D to 2D) Point of observation 3D world 2D image
  136. 136. Projection can be tricky… Slide source: Seitz
  137. 137. Projection can be tricky… Slide source: Seitz Making of 3D sidewalk art: http://www.youtube.com/watch?v=3SNYtd0Ayt0
  138. 138. Vanishing points and lines Parallel lines in the world intersect in the image at a “vanishing point”
  139. 139. Vanishing points and lines Vanishing point Vanishing line Vanishing point Vertical vanishing point (at infinity) Slide from Efros, Photo from Criminisi
  140. 140. Comparing heights Vanishing Point Slide by Steve Seitz 142
  141. 141. Measuring height 1 2 3 4 5 5.3 2.8 3.3 Camera height Slide by Steve Seitz 143
  142. 142. Image Stitching • Combine two or more overlapping images to make one larger image Add example Slide credit: Vaibhav Vaish
  143. 143. Illustration Camera Center
  144. 144. RANSAC for Homography Initial Matched Points
  145. 145. Final Matched Points RANSAC for Homography
  146. 146. RANSAC for Homography
  147. 147. Bundle Adjustment • New images initialised with rotation, focal length of best matching image
  148. 148. Bundle Adjustment • New images initialised with rotation, focal length of best matching image
  149. 149. Details to make it look good • Choosing seams • Blending
  150. 150. Choosing seams • Better method: dynamic program to find seam along well-matched regions Illustration: http://en.wikipedia.org/wiki/File:Rochester_NY.jpg
  151. 151. Gain compensation • Simple gain adjustment • Compute average RGB intensity of each image in overlapping region • Normalize intensities by ratio of averages
  152. 152. Multi-band Blending • Burt & Adelson 1983 • Blend frequency bands over range  l
  153. 153. Blending comparison (IJCV 2007)
  154. 154. Demo – feature matching and image stitching
  155. 155. Depth from Stereo • Goal: recover depth by finding image coordinate x’ that corresponds to x f x x’ Baseline B z C C’ X f X x x'
  156. 156. Correspondence Problem • We have two images taken from cameras with different intrinsic and extrinsic parameters • How do we match a point in the first image to a point in the second? How can we constrain our search? x ?
  157. 157. Potential matches for x have to lie on the corresponding line l’. Potential matches for x’ have to lie on the corresponding line l. Key idea: Epipolar constraint x x’ X x’ X x’ X
  158. 158. Basic stereo matching algorithm •If necessary, rectify the two stereo images to transform epipolar lines into scanlines •For each pixel x in the first image • Find corresponding epipolar scanline in the right image • Search the scanline and pick the best match x’ • Compute disparity x-x’ and set depth(x) = fB/(x-x’)
  159. 159. 800 most confident matches among 10,000+ local features.
  160. 160. Epipolar lines
  161. 161. Keep only the matches at are “inliers” with respect to the “best” fundamental matrix
  162. 162. The Fundamental Matrix Song
  163. 163. 5-mins break
  164. 164. What do you see in this image? Can I put stuff in it? Forest Trees Bear Man Rabbit Grass Camera
  165. 165. Describe, predict, or interact with the object based on visual cues Is it alive? Is it dangerous? How fast does it run? Is it soft? Does it have a tail? Can I poke with it?
  166. 166. Why do we care about categories? • From an object’s category, we can make predictions about its behavior in the future, beyond of what is immediately perceived. • Pointers to knowledge • Help to understand individual cases not previously encountered • Communication
  167. 167. Image categorization: Fine-grained recognition Visipedia Project
  168. 168. Place recognition Places Database [Zhou et al. NIPS 2014]
  169. 169. Image style recognition [Karayev et al. BMVC 2014]
  170. 170. Training phase Training Labels Training Images Classifier Training Training Image Features Trained Classifier
  171. 171. Testing phase Training Labels Training Images Classifier Training Training Image Features Image Features Testing Test Image Trained Classifier Outdoor PredictionTrained Classifier
  172. 172. Q: What are good features for… • recognizing a beach?
  173. 173. Q: What are good features for… • recognizing cloth fabric?
  174. 174. Q: What are good features for… • recognizing a mug?
  175. 175. What are the right features? Depend on what you want to know! • Object: shape • Local shape info, shading, shadows, texture • Scene : geometric layout • linear perspective, gradients, line segments • Material properties: albedo, feel, hardness • Color, texture • Action: motion • Optical flow, tracked points
  176. 176. “Shallow” vs. “deep” architectures Hand-designed feature extraction Trainable classifier Image/ Video Pixels Object Class Layer 1 Layer N Simple classifier Object Class Image/ Video Pixels Traditional recognition: “Shallow” architecture Deep learning: “Deep” architecture …
  177. 177. Object Detection Search by Sliding Window Detector • May work well for rigid objects • Key idea: simple alignment for simple deformations Object or Background?
  178. 178. Object Detection Search by Parts-based model • Key idea: more flexible alignment for articulated objects • Defined by models of part appearance, geometry or spatial layout, and search algorithm
  179. 179. Demo – face detection and alignment
  180. 180. Vision as part of an intelligent system 3D Scene Feature Extraction Interpretation Action Texture Color Optical Flow Stereo Disparity Grouping Surfaces Bits of objects Sense of depth Objects Agents and goals Shapes and properties Open paths Words Walk, touch, contemplate, smile, evade, read on, pick up, … Motion patterns
  181. 181. Well-Established (patch matching) Major Progress (pattern matching++) New Opportunities (interpretation/tasks) Multi-view Geometry Face Detection/Recognition Object Tracking / Flow Category Detection Human Pose 3D Scene Layout Vision for Robots Life-long Learning Entailment/Prediction Slide credit: Derek Hoiem
  182. 182. This course has provided fundamentals • What is computer vision? • Examples of computer vision applications • Basic principles of computer vision: • Image formation • Filtering • Correspondence and alignment • Geometry • Recognition.
  183. 183. How do you learn more? Explore!
  184. 184. Resources – where to learn more • Awesome Computer Vision • A curated list of awesome computer vision resources • Computer Vision Taiwanese Group • > 2,200 members in facebook • Videolectures and ML&CV Talks • Lectures, Keynotes, Panel Discussions • OpenCV – C++/Python/Java
  185. 185. Thank You! Image: www.clubtuzki.com

×