Attentional Object Detection
Why look for everything everywhere?

Sergey Karayev
for UC Berkeley Computer Vision Retreat 2011
Problem:
Recognition and localization of objects of multiple classes in cluttered scenes.
Object Detection

Proposals → Detectors → Post-process
Proposals

• Sliding window
• ...with priors/pruning
• Voting
• Efficient search
• etc.
Sliding window

• Too slow: quadratic in number of search dimensions (x, y, scale, class).
• Speed-ups:
  • Parallelization.
  ★ Priors/pruning with non-detector features.
  ★ Algorithmic efficiency.
(See the cost sketch below.)
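To make the cost concrete, here is a minimal sketch (mine, not from the slides) of the exhaustive scan: one detector evaluation per (x, y, scale, class) combination. The stride, base window size, and scale set are hypothetical placeholders.

import itertools

def sliding_window_evals(img_w, img_h, stride=8,
                         scales=(1.0, 1.5, 2.0), n_classes=20):
    """Count detector evaluations for an exhaustive sliding-window scan."""
    n = 0
    for scale, _cls in itertools.product(scales, range(n_classes)):
        win = int(64 * scale)  # hypothetical 64-px base window
        n += max(0, (img_w - win) // stride) * max(0, (img_h - win) // stride)
    return n

print(sliding_window_evals(640, 480))  # ~200K evaluations for one medium image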
Proposals: Priors/pruning

• Uses non-detector features (location, geometry, context, depth, "objectness").
• Often done in post-processing.
Proposals: Voting · Efficient subwindow search

• Voting: currently only works for local features.
• Efficient subwindow search (a branch-and-bound sketch follows below).
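For reference, a minimal branch-and-bound sketch in the spirit of Efficient Subwindow Search (Lampert et al.): candidate rectangle sets live in a priority queue ordered by an upper bound on the detector score, and the best set is split until a single window remains. The upper_bound function here is a hypothetical, detector-specific quality bound; the method's guarantees hold only if it truly upper-bounds the score over every set.

import heapq

def ess(img_w, img_h, upper_bound):
    # A rectangle set: inclusive intervals for (left, top, right, bottom).
    root = ((0, img_w), (0, img_h), (0, img_w), (0, img_h))
    heap = [(-upper_bound(root), root)]
    while heap:
        neg_bound, rect_set = heapq.heappop(heap)
        if all(lo == hi for lo, hi in rect_set):  # a single rectangle remains
            return [lo for lo, _ in rect_set], -neg_bound
        # Split the widest interval in half and push both children.
        i = max(range(4), key=lambda j: rect_set[j][1] - rect_set[j][0])
        lo, hi = rect_set[i]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = rect_set[:i] + (part,) + rect_set[i + 1:]
            heapq.heappush(heap, (-upper_bound(child), child))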
Proposals → Detectors → Post-process

Proposals:
• Priority ordered? How?
• Pruned / Exhaustive?
• Class-specific?
Detectors:
• Template/Parts
• Local features
• Decision stumps

[Embedded page from Viola & Jones, "Rapid object detection": each round of boosting selects a weak classifier built on a single Haar-like rectangle feature, so each stage of boosting doubles as feature selection; AdaBoost provides an effective learning algorithm and strong generalization bounds. Successively more complex classifiers are combined in a cascade that dramatically increases detector speed by focusing attention on promising regions of the image, exploiting the fact that it is often possible to rapidly determine where an object might occur; more complex processing is reserved for these promising regions. The key measure of such an attentional filter is its false negative rate: a classifier built from two Haar-like features can achieve fewer than 1% false negatives at about 40% false positives, cutting the number of locations where the full detector runs by over one half. Their Fig. 1 shows two-, three-, and four-rectangle features relative to the detection window; rectangle features are used rather than raw pixels because common features act as effective domain knowledge that is difficult to learn from limited training data, and the resulting system runs much faster than a pixel-based one. (A cascade control-flow sketch follows below.)]
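A minimal sketch of the attentional-cascade control flow in the Viola-Jones style: cheap early stages reject most windows, and only survivors reach the expensive later stages. The stage classifiers and thresholds below are hypothetical stand-ins, not the original Haar-feature stages.

def cascade_detect(windows, stages):
    """stages: list of (score_fn, threshold) pairs, ordered cheap -> expensive."""
    survivors = windows
    for score_fn, threshold in stages:
        # Each stage is tuned for a very low false-negative rate, so
        # rejected windows are (almost) always true negatives.
        survivors = [w for w in survivors if score_fn(w) >= threshold]
        if not survivors:
            break
    return survivors  # windows accepted by every stage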
Proposals:
• Priority ordered? How?
• Pruned / Exhaustive?
• Class-specific?

Detectors:
• Local or global feature?
• Shared parts across classes?
• Cascaded?
• Confidence ≈ likelihood?

Post-process:
• NMS/Meanshift?
• Context? (Inter-object?)
Where we are


Cascaded Deformable Part Models.
Per class, ~1 sec / medium-sized image.
Where we are

• PASCAL: ~5K test images, 20 classes. 28 hours to process.
• ImageNet ’11: ~450K test images, 3000 classes. 375,000 hours to process.
Where we are

• Standard movie: ~130K frames. 36 hours per object class.

(All of these figures follow from ~1 sec per image per class; arithmetic check below.)
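A back-of-envelope check of the numbers above, assuming the quoted ~1 second per (image, class) pair:

sec_per_image_class = 1.0

pascal   = 5_000 * 20 * sec_per_image_class / 3600      # ≈ 28 hours
imagenet = 450_000 * 3000 * sec_per_image_class / 3600  # ≈ 375,000 hours
movie    = 130_000 * 1 * sec_per_image_class / 3600     # ≈ 36 hours per class

print(f"{pascal:.0f} h, {imagenet:.0f} h, {movie:.0f} h")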
So what can we do?
Not look for everything
      everywhere!
New Performance Evaluation

• Goal: be able to stop detection at any time and have the most correct detections and the fewest incorrect detections at that point.

[Two schematic plots compared: AP vs. time. A sketch of one possible anytime metric follows below.]
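One way to make this goal measurable (my assumption, not something the slides specify) is to record AP after each detection step and score a run by the normalized area under its AP-vs-time curve, so early correct detections count more. compute_ap stands in for the standard PASCAL AP computation.

import numpy as np

def anytime_score(detection_log, compute_ap, t_end):
    """detection_log: time-ordered list of (timestamp, detections_so_far)."""
    times = [t for t, _ in detection_log] + [t_end]
    aps = [compute_ap(dets) for _, dets in detection_log]
    aps.append(aps[-1])                   # AP holds constant after the last step
    return np.trapz(aps, times) / t_end   # normalized area under AP(t)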
How?
Attention

• Natural bottleneck in animal vision.
• Two kinds:
  • Bottom-up: rapid, driven by featurization.
  • Top-down: secondary, driven by task.
• Eye fixations are a good proxy for implicit attention. Necessary because of the fovea.
Judd, Ehinger, Durand, and Torralba. Learning to predict where humans look.
(MIT Computer Science and Artificial Intelligence Laboratory; MIT Brain and Cognitive Sciences)

Basic ideas:
• Single saliency map from which foci of attention are selected.
• Sequential selection due to "inhibition of return," or on information maximization.
• Influenced from the top.
(A fixation-selection sketch follows below.)

[Embedded abstract: for many applications in graphics, design, and human-computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. The authors collected eye tracking data of 15 viewers on 1003 images and use this database as training and testing examples to learn a model of saliency based on low-, middle- and high-level image features; the database is publicly available. Their Fig. 1: eye tracking data collected on 1003 images from 15 viewers.]
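To illustrate the "single saliency map + sequential selection" idea, a minimal sketch: repeatedly fixate the most salient location, then suppress a disc around it (inhibition of return). The saliency map and inhibition radius are placeholders, not Judd et al.'s learned model.

import numpy as np

def fixation_sequence(saliency, n_fixations=5, inhibit_radius=20):
    s = saliency.astype(float).copy()
    h, w = s.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fixations = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(y), int(x)))
        # Inhibition of return: suppress a disc around the chosen fixation.
        s[(ys - y) ** 2 + (xs - x) ** 2 <= inhibit_radius ** 2] = -np.inf
    return fixations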
[Residue of embedded figures on Bayesian models of attention: the image I is determined by a scene description S (e.g., vectorial properties such as global illumination, scene identity, objects present), and inference combines the likelihood P(I|S) with a prior; in the eye-tracking task, viewers searched for targets (cars or pedestrians) and pressed a key to indicate completion, with images containing on average 4.6 cars and 2.1 pedestrians.]
Attentional Object Detector

Assume we have a powerful but expensive per-class classifier.
• How should we pick locations to consider?
• What should we look for at a location?
Attentional Object Detector

Proposals → Detector
(a skeletal sketch follows below)
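A skeletal sketch of how such a Proposals → Detector system might run under a time budget. Every function here is a hypothetical placeholder for the design choices listed earlier (priority ordering, class-specificity, and so on), not a committed design.

import time

def attentional_detect(image, propose_next, run_detector, budget_sec):
    """propose_next and run_detector are hypothetical placeholders."""
    detections, t0 = [], time.monotonic()
    while time.monotonic() - t0 < budget_sec:
        where, cls = propose_next(image, detections)  # next location & class
        score, box = run_detector(image, where, cls)  # expensive per-class step
        if score is not None:
            detections.append((box, cls, score))
        # Proposals may condition on detections so far (top-down feedback).
    return detections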
Some related work
Vogel and Freitas. Target-directed attention:
Sequential decision-making for gaze planning. ICRA
                        2008.

                      • GIST and a simple
                        regressor to compute
                        likelihood map.
                      • Reinforcement learning
                        to find best gaze
                        sequence.
                      • “Heavier” feature and
                        regressor to evaluate
                        the fixation locations.
Vogel and Freitas. Target-directed attention:
Sequential decision-making for gaze planning. ICRA
                        2008.



• Evaluated only on Caltech Office scenes.
• Gaze planning improves over just using
  bottom-up saliency while being only slightly
  slower.
• Detection rate is lower than full image, but
  maximum precision is higher.
Gualdi, Prati, and Cucchiara. Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. ECCV 2010.

• LogitBoost classifier with covariance descriptors.
• Score falls off over some region of support.
• Sample points in image to estimate P(O|I); resample close to promising points. (A sketch of this sampling loop follows below.)

[Embedded figure (their Fig. 1): region of support of the cascade of LogitBoost classifiers trained on the INRIA pedestrian dataset, averaged over 62 pedestrian patches; the classifier response remains positive in the close neighborhood, in both position and scale, of the window encompassing a pedestrian. A sufficiently wide region of support allows uniformly pruning the sliding-window set, down to at least one window targeting each pedestrian's region of support; but too wide a region of support can generate de-localized detections.]
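A minimal sketch of the multi-stage sampling idea: score a sparse set of windows, then draw the next stage's samples near the windows that scored well, so effort concentrates inside regions of support. The Gaussian perturbation and stage schedule are my assumptions, not the paper's exact scheme.

import random

def multi_stage_sampling(score, sample_uniform, n_stages=3,
                         n_per_stage=200, sigma=10.0):
    samples = [sample_uniform() for _ in range(n_per_stage)]  # (x, y, scale)
    for stage in range(n_stages):
        scored = sorted(((score(w), w) for w in samples), reverse=True)
        if stage == n_stages - 1:
            break
        seeds = [w for _, w in scored[: n_per_stage // 10]]   # keep top 10%
        # Perturb the promising windows to form the next stage's samples.
        samples = [(x + random.gauss(0, sigma),
                    y + random.gauss(0, sigma),
                    s * random.gauss(1, 0.05))
                   for (x, y, s) in (random.choice(seeds)
                                     for _ in range(n_per_stage))]
    return scored  # final (score, window) pairs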
Gualdi et al. Multi-stage Sampling with Boosting
 Cascades for Pedestrian Detection in Images and
                 Videos. ECCV 2010.




• Evaluated on INRIA Pedestrians, Graz02, and
  some videos.
• Always reduces miss rate over sliding
  window, while being 2-6x faster.
Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009.

• Digital fovea placed sequentially to maximize expected information gain.
• Liken it to stochastic optimal control, and use a "multinomial infomax POMDP" to pick the sequence.
(A greedy-infomax sketch follows below.)

[Interleaved paper text: with fewer than 25 successive fixations, the foveated approach is faster than exhaustively applying object detection to a high-resolution image; the two particular challenges are sequentially picking fixation locations and integrating the information acquired. Visual saliency can prune regions without objects, but saliency filters are computationally expensive and must be applied to entire images. Unlike the analytic Efficient Subwindow Search, whose guarantees require a function f̂ constructed analytically per detector and whose efficiency depends on the tightness and overhead of f̂, this approach is data-driven and detector-independent, requiring only a dataset of labeled images to model a detector's performance. It extends Najemnik & Geisler's greedy Infomax model of eye movements, which maximizes instantaneous rather than long-term information gain and applies only to artificially constructed images, with long-term POMDP planning, which reduces search time. Their Fig. 1: a digital fovea of several concentric image patches arranged around a point of fixation, each reduced to a common size.]
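For intuition, a minimal sketch of the greedy information-maximization step that the I-POMDP work builds on: maintain a belief over grid cells for the target location, and fixate the cell whose observation is expected to reduce belief entropy the most. The observation model expected_posterior is a hypothetical placeholder.

import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def next_fixation(belief, expected_posterior):
    """belief: array over grid cells; returns the cell with max expected gain."""
    gains = []
    for cell in range(belief.size):
        # Expected entropy after fixating `cell`, averaged over outcomes.
        post_entropies, probs = expected_posterior(belief, cell)
        gains.append(entropy(belief) - np.dot(probs, post_entropies))
    return int(np.argmax(gains))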
Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009.

• Evaluate on own faces dataset against Viola-Jones: 2x speedup, but small decrease in accuracy.

[Figure residue: Fig. 6, successive fixation choices by the MI-POMDP policy find the face in six fixations; the final estimate is one grid-cell diagonal from the labeled location, a euclidean error of 1.4 grid cells. Fig. 8, error (grid cells) vs. runtime (seconds): changing the Viola-Jones scaling factor makes both Viola-Jones and I-POMDP faster and less accurate, but MI-POMDP is usually closer to the origin on the time-error curve, i.e. a better speed-accuracy tradeoff than Viola-Jones alone. Both methods on average placed the face between one and two grid cells off the true location.]
Vijayanarasimhan and Kapoor. Visual Recognition and Detection Under Bounded Computational Resources. CVPR 2010.

• Hough voting with multiple (five in their experiments) feature types to generate an initial set of hypotheses.
• Uses Value of Information, conditioned on the computation time elapsed, to pick which feature to extract next; hypotheses are updated as weighted votes until the allotted time runs out. (A sketch follows below.)
• Active approach extracts fewer features, takes less time, and has higher accuracy on ETHZ shape and INRIA Horses than passive selection.

[Paper residue: Table 2 lists the features used, e.g. SIFT (R, G, B, Gray channels; 128-D; 0.21 ms), T1a S2 (Gray; 68-D; 1.2 ms), T2 S2 (Gray; 36-D; 0.09 ms). The conditional probability is modeled as a weighted sum over a feature's nearest neighbors, p(g_i^{(O,x)} | f, l) = Σ_{h∈N(f)} q_i^h (their Eq. 2), with q_i^h a parameter estimated from training data for every feature h and grid part g_i. Datasets: the ETHZ shape dataset (255 images, five shape-based classes: applelogos, bottles, giraffes, mugs, swans) and the INRIA horses dataset (170 images with one or more side-views of horses, 170 without), both highly cluttered natural scenes with large variations in scale and appearance, sometimes with multiple objects per image; the same training/testing setup as prior hough-based work is used, and the expected value of p(g_i^{(O,x)} | f) per feature type identifies the feature expected to provide the best evidence for object presence (e.g., texture for a giraffe's body).]
Image Attributions
•   Girshick et al. - Cascaded deformable part models.
•   Viola & Jones - Rapid object detection.
•   Judd et al. - Learning to predict where humans look.
•   Chikkerur et al. - What and where? A
    Bayesian theory of attention.
•   ...and the papers reviewed.

Threshold adaptation and XOR accumulation algorithm for  objects detectionThreshold adaptation and XOR accumulation algorithm for  objects detection
Threshold adaptation and XOR accumulation algorithm for objects detection
 
D04402024029
D04402024029D04402024029
D04402024029
 
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural NetworkRunPool: A Dynamic Pooling Layer for Convolution Neural Network
RunPool: A Dynamic Pooling Layer for Convolution Neural Network
 
Gc2005vk
Gc2005vkGc2005vk
Gc2005vk
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
 
Classification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsClassification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its Variants
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning Techniques
 
Face detection ppt by Batyrbek
Face detection ppt by Batyrbek Face detection ppt by Batyrbek
Face detection ppt by Batyrbek
 

More from Sergey Karayev

Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningSergey Karayev
 
Testing and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningTesting and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningSergey Karayev
 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningSergey Karayev
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningSergey Karayev
 
Setting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep LearningSetting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep LearningSergey Karayev
 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningSergey Karayev
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningSergey Karayev
 
AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019Sergey Karayev
 

More from Sergey Karayev (14)

Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
Lecture 13: ML Teams (Full Stack Deep Learning - Spring 2021)
 
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
Lecture 12: Research Directions (Full Stack Deep Learning - Spring 2021)
 
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
Lecture 11: ML Deployment & Monitoring (Full Stack Deep Learning - Spring 2021)
 
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
Lecture 10: ML Testing & Explainability (Full Stack Deep Learning - Spring 2021)
 
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
Lecture 9: AI Ethics (Full Stack Deep Learning - Spring 2021)
 
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
Lecture 5: ML Projects (Full Stack Deep Learning - Spring 2021)
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
 
Testing and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningTesting and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep Learning
 
Machine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep LearningMachine Learning Teams - Full Stack Deep Learning
Machine Learning Teams - Full Stack Deep Learning
 
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep LearningTroubleshooting Deep Neural Networks - Full Stack Deep Learning
Troubleshooting Deep Neural Networks - Full Stack Deep Learning
 
Setting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep LearningSetting up Machine Learning Projects - Full Stack Deep Learning
Setting up Machine Learning Projects - Full Stack Deep Learning
 
Research Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep LearningResearch Directions - Full Stack Deep Learning
Research Directions - Full Stack Deep Learning
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 
AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019AI Masterclass at ASU GSV 2019
AI Masterclass at ASU GSV 2019
 

Recently uploaded

Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 

Recently uploaded (20)

Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 

Attentional Object Detection - introductory slides.

  • 1. Attentional Object Detection Why look for everything everywhere? Sergey Karayev for UC Berkeley Computer Vision Retreat 2011
  • 2. Problem: Recognition and localization of objects of multiple classes in cluttered scenes.
  • 3. Proposals Detectors Object Detection Post-process
  • 4. Proposals Detectors Object Detection Post-process
  • 5. Proposals: Sliding window (...with priors/pruning), Voting, Efficient search, etc.
  • 6. Sliding window proposals: • Too slow: cost scales with the product of the search dimensions (x, y, scale, class). • Speed-ups: • Parallelization. ★ Priors/pruning with non-detector features. ★ Algorithmic efficiency.
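To make the blow-up concrete, a back-of-the-envelope sketch in Python; the grid sizes below are illustrative assumptions, not numbers from the talk. The window count is the product of the extents of all search dimensions, so each added dimension multiplies the work:

    # Hypothetical search-space sizes for a 640x480 image with a stride-4 grid.
    x_steps, y_steps = 160, 120
    scales = 10
    classes = 20

    windows = x_steps * y_steps * scales * classes
    print(f"{windows:,} classifier evaluations per image")  # 3,840,000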
  • 7. Proposals, priors/pruning: • Uses non-detector features (location, geometry, context, depth, “objectness”). • Often done in post-processing.
  • 8. Proposals: Voting, Efficient subwindow search. Currently only works for local features.
  • 9. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? Detectors Post-process
  • 10. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? Detectors Post-process
  • 11. Detector: Template/Parts; Local features. [Slide shows excerpts from Viola & Jones: the AdaBoost feature-selection and attentional-cascade passages, and Figure 1, example rectangle features shown relative to the enclosing detection window.]
  • 12. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? Post-process
  • 13. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? Post-process
  • 15. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? • NMS/Meanshift? Post-process • Context? (Inter-object?)
  • 16. • Priority ordered? How? Proposals • Pruned / Exhaustive? • Class-specific? • Local or global feature? • Shared parts across classes? Detectors • Cascaded? • Confidence ≈ likelihood? • NMS/Meanshift? Post-process • Context? (Inter-object?)
  • 17. Where we are Cascaded Deformable Part Models. Per class, ~1 sec / medium-sized image.
  • 18. Where we are • PASCAL: ~5K test images, 20 classes. 28 hours to process. • ImageNet ’11: ~450K test images, 3000 classes. 375,000 hours to process.
  • 19. Where we are • Standard movie: ~130K frames. 36 hours per object class.
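The arithmetic behind slides 18 and 19, taking the ~1 sec/class/image figure from slide 17 at face value:

    sec_per_class_image = 1.0  # cascaded DPM, per slide 17

    pascal   = 5_000 * 20 * sec_per_class_image / 3600      # ~28 hours
    imagenet = 450_000 * 3000 * sec_per_class_image / 3600  # ~375,000 hours
    movie    = 130_000 * 1 * sec_per_class_image / 3600     # ~36 hours per class
    print(f"{pascal:.0f} h, {imagenet:,.0f} h, {movie:.0f} h/class")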
  • 20. So what can we do? Not look for everything everywhere!
  • 21. New Performance Evaluation • Goal: Be able to stop detection and have the most correct detections and the fewest incorrect detections at any time. [Plot: AP vs. time.]
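One way to score this goal, in a minimal sketch of my own formulation since the talk only states the objective: timestamp every emitted detection, compute AP on the subset available at each time budget, and summarize with the area under the AP-vs-time curve. `ap_fn` stands in for a standard AP scorer and is an assumption:

    import numpy as np

    def ap_vs_time(timed_detections, budgets, ap_fn):
        """timed_detections: list of (timestamp, detection) pairs;
        ap_fn scores a set of detections against ground truth."""
        timed = sorted(timed_detections, key=lambda d: d[0])
        curve = [ap_fn([det for ts, det in timed if ts <= t]) for t in budgets]
        return np.array(curve)

    # Area under the curve rewards detectors that are correct early:
    # score = np.trapz(ap_vs_time(dets, budgets, ap_fn), budgets)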
  • 22. How?
  • 23. Attention • Natural bottleneck in animal vision. • Two kinds: • Bottom-up: rapid, driven by featurization. • Top-down: secondary, driven by task. • Eye fixations are a good proxy for implicit attention. Necessary because of the fovea.
  • 24. Basic ideas • Single saliency map from which foci of attention are selected. • Sequential selection, due to “inhibition of return” or information maximization. • Influenced from the top. [Slide shows the header, abstract, and Figure 1 of Judd, Ehinger, Durand, and Torralba: eye tracking data from 15 viewers on 1003 images, used to train and test a saliency model based on low-, middle- and high-level image features; the eye tracking database is publicly available.]
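The first two ideas combine into a standard winner-take-all loop with inhibition of return; a sketch, not Judd et al.'s code, assuming a precomputed floating-point saliency map:

    import numpy as np

    def fixations(saliency, n_fix=5, radius=30):
        """Greedily pick fixation points from a saliency map, suppressing a
        disc around each attended point (inhibition of return)."""
        s = saliency.astype(float).copy()
        ys, xs = np.mgrid[:s.shape[0], :s.shape[1]]
        points = []
        for _ in range(n_fix):
            y, x = np.unravel_index(np.argmax(s), s.shape)
            points.append((y, x))
            s[(ys - y) ** 2 + (xs - x) ** 2 < radius ** 2] = -np.inf
        return points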
  • 25. [Slide shows excerpts from a Bayesian model of attention applied to a car/pedestrian search task: images contained on average 4.6 cars and 2.1 pedestrians; subjects press a key to indicate targets; the scene description S determines global properties such as illumination, entering through the likelihood P(I|S).]
  • 26. Attentional Object Detector Assume we have a powerful but expensive per-class classifier. • How should we pick locations to consider? • What should we look for at a location?
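In skeletal form, the detector being set up here is a sequential decision loop. Everything named below (the policy, the belief, the update rule) is a placeholder for exactly the open questions on this slide, not a known implementation:

    def attentional_detect(image, policy, classifier, time_budget):
        """Run the expensive per-class classifier only where the policy
        directs; return whatever has been found when time runs out."""
        belief = policy.init_belief(image)  # e.g., from cheap global features
        detections = []
        while policy.time_spent() < time_budget:
            loc, cls = policy.next_query(belief)   # where to look, for what
            score = classifier.evaluate(image, loc, cls)  # the expensive step
            belief = policy.update(belief, loc, cls, score)
            if score > policy.threshold(cls):
                detections.append((loc, cls, score))
        return detections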
  • 27. Attentional Object Detector Proposals Detector
  • 29. Vogel and Freitas. Target-directed attention: Sequential decision-making for gaze planning. ICRA 2008. • GIST and a simple regressor to compute likelihood map. • Reinforcement learning to find best gaze sequence. • “Heavier” feature and regressor to evaluate the fixation locations.
  • 30. Vogel and Freitas. Target-directed attention: Sequential decision-making for gaze planning. ICRA 2008. • Evaluated only on Caltech Office scenes. • Gaze planning improves over just using bottom-up saliency while being only slightly slower. • Detection rate is lower than full image, but maximum precision is higher.
  • 31. Gualdi et al. Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. ECCV 2010. • LogitBoost classifier with covariance descriptors. • Score falls off over some region of support: the classifier response stays positive in a close neighborhood, in position and scale, of the window encompassing a pedestrian. • Sample points in the image to estimate P(O|I); resample close to promising points, distributing samples across the stages. [Slide shows Fig. 1 of the paper: the region of support of the LogitBoost cascade trained on the INRIA pedestrian dataset, averaged over 62 pedestrian patches.]
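The sample-then-resample scheme behaves like a particle filter over window space (x, y, scale). A rough sketch under my own assumptions (uniform first stage, softmax resampling weights, Gaussian perturbation), not the paper's exact kernels:

    import numpy as np

    rng = np.random.default_rng(0)

    def multi_stage_sample(score_fn, bounds, n=500, stages=3, sigma=8.0):
        """bounds: array-like (x_max, y_max, s_max); score_fn maps an (n, 3)
        array of candidate windows to an (n,) array of classifier scores."""
        bounds = np.asarray(bounds, dtype=float)
        windows = rng.uniform(0, bounds, size=(n, 3))        # stage 1: uniform
        for _ in range(stages - 1):
            scores = score_fn(windows)
            w = np.exp(scores - scores.max())                # softmax weights
            w /= w.sum()
            parents = windows[rng.choice(n, size=n, p=w)]    # resample winners
            windows = np.clip(parents + rng.normal(0, sigma, size=(n, 3)),
                              0, bounds)
        return windows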
  • 32. Gualdi et al. Multi-stage Sampling with Boosting Cascades for Pedestrian Detection in Images and Videos. ECCV 2010. • Evaluated on INRIA Pedestrians, Graz02, and some videos. • Always reduces miss rate over sliding window, while being 2-6x faster.
  • 33. Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009. • Digital fovea placed sequentially to maximize expected information gain. • Liken it to stochastic optimal control, and use a “multinomial infomax POMDP” to pick the sequence. [Slide shows Figure 1 of the paper: a digital fovea, several concentric image patches arranged around a point of fixation and reduced to a common resolution.]
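The greedy step inside such a model, in a simplified sketch (the actual I-POMDP plans over long horizons, which this does not): fixate wherever the observation is expected to shrink posterior entropy over the target location the most. `expected_posterior` is an assumed observation model:

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    def best_fixation(belief, candidates, expected_posterior):
        """belief: P(target at each cell). expected_posterior(belief, f) yields
        (probability_of_observation, posterior_belief) pairs for fixation f."""
        gains = [entropy(belief) -
                 sum(p_obs * entropy(post)
                     for p_obs, post in expected_posterior(belief, f))
                 for f in candidates]
        return candidates[int(np.argmax(gains))]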
  • 34. Butko and Movellan. Optimal Scanning for Faster Object Detection. CVPR 2009. • Evaluated on their own faces dataset against Viola-Jones: 2x speedup, but a small decrease in accuracy; both methods on average place the face between one and two grid-cells off the true location. [Slide shows the paper's Figure 6, six successive MI-POMDP fixations locating a face, and Figure 8, runtime-error curves where MI-POMDP sits closer to the origin than Viola-Jones, a better speed-accuracy tradeoff.]
  • 35. Vijayanarasimhan and Kapoor. Visual Recognition and Detection Under Bounded Computational Resources. CVPR 2010. • Hough voting with multiple (five in their experiments) feature types. • Uses Value of Information to pick which region to process, which feature to extract, and for which category. • The active approach extracts fewer features, takes less time, and has higher accuracy on the ETHZ shape and INRIA Horses datasets. [Slide shows the paper's table of features with dimensions and computation times (e.g., SIFT on R, G, B, Gray: 128-D, 0.21 ms) and the grid weights learnt for each ETHZ shape category.]
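Value-of-Information selection in skeletal form, a paraphrase of the general idea rather than the paper's exact objective: always extract next the (hypothesis, feature) pair whose expected benefit most exceeds its computation cost. Both callables below are assumptions:

    def next_extraction(hypotheses, feature_types, expected_gain, cost):
        """expected_gain(h, f): predicted improvement to hypothesis h from
        feature type f; cost(f): its extraction time."""
        best, best_voi = None, float("-inf")
        for h in hypotheses:
            for f in feature_types:
                voi = expected_gain(h, f) - cost(f)
                if voi > best_voi:
                    best, best_voi = (h, f), voi
        return best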
  • 36. Image Attributions • Girshick et al. - Cascaded deformable part models. • Viola & Jones - Rapid object detection. • Judd et al. - Learning to predict where humans look. • Chikkerur et al. - What and where? A Bayesian theory of attention. • ...and the papers reviewed.

Editor's Notes

  21. This is related to current research work in ML on anytime algorithms. I think that the only solution to this goal is attentional detection.
  27. Sequential decision problem. No post-process, because at any point detection can be cut off.