Human centered 2D/3D multiview video analysis and description

Nikos Nikolaidis
Ioannis Pitas*
AIIA Lab, Aristotle University of Thessaloniki, Greece
Greek SP Jam, AUTH, 17/5/2012
Anthropocentric video content analysis tasks
  Face detection and tracking
  Body or body parts (head/hand/torso) detection
  Eye/mouth detection
  3D face model reconstruction
  Face clustering/recognition
  Facial expression analysis
  Activity/gesture recognition
  Semantic video content description/annotation
Anthropocentric video content description
  Applications in video content search (e.g. in YouTube)
  Applications in film and games postproduction:
    Semantic annotation of video
    Video indexing and retrieval
    Matting
    3D reconstruction
  Anthropocentric (human centered) approach
    humans are the most important video entity.
Examples

 Face detection and tracking

  1st frame       6th frame    11th frame   16th frame
Examples
Facial expression recognition
Face detection and tracking

    1st frame       6th frame     11th frame      16th frame




 Problem statement:
   To detect the human faces that appear in each video
   frame and localize their Region-Of-Interest (ROI).
   To track the detected faces over the video frames.
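The detection task above can be sketched as a sliding-window scan, with the actual face/non-face classifier (e.g. a boosted cascade) abstracted behind a callable. A minimal sketch; the function name and the toy brightness "classifier" are illustrative assumptions, not the method of the slides:

```python
import numpy as np

def sliding_window_detect(frame, classify, win=24, step=8):
    """Scan the frame with a fixed-size window and return the ROIs
    (x, y, w, h) that the face/non-face classifier accepts."""
    rois = []
    h, w = frame.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            if classify(frame[y:y + win, x:x + win]):
                rois.append((x, y, win, win))
    return rois

# Toy usage: a stand-in "classifier" that fires on bright windows.
frame = np.zeros((48, 48), dtype=np.uint8)
frame[8:32, 8:32] = 255
bright = lambda patch: patch.mean() > 200
print(sliding_window_detect(frame, bright))  # → [(8, 8, 24, 24)]
```

A real system would replace `bright` with a trained classifier and scan over several window scales.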
Face detection and tracking

  Tracking associates each detected face in the current
  video frame with one in the next video frame.

  Therefore, we can describe the face (ROI) trajectory
  in a shot in (x,y) coordinates.

  Actor instance definition: face region of interest
  (ROI) plus other info
  Actor appearance definition: face trajectory plus
  other info
Face detection and tracking




   Overall system diagram
Face detection and tracking

Face detection results
Face detection and tracking

 Feature-based face tracking module (diagram): face detection →
 select features → track features → occlusion check (if YES:
 occlusion handling; if NO: output result).
Face detection and tracking
   Tracking failure may occur, e.g., when a face disappears.

   In such cases, face re-detection is employed.
   If any of the newly detected faces coincides with a face already
 being tracked, the detected face is kept, while the coinciding tracked
 face is discarded from any further processing.

   Periodic face re-detection (typically every 5 video frames) can be
 applied to account for new faces entering the camera's field of view.
   Forward and backward tracking can be used, when the entire video is
 available.
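The re-detection rule above can be sketched with a simple intersection-over-union test; the function names and the 0.5 overlap threshold are illustrative assumptions, not values from the slides:

```python
def iou(a, b):
    """Intersection-over-union of two ROIs given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def merge_redetections(tracked, detected, thresh=0.5):
    """Apply the rule above: a re-detected face that coincides with a
    tracked one replaces it; tracked faces with no coinciding
    detection survive unchanged."""
    survivors = [t for t in tracked
                 if all(iou(t, d) < thresh for d in detected)]
    return detected + survivors
```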
Face detection and tracking
 LSK tracker (template-based tracking)
Face detection and tracking
 LSK tracker results
Bayesian tracking in stereo video
Face detection and tracking

 Performance measures:
   Detection rate
   False alarm rate
   Localization precision, in the range [0, 1]
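A minimal sketch of the three measures, assuming detections are matched greedily to ground-truth boxes by overlap (the greedy matching and 0.5 threshold are assumptions; localization precision is taken here as the mean IoU of matched pairs):

```python
def box_iou(a, b):
    """Overlap (IoU) of two (x, y, w, h) boxes, in [0, 1]."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def detection_metrics(dets, gts, thr=0.5):
    """Detection rate, false alarm rate, and localization precision
    (mean IoU of the matched pairs) for one frame."""
    matched, used = [], set()
    for d in dets:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            o = box_iou(d, g)
            if i not in used and o >= best_iou:
                best, best_iou = i, o
        if best is not None:
            used.add(best)
            matched.append(best_iou)
    det_rate = len(matched) / len(gts) if gts else 1.0
    fa_rate = (len(dets) - len(matched)) / len(dets) if dets else 0.0
    loc_prec = sum(matched) / len(matched) if matched else 0.0
    return det_rate, fa_rate, loc_prec
```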
Face detection and tracking
 Overall Tracking Accuracy
Multiview human detection
 Problem statement:
   Use information from multiple cameras to detect
   bodies or body parts, e.g. head.




           Camera 4                           Camera 6
   Applications:
      Human detection/localization in postproduction.
      Matting / segmentation initialization.
Multiview human detection
  The retained voxels are projected to all views.
  For every view we reject ROIs that have small overlap
with the regions resulting from the projection.




     Camera 2                          Camera 4
Eye detection
Problem statement:
   To detect eye regions in an image.
   Input: a facial ROI produced by face detection or tracking.
   Output: eye ROIs / eye center locations.
   Applications:
     face analysis, e.g. expression recognition, face recognition;
     gaze tracking.
Eye detection
  Eye detection based on edges and distance maps from
  the eye and the surrounding area.

  Robustness to variations in illumination conditions.

  The Canny edge detector is used.

  For each pixel, a vector pointing to the closest edge
  pixel is calculated. Its magnitude (length) and the slope
  (angle) are used.
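The edge/distance-map feature can be sketched as follows: for every pixel, find the closest edge pixel (a binary edge map stands in for the Canny output) and record the vector's magnitude and angle. The brute-force nearest-edge search is for illustration on small maps only:

```python
import numpy as np

def edge_vector_field(edges):
    """For every pixel, the vector to its closest edge pixel; the
    magnitude and angle of these vectors form the feature described
    above."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([ys, xs], axis=1)                  # edge coordinates
    h, w = edges.shape
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                indexing="ij"), axis=-1)
    # Distance from every pixel to every edge pixel (fine for tiny maps).
    d = np.linalg.norm(grid[:, :, None, :] - pts[None, None, :, :], axis=-1)
    nearest = pts[d.argmin(axis=2)]                   # closest edge per pixel
    vec = nearest - grid
    mag = np.linalg.norm(vec, axis=-1)
    ang = np.arctan2(vec[..., 0], vec[..., 1])
    return mag, ang
```

In practice a distance transform would replace the all-pairs search.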
Eye detection



 Training
 data
Eye detection

 Experimental results
Eye detection

Eye detection
performance
3D face reconstruction from
  uncalibrated video

Problem statement:



  Input: facial images or facial video frames, taken from different view
  angles, provided that the face neither changes expression nor speaks.
  Output: 3D face model (saved as a VRML file) and its calibration in
  relation to each camera. Facial pose estimation.
  Applications:
      3D face reconstruction, facial pose estimation, face recognition,
      face verification.
3D face reconstruction from
  uncalibrated video
Method overview
 Manual selection of characteristic feature points on the
 input facial images.
  Use of an uncalibrated 3-D reconstruction algorithm.
  Incorporation of the CANDIDE generic face model.
  Deformation of the generic face model based on the 3-D
 reconstructed feature points.
 Re-projection of the face model grid onto the images and
 manual refinement.
3D face reconstruction from
uncalibrated video




 Input: three images with a number of matched
   characteristic feature points.
3D face reconstruction from
uncalibrated video
The CANDIDE face model has 104
nodes and 184 triangles.

Its nodes correspond to
characteristic points of the human
face, e.g. nose tip, outline of the
eyes, outline of the mouth etc.
3D face reconstruction from
uncalibrated video
 Example
3D face reconstruction from uncalibrated video




           Selected features




       CANDIDE grid reprojection       3D face reconstruction
3D face reconstruction from
uncalibrated video
Performance evaluation:

Reprojection error
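Reprojection error, the evaluation measure named above, can be computed as the mean pixel distance between reprojected 3D feature points and their observed 2D positions; a minimal sketch, assuming a known 3×4 camera matrix:

```python
import numpy as np

def reprojection_error(P, X, x):
    """Mean pixel distance between projected 3D points and their
    observed 2D features. P: 3x4 camera matrix, X: Nx3 points,
    x: Nx2 observed image points."""
    Xh = np.hstack([X, np.ones((len(X), 1))])   # homogeneous coordinates
    proj = (P @ Xh.T).T
    proj = proj[:, :2] / proj[:, 2:3]           # perspective divide
    return float(np.linalg.norm(proj - x, axis=1).mean())
```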
Face clustering
Problem statement:

  To cluster a set of facial ROIs
  Input: a set of face image ROIs
  Output: several face clusters, each containing faces of
  only one person.

  Applications
     Cluster the actor images, even if they belong to
    different shots.
    Cluster varying views of the same actor.
Face clustering
 A four-step clustering algorithm is used:

   face tracking;

   calculation of mutual information between the tracked
   facial ROIs to form a similarity matrix (graph);

   application of face trajectory heuristics, to improve
   clustering performance;

   similarity graph clustering.
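The second step above can be sketched as follows: mutual information between two facial ROIs, estimated from their joint intensity histogram, fills a symmetric similarity matrix. The bin count is an illustrative choice, and the trajectory heuristics and the actual graph-clustering step are omitted:

```python
import numpy as np

def mutual_information(a, b, bins=32):
    """MI between two equally sized grayscale ROIs, from their joint
    intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def similarity_matrix(rois):
    """Pairwise-MI similarity matrix (graph) over a set of face ROIs."""
    n = len(rois)
    s = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            s[i, j] = s[j, i] = mutual_information(rois[i], rois[j])
    return s
```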
Face clustering
Performance evaluation



  85.4% correct clustering rate, 6.6% clustering errors
  An error of 7.9% is introduced by the face detector.




   Face detector errors have been manually removed
  93% correct clustering rate, 7% clustering errors
  Cluster 2 results are improved
Face recognition

Problem statement:

  To determine a face's identity
  Input: a facial ROI
  Output: the face ID
  (examples: Sandra Bullock, Hugh Grant)
  Applications:
    Finding the movie cast
    Retrieving movies of an actor
    Surveillance applications
Face recognition
  Two general approaches:

      Subspace methods

      Elastic graph matching methods.
Face recognition
Subspace methods

  The original high-dimensional image space is projected
 onto a low-dimensional one.

   Face recognition is performed using a simple distance measure in
 the low-dimensional space.

   Subspace methods: Eigenfaces (PCA), Fisherfaces (LDA),
 ICA, NMF.

   Main limitation of subspace methods: they require perfect
 face alignment (registration).
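A minimal eigenfaces (PCA) sketch of the subspace pipeline: project mean-centred face vectors onto the leading principal components and classify a probe by nearest neighbour in the subspace. Function names are illustrative and, per the limitation above, this assumes well-aligned faces:

```python
import numpy as np

def eigenfaces_fit(X, k):
    """PCA on vectorized face images X (n_samples x n_pixels);
    returns the mean face and the top-k eigenfaces."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def recognize(probe, gallery, labels, mean, basis):
    """Nearest neighbour in the low-dimensional subspace."""
    g = (gallery - mean) @ basis.T
    p = (probe - mean) @ basis.T
    return labels[int(np.argmin(np.linalg.norm(g - p, axis=1)))]
```

Fisherfaces (LDA), ICA, or NMF would swap in a different projection basis with the same recognition step.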
Face recognition
 Elastic graph matching (EGM) methods

 Elastic graph matching is a simplified implementation of the
 Dynamic Link Architecture (DLA).

 DLA represents an object by a rectangular elastic grid.
 A Gabor wavelet bank response is measured at each grid node.

 Multiscale dilation-erosion at each grid node can be used,
 leading to Morphological EGM (MEGM).
Face recognition




  Output of normalized multi-scale dilation-erosion for
   nine scales.
Face recognition
 Performance evaluation:

   Performance metric: Equal Error Rate (EER).
   The M2VTS and XM2VTS facial image databases were used for
   training/testing.

   MEGM EER: 2%
   Subspace techniques EER: 5-10%
   Kernel subspace techniques EER: 2%

   Elastic graph matching techniques have the best overall
   performance.
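The EER metric used above can be computed by scanning thresholds until the false-accept and false-reject rates cross; a simple sketch over similarity scores (higher = more similar):

```python
def equal_error_rate(genuine, impostor):
    """EER: the error value at the threshold where the false-accept
    rate (FAR) equals the false-reject rate (FRR)."""
    best = (1.0, None)
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]
```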
Facial expression analysis

Problem statement:


   To identify a facial expression in a facial image or 3D point cloud.
   Input: a face ROI or a point cloud
   Output: the facial expression label
   (e.g. neutral, anger, disgust, sadness, happiness, surprise, fear).

   Applications
     Affective video content description
Facial expression analysis
 Two approaches for expression analysis on images/videos:
   Feature-based methods
      They use texture or geometrical information as features for
      expression information extraction.
   Template-based methods
      They use 3D or 2D head and facial models as templates for facial
      expression recognition.
Facial expression analysis

   Feature-based methods
     The features used can be:
       geometric positions of a set of fiducial points on a face;
       multi-scale and multi-orientation Gabor wavelet coefficients
       at the fiducial points;
       optical flow information;
       transform (e.g. DCT, ICA, NMF) coefficients;
       image pixels reduced in dimension by using PCA or LDA.
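The Gabor-coefficient features can be sketched with a small real-valued filter bank evaluated at one fiducial-point patch; the kernel parameters (wavelength, sigma) are illustrative choices, not the ones used in the cited methods:

```python
import numpy as np

def gabor_kernel(size, theta, wavelength, sigma):
    """One real-valued Gabor kernel: a Gaussian envelope times a
    cosine carrier at orientation `theta`."""
    r = np.arange(size) - size // 2
    y, x = np.meshgrid(r, r, indexing="ij")
    xr = x * np.cos(theta) + y * np.sin(theta)
    env = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / wavelength)

def gabor_features(patch, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
    """Bank responses at one fiducial-point patch, over 4 orientations."""
    k = patch.shape[0]
    return [float((patch * gabor_kernel(k, t, k / 2, k / 4)).sum())
            for t in thetas]

# Demo: vertical stripes respond most strongly to the 0-degree kernel.
r = np.arange(9) - 4
stripes = np.cos(2 * np.pi * np.meshgrid(r, r, indexing="ij")[1] / 4.5)
feats = gabor_features(stripes)
```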
Facial expression analysis

Fusion of geometrical and texture information
Facial expression analysis

 Confusion matrices:
   Discriminant Non-negative Matrix Factorization
   CANDIDE grid deformation results
   Fusion results
3D facial expression recognition
  Experiments on the BU-3DFE database
     100 subjects.
     6 facial expressions × 4 different intensities + neutral, per subject.
     83 landmark points (used in our method).
     Recognition rate: 93.6%
Action recognition
Problem statement:
  To identify the action (elementary activity) of a person
  Input: a single-view or multi-view video or a sequence of 3D
  human body models (or point clouds).
  Output: An activity label (walk, run, jump,…) for each frame or
  for the entire sequence.




       run           walk          jump p.          jump f.     bend
  Applications:
      Semantic video content description, indexing, retrieval
Action recognition
 Action recognition on video data
   Input feature vectors: binary masks resulting from coarse body
   segmentation on each frame.




   run          walk        jump p.        jump f.       bend
Action recognition
 Perform fuzzy c-means (FCM) clustering on the feature
 vectors of all frames of training action videos
Action recognition


   Find the cluster centers, “dynemes”: key human
   body poses
   Characterize each action video by its distance
   from the dynemes.

   Apply Linear Discriminant Analysis (LDA) to
   further decrease dimensionality.
Action recognition

  Characterize each test video by its distances from the
  dynemes
  Perform LDA
  Perform classification, e.g. by SVM
  The method can operate on videos containing multiple
  actions performed sequentially.
  96% correct action recognition on a database containing 10
  actions (walking, running, jumping, waving, etc.)
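The dyneme pipeline described above can be sketched as follows; plain k-means with deterministic seeding stands in for the fuzzy c-means (FCM) clustering, and the LDA/SVM stages are omitted, so this covers only the clustering and descriptor steps:

```python
import numpy as np

def learn_dynemes(frames, k, iters=20):
    """Cluster all training frame vectors into k 'dynemes' (key body
    poses). Evenly spaced frames seed the centers deterministically."""
    centers = frames[np.linspace(0, len(frames) - 1, k).astype(int)].copy()
    for _ in range(iters):
        d = np.linalg.norm(frames[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = frames[labels == j].mean(axis=0)
    return centers

def video_descriptor(video_frames, dynemes):
    """Mean distance of a video's frames to each dyneme: a fixed-size
    vector per video, ready for LDA and SVM classification."""
    d = np.linalg.norm(video_frames[:, None] - dynemes[None], axis=2)
    return d.mean(axis=0)
```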
3D action recognition

 Action recognition on 3D data
   Extension of the video-based activity recognition
   algorithm
   Input to the algorithm: binary voxel-based representation
   of frames
Visual speech detection



Problem statement:

   To detect video frames in which a person speaks,
   using only visual information.

  Input: A mouth ROI produced by mouth detection
  Output: yes/no visual speech indication.

  The mouth is detected and localized by a similar technique
  to eye detection.
Visual speech detection
  Applications
    Detection of important video frames (when someone
    speaks)
    Lip reading applications
    Speech recognition applications
    Speech intent detection and speaker determination
       human--computer interaction applications
       video telephony and video conferencing systems.
    Dialogue detection system for movies and TV programs
Visual speech detection

   A statistical approach for visual speech detection,
   using mouth region luminance will be presented.

   The method employs face and mouth region
   detectors, applying signal detection algorithms to
   determine lip activity.
Visual speech detection




Increase in the number of low-intensity pixels in the mouth region when the mouth is open.
Visual speech detection
Visual speech detection




   The number of low grayscale intensity pixels of the mouth
   region in a video sequence. The rectangle encompasses the
   frames where a person is speaking.
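The dark-pixel statistic above can be sketched directly; the intensity threshold and the median-based decision rule are illustrative stand-ins for the statistical detector of the slides:

```python
import numpy as np

def low_intensity_count(mouth_roi, thresh=60):
    """Number of dark pixels in the mouth ROI; an open mouth exposes
    the dark oral cavity, so this count rises while speaking."""
    return int((mouth_roi < thresh).sum())

def speech_frames(counts, margin=1.5):
    """Flag frames whose dark-pixel count exceeds `margin` times the
    sequence median: a crude yes/no visual-speech indication."""
    med = float(np.median(counts))
    return [i for i, c in enumerate(counts) if c > margin * med]
```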
Anthropocentric video content description
structure based on MPEG-7
Problem statement:

To organize and store the extracted semantic information for
  video content description.

  Input: video analysis results
  Output: MPEG-7 compatible XML file.

  Applications
      Video indexing and retrieval
      Video metadata analysis
Anthropocentric video content description

Anthropos-7 metadata description profile

   Humans and their status/actions are the most important
   metadata items.
   Anthropos-7 presents an anthropocentric perspective for movie
   annotation.
   New video content descriptors (on top of MPEG-7 entities) are
   introduced, in order to better store intermediate- and high-level
   video metadata related to humans.
    Human IDs, actions, and status descriptions are supported.
    Information like:
“Actor H. Grant appears in this shot and he is smiling”
can be described in the proposed Anthropos-7 profile.
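The kind of statement above could be serialized along these lines; the element and attribute names below are hypothetical, modeled on the Anthropos-7 classes listed in these slides rather than taken from the actual MPEG-7 schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical tags, patterned after the Actor / Actor Appearance /
# Actor Instance classes of the profile.
actor = ET.Element("Actor", ID="a1")
ET.SubElement(actor, "Name").text = "H. Grant"
appearance = ET.SubElement(actor, "ActorAppearance", ID="ap1")
instance = ET.SubElement(appearance, "ActorInstance", ID="in1")
ET.SubElement(instance, "Expressions").text = "smiling"
xml = ET.tostring(actor, encoding="unicode")
print(xml)
```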
Anthropocentric video content description


Main Anthropos-7 Characteristics:

   Anthropos-7 Descriptors gather semantic (intermediate level)
  video metadata (tags).

  Tags are selected to suit research areas like face detection,
  object tracking, motion detection, facial expression extraction
  etc.

  High level semantic video metadata (events, e.g. dialogues) are
  supported.
Anthropocentric video content description

           Ds & DSs Name
               Movie Class
               Version Class
               Scene Class
               Shot Class
               Take Class
               Frame Class
               Sound Class
               Actor Class
               Actor Appearance Class
               Object Appearance Class
               Actor Instance Class
               Object Instance Class
               High Semantic Class
               Camera Class
               Camera Use Class
               Lens Class
Anthropocentric video content description

 Shot Class attributes: ID, Shot S/N, Shot Description, Take Ref,
 Start Time, End Time, Duration, Camera Use, Key Frame,
 Number Of Frames, Color Information, Object Appearances
Anthropocentric video content description

 Actor Class attributes: ID, Name, Sex, Nationality, Role Name
Anthropocentric video content description

 Actor Appearance Class attributes: ID, Time In List, Time Out List,
 Color Information, Motion, Actor Instances
Anthropocentric video content description

Examples of Actor Instances in one Actor Appearance


     1st frame     6th frame    11th frame    16th frame
Anthropocentric video content description

 Actor Instance Class attributes: ID; ROIDescriptionModel (ID;
 Geometry: ID, Feature Points, Bounding Box, Convex Hull, Grid;
 Color); Status (Position In Frame, Expressions, Activity)
Anthropos-7 Video Annotation software

 A set of video annotation software tools has been created for
 anthropocentric video content description.

 Anthropocentric Video Content Description Structure based on
 MPEG-7.

 These software tools perform (semi)-automatic video content
 annotation.

 The video metadata results (XML file) can be edited manually by a
 software tool called Anthropos-7 editor.
Multi-view Anthropos-7

   The goal is to link the frame-based information instantiated
   by the ActorInstanceType DS in each video channel (e.g. the
   L and R channels in stereo videos).
   In other words, to link the instances of the ActorInstanceType
   DS belonging to different channels, based on disparity or color
   similarity.
Multi-view Anthropos-7
Multi-view Anthropos-7

   The proposed structure organizes the data in 3 levels
     Level 1
   Refers to the Correspondences tag
     Level 2
   Refers to the CorrespondingInstancesType DS
     Level 3
   Refers to the CorrespondingInstanceType DS
Multi-view Anthropos-7
Multi-view Anthropos-7
Conclusions

   Anthropocentric movie description is very important
   because, in most cases, movies are based on human
   characters, their actions and status.
   We presented a number of anthropocentric video
   analysis and description techniques that have
   applications in film/games postproduction as well as
   in semantic video search.
   Similar descriptions can be devised for stereo and
   multiview video.
3DTVS project

   European project: 3DTV Content Search (3DTVS)



   Start: 1/11/2011
   Duration: 3 years
   www.3dtvs-project.eu

   Stay tuned!
Thank you very much for your attention!

Further info:
www.aiia.csd.auth.gr
pitas@aiia.csd.auth.gr
Tel: +30-2310-996304

IRJET- Face Spoofing Detection Based on Texture Analysis and Color Space Conv...IRJET Journal
 
IRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
IRJET- A Survey on Facial Expression Recognition Robust to Partial OcclusionIRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
IRJET- A Survey on Facial Expression Recognition Robust to Partial OcclusionIRJET Journal
 
Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...
Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...
Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...adewole63
 
Deep learning for understanding faces
Deep learning for understanding facesDeep learning for understanding faces
Deep learning for understanding facessieubebu
 
Face recognition technology
Face recognition technologyFace recognition technology
Face recognition technologyranjit banshpal
 

Similar a Semantic 3DTV Content Analysis and Description (20)

Face and Eye Detection Varying Scenarios With Haar Classifier_2015
Face and Eye Detection Varying Scenarios With Haar Classifier_2015Face and Eye Detection Varying Scenarios With Haar Classifier_2015
Face and Eye Detection Varying Scenarios With Haar Classifier_2015
 
Face Recognition
Face RecognitionFace Recognition
Face Recognition
 
Report Face Detection
Report Face DetectionReport Face Detection
Report Face Detection
 
A Fast Recognition Method for Pose and Illumination Variant Faces on Video Se...
A Fast Recognition Method for Pose and Illumination Variant Faces on Video Se...A Fast Recognition Method for Pose and Illumination Variant Faces on Video Se...
A Fast Recognition Method for Pose and Illumination Variant Faces on Video Se...
 
Multimodal Biometrics Recognition from Facial Video via Deep Learning
Multimodal Biometrics Recognition from Facial Video via Deep Learning Multimodal Biometrics Recognition from Facial Video via Deep Learning
Multimodal Biometrics Recognition from Facial Video via Deep Learning
 
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNINGMULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING
MULTIMODAL BIOMETRICS RECOGNITION FROM FACIAL VIDEO VIA DEEP LEARNING
 
Research and Development of DSP-Based Face Recognition System for Robotic Reh...
Research and Development of DSP-Based Face Recognition System for Robotic Reh...Research and Development of DSP-Based Face Recognition System for Robotic Reh...
Research and Development of DSP-Based Face Recognition System for Robotic Reh...
 
Infarec
InfarecInfarec
Infarec
 
IRJET- Human Face Recognition in Video using Convolutional Neural Network (CNN)
IRJET- Human Face Recognition in Video using Convolutional Neural Network (CNN)IRJET- Human Face Recognition in Video using Convolutional Neural Network (CNN)
IRJET- Human Face Recognition in Video using Convolutional Neural Network (CNN)
 
A study of techniques for facial detection and expression classification
A study of techniques for facial detection and expression classificationA study of techniques for facial detection and expression classification
A study of techniques for facial detection and expression classification
 
inam
inaminam
inam
 
IRJET- Face Spoofing Detection Based on Texture Analysis and Color Space Conv...
IRJET- Face Spoofing Detection Based on Texture Analysis and Color Space Conv...IRJET- Face Spoofing Detection Based on Texture Analysis and Color Space Conv...
IRJET- Face Spoofing Detection Based on Texture Analysis and Color Space Conv...
 
IRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
IRJET- A Survey on Facial Expression Recognition Robust to Partial OcclusionIRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
IRJET- A Survey on Facial Expression Recognition Robust to Partial Occlusion
 
Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...
Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...
Optimization of Facial Landmark for Sentiment Analysis on Images with Human F...
 
Real time facial expression analysis using pca
Real time facial expression analysis using pcaReal time facial expression analysis using pca
Real time facial expression analysis using pca
 
Deep learning for understanding faces
Deep learning for understanding facesDeep learning for understanding faces
Deep learning for understanding faces
 
Face recognition technology
Face recognition technologyFace recognition technology
Face recognition technology
 
Face recognition
Face recognitionFace recognition
Face recognition
 
184
184184
184
 
Biometric
BiometricBiometric
Biometric
 

Más de Distinguished Lecturer Series - Leon The Mathematician (7)

Defying Nyquist in Analog to Digital Conversion
Defying Nyquist in Analog to Digital ConversionDefying Nyquist in Analog to Digital Conversion
Defying Nyquist in Analog to Digital Conversion
 
Artificial Intelligence and Human Thinking
Artificial Intelligence and Human ThinkingArtificial Intelligence and Human Thinking
Artificial Intelligence and Human Thinking
 
Farewell to Disks: Efficient Processing of Obstinate Data
Farewell to Disks: Efficient Processing of Obstinate DataFarewell to Disks: Efficient Processing of Obstinate Data
Farewell to Disks: Efficient Processing of Obstinate Data
 
Artificial Intelligence and Human Thinking
Artificial Intelligence and Human ThinkingArtificial Intelligence and Human Thinking
Artificial Intelligence and Human Thinking
 
Descriptive Granularity - Building Foundations of Data Mining
Descriptive Granularity - Building Foundations of Data MiningDescriptive Granularity - Building Foundations of Data Mining
Descriptive Granularity - Building Foundations of Data Mining
 
Success Factors in Industry - Academia Collaboration - An Empirical Study
 Success Factors in Industry - Academia Collaboration - An Empirical Study   Success Factors in Industry - Academia Collaboration - An Empirical Study
Success Factors in Industry - Academia Collaboration - An Empirical Study
 
Compressed Sensing In Spectral Imaging
Compressed Sensing In Spectral Imaging  Compressed Sensing In Spectral Imaging
Compressed Sensing In Spectral Imaging
 

Último

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsManeerUddin
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Último (20)

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 

Semantic 3DTV Content Analysis and Description

  • 1. Human centered 2D/3D multiview video analysis and description Nikos Nikolaidis Greek SP Jam, AUTH Ioannis Pitas* 17/5/2012 AIIA Lab Aristotle University of Thessaloniki Greece
  • 2. Anthropocentric video content analysis tasks Face detection and tracking Body or body parts (head/hand/torso) detection Eye/mouth detection 3D face model reconstruction Face clustering/recognition Facial expression analysis Activity/gesture recognition Semantic video content description/annotation
  • 3. Anthropocentric video content description Applications in video content search (e.g. in YouTube) Applications in film and games postproduction: Semantic annotation of video Video indexing and retrieval Matting 3D reconstruction Anthropocentric (human centered) approach humans are the most important video entity.
  • 4. Examples Face detection and tracking 1st frame 6th frame 11th frame 16th frame
  • 6. Face detection and tracking 1st frame 6th frame 11th frame 16th frame Problem statement: To detect the human faces that appear in each video frame and localize their Region-Of-Interest (ROI). To track the detected faces over the video frames.
  • 7. Face detection and tracking Tracking associates each detected face in the current video frame with one in the next video frame. Therefore, we can describe the face (ROI) trajectory in a shot in (x,y) coordinates. Actor instance definition: face region of interest (ROI) plus other info Actor appearance definition: face trajectory plus other info
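The frame-to-frame association described above can be sketched as a greedy overlap test between face ROIs. This is a minimal illustration under assumed conventions (boxes as (x, y, w, h) tuples, an IoU threshold of 0.3), not the actual tracker of the slides:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def associate(prev_faces, curr_faces, thresh=0.3):
    """Greedy one-to-one matching of face ROIs between consecutive frames.
    Returns (prev_index, curr_index) pairs; unmatched faces are dropped."""
    matches, used = [], set()
    for i, p in enumerate(prev_faces):
        best, best_iou = None, thresh
        for j, c in enumerate(curr_faces):
            if j in used:
                continue
            o = iou(p, c)
            if o > best_iou:
                best, best_iou = j, o
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches
```

Chaining such matches over a shot yields the per-face (x, y) trajectory that the description above refers to.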
  • 8. Face detection and tracking Overall system diagram
  • 9. Face detection and tracking Face detection results
  • 10. Face detection and tracking: system flowchart — face detection feeds a feature-based tracking module (select features → track features → occlusion handling → result).
  • 11. Face detection and tracking Tracking failure may occur, e.g., when a face disappears. In such cases, face re-detection is employed. However, if any of the newly detected faces coincides with a face already being tracked, the new detection is kept, while the tracked face is discarded from any further processing. Periodic face re-detection (typically every 5 video frames) can be applied to account for new faces entering the camera's field of view. Forward and backward tracking can be used when the entire video is available.
  • 12. Face detection and tracking LSK tracker (template-based tracking)
  • 13. Face detection and tracking LSK tracker results
  • 14. Bayesian tracking in stereo video
  • 15. Face detection and tracking Performance measures: Detection rate False alarm rate Localization precision in the range [0,…,1]
  • 16. Face detection and tracking Overall Tracking Accuracy
  • 17. Multiview human detection Problem statement: Use information from multiple cameras to detect bodies or body parts, e.g. head. Camera 4 Camera 6 Applications: Human detection/localization in postproduction. Matting / segmentation initialization.
  • 18. Multiview human detection The retained voxels are projected to all views. For every view we reject ROIs that have small overlap with the regions resulting from the projection. Camera 2 Camera 4
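The voxel-to-view projection step above can be illustrated with a pinhole camera model. This is a minimal sketch that assumes a known 3x4 projection matrix `P` per view; the calibration pipeline itself is not covered in the slides:

```python
import numpy as np

def project_voxels(voxels, P):
    """Project Nx3 voxel centres through a 3x4 camera projection matrix P
    (pinhole model): homogenise, multiply, then divide by depth."""
    X = np.hstack([voxels, np.ones((len(voxels), 1))])  # Nx4 homogeneous
    x = X @ P.T                                         # Nx3 image coords
    return x[:, :2] / x[:, 2:3]                         # perspective divide
```

The projected points of the retained voxels define, per view, the regions against which candidate ROIs are checked for overlap.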
  • 19. Eye detection Problem statement: To detect eye regions in an image. Input: A facial ROI produced by face detection or tracking Output: eye ROIs / eye center location. Applications: face analysis e.g. expression recognition, face recognition, etc gaze tracking.
  • 20. Eye detection Eye detection based on edges and distance maps from the eye and the surrounding area. Robustness to illumination conditions variations. The Canny edge detector is used. For each pixel, a vector pointing to the closest edge pixel is calculated. Its magnitude (length) and the slope (angle) are used.
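The closest-edge vector feature described above can be sketched as follows. This is a brute-force illustration (a real system would use a distance transform over the Canny edge map); `edge_pixels` is an assumed list of (y, x) edge coordinates:

```python
import math

def closest_edge_vectors(edge_pixels, height, width):
    """For every pixel, the vector to the nearest edge pixel, stored as
    (magnitude, angle) — the two quantities used by the eye detector.
    Brute-force O(H*W*E) sketch for clarity, not efficiency."""
    feats = {}
    for y in range(height):
        for x in range(width):
            dy, dx = min(((ey - y, ex - x) for ey, ex in edge_pixels),
                         key=lambda v: v[0] ** 2 + v[1] ** 2)
            feats[(y, x)] = (math.hypot(dx, dy), math.atan2(dy, dx))
    return feats
```

Because only edge geometry is used (not raw intensities), the resulting feature maps are largely insensitive to illumination changes, as the slide notes.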
  • 24. 3D face reconstruction from uncalibrated video Problem statement: Input: facial images or facial video frames, taken from different view angles, provided that the face neither changes expression nor speaks. Output: 3D face model (saved as a VRML file) and its calibration in relation to each camera. Facial pose estimation. Applications: 3D face reconstruction, facial pose estimation, face recognition, face verification.
  • 25. 3D face reconstruction from uncalibrated video Method overview Manual selection of characteristic feature points on the input facial images. Use of an uncalibrated 3-D reconstruction algorithm. Incorporation of the CANDIDE generic face model. Deformation of the generic face model based on the 3-D reconstructed feature points. Re-projection of the face model grid onto the images and manual refinement.
  • 26. 3D face reconstruction from uncalibrated video Input: three images with a number of matched characteristic feature points.
  • 27. 3D face reconstruction from uncalibrated video The CANDIDE face model has 104 nodes and 184 triangles. Its nodes correspond to characteristic points of the human face, e.g. nose tip, outline of the eyes, outline of the mouth etc.
  • 28. 3D face reconstruction from uncalibrated video Example
  • 29. 3D face reconstruction from uncalibrated video Selected features CANDIDE grid reprojection 3D face reconstruction
  • 30. 3D face reconstruction from uncalibrated video Performance evaluation: Reprojection error
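The reprojection error used for evaluation is, in essence, the mean distance between the re-projected model grid points and the selected image features; a minimal sketch:

```python
import math

def reprojection_error(projected, observed):
    """Mean Euclidean distance between re-projected model grid points
    and the manually selected image feature points."""
    assert len(projected) == len(observed)
    return sum(math.dist(p, o)
               for p, o in zip(projected, observed)) / len(projected)
```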
  • 31. Face clustering Problem statement: To cluster a set of facial ROIs. Input: a set of face image ROIs. Output: several face clusters, each containing faces of only one person. Applications: Cluster the actor images, even if they belong to different shots. Cluster varying views of the same actor.
  • 32. Face clustering A four step clustering algorithm is used: face tracking; calculation of mutual information between the tracked facial ROIs to form a similarity matrix (graph); application of face trajectory heuristics, to improve clustering performance; similarity graph clustering.
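The final step above can be illustrated by thresholding the mutual-information similarity matrix and taking connected components of the resulting graph. This is a simplified stand-in for the actual graph-clustering algorithm, under an assumed similarity threshold:

```python
def cluster_by_similarity(sim, thresh):
    """Cluster items given a symmetric similarity matrix `sim`:
    link any pair with similarity >= thresh, then label the
    connected components via depth-first search."""
    n = len(sim)
    labels = [-1] * n
    cur = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        stack = [s]
        labels[s] = cur
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and sim[u][v] >= thresh:
                    labels[v] = cur
                    stack.append(v)
        cur += 1
    return labels
```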
  • 33. Face clustering Performance evaluation 85.4% correct clustering rate, 6.6% clustering errors An error of 7.9% is introduced by the face detector. Face detector errors have been manually removed 93% correct clustering rate, 7% clustering errors Cluster 2 results are improved
  • 34. Face recognition Problem statement: To identify a face identity. Input: a facial ROI. Output: the face ID (e.g., Hugh Grant, Sandra Bullock). Applications: Finding the movie cast. Retrieving movies of an actor. Surveillance applications.
  • 35. Face recognition Two general approaches: Subspace methods Elastic graph matching methods.
  • 36. Face recognition Subspace methods The original high-dimensional image space is projected onto a low-dimensional one. Face recognition according to a simple distance measure in the low dimensional space. Subspace methods: Eigenfaces (PCA), Fisherfaces (LDA), ICA, NMF. Main limitation of subspace methods: they require perfect face alignment (registration).
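The eigenfaces (PCA) projection mentioned above can be sketched with an SVD of the centred training images. This is a minimal illustration of the subspace idea, not a full recognition pipeline:

```python
import numpy as np

def eigenfaces(X, k):
    """PCA subspace ('eigenfaces') sketch.
    X: (n_samples, n_pixels) matrix of vectorised face images."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k]            # mean face and (k, n_pixels) basis

def project(x, mean, W):
    """Map one face image to its low-dimensional code; recognition
    then reduces to a simple distance measure between codes."""
    return W @ (x - mean)
```

Note the limitation the slide points out: because the projection operates on raw pixel vectors, the codes are only comparable if the faces are precisely aligned beforehand.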
  • 37. Face recognition Elastic graph matching (EGM) methods Elastic graph matching is a simplified implementation of the Dynamic Link Architecture (DLA). DLA represents an object by a rectangular elastic grid. A Gabor wavelet bank response is measured at each grid node. Multiscale dilation-erosion at each grid node can be used, leading to Morphological EGM (MEGM).
  • 38. Face recognition Output of normalized multi-scale dilation-erosion for nine scales.
  • 39. Face recognition Performance evaluation: Performance metric: Equal Error Rate (EER). M2VTS and XM2VTS facial image data base for training/testing MEGM EER: 2% Subspace techniques EER 5-10% Kernel subspace techniques: 2% Elastic graph matching techniques have the best overall performance.
  • 40. Facial expression analysis Problem statement: To identify a facial expression in a facial image or 3D point cloud. Input: a face ROI or a point cloud Output: the facial expression label (e.g. neutral, anger, disgust, sadness, happiness, surprise, fear). Applications Affective video content description
  • 41. Facial expression analysis Two approaches for expression analysis on images / videos: Feature based ones They use texture or geometrical information as features for expression information extraction. Template based ones They use 3D or 2D head and facial models as templates for facial expression recognition.
  • 42. Facial expression analysis Feature based methods The features used can be: geometric positions of a set of fiducial points on a face; multi-scale and multi-orientation Gabor wavelet coefficients at the fiducial points. Optical flow information Transform (e.g. DCT, ICA, NMF) coefficients Image pixels reduced in dimension by using PCA or LDA.
  • 43. Facial expression analysis Fusion of geometrical and texture information
  • 44. Facial expression analysis Confusion matrices: Discriminant Non- negative Matrix Factorization CANDIDE grid deformation results Fusion results
  • 45. 3D facial expression recognition Experiments on the BU-3DFE database 100 subjects. 6 facial expressions x 4 different intensities +neutral per subject. 83 landmark points (used in our method). Recognition rate: 93.6%
  • 46. Action recognition Problem statement: To identify the action (elementary activity) of a person Input: a single-view or multi-view video or a sequence of 3D human body models (or point clouds). Output: An activity label (walk, run, jump,…) for each frame or for the entire sequence. run walk jump p. jump f. bend Applications: Semantic video content description, indexing, retrieval
  • 47. Action recognition Action recognition on video data Input feature vectors: binary masks resulting from coarse body segmentation on each frame. run walk jump p. jump f. bend
  • 48. Action recognition Perform fuzzy c-means (FCM) clustering on the feature vectors of all frames of training action videos
  • 49. Action recognition Find the cluster centers, “dynemes”: key human body poses Characterize each action video by its distance from the dynemes. Apply Linear Discriminant Analysis (LDA) to further decrease dimensionality.
  • 50. Action recognition Characterize each test video by its distances from the dynemes Perform LDA Perform classification, e.g. by SVM The method can operate on videos containing multiple actions performed sequentially. 96% correct action recognition on a database containing 10 actions (walking, running, jumping, waving,..)
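The dyneme-based video representation in the steps above can be sketched as the mean distance of each frame's pose vector to the learned cluster centres (the FCM training and the LDA/SVM stages are omitted). `frames` and `dynemes` are assumed to hold flattened binary masks and cluster centres respectively:

```python
import numpy as np

def dyneme_descriptor(frames, dynemes):
    """Represent an action video by its mean distance to each dyneme.
    frames:  (T, D) flattened binary pose masks, one row per video frame
    dynemes: (K, D) key-pose cluster centres from FCM training
    Returns one K-dimensional vector describing the whole video."""
    d = np.linalg.norm(frames[:, None, :] - dynemes[None, :, :], axis=2)
    return d.mean(axis=0)
```

The fixed-length K-dimensional descriptor is what makes videos of different durations comparable before LDA and SVM classification.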
  • 51. 3D action recognition Action recognition on 3D data Extension of the video-based activity recognition algorithm Input to the algorithm: binary voxel-based representation of frames
  • 52. Visual speech detection Problem statement: To detect video frames where a person speaks, using only visual information. Input: A mouth ROI produced by mouth detection. Output: yes/no visual speech indication. The mouth is detected and localized by a technique similar to eye detection.
  • 53. Visual speech detection Applications Detection of important video frames (when someone speaks) Lip reading applications Speech recognition applications Speech intent detection and speaker determination human--computer interaction applications video telephony and video conferencing systems. Dialogue detection system for movies and TV programs
  • 54. Visual speech detection A statistical approach for visual speech detection, using mouth region luminance will be presented. The method employs face and mouth region detectors, applying signal detection algorithms to determine lip activity.
  • 55. Visual speech detection Increase in the number of low intensity pixels, in the mouth region when mouth is open.
  • 57. Visual speech detection The number of low grayscale intensity pixels of the mouth region in a video sequence. The rectangle encompasses the frames where a person is speaking.
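The low-intensity pixel cue can be sketched by counting dark pixels per mouth ROI and thresholding the temporal variation of that count. The threshold values here are illustrative assumptions, not the statistical detector of the slides:

```python
import numpy as np

def low_intensity_counts(mouth_rois, thresh=60):
    """Count dark pixels in each grayscale mouth ROI; the count rises
    when the mouth opens (thresh=60 is an assumed intensity cut-off)."""
    return np.array([int((roi < thresh).sum()) for roi in mouth_rois])

def is_speaking(counts, var_thresh=4.0):
    """Large temporal variation of the dark-pixel count suggests lip
    activity (var_thresh is an illustrative assumption)."""
    return counts.astype(float).var() > var_thresh
```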
  • 58. Anthropocentric video content description structure based on MPEG-7 Problem statement: To organize and store the extracted semantic information for video content description. Input: video analysis results Output: MPEG-7 compatible XML file. Applications Video indexing and retrieval Video metadata analysis
  • 59. Anthropocentric video content description Anthropos-7 metadata description profile Humans and their status/actions are the most important metadata items. Anthropos-7 presents an anthropocentric perspective for movie annotation. New video content descriptors (on top of MPEG-7 entities) are introduced, in order to better store intermediate- and high-level video metadata related to humans. Human IDs, actions and status descriptions are supported. Information like “Actor H. Grant appears in this shot and he is smiling” can be described in the proposed Anthropos-7 profile.
  • 60. Anthropocentric video content description Main Anthropos-7 Characteristics: Anthropos-7 Descriptors gather semantic (intermediate level) video metadata (tags). Tags are selected to suit research areas like face detection, object tracking, motion detection, facial expression extraction etc. High level semantic video metadata (events, e.g. dialogues) are supported.
  • 61. Anthropocentric video content description Ds & DSs: Movie, Version, Scene, Shot, Take, Frame, Sound, Actor, Actor Appearance, Object Appearance, Actor Instance, Object Instance, High Semantic, Camera, Camera Use and Lens classes.
  • 62. Anthropocentric video content description: class hierarchy diagram relating the Movie, Version, Scene, Shot, Take, Frame, Sound, Actor/Object Appearance, Actor/Object Instance, High Semantic, Camera, Camera Use and Lens classes.
  • 63. Anthropocentric video content description Shot Class attributes: ID, Shot S/N, Shot Description, Take Ref, Start Time, End Time, Duration, Camera Use, Key Frame, Number Of Frames, Color Information, Object Appearances.
  • 64. Anthropocentric video content description Actor Class attributes: ID, Name, Sex, Nationality, Role Name.
  • 65. Anthropocentric video content description Actor Appearance Class attributes: ID, Time In List, Time Out List, Color Information, Motion, Actor Instances.
  • 66. Anthropocentric video content description Examples of Actor Instances in one Actor Appearance: 1st frame, 6th frame, 11th frame, 16th frame.
  • 67. Anthropocentric video content description Actor Instance Class attributes: ID, ROIDescriptionModel (ID; Geometry: Feature Points, Bounding Box, Convex Hull, Grid), Color, Status (Position In Frame, Expressions, Activity).
  • 68. Anthropos-7 Video Annotation software A set of video annotation software tools have been created for anthropocentric video content description. Anthropocentric Video Content Description Structure based on MPEG-7. These software tools perform (semi)-automatic video content annotation. The video metadata results (XML file) can be edited manually by a software tool called Anthropos-7 editor.
  • 69. Multi-view Anthropos-7 The goal is to link the frame-based information instantiated by the ActorInstanceType DS in each video channel (e.g., the L+R channels in stereo video) — in other words, to link the instances of the ActorInstanceType DS belonging to different channels, based on disparity or color similarity.
  • 71. Multi-view Anthropos-7 The proposed structure organizes the data in 3 levels Level 1 Refers to the Correspondences tag Level 2 Refers to the CorrespondingInstancesType DS Level 3 Refers to the CorrespondingInstanceType DS
  • 74. Conclusions Anthropocentric movie description is very important because, in most cases, movies are built around human characters, their actions and status. We presented a number of anthropocentric video analysis and description techniques that have applications in film/games postproduction, as well as in semantic video search. Similar descriptions can be devised for stereo and multiview video.
  • 75. 3DTVS project European project: 3DTV Content Search (3DTVS) Start: 1/11/2011 Duration: 3 years www.3dtvs-project.eu Stay tuned!
  • 76. Thank you very much for your attention! Further info: www.aiia.csd.auth.gr pitas@aiia.csd.auth.gr Tel: +30-2310-996304