final ppt
1. PROJECT REPORT
Presented by
Mayank Goel 2011AAPS135H
Shridevi Muthkhod 2011A8PS005G
Niharika Gupta 2011A8PS260G
At
CENTRAL ELECTRONICS ENGINEERING RESEARCH INSTITUTE, PILANI
A Practice School station of
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI
3. MACHINE LEARNING
Science of getting computers to act
without being explicitly programmed.
Step towards Artificial Intelligence.
Used many times every day, often
without our knowing it.
Examples: self-driving cars, practical
speech recognition, effective web search,
understanding of the human genome, etc.
4. PROJECTS DONE UNDER MACHINE LEARNING
Minor projects
Neural network implementation
HOJ3D histogram bins
Major project
Gesture recognition using HMM
5. NEURAL NETWORKS
In machine learning, it is a mathematical model
inspired by biological neural networks.
A neural network consists of an interconnected
group of artificial neurons, and it processes
information using a connectionist approach to
computation.
In most cases a neural network is an adaptive
system changing its structure during a learning
phase.
Neural networks are used for modelling complex
relationships between inputs and outputs or to find
patterns in data.
6. IMPLEMENTATION
We developed code to compute the
average of two numbers using neural
networks.
We used the backpropagation algorithm.
We made a training file from which the user
could select the number of training samples.
The code was able to give a close
approximation of the average of two
numbers.
We were supposed to modify our code
according to other projects, but as we found
better approaches, we dropped neural networks.
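As an illustration of the idea on this slide, here is a minimal sketch of a small network trained with backpropagation to approximate the average of two numbers. It is not the project's code: the layer sizes, the tanh activation, the learning rate, the random training pairs and the names `train_average_net` and `predict` are all assumptions made for this example.

```python
import math
import random


def train_average_net(samples, hidden=3, lr=0.1, epochs=500, seed=0):
    """Train a 2-input, one-hidden-layer network with plain backpropagation.

    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    for _ in range(epochs):
        for (a, b), target in samples:
            # Forward pass: tanh hidden layer, linear output.
            h = [math.tanh(w1[j][0] * a + w1[j][1] * b + b1[j]) for j in range(hidden)]
            y = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = y - target  # derivative of 0.5 * (y - target)^2 w.r.t. y

            # Backward pass: propagate the error to every weight.
            for j in range(hidden):
                grad_pre = err * w2[j] * (1.0 - h[j] * h[j])  # uses w2 before its update
                w2[j] -= lr * err * h[j]
                w1[j][0] -= lr * grad_pre * a
                w1[j][1] -= lr * grad_pre * b
                b1[j] -= lr * grad_pre
            b2 -= lr * err

    def predict(a, b):
        h = [math.tanh(w1[j][0] * a + w1[j][1] * b + b1[j]) for j in range(hidden)]
        return sum(w2[j] * h[j] for j in range(hidden)) + b2

    return predict


# Hypothetical training set: 200 random pairs in [0, 1] with their averages.
_rng = random.Random(1)
_pairs = [(_rng.random(), _rng.random()) for _ in range(200)]
training_samples = [((a, b), (a + b) / 2.0) for a, b in _pairs]

predict = train_average_net(training_samples)
print(predict(0.1, 0.3), predict(0.9, 0.7))  # expected to be close to 0.2 and 0.8
```

The network only ever gives an approximation, which matches the slide's remark that the output was a close approximate value rather than the exact average.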
7. HOJ3D VECTORS
Method for feature extraction
Conversion to spherical coordinates
the zenith vector θ
the reference vector α
Segmentation into bins
The inclination angle into 7 bins
The azimuthal angle into 12 bins
We have developed a program to segregate the 20
major body joints into their respective bins.
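The binning step can be sketched as follows. This is a minimal sketch, not the program mentioned above: it assumes the joint coordinates are already expressed relative to the chosen origin, that the zenith is the +z axis and the reference vector is the +x axis, and that the 7 inclination bins have equal width (the slides do not state the bin edges). The name `joint_bin` is hypothetical.

```python
import math

N_INCLINATION_BINS = 7   # from the slide
N_AZIMUTH_BINS = 12      # from the slide, 30 degrees each


def joint_bin(x, y, z):
    """Return the index (0..83) of the spherical bin containing a joint.

    Assumes +z is the zenith vector and +x is the reference vector;
    equal-width inclination bins are an assumption of this sketch.
    """
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return None  # the origin itself has no direction
    # Inclination: angle from the zenith, in [0, 180] degrees.
    inclination = math.degrees(math.acos(max(-1.0, min(1.0, z / r))))
    # Azimuth: angle from the reference vector, in [0, 360) degrees.
    azimuth = math.degrees(math.atan2(y, x)) % 360.0

    i = min(int(inclination / (180.0 / N_INCLINATION_BINS)), N_INCLINATION_BINS - 1)
    a = min(int(azimuth / (360.0 / N_AZIMUTH_BINS)), N_AZIMUTH_BINS - 1)
    return i * N_AZIMUTH_BINS + a


# Hypothetical joints: one in the horizontal plane at 45 degrees azimuth,
# one straight down along the negative zenith direction.
print(joint_bin(1.0, 1.0, 0.0))   # inclination bin 3, azimuth bin 1
print(joint_bin(0.0, 0.0, -1.0))  # last inclination bin, azimuth bin 0
```

Counting how many of the 20 joints fall into each of the 7 x 12 = 84 bins then gives one histogram per frame.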
8. GESTURE RECOGNITION
Gesture recognition is one of
the most sought-after fields of
research across the globe.
A branch of machine learning.
Keyboard interfacing --> GUI --> NUI
It is not as unrealistic as it
seems and can be made
possible in the future using this
field.
Uses computer vision
and image processing
techniques.
10. OUR APPROACH
In the past, many different algorithms and procedures
such as PCA, LDA and neural networks have been
used extensively for gesture recognition.
We have implemented an HMM (Hidden Markov
Model). We combined two well-known research
papers to bring out a new method, which has
proved to be very efficient.
We took a simpler feature extraction approach from
one research paper and the HMM concept from the other.
11. Obtain the skeleton image using Kinect.
Extract the x, y and depth values of the joints in the form of a matrix.
Convert the obtained x, y and z values to spherical coordinates and segment them into bins.
Apply the PCA algorithm to the matrix and obtain the eigenvectors.
Build a codebook using k-means clustering and obtain the centroids.
Using vector quantization, assign a centroid to each feature vector.
Develop an HMM module for every gesture to be recognized.
Train the modules using input from various sources.
Test the input gesture to check that the correct result comes out.
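The codebook and vector quantization steps of this pipeline can be sketched as below. This is only an illustration under stated assumptions: it uses plain k-means with the first k vectors as initial centroids, and the names `build_codebook`, `quantize` and the toy `feature_frames` data are hypothetical, not taken from the project.

```python
import math


def nearest(centroids, vector):
    """Index of the centroid closest to vector (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(centroids[k], vector))


def build_codebook(vectors, k, iterations=20):
    """Plain k-means; initialising from the first k vectors is an assumption."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def quantize(centroids, vectors):
    """Vector quantization: replace each feature vector by its centroid index."""
    return [nearest(centroids, v) for v in vectors]


# Toy feature vectors standing in for per-frame features of one gesture.
feature_frames = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
codebook = build_codebook(feature_frames, k=2)
symbols = quantize(codebook, feature_frames)
print(symbols)  # one discrete symbol per frame, ready to feed to an HMM
```

The point of this step is that the HMM then works on a short sequence of discrete symbols instead of high-dimensional feature vectors.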
14. Kinect can capture
and track up to 2
skeletons.
Captures data at 30
frames/sec.
Captures a
collection of 20
joints.
1. JOINT POSITION ESTIMATION
22. THREE BASIC PROBLEMS:
Evaluation
Given: $O = O_1 O_2 \ldots O_T$ and $\lambda = (A, B, \pi)$
Compute $P(O \mid \lambda)$
Recognition
Given: $O = O_1 O_2 \ldots O_T$ and $\lambda = (A, B, \pi)$
Choose $Q = q_1 q_2 \ldots q_T$ which is optimal in
some sense
Training
Given: $O = O_1 O_2 \ldots O_T$
Adjust $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$
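The first two problems have standard solutions: the forward algorithm for evaluation and the Viterbi algorithm for recognition. The sketch below shows both for a discrete HMM. The two-state model and the function names `forward_probability` and `viterbi_path` are made up for illustration and are not the project's gesture models.

```python
def forward_probability(obs, A, B, pi):
    """Evaluation: P(O | lambda) via the forward algorithm."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


def viterbi_path(obs, A, B, pi):
    """Recognition: the single most likely state sequence Q."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []
    for o in obs[1:]:
        # Best predecessor for each state j, then the updated path scores.
        prev = [max(range(n), key=lambda i: delta[i] * A[i][j]) for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]


# Illustrative model lambda = (A, B, pi) with 2 states and 2 symbols.
A = [[0.7, 0.3], [0.4, 0.6]]   # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # symbol emission probabilities per state
pi = [0.6, 0.4]                # initial state distribution

print(forward_probability([0, 1], A, B, pi))  # 0.209, up to floating-point rounding
print(viterbi_path([0, 1], A, B, pi))         # [0, 1]
```

For long observation sequences the raw probabilities underflow, so practical implementations usually work with scaled or log probabilities. The training problem is normally handled with the Baum-Welch (EM) procedure, which is not shown here.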
23. Features
K (clusters)
Attempts
Iterations
Cb_index
Branching
N (states)
Frames
5. ADJUSTING THE PARAMETERS
24. DATASETS
A dataset is a collection of a large number of text files holding the x, y
and depth values of body joints in different frames of gestures.
It generally includes gesture readings taken from different
subjects (people) performing in different manners.
Datasets are required to train the HMM modules of the different gestures.
The larger the number of files, the better the training (e.g. hand clap).
Datasets are a challenge: they are very specific to the code they are
used for, and making a dataset is a very tedious task.
We decided to work on the MSR Action3D dataset available on the internet,
and also made our own dataset.
25. MSR ACTION3D
20 gestures
10 subjects
(9 male, 1 female)
Each gesture
performed 3
times by each
subject
26. OUR DATASET
15 gestures
5 subjects (4 male, 1 female)
Each gesture performed 5
times by each subject
Reference videos of each
gesture also taken.
[Images: Bend, P.T. Exercise, Right Forward Kick, Hand Clap, Right High Arm Wave]
27. OUR DATASET
[Images: Right Side Boxing, Sitting to Standing Up, Right Side Bend, Relaxing, Left Side Boxing]
[Images: Left Side Bend, Left High Arm Wave, Left Forward Kick, Jumping Jack, Jogging]
28. Results with MSR Action3D dataset
Sr. no.   Gesture          Recognition rate (%)
1         Hand clap        92
2         Hand wave        100
3         Hammer           100
4         Draw circle      83.33
5         Forward punch    100
6         Side boxing      88.4
7         Bend             100
8         Golf swing       100
29. RESULTS WITH OUR DATASET
Sr. no.   Gesture               Recognition rate (%)
1         Hand clap             100
2         Left side bend        100
3         Right side bend       90
4         Left side boxing      100
5         Right side boxing     100
6         Left forward kick     100
7         Right forward kick    90
30. OUR DATASET CONTD….
Sr. no.   Gesture               Recognition rate (%)
8         P.T. exercise         100
9         Left high arm wave    100
10        Right high arm wave   100
11        Bend                  100
12        Relaxing              100
13        Jumping jack          100
14        Standing up           100
15        Jogging               100
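As a quick arithmetic check, the per-gesture rates in the result tables can be averaged directly. The sketch below simply copies the numbers from the tables; taking the unweighted mean over gestures is an assumption about how an overall figure would be computed.

```python
# Recognition rates (%) copied from the result tables.
our_dataset_rates = [100, 100, 90, 100, 100, 100, 90, 100, 100, 100, 100, 100, 100, 100, 100]
msr_listed_rates = [92, 100, 100, 83.33, 100, 88.4, 100, 100]

# Unweighted mean over gestures (an assumption of this sketch).
our_mean = sum(our_dataset_rates) / len(our_dataset_rates)
msr_listed_mean = sum(msr_listed_rates) / len(msr_listed_rates)

print(round(our_mean, 2))         # 98.67
print(round(msr_listed_mean, 2))  # 95.47
```

The mean over our 15 gestures agrees with the 98.66% quoted in the conclusions up to rounding. The eight MSR Action3D gestures listed average about 95.47%, so the 85.66% overall figure quoted for that dataset presumably covers more than the rows shown in the table.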
31. CONCLUSIONS
Real-time gesture recognition has been achieved for 15
gestures, with an overall efficiency of 85.66% on the MSR
dataset and 98.66% on our dataset, compared with the 90.92%
efficiency of the paper we referred to (L. Xia, C. Chen, and J.
Aggarwal, View Invariant Human Action Recognition Using Histograms of 3D Joints).
Eigen joints have proved to be a very good option for
feature extraction.
Real-time components:
extraction of 3D skeletal joint locations
computation of Eigen joints
classification
The success rate is found to be considerably higher with our
dataset, i.e. efficiency depends on the dataset used.
No need to apply IOHMM.
32. FUTURE WORK
Increase the number of gestures
Extract features in a view invariant manner
Use the RGB image features to enhance
machine learning as a whole
33. REFERENCES
L. Xia, C. Chen, and J. Aggarwal, View Invariant Human Action Recognition Using Histograms of 3D Joints, IEEE CVPR Workshop on Human Activity Understanding from 3D Data, 2012.
Xiaodong Yang and YingLi Tian, Effective 3D Action Recognition Using Eigen Joints, 2013.
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, Real-Time Human Pose Recognition in Parts from a Single Depth Image, in CVPR, IEEE, June 2011.
Feng-Sheng Chen, Chih-Ming Fu, and Chung-Lin Huang, Hand gesture recognition using a real-time tracking method and hidden Markov models, March 2003.
Gerhard Rigoll and Andreas Kosmala, New improved feature extraction methods for real-time high performance image sequence recognition.
Justin Huang, Chun-wei Lee, and Junji Ma, Gesture Recognition and Classification using Microsoft Kinect.
34. Now, we will be showing you the implementation of
the project. Thank you for being patient!!
Editor's notes
Machine learning is the science of getting computers to act without being explicitly programmed.
Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.
Many researchers also think it is the best way to make progress towards human-level Artificial intelligence
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs.
Gesture recognition enables humans to communicate with the machine (HMI) and interact naturally without any mechanical devices. Gesture recognition can be conducted with techniques from computer vision and image processing.
We were supposed to modify our code according to other projects, but as we found better approaches, we dropped neural networks.
We partition the 3D space into 84 bins as shown. The inclination angle is divided into 7 bins from the zenith vector θ. Similarly, from the reference vector α, the azimuth angle is divided into 12 equal bins with a resolution of 30 degrees. With these spherical coordinates, any 3D joint can be localized in a unique bin.
Our goal in using gesture recognition is to advance human-computer communication and bring the performance of human-computer interaction as close as possible to human-human interaction.
We cannot touch upon the topic of gesture recognition without mentioning the pivotal role of Kinect.
Kinect is a motion-sensing input device by Microsoft for the Xbox 360 and Windows PCs.
It enables users to control and interact with the Xbox without the need to touch a game controller.
The Kinect software development kit was released by Microsoft in June 2011. This SDK was meant to allow developers to write Kinecting apps in C++, C#, or Visual Basic .NET. It has the following ha
Hardware
Dual Core > 2.66 GHz
2 GB RAM (4 GB recommended)
Kinect for Windows
Can use Xbox Kinect with power adapter for development
Software
Windows 7
DirectX 9.0c
Visual Studio 2010
Microsoft Speech Platform 11
Using Shotton's method, we could obtain the corresponding x, y and depth values of all 20 joints from the depth image by using an object recognition scheme.
A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. A HMM can be considered the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E. Baum and coworkers. It is closely related to an earlier work on optimal nonlinear filtering problem (stochastic processes) by Ruslan L. Stratonovich, who was the first to describe the forward-backward procedure.
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
Evaluation - Given the observation sequence $O = O_1 O_2 \ldots O_T$ and a model $\lambda = (A, B, \pi)$, how do we efficiently compute $P(O \mid \lambda)$, i.e., the probability of the observation sequence given the model?
Recognition - Given the observation sequence $O = O_1 O_2 \ldots O_T$ and a model $\lambda = (A, B, \pi)$, how do we choose a corresponding state sequence $Q = q_1 q_2 \ldots q_T$ which is optimal in some sense, i.e., best explains the observations?
Training - Given the observation sequence $O = O_1 O_2 \ldots O_T$, how do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?
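With one HMM per gesture, the evaluation problem also gives a natural way to classify an unknown gesture: score the observation sequence under every model and pick the most likely one. The sketch below assumes this maximum-likelihood rule; the slides do not spell out the decision rule that was used. The two toy models and the names `log_likelihood` and `classify` are hypothetical.

```python
import math


def log_likelihood(obs, A, B, pi):
    """log P(O | lambda) via the forward algorithm with per-step scaling."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Two made-up gesture models (A, B, pi) over a 2-symbol codebook.
transitions = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "clap": (transitions, [[0.8, 0.2], [0.7, 0.3]], [0.5, 0.5]),  # mostly symbol 0
    "wave": (transitions, [[0.2, 0.8], [0.3, 0.7]], [0.5, 0.5]),  # mostly symbol 1
}

print(classify([0, 0, 1, 0], models))  # clap
print(classify([1, 1, 0, 1], models))  # wave
```

Scaling the forward variables at every step keeps the computation stable for the longer symbol sequences that real gesture recordings produce.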
Neural networks are not well suited for temporal sequence classification problems.
Though Eigen joints have a high dimensionality, they are found to work in real time after dimensionality reduction using PCA.
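To make the dimensionality reduction step concrete, here is a minimal sketch of PCA that extracts only the leading principal component by power iteration and projects the data onto it. It is an illustration under assumptions, not the project's implementation: a real pipeline would keep several components and would normally use a linear algebra library. The names `top_component` and `project` and the toy data are hypothetical.

```python
import math


def top_component(rows, iterations=100):
    """Mean and leading eigenvector of the sample covariance matrix.

    Power iteration starting from the all-ones vector; it would fail if that
    start were orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, v):
    """Coordinate of every row along the component v (1-D reduced feature)."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]


# Toy 2-D features lying on the line y = x; one component captures everything.
features = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
mean, component = top_component(features)
scores = project(features, mean, component)
print(component)  # both entries close to 0.7071
print(scores)     # roughly [-2.12, -0.71, 0.71, 2.12]
```

Keeping only the first few components in this way is what makes a high-dimensional feature cheap enough to process frame by frame.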
The major components of our algorithm are real-time, which include the extraction of 3D skeletal joint locations, computation of Eigen joints, and classification.
The success rate is found to be considerably higher with our dataset.