This document discusses feature extraction, classification, and prediction techniques applied to EEG data to discriminate between left and right hand movements. It first provides background on EEG signals and preprocessing. It then examines feature extraction in depth, evaluating various features like mean, standard deviation, and Hjorth parameters. Classification algorithms like LDA, KNN, and neural networks are also analyzed and compared. The best results were obtained by combining Hjorth features, achieving 74% accuracy. Future work to improve these techniques is also mentioned.
2. About The Author
Olivia Moran is a leading training specialist who specialises in E-Learning instructional design and is a certified Moodle expert. She has been working as a trainer and course developer for 3 years, developing and delivering training courses for traditional classroom, blended learning and E-Learning settings.
Courses Olivia Moran Has Delivered:
● MOS
● ECDL
● Internet Marketing
● Social Media
● Google [Getting Irish Businesses Online]
● Web Design [FETAC Level 5]
● Adobe Dreamweaver
● Adobe Flash
● Moodle
Specialties:
★ Moodle [MCCC Moodle Certified Expert]
★ E-Learning Tools/Technologies [Commercial & Opensource]
★ Microsoft Office Specialist
★ Web Design & Online Content Writer
★ Adobe Dreamweaver, Flash & Photoshop
FEATURE EXTRACTION, CLASSIFICATION & PREDICTION Page: 3
3. Feature Extraction, Classification & Prediction
1. ABSTRACT
This document will examine issues pertaining to feature extraction, classification and prediction. It will
consider the application of these techniques to unlabelled Electroencephalogram (E.E.G.) data in an
attempt to discriminate between left and right hand imagery movements. It will briefly reflect on the
need for brainwave signal preprocessing. The feature extraction and classification process will be
examined in depth and the results obtained using various classifiers will be illustrated. Classification
algorithms will be given some thought, namely Linear Discriminant Analysis (L.D.A.), K-Nearest
Neighbour (K.N.N.) and Neural Network (N.N.) analysis. This document will explore prediction and
highlight its effect on accuracy. Due to time and knowledge constraints, the data could not be tested
using all the desired approaches; however, these are briefly addressed. The way in which biology and
nature inspire the design of feature extraction, classification and prediction systems will be explored.
Finally, future work will be touched on.
2. INTRODUCTION
The study of E.E.G. data is a very important field of study that according to Ebrahimi et al (2003) has
been "Motivated by the hope of creating new communication channels for persons with severe motor
disabilities". Advances in this area of research cater for the construction of more advanced Brain
Computer Interfaces (B.C.I.'s). Wolpaw et al (2002) describes such an interface as a "Non-muscular
channel for sending messages and commands to the external world". The impact that such
technologies could have on the quality of people's everyday lives, namely those who have some form
of physical disability, is enormous. "Brain-Computer Interfacing is an interesting emerging technology
that translates intentional variations in the Electroencephalogram into a set of particular commands in
order to control a real world machine" Atry et al (2005). Improvements to these systems are often
made through an increased understanding of the human body and the way in which it operates.
Feature extraction, classification and prediction are all processes that our bodies carry out on a daily
basis with or without our knowledge. Studying such activities will undoubtedly lead researchers to the
creation of more biologically plausible B.C.I. solutions.
It is not only individuals who will benefit from further studies and understanding of these processes, as
feature extraction, classification and prediction have many other applications. Take for example, the
world of business. Companies everywhere have to deal with a constant bombardment of information
from both their internal and external environments. There seems to be an endless amount of both
useful and useless information. As one can imagine, it is often very difficult to find exactly what you
are looking for. When people eventually locate what they have been seeking it may be in a format
that does not suit them. This is where feature extraction, classification and prediction play their part.
These processes are often the only way in which a business can locate information gems in a sea of
data.
This document explores the various issues pertaining to feature extraction, classification and
prediction. The application of these techniques to unlabelled E.E.G. data is examined in an attempt to
discriminate between left and right hand imagery movements. It briefly looks at brainwave signal
preprocessing. An in depth study of the feature extraction and classification process is carried out,
focusing on numerous classifiers. L.D.A., K.N.N. and N.N. classification algorithms are examined. This
document gives thought to prediction and how it could be used to improve accuracy. Due to time and
knowledge constraints, the data could not be tested using all the desired approaches; however, these
methods are mentioned in this document. Biology and nature often inspire the computing industry to
produce feature extraction, classification and prediction systems that operate in the same or a similar
way as the human body does. This issue of inspiration is briefly addressed and examples from nature
are given. Finally, areas for future work are considered.
3. BRAINWAVE SIGNAL PREPROCESSING
E.E.G. data is commonly used for tasks such as discrimination between left and right hand imagery
movements. "An E.E.G. is a recording of the very weak electrical potentials generated by the brain on
the scalp" Ebrahimi et al (2003). The collection of such signals is non-invasive and they can be "Easily
recorded and processed with inexpensive equipment" Ebrahimi et al (2003). It also offers many
advantages over other methods as "It is based on a much simpler technology and is characterized by
much smaller time constants when compared to other noninvasive approaches such as M.E.G., P.E.T. and
F.M.R.I." Ebrahimi et al (2003).
The E.E.G. data used as input for the analysis carried out during the course of this assignment had been
preprocessed. Ebrahimi et al (2003) points out "Some preprocessing is generally performed due to the
high levels of noise and interference usually present". Artifacts such as motor movements, eye blinking
and electrode movement are removed during preprocessing, as they are not required, leaving behind
all the essential data needed to carry out classification.
The E.E.G. data was recorded on two different channels, C3 and C4. These correspond to the left and
right hemisphere of the motor cortex and would have been recorded by placing electrodes over the
right and left sides of the motor cortex, as shown in figure 1 below.
Figure 1. – Showing the placing of the electrodes at channels 3 and 4 of the motor cortex.
It is important to record signals at these two channels due to the fact that "When people execute or
imagine the movement of left and right hand, E.E.G. features differs in two brain hemispheres
corresponding to sensorimotor hand representation area" Pei & Zheng (2004). Consequently, when an
imagined left hand movement is made, two signals are recorded, one at C3 and one at C4, and both
are labelled as left signals; the reverse applies for the right hand imagery movements.
4. FEATURE EXTRACTION
A feature is described by Sriraja (2002) as "Any structural characteristic, transform, structural
description or graph, extracted from a signal or a part of it, for use in pattern recognition or
interpretation. It is a representation of the signal or pattern, containing only the salient information".
Ripley (1996) goes on to argue that a "Feature is a measurement on an example, so the training set of
examples has measured features and a class for each".
Feature extraction is concerned with the identification of features that are unique or specific to a
particular type of E.E.G. data such as all imagined left hand movements. The aim of this process is the
formation of useful new features by combining existing ones. Using such features facilitates the
process of data classification. There are many such features; some provide useful information while
others provide none. The next logical step is the elimination of the features that produce the
lowest accuracy.
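As an illustration of what feature extraction means in practice, the descriptive features discussed in the next section (mean, standard deviation and kurtosis) can be computed per epoch with a few lines of NumPy. This is a hedged sketch, not the author's original code; the function name and the synthetic epoch are illustrative only:

```python
import numpy as np

def descriptive_features(epoch):
    """Summarise one EEG epoch (a 1-D array of voltage samples) with
    three descriptive features: mean, standard deviation, kurtosis."""
    mu = np.mean(epoch)
    sigma = np.std(epoch)
    # Excess kurtosis: a perfectly normal distribution scores 0.
    kurt = np.mean((epoch - mu) ** 4) / sigma ** 4 - 3
    return mu, sigma, kurt

# Illustrative use on a synthetic epoch drawn from a normal distribution:
rng = np.random.default_rng(0)
epoch = rng.normal(loc=0.0, scale=1.0, size=2000)
mu, sigma, kurt = descriptive_features(epoch)
```

Each epoch is thereby reduced to a short feature vector, and it is these vectors, rather than the raw samples, that are passed to the classifier.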
For each test run, the accuracy of the classifier used was calculated. This was important as it allowed
the author to determine which classifiers gave the best results for the data being examined. Wolpert
(1992) points out that "Estimating the accuracy of a classifier is important not only to predict its future
prediction accuracy, but also for choosing a classifier from a given set (model selection), or combining
classifiers".
5. THE CLASSIFICATION PROCESS
5. 1. Descriptive Classifiers
In an effort to find the most appropriate type of classifier for the analysis of the E.E.G. data used in this
assignment, the author turned to descriptive methods. These included basic features like the mean,
standard deviation and kurtosis. Using this descriptive approach allows for the summarisation of the
test and training data. This is useful where the sample contains a large number of variables.
5. 1. 1. Mean
The mean is "Short for arithmetic mean: in descriptive statistics, the average value, calculated for a
finite set of scores by adding the scores together and then dividing the total by the number of scores"
Coleman (2003). During ‘Descriptive Features – Test 1’ an accuracy of 64% was obtained using the
mean feature. It performed slightly better than the standard deviation, which reached 61%
accuracy.
5. 1. 2. Standard Deviation
Standard Deviation is defined by Coleman (2003) as "A measure of the degree of dispersion, variability
or scatter in a set of scores, expressed in the same units as the scores themselves, defined as the square
root of the variance". ‘Descriptive Features – Test 2’ attempted to classify the E.E.G. data by utilising
the feature of standard deviation. An accuracy of 61% was achieved.
5. 1. 3. Kurtosis
Kurtosis is useful in that it "Provides information about the ‘peakedness’ of the distribution. If the
distribution is perfectly normal you would obtain a skewness and kurtosis value of 0" Pallant (2001).
The results obtained during ‘Descriptive Features – Test 3’ using the kurtosis feature were
disappointing, with an accuracy of 49%. Kurtosis in this instance was not able to offer a higher level of
separability than either the mean or the standard deviation. Kurtosis is usually more appropriate for
larger samples, with which more satisfactory results could be accomplished. As noted by Tabachnick &
Fidell (1996), "Kurtosis can result in an underestimate of the variance, however, this risk is also reduced
with a large sample".
5. 1. 4. Combination Of Mean, Standard Deviation And Kurtosis Features
In some instances the combination of features can allow for greater accuracy; however, this was not the
case for the E.E.G. data that was examined using the mean, standard deviation and kurtosis. Test
results from ‘Descriptive Features – Test 4’ showed accuracy to be in the region of 49%, a much
lower performance than that of the mean and standard deviation features when used individually.
5. 1. 5. Conclusion Drawn From Mean, Standard Deviation And Kurtosis
Feature Tests
The accuracy of the mean as a classifier was substantially higher than that of the standard deviation
and kurtosis, as well as a combination of all three. On the other hand, it still did not offer a satisfactory
level of separation between the imagery left and right signals. These three features, it seems, are not
appropriate for E.E.G. data and are better suited to simpler forms of data. With this in mind the
author turned to the Hjorth features.
5. 2. Hjorth Features
A number of Hjorth parameters were drawn upon during the course of this assignment. "In 1970, Bo
Hjorth derived certain features that described the E.E.G. signal by means of simple time domain
analysis. These parameters, namely Activity, Mobility and Complexity, together characterize the E.E.G.
pattern in terms of amplitude, time scale and complexity" Sriraja (2002). These were used in an
attempt to achieve a separation between imagery left and right hand signals.
The Hjorth approach involves the measurement of the E.E.G. signal "For successive epochs (or windows)
of one to several seconds. Two of the attributes are obtained from the first and second time derivatives
of the amplitude fluctuations in the signal. The first derivative is the rate of change of the signal's
amplitude. At peaks and troughs the first derivative is zero. At other points it will be positive or
negative depending on whether the amplitude is increasing or decreasing with time. The steeper the
slope of the wave, the greater will be the amplitude of the first derivative. The second derivative is
determined by taking the first derivative of the first derivative of the signal. Peaks and troughs in the
first derivative, which correspond to points of greatest slope in the original signal, result in zero
amplitude in the second derivative, and so forth" Miranda & Brouse (2005).
According to Sriraja (2002), if x1, x2, …, xn are the n E.E.G. data values and the consecutive differences
xn − xn−1 are denoted as dn, the parameters can be written mathematically as:

Activity = var(x)
Mobility = sqrt( var(d) / var(x) )
Complexity = Mobility(d) / Mobility(x)

where var(x) is the variance of the signal and var(d) is the variance of the consecutive differences.
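The three Hjorth parameters can be sketched directly from these definitions. The following is an illustrative NumPy implementation (not the author's original code); `hjorth` is a hypothetical helper name, and successive differences stand in for the derivatives:

```python
import numpy as np

def hjorth(x):
    """Compute the three Hjorth parameters of a 1-D signal x, using
    successive differences as the first and second derivatives."""
    dx = np.diff(x)                  # first derivative of the signal
    ddx = np.diff(dx)                # second derivative of the signal
    activity = np.var(x)             # variance of the amplitude
    mobility = np.sqrt(np.var(dx) / np.var(x))
    # Complexity: mobility of the first derivative over mobility of the signal.
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity

# A pure sine wave is the "simplest" possible signal, so its complexity
# comes out close to 1:
t = np.linspace(0.0, 1.0, 1000)
activity, mobility, complexity = hjorth(np.sin(2 * np.pi * 5 * t))
```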
5. 2. 1. Activity Feature
Activity is defined by Miranda & Brouse (2005) as "The variance of the amplitude fluctuations in the
epoch". This feature during ‘Hjorth Features – Test 1’ was able to achieve only an accuracy of 44% and
therefore offered very poor separability. ‘Hjorth Features – Test 2’ used the same classifier, however
the time interval for sampling was changed from the 6th second to the 7th. This change resulted in an
accuracy of 55%, an increase of 11% on the previous test. ‘Hjorth Features – Test 3’ was also carried
out using the activity feature. This test aimed to determine whether or not changing the number of
neurons used in the N.N. would have a notable effect on the accuracy of the classification. A change in
this instance of the neuron numbers did not have a significant impact on performance.
5. 2. 2. Mobility Feature
"Mobility is calculated by taking the square root of the variance of the first derivative divided by the
variance of the primary signal" Miranda & Brouse (2005). ‘Hjorth Features – Test 4’ utilised this
mobility feature for classification purposes. Results from this test showed that accuracy using this
feature stands at 52%.
5. 2. 3. Complexity Feature
Complexity is described as "The ratio of the mobility of the first derivative of the signal to the mobility
of the signal itself" Miranda & Brouse (2005). ‘Hjorth Features – Test 5’ examined the complexity
feature and its effect on accuracy. Results for this test showed the level of accuracy using this feature
to be 64%.
5. 2. 4. Combination Of Activity, Mobility And Complexity Features
‘Hjorth Features – Test 6’ combined the activity, mobility and complexity features in the hope of
increasing accuracy further. This test showed very mediocre results, with accuracy at 56%. However,
when the data windows were specified as in ‘Hjorth Features – Test 7’, more promising results were
recorded. Accuracy of 74% was achieved, with a greater level of separability of the imagery left and
right hand signals than all other previous results.
Combining multiple features is useful as it can often lead to improved accuracy. Lotte et al (2007)
highlights this point, arguing, "A combination of similar classifiers is very likely to outperform one of
the classifiers on its own. Actually, combining classifiers is known to reduce the variance and thus the
classification error".
6. CLASSIFICATION ALGORITHMS
Kohavi (1995) defines a classifier as "A function that maps an unlabelled instance to a label using
internal data structures". Three different types of algorithms were used for classification. These
included the L.D.A., K.N.N. and N.N. classification algorithms.
6.1. L.D.A. Classification
L.D.A., also known as Fisher's L.D.A., is "Often used to investigate the difference between various
groups when their relationship is not clear. The goal of a discriminant analysis is to find a set of
features or discriminants whose values are such that the different groups are separated as much as
possible" Sriraja (2002). Lotte et al (2007) describes the aim of L.D.A. as being to "Use hyperplanes to
separate the data representing the different classes. For a two-class problem, the class of a feature
vector depends on which side of the hyperplane the vector is". The L.D.A. is concerned with finding
the features that will maximise the distance between the two classes while minimising the distance
within each class. This concept is illustrated in figure 2 below.
Figure 2. – Shows a hyperplane that is used to illustrate graphically the separation of the classes, i.e. the
separability of the imagery left hand data from the imagery right hand data.
The equation for L.D.A. can be denoted in mathematical terms. Sriraja (2002) discusses the equation of
L.D.A. and the principles on which it works. "First, a linear combination of the features x are projected
into a new feature, y. The idea is to have a projection such that the y's from the two classes would be
as much separated as possible. The measure of separation between the two sets of y's is evaluated in
terms of the respective means and the variances of the projected classes . . . The objective is therefore
to have a linear combination such that the following ratio is maximised":

J = (ȳ1 − ȳ2)² / (s1² + s2²)

where ȳ1 and ȳ2 are the means of the two sets of y's, y1 and y2 respectively, s1² and s2² are their
variances, and n1 and n2 are the sample sizes for the two sets.
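The ratio above can be made concrete in a few lines. This is an illustrative NumPy version of Fisher's separation criterion for two sets of projected values; the function name and the synthetic data are assumptions, not from the source:

```python
import numpy as np

def fisher_ratio(y1, y2):
    """Fisher's separation criterion for two sets of projected values:
    squared distance between the class means divided by the sum of the
    class variances. A larger ratio means better separation."""
    return (np.mean(y1) - np.mean(y2)) ** 2 / (np.var(y1) + np.var(y2))

# Well-separated classes give a much larger ratio than overlapping ones:
rng = np.random.default_rng(1)
left = rng.normal(0.0, 1.0, 500)        # stand-in for imagined-left projections
right_far = rng.normal(5.0, 1.0, 500)   # clearly separated right class
right_near = rng.normal(0.5, 1.0, 500)  # heavily overlapping right class
r_far = fisher_ratio(left, right_far)
r_near = fisher_ratio(left, right_near)
```

L.D.A. searches for the projection direction that maximises exactly this ratio.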
During testing the author utilised scatter graphs like figure 3 below to display graphically the results
from the tests. Figure 3 shows the scatter graph that was constructed as part of a test that attempted
classification of the E.E.G. data using the mean feature. The accuracy achieved using this feature was
64%.
Figure 3. – Mean Scatter Graph
The next graph, figure 4, illustrates the results of a test examining standard deviation, with the accuracy
of this feature standing at 61%.
Figure 4. – Standard Deviation Scatter Graph
Scatter graphs are described by Fisher & Holtom (1999) as useful for the presentation of "The
relationship between two different types of information plotted on horizontal, x, and vertical, y, axis.
You simply plot the point at which the values meet, to get an idea of the overall distribution of your
data". Pallant (2001) is keen to point out that "The scatter graph also provides a general indication of
the strength of the relationship between your two variables. If the relationship is weak, the points will
be all over the place, in a blob type arrangement. For a strong relationship the points will form a
vague cigar shape with a definite clumping of scores around an imaginary straight line".
6.2. K.N.N. Classification
The K.N.N. function is concerned with the computation of the minimum distance between the test data
and the data used for training. Ripley (1996) defines test data as a "Set of examples used only to assess
the performance of a fully specified classifier" while training data is a "Set of examples used for
learning, that is to fit the parameters of the classifier". The K.N.N. belongs to the family of
discriminative nonlinear classifiers. According to Lotte et al (2007) the main objective of this method is
"To assign to an unseen point the dominant class among its k nearest neighbours within the training
set". A metric distance may be used to find the nearest neighbour. "With a sufficiently high value of k
and enough training samples, K.N.N. can approximate any function which enables it to produce
nonlinear decision boundaries" Lotte et al (2007).
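The K.N.N. rule described above can be sketched as follows, assuming feature vectors stored as NumPy arrays and Euclidean distance as the metric. The function name and toy data are illustrative, not from the source:

```python
import numpy as np

def knn_predict(train_X, train_y, test_x, k=3):
    """Classify one unseen point by majority vote among its k nearest
    training points, using Euclidean distance as the metric."""
    dists = np.sqrt(np.sum((train_X - test_x) ** 2, axis=1))
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]         # dominant class wins

# Two toy clusters standing in for left (0) and right (1) feature vectors:
train_X = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
train_y = np.array([0, 0, 0, 1, 1, 1])
```

A query point near the first cluster is voted into class 0, and one near the second cluster into class 1.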
6.3. N.N. Classification
N.N.'s are widely used for classification "Due to their non-linear model and parallel computation
capabilities" Sriraja (2002). N.N.'s are described by Lotte et al (2007) as "An assembly of several
artificial neurons which enables us to produce nonlinear decision boundaries". The N.N. used for the
classification tests was the Multilayer Perceptron (M.L.P.), which is one of the more popular N.N.'s. It
used 10 linear neurons for the first input layer and then 12 for the hidden layer. In this M.L.P. N.N.
"Each neuron's input is connected with the output of the previous layer's neurons whereas the neurons
of the output layer determine the class of the input feature vector" Lotte et al (2007).
M.L.P.'s are useful for classification; provided they have a satisfactory number of neurons and layers,
"They can approximate any continuous function" Lotte et al (2007). They are commonly used as they
can quickly adapt to different problems and situations. However, it must be noted that "The fact that
M.L.P. are universal approximators makes these classifiers sensitive to overtraining, especially with such
noisy and non-stationary data as E.E.G. therefore, careful architecture selection and regularization is
required" Lotte et al (2007).
The greater the number of neurons available or used, the greater the ability of the N.N. to learn;
however, N.N.'s are susceptible to overlearning, and therefore a lower number of neurons sometimes
gives greater accuracy. Cross validation is useful as it is concerned with preventing the N.N. from
learning too much and consequently ignoring new data when it is inputted.
Usually training sets are small in size, as it is very time consuming and costly collecting "Known cases for
training and testing" Masters (1995). These small sets are often broken down further into relatively
small sets for both training and testing; however, this is not a desirable approach. Instead of taking this
action one can avail of cross validation. This is a process which "Combines training and validation into
one operation" Masters (1995).
When constructing a prediction rule, reducing the error rate where possible is an important task. Efron
(1983) describes an error rate as the "Probability of incorrectly classifying a randomly selected future
case, in other words the exception" to the rule. Cross validation is often used to reduce this error rate
and "Provides a nearly unbiased estimate, using only the original data" Efron (1983).
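As a sketch of how cross validation combines training and validation into one operation, the following NumPy example estimates accuracy over k folds, each fold held out once for testing. The `kfold_accuracy` and `nearest_mean` helpers are hypothetical stand-ins, not the classifiers used in the author's tests:

```python
import numpy as np

def nearest_mean(train_X, train_y, x):
    """Toy classifier: assign x to the class whose training mean is closest."""
    classes = np.unique(train_y)
    dists = [np.linalg.norm(x - train_X[train_y == c].mean(axis=0)) for c in classes]
    return classes[int(np.argmin(dists))]

def kfold_accuracy(X, y, classify, k=5):
    """Estimate a classifier's accuracy with k-fold cross validation:
    each fold is held out once for testing while the rest train."""
    idx = np.arange(len(X))
    fold_accs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)      # everything outside this fold
        preds = np.array([classify(X[train], y[train], X[i]) for i in fold])
        fold_accs.append(np.mean(preds == y[fold]))
    return float(np.mean(fold_accs))

# Two well-separated synthetic classes; the estimate should be near perfect:
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(3.0, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
acc = kfold_accuracy(X, y, nearest_mean, k=5)
```

Because every example serves once as test data, the estimate uses only the original data, in the spirit of the Efron (1983) quotation above.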
6. 3. 1. Euclidean Distance
A part of the N.N. algorithm examines the Euclidean distance. This refers to the straight-line distance
between two points, computed from the squared differences of their coordinates. The Euclidean
distance between two points p = (p1, p2, …, pn) and q = (q1, q2, …, qn) can be denoted as

d(p, q) = sqrt( (p1 − q1)² + (p2 − q2)² + … + (pn − qn)² )
7. PREDICTION
Frank et al (2001) defines a time series as "A sequence of vectors, x(t), t = 0,1,…, where t represents
elapsed time. Theoretically, x may be a value which varies continuously with t, such as a temperature".
This time series method can be used in prediction in what is known as time series prediction. It involves
the examination of past performance to predict future performance.
This according to Coyle et al (2004) can be used to improve classification accuracy. Their work uses a
"Novel feature extraction procedure which carries out self-organising fuzzy neural network based time
series prediction, performing feature extraction in the time domain only". Using such a method in
their studies allowed for classification accuracies in the region of 94%. They argue that the main
advantage of this approach is that "The problem of specifying the neural network architecture does
not have to be considered". Instead of adapting the parameters for individual users, the system can
"Self-organise the network architecture, adding and pruning neurons as required", just as the
human body does.
The author carried out a number of tests using 6-step-ahead prediction. The parameters for these
tests were set as follows, unless otherwise stated:
● Data was trained and tested with x (trl3)
● Embedding Dimension = 6
● Time Lag = 1
● Cross Validation was not used
● Number of neurons available to the neural network = one layer of 6
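The embedding-dimension and time-lag parameters can be made concrete with a small sketch that builds the input/target pairs for 6-step-ahead prediction. This is an assumed construction in NumPy, not the toolbox the author used; the `embed` name and the sine-wave series are illustrative only:

```python
import numpy as np

def embed(series, dim=6, lag=1, horizon=6):
    """Build (input, target) pairs for time-series prediction: each input
    holds `dim` past values spaced `lag` apart, and the target is the
    value `horizon` steps ahead (6-step-ahead prediction here)."""
    X, targets = [], []
    start = (dim - 1) * lag              # first index with a full history
    for i in range(start, len(series) - horizon):
        X.append(series[i - start : i + 1 : lag])
        targets.append(series[i + horizon])
    return np.array(X), np.array(targets)

# A sine wave stands in for one channel of E.E.G. data:
series = np.sin(np.linspace(0.0, 20.0, 200))
X, targets = embed(series, dim=6, lag=1, horizon=6)
```

The network is then trained to map each 6-value input row to its target, which is how past performance is used to predict future performance.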
All results were graphically displayed on a chart like that seen in figure 5 below.
Figure 5. – Shows the training data in blue and the test data in red, plotted as target and output values
against time step t. The difference between these two lines is referred to as the root mean square error
or simply the error rate.
7. 1. One Layer Neural Network
The first test examined accuracy using a neural network with one layer of 6 neurons. This test was run
10 times and then the average training and testing root mean square errors were calculated. The
training root mean square error was recorded at 0.0324 and the testing root mean square error at
0.0313.
7. 2. Multi Layer Neural Network
The next test was conducted using the exact same parameters, except the neural network was changed
from a single layer network with 6 neurons to one that also has a hidden layer of 8 neurons. The
results from this test were slightly worse than the previous, with training and testing root mean
square errors of 0.0326 and 0.0314. The difference between the figures from test 1 and test 2 was
extremely minute.
7. 3. Cross Validation
The next test was exactly the same as test 1 except that cross validation was used, to determine whether
it had a positive or negative effect. The training data scored slightly better with cross validation,
at 0.0293 compared to 0.0324 obtained in test 1. On the other hand, the testing data performed better
in test 1, with 0.0313 rather than the 0.0317 found with cross validation.
7. 4. Left E.E.G. Data
A test was carried out which used trl3 to train the network and trl4 to test it. The training root mean
square error was relatively the same as in previous experiments using the same parameters for the
training data. The testing root mean square error, however, was much improved, with a result of
0.0240 compared to 0.0313 using trl3 for training.
7. 5. Right E.E.G. Data
Tests were conducted using the right data. The N.N. was trained and tested with trr3. The error was a
lot less than that found with the tests on the left data using the same parameters: 0.0292 was recorded
for the training root mean square error and 0.0281 for the testing root mean square error. The right
data was also tested to see what effect testing the N.N. with trr4 instead of trr3 would have on the
performance. The training root mean square error stayed more or less the same and the testing root
mean square error increased slightly to 0.0293.
8. OTHER METHODS THAT COULD BE USED FOR FEATURE EXTRACTION
There are many other methods that could be used and that offer satisfactory performance when it
comes to feature extraction for B.C.I.'s.
8. 1. Amplitude And Phase Coupling Measure
One such approach, created by Wei et al (2007), is known as the ‘Amplitude and Phase Coupling
Measure’. This method is concerned with "Using amplitude and phase coupling measures, quantified
by a nonlinear regressive coefficient and phase locking value respectively". Wei and his colleagues
carried out studies utilising this approach. The results obtained from the application of this feature
extraction method were promising. The "Averaged classification accuracies of the five subjects ranged
from 87.4% to 92.9%" and the "Best classification accuracies ranged between 84.4% and 99.6%". The
conclusion reached from these studies is that "The combination of coupling and autoregressive
features can effectively improve the classification accuracy due to their complementarities" Wei et al
(2007).
8. 2. Combination Of Classifiers
Some researchers, in an effort to improve performance and accuracy, have begun using multiple
classifiers to achieve the desired results. The author attempted this approach with the combination of
mean, standard deviation and kurtosis, as well as activity, mobility and complexity; however, there are
various different strategies that can be followed. These include boosting, voting and stacking, to name
but a few. Boosting basically operates on the principle of cooperation, with "Each classifier focusing on
the errors committed by the previous ones" Lotte et al (2007).
Voting, on the other hand, works like a voting system. The different modules of the N.N. are "Modeled
as multiple voters electing one candidate in a single ballot election assuming the availability of votes'
preferences and intensities. All modules are considered as candidates as well as voters. Voting bids are
the output-activations of the modules forming the cooperative modular structure" Auda et al (1995).
The candidate with the majority vote wins. According to Lotte et al (2007), "Voting is the most
popular way of combining classifiers in B.C.I. research, probably because it is simple and efficient".
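The majority-vote rule can be sketched in a few lines of NumPy; the `vote` helper below is an illustrative reduction, not the full modular voting scheme of Auda et al (1995):

```python
import numpy as np

def vote(predictions):
    """Combine the label predictions of several classifiers for one
    example by simple majority vote: the most common label wins."""
    labels, counts = np.unique(predictions, return_counts=True)
    return labels[np.argmax(counts)]
```

For instance, if three classifiers predict left, right and left for the same trial, the combined prediction is left.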
Another strategy used for the combining of classifiers is what's known as ‘Stacking’. This method
according to Ghorbani & Owrangh (2001) "Improves classification performance and generalization
accuracy over single level cross-validation model".
8. 3. Multivariate Autoregressive Analysis (M.V.A.R.)
Studies have been conducted in the past based on the M.V.A.R. model. Pei et al (2004) carried out such
a study and boasts a classification accuracy of 88.57%. They describe the M.V.A.R. model as "The
extension form of univariate A.R. model" and argue, "Using the coefficients of M.V.A.R. model as E.E.G.
features is feasible".
9. INSPIRATION FROM BIOLOGY
There is no doubt that inspiration for some of the classification and prediction techniques that we use
today came from the world of biology. Shadbolt (2004) points out that "We see complexity all around
us in the natural world – from the cytology and fine structures of cells to the organization of the
nervous system . . . Biological systems cope with and glory in complexity – they seem to scale, to be
robust and inherently adaptable at the system level . . . Nature might provide the most direct
inspiration". The author shares the view of Bamford et al (2006) that "An attempt to imitate a
biological phenomenon is spawning innovative system designs in an emerging alternative
computational paradigm with both specific and yet unexplored potential".
9. 1. Classification And Object Recognition
Our brains are constantly classifying things in our everyday environment whether we are aware of it or
not. Classification is the process that is responsible for letting us determine what the objects around us
are, i.e. a chair, a car, a person. It even allows us to recognise the faces of the people with whom we
come in contact. The brain is able to distinguish each specific object by examining its numerous
features, and it does so with great speed and accuracy. Many systems seek to reproduce a similar means
of classifying data, which can be useful in nearly every kind of industry. Take, for example, the medical
industry, in which classification plays a crucial role. Classification is used extensively for the
identification of almost every kind of disease and illness. The process of diagnosis would be much
more complex and time consuming if classification techniques were not applied to it.
9. 2. Self-Organisation
Computer systems, i.e. neural networks, can be constructed on the same principles and concepts of self-
organisation found in humans. The term self-organisation describes the process by which "Internal
structures can evolve without the intervention of an external designer or the presence of some
centralised form of internal control. If the capacities of the system satisfy a number of constraints, it
can develop a distributed form of internal structure through a process of self-organisation" Cilliers
(1998). Self-organising maps are a widely used method for feature extraction and data mapping as
well as prediction. Self-organising neural networks can incorporate a time series prediction element,
often with huge success. These can be extremely useful for predicting trends in areas such as weather
forecasting and marketing, to name but two.
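The self-organising principle Cilliers describes can be illustrated with a minimal self-organising map. The grid size, learning rate and neighbourhood width below are arbitrary illustrative choices, and a practical implementation would also decay the learning rate and neighbourhood over time:

```python
import numpy as np

def train_som(data, grid=(5, 5), epochs=20, lr=0.5, sigma=1.5, seed=0):
    """Minimal self-organising map: each input pulls its best-matching
    unit, and that unit's grid neighbours, towards itself. Structure
    emerges without any external supervision or centralised control."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.normal(size=(rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used for the neighbourhood function.
    coords = np.array([[r, c] for r in range(rows) for c in range(cols)],
                      dtype=float).reshape(rows, cols, 2)
    for _ in range(epochs):
        for x in data:
            # Best-matching unit: the weight vector closest to the input.
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(d.argmin(), d.shape)
            # Gaussian neighbourhood around the BMU on the grid.
            g = np.exp(-np.linalg.norm(coords - coords[bmu], axis=2) ** 2
                       / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
    return weights
```

After training, mapping each input to its best-matching unit gives a low-dimensional representation that can be used for feature extraction or visualisation.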
The various prediction algorithms available aim to work in much the same way as the nervous system in
humans. These programs attempt to replicate the 'anticipatory neural activity' that occurs in the body
and reproduce it in a system. Take, for example, a recently developed financial decision system. This
system examined how measuring 'anticipatory neural activity' and taking it into consideration could
help people using the system to make decisions that are more likely to be successful and thus less risky.
When people make financial decisions, they can often opt for the option that seems like the irrational
one. The reasons for this irrational behaviour had not previously been known.
Kuhnen & Knutson (2005) examined "Whether anticipatory neural activity would predict optimal and
suboptimal choices in a financial decision-making task". They observed that the nucleus accumbens
was more active when risky choices were being made, and the anterior insula when riskless options
were being followed. From their findings they concluded that particular neural circuits linked to
anticipatory affect would either hinder or encourage an individual to go for a risky or riskless
choice. They also found that overactivation of these circuits is more likely to cause investing
mistakes and that "Thus, consideration of anticipatory neural mechanisms may add predictive
power to the rational actor model of economic decision making". The system was able to replicate
relatively successfully the way in which humans make investment decisions.
10. FUTURE WORK
The combination of classifiers is gaining popularity and becoming more widely used as a means of
improving accuracy and performance. From researching this topic, one can see that most publications
deal with one particular classifier, with little effort made to compare one classifier to the next.
Studies could be undertaken to compare these classifiers against particular criteria.
There is a lot of room for improvement in the algorithms that are currently available. A deeper
understanding of the human brain and how it classifies and predicts should lead to the creation of
more biologically plausible solutions.
11. CONCLUSION
This document addressed the various issues pertaining to feature extraction, classification and
prediction. It focused on the application of these techniques to unlabelled E.E.G. data. This was done
in an effort to discriminate between left and right hand imagery movements. It briefly reflected on
the need for brainwave signal preprocessing. An in-depth analysis of the feature extraction and
classification process was carried out and the results highlighted. Classification algorithms were
examined, namely L.D.A., K.N.N. and N.N. This document also looked at prediction and its effect on
accuracy. Due to time and knowledge constraints, the data could not be tested using all the desired
approaches; however, a number of the methods not tested were still discussed. This document also
highlighted the fact that inspiration for the design of feature extraction, classification and
prediction systems often comes from nature. Finally, thought was given to future work.
From studying the E.E.G. data and carrying out various tests using numerous parameters and classifiers,
it was concluded that a combination of the three Hjorth features, activity, mobility and complexity,
gives the highest level of accuracy. The author discovered that the descriptive classifiers drawn upon
are not suitable for E.E.G. data, as they do not provide a satisfactory level of separation; they work
better with simple data. It was found that feature extraction and classification enjoyed more success
using cross validation and a multiple-layer N.N., in contrast to prediction, which was best suited to a
single-layer N.N. without cross validation.
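The three Hjorth parameters referred to above have standard definitions: activity is the signal variance, mobility is the square root of the ratio of the variance of the first derivative to that of the signal, and complexity is the mobility of the first derivative divided by the mobility of the signal. A sketch of their computation (not the author's exact pipeline) is:

```python
import numpy as np

def hjorth(signal):
    """Hjorth parameters of a 1-D EEG segment.

    activity   = var(x)
    mobility   = sqrt(var(dx) / var(x))
    complexity = mobility(dx) / mobility(x)
    """
    x = np.asarray(signal, dtype=float)
    dx = np.diff(x)           # first derivative (discrete difference)
    ddx = np.diff(dx)         # second derivative
    var_x, var_dx, var_ddx = x.var(), dx.var(), ddx.var()
    activity = var_x
    mobility = np.sqrt(var_dx / var_x)
    complexity = np.sqrt(var_ddx / var_dx) / mobility
    return activity, mobility, complexity
```

For a pure sinusoid the complexity is close to 1, and it grows as the waveform departs from a simple oscillation, which is what makes the triple a compact descriptor of an EEG window.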
The greatest level of accuracy recorded using the combined Hjorth features was 74%. Separability of
the left hand imagery motor signal from the right was greater at 7 seconds than it was at 6. Accuracy
was improved by specifying the data window extents of s=680 and e=700. Prediction tests indicated
that left hand data is more easily separated and classified than the right hand data. The author also
realised that the N.N. performed better when different data was used for training and testing.
New methods of feature extraction, classification and prediction will undoubtedly be discovered as our
understanding of the human body evolves. Research in this particular topic can extend over multiple
disciplines, and therefore it is likely that "Insights from one subject will inform the thinking in
another" Shadbolt (2004). Advances made in the field of science often result in complementary gains
in the area of computing, and vice versa.
All the processes discussed in this document can have a huge impact on the lives of individuals,
businesses and society at large. Many people suffering from motor impairments rely heavily on B.C.I.
technologies that incorporate classification and prediction techniques for everyday living. These
technologies will undoubtedly help to create a safer and more inclusive society. Classification and
prediction can also be an integral part of any business decision. A manager may consult his/her
computer system when making risky business decisions such as: should I invest in this new product?
How much stock should I buy? Society also benefits from feature extraction, classification and
prediction. These processes are widely used for disease and illness diagnosis as well as for weather
forecasting and storm prediction, to name but a few. Consequently, it is safe to assume that this field
of study will remain a popular one in the years to come and make many more advances.
BIBLIOGRAPHY
ATRY, F. & OMIDVARNIA, A. H. & SETAREHDAN, S. K. (2005) "Model Based E.E.G. Signal Purification to
Improve the Accuracy of the B.C.I. Systems" Proceedings from the 13th European Signal Processing
Conference.
AUDA, G. & KAMEL, M. & RAAFAT, H. (1995) "Voting Schemes for Cooperative Neural Network Classifiers"
Neural Networks 3(3), pp. 1240-1243. Proceedings of the IEEE International Conference on Neural
Networks.
BAMFORD, S. & MURRAY, A. & WILLSHAW, D. J. (2006) "Synaptic Rewiring in Neuromorphic VLSI for
Topographic Map Formation" [Internet], Date Accessed: 15 April 2007, Available From:
http://www.see.ed.ac.uk/~s0454958/interimreport.pdf.
CILLIERS, P. (1998) "Complexity and Postmodernism: Understanding Complex Systems" London:
Routledge.
COLEMAN, A. M. (2003) "Oxford Dictionary of Psychology" Oxford: Oxford University Press.
COYLE, D. & PRASAD, G. & MCGINNITY, T. M. (2004) "Extracting Features for a Brain-Computer
Interface by Self-Organising Fuzzy Neural Network-Based Time Series Prediction" Proceedings from the
26th Annual International Conference of the IEEE EMBS.
EBRAHIMI, T. & VESIN, J. M. & GARCIA, G. (2003) "Brain-Computer Interface in Multimedia
Communication" IEEE Signal Processing Magazine 20(1), pp. 14-24.
EFRON, B. (1983) "Estimating the Error Rate of Prediction Rules: Improvement on Cross Validation"
Journal of the American Statistical Association 78(382), pp. 316-331.
FISHER, E. & HOLTOM, D. (1999) "Enjoy Writing Your Science Thesis or Dissertation – A Step by Step
Guide to Planning and Writing" London: Imperial College Press.
FRANK, R. J. & DAVEY, N. & HUNT, S. P. (2001) "Time Series Prediction and Neural Networks" Journal of
Intelligent and Robotic Systems 31(1-3), pp. 91-103.
GHORBANI, A. A. & OWRANGH, K. (2001) "Stacked Generalization in Neural Networks: Generalization
on Statistically Neutral Problems" Neural Networks 3, pp. 1715-1720, Proceedings from the IJCNN
International Joint Conference on Neural Networks.
KOHAVI, R. (1995) "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model
Selection" Proceedings from the IJCAI International Joint Conference on Artificial Intelligence.
KUHNEN, C. M. & KNUTSON, B. (2005) "The Neural Basis of Financial Risk Taking" Neuron 47(5),
pp. 763-770.
LOTTE, F. & CONGEDO, M. & LECUYER, A. & LAMARCHE, F. & ARNALDI, B. (2007) "A Review of
Classification Algorithms for EEG-Based Brain-Computer Interfaces" Journal of Neural Engineering 4,
pp. R1-R13.
MASTERS, T. (1995) "Neural, Novel & Hybrid Algorithms for Time Series Prediction" New York: John
Wiley & Sons Inc.
MIRANDA, E. & BROUSE, A. (2005) "Toward Direct Brain-Computer Musical Interfaces" Proceedings
from the 2005 Conference on New Interfaces for Musical Expression, pp. 216-219.
PALLANT, J. (2001) "S.P.S.S. Survival Manual – A Step By Step Guide To Data Analysis Using S.P.S.S."
Berkshire: Open University Press.
PEI, X. M. & ZHENG, C. X. (2004) "Feature Extraction and Classification of Brain Motor Imagery Task
Based on MVAR Model" Machine Learning and Cybernetics 6, pp. 3726-3730, Proceedings from the 3rd
International Conference on Machine Learning and Cybernetics.
RIPLEY, B. D. (1996) "Pattern Recognition and Neural Networks" Cambridge: Cambridge University
Press.
SHADBOLT, N. (2004) "From the Editor in Chief: Nature-Inspired Computing" IEEE Intelligent Systems
19(1), pp. 2-3.
SRIRAJA, Y. (2002) "E.E.G. Signal Analysis for Detection of Alzheimer's Disease" PhD Thesis, Texas Tech
University, Date Accessed: 11 April 2007, Available From:
http://webpages.acs.ttu.edu/ysriraja/MSthesis/Thesis.pdf.
TABACHNICK, B. G. & FIDELL, L. S. (1996) "Using Multivariate Statistics" 3rd ed. New York: Harper
Collins.
WEI, Q. & WANG, Y. & GAO, X. & GAO, S. (2007) "Amplitude and Phase Coupling Measures for Feature
Extraction in an E.E.G.-Based Brain-Computer Interface" Journal of Neural Engineering 4, pp. 120-129.
WOLPAW, J. R. & BIRBAUMER, N. & MCFARLAND, D. J. & PFURTSCHELLER, G. & VAUGHAN, T. M. (2002)
"Brain-Computer Interfaces for Communication and Control" The Journal of Clinical Neurophysiology
113(6), pp. 767-791.
WOLPERT, D. H. (1992) "Stacked Generalization" Neural Networks 5(2), pp. 241-259, Pergamon Press.