2. WHAT IS COVARIANCE
• Degree to which the values of a dependent variable and an
associated independent variable move in tandem.
• Measures the degree to which two variables are linearly associated.
• A large covariance can mean a strong relationship between variables
3. Consider two random variables ‘x’ and ‘y’, observed as n paired values:
x1 y1
x2 y2
x3 y3
. .
. .
xn yn
The extent to which the two variables vary together can be measured by calculating the covariance.
It is defined as: Cov(x,y) = E{[x - E(x)]·[y - E(y)]}
or, equivalently: Cov(x,y) = E(x·y) - E(x)·E(y)
where E(x) and E(y) are the expected values of x and y.
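The two forms of the definition above can be checked against each other numerically. The sketch below is only an illustration: the data values and the function names `cov_definition` and `cov_shortcut` are assumptions, not part of the slides, and each pair is treated as equally likely, so E(.) becomes a plain average.

```python
def mean(values):
    """Arithmetic mean, standing in for the expected value E(.)."""
    return sum(values) / len(values)

def cov_definition(x, y):
    """Cov(x, y) = E{[x - E(x)] * [y - E(y)]}."""
    mx, my = mean(x), mean(y)
    return mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])

def cov_shortcut(x, y):
    """Cov(x, y) = E(x * y) - E(x) * E(y)."""
    return mean([xi * yi for xi, yi in zip(x, y)]) - mean(x) * mean(y)

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

a = cov_definition(x, y)
b = cov_shortcut(x, y)
print(a, b)  # both forms give 2.75 for this data
```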
[Figure: plot with axes X (horizontal) and Y (vertical)]
4. INTERPRETING COVARIANCE
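COVARIANCE BETWEEN TWO VARIABLES:
COV(X,Y) < 0: negative covariance; as one variable increases, the other tends to decrease.
COV(X,Y) > 0: positive covariance; the two variables tend to increase or decrease together.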
Zero covariance:
1. If two variables are independent, then COV(X,Y) = 0.
2. However, zero covariance does not necessarily mean that the variables are independent.
A nonlinear relationship can exist that would still result in a covariance value of zero.
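A small example makes the second point concrete. In the sketch below, y is completely determined by x (y = x²), yet the covariance is zero because x is symmetric about its mean. The data and the helper name `sample_cov` are assumptions for illustration only.

```python
def sample_cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data: x is symmetric around 0 and y depends on x exactly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

c = sample_cov(x, y)
print(c)  # zero covariance, although y is a function of x
```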
5. Properties of Covariance:
If X, Y are random variables and a, b are constants, then:
Cov(X, a) = 0
Cov(aX, bY) = ab·Cov(X, Y)
Cov(X + a, Y + b) = Cov(X, Y)
Symmetry: Cov(X, Y) = Cov(Y, X)
Relation to variance: Variance is a special case of the covariance when the two
variables are identical:
Var(X) = Cov(X, X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
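These properties can be spot-checked on a small data set. The following is a minimal sketch, assuming made-up values for X, Y, a and b; the helper names `cov` and `close` are hypothetical. It uses the sample covariance, for which the same identities hold.

```python
import math

def cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def close(p, q):
    """Compare two floats with a small tolerance."""
    return math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-9)

# Hypothetical data and constants, for illustration only.
x = [1.0, 3.0, 4.0, 8.0]
y = [2.0, 1.0, 6.0, 7.0]
a, b = 3.0, -2.0

checks = {
    "Cov(X, a) = 0": close(cov(x, [a] * len(x)), 0.0),
    "Cov(aX, bY) = ab Cov(X, Y)": close(
        cov([a * xi for xi in x], [b * yi for yi in y]), a * b * cov(x, y)),
    "Cov(X + a, Y + b) = Cov(X, Y)": close(
        cov([xi + a for xi in x], [yi + b for yi in y]), cov(x, y)),
    "Cov(X, Y) = Cov(Y, X)": close(cov(x, y), cov(y, x)),
    "Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)": close(
        cov([xi + yi for xi, yi in zip(x, y)],
            [xi + yi for xi, yi in zip(x, y)]),
        cov(x, x) + cov(y, y) + 2 * cov(x, y)),
}

for name, ok in checks.items():
    print(name, "holds" if ok else "FAILS")
```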
6. The Covariance Formula for a sample:
Cov(X,Y) = Σ (Xi - μ)(Yi - ν) / (n - 1)
where:
Xi and Yi are the paired observations of the variables X and Y,
μ is the sample mean of X (the estimate of E(X)),
ν is the sample mean of Y (the estimate of E(Y)), and
n = the number of items in the data sample
SOLVED EXAMPLE
Question: Calculate covariance for the following data set:
x: 2.1, 2.5, 3.6, 4.0
y: 8, 10, 12, 14
Solution: For the given sample, n = 4.
Substitute the values into the formula and solve:
E(X) = μ = 3.05 and E(Y) = ν = 11
Cov(X,Y) = Σ (Xi - μ)(Yi - ν) / (n - 1)
= [(2.1-3.05)(8-11) + (2.5-3.05)(10-11) + (3.6-3.05)(12-11) + (4.0-3.05)(14-11)] / (4-1)
= [(-0.95)(-3) + (-0.55)(-1) + (0.55)(1) + (0.95)(3)] / 3
= (2.85 + 0.55 + 0.55 + 2.85) / 3
= 6.8 / 3
= 2.267
The result is positive, meaning that the variables are positively related.
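The same calculation can be reproduced in a few lines of code. This is a sketch using the data from the question; the function name `sample_cov` is an assumption, not something the slides define.

```python
def sample_cov(x, y):
    """Sample covariance: sum of (xi - mean_x)(yi - mean_y), divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1)

# Data from the solved example.
x = [2.1, 2.5, 3.6, 4.0]
y = [8, 10, 12, 14]

result = sample_cov(x, y)
print(round(result, 3))  # 2.267
```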
7.
A large covariance can mean a strong relationship
between variables. However, you can’t compare covariances across data sets
with different scales (like pounds and inches).
Covariance is unaffected by a shift in the center (i.e. mean) of a variable, but it is
affected by a change in scale, i.e. a weak covariance in one data set may be a strong one in a
different data set with different scales.
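The effect of scale can be seen by converting units. The sketch below uses made-up heights (inches) and weights (pounds), then re-expresses the same measurements in centimetres and kilograms. The data and names are hypothetical. The relationship between the variables is unchanged, but the covariance is multiplied by the product of the two conversion factors, while adding a constant to every value leaves it untouched, consistent with Cov(X+a, Y+b) = Cov(X, Y).

```python
def sample_cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

IN_TO_CM = 2.54          # centimetres per inch
LB_TO_KG = 0.45359237    # kilograms per pound

# Hypothetical measurements, for illustration only.
heights_in = [60.0, 64.0, 68.0, 72.0]
weights_lb = [110.0, 140.0, 150.0, 180.0]

heights_cm = [h * IN_TO_CM for h in heights_in]
weights_kg = [w * LB_TO_KG for w in weights_lb]

cov_in_lb = sample_cov(heights_in, weights_lb)
cov_cm_kg = sample_cov(heights_cm, weights_kg)

# Shifting every value by a constant does not change the covariance.
cov_shifted = sample_cov([h + 10.0 for h in heights_in],
                         [w - 5.0 for w in weights_lb])

print(cov_in_lb, cov_cm_kg, cov_shifted)
```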
The main problem with covariance is that the wide range of values it
can take makes it hard to interpret. This wide range is caused by a
simple fact: the larger the scale of the X and Y values, the larger the covariance tends to be.
It is not possible to determine the relative strength of the relationship from