SECTION 1
Statistics: the science of collecting, organizing, summarizing, presenting,
analyzing, and interpreting data, and of drawing conclusions from the data in
order to reach the best decision.
Statistics is divided into two distinct parts:
1- Descriptive Statistics: concerned only with the collection, organization,
summarization, analysis, and presentation of a set of qualitative or
quantitative data. Descriptive statistics include the mean, median, mode, standard
deviation, range, etc.
2- Inferential Statistics: consists of methods for drawing conclusions from
the data in order to reach the best decision. It is also divided into two parts:
A- Estimation
B- Hypothesis Testing
Population: the complete collection of all elements to be studied.
Finite (countable) Population: a population is called finite if it is possible to
count its individuals, for example the number of students in Shaqlawa Technical
Institute or the number of computers in a laboratory.
Infinite (uncountable) Population: a population is called infinite if it is
impossible to count its individuals, for example the number of bacteria in a
garden or the number of fish in a sea.
Census: the collection of data from every element of the population.
Sample: a sub-collection of elements drawn from a population.
Sampling: the process of selecting a subset of elements from the population.
Sources of collecting the data:
1- Historical Sources
2- Field Sources
Probability (random) samples are drawn from populations through several
different sampling methods:
1- Simple Random Sampling
Every member of the population (N) has an equal chance of being selected for
your sample (n). This is arguably the best sampling method, as your sample is
almost guaranteed to be representative of your population. However, it is rarely
used in practice because it is often impractical.
2- Systematic Sampling
In this method, every k-th individual from the population (N) is placed in the
sample (n). For example, if you add every 7th individual to walk out of a
supermarket to your sample, you are performing systematic sampling.
3- Stratified Sampling
A general problem with random sampling is that you could, by chance, miss a
particular group in the sample. However, if you divide the population into groups
and sample from each group, you can make sure every group is represented. In
stratified sampling, the population is divided into groups called strata. A sample is
then drawn from within each of these strata. Some examples of strata commonly used by
the ABS are state, age and sex. Other strata may be religion, academic ability
or marital status.
[Diagram: Methods of collecting the data: (1) Census; (2) Samples, which are
either Probability (Random) or Non-Probability.]
4- Multi-Stage Sampling
Multi-stage sampling is like cluster sampling, but involves selecting a sample
within each chosen cluster, rather than including all units in the cluster. Thus,
multi-stage sampling involves selecting a sample in at least two stages. In the first
stage, large groups or clusters are selected. These clusters are designed to contain
more population units than are required for the final sample. In the second stage,
population units are chosen from selected clusters to derive a final sample. If
more than two stages are used, the process of choosing population units within
clusters continues until the final sample is achieved.
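The sampling methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata "A" and "B", and the function names are hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th unit, beginning at index `start` (0 <= start < k)."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw n_per_stratum units from each stratum (dict: name -> list of units)."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

population = list(range(1, 101))                  # hypothetical population, N = 100
srs = simple_random_sample(population, 10, seed=1)
systematic = systematic_sample(population, 7)     # units 1, 8, 15, ...
strata = {"A": population[:50], "B": population[50:]}  # two hypothetical strata
stratified = stratified_sample(strata, 5, seed=1)
```

Passing a seed only makes the draw repeatable; it does not change the fact that each unit is equally likely to be chosen.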
Variable: a characteristic or property of the elements in the population. The
name "variable" is derived from the fact that any particular characteristic may
vary among the elements in a population.
Variables are classified as follows:
1- Quantitative variables
A- Discrete variables (e.g. number of students)
B- Continuous variables (e.g. height, weight)
2- Qualitative (descriptive) variables
Levels of Measurement
Variables can also be classified by how they are categorized, counted, or
measured. Four common levels of measurement are:
1. Nominal level of measurement: characterized by data that consist of
names, labels, or categories only. The data cannot be arranged in an
ordering scheme (such as low to high). For example, the gender of
students (male, female).
2. Ordinal level of measurement: involves data that may be arranged in
some order, but differences between data values either cannot be
determined or are meaningless. For example, the letter grades of students
(A, B, C, and D).
3. Interval level of measurement: like the ordinal level, with the
additional property that differences between data values are meaningful.
However, there is no natural zero point (where none of the quantity is
present). For example, temperature (zero degrees Fahrenheit is not a true
zero) and the years 1000, 1776, and 2000 (time did not begin in the year
zero).
4. Ratio level of measurement: the interval level modified to include a
true zero (where zero indicates that none of the quantity is present). For
example, the weights of people.
Some Examples of Levels of Measurement
#   Nominal level data                                  Ordinal level data                     Interval level data   Ratio level data
1   Eye color (blue, brown, green, black)               Student's grade (A, B, C, D)           Intelligence level    Height
2   Political affiliation (PDK, PUK, …)                 Rating scale (poor, good, excellent)   Temperature           Weight
3   Religious affiliation (Muslim, Christian, Jewish)   Ranking of tennis players              Years                 Time
4   Major field (Mathematics, Arts, Computers, …)       Level of education                                           Salary
Section 2
Frequency Distribution (Table):
Once a researcher has obtained raw (ungrouped) data from any source, the data
need to be arranged and organized in a meaningful way so that they can be
described and useful inferences can be drawn from them. The method used for this
organization and arrangement is called a frequency distribution. Frequency means
the number of times something happens. A frequency distribution simply means
organizing raw data in table form using classes and frequencies.
1- Frequency Distribution for Qualitative variables:
A frequency distribution for a qualitative variable lists all classes and the
number of elements that belong to each class.
Example 1: The following list gives the ranks of a sample of 25 clerks
in Soran Institute:
Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher,
Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant
Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher,
Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher,
Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher.
Create a frequency distribution for the above data.
Solution:
Classes (rank)          Frequency
Lecturer                4
Assistant lecturer      5
Researcher              6
Assistant Researcher    10
Total                   25
Relative Frequency of a Class:
The relative frequency of a class is obtained by dividing the frequency of the
class by the sum of all frequencies.
Example 2: Using the previous example, calculate the relative frequencies.
Solution:
Classes (rank)          Frequency   Relative Frequency
Lecturer                4           4/25 = 0.16
Assistant lecturer      5           5/25 = 0.20
Researcher              6           6/25 = 0.24
Assistant Researcher    10          10/25 = 0.40
Total                   25          1
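As a quick check of Example 2, the relative frequencies can be computed directly from the frequency table of Example 1. This is a minimal sketch; the variable names are illustrative only.

```python
# Frequencies taken from the table in Example 1
freq = {
    "Lecturer": 4,
    "Assistant lecturer": 5,
    "Researcher": 6,
    "Assistant Researcher": 10,
}

total = sum(freq.values())                      # sum of all frequencies
rel_freq = {rank: f / total for rank, f in freq.items()}

for rank, r in rel_freq.items():
    print(f"{rank:<22}{freq[rank]:>3}{r:>8.2f}")
```

The relative frequencies always add up to 1, which is a useful check on the arithmetic.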
2- Frequency Distribution for Quantitative variables
Total Range (T.R): the highest value minus the lowest value in the data set.
Number of classes: the appropriate number of classes may be decided by Yule's
formula, which is as follows:
Number of classes = $2.5 \sqrt[4]{n}$, where n is the total number of observations.
Class Width (Length): the difference between two consecutive lower class limits
or two consecutive lower class boundaries. The class width can be found by the
following formula:
Class Width = T.R / No. of classes
Lower class limits (L.C.L): the smallest numbers that can actually belong to
the different classes.
Upper class limits (U.C.L): the largest numbers that can actually belong to the
different classes.
Class marks (class midpoints): each class midpoint can be found by:
Midpoint of any class (xi) = (L.C.L of this class + U.C.L of this class) / 2
Frequency (F): the number of values in a specific class of the distribution.
A- Frequency Distribution for Discrete variables:
The lower and upper limits of the classes for discrete variables are as
follows:
Lower limit     Upper limit     Frequency
Xs              Xs+W-1          f1
Xs+W            Xs+2W-1         f2
Xs+2W           Xs+3W-1         f3
...             ...             ...
Xs+(M-1)W       Xs+M.W-1        fM
Where:
Xs: the lowest value
W: class width
M: number of classes
Example3: Construct the frequency distribution for the following data:
60 76 80 120 132 82 90 65 68 142 157 164 88
90 98 101 103 110 119 116 120 126 109 114 120 122
111 116 90 78 93 95 98 104 120 113 121 119 125
126 130 131 136 118 120 142 150 154 122 123 139 125
65 154 136 137 110 137 72 150
Total Range (T.R) = 164 - 60 = 104
Number of Classes (M) = 2.5 × 60^(1/4) = 2.5(2.783) = 6.958 ≈ 7
Class Width (W) = 104/7 = 14.86 ≈ 15
Class Frequency Midpoint Relative Frequency
60 - 74 4 =(60+74)/2= 67 =4/60 = 0.067
75 – 89 5 =(75+89)/2= 82 =5/60 = 0.083
90 – 104 10 97 =10/60 = 0.167
105 – 119 12 112 =12/60 = 0.200
120 – 134 16 127 =16/60 = 0.267
135 – 149 7 142 =7/60 = 0.117
150 – 164 6 157 =6/60 = 0.100
Total 60 1
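The class construction in Example 3 can be sketched as follows. The function names are hypothetical, and two rounding choices are assumptions rather than rules stated in the notes: the number of classes is rounded to the nearest integer and the class width is rounded up.

```python
import math

def yule_classes(n):
    """Number of classes from Yule's formula 2.5 * n**(1/4), rounded to the nearest integer."""
    return round(2.5 * n ** 0.25)

def discrete_class_limits(lowest, width, m):
    """(lower, upper) limits of m classes for a discrete variable: Xs+jW to Xs+(j+1)W-1."""
    return [(lowest + j * width, lowest + (j + 1) * width - 1) for j in range(m)]

n, lowest, highest = 60, 60, 164                 # values from Example 3
m = yule_classes(n)                              # number of classes
width = math.ceil((highest - lowest) / m)        # class width, rounded up (assumption)
limits = discrete_class_limits(lowest, width, m)
midpoints = [(lo + hi) / 2 for lo, hi in limits]

for (lo, hi), mid in zip(limits, midpoints):
    print(f"{lo} - {hi}   midpoint {mid}")
```

Rounding the width up guarantees that the M classes cover the whole range from the lowest to the highest value.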
B- Frequency Distribution for continuous variables:
The lower and upper limits of the classes for continuous variables are as
follows:
Lower limit     Upper limit     Frequency
Xs              Xs+W            f1
Xs+W            Xs+2W           f2
Xs+2W           Xs+3W           f3
...             ...             ...
Xs+(M-1)W       Xs+M.W          fM
Example 4: Construct a frequency distribution for the data below:
1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8
5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3
5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5
3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9
Solution:
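The worked table for this example is not reproduced here, so the following is only a sketch of how the tally could be done. It assumes the classes 0 – 2, 2 – 4, ..., 10 – 12 (the classes listed for this data in Example 6) and assumes that a value equal to a class limit is counted in the class that starts at that limit.

```python
data = [
    1.3, 4.1, 5.7, 6.5, 7.9, 10.4, 2, 4.2, 5.7, 6.5, 8.2, 8.3, 6.8,
    5.7, 4.3, 10.4, 2.1, 2.8, 4.3, 10.8, 5.8, 6.9, 8.3, 8.4, 7, 11.3,
    5.8, 4.7, 3.3, 3.3, 4.8, 5.9, 7, 8.9, 9.1, 7.3, 6, 5.1, 3.5,
    3.7, 5.1, 6.2, 7.6, 9.2, 9.7, 7.8, 6.4, 5.3, 6.4, 7.9,
]

lowest, width, m = 0, 2, 6        # assumed classes: 0-2, 2-4, ..., 10-12
freq = [0] * m
for x in data:
    # index of the class whose interval [lower, lower + width) contains x
    freq[int((x - lowest) // width)] += 1

for j, f in enumerate(freq):
    print(f"{lowest + j * width} - {lowest + (j + 1) * width}   {f}")
```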
Cumulative Frequency Distribution
A- Ascending Cumulative Frequency Distribution
The ascending cumulative frequency of a class is the total frequency of all
values less than or equal to the upper limit of that class.
Example 5: Construct an ascending cumulative frequency distribution
based on Example 3.
Classes Frequency Upper Limit of Class Ascending Cumulative Frequency
60 - 74 4 74 Less than or equal to 74= 4
75 – 89 5 89 Less than or equal to 89= 9
90 – 104 10 104 Less than or equal to 104= 19
105 – 119 12 119 Less than or equal to 119= 31
120 – 134 16 134 Less than or equal to 134= 47
135 – 149 7 149 Less than or equal to 149= 54
150 – 164 6 164 Less than or equal to 164= 60
Total 60
B- Descending Cumulative Frequency Distribution
The descending cumulative frequency of a class is the total frequency of all
values greater than or equal to the lower limit of that class.
Example 6: Construct a descending cumulative frequency distribution based
on Example 4.
Classes Frequency Lower Limit of Class Descending Cumulative Freq.
0 – 2 1 0 Greater than or equal to 0= 50
2 – 4 7 2 Greater than or equal to 2= 49
4 – 6 15 4 Greater than or equal to 4= 42
6 – 8 15 6 Greater than or equal to 6= 27
8 – 10 8 8 Greater than or equal to 8= 12
10 – 12 4 10 Greater than or equal to 10= 4
Total 50
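Both cumulative distributions are running totals of the class frequencies, taken from the first class downward (ascending) or from the last class upward (descending). A minimal sketch, with hypothetical function names, using the frequencies of Examples 5 and 6:

```python
from itertools import accumulate

def ascending_cumulative(freq):
    """Running total from the first class to the last."""
    return list(accumulate(freq))

def descending_cumulative(freq):
    """Running total from the last class back to the first."""
    return list(accumulate(reversed(freq)))[::-1]

freq_example3 = [4, 5, 10, 12, 16, 7, 6]     # class frequencies used in Example 5
freq_example4 = [1, 7, 15, 15, 8, 4]         # class frequencies used in Example 6

asc = ascending_cumulative(freq_example3)
desc = descending_cumulative(freq_example4)
print(asc)
print(desc)
```

The last ascending value and the first descending value both equal the total number of observations.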
Charts
Statistical data are presented graphically using statistical charts. There are
several kinds of charts for representing a set of data, such as:
Bar- Charts
A bar chart is a chart composed of bars whose heights are the frequencies of the
different classes. (Qualitative Variables)
Example7: Display the below data as a bar chart.
Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green,
Red, Red
Solution:
In the first step we will create a frequency table for this data:
Color Frequency
red 6
green 6
blue 4
Then we use this table for creating a bar chart
[Figure: bar chart of the color frequencies; x-axis: Color (red, green, blue),
y-axis: Frequency (0 to 7).]
Histogram
A histogram is similar to a bar chart, but it is used to represent quantitative
variables rather than qualitative variables.
Example8: Draw a histogram for the following frequency distribution.
Classes Frequency
60 - 74 4
75 – 89 5
90 – 104 10
105 – 119 12
120 – 134 16
135 – 149 7
150 – 164 6
Total 60
Solution:
[Figure: histogram of the frequency distribution; x-axis: Classes (60 - 74 to
150 - 164), y-axis: Frequency (0 to 18).]
Pie Chart
A pie chart is a circle divided into sectors, where each sector represents a
class of the data and the size of the sector is proportional to the relative
frequency of that class.
We can calculate the angle size of each class by the following rule:
Angle size of class = relative frequency of the class × 360°
Example 9: Draw a pie chart for the data in Example 1.
Class                   Frequency   Relative Frequency   Angle Size
Lecturer                4           4/25 = 0.16          0.16 × 360 = 57.6
Assistant lecturer      5           5/25 = 0.20          0.20 × 360 = 72
Researcher              6           6/25 = 0.24          0.24 × 360 = 86.4
Assistant Researcher    10          10/25 = 0.40         0.40 × 360 = 144
Total                   25          1                    360
[Figure: pie chart with sectors Lecturer (57.6°), Assistant lecturer (72°),
Researcher (86.4°) and Assistant Researcher (144°).]
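The angle rule for the pie chart of Example 9 can be checked with a few lines of Python. This is a small sketch; the variable names are illustrative.

```python
# Frequencies of the ranks from Example 1
freq = {
    "Lecturer": 4,
    "Assistant lecturer": 5,
    "Researcher": 6,
    "Assistant Researcher": 10,
}

total = sum(freq.values())
# angle = relative frequency * 360 degrees
angles = {rank: 360 * f / total for rank, f in freq.items()}

for rank, angle in angles.items():
    print(f"{rank:<22}{angle:>7.1f} degrees")
```

The angles of all sectors must add up to 360 degrees.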
Frequency Polygon
It is a chart that displays the data by using lines that connect points plotted for the
frequencies at the midpoints of the classes.
Example 10: Draw a frequency polygon for the frequency distribution in
Example 3.
[Figure: frequency polygon; x-axis: Midpoints (67, 82, 97, 112, 127, 142, 157),
y-axis: Frequency (0 to 18).]
Frequency Curve
A frequency curve is like a frequency polygon, with one difference: instead of
straight lines, a smooth curve is used to connect the points at the midpoints.
Example 11: Draw a frequency curve for the data in Example 4.
[Figure: frequency curve; x-axis: Midpoints (1, 3, 5, 7, 9, 11), y-axis:
Frequency (0 to 18); plotted frequencies 1, 7, 15, 15, 8, 4.]
Cumulative Frequency Chart
It is a chart that represents the cumulative frequencies of the classes in a
frequency distribution.
Example 12: Construct an ascending cumulative frequency chart for the data in
Example 4.
[Figure: ascending cumulative frequency chart; x-axis: Upper limit of classes
(2, 4, 6, 8, 10, 12), y-axis: Cumulative frequency (0 to 60).]
Example 13: Construct a descending cumulative frequency chart for the data in
Example 4.
[Figure: descending cumulative frequency chart; x-axis: Lower limit of classes,
y-axis: Cumulative frequency (0 to 60).]
Exercise 1: Complete the following frequency distributions, given that the class
widths are equal.
Class Midpoint Class Midpoint
3 8 6
18
Class Midpoint frequency
8 2
14 4
20 6
26 5
32 3
Exercise 2: The heights of 35 students were recorded as follows:
170 180 175 165 160 155 180 190 185 170 174 178 165
169 186 179 161 171 159 168 177 164 191 140 173 181
177 173 166 162 168 184 168 158 155
Find the following:
1- Frequency distribution
2- Midpoints
3- Descending cumulative frequency
4- Relative frequency
And draw:
a) Histogram b) frequency polygon
SECTION 3
Notations
In this section we introduce some useful notation before explaining the
subjects related to measures of central tendency and measures of dispersion
(variation).
1- Summation Notation ($\sum$)
The symbol $\sum_{i=1}^{n} X_i$, read as "the summation of X", denotes the sum
of the values of X, where n is the number of observations and i is the subscript
for the order of the values.
Let X be a variable that takes 4 values: 2, 3, 5, and 10. Then the sum of X
is written as follows:
$$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 2 + 3 + 5 + 10 = 20$$
Symbol                                                                   Operation
$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$                            Sum of observations
$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \dots + X_n^2$                    Sum of squares of observations
$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + \dots + X_n)^2$         Square of sum of observations
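Let X and Y be variables and let a and b be constants. Then:
$$\sum_{i=1}^{n} a = n \cdot a$$
$$\sum_{i=1}^{n} a X_i = a \sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} (X_i \pm a) = \sum_{i=1}^{n} X_i \pm n a$$
$$\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} (a X_i \pm b Y_i) = a \sum_{i=1}^{n} X_i \pm b \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_n Y_n$$
$$\left(\sum_{i=1}^{n} X_i\right)^2 \neq \sum_{i=1}^{n} X_i^2$$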
Example 1: If Xi takes the values 4, 3, 5 and 1, find the following:
a- $\sum_{i=1}^{n} X_i$   b- $\sum_{i=1}^{n} X_i^2$   c- $\sum_{i=1}^{n} 2X_i$   d- $\sum_{i=1}^{n} (X_i - 3)$
Solution:
a) $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 4 + 3 + 5 + 1 = 13$
b) $\sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 = 4^2 + 3^2 + 5^2 + 1^2 = 51$
c) $\sum_{i=1}^{4} 2X_i = 2 \sum_{i=1}^{4} X_i = 2(13) = 26$
d) $\sum_{i=1}^{4} (X_i - 3) = \sum_{i=1}^{4} X_i - \sum_{i=1}^{4} 3 = 13 - 3(4) = 13 - 12 = 1$
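The four sums of Example 1 (values 4, 3, 5 and 1) can be verified numerically, together with the warning that the square of a sum is not the sum of squares. A minimal sketch:

```python
x = [4, 3, 5, 1]

sum_x = sum(x)                               # a) sum of observations
sum_x_squared = sum(v ** 2 for v in x)       # b) sum of squares
sum_2x = sum(2 * v for v in x)               # c) equals 2 * sum_x
sum_x_minus_3 = sum(v - 3 for v in x)        # d) equals sum_x - 3 * n
square_of_sum = sum(x) ** 2                  # differs from the sum of squares

print(sum_x, sum_x_squared, sum_2x, sum_x_minus_3, square_of_sum)
```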
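2- Pi (Product) Notation ($\prod$)
The symbol $\prod_{i=1}^{n} X_i$ is used for the multiplication of all values of Xi, or:
$$\prod_{i=1}^{n} X_i = X_1 \cdot X_2 \cdots X_n$$
If a is a constant, then:
$$\prod_{i=1}^{n} a = a^n$$
$$\prod_{i=1}^{n} a X_i = a^n \prod_{i=1}^{n} X_i$$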
Example 2: If Xi takes the values 4, 2, 5 and 3, find the following:
a- $\prod_{i=1}^{n} X_i$   b- $\prod_{i=1}^{n} 5X_i$
Solution:
a) $\prod_{i=1}^{4} X_i = X_1 \cdot X_2 \cdot X_3 \cdot X_4 = 4 \times 2 \times 5 \times 3 = 120$
b) $\prod_{i=1}^{4} 5X_i = 5^4 \prod_{i=1}^{4} X_i = 5^4 (120) = 75000$
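The products in Example 2 (values 4, 2, 5 and 3) can be checked in the same way. The sketch below uses `math.prod` from the Python standard library (available from Python 3.8).

```python
import math

x = [4, 2, 5, 3]

prod_x = math.prod(x)                        # a) product of all values
prod_5x = math.prod(5 * v for v in x)        # b) product of 5 * X_i

# property: the constant comes out of the product raised to the power n
assert prod_5x == 5 ** len(x) * prod_x
print(prod_x, prod_5x)
```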
Exercise: If
Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following:
a- 
4
1
2
i
iX b- 
4
1
3
i
iY c- 2
4
1
. i
i
i YX
d-  

n
i
ii YX
1
e- 4
4
1
66 i
f- 
4
1i
iX g- 
n
i
iY
1
4 h- i
n
i
i YX .2
1

j- 
4
1i i
i
Y
X
k-   2.3
4
1

i
i
i YX
SECTION 4: MEASURES OF CENTRAL TENDENCY
In the previous sections we studied how to collect raw data and how to classify
and tabulate it in a useful form, which helps in solving many statistical
problems. This is not sufficient, however: for practical purposes further
condensation is needed, particularly when we want to compare two or more
different distributions. We may reduce the entire distribution to one number
which represents the distribution.
A single value which can be considered typical or representative of a set of
observations, and around which the observations can be considered to be centered,
is called an "average" (or average value) or a center of location. Since such
typical values tend to lie centrally within a set of observations when arranged
according to magnitude, averages are called measures of central tendency.
So a measure of central tendency is a value at the center or middle of a data set.
This value represents all the data of the group.
The fundamental measures of central tendency are:
(1) Arithmetic Mean
(2) Weighted Mean
(3) Harmonic Mean
(4) Quadratic Mean
(5) Mode
(6) Median
However, the most common measures of central tendency or location are the
arithmetic mean, the median and the mode.
1) Arithmetic Mean
The arithmetic mean (generally called the mean) is obtained by adding all the
observations (values of all items) together and dividing this sum by the number
of observations (or items). The symbol $\bar{X}$ (pronounced "X bar") represents
the sample mean and $\mu$ represents the population mean.
Arithmetic mean for ungrouped data
Suppose we have n observations (or measures) X1, X2, X3, ..., Xn. Then the
arithmetic mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$
Where: Xi = the ith observation.
n = the size of the data.
The mean for a population consisting of N observations is:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
Example: Calculate the arithmetic mean of the given values:
98 96 95 98 100 92 96 69
Solution:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{98 + 96 + 95 + 98 + 100 + 92 + 96 + 69}{8} = \frac{744}{8} = 93$$
Arithmetic mean for grouped data:
The arithmetic mean of grouped data is found by multiplying every midpoint (i.e.
value of x) by its corresponding frequency (fi), finding the total of these
products $\sum f_i x_i$, and then dividing this sum by $\sum f_i$:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
The above formula is for sample data. Similar formulas are used for population data.
Example: Determine the mean for the following set of data.
Classes   Frequency
8 -       2
10 -      3
12 -      5
14 -      4
16 -      1
Solution:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
Classes   Frequency (fi)   Midpoint (xi)   fi.xi
8 -       2                9               18
10 -      3                11              33
12 -      5                13              65
14 -      4                15              60
16 -      1                17              17
Total     15                               193
$$\bar{X} = \frac{193}{15} = 12.87$$
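The grouped-data mean of the example above (classes 8 -, 10 -, ..., 16 - with frequencies 2, 3, 5, 4, 1) can be reproduced as follows. This is a minimal sketch; it assumes a common class width of 2, as implied by the midpoints 9, 11, ..., 17.

```python
lower_limits = [8, 10, 12, 14, 16]
freq = [2, 3, 5, 4, 1]
width = 2                                         # assumed common class width

midpoints = [lo + width / 2 for lo in lower_limits]           # x_i
sum_fx = sum(f * x for f, x in zip(freq, midpoints))          # sum of f_i * x_i
mean = sum_fx / sum(freq)

print(midpoints, sum_fx, round(mean, 2))
```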
The Properties of the Arithmetic Mean:
1- The sum of the deviations of all the values of X from their mean is zero:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}$$
We have $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, so $\sum_{i=1}^{n} X_i = n\bar{X}$, then:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$$
2- If $(\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k)$ represent the means of k groups based on $(n_1, n_2, \dots, n_k)$
observations respectively, the mean of the groups combined is:
$$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$$
3- The sum of squares of the deviations from the mean is smaller than from
any other value. (Prove this property.)
Advantages (merits) of the Arithmetic Mean
1- It is easy to calculate and simple to understand.
2- It is very popular (most widely used).
3- It is based on all the observations, so it is a good representative.
Disadvantages (demerits) of the Arithmetic Mean
1- It is affected by outliers or extreme values.
2- It cannot be obtained if a single observation is missing or lost.
3- It cannot be calculated for frequency distributions with open-end classes.
4- It cannot be computed for qualitative data.
2) Weighted Arithmetic Mean:
One of the limitations of the arithmetic mean is that it gives equal importance
to all the items. But there are cases where the relative importance of the different
items is not the same. When this is so, we compute the weighted arithmetic mean.
The formula for computing the weighted arithmetic mean for ungrouped data
is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n}$$
Where Wi is the weight of the ith observation.
The formula for computing the weighted arithmetic mean for grouped data is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i f_i X_i}{\sum_{i=1}^{n} W_i f_i} = \frac{W_1 f_1 X_1 + W_2 f_2 X_2 + \dots + W_n f_n X_n}{W_1 f_1 + W_2 f_2 + \dots + W_n f_n}$$
Example: The marks of a student in the final examination of the Statistics
department are as follows:
Marks (Xi): 98 96 95 98 100 92 96 69
Units (Wi): 2 3 3 1 3 3 2 2
Calculate the weighted mean.
Solution:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{(98 \times 2) + (96 \times 3) + (95 \times 3) + (98 \times 1) + (100 \times 3) + (92 \times 3) + (96 \times 2) + (69 \times 2)}{2 + 3 + 3 + 1 + 3 + 3 + 2 + 2} = \frac{1773}{19} = 93.3158$$
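The weighted mean of the student's marks (marks 98, 96, 95, 98, 100, 92, 96, 69 with units 2, 3, 3, 1, 3, 3, 2, 2) can be checked with a short sketch. The function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of W_i * X_i divided by the sum of the weights W_i."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

marks = [98, 96, 95, 98, 100, 92, 96, 69]     # X_i
units = [2, 3, 3, 1, 3, 3, 2, 2]              # W_i

wm = weighted_mean(marks, units)
plain_mean = sum(marks) / len(marks)          # equal weights give the ordinary mean
print(round(wm, 4), plain_mean)
```

With all weights equal, `weighted_mean` returns the same value as the ordinary arithmetic mean, which is the content of the remark that follows.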
Remark: If all the weights are equal, then the weighted mean is the same as the
arithmetic mean.
Exercise1: The average marks of three groups of students having 70, 50 and 30
students respectively are 50, 55 and 45. Find the average marks of all the 150
students, taken together.
Exercise 2: The following frequency distribution shows the marks obtained by 50
students in statistics at Soran Institute. Find the arithmetic mean.
Classes   Frequency (fi)
20 - 29 1
30 - 39 5
40 - 49 12
50 - 59 15
60 - 69 9
70 - 79 6
80 - 89 2
Exercise3: The mean of a certain number of observations is 40. If two items with
values 50 and 64 are added to this data, the mean rises to 42. Find the number of
items in the original data.
Exercise 4: If $\sum_{i=1}^{n} (X_i - 4) = 72$ and $\sum_{i=1}^{n} (X_i - 7) = 3$, then find the number of
observations (n).
3) Harmonic Mean
The harmonic mean is one of the measures of central tendency. It is used less
often than the other measures (mean, median and mode).
The formula for computing the harmonic mean for ungrouped data
is:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
And for grouped data it is:
$$\bar{X}_h = \frac{\sum f_i}{\sum_{i=1}^{n} \frac{f_i}{X_i}}$$
Example: Calculate the harmonic mean for the following data:
Xi: 8 2 5 3 4 7 8
Solution:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
1/Xi: 0.13 0.5 0.2 0.33 0.25 0.14 0.13
$$\sum \frac{1}{X_i} \approx 1.68, \qquad \bar{X}_h = \frac{7}{1.68} \approx 4.167$$
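The harmonic mean of the data 8, 2, 5, 3, 4, 7, 8 can be computed without rounding the reciprocals. The sketch below uses both the definition and `statistics.harmonic_mean` from the Python standard library. Note that the hand calculation rounds each reciprocal to two decimals (sum 1.68), so it gives 4.167, while the unrounded value is about 4.176.

```python
import statistics

x = [8, 2, 5, 3, 4, 7, 8]

# definition: n divided by the sum of the reciprocals
hm_manual = len(x) / sum(1 / v for v in x)
hm_library = statistics.harmonic_mean(x)

print(round(hm_manual, 3), round(hm_library, 3))
```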
4) Quadratic Mean
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n}}$$ for ungrouped data
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i}}$$ for grouped data
5) MODE
The mode (Mo) is the value that occurs most often in a data set.
Mode for ungrouped data:
The mode of the following data set: 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5
because it is repeated more than other numbers (6 times).
Remark: When two numbers occur with the same greatest frequency, each one is a
mode and the data set is bimodal. When more than two numbers occur with the same
greatest frequency, each is a mode and the data set is said to be multimodal. When
no number is repeated, we say that there is no mode.
Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5.
Solution: Number 5 and 7 are both modes. The data set is bimodal.
Mode for grouped data:
Let (X1, X2, …, Xn) represent the class marks of the class intervals and (f1, f2, …,
fn) the corresponding frequencies. The modal class is the class which has the highest
frequency. The formula for obtaining the mode is as follows:
$$Mo = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
Where:
Lk: lower limit of the modal class.
fk: frequency of the modal class.
fk-1: frequency of the previous class.
fk+1: frequency of the next class.
Wk: size of the modal class interval (class width).
Example: Find the mode for the following frequency distribution:
Class      Frequency
10 – 19    5
20 – 29    7
30 – 39    10
40 – 49    8
50 – 59    4
60 – 69    3
70 – 79    1
Solution:
The modal class is 30 – 39 because it has the highest frequency (10).
Lk = 30, fk = 10, fk-1 = 7, fk+1 = 8, Wk = 10
$$Mo = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
$$Mo = 30 + \frac{(10 - 7)}{(10 - 7) + (10 - 8)} \times 10 = 30 + \frac{3}{5} \times 10 = 36$$
Remark 1: If two or more classes share the highest frequency, the modal class
must be found using the assembly (grouping) method.
Remark 2: When we use the assembly method, the formula of the mode will be:
$$Mo = L_k + \frac{(f_k - f_{k-1})}{f_k - f_{k-1} + f_k - f_{k+1}} \times W_k$$
Remark 3: If the widths of the classes are not equal, adjusted frequencies
must be used instead of the actual frequencies, where the adjusted frequency of
each class is equal to $\frac{f_i}{W_i}$.
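The mode formula for grouped data can be written as a small function and checked against the worked example with modal class 30 – 39 (Lk = 30, fk = 10, fk-1 = 7, fk+1 = 8, Wk = 10). This is a sketch; the function and parameter names are illustrative.

```python
def grouped_mode(lower_limit, width, f_prev, f_modal, f_next):
    """Mo = L_k + (f_k - f_{k-1}) / ((f_k - f_{k-1}) + (f_k - f_{k+1})) * W_k."""
    d1 = f_modal - f_prev      # difference from the previous class
    d2 = f_modal - f_next      # difference from the next class
    return lower_limit + d1 / (d1 + d2) * width

mo = grouped_mode(lower_limit=30, width=10, f_prev=7, f_modal=10, f_next=8)
print(mo)
```

The function assumes a single, clearly defined modal class, so that `d1 + d2` is not zero.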
Example: Find the mode for the following frequency distribution:
Class      Frequency
10 – 19    4
20 – 29    4
30 – 39    3
40 – 49    2
50 – 59    3
60 – 69    3
70 – 79    1
Solution:
There are two classes with the highest frequency; therefore, to find the modal
class we must use the assembly method, as follows:
Class      Frequency   1st assembly   2nd assembly   3rd assembly   4th assembly
10 – 19    4           8                             11
20 – 29    4                          7                             9
30 – 39    3           5
40 – 49    2                          5              8
50 – 59    3           6                                            7
60 – 69    3                          4
70 – 79    1
(Each assembly value is the sum of the frequencies of two or three consecutive
classes, written on the row of the first class of the group.)
From the previous table we can abstract the following table:
Serial no. of column   Greatest frequency in the column   Contributing classes
1                      4                                  1, 2
2                      8                                  1, 2
3                      7                                  2, 3
4                      11                                 1, 2, 3
5                      9                                  2, 3, 4
The 2nd class contributes to every column, so the 2nd class is the modal class.
Lk = 20, fk = 4, fk-1 = 4, fk+1 = 3, Wk = 10
$$Mo = L_k + \frac{(f_k - f_{k-1})}{f_k - f_{k-1} + f_k - f_{k+1}} \times W_k$$
$$Mo = 20 + \frac{(4 - 4)}{4 - 4 + 4 - 3} \times 10 = 20$$
Advantage of Mode
1- It is easy to calculate.
2- It is not affected by extreme values.
3- It can be used for qualitative data.
4- It can be located graphically (Histogram).
5- It can be calculated for distributions with open end classes.
Disadvantage of Mode
1- It is not based upon all the observations.
2- It is not always possible to find a clearly defined mode (2 modes or 3
modes).
3- It is not capable of further mathematical treatment.
Exercise: Find the mode for the following frequency distributions:
Class      Frequency        Class      Frequency
5 –        2                10 –       30
10 –       6                20 –       12
15 –       10               30 –       16
25 –       22               40 –       28
35 –       27               50 –       26
50 – 60    11               60 –       14
6) Median
The median (Me) is the value of the middle item in a data set. It divides the
data set into two equal parts, one part comprising all values greater than the
median and the other all values smaller than the median.
Median for ungrouped data:
In the first step we arrange the data in ascending (increasing) order.
If the number of observations (n) is odd, the median is the observation that has
order $\frac{n+1}{2}$.
If the number of observations (n) is even, then the median is the average of
the observations that have orders $\frac{n}{2}$ and $\frac{n}{2} + 1$.
Example: Find the median of the following data set:
55, 62, 53, 70, 68, 65, 63, 79, and 80.
Solution:
Arrange the data in increasing order: 53, 55, 62, 63, 65, 68, 70, 79, 80.
Since n = 9 is odd, the order of the median is $\frac{n+1}{2}$:
$$\frac{n+1}{2} = \frac{9+1}{2} = 5$$
Then the 5th observation is the value of the median, or Me = 65.
Example: Find the median of the following data set:
20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25.
Solution:
Arranging the data in increasing order:
18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30
$\frac{n}{2} = \frac{12}{2} = 6$, and the 6th value is 23.
$\frac{n}{2} + 1 = \frac{12}{2} + 1 = 7$, and the 7th value is 25.
Then:
$$Me = \frac{23 + 25}{2} = 24$$
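The two cases for ungrouped data (odd and even n) can be written out directly and checked against both examples above. This is a minimal sketch; the function name is illustrative, and Python's zero-based indexing explains the `- 1` in the positions.

```python
def median_ungrouped(values):
    """Median by position: order (n+1)/2 if n is odd, mean of orders n/2 and n/2 + 1 if even."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]          # list indices start at 0
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

me_odd = median_ungrouped([55, 62, 53, 70, 68, 65, 63, 79, 80])
me_even = median_ungrouped([20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25])
print(me_odd, me_even)
```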
Median for grouped data:
To find the median of a frequency distribution, follow these steps:
Step 1: Find the cumulative frequency (ascending or descending).
Step 2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
Step 3: If $F_{k-1} < \frac{\sum f_i}{2} \le F_k$, then the median class is the class whose order is k.
Step 4: Compute the value of the median:
$$Me = L_k + \left(\frac{\sum f_i}{2} - F_{k-1}\right) \cdot \frac{W_k}{f_k}$$ for ascending cumulative frequency.
$$Me = L_k + \left(F_k^{*} - \frac{\sum f_i}{2}\right) \cdot \frac{W_k}{f_k}$$ for descending cumulative frequency.
Where:
Lk: lower limit of the median class.
fk: frequency of the median class.
Wk: width of the median class.
$\sum f_i$: sum of the frequencies.
Fk–1: ascending cumulative frequency of the class preceding the median class.
$F_k^{*}$: descending cumulative frequency of the median class.
Example: Find the median for the following frequency distribution:
Classes           100 -   120 -   140 -   160 -   180 -   200 -   220 -
No. of families   3       7       14      20      18      12      6
Solution:
In the first step we find the ascending cumulative frequency:
Class     Frequency   Ascending cumulative frequency
100 -     3           3
120 -     7           10
140 -     14          24
160 -     20          44
180 -     18          62
200 -     12          74
220 -     6           80
Total     80
Then we find the median order, which is equal to:
$$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$
Comparing the median order with the ascending cumulative frequency:
$F_{k-1} < \frac{\sum f_i}{2} \le F_k$ gives $24 < 40 \le 44$, so the median class is the 4th
class.
Then:
Lk = 160, Wk = 20, fk = 20
$$Me = L_4 + \left(\frac{\sum f_i}{2} - F_3\right) \cdot \frac{W_4}{f_4}$$
$$Me = 160 + \left(\frac{80}{2} - 24\right) \cdot \frac{20}{20} = 176$$
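The four steps for the median of grouped data can be sketched as a function and checked against the families example (classes 100 -, 120 -, ..., 220 - with frequencies 3, 7, 14, 20, 18, 12, 6). The function name is hypothetical; the sketch assumes equal class widths and takes the median class to be the first class whose ascending cumulative frequency reaches half of the total.

```python
from itertools import accumulate

def grouped_median(lower_limits, freq, width):
    """Me = L_k + (sum(f)/2 - F_{k-1}) * W_k / f_k, using ascending cumulative frequencies."""
    cumulative = list(accumulate(freq))                 # step 1
    half = sum(freq) / 2                                # step 2: median order
    k = next(i for i, F in enumerate(cumulative) if F >= half)   # step 3: median class
    f_before = cumulative[k - 1] if k > 0 else 0        # F_{k-1}
    return lower_limits[k] + (half - f_before) * width / freq[k]  # step 4

lower_limits = [100, 120, 140, 160, 180, 200, 220]
freq = [3, 7, 14, 20, 18, 12, 6]

me = grouped_median(lower_limits, freq, width=20)
print(me)
```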
Merits of Median
1. It is easy to calculate and understand.
2. Unlike the arithmetic mean, it is not affected by extreme values.
3. It can be found by mere inspection.
4. It can be used for qualitative studies.
5. It can be calculated for distributions with open-end classes.
6. It can be obtained graphically.
Demerits of Median
1. It is not capable of further algebraic treatment.
2. It is not based on all observations.
Exercise: Find the median for the following frequency distribution by using
ascending and descending cumulative frequency.
Class        Frequency
18 -         10
28 -         15
36 -         18
50 -         22
70 -         20
100 -        18
130 - 150    13
Total
The relationship between Arithmetic Mean, Median and Mode
If the frequency distribution is symmetric, the mean, median and mode coincide.
If it is moderately skewed, the following empirical relationship between these
measures holds approximately:
$$\bar{X} - M_o = 3(\bar{X} - M_e)$$
SECTION 5) Measures of Dispersion (Variation)
Measures that describe the spread of a data set are called measures of dispersion.
Their main purpose is to assess the homogeneity of the values in a data set, or to
compare the values of two or more data sets.
1- Range
The simplest measure of absolute variation is the range, which is calculated by
subtracting the smallest value from the largest value of a data set.
R=Largest value – Smallest value
Example: find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
Solution:
R= Largest value – Smallest value=15-2=13
Remark: in case of grouped data we calculate the value of Range by subtracting
the lower limit of first class from the upper limit of last class.
2- Mean Deviation
It is the sum of the absolute deviations of the observations from a point (A),
divided by the number of observations.
$$M.D = \frac{\sum_{i=1}^{n} |X_i - A|}{n}$$ for ungrouped data
$$M.D = \frac{\sum_{i=1}^{n} f_i |X_i - A|}{n}$$ for grouped data
Where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$) or the mode ($M_o$).
Example: Find the value of the mean deviation for the following data by using the mean,
median and mode.
Xi: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
Solution:
First we find the values of ($\bar{X}$), ($M_e$) and ($M_o$).
$\bar{X}$ = 8, $M_e$ = 6 (the 6th of the 11 ordered values), $M_o$ = 5 (the only repeated value)
Xi      |Xi − X̄|   |Xi − Me|   |Xi − Mo|
2       6           4           3
3       5           3           2
4       4           2           1
5       3           1           0
5       3           1           0
6       2           0           1
7       1           1           2
10      2           4           5
13      5           7           8
14      6           8           9
19      11          13          14
Total   48          44          45
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$
$$M.D(M_e) = \frac{\sum_{i=1}^{n} |X_i - M_e|}{n} = \frac{44}{11} = 4$$
$$M.D(M_o) = \frac{\sum_{i=1}^{n} |X_i - M_o|}{n} = \frac{45}{11} = 4.0909$$
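The mean deviation about each of the three centers can be checked numerically. The sketch below uses the `statistics` module from the Python standard library; for these data the mean is 8, the median (6th of the 11 ordered values) is 6, and the mode (the only repeated value) is 5. The function name is illustrative.

```python
import statistics

def mean_deviation(values, center):
    """Average absolute deviation of the values from the point `center`."""
    return sum(abs(v - center) for v in values) / len(values)

x = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]

mean = statistics.mean(x)
median = statistics.median(x)
mode = statistics.mode(x)

md_mean = mean_deviation(x, mean)
md_median = mean_deviation(x, median)
md_mode = mean_deviation(x, mode)
print(mean, median, mode)
print(round(md_mean, 4), round(md_median, 4), round(md_mode, 4))
```

The mean deviation is smallest when it is taken about the median, which is what the three results show.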
3- Variance
It is one of the most important measures of absolute variation. The variance is
calculated by taking the average of the squared distances (deviations) of
each observation from the mean of the data set.
The formula for the population variance ($\sigma^2$) for raw data is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where:
X: individual value
µ: population mean
N: population size (number of observations).
The formula for the sample variance ($S^2$) for raw data is as follows:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
On the other hand, the formula for the sample variance for grouped data is:
$$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}$$
Where $n = \sum f_i$
Example: find the variance for the following dataset:
56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
Solution:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{650}{10} = 65$$

Xi      (Xi - X̄)    (Xi - X̄)²
56      -9          81
68      3           9
72      7           49
63      -2          4
65      0           0
68      3           9
71      6           36
69      4           16
62      -3          9
56      -9          81
Total               294

then
$$S^2 = \frac{294}{10 - 1} = 32.667$$
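The same computation can be sketched in Python. The function sample_variance is a hypothetical helper that follows the formula with divisor n - 1; statistics.variance from the standard library uses the same divisor and serves as a cross-check.

```python
from statistics import mean, variance


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean / (n - 1)."""
    x_bar = mean(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)


data = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]

s2 = sample_variance(data)        # 294 / 9
s2_library = variance(data)       # standard-library cross-check

print(round(s2, 3), round(s2_library, 3))
```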
Properties of variance:
1) $S^2 \geq 0$
2) If $Y_i = aX_i$, then $S_Y^2 = a^2 S_X^2$, where a is a constant. (Prove that)
3) If $Y_i = X_i \pm b$, then $S_Y^2 = S_X^2$, where b is a constant. (Prove that)
4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z is:
$$S_Z^2 = S_X^2 + S_Y^2$$
5) If $(S_1^2, S_2^2, \ldots, S_k^2)$ represent the variances of k groups based on $(n_1, n_2, \ldots, n_k)$ observations respectively, then the pooled variance of the groups is as follows:
$$S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{\sum_{i=1}^{k} (n_i - 1)}$$ where $n_i < 30$
$$S_p^2 = \frac{\sum_{i=1}^{k} n_i S_i^2}{\sum_{i=1}^{k} n_i}$$ where $n_i \geq 30$
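Properties 2, 3 and 5 can be checked numerically with a small Python sketch. The constants a and b, the group variances and the group sizes below are illustrative assumptions, and pooled_variance is a hypothetical helper that implements the small-sample form with weights (n_i - 1). A numerical check is not a proof; it only confirms the properties on one data set.

```python
from statistics import variance

x = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]
a, b = 3, 10  # illustrative constants (assumed)

var_x = variance(x)
var_scaled = variance([a * xi for xi in x])   # property 2: a^2 * var_x
var_shifted = variance([xi + b for xi in x])  # property 3: var_x


def pooled_variance(variances, sizes):
    """Pooled variance with weights (n_i - 1), the small-sample form."""
    numerator = sum((n - 1) * s2 for s2, n in zip(variances, sizes))
    denominator = sum(n - 1 for n in sizes)
    return numerator / denominator


# Two illustrative groups: variances 4 and 9, sizes 5 and 11 (assumed).
s2_pooled = pooled_variance([4.0, 9.0], [5, 11])  # (4*4 + 10*9) / 14

print(var_x, var_scaled, var_shifted, s2_pooled)
```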
4- Standard Deviation (S)
Standard deviation is the most important and most widely used measure of absolute variation. Standard deviation is the square root of the variance.
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Example: Find the standard deviation of the following frequency distribution.
Solution:
$$\bar{X} = \frac{\sum f_i X_i}{\sum f_i} = \frac{14060}{80} = 175.75$$

Class     fi    Xi     fi.Xi    (Xi - X̄)    (Xi - X̄)²    fi.(Xi - X̄)²
100 -     3     110    330      -65.75      4323.063     12969.19
120 -     7     130    910      -45.75      2093.063     14651.44
140 -     14    150    2100     -25.75      663.0625     9282.875
160 -     20    170    3400     -5.75       33.0625      661.25
180 -     18    190    3420     14.25       203.0625     3655.125
200 -     12    210    2520     34.25       1173.063     14076.75
220 -     6     230    1380     54.25       2943.063     17658.38
Total     80           14060                             72955

$$S = \sqrt{\frac{\sum f_i (X_i - \bar{X})^2}{n}} = \sqrt{\frac{72955}{80}} = 30.198$$
Remark: the divisor used here is n = 80. With the divisor n - 1 = 79 of the sample variance formula, the result is $S = \sqrt{72955 / 79} = 30.389$.
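The grouped computation can be reproduced with the following Python sketch, which uses the class midpoints and frequencies from the table. The variable names are illustrative. Both divisors are shown, since dividing by n and by n - 1 give slightly different results.

```python
from math import sqrt

# Class midpoints (Xi) and frequencies (fi) from the table.
midpoints = [110, 130, 150, 170, 190, 210, 230]
freqs = [3, 7, 14, 20, 18, 12, 6]

n = sum(freqs)                                              # 80
x_bar = sum(f * x for f, x in zip(freqs, midpoints)) / n    # 14060 / 80

# Weighted sum of squared deviations from the mean.
ss = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, midpoints))

sd_divisor_n = sqrt(ss / n)                # divisor n
sd_divisor_n_minus_1 = sqrt(ss / (n - 1))  # divisor n - 1

print(x_bar, ss, round(sd_divisor_n, 3), round(sd_divisor_n_minus_1, 3))
```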
Coefficient of Variation
A disadvantage of the standard deviation as a comparative measure of variation is
that it depends on the units of measurement. This means that it is difficult to use
the standard deviation to compare measurements from different populations. For
this reason, statisticians have defined the coefficient of variation, which expresses
the standard deviation as a percentage of the sample or population mean.
If $\bar{X}$ and S represent the sample mean and the sample standard deviation, then the coefficient of variation (C.V.) is defined to be:
$$C.V. = \frac{S}{\bar{X}} \times 100$$
If μ and σ represent the population mean and standard deviation, then the coefficient of variation (C.V.) is defined to be:
$$C.V. = \frac{\sigma}{\mu} \times 100$$
Notice that the numerator and denominator in the definition of CV have the same
units, so CV itself has no units of measurement. This gives us the advantage of
being able to directly compare the variability of two different populations using
the coefficient of variation.
Example1: A company has two sections (A and B) with 40 and 65 employees
respectively. Their average weekly wages are $450 and $350. The standard
deviations are 7 and 9. Which section has larger variability in wages?
Solution:
$$C.V._{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$
$$C.V._{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$
Because the C.V. for section A is smaller than the C.V. for section B, section B has the larger variability in wages. Section A is therefore more homogeneous than section B.
Example 2: The means and standard deviations of the heights and weights of 40 students are given below:

            Mean      Standard Deviation
Weights     68.34     3.02
Heights     172.55    26.33

Find the coefficient of variation of the heights and of the weights, and compare the results.
Solution:
$$C.V._{(Weights)} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$
$$C.V._{(Heights)} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$
So the weights (with C.V. = 4.42) have less variation than the heights (with C.V. = 15.26).
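Both examples can be verified with a short Python sketch. The function name coefficient_of_variation is a hypothetical helper that applies the definition C.V. = (S / mean) x 100 to the figures given in the examples.

```python
def coefficient_of_variation(sd, mean_value):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean_value * 100


# Example 1: weekly wages in sections A and B.
cv_a = coefficient_of_variation(7, 450)
cv_b = coefficient_of_variation(9, 350)

# Example 2: weights and heights of 40 students.
cv_weights = coefficient_of_variation(3.02, 68.34)
cv_heights = coefficient_of_variation(26.33, 172.55)

print(round(cv_a, 2), round(cv_b, 2))
print(round(cv_weights, 2), round(cv_heights, 2))
```

Because the C.V. has no units, the values for wages, weights and heights can be compared directly even though the underlying measurements are in different units.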


Principlles of statistics [amar mamusta amir]

  • 1. 1 SECTION 1 Statistics: is the science of obtaining data, organizing, summarizing, and presenting, analyzing, interpreting and drawing conclusions based on the data to give the best decision. Statistics divided in to two distinct parts: 1- Descriptive Statistics: It is concerned only with the collection, organization, summarizing, analysis and presentation of an array of numerical qualitative or quantitative data. Descriptive statistics include the mean, median, mode, standard deviation, range, etc. 2- Inferential Statistics: it is consist of methods for drawing conclusion based on the data to give the best decision. Its divide in to two parts also: A- Estimation B- Testing Hypothesis Population: Is the complete collection of all elements to be studied. Finite (countable) Population: A population is called finite if it is possible to count its individuals. For example, the number of students in Shaqlawa technical institute or number of computers in a libratory. Infinite (uncountable) Population: A population is called infinite if it is impossible to count its individuals, for example the number of bacteria's in a garden, number of fishes in a sea. Census: is the collection of data from every elements of population. Sample: is a sub- collection of elements drawn from a population. Sampling: the process of selecting a subset of data from the population is called Sampling.
  • 2. 2 Sources of collecting the data: 1- Historical Sources 2- Field Sources Probability (Random) Samples are drawn from populations through several different sampling methods: 1- Simple Random Sampling Every member of the population (N) has an equal chance of being selected for your sample (n). This is arguably the best sampling method, as your samples almost guaranteed to be representative of your population. However, it is rarely ever used due to being too impractical. 2- Systematic Sampling In this method, every nth individual from the population (N) is placed in the sample (n). For example, if you add every 7th individual to walk out of a supermarket to your sample, you are performing systematic sampling. 3- Stratified Sampling A general problem with random sampling is that you could, by chance, miss out a particular group in the sample. However, if you form the population into groups, and sample from each group, you can make sure the sample is representative. In METHODS OF COLLECTING THE DATA SAMPLES PROBABILITY (RANDOM) NON PROBABLITY CENSUS
  • 3. 3 stratified sampling, the population is divided into groups called strata. A sample is then drawn from within these strata. Some examples of strata commonly used by the ABS are States, Age and Sex. Other strata may be religion, academic ability or marital status. 4- MULTI-STAGE SAMPLING Multi-stage sampling is like cluster sampling, but involves selecting a sample within each chosen cluster, rather than including all units in the cluster. Thus, multi-stage sampling involves selecting a sample in at least two stages. In the first stage, large groups or clusters are selected. These clusters are designed to contain more population units than are required for the final sample. In the second stage, population units are chosen from selected clusters to derive a final sample. If more than two stages are used, the process of choosing population units within clusters continues until the final sample is achieved. Variable: is a characteristic or property of the elements in the population. The name of variable is derived from the fact that any particular characteristic may vary among the elements in a population. Variables Quantitative variables Descrete variables (Number of students) Continuous variables (Hieght, Weight) Qualitative (descriptive) variables
  • 4. 4 Levels of Measurement Variables can also be classified by how they are categorized, counted, or measured. Four common levels of measurement are:- 1. Nominal level of measurement: It is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (Such as low to high). For example, the genders of students (male, female). 2. Ordinal level of measurement: It involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless. For example, the letter graded of students (A, B, C, and D). 3. Interval level of measurement: It is like the ordinal level, with the additional property that we can determine meaningful amounts of differences between data. However, there is no meaningful zero (where none of the quantity is not present). For example, the temperature (zero degree Fahrenheit does not mean true zero), the years 1000, 2000, 1776 (time did not begin in the year zero). 4. Ratio level of measurement: It is the interval level modified to include true zero (where zero indicates that none of the quantity is present). For example, the weights of peoples. Some Examples of Levels of Measurements # Nominal level data Ordinal level data Interval level data Ratio level data 1 Eye color (blue, brown, green, black) Student’s Grade (A, B, C, D) Intelligence level Height 2 Political affiliation (PDK, PUK, …) Rating scale (poor, good, excellent) Temperature Weight 3 Religious affiliation (Muslim, Christian, Jewish) Ranking of tennis players. Years Time 4 Major field (Mathematics, Arts, Computers, …) Level of education Salary
  • 5. 5 Section 2 Frequency Distribution (Table): After a researcher might have gotten a raw data from any source, there is a need for the raw data (ungrouped) to be arranged and organized in a meaningful way in order to be able to describe and come up with a useful inference. The method that is being used for such organization and arrangement is called frequency distribution. Frequency means the number of times something happens. Frequency distribution simply means organizing of raw data in table from using classes and frequencies. 1- Frequency Distribution for Qualitative variables: Frequency Distribution for Qualitative variables lists all classes and the number of elements that belong to each of the classes. Example1: the following list gives the rank of a sample that consists of 25 clerks in Soran institute: Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher, Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher. Create a frequency distribution for the above data. Solution: FrequencyFrequency (tally)Classes (rank) 4Lecturer 5Assistant lecturer 6Researcher 10Assistant Researcher 25Total
  • 6. 6 Relative Frequency of a Class: The relative frequency of a class is obtained by dividing the frequency of class by the sum of the all frequencies. Example 2: depending on the previous example, calculate the relative frequency. Solution: Relative FrequencyFrequencyClasses (rank) 4/25=0.164Lecturer 5/25=0.25Assistant lecturer 6/25=0.246Researcher 10/25=0.410Assistant Researcher 125Total 2- Frequency Distribution for Quantitative variables Total Range (T.R): is equal to highest value minus lowest value in the data set. Number of classes: the appropriate number of classes may be decided by Yules formula which is as follows: Number of classes= where n is the total number of observation. Class Width= T.R/ No. of classes Class Width (Length) is the difference between two consecutive lower class limit or two consecutive lower class boundaries. The class width can be found by the following formula: Lower class limits (L.C.L) are the smallest numbers that can actually belong to the different classes. 4 n2.5
  • 7. 7 Upper class limits are the largest numbers that can actually belong to the different classes. Class marks (class midpoints): each class midpoint can be found by: Midpoint of any class (xi) = (L.C.L of this class+ U.C.L. of this class)/2 Frequency (F): is the number of values in a specific class of the distribution. A- Frequency Distribution for Discrete variables: The lower and upper limits of the frequency distribution of discrete variables are as below: Frequency Class Upper limitLower limit f1Xs+W-1Xs f2Xs+2W-1Xs+W f3Xs+3W-1Xs+2W . . . . . . FmXs+M.W-1Xs+(M-1)W Where: Xs: the lowest value W: class width M: number of classes
  • 8. 8 Example3: Construct the frequency distribution for the following data: 60 76 80 120 132 82 90 65 68 142 157 164 88 90 98 101 103 110 119 116 120 126 109 114 120 122 111 116 90 78 93 95 98 104 120 113 121 119 125 126 130 131 136 118 120 142 150 154 122 123 139 125 65 154 136 137 110 137 72 150 Total Range (T.R) = 164-60=104 Number of Classes (M) = 2.5(2.783) = 6.958 = 7 Length of Classes (L) = 104/7=14.86 = 15 Class Frequency Midpoint Relative Frequency 60 - 74 4 =(60+74)/2= 67 =4/60 = 0.067 75 – 89 5 =(75+89)/2= 82 =5/60 = 0.083 90 – 104 10 97 =10/60 = 0.167 105 – 119 12 112 =12/60 = 0.200 120 – 134 16 127 =16/60 = 0.267 135 – 149 7 142 =7/60 = 0.117 150 – 164 6 157 =6/60 = 0.100 Total 60 1
  • 9. 9 B- Frequency Distribution for continuous variables: The lower and upper limits of the frequency distribution of continuous variables are as below: frequency Class Upper limitLower limit f1Xs+WXs f2Xs+2WXs+W f3Xs+3WXs+2W . . . . . . fmXs+M.WXs+(M-1)W Example4: construct a frequency distribution for below data: 1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8 5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3 5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5 3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9 Solution:
  • 10. 11 Cumulative Frequency Distribution A- Ascending Cumulative Frequency Distribution Ascending Cumulative Frequency Distribution is the total frequency of all values less than the upper class boundary of a given class interval. Example5: Construct an Ascending Cumulative Frequency Distribution depending on the example 3. Classes Frequency Upper Limit of Class Ascending Cumulative Frequency 60 - 74 4 74 Less than or equal to 74= 4 75 – 89 5 89 Less than or equal to 89= 9 90 – 104 10 104 Less than or equal to 104= 19 105 – 119 12 119 Less than or equal to 119= 31 120 – 134 16 134 Less than or equal to 134= 47 135 – 149 7 149 Less than or equal to 149= 54 150 – 164 6 164 Less than or equal to 164= 60 Total 60 B- Descending Cumulative Frequency Distribution Descending Cumulative Frequency Distribution is the total frequency of all values Greater than the lower class boundary of a given class interval. Example6: Construct a descending cumulative frequency distribution depending on the example 4. Classes Frequency Lower Limit of Class Descending Cumulative Freq. 0 – 2 1 0 Greater than or equal to 0= 50 2 – 4 7 2 Greater than or equal to 2= 49 4 – 6 15 4 Greater than or equal to 4= 42 6 – 8 15 6 Greater than or equal to 6= 27 8 – 10 8 8 Greater than or equal to 8= 12 10 – 12 4 10 Greater than or equal to 10= 4 Total 50
  • 11. 11 Charts The graphical presentation of statistical data is using statistical charts. There are several kinds of charts for representing set of data, such as: Bar- Charts A bar chart is a chart composed of bars whose heights are the frequencies of the different classes. (Qualitative Variables) Example7: Display the below data as a bar chart. Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green, Red, Red Solution: In the first step we will create a frequency table for this data: Color Frequency red 6 green 6 blue 4 Then we use this table for creating a bar chart 0 1 2 3 4 5 6 7 red green blue Frequency Color
  • 12. 12 Histogram A histogram is similar to bar charts, but it is used for representing the quantitative variable rather than qualitative variables. Example8: Draw a histogram for the following frequency distribution. Classes Frequency 60 - 74 4 75 – 89 5 90 – 104 10 105 – 119 12 120 – 134 16 135 – 149 7 150 – 164 6 Total 60 Solution: 0 2 4 6 8 10 12 14 16 18 60 - 74 75 – 89 90 – 104 105 – 119 120 – 134 135 – 149 150 – 164 Frequency Classes
  • 13. 13 Pie Chart A pie chart is a circle divided into sectors, where each sector represents a category (relative frequency of each class) of data that is proportional to the total amount of data collected. We can calculate the angle size of each class by the following rule: Angle size of class= relative of the class X 360o Example9: Draw a pie chart for the data in example 1. Angle SizeRelative FrequencyFrequencyClass 0.16*360=57.64/25=0.164Lecturer 0.2*360=725/25=0.25Assistant lecturer 0.24*360=86.46/25=0.246Researcher 0.4*360=14410/25=0.411Assistant Researcher 360125Total Lecturer 57.6o Assistant lecturer 72o Researcher 86.4 Assistant Researcher 1440
  • 14. 14 Frequency Polygon It is a chart that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. Example10: draw a frequency distribution for the frequency distribution in example3. Frequency Curve Frequency curve is like a frequency polygon, but there is one difference between them, instead of using lines to connect midpoints a smooth curve will be used. Example 11: draw a frequency curve for the data in example 4. 0 2 4 6 8 10 12 14 16 18 67 82 97 112 127 142 157 Frequency Midpoints 1 7 15 15 8 4 0 2 4 6 8 10 12 14 16 18 0 2 4 6 8 10 12 Frequency Midpoints 1 3 5 7 9 11
  • 15. 15 Cumulative Frequency Chart It is a chart that represents the cumulative frequencies of classes in frequency distribution. Example 12: Construct an ascending cumulative frequency chart for the data in example 4. Example13: Construct a descending cumulative frequency chart for the data in example 4. 0 10 20 30 40 50 60 1 2 3 4 5 6 Cumulativefrequency Upper Limit of classes 2 4 6 8 10 12 0 10 20 30 40 50 60 1 2 3 4 5 6 Cumulativefrequency Lower Limit of Classes 2 4 6 8 10 12
  • 16. 16 Exercise 1: complete the following frequency distributions if the widths of classes are equal. Class Midpoint Class Midpoint 3 8 6 18 Class Midpoint frequency 8 2 14 4 20 6 26 5 32 3 Exercise2: the height of 35 students were noted and shown as follows: 170 180 175 165 160 155 180 190 185 170 174 178 165 169 186 179 161 171 159 168 177 164 191 140 173 181 177 173 166 162 168 184 168 158 155 Find the following: 1- Frequency distribution 2- Midpoints 3- Descending cumulative frequency 4- Relative frequency And draw: a) Histogram b) frequency polygon
  • 17. 17 SECTION 3 Notations In this section we will represent some useful notations before explaining the subjects that related to measures of central tendency and measures of dispersion (variation). 1- Summation Notation (  ) The symbol n i iX 1 , read as (the summation of X), where n is the number of observations and (i) is the subscript for the order of values. Let X is a variable represent 4 values: 2, 3, 5, and 10. Then the sum of variable X is represent as follow: 2010532 432 1 4 1 1     XXXXXX n i i ii Symbol Operation n n i i XXXX  21 1 Sum of observations 22 2 2 1 1 2 n n i i XXXX   Sum of Square of observations  2 21 2 1 n n i i XXXX         Square of Sum of observations Let X and Y are random variables and a is a constant then ana XaaX n i n i i n i i . 1 11                 n i i n i i n i ii n i i n i i YXYX anXaX 111 11 .  
  • 18. 18   nn n i ii n i i n i i n i ii YXYXYXYX YbXabYaX .... 2211 1 111                 n i i n i i XX 1 2 2 1 Example 1: If Xi represents the following 4, 3, 5 and 1. Find the following: a-  n i iX 1 b-  n i iX 1 2 c-  n i iX 1 2 d-    n i iX 1 3 Solution:     11213)4(333) 26)13.(222) 511534 ) 131534 ) 4 1 4 11 4 11 2222 2 4 2 3 2 2 2 1 4 1 2 1 2 4321 4 11               i i i i n i i i i n i i i i n i i i i n i i XXXd XXc XXXXXXb XXXXXXa
  • 19. 19 2- Pie Notation  )( The symbol  n i iX 1 is used to multiplication of all values of Xi’s, or: n n i i XXXX .. 21 1         n i i n n i i n i n XaaX aa 11 1 . Example 2: If Xi represents the following 4, 2, 5 and 3. Find the following: a-  n i iX 1 b-  n i iX 1 5 Solution: 1203*5*2*4 ...) 4321 4 11     XXXXXXa i i n i i b) 75000)120.(5 .55 4 11     n i i n n i i XX Exercise: If Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following: a-  4 1 2 i iX b-  4 1 3 i iY c- 2 4 1 . i i i YX d-    n i ii YX 1 e- 4 4 1 66 i f-  4 1i iX g-  n i iY 1 4 h- i n i i YX .2 1  j-  4 1i i i Y X k-   2.3 4 1  i i i YX
  • 20. 21 SECTION 4: MEASURES OF CENTRAL TENDENCY In the previous sections, we have studied how to collect raw data, its classification and tabulation in a useful form, which contributes in solving many problems of statistical concern. Yet, this is not sufficient, for in practical purposes, there is need for further condensation, particularly when we want to compare two or more different distributions. We may reduce the entire distribution to one number which represents the distribution. A single value which can be considered as typical or representative of a set of observations and around which the observations can be considered as Centered is called an ’Average’ (or average value) or a Center of location. Since such typical values tend to lie centrally within a set of observations when arranged according to magnitudes, averages are called measures of central tendency. So the measure of central tendency is a value at the center or middle of a data set. This value represents all data of the group. The fundamental measures of tendencies are: (1) Arithmetic Mean (2) Weighted Mean (3) Harmonic Mean (4) Quadratic Mean (5) Mode (6) Median However the most common measures of central tendencies or locations are: Arithmetic mean, median and mode.
  • 21. 1) Arithmetic Mean
The arithmetic mean (generally called the mean) is the sum of all observations divided by the number of observations. The symbol $\bar{X}$ (pronounced "X bar") represents the sample mean and $\mu$ represents the population mean.

Arithmetic mean for ungrouped data
Suppose we have n observations (or measures) $X_1, X_2, X_3, \dots, X_n$. Then the arithmetic mean is:

$$\bar{X} = \frac{\sum_{i=1}^{n}X_i}{n} = \frac{X_1+X_2+X_3+\dots+X_n}{n}$$

Where:
$X_i$ = the ith observation.
n = the size of the data.

The mean of a population consisting of N observations is:

$$\mu = \frac{\sum_{i=1}^{N}X_i}{N} = \frac{X_1+X_2+X_3+\dots+X_N}{N}$$

Example: Calculate the arithmetic mean of the given values: 98 96 95 98 100 92 96 69
Solution:

$$\bar{X} = \frac{\sum_{i=1}^{n}X_i}{n} = \frac{98+96+95+98+100+92+96+69}{8} = 93$$
  • 22. Arithmetic mean for grouped data:
The arithmetic mean of grouped data is found by multiplying every class midpoint ($x_i$) by its corresponding frequency ($f_i$), adding these products to obtain $\sum f_i x_i$, and then dividing this sum by $\sum f_i$:

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$

The formula above is for sample data. A similar formula is used for population data.

Example: Determine the mean for the following set of data.

Classes    Frequency
8 -        2
10 -       3
12 -       5
14 -       4
16 -       1

Solution:

$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$

Classes    Frequency ($f_i$)    Midpoint ($x_i$)    $f_i x_i$
8 -        2                    9                   18
10 -       3                    11                  33
12 -       5                    13                  65
14 -       4                    15                  60
16 -       1                    17                  17
Total      15                                       193

$$\bar{X} = \frac{193}{15} = 12.87$$
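Both worked examples for the arithmetic mean can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the class width of 2 is read off the class limits of the example.

```python
# Ungrouped data: the eight marks from the first example.
marks = [98, 96, 95, 98, 100, 92, 96, 69]
mean_ungrouped = sum(marks) / len(marks)
print(mean_ungrouped)  # 93.0

# Grouped data: lower class limits and frequencies from the second example.
lower_limits = [8, 10, 12, 14, 16]
freqs = [2, 3, 5, 4, 1]
width = 2  # assumed from the class limits 8-, 10-, 12-, ...

# Midpoint of each class = lower limit + half the class width.
midpoints = [lo + width / 2 for lo in lower_limits]

# Mean = sum of (frequency * midpoint) divided by the sum of frequencies.
mean_grouped = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)
print(round(mean_grouped, 2))  # 12.87
```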
  • 23. The Properties of the Arithmetic Mean:
1- The sum of the deviations of all the values of X from their mean is zero:

$$\sum_{i=1}^{n}(X_i-\bar{X}) = \sum_{i=1}^{n}X_i - n\bar{X}$$

We have $\bar{X} = \frac{\sum_{i=1}^{n}X_i}{n}$, so $\sum_{i=1}^{n}X_i = n\bar{X}$, and then:

$$\sum_{i=1}^{n}(X_i-\bar{X}) = n\bar{X} - n\bar{X} = 0$$

2- If $(\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k)$ represent the means of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, the mean of the combined groups is:

$$\bar{X} = \frac{\sum_{i=1}^{k}n_i\bar{X}_i}{\sum_{i=1}^{k}n_i}$$

3- The sum of squares of the deviations from the mean is smaller than the sum of squares of the deviations from any other value. (Prove this property.)

Advantages (merits) of the arithmetic mean
1- It is easy to calculate and simple to understand.
2- It is very popular (most widely used).
3- It is based on all the observations, so it is a good representative.

Disadvantages (demerits) of the arithmetic mean
1- It is affected by outliers or extreme values.
2- It cannot be obtained if a single observation is missing or lost.
3- It cannot be calculated for distributions with open-end classes.
4- It cannot be computed for qualitative data.
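Property 3 is left as an exercise in the notes. One possible derivation is sketched below; the symbol $a$ for an arbitrary constant is introduced here for illustration and does not appear in the original.

```latex
\begin{align*}
\sum_{i=1}^{n}(X_i-a)^2
  &= \sum_{i=1}^{n}\bigl[(X_i-\bar{X}) + (\bar{X}-a)\bigr]^2 \\
  &= \sum_{i=1}^{n}(X_i-\bar{X})^2
     + 2(\bar{X}-a)\sum_{i=1}^{n}(X_i-\bar{X})
     + n(\bar{X}-a)^2 \\
  &= \sum_{i=1}^{n}(X_i-\bar{X})^2 + n(\bar{X}-a)^2
     \qquad \text{(by property 1 the middle term is zero)} \\
  &\ge \sum_{i=1}^{n}(X_i-\bar{X})^2 .
\end{align*}
```

Equality holds only when $a = \bar{X}$, so the sum of squared deviations is smallest when it is taken about the mean.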
  • 24. 2) Weighted Arithmetic Mean:
One of the limitations of the arithmetic mean is that it gives equal importance to all the items. There are cases, however, where the relative importance of the different items is not the same. In such cases we compute the weighted arithmetic mean.

The formula for the weighted arithmetic mean for ungrouped data is:

$$\bar{X}_W = \frac{\sum_{i=1}^{n}W_iX_i}{\sum_{i=1}^{n}W_i} = \frac{W_1X_1+W_2X_2+\dots+W_nX_n}{W_1+W_2+\dots+W_n}$$

Where $W_i$ is the weight of the ith observation.

The formula for the weighted arithmetic mean for grouped data is:

$$\bar{X}_W = \frac{\sum_{i=1}^{n}W_if_iX_i}{\sum_{i=1}^{n}W_if_i} = \frac{W_1f_1X_1+W_2f_2X_2+\dots+W_nf_nX_n}{W_1f_1+W_2f_2+\dots+W_nf_n}$$

Example: The marks of a student in the final examinations of the Statistics department are as follows:

Marks ($X_i$):  98  96  95  98  100  92  96  69
Units ($W_i$):  2   3   3   1   3    3   2   2

Calculate the weighted mean.
Solution:

$$\bar{X}_W = \frac{\sum_{i=1}^{n}W_iX_i}{\sum_{i=1}^{n}W_i} = \frac{(98 \cdot 2)+(96 \cdot 3)+(95 \cdot 3)+(98 \cdot 1)+(100 \cdot 3)+(92 \cdot 3)+(96 \cdot 2)+(69 \cdot 2)}{2+3+3+1+3+3+2+2} = \frac{1773}{19} = 93.3158$$
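The weighted mean of the example can be checked with the short sketch below. The function name `weighted_mean` is illustrative and not part of the notes.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Marks and units (weights) from the example.
marks = [98, 96, 95, 98, 100, 92, 96, 69]
units = [2, 3, 3, 1, 3, 3, 2, 2]

print(sum(w * x for x, w in zip(marks, units)))  # 1773
print(sum(units))                                # 19
print(round(weighted_mean(marks, units), 4))     # 93.3158

# With equal weights the result reduces to the ordinary arithmetic mean.
print(weighted_mean(marks, [1] * len(marks)))    # 93.0
```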
  • 25. Remark: If all the weights are equal, the weighted mean is the same as the arithmetic mean.

Exercise 1: The average marks of three groups of students having 70, 50 and 30 students respectively are 50, 55 and 45. Find the average mark of all 150 students taken together.

Exercise 2: The following frequency distribution shows the marks obtained by 50 students in statistics at Soran institute. Find the arithmetic mean.

Classes    Frequency ($f_i$)
20 - 29    1
30 - 39    5
40 - 49    12
50 - 59    15
60 - 69    9
70 - 79    6
80 - 89    2

Exercise 3: The mean of a certain number of observations is 40. If two items with values 50 and 64 are added to this data, the mean rises to 42. Find the number of items in the original data.

Exercise 4: If $\sum_{i=1}^{n}(X_i-4) = 72$ and $\sum_{i=1}^{n}(X_i-7) = 3$, find the number of observations (n).
  • 26. 3) Harmonic Mean
The harmonic mean is a measure of central tendency that is used less often than the other measures (mean, median and mode). The formula for the harmonic mean for ungrouped data is:

$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n}\frac{1}{X_i}}$$

And for grouped data:

$$\bar{X}_h = \frac{\sum f_i}{\sum_{i=1}^{n}\frac{f_i}{X_i}}$$

Example: Calculate the harmonic mean for the following data:
$X_i$: 8 2 5 3 4 7 8
Solution:

$X_i$:             8      2     5     3      4      7      8
$\frac{1}{X_i}$:   0.13   0.5   0.2   0.33   0.25   0.14   0.13

$$\sum_{i=1}^{n}\frac{1}{X_i} = 1.68 \qquad \bar{X}_h = \frac{7}{1.68} = 4.167$$

(The reciprocals here are rounded to two decimal places; with unrounded reciprocals the result is 4.176.)

4) Quadratic mean

$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n}X_i^2}{n}} \quad \text{for ungrouped data}$$

$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n}f_iX_i^2}{\sum_{i=1}^{n}f_i}} \quad \text{for grouped data}$$
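The two ungrouped formulas above can be sketched in Python as follows. The function names are illustrative; the quadratic mean is applied to the same data only as an illustration, since the notes give no worked example for it.

```python
import math

def harmonic_mean(values):
    """n divided by the sum of the reciprocals 1 / X_i."""
    return len(values) / sum(1 / v for v in values)

def quadratic_mean(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))

# Data from the harmonic mean example.
x = [8, 2, 5, 3, 4, 7, 8]

# Without rounding the reciprocals the harmonic mean is about 4.176.
print(round(harmonic_mean(x), 3))   # 4.176
print(round(quadratic_mean(x), 3))  # 5.745
```

For positive data that are not all equal, the harmonic mean is smaller than the arithmetic mean, which in turn is smaller than the quadratic mean.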
  • 27. 5) MODE
The mode (Mo) is the value that occurs most often in a data set.

Mode for ungrouped data:
The mode of the data set 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5, because it occurs more often than any other number (6 times).

Remark: When 2 numbers occur with the same greatest frequency, each one is a mode and the data set is bimodal. When more than 2 numbers occur with the same greatest frequency, each is a mode and the data set is said to be multimodal. When no number is repeated, we say that there is no mode.

Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5.
Solution: The numbers 5 and 7 are both modes (each occurs 4 times). The data set is bimodal.

Mode for grouped data:
Let $(X_1, X_2, \dots, X_n)$ represent the class marks of the class intervals and $(f_1, f_2, \dots, f_n)$ the corresponding frequencies. The modal class is the class that has the highest frequency. The formula for the mode is:

$$M_o = L_k + \frac{(f_k-f_{k-1})}{(f_k-f_{k-1})+(f_k-f_{k+1})} \times W_k$$

Where:
$L_k$: lower limit of the modal class.
$f_k$: frequency of the modal class.
$f_{k-1}$: frequency of the previous class.
$f_{k+1}$: frequency of the next class.
$W_k$: size of the modal class interval (class width).
  • 28. Example: Find the mode for the following frequency distribution:

Class      Frequency
10 – 19    5
20 – 29    7
30 – 39    10
40 – 49    8
50 – 59    4
60 – 69    3
70 – 79    1

Solution: The modal class is 30 – 39 because it has the highest frequency (10).
$L_k = 30$, $f_k = 10$, $f_{k-1} = 7$, $f_{k+1} = 8$, $W_k = 10$

$$M_o = L_k + \frac{(f_k-f_{k-1})}{(f_k-f_{k-1})+(f_k-f_{k+1})} \times W_k$$

$$M_o = 30 + \frac{(10-7)}{(10-7)+(10-8)} \times 10 = 30 + \frac{3}{5} \times 10 = 36$$

Remark 1: If there are 2 or more classes with the highest frequency, the assembly method must be used to find the modal class.

Remark 2: When we use the assembly method, the formula for the mode is:

$$M_o = L_k + \frac{(f_k-f_{k-1})}{f_k-f_{k-1}+f_k-f_{k+1}} \times W_k$$

Remark 3: If the widths of the classes are not equal, the adjusted frequency must be used instead of the real frequency. The adjusted frequency of each class is equal to $\frac{f_i}{W_i}$.
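The grouped-mode formula can be sketched as a small Python function. The name `grouped_mode` is illustrative. The sketch assumes equal class widths and a single class with the highest frequency, and it treats a missing neighbouring class as having frequency 0, which is an assumption not stated in the notes.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode of grouped data, assuming equal widths and one modal class."""
    # Index of the (first) class with the highest frequency.
    k = max(range(len(freqs)), key=lambda i: freqs[i])
    f_k = freqs[k]
    # Assumption: a missing neighbour counts as frequency 0.
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    d_prev = f_k - f_prev   # f_k - f_(k-1)
    d_next = f_k - f_next   # f_k - f_(k+1)
    return lower_limits[k] + d_prev * width / (d_prev + d_next)

# Frequency distribution from the example.
lower_limits = [10, 20, 30, 40, 50, 60, 70]
freqs = [5, 7, 10, 8, 4, 3, 1]

print(grouped_mode(lower_limits, freqs, 10))  # 36.0
```

When several classes share the highest frequency, this function simply picks the first one; it does not implement the assembly method of Remark 1.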
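  • 29. Example: Find the mode for the following frequency distribution:

Class      Frequency
10 – 19    4
20 – 29    4
30 – 39    3
40 – 49    2
50 – 59    3
60 – 69    3
70 – 79    1

Solution: There are 2 classes with the highest frequency; therefore, to find the modal class we must use the assembly method, as follows:

Class      Frequency   1st assembly   2nd assembly   3rd assembly   4th assembly
10 – 19    4           8                             11
20 – 29    4                          7                             9
30 – 39    3           5
40 – 49    2                          5              8
50 – 59    3           6                                            7
60 – 69    3                          4
70 – 79    1

Each assembly entry is written in the row of the first class of its group. The 1st and 2nd assemblies add the frequencies in pairs, starting from the first and the second class respectively; the 3rd and 4th assemblies add them in threes, starting from the first and the second class respectively.

From the previous table we can extract the following table:

Serial no. of column    Greatest frequency in the column    Contributing classes
1                       4                                   1, 2
2                       8                                   1, 2
3                       7                                   2, 3
4                       11                                  1, 2, 3
5                       9                                   2, 3, 4

The 2nd class contributes most often (5 times), so the 2nd class is the modal class.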
  • 30. $L_k = 20$, $f_k = 4$, $f_{k-1} = 4$, $f_{k+1} = 3$, $W_k = 10$

$$M_o = L_k + \frac{(f_k-f_{k-1})}{f_k-f_{k-1}+f_k-f_{k+1}} \times W_k$$

$$M_o = 20 + \frac{(4-4)}{4-4+4-3} \times 10 = 20$$

Advantages of the mode
1- It is easy to calculate.
2- It is not affected by extreme values.
3- It can be used for qualitative data.
4- It can be located graphically (histogram).
5- It can be calculated for distributions with open-end classes.

Disadvantages of the mode
1- It is not based upon all the observations.
2- It is not always possible to find a clearly defined mode (there may be 2 or 3 modes).
3- It is not capable of further mathematical treatment.

Exercise: Find the mode for the following frequency distributions:

Class      Frequency          Class    Frequency
5 –        2                  10 –     30
10 –       6                  20 –     12
15 –       10                 30 –     16
25 –       22                 40 –     28
35 –       27                 50 –     26
50 – 60    11                 60 –     14
  • 31. 6) MEDIAN
The median (Me) is the value of the middle item in an ordered data set. It divides the data set into two equal parts, one comprising all values greater than the median and the other all values smaller than the median.

Median for ungrouped data:
First, arrange the data in ascending (increasing) order.
If the number of observations (n) is odd, the median is the observation of order $\frac{n+1}{2}$.
If the number of observations (n) is even, the median is the average of the observations of order $\frac{n}{2}$ and $\frac{n}{2}+1$.

Example: Find the median of the following data set: 55, 62, 53, 70, 68, 65, 63, 79, and 80.
Solution: Arrange the data in increasing order: 53, 55, 62, 63, 65, 68, 70, 79, 80.
Since n = 9 is odd, the order of the median is:

$$\frac{n+1}{2} = \frac{9+1}{2} = 5$$

Then the 5th observation is the value of the median, or Me = 65.

Example: Find the median of the following data set: 20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25.
Solution: Arrange the data in increasing order: 18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30.

$$\frac{n}{2} = \frac{12}{2} = 6, \text{ and the 6th value is } 23$$
  • 32.
$$\frac{n}{2}+1 = \frac{12}{2}+1 = 7, \text{ and the 7th value is } 25$$

Then:

$$M_e = \frac{23+25}{2} = 24$$

Median for grouped data:
To find the median of a frequency distribution, follow these steps:
Step 1: Find the cumulative frequency (ascending or descending).
Step 2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
Step 3: If $F_{k-1} \le \frac{\sum f_i}{2} \le F_k$, then the median class is the class of order k.
Step 4: Compute the value of the median:

$$M_e = L_k + \left(\frac{\frac{\sum f_i}{2} - F_{k-1}}{f_k}\right) W_k \quad \text{for ascending cumulative frequency}$$

$$M_e = L_k + \left(\frac{F_k^{*} - \frac{\sum f_i}{2}}{f_k}\right) W_k \quad \text{for descending cumulative frequency}$$

Where:
$L_k$: lower limit of the median class.
$f_k$: frequency of the median class.
$W_k$: width of the median class.
$\sum f_i$: sum of the frequencies.
$F_{k-1}$: ascending cumulative frequency of the class preceding the median class.
$F_k^{*}$: descending cumulative frequency of the median class.
  • 33. Example: Find the median for the following frequency distribution:

Classes            100 -   120 -   140 -   160 -   180 -   200 -   220 -
No. of families    3       7       14      20      18      12      6

Solution: First we find the ascending cumulative frequency:

Class    Frequency   Ascending cumulative frequency
100 -    3           3
120 -    7           10
140 -    14          24
160 -    20          44
180 -    18          62
200 -    12          74
220 -    6           80
Total    80

Then we find the median order:

$$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$

Comparing the median order with the ascending cumulative frequency:

$$F_{k-1} \le \frac{\sum f_i}{2} \le F_k \quad\Rightarrow\quad 24 \le 40 \le 44$$

Then the median class is the 4th class, so $L_k = 160$, $W_k = 20$, $f_k = 20$:

$$M_e = L_4 + \left(\frac{\frac{\sum f_i}{2} - F_3}{f_4}\right) W_4$$

$$M_e = 160 + \left(\frac{\frac{80}{2} - 24}{20}\right) \times 20 = 176$$
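The four steps for the grouped median can be sketched in Python as below, once with the ascending and once with the descending cumulative frequency. The function names are illustrative, and the sketch assumes equal-width, contiguous classes as in the example.

```python
from itertools import accumulate

def grouped_median(lower_limits, freqs, width):
    """Median from the ascending cumulative frequency."""
    cum = list(accumulate(freqs))          # Step 1: ascending cumulative frequency
    half = cum[-1] / 2                     # Step 2: median order
    # Step 3: first class whose cumulative frequency reaches the median order.
    k = next(i for i, F in enumerate(cum) if F >= half)
    F_prev = cum[k - 1] if k > 0 else 0
    # Step 4: interpolate inside the median class.
    return lower_limits[k] + (half - F_prev) * width / freqs[k]

def grouped_median_desc(lower_limits, freqs, width):
    """Median from the descending cumulative frequency."""
    total = sum(freqs)
    half = total / 2
    # Descending cumulative frequency: observations in this class or above.
    desc = [total - c + f for c, f in zip(accumulate(freqs), freqs)]
    # Last class whose descending cumulative frequency is still >= median order.
    k = max(i for i, F in enumerate(desc) if F >= half)
    return lower_limits[k] + (desc[k] - half) * width / freqs[k]

# Frequency distribution from the example.
lower_limits = [100, 120, 140, 160, 180, 200, 220]
freqs = [3, 7, 14, 20, 18, 12, 6]

print(grouped_median(lower_limits, freqs, 20))       # 176.0
print(grouped_median_desc(lower_limits, freqs, 20))  # 176.0
```

Both versions give the same value because the descending cumulative frequency of the median class equals the total frequency minus the ascending cumulative frequency of the preceding class.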
  • 34. Merits of the median
1. It is easy to calculate and understand.
2. Unlike the arithmetic mean, it is not affected by extreme values.
3. It can be found by mere inspection.
4. It can be used for qualitative studies.
5. It can be calculated for distributions with open-end classes.
6. It can be obtained graphically.

Demerits of the median
1. It is not capable of further algebraic treatment.
2. It is not based on all observations.

Exercise: Find the median for the following frequency distribution by using the ascending and the descending cumulative frequency.

Class        Frequency
18 -         10
28 -         15
36 -         18
50 -         22
70 -         20
100 -        18
130 - 150    13
Total

The relationship between Arithmetic Mean, Median and Mode
If the frequency distribution is moderately skewed, the following approximate relationship between these measures holds (for a perfectly symmetric distribution the three measures coincide):
  • 35.
$$\bar{X} - M_o = 3(\bar{X} - M_e)$$

SECTION 5: Measures of Dispersion (Variation)
Measures that describe the spread of a data set are called measures of dispersion. Their main purpose is to assess the homogeneity of the values in a data set, or to compare the values of two or more data sets.

1- Range
The simplest measure of absolute variation is the range, which is calculated by subtracting the smallest value from the largest value of a data set.
R = Largest value – Smallest value

Example: Find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
Solution: R = Largest value – Smallest value = 15 – 2 = 13

Remark: For grouped data we calculate the range by subtracting the lower limit of the first class from the upper limit of the last class.

2- Mean Deviation
It is the sum of the absolute deviations of the observations from a point (A) divided by the number of observations.

$$M.D = \frac{\sum_{i=1}^{n}|X_i - A|}{n} \quad \text{for ungrouped data}$$

$$M.D = \frac{\sum_{i=1}^{n}f_i|X_i - A|}{n} \quad \text{for grouped data, where } n = \sum f_i$$

Where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$) or the mode ($M_o$).
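  • 36. Example: Find the value of the mean deviation for the following data by using the mean, the median and the mode.
$X_i$: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
Solution: First we find the values of $\bar{X}$, $M_e$ and $M_o$. The sum of the 11 values is 88, the middle (6th) ordered value is 6, and the most frequent value is 5:
$\bar{X} = 8$, $M_e = 6$, $M_o = 5$

$X_i$    $|X_i - \bar{X}|$   $|X_i - M_e|$   $|X_i - M_o|$
2        6                   4               3
3        5                   3               2
4        4                   2               1
5        3                   1               0
5        3                   1               0
6        2                   0               1
7        1                   1               2
10       2                   4               5
13       5                   7               8
14       6                   8               9
19       11                  13              14
Total    48                  44              45

$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n}|X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$

$$M.D(M_e) = \frac{\sum_{i=1}^{n}|X_i - M_e|}{n} = \frac{44}{11} = 4$$

$$M.D(M_o) = \frac{\sum_{i=1}^{n}|X_i - M_o|}{n} = \frac{45}{11} = 4.0909$$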
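The three mean deviations can be checked with Python's standard `statistics` module, which reports a mean of 8, a median of 6 and a mode of 5 for these data. The helper name `mean_deviation` is illustrative.

```python
from statistics import mean, median, mode

def mean_deviation(values, a):
    """Average absolute deviation of the values from the point a."""
    return sum(abs(v - a) for v in values) / len(values)

# Data from the example.
x = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]

print(mean(x), median(x), mode(x))             # 8 6 5
print(round(mean_deviation(x, mean(x)), 4))    # 4.3636
print(round(mean_deviation(x, median(x)), 4))  # 4.0
print(round(mean_deviation(x, mode(x)), 4))    # 4.0909
```

The deviation about the median is the smallest of the three, which is a general property of the median.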
  • 37. 3- Variance
It is one of the most important measures of absolute variation. The variance is calculated by taking the average of the squared distance (deviation) of each observation from the mean of the data set.

The formula for the population variance ($\sigma^2$) for raw data is:

$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}$$

Where:
$X_i$: individual value
$\mu$: population mean
N: population size (number of observations).

The formula for the sample variance ($S^2$) for raw data is:

$$S^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}$$

The formula for the sample variance for grouped data is:

$$S^2 = \frac{\sum_{i=1}^{n}f_i(X_i-\bar{X})^2}{n-1} \quad \text{where } n = \sum f_i$$

Example: Find the variance for the following data set: 56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
Solution:

$$S^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}$$
  • 38.
$$\bar{X} = \frac{\sum_{i=1}^{10}X_i}{10} = \frac{650}{10} = 65$$

$X_i$    $(X_i - \bar{X})$   $(X_i - \bar{X})^2$
56       -9                  81
68       3                   9
72       7                   49
63       -2                  4
65       0                   0
68       3                   9
71       6                   36
69       4                   16
62       -3                  9
56       -9                  81
Total                        294

Then:

$$S^2 = \frac{294}{10-1} = 32.667$$

Properties of variance:
1) $S^2 \ge 0$
2) If $Y_i = aX_i$, then $S_Y^2 = a^2 S_X^2$, where a is a constant. (Prove that.)
3) If $Y_i = X_i \pm b$, then $S_Y^2 = S_X^2$, where b is a constant. (Prove that.)
4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z is: $S_Z^2 = S_X^2 + S_Y^2$
5) If $(S_1^2, S_2^2, \dots, S_k^2)$ represent the variances of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, then the pooled variance of the groups is:

$$S_p^2 = \frac{\sum_{i=1}^{k}(n_i-1)S_i^2}{\sum_{i=1}^{k}(n_i-1)} \quad \text{where } n_i < 30$$
  • 39.
$$S_p^2 = \frac{\sum_{i=1}^{k}n_iS_i^2}{\sum_{i=1}^{k}n_i} \quad \text{where } n_i \ge 30$$

4- Standard deviation (S)
The standard deviation is the most important and most widely used measure of absolute variation. The standard deviation is the square root of the variance:

$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}}$$

Example: Find the standard deviation of the following frequency distribution.
Solution:

$$\bar{X} = \frac{\sum_{i=1}^{n}f_iX_i}{\sum_{i=1}^{n}f_i} = \frac{14060}{80} = 175.75$$

Class    $f_i$   $X_i$   $f_iX_i$   $(X_i - \bar{X})$   $(X_i - \bar{X})^2$   $f_i(X_i - \bar{X})^2$
100 -    3       110     330        -65.75              4323.063              12969.19
120 -    7       130     910        -45.75              2093.063              14651.44
140 -    14      150     2100       -25.75              663.0625              9282.875
160 -    20      170     3400       -5.75               33.0625               661.25
180 -    18      190     3420       14.25               203.0625              3655.125
200 -    12      210     2520       34.25               1173.063              14076.75
220 -    6       230     1380       54.25               2943.063              17658.38
Total    80              14060                                                72955
  • 40.
$$S = \sqrt{\frac{\sum_{i=1}^{n}f_i(X_i-\bar{X})^2}{n}} = \sqrt{\frac{72955}{80}} = 30.198$$

(Here the sum of squares is divided by $n = \sum f_i = 80$; with the divisor $n - 1 = 79$ of the sample formula the result is 30.389.)

Coefficient of Variation
A disadvantage of the standard deviation as a comparative measure of variation is that it depends on the units of measurement. This makes it difficult to use the standard deviation to compare measurements from different populations. For this reason, statisticians have defined the coefficient of variation, which expresses the standard deviation as a percentage of the sample or population mean.

If $\bar{X}$ and S represent the sample mean and the sample standard deviation, then the coefficient of variation (C.V.) is defined as:

$$C.V. = \frac{S}{\bar{X}} \times 100$$

If $\mu$ and $\sigma$ represent the population mean and standard deviation, then the coefficient of variation is defined as:

$$C.V. = \frac{\sigma}{\mu} \times 100$$

Notice that the numerator and denominator in the definition of C.V. have the same units, so C.V. itself has no units of measurement. This allows us to compare the variability of two different populations directly using the coefficient of variation.

Example 1: A company has two sections (A and B) with 40 and 65 employees respectively. Their average weekly wages are $450 and $350, and the standard deviations are $7 and $9. Which section has the larger variability in wages?
  • 41. Solution:

$$C.V._{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$

$$C.V._{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$

Because the C.V. for section A is smaller than the C.V. for section B, section B has the larger variability, and section A is more homogeneous than section B.

Example 2: The mean and standard deviation of the heights and weights of 40 students are as follows:

           Mean      Standard Deviation
Weights    68.34     3.02
Heights    172.55    26.33

Find the coefficient of variation of the heights and of the weights and compare the results.
Solution:

$$C.V._{(Weights)} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$

$$C.V._{(Heights)} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$

So the weights (with C.V. = 4.42) have less variation than the heights (with C.V. = 15.26).
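The variance, standard deviation and coefficient of variation examples of this section can be reproduced with the sketch below. The function and variable names are illustrative. For the grouped example the standard deviation is computed with both divisors, n and n − 1, so the two conventions can be compared.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

# Variance example (raw data).
raw = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]
print(round(sample_variance(raw), 3))  # 32.667

# Standard deviation example (grouped data): class midpoints and frequencies.
midpoints = [110, 130, 150, 170, 190, 210, 230]
freqs = [3, 7, 14, 20, 18, 12, 6]
n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
ss = sum(f * (x - grouped_mean) ** 2 for f, x in zip(freqs, midpoints))
print(grouped_mean, ss)                   # 175.75 72955.0
print(round(math.sqrt(ss / n), 3))        # 30.198 (divisor n)
print(round(math.sqrt(ss / (n - 1)), 3))  # 30.389 (divisor n - 1)

# Coefficient of variation, Example 1 and Example 2.
print(round(coefficient_of_variation(7, 450), 2))         # 1.56
print(round(coefficient_of_variation(9, 350), 2))         # 2.57
print(round(coefficient_of_variation(3.02, 68.34), 2))    # 4.42
print(round(coefficient_of_variation(26.33, 172.55), 2))  # 15.26
```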