DESCRIPTIVE STATISTICS
1.0 INTRODUCTION
In everyday life, whether at home or at work, we usually keep records or read reports. An item in
the record or report is a fact that is expressed in terms of a numerical value or described by its quality or
kind. That single item or fact is referred to as a datum. All these facts in a record or report are called
data.
Examples of data:
• Color of the hair
• Number of students in a class
• Height and weight
• Number of times you were absent from class
1.1 Population and Sample
In the data-gathering phase, information is taken from a unit, which is part of the collection of all
such units called a population. A population consists of an entire set of objects, observations, or
scores that have something in common.
Some Definitions:
Population – collection of all units from which the data is to be collected.
Element – unit in a population
Sample – subset or a representative part of the population.
Frame – listing of all the elements of the population
Census – complete enumeration in which every member of the population is included
Sampling (or sample survey) – only a part or a portion of the population is used to obtain data
1.2 Definition of Statistics
The word “statistics” is used in several different senses. In the broadest sense, “statistics” is a branch of
science that deals with the development of methods for collecting, organizing, presenting, and
analyzing data more effectively. Data, and how to deal with them, are the main concern of statistics.
In a second usage, a “statistic” is defined as a numerical quantity (such as the mean) calculated from a
sample. The grade point average (GPA) is an example of a statistic: it is a value computed from a
student’s grades in a particular semester.
Illustration:
If the data is the set of grades, then GPA is the statistic. Another numerical value that can be
computed from the set of grades is the percentage passing. The percentage passing is also a statistic.
From the same set of grades, the number of subjects that received a failing grade is another statistic.
Taken together, the GPA, the percentage passing, and the number that received a failing grade are
called statistics.
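The illustration can be made concrete with a short Python sketch. It assumes a set of hypothetical grades on a percentage scale, an assumed passing mark of 75, and an unweighted GPA (a real GPA is usually weighted by units). Each printed value is a separate statistic computed from the same data.

```python
# Hypothetical final grades (in percent) of one student for one semester
grades = [88, 72, 91, 79, 65, 84]
PASSING_MARK = 75  # assumed passing grade, for illustration only

gpa = sum(grades) / len(grades)                      # unweighted average (assumption)
failed = sum(1 for g in grades if g < PASSING_MARK)  # number of failing grades
percent_passing = 100 * (len(grades) - failed) / len(grades)

print(f"GPA = {gpa:.2f}")                            # GPA = 79.83
print(f"Percentage passing = {percent_passing:.1f}%")  # Percentage passing = 66.7%
print(f"Number of failing grades = {failed}")        # Number of failing grades = 2
```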
Major Areas of Statistics
1. Descriptive Statistics – deals largely with summary calculations, graphical displays, and
describing important features of a set of data. It does not attempt to draw conclusions about
anything that pertains to more than the data themselves.
2. Inferential Statistics – concerned with making generalizations from information gathered from a
small group of observations (sample) to a bigger group of observations (population).
Two Main Methods:
1. Estimation
- the sample statistic is used to estimate a population parameter
- a confidence interval about the estimate is constructed.
2. Hypothesis Testing
- a null hypothesis is put forward.
- analysis of the data is then used to determine whether to reject it.
1.3 Variables
A variable is any measured characteristic or attribute that differs for different subjects. When
variables have a cause-and-effect relationship, the presumed cause is called the independent variable
and the presumed effect is called the dependent variable.
Types of Variables:
1. Qualitative Variables – sometimes called “categorical variables”
- facts for which no numerical measure exists
- expressed in categories or kind
Examples:
• color of the skin, which can be black, brown, or white
• a person’s sex, which can be male or female
2. Quantitative Variables – variables that can be expressed in numbers.
- can be measured and counted.
Examples:
• a person’s height and weight – can be measured
• number of students in a class – can be counted
Classification of Quantitative Variables
1. Continuous variable
A continuous variable is one that can take any value within the limits of its range.
Examples:
• Time to solve a math problem is continuous since it could take 2 minutes, 2.13
minutes, etc. to finish a problem
• Height is continuous since it could be 1.55 meters, 1.65 meters, etc.
2. Discrete variable
A discrete variable is one that cannot take on all values within the limits of the variable.
Examples:
• Responses to a five-point rating scale are discrete since they can only take the values 1, 2, 3, 4,
and 5.
• Number of provinces
1.4 Types of Measurements
1. Nominal measurement consists of assigning items to groups or categories. No quantitative
information is conveyed and no ordering of the items is implied. Nominal measurements are
therefore qualitative rather than quantitative. Nominal measurement is the lowest form of
measurement.
Examples:
• Color
• Sex
• Blood type
• Religion
2. Ordinal Measurement
Measurements on ordinal scales are ordered in the sense that higher numbers
represent higher values. However, the intervals between the numbers are not necessarily
equal. For example, on a five-point rating scale measuring attitudes towards gun control,
the difference between a rating of 2 and a rating of 3 may not represent the same
difference as the difference between a rating of 4 and a rating of 5. There is no “true” zero
point for ordinal scales since the zero point is chosen arbitrarily. The lowest point on the
rating scale in the example was chosen to be 1. It could just as well have been 0 or 5.
Examples:
• Taste preferences
• Satisfaction levels
• Social classes
• Academic honors
3. Interval Measurement
On interval measurement scales, one unit on the scale represents the same
magnitude on the trait or characteristic being measured across the whole range of scale.
Interval scales do not have a “true” zero point, however, and therefore it is not
possible to make statements about how many times higher one score is than another.
A good example of an interval scale is the Fahrenheit scale for temperature.
4. Ratio Measurement
Ratio measurements are like interval measurements except that they have a true zero
point. Ratio measurement is the highest form of measurement.
Examples:
• Length
• Weight
Note: A large number of statistical analysis tools are available for each type of measurement. It is
important that the statistical user understand the type of data to be processed so that the
chosen statistical tool is used properly.
1.5 Random and Non-Random Sampling
• Random sampling is the most commonly used sampling technique, in which each member of
the population is given an equal chance of being selected in the sample.
• Non-random sampling is a method of collecting a small portion of the population in which
not all the members of the population are given a chance to be included in the sample.
Properties of Random Sampling
1. Equiprobability – means that each member of the population has an equal chance
of being selected and included in the sample.
2. Independence – means that the chance of one member being drawn does not
affect the chance of any other member.
1.6 Probability Sampling Techniques
1. Simple Random Sampling (SRS) – process for selecting a sample wherein every element in
the sampled population is given an equal chance of being included in the sample
2. Systematic Random Sampling – sampling wherein every kth unit is included after a random
start is taken for the sample
3. Stratified Proportional Random Sampling – the population is divided into homogeneous groups called
strata, and selection is done randomly within each stratum, with the number selected from each
stratum proportional to the stratum’s size
4. Multi-stage Sampling – this technique uses several stages or phases in getting a sample from
the population. This method is an extension or a multiple application of the stratified random
sampling technique.
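Three of the probability techniques above can be sketched with Python's standard `random` module. Several things here are assumptions made for illustration: the numbered frame of 100 units, the two strata "A" and "B", the fixed seed, and the choice to round each stratum's share. The helper function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every element has the same chance of selection (drawn without replacement)."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th element after a random start within the first k elements."""
    k = len(population) // n        # sampling interval
    start = rng.randrange(k)        # random start: index 0, 1, ..., k - 1
    return [population[start + i * k] for i in range(n)]

def stratified_proportional_sample(strata, n, rng):
    """Draw an SRS inside each stratum, sized in proportion to the stratum."""
    N = sum(len(members) for members in strata.values())
    # Rounding means the stratum sizes may not add up exactly to n for every N.
    return {name: rng.sample(members, round(n * len(members) / N))
            for name, members in strata.items()}

rng = random.Random(2024)                      # fixed seed for a reproducible example
population = list(range(1, 101))               # hypothetical frame of 100 numbered units
print(simple_random_sample(population, 10, rng))
print(systematic_sample(population, 10, rng))

strata = {"A": list(range(1, 61)), "B": list(range(61, 101))}  # hypothetical strata
sample = stratified_proportional_sample(strata, 10, rng)
print({name: len(units) for name, units in sample.items()})   # {'A': 6, 'B': 4}
```

Passing an explicit `random.Random` instance keeps each draw reproducible without touching the module-level generator.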
1.7 Non-random Sampling Techniques
1. Judgment or Purposive Sampling – this method is also referred to as non-probability sampling. The
researcher’s judgment plays a major role in the selection of particular items and in making decisions in
cases of incomplete responses or observations.
2. Quota Sampling – this is a relatively quick and inexpensive method to operate since the choice
of the number of subjects to be included in a sample is done at the researcher’s own
convenience or preference and is not predetermined by some carefully operated randomizing
plan.
3. Cluster Sampling – population is divided into a number of relatively small subdivisions, which
are themselves clusters of still smaller units, and then some of these subdivisions, or clusters,
are randomly selected for inclusion in the overall sample.
4. Incidental Sampling – this design is applied to those samples which are taken because they
are the most available.
5. Convenience Sampling – this method has been widely used in television and radio programs to
find out the opinions of TV viewers and radio listeners regarding a controversial issue.
1.8 Methods of Collecting Data
There are many ways of collecting data, each of which has its own advantages and
disadvantages. The more general methods of collecting information are:
1. Direct or Interview Method
A very common and effective method of obtaining information is by conducting interviews.
People usually respond when visited in person.
Disadvantages: People may tend to lie, and interviews are quite costly and need thorough
training of the interviewers (untrained interviewers tend to influence the respondent’s
answers).
2. Indirect or Questionnaire Method
Questionnaires can either be mailed or handed personally to respondents.
Advantages: It does not require interviews and is therefore less costly. It can also cover a wider
area than interviews.
Disadvantages: The response rate is usually lower than for interviews. Many people tend to ignore
mailed questionnaires.
To encourage participation, a questionnaire should be kept as short as possible and
contain questions related to the objectives of the survey.
3. Direct Observation
In situations where less personal responses are needed, collecting data by direct
observation may be used.
Disadvantage: The person assigned to observe may commit observational errors.
4. Experimentation – is used when the objective is to determine cause-and-effect relationships in a
certain phenomenon under controlled conditions.
5. Utilizing Existing Records
A very convenient way of obtaining data is by utilizing existing records. There are a
number of institutions that gather data not only for their own purposes but also for the purposes of
other groups of people.
Advantage: It is very economical and requires less cooperation from people.
Disadvantage: The information needed may not be found in these sources.
Data are sometimes obtained from published or unpublished documents, and these sources can be
classified as follows:
• Primary sources – provide data first hand; the data gathered originally have not been
subjected to any transcription or condensation. Their authenticity is guaranteed by the
group who gathered them originally.
• Secondary sources – provide data that have been transcribed or compiled from
original sources
2.0 ORGANIZATION AND PRESENTATION OF DATA
After data have been gathered and checked for possible errors, the next logical step is to present
the data in a manner that is easy to understand. It should also readily convey the relevant information
and the important results at a glance.
Ways/Methods of presenting data:
1. Textual presentation – a narrative way of describing the collected characteristics of the population
based on the data collected and organized
2. Tabular presentation – data are tallied into the appropriate row and/or column categories
3. Graphical presentation – data are presented graphically, for example as a bar chart, histogram, pie chart,
or pictograph
2.1 Textual Presentation
Example:
A total of 22.4 million children aged 5-17 years old in 9.6 million households were
estimated from the 1995 National Survey of Working Children (NSWC).
Sixteen percent (16%) or 3.6 million children were reported engaged in economic activities
at any time in 1995. Boys were more likely to work than girls, with a national sex ratio of working
children of 187 (that is, 187 working boys for every 100 working girls).
2.2 Tabular Presentation
- may be in the form of a cross tabulation table, a frequency distribution table (FDT) or a
stem-and-leaf plot.
2.2.1 Cross Tabulation Table
When data are in categories, results are usually presented in a systematic manner by using a table,
which arranges the data in rows and columns.
Example:
Table 1. Numbers of Subjects Falling Into Smoking/Lung Cancer Combination
                 Lung Cancer
Smoker      Present   Absent   Total
Yes            688      650     1338
No              21       59       80
Total          709      709     1418
A table contains:
1. Heading
Heading includes a table number and a title. A table number is necessary to easily identify
the table. It should be followed by a title, which briefly describes the contents of the table.
2. Body
The body is the main part of the table. It contains the row categories (which are found on the left
side of the table) and the column categories (which are found at the top of the table). Row
totals may also be included and are located on the right side of the table. Column totals may
also be included and are located at the bottom of the table.
The figures found in the cells of the main body are usually the frequencies, representing
the number of times the two categories occur together. Percentages can be used instead of
frequencies, or both percentages and frequencies can be shown.
3. Footnote (optional)
The data used may have been taken from a publication or provided by another group of
persons. Footnotes may be added to indicate the source of information.
Contingency Table – a table listing the frequencies for the different combinations of values of two
categorical variables.
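Here is a minimal sketch that rebuilds the totals of Table 1 from its four cell counts and adds column percentages. The notes allow percentages but do not say which kind to use; column percentages are an assumed choice here. With these counts, smokers make up about 97.0% of the subjects with lung cancer and about 91.7% of those without it.

```python
# Cell counts from Table 1: rows = smoker (Yes/No), columns = lung cancer (Present/Absent)
table = {
    "Yes": {"Present": 688, "Absent": 650},
    "No": {"Present": 21, "Absent": 59},
}
columns = ["Present", "Absent"]

row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {col: sum(table[row][col] for row in table) for col in columns}
grand_total = sum(row_totals.values())

# Column percentages: within each lung-cancer group, the share in each smoker row
for row, cells in table.items():
    pcts = ", ".join(f"{col} {100 * cells[col] / col_totals[col]:.1f}%" for col in columns)
    print(f"Smoker = {row}: {pcts} (row total {row_totals[row]})")
print("Grand total:", grand_total)
```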
2.2.2 Frequency Distribution
In many instances, the information gathered is numerical in nature, such as the age of a respondent or the
exam score of a student. When faced with a large set of this kind of data, it is often
advantageous to group the data into a number of classes or intervals so as to get a better
overall picture.
Table 2.3 Scores in a Statistics Final Exam
31 28 15 10 47
18 32 29 58 48
37 49 26 54 56
21 24 28 32 28
43 12 23 29 61
16 42 40 32 26
48 36 39 22 40
20 63 54 30 17
18 30 23 26 36
47 19 25 38 35
Table 2.3 is a set of scores in a Statistics final exam. These data will be used to illustrate the
construction of a frequency table.
Frequency distribution – a grouping of all observations into intervals or classes, together with a count
of the number of observations that fall in each interval or class.
The data in Table 2.3 are called raw data, and in this form they are difficult to read and analyze. In a frequency
distribution the data are presented in a more compact and usable manner. However, this process brings
about some loss of detail.
1.1 Steps in Constructing a Frequency Distribution
1. From the data set, identify the highest value and lowest value. Compute the range R as
R = highest value – lowest value
2. Estimate the number of classes, k, as
$$k = \sqrt{n}$$
Note: The results are “rounded off” to the next higher integer, NOT the usual nearest integer.
Rounding off to the nearest integer will often yield a number of intervals that cannot
accommodate all the observations.
3. Estimate the width c of the interval by dividing the range R by the number of classes k. That is,
$$c = \frac{R}{k}$$
Round off this estimate to the same number of decimal places as the original data set, i.e. to the
precision of the raw data:
No. of decimal places of the raw data    Precision
0                                         1
1                                         0.1
2                                         0.01
3                                         0.001
4. List the lower and upper class limits of the first interval. This interval should contain the smallest
observation in the data set. The starting lower limit could be the lowest value or any convenient
number just below it.
5. List all the class limits by adding the class width to the limits of the previous interval. The highest
class should contain the largest observation in the data set.
6. Tally the frequencies for each class.
7. Compute the class marks and the class boundaries.
Class midpoint, or class mark, is the midpoint of an interval. That is,
$$CM = \frac{LL + UL}{2}$$
where CM – class mark
LL – lower limit
UL – upper limit
To find class boundaries, it is important to know the unit of accuracy of the raw data. The final
exam scores are accurate to the ones unit. The value reported as 5.8 kg. is accurate to the tenth
unit, while a GPA of 2.64 is accurate to the hundredth unit.
Lower class boundary, Li, is given as
Li = LL – 0.5 (Precision)
Upper class boundary, Ui, is given as
Ui = UL + 0.5 (Precision)
Additional columns may be added to obtain additional information about the distributional
characteristics of the data. Among these are:
a) Relative Frequency (RF) – frequency of a class expressed as a proportion or
percentage of the total number of observations. That is,
$$RF_i = \frac{f_i}{n}$$
where fi is the frequency in each interval and n is the total number of observations.
b) Cumulative Frequency (CF). This is the accumulated frequency of a class. There are
two types:
The “less than” CF (<CF) of a class is the number of observations whose values are less than or equal to
the upper limit of the class.
The “greater than” CF (>CF) of a class is the number of observations whose values are greater than or
equal to the lower limit of the class.
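The seven steps can be applied to Table 2.3 with a short Python sketch. It makes two simplifying assumptions. First, the data are whole numbers, so the precision is 1 and boundaries sit 0.5 outside the limits. Second, the first lower limit is the lowest value. The function name `frequency_table` is hypothetical. Following step 5, classes keep being added until the largest observation is covered.

```python
import math

# Table 2.3: scores in a Statistics final exam (n = 50, whole numbers)
scores = [31, 28, 15, 10, 47, 18, 32, 29, 58, 48,
          37, 49, 26, 54, 56, 21, 24, 28, 32, 28,
          43, 12, 23, 29, 61, 16, 42, 40, 32, 26,
          48, 36, 39, 22, 40, 20, 63, 54, 30, 17,
          18, 30, 23, 26, 36, 47, 19, 25, 38, 35]

def frequency_table(data):
    """Grouped frequency distribution for whole-number data (precision = 1)."""
    n = len(data)
    low, high = min(data), max(data)
    R = high - low                          # Step 1: range
    k = math.ceil(math.sqrt(n))             # Step 2: k = sqrt(n), rounded UP
    c = max(1, round(R / k))                # Step 3: width, rounded to whole units
    rows, ll, cum = [], low, 0              # Step 4: first lower limit = lowest value
    while ll <= high:                       # Step 5: keep adding c until the max is covered
        ul = ll + c - 1                     # upper limit (precision 1)
        f = sum(ll <= x <= ul for x in data)    # Step 6: tally
        cum += f
        rows.append({
            "limits": (ll, ul),
            "f": f,
            "CM": (ll + ul) / 2,                  # Step 7: class mark
            "boundaries": (ll - 0.5, ul + 0.5),   # limits -/+ 0.5 x precision
            "RF": f / n,                          # relative frequency
            "<CF": cum,                           # less-than cumulative frequency
        })
        ll += c
    for row in rows:                        # greater-than CF: count of values >= LL
        row[">CF"] = n - row["<CF"] + row["f"]
    return R, k, c, rows

R, k, c, rows = frequency_table(scores)
print(f"R = {R}, k = {k}, c = {c}")       # R = 53, k = 8, c = 7
for r in rows:
    print(r["limits"], r["f"], r["CM"], r["boundaries"],
          f'{r["RF"]:.2f}', r["<CF"], r[">CF"])
```

For these scores the classes are 10–16, 17–23, …, 59–65, with frequencies 4, 9, 12, 8, 6, 5, 4 and 2.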
2.3 Graphical Presentation
This form is the most effective means of organizing and presenting data because the important
relationships are brought out more clearly and creatively in visually solid and colorful figures.
2.3.1 Different Kinds of Graphs/Charts
1. Line Graph – it shows relationships between two sets of quantities. This is done by
plotting the points of the X set of quantities along the horizontal axis against the Y set of quantities
along the vertical axis in a Cartesian coordinate plane. The plotted points are then
connected by line segments, which together form the line graph.
2. Bar Graph – it consists of bars or rectangles of equal widths, either drawn vertically or
horizontally.
3. Circle Graph or Pie Chart – it represents relationships of the different components of a
single total as revealed in the sectors of a circle.
4. Picture Graph or Pictogram – it is a visual presentation of statistical quantities by means
of drawing pictures or symbols related to the subject under study.
2.3.2 Graphical Representation of the Frequency Distribution
1. Bar Chart and Histogram – these are among the more popular ways of representing a frequency
distribution graphically. In a bar chart, the different classes are represented by the
class limits on the horizontal axis (or by categories for nominal data). The height of each
rectangle represents the class frequency and is measured on the vertical axis. A graph that
closely resembles the bar graph is the histogram. The basic difference is that a bar chart
uses class limits for the horizontal axis while the histogram employs the class boundaries.
Using the class boundaries eliminates the spaces between the rectangles, giving the histogram a
solid appearance.
2. Frequency Polygon – is constructed by plotting the class marks against the frequencies.
The (x, y) points formed by the class marks and their corresponding frequencies are
connected by straight lines. To complete the polygon, which is by definition a closed figure, an
additional class mark with zero frequency is added at the beginning and at the end of the distribution.
3. Frequency Ogive - A cumulative frequency distribution can be represented graphically by
a frequency ogive. An ogive is obtained by plotting the upper class boundaries on the
horizontal scale and the cumulative frequency less than the upper class boundaries in the
vertical scale.
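The points to be plotted for a frequency polygon and a less-than ogive can be computed before any drawing is done. The sketch below uses the Table 2.3 scores tallied into eight classes of width 7 starting at 10, which gives the frequencies 4, 9, 12, 8, 6, 5, 4, 2. It only produces the (x, y) coordinates. Plotting them with a charting library is left out, and the empty end classes are placed one class width beyond the first and last class marks.

```python
from itertools import accumulate

# Grouped distribution of the Table 2.3 scores (lower limits 10, 17, ..., 59; width 7)
lower_limits = [10, 17, 24, 31, 38, 45, 52, 59]
freqs = [4, 9, 12, 8, 6, 5, 4, 2]
c = 7

class_marks = [ll + (c - 1) / 2 for ll in lower_limits]   # (LL + UL) / 2 with UL = LL + 6

# Frequency polygon: one extra, empty class mark at each end closes the figure
polygon = ([(class_marks[0] - c, 0)]
           + list(zip(class_marks, freqs))
           + [(class_marks[-1] + c, 0)])

# Less-than ogive: upper class boundaries against less-than cumulative frequencies
upper_boundaries = [ll + c - 0.5 for ll in lower_limits]
ogive = list(zip(upper_boundaries, accumulate(freqs)))

print(polygon)
print(ogive)
```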
3.0 NUMERICAL DESCRIPTION OF DATA
A numerical descriptive measure is a single value that summarizes a set of observations, and that value
may be used to represent the entire population.
3.1 The Summation Symbol
The Greek letter ‘Σ’ (upper-case sigma) denotes the summation symbol. It is a more compact
way of writing a sum of a set of data values. A convenient way of writing a data value in mathematical
notation is the subscripted variable $x_i$, which is read as ‘x sub i’. When a set of data values is written
in the subscripted-variable notation $x_1, x_2, x_3, \ldots, x_n$, the notation $\sum_{i=1}^{n} x_i$ is defined as
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n .$$
The symbol $\sum_{i=1}^{n} x_i$ is read as ‘the summation of x sub i from 1 to n’.
Example: Consider the set of data values 5, 4, 8 and 6, which are measurements of weights. Find the
following:
1. $\sum_{i=1}^{4} x_i$      2. $\sum_{i=1}^{4} x_i^2$      3. $\left(\sum_{i=1}^{4} x_i\right)^2$
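The three sums in the example differ only in the order of squaring and adding. A few lines of Python make that order explicit. The variable names are illustrative.

```python
x = [5, 4, 8, 6]   # the weights x_1, ..., x_4

sum_x = sum(x)                          # Σ x_i
sum_x_squared = sum(v ** 2 for v in x)  # Σ x_i^2: square each value first, then add
square_of_sum = sum(x) ** 2             # (Σ x_i)^2: add first, then square the total

print(sum_x, sum_x_squared, square_of_sum)   # 23 141 529
```

Since 141 ≠ 529, $\sum x_i^2$ and $\left(\sum x_i\right)^2$ must not be confused.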
3.2 Measures of Central Tendency
A measure of central tendency is a single value about which the observations tend to cluster.
3.2.1 ARITHMETIC MEAN
The arithmetic mean, or simply the mean, is the sum of a set of measurements divided by the
number of measurements in the set. This measure is appropriate for data on the interval or
ratio scale.
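a. Population mean:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
b. Sample mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
c. Weighted mean:
$$\bar{x}_w = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where $f_i$ is the weight attached to the value $x_i$.
d. Grand mean:
$$\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
where $n_i$ and $\bar{x}_i$ are the size and the mean of the i-th of k groups.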
Examples 3.2.1:
1. The number of hours spent by ten students in studying per day were recorded as follows: 5, 8, 2,
2, 2, 6, 5, 3, 1, and 4. Find the mean.
2. The following table shows the number of households in the five (5) Barangays in Iligan City in
2010, and corresponding percentage changes in the number of households 2010 – 2012.
Barangay      Number of Households   Percentage Change
Tibanga              11,802                 9.1
Suarez                8,624                 8.3
Hinaplanon            5,326                 4.5
Digkilaan               894                 1.4
Palao                12,012                10.6
Find the weighted mean of the percentage changes.
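Both examples can be checked with a small Python sketch. For Example 2, it assumes the weights are the numbers of households, so barangays with more households count for more. The helper names are hypothetical.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Weighted mean: sum of f_i * x_i divided by sum of f_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 1: hours spent studying per day by ten students
hours = [5, 8, 2, 2, 2, 6, 5, 3, 1, 4]
print(mean(hours))                                    # 3.8

# Example 2: percentage changes weighted by the number of households (assumed weights)
households = [11802, 8624, 5326, 894, 12012]
pct_change = [9.1, 8.3, 4.5, 1.4, 10.6]
print(round(weighted_mean(pct_change, households), 2))  # 8.58
```

The weighted mean (about 8.58%) is higher than the plain mean of the five percentages (6.78%), because the large barangays also had the largest increases.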
3.2.2 MEDIAN
The median is not affected by the presence of abnormally large or abnormally small
observations. It is the middle value of a set of observations arranged in increasing or
decreasing order of magnitude: the middle observation when the number of observations is odd,
and the average of the two middle observations when it is even. That is, it is the value such that
half of the observations fall above it and half below it.
a. Population Median:
$$\tilde{\mu} = \begin{cases} x_{\frac{N+1}{2}}, & \text{if } N \text{ is odd} \\ \frac{1}{2}\left(x_{\frac{N}{2}} + x_{\frac{N}{2}+1}\right), & \text{if } N \text{ is even} \end{cases}$$
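b. Sample Median:
$$\tilde{x} = \begin{cases} x_{\frac{n+1}{2}}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{\frac{n}{2}} + x_{\frac{n}{2}+1}\right), & \text{if } n \text{ is even} \end{cases}$$
where the subscripts refer to positions of the observations arranged in increasing order.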
3.2.3 MODE – is the value that occurs most often, i.e. the value with the greatest
frequency.
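Here is a short sketch of the median formula and the mode, applied to the study-hours data of Example 1 in 3.2.1. The `median` function follows the odd/even rule above, shifted to Python's 0-based indexing. `statistics.multimode` from the standard library (Python 3.8+) returns every most-frequent value, so data with several modes are also handled.

```python
from statistics import multimode

def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    x = sorted(values)
    n = len(x)
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]           # x_{(n+1)/2}, converted to a 0-based index
    return (x[n // 2 - 1] + x[n // 2]) / 2   # average of x_{n/2} and x_{n/2 + 1}

hours = [5, 8, 2, 2, 2, 6, 5, 3, 1, 4]
print(median(hours))      # 3.5
print(multimode(hours))   # [2]
```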
Remarks 3.2.1
1. When a data set is normally distributed, its mean, median, and mode are equal. (The converse does not
hold: equal mean, median, and mode alone do not prove that the data are normally distributed.)
2. The graph of normally distributed data is a symmetric, bell-shaped curve.
3.3 Measures of Variability or Dispersion
They are numerical values computed from the given observations that measure how far the data
spread from the central location.
3.3.1 RANGE – is the difference between the largest and the smallest values in the set.
It is denoted by R i.e., R = Highest Value – Lowest Value
3.3.2 VARIANCE – is the average of the squared differences of the scores from the mean score of a
distribution.
a. Population Variance. Given the finite population x1, x2, …, xN, the population variance is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
For ease of computation, an alternative form is suggested below:
$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$$
b. Sample Variance. Given the random sample x1, x2, …, xn, the sample variance is:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
A computationally faster form is
$$s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$$
Note that the denominator of the sample variance is n – 1. Dividing by n instead would, on average,
underestimate the population variance, i.e. it would produce a biased estimator.
3.3.3 STANDARD DEVIATION – is the positive square root of the variance.
a. population standard deviation: $\sigma = \sqrt{\sigma^2}$
b. sample standard deviation: $s = \sqrt{s^2}$
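The range, both forms of the sample variance, the population variance, and the standard deviation can be checked on the study-hours data from 3.2.1. The sketch cross-checks the hand-written formulas against `statistics.variance` (divisor n − 1) and `statistics.pvariance` (divisor N) from Python's standard library. Treating the same ten values once as a sample and once as a population is only for comparison.

```python
import math
import statistics

hours = [5, 8, 2, 2, 2, 6, 5, 3, 1, 4]     # data from Example 1 in 3.2.1
n = len(hours)
xbar = sum(hours) / n

data_range = max(hours) - min(hours)                          # R = highest - lowest
pop_var = sum((x - xbar) ** 2 for x in hours) / n             # sigma^2, divisor N
sample_var = sum((x - xbar) ** 2 for x in hours) / (n - 1)    # s^2, divisor n - 1

# Computational ("faster") form of s^2
sum_x, sum_x2 = sum(hours), sum(x * x for x in hours)
sample_var_short = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

s = math.sqrt(sample_var)                                     # sample standard deviation
print(data_range, round(pop_var, 4), round(sample_var, 4), round(s, 4))
print(statistics.variance(hours), statistics.pvariance(hours))  # library cross-check
```

Here Σx = 38 and Σx² = 188. The faster form therefore gives s² = (10·188 − 38²)/(10·9) = 436/90 ≈ 4.8444, the same value as the definitional form.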
3.3.4 COEFFICIENT OF VARIATION (denoted by CV) – is a measure of relative variation expressed
as a percentage. It is the ratio of the standard deviation to the mean, multiplied by 100%.
a. $CV = \frac{\sigma}{\mu} \times 100\%$ (population)
b. $CV = \frac{s}{\bar{x}} \times 100\%$ (sample)
Examples 3.3.4
1. The final examination given to two sections of Math 2 gave the following mean and standard
deviation:
Statistics             Section A   Section B
Mean                      30          46
Standard Deviation        10          12
Find the coefficient of variation of the two sections and determine which of the two sections
has greater variability of scores.
2. The mean height of college women is 157.48 cm. with a standard deviation of 6.35 cm., while
their mean weight is 47.70 kg. with a standard deviation of 3.64 kg. Which is more variable, the
height or the weight of the college women?
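Both CV examples reduce to one division each, sketched below with a hypothetical helper. Because the CV is unit-free, it can compare a spread measured in centimetres with one measured in kilograms.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100%."""
    return sd / mean * 100

# Example 1: two sections of Math 2
cv_a = coefficient_of_variation(10, 30)
cv_b = coefficient_of_variation(12, 46)
print(f"Section A: {cv_a:.2f}%  Section B: {cv_b:.2f}%")    # 33.33% vs 26.09%

# Example 2: heights (cm) vs weights (kg) of college women
cv_height = coefficient_of_variation(6.35, 157.48)
cv_weight = coefficient_of_variation(3.64, 47.70)
print(f"Height: {cv_height:.2f}%  Weight: {cv_weight:.2f}%")  # 4.03% vs 7.63%
```

Section A has the larger standard deviation relative to its mean, so its scores vary more. The women's weights are relatively more variable than their heights.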
3.3.5 Characteristics of the Standard Deviation
The standard deviation and variance are the most commonly used measures of dispersion in the social
sciences because:
1. Both take into account the precise difference between each score and the mean.
2. If any single score is changed, the standard deviation changes. If the score is moved away from the
mean, the standard deviation increases; if it is moved toward the mean, the standard deviation decreases.
3. If a score far from the mean is added, the standard deviation increases; if a score close to the mean
is added, the standard deviation decreases.
3.3.6 Interpreting the Standard Deviation
The standard deviation is very important regardless of the value of the mean: it makes a great deal of
difference whether the distribution is spread out over a broad range or bunched up closely
around the mean. Figure 3.1 shows a set of scores that are normally distributed.
Figure 3.1 A Normal Curve Showing the Percent of Cases Lying Within 1, 2, and 3 Standard Deviations From
the Mean
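For an exactly normal distribution, the share of cases within k standard deviations of the mean is erf(k/√2). The sketch below uses Python's `math.erf` to compute the three percentages Figure 3.1 refers to. It assumes the scores follow a true normal curve; real data will only approximate these values.

```python
import math

def normal_coverage(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{100 * normal_coverage(k):.1f}%")   # 68.3%, 95.4%, 99.7%
```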
3.3.6.1 Chebyshev’s Theorem
The proportion of the scores in a frequency distribution that lie within a given distance of the mean can
be bounded by using Chebyshev’s Theorem.
Chebyshev’s Theorem: Chebyshev’s theorem states that the proportion or
percentage of any data set that lies within k standard deviations of the mean (where k
is any number greater than 1) is at least
$$1 - \frac{1}{k^2}.$$
For any data set, at least 1 − 1/3² ≈ 88.9% of the data lie
within three standard deviations to either side of its
mean.
Example 3.3.6.1
If the mean score of the students enrolled in a
Statistics class is 66 points with a standard deviation
of 5 points, at least what percentage of the scores
must lie between 46 and 86?
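Solution:
$$\bar{x} - kS = 46 \;\Rightarrow\; 66 - 5k = 46 \;\Rightarrow\; 5k = 20 \;\Rightarrow\; k = 4$$
(equivalently, $\bar{x} + kS = 66 + 5(4) = 86$). Hence, from Chebyshev’s Theorem,
$$1 - \frac{1}{k^2} = 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 93.75\%$$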
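The example's two steps (solve for k, then evaluate 1 − 1/k²) can be wrapped in a small function. It assumes whole-number inputs so that `fractions.Fraction` keeps the answer exact, and an interval symmetric about the mean. The name `chebyshev_bound` is hypothetical.

```python
from fractions import Fraction

def chebyshev_bound(mean, sd, lower, upper):
    """Minimum proportion of any data set inside [lower, upper] (whole-number inputs).

    The interval must be mean - k*sd to mean + k*sd with k > 1.
    """
    k = Fraction(upper - mean, sd)
    if Fraction(mean - lower, sd) != k or k <= 1:
        raise ValueError("interval must be mean +/- k*sd with k > 1")
    return 1 - 1 / k ** 2

bound = chebyshev_bound(66, 5, 46, 86)
print(bound, f"{float(bound):.2%}")   # 15/16 93.75%
```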
3.4 Other Measures of Location (Quantiles or Fractiles)
The measures of central tendency refer only to the center of the entire set of data, but there are
other measures of location that describe or locate non-central positions in the data set. These
measures are referred to as quantiles or fractiles. In this section, we will consider the fractiles, which can
be percentiles, deciles, or quartiles.
3.4.1 Percentiles – are values that divide an ordered set of observations into 100 equal parts. These
values, denoted by P1, P2, … , P99, are such that 1 % of the data falls below P1, 2% falls below
P2,…, and 99 % falls below P99.
3.4.2 Deciles – are values that divide an ordered set of observations into 10 equal parts. These values
denoted by D1, D2, …, D9, are such that 10 % of the data falls below D1, 20 % falls below D2, …,
and 90 % falls below D9.
3.4.3 Quartiles – are values that divide an ordered set of observations into 4 equal parts. These
values, denoted by Q1, Q2, and Q3, are such that 25 % of the data falls below Q1, 50 % falls below
Q2, and 75 % falls below Q3.
Procedure for the computation of the fractiles:
1. Arrange the data in an increasing order of magnitude.
2. Solve for the value of L, where
$$L = \begin{cases} \dfrac{mn}{100}, & \text{for percentiles} \\[4pt] \dfrac{mn}{10}, & \text{for deciles} \\[4pt] \dfrac{mn}{4}, & \text{for quartiles} \end{cases}$$
where: m is the location of the percentile, decile, or quartile
n is the number of observations.
3. If L is an integer, the desired fractile is the average of the Lth and the (L + 1)th observations. If L is
fractional, round it up to the next higher integer to find the required location; the fractile is the
value in that location.
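The three-step procedure translates directly into Python. In the sketch below, `fractions.Fraction` is used so that the test "L is an integer" is exact rather than subject to floating-point error. The function name `fractile` is hypothetical. It is applied to the Table 2.3 exam scores.

```python
import math
from fractions import Fraction

def fractile(data, m, parts):
    """m-th fractile; parts = 100 (percentile), 10 (decile) or 4 (quartile)."""
    x = sorted(data)                       # Step 1: increasing order
    n = len(x)
    L = Fraction(m * n, parts)             # Step 2: L = mn/100, mn/10 or mn/4
    if L.denominator == 1:                 # Step 3: L is an integer
        L = int(L)
        return (x[L - 1] + x[L]) / 2       # average of the L-th and (L+1)-th values
    return x[math.ceil(L) - 1]             # L fractional: round up, take that value

# Table 2.3: scores in a Statistics final exam (n = 50)
scores = [31, 28, 15, 10, 47, 18, 32, 29, 58, 48,
          37, 49, 26, 54, 56, 21, 24, 28, 32, 28,
          43, 12, 23, 29, 61, 16, 42, 40, 32, 26,
          48, 36, 39, 22, 40, 20, 63, 54, 30, 17,
          18, 30, 23, 26, 36, 47, 19, 25, 38, 35]

for label, m, parts in [("Q1", 1, 4), ("Q2", 2, 4), ("Q3", 3, 4), ("D3", 3, 10), ("P90", 90, 100)]:
    print(label, fractile(scores, m, parts))
```

For Q1, L = 50/4 = 12.5, which is rounded up to the 13th ordered score, 23. For Q2, L = 25 is an integer, so Q2 is the average of the 25th and 26th scores, (30 + 31)/2 = 30.5.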
Remark 3.4:
1. The interquartile range is the distance on a scale between Q1 and Q3, i.e. Q3 − Q1.
2. The quartile deviation, also called the semi-interquartile range, is half of the interquartile range: (Q3 − Q1)/2.
3.5 Skewness and Kurtosis
Skewness is the degree of departure from symmetry of a distribution. Kurtosis is the
degree of peakedness of a distribution.
3.5.1 Symmetric Distributions (those where one side is the mirror image of the other) have a mean and a
median with the same value; the normal curve is the best-known example. If the distribution is symmetric
and unimodal, the mode also has the same value as the mean and median (see Graph 1 in Figure 4.1).
3.5.2 Skewed Distributions – have different values for the mean, median, and mode. For
unimodal skewed distributions, the mean is pulled toward the tail, and the median usually lies
between the mean and the mode.
Figure 4.1 Graphs of Different Type of Distribution
Remarks 3.4
1. A positively skewed distribution has a “tail” which
is pulled in the positive direction (see Graph 3 in
Figure 4.1).
2. A negatively skewed distribution has a “tail” which
is pulled in the negative direction (see Graph 2 in
Figure 4.1).
3. A symmetric distribution has zero skewness.
4. A normal distribution is a mesokurtic distribution.
5. A pure leptokurtic distribution has a higher peak
than the normal distribution and has heavier
tails.
6. A pure platykurtic distribution has a lower peak than a normal distribution and lighter tails.
3.5.3 Application of Measuring Skewness and Kurtosis
One application is testing for normality: many statistical inference procedures require that a distribution
be normal or nearly normal. A normal distribution has skewness and excess kurtosis of 0, so if your
distribution is close to those values, then it is probably close to normal.
3.5.4 Calculating Skewness
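The moment coefficient of skewness of a data set is
$$g_1 = \frac{m_3}{m_2^{3/2}}$$
where
$$m_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n}, \qquad m_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
x̄ is the mean and n is the sample size, as usual.
m3 is called the third moment of the data set.
m2 is the variance, computed with denominator n (the second moment).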
Note: Remember that you have to choose one of two different measures of standard deviation,
depending on whether you have data for the whole population or just a sample. The same is true of
skewness. If you have the whole population, then g1 above is the measure of skewness. But if you have
just a sample, you need the sample skewness:
$$G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1$$
3.5.5 Interpreting Skewness
1. If skewness is positive, the data are positively skewed or skewed right, meaning that the right
tail of the distribution is longer than the left.
2. If skewness is negative, the data are negatively skewed or skewed left, meaning that the left tail
is longer.
3. If skewness = 0, the data are perfectly symmetrical.
4. But a skewness of exactly zero is quite unlikely for real-world data, so how can you interpret the
skewness number? Bulmer (M. G. Bulmer, Principles of Statistics, Dover, 1979) suggests this rule of thumb:
a. If skewness is less than −1 or greater than +1, the distribution is highly skewed.
b. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately
skewed.
c. If skewness is between −½ and +½, the distribution is approximately symmetric.
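The moment formulas for g1 and G1, together with Bulmer's rule of thumb, can be sketched in Python. The small data set [1, 2, 3, 10] is hypothetical and was chosen to have a long right tail. How values that fall exactly on ±½ or ±1 are labelled is an assumption, since the rule of thumb does not say.

```python
import math

def skewness(data):
    """Return (g1, G1): the moment coefficient and the sample skewness (needs n >= 3)."""
    n = len(data)
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n    # second moment (divisor n)
    m3 = sum((x - xbar) ** 3 for x in data) / n    # third moment
    g1 = m3 / m2 ** 1.5
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1     # sample adjustment
    return g1, G1

def bulmer_label(skew):
    """Bulmer's rule of thumb (boundary handling is an assumption)."""
    a = abs(skew)
    if a > 1:
        return "highly skewed"
    if a >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

g1, G1 = skewness([1, 2, 3, 10])     # hypothetical right-tailed data
print(round(g1, 3), round(G1, 3), bulmer_label(G1))
```

For this data set, m2 = 12.5 and m3 = 45, so g1 = 45/12.5^1.5 ≈ 1.018. The positive sign matches the long right tail.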
Inferring
Your data set is just one sample drawn from a population. Maybe, from ordinary sample variability, your
sample is skewed even though the population is symmetric. But if the sample is skewed too much for
random chance to be the explanation, then you can conclude that there is skewness in the population.
To answer that, you need to divide the sample skewness G1 by the standard error of skewness (SES) to
get the test statistic, which measures how many standard errors separate the sample skewness from
zero:
test statistic:
$$Z_{g_1} = \frac{G_1}{SES}, \qquad SES = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$$
The critical value of Zg1 is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly the 0.05
significance level.)
 If Zg1 < −2, the population is very likely skewed negatively (though you don’t know by how much).
 If Zg1 is between −2 and +2, you can’t reach any conclusion about the skewness of the
population: it might be symmetric, or it might be skewed in either direction.
 If Zg1 > 2, the population is very likely skewed positively (though you don’t know by how much).
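The test can be sketched in Python as shown below. It uses the SES formula and the approximate critical value of 2 given above. The function name and the example inputs (G1 = 1.2, n = 30) are hypothetical.

```python
import math


def skewness_z_test(G1, n, critical=2.0):
    """Return (SES, Z, verdict) for a sample skewness G1 from n observations."""
    if n < 3:
        raise ValueError("SES needs at least 3 observations")
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    z = G1 / ses  # how many standard errors G1 lies from zero
    if z < -critical:
        verdict = "population very likely skewed negatively"
    elif z > critical:
        verdict = "population very likely skewed positively"
    else:
        verdict = "no conclusion about the population's skewness"
    return ses, z, verdict


ses, z, verdict = skewness_z_test(G1=1.2, n=30)
print(round(ses, 4), round(z, 2), verdict)
```

With these inputs, SES is about 0.43 and Z is about 2.8. This exceeds 2, so the sketch reports positive skew in the population. The verdict only indicates the direction of skewness, not its size.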
CASE STUDIES:
Case Study 1
1. A study was conducted to see how well reading success in first grade could be predicted from
various kinds of information obtained in kindergarten: age, sex, tribe, academic rank, and IQ.
Which of the variables represents a
a. nominal scale
b. ordinal scale
c. interval scale
d. ratio scale
2. Are the following variables discrete or continuous?
a. The number of correct answers on a true-false test.
b. The duration of the effectiveness of a pain medication.
c. The number of commercials aired daily by a television station.
d. The weights of Sunday newspapers.
e. The heights of basketball players.
3. Among the 250 employees of the local office of an international insurance company, 182 are whites, 51 are blacks, and 17 are Orientals. If we use stratified random sampling to select a committee of 15 employees, how many employees must we take from each group?
4. Suppose you were asked to conduct a study on the brand preferences and satisfaction of customers of well-known laundry soaps in four (4) different supermarkets.
a. Arrange the letters of the following steps of a statistical inquiry in logical order.
A. Collecting relevant information
B. Defining a problem
C. Interpreting the data
D. Analyzing the data
E. Organizing and presenting data
b. Who will be the most appropriate respondents of the study?
c. How will you apply multi-stage sampling to the population of the study?
e. Calculate the sample size if the population size is 2000 and the margin of error is 5%.
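The arithmetic in Case Study 1 can be checked with a short Python sketch. The stratified-sampling item calls for proportional allocation, meaning each group's share of the committee matches its share of the 250 employees. Rounding with the largest-remainder method, so that the parts add up to 15, is an assumption. The sample-size item does not name a formula. This sketch assumes Slovin's formula, n = N / (1 + N e²), rounded up, which is often taught in introductory courses when only N and e are given. Both function names are hypothetical.

```python
import math


def proportional_allocation(strata, n):
    """Split a sample of size n across strata in proportion to their sizes.

    Rounding uses the largest-remainder method so the parts add up to n.
    """
    total = sum(strata.values())
    exact = {name: n * size / total for name, size in strata.items()}
    alloc = {name: math.floor(share) for name, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def slovin_sample_size(population, margin_of_error):
    """Slovin's formula n = N / (1 + N e^2), rounded up (an assumed choice)."""
    return math.ceil(population / (1 + population * margin_of_error ** 2))


print(proportional_allocation({"whites": 182, "blacks": 51, "Orientals": 17}, 15))
# {'whites': 11, 'blacks': 3, 'Orientals': 1}
print(slovin_sample_size(2000, 0.05))  # 334
```

The exact shares are 10.92, 3.06, and 1.02. Rounding each share independently happens to give the same result here, but that approach does not always add up to n.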
Case Study 2
1. Create a textual presentation based on the table shown below. Suppose there are 800 million users per day.
2. Create tabular and (any) graphical presentations of the following textual presentation.
“The top three regions in terms of population count are Region IV-Southern
Tagalog (11.32 million or 15.04% of the total), NCR (10.49 million or 13.93%), Region III
– Central Luzon (7.80 million or 10.35%). The population residing in these regions
combined comprises 39.32% of the total Filipino population. This means that four out of
ten persons in the country reside in NCR and the adjoining regions of Central Luzon and
Southern Tagalog.”
3. Using the table below
Table 2.5 Number of Passengers for P&P Airlines
68 72 50 70 65 83 77 78 80 93
71 74 60 84 72 84 73 81 84 92
77 57 70 59 85 74 78 79 91 102
83 67 66 75 79 82 93 90 101 80
79 69 76 94 71 97 92 83 86 69
a. Construct a frequency distribution table (with the class interval, frequency, class
boundaries, class marks and cumulative frequency) for the given data.
b. Construct its bar graph, histogram, frequency polygon, and frequency ogive.
c. Determine whether the given data set is normally distributed.
4. Given the frequency polygon below:
[Figure: frequency polygon (vertical axis: FREQUENCY; horizontal axis: CLASS MARKS)]
a. Reconstruct the frequency distribution table.
b. Construct the frequency histogram.
c. Answer the following:
i. What is the lower class limit of the lowest class?
ii. What is the lower class boundary of the highest class?
iii. What is the class width?
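For the Table 2.5 exercise, the frequency distribution can be sketched in Python as shown below. The text does not say how many classes to use. This sketch assumes Sturges' rule, k ≈ 1 + 3.322 log10 n (rounded), and a whole-number class width rounded up. Because passenger counts are whole numbers, the class boundaries are taken as the class limits ±0.5. The function name is hypothetical, and the data list is transcribed from Table 2.5.

```python
import math

# Table 2.5: number of passengers for P&P Airlines (50 values)
passengers = [
    68, 72, 50, 70, 65, 83, 77, 78, 80, 93,
    71, 74, 60, 84, 72, 84, 73, 81, 84, 92,
    77, 57, 70, 59, 85, 74, 78, 79, 91, 102,
    83, 67, 66, 75, 79, 82, 93, 90, 101, 80,
    79, 69, 76, 94, 71, 97, 92, 83, 86, 69,
]


def frequency_table(data):
    """Build grouped-frequency rows for whole-number data."""
    n = len(data)
    low, high = min(data), max(data)
    k = round(1 + 3.322 * math.log10(n))  # Sturges' rule (assumed)
    width = math.ceil((high - low) / k)   # class width, rounded up
    rows, cumulative, lower = [], 0, low
    while lower <= high:
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "class_mark": (lower + upper) / 2,
            "frequency": freq,
            "cumulative": cumulative,
        })
        lower += width
    return rows


for row in frequency_table(passengers):
    lo, hi = row["limits"]
    lb, ub = row["boundaries"]
    print(f"{lo}-{hi}  {lb}-{ub}  {row['class_mark']}  "
          f"{row['frequency']}  {row['cumulative']}")
```

With these choices, there are seven classes of width 8, from 50-57 to 98-105. Their frequencies are 2, 3, 12, 14, 9, 8, and 2. A different number of classes or class width would give a different but equally valid table.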
Case Study 3
1. A random sample of 10 students was given a special test. The times (in minutes) that the students took to finish the exam are as follows:
15 30 26 40 35 19 22 28 17 38
Find the following:
a) Mean
b) Median
c) Variance
d) Standard Deviation
e) Range
f) Mode
g) Coefficient of Variation
h) 18th Percentile
i) 7th Decile
j) 3rd Quartile
2. Suppose that you are investigating the influence of an interactive approach on students’ mathematics performance. Consider the following samples of final examination scores from three (3) sections of Math 1 during the first semester of SY 2011 – 2012.
Sections Sample Scores
Rizal 19 8 7 2 19 29 36 20 3 14
Bonifacio 14 25 12 32 13 17 10 22 13 32
Luna 24 13 20 1 8 28 16 21 23 26
a. Describe the performance of each section using its mean and standard deviation.
b. Which of these three sections showed the greatest improvement in students’ mathematics performance? Explain why.
3. The table below shows the distribution of your respondents’ answers to the emotional intelligence inventory.
Emotional Intelligence Inventory
Indicators | Almost Never (1) | Seldom (2) | Sometimes (3) | Usually (4) | Almost Always (5)
1. I appropriately communicate decisions to stakeholders. | 11 | 9 | 15 | 5 | 9
2. I fail to recognize how my feelings drive my behavior at work. | 18 | 2 | 10 | 12 | 8
3. When upset at work, I still think clearly. | 5 | 6 | 15 | 14 | 8
4. I fail to handle stressful situations at work effectively. | 10 | 12 | 8 | 14 | 6
5. I understand the things that make people feel optimistic at work. | 18 | 2 | 13 | 7 | 10
6. I fail to keep calm in difficult situations at work. | 21 | 12 | 8 | 9 | 0
7. I am effective in helping others feel positive at work. | 1 | 4 | 16 | 19 | 10
8. I find it difficult to identify the things that motivate people at work. | 15 | 12 | 5 | 8 | 5
a. Find the weighted mean of each statement.
b. Set up a Likert scale with 5 intervals to interpret the results by assigning a descriptive equivalent such as “very low”, “low”, “average”, “high”, or “very high”.
c. Find the standard deviation of each item.
d. Find the grand mean.
e. Interpret the results.
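The computations requested in Case Study 3 can be sketched with Python's standard `statistics` module. Several conventions in this sketch are assumptions, not taken from the text:
- sample (n − 1) variance and standard deviation;
- percentiles, deciles, and quartiles located at position (n + 1)p with linear interpolation (`method="exclusive"`);
- a mode reported only when some value repeats;
- Likert bands of equal width (5 − 1)/5 = 0.8, each including its lower edge;
- the grand mean taken as the average of the eight item means;
- no reverse scoring of the negatively worded statements (2, 4, 6, 8).

The helper names are hypothetical. The data are the ten times listed for item 1, the section scores, and the inventory counts.

```python
import math
import statistics
from collections import Counter

# Item 1: exam completion times (minutes) for 10 sampled students
times = [15, 30, 26, 40, 35, 19, 22, 28, 17, 38]


def describe(data):
    """Sample summary statistics; variance and SD use divisor n - 1."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    counts = Counter(data)
    top = max(counts.values())
    # A value counts as a mode only if it occurs more than once.
    modes = sorted(x for x, c in counts.items() if c == top) if top > 1 else []
    # Positions follow the (n + 1)p convention with linear interpolation.
    pct = statistics.quantiles(data, n=100, method="exclusive")
    dec = statistics.quantiles(data, n=10, method="exclusive")
    qrt = statistics.quantiles(data, n=4, method="exclusive")
    return {
        "mean": mean, "median": statistics.median(data),
        "variance": statistics.variance(data), "sd": sd,
        "range": max(data) - min(data), "modes": modes,
        "cv_percent": sd / mean * 100,
        "P18": pct[17], "D7": dec[6], "Q3": qrt[2],
    }


# Item 2: final examination scores of three sections
sections = {
    "Rizal": [19, 8, 7, 2, 19, 29, 36, 20, 3, 14],
    "Bonifacio": [14, 25, 12, 32, 13, 17, 10, 22, 13, 32],
    "Luna": [24, 13, 20, 1, 8, 28, 16, 21, 23, 26],
}

# Item 3: response counts for scale values 1 (Almost Never) to 5 (Almost Always)
inventory = [
    [11, 9, 15, 5, 9], [18, 2, 10, 12, 8], [5, 6, 15, 14, 8],
    [10, 12, 8, 14, 6], [18, 2, 13, 7, 10], [21, 12, 8, 9, 0],
    [1, 4, 16, 19, 10], [15, 12, 5, 8, 5],
]
SCALE = (1, 2, 3, 4, 5)
LABELS = ["very low", "low", "average", "high", "very high"]


def weighted_mean(freqs):
    return sum(w * f for w, f in zip(SCALE, freqs)) / sum(freqs)


def weighted_sd(freqs):
    """Sample SD of the individual responses (divisor total - 1)."""
    m = weighted_mean(freqs)
    ss = sum(f * (w - m) ** 2 for w, f in zip(SCALE, freqs))
    return math.sqrt(ss / (sum(freqs) - 1))


def interpret(mean):
    """Map a mean to one of five equal-width bands (width 0.8)."""
    return LABELS[min(int((mean - 1) / 0.8), 4)]


print(describe(times))
for name, scores in sections.items():
    print(name, statistics.mean(scores), round(statistics.stdev(scores), 2))
item_means = [weighted_mean(row) for row in inventory]
for number, (row, m) in enumerate(zip(inventory, item_means), start=1):
    print(number, round(m, 2), round(weighted_sd(row), 2), interpret(m))
grand_mean = statistics.mean(item_means)  # average of item means (assumed)
print("grand mean", round(grand_mean, 2), interpret(grand_mean))
```

Textbooks differ in how they locate percentiles. For example, some place them at position np/100. Hand-computed values of the 18th percentile, 7th decile, and 3rd quartile may therefore differ slightly from those printed here.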

Review of descriptive statistics

  • 1. D E S C R I P T I V E S T A T I S T I C S P a g e | 1 1.0 INTRODUCTION In everyday life, whether at home or at work, we usually keep records or read reports. An item in the record or report is a fact that is expressed in terms of a numerical value or described by its quality or kind. That single item or fact is referred to as a datum. All these facts in a record or report are called data. Examples of data:  Color of the hair  Number of students in a class  Height and weight  Number of times you were absent from class 1.1Population and Sample In data-gathering phase, the information is taken from a unit, which is a part of a collection of all such units called a population. A population is consists of an entire set of objects, observations, or scores that have something in common. Some Definitions: Population – collection of all units from which the data is to be collected. Element – unit in a population Sample – subset or a representative part of the population. Frame – listing of all the elements of the population Census – complete enumeration in which every member of the population is included Sampling – or sample survey; only a part or a portion of the population is used to obtain data 1.2Definition of Statistics The word “statistics” is used in several different senses. In the broadest sense, “statistics” is branch of science that deals with the development of methods for a more effective way of collecting, organizing, presenting, and analyzing data. Data and how to deal with it is the main concern of statistics. In a second usage, a “statistic” is defined as numerical quantity (such as mean) calculated in a sample. The grade point average (GPA) is an example of a statistic. It is a value computed from a set of grades of a student in a particular semester. Illustration: If the data is the set of grades, then GPA is the statistic. Another numerical value that can be computed from the set of grades is the percentage passing. 
The percentage passing is also a statistic. From the same set of grades, the number of subjects that received a failing grade is another statistic. Taken together, the GPA, the percentage passing, and the number that received a failing grade are called statistics. Major Areas of Statistics 1. Descriptive Statistics – deals largely with summary calculations, graphical displays, and describing important features of a set of data. It does not attempt to draw conclusions about anything that pertains to more than the data themselves. 2. Inferential Statistics – concerned with making generalizations from information gathered from a small group of observations (sample) to a bigger group of observations (population). Two Main Methods: 1. Estimation - the sample statistic is used to estimate a population parameter - a confidence interval about the estimate is constructed. 2. Hypothesis Testing - a null hypothesis is put forward. - Analysis of the data is then used to determine whether to reject it. 1.3Variables A variable is any measured characteristic or an attribute that differs for different subjects. Those variables having cause-and-effect relationships are called independent variables and dependent variables.
  • 2. D E S C R I P T I V E S T A T I S T I C S P a g e | 2 Types of Variables: 1. Qualitative Variables – sometimes called “categorical variables” - facts for which no numerical measure exists - expressed in categories or kind Examples:  color of the skin which can be black, brown, or white  person’s sex which can be male or female 2. Quantitative Variables – variables that can be expressed in numbers. - can be measured and counted. Examples:  person’s height and weight – can be measured  number of students in a class – can be counted Classification of Quantitative Variables 1. Continuous variable A continuous variable is one for which within the limits the variable ranges, any value is possible. Examples:  Time to solve a math problem is continuous since it could take 2 minutes, 2.13 minutes, etc. to finish a problem  Height is continuous since it could take 1.55 meters, 1.65 meters, etc. 2. Discrete variable A discrete variable is one that cannot take on all values within the limits of the variable Examples:  Responses to a five-point rating scale is discrete since it can only take 1, 2, 3, 4 and 5.  Number of provinces 1.4Types of Measurements 1. Nominal measurement is consists of assigning items to groups or categories. No quantitative information is conveyed and no ordering of the items is implied. Nominal measurements are therefore qualitative rather than quantitative. Nominal measurement is the lowest form of measurement. Examples:  Color  Sex  Blood type  Religion 2. Ordinal Measurement Measurement in ordinal scales are ordered in the sense that higher number represent higher values. However, the intervals between the numbers are not necessarily equal. For example, on a five-point rating scale measuring attitudes towards gun control, the difference between a rating of 2 and a rating of 3 may not represent the same difference as the difference between a rating of 4 and a rating of 5. 
There is no “true” zero point for ordinal scales since the zero point is chosen arbitrarily. The lowest point on the rating scale in the example was chosen to be 1. It could just as well have been 0 or 5. Examples:  Taste preferences  Satisfactions  Social classes  Academic honors 3. Interval Measurement On interval measurement scales, one unit on the scale represents the same magnitude on the trait or characteristic being measured across the whole range of scale. Interval scales do not have a “true” zero point, however, and therefore it is not possible to make statements about how many times higher one score is than the another. A good example of interval scale is the Fahrenheit scale for temperature. 4. Ratio Measurement Ratio measurements are like interval measurement except they have true zero point. It is the highest form of measurement.
Examples:
• Length
• Weight
Note: A large number of statistical analysis tools are available for each type of measurement. It is important that the user of statistics has a good understanding of the type of data to be processed, so that the chosen statistical tool is used properly.
1.5 Random and Non-Random Sampling
• Random sampling is the most commonly used sampling technique, in which each member of the population is given an equal chance of being selected in the sample.
• Non-random sampling is a method of selecting a small portion of the population in which not all members of the population are given a chance to be included in the sample.
Properties of Random Sampling
1. Equiprobability – each member of the population has an equal chance of being selected and included in the sample.
2. Independence – the chance of one member being drawn does not affect the chance of any other member being drawn.
1.6 Probability Sampling Techniques
1. Simple Random Sampling (SRS) – a process for selecting a sample wherein every element in the sampled population is given an equal chance of being included in the sample.
2. Systematic Random Sampling – sampling wherein every kth unit is included in the sample after a random start.
3. Stratified Proportional Random Sampling – the population is divided into homogeneous groups, or strata, and selection is done within each stratum, with each stratum contributing to the sample in proportion to its size.
4. Multi-stage Sampling – this technique uses several stages or phases in getting a sample from the population. This method is an extension, or multiple application, of the stratified random sampling technique.
1.7 Non-random Sampling Techniques
1. Judgment or Purposive Sampling – a form of non-probability sampling in which the researcher's judgment plays a major role in the selection of particular items and in making decisions in cases of incomplete responses or observations.
2. Quota Sampling – a relatively quick and inexpensive method to operate, since the number of subjects to be included in the sample is chosen at the researcher’s own convenience or preference and is not predetermined by a carefully operated randomizing plan.
3. Cluster Sampling – the population is divided into a number of relatively small subdivisions, which are themselves clusters of still smaller units, and then some of these subdivisions, or clusters, are randomly selected for inclusion in the overall sample. (Because the clusters are chosen at random, cluster sampling is usually classed as a probability method.)
4. Incidental Sampling – this design is applied to samples that are taken because they are the most readily available.
5. Convenience Sampling – this method has been widely used in television and radio programs to find out the opinions of viewers and listeners regarding controversial issues.
1.8 Methods of Collecting Data
There are many ways of collecting data, each with its own advantages and disadvantages. The more general methods of collecting information are:
1. Direct or Interview Method
A very common and effective method of obtaining information is by conducting interviews. People usually respond when visited in person.
Disadvantages: People may lie; interviews are quite costly; and interviewers need thorough training (untrained interviewers tend to influence the respondents’ answers).
2. Indirect or Questionnaire Method
Questionnaires can either be mailed or handed personally to respondents.
Advantages: It does not require interviews and is therefore less costly. It can also cover a wider area than interviews.
Disadvantages: The response rate is usually lower than for interviews, since many people ignore mailed questionnaires. To encourage participation, a questionnaire should be kept as short as possible and contain only questions related to the objectives of the survey.
3. Direct Observation
In situations where less personal responses are needed, data may be collected by direct observation.
Disadvantage: The person assigned to observe may commit observational errors.
4. Experimentation – used when the objective is to determine the cause and effect of a certain phenomenon under controlled conditions.
5. Utilizing Existing Records
A very convenient way of obtaining data is by utilizing existing records. A number of institutions gather data not only for their own purposes but also for the purposes of other groups of people.
Advantage: It is very economical and requires less cooperation from people.
Disadvantage: The information needed may not be found in these sources.
Data are sometimes obtained from published or unpublished documents and can be classified as follows:
• Primary sources – provide first-hand data: data as originally gathered, not yet subjected to transcription or condensation. Their authenticity is guaranteed by the group that originally gathered them.
• Secondary sources – provide data that have been transcribed or compiled from original sources.
2.0 ORGANIZATION AND PRESENTATION OF DATA
After data have been gathered and checked for possible errors, the next logical step is to present the data in a manner that is easy to understand. The presentation should also convey the relevant information and the important results at a glance.
Ways/Methods of presenting data:
1. Textual presentation – a narrative description of the characteristics of the population based on the data collected and organized
2. Tabular presentation – data are tallied into the appropriate row and/or column categories
3. Graphical presentation – data are presented graphically, for example as a bar chart, histogram, pie chart, or pictograph
2.1 Textual Presentation
Example: A total of 22.4 million children aged 5–17 years old in 9.6 million households were estimated from the 1995 National Survey of Working Children (NSWC). Sixteen percent (16%), or 3.6 million children, were reported engaged in economic activities at any time in 1995. Boys were more likely to work than girls, with a national sex ratio of working children of 187 (that is, 187 working boys for every 100 working girls).
2.2 Tabular Presentation – may be in the form of a cross tabulation table, a frequency distribution table (FDT), or a stem-and-leaf plot.
2.2.1 Cross Tabulation Table
When data are in categories, results are usually presented systematically in a table, which arranges the data in rows and columns.
Example:
Table 1. Numbers of Subjects Falling Into Smoking/Lung Cancer Combination
Smoker | Lung Cancer: Present | Lung Cancer: Absent | Total
Yes | 688 | 650 | 1338
No | 21 | 59 | 80
Total | 709 | 709 | 1418
A table contains:
1. Heading
The heading includes a table number and a title. A table number is necessary to identify the table easily. It should be followed by a title, which briefly describes the contents of the table.
2. Body
The body is the main part of the table. It contains the row categories (found on the left side of the table) and the column categories (found at the top of the table). Row totals may also be included; they are located on the right side of the table. Column totals may also be included; they are located at the bottom of the table. The figures in the cells of the body are usually frequencies, representing the number of times the two categories occur together. Percentages can be used instead of frequencies, or both percentages and frequencies can be shown.
3. Footnote (optional)
The data used may have been taken from a publication or provided by another group of people. Footnotes may be added to indicate the source of the information.
Contingency Table – a table listing the frequencies for the different combinations of values of two categorical variables.
2.2.2 Frequency Distribution
In many instances, the information gathered is numerical, such as the age of a respondent or the exam score of a student. When faced with a large set of this kind of data, it is often advantageous to group the data into a number of classes or intervals so as to get a better overall picture.
Table 2.3 Scores in a Statistics Final Exam
31 28 15 10 47 18 32 29 58 48
37 49 26 54 56 21 24 28 32 28
43 12 23 29 61 16 42 40 32 26
48 36 39 22 40 20 63 54 30 17
18 30 23 26 36 47 19 25 38 35
Table 2.3 is a set of scores in a Statistics exam. These data will be used to illustrate the construction of a frequency table.
Frequency distribution – a grouping of all observations into intervals or classes, together with a count of the number of observations that fall in each interval or class.
The data in Table 2.3 are called raw data; in that form they are difficult to read and analyze. A frequency distribution presents the data in a more compact and usable manner, although the grouping brings about some loss of detail.
1.1 Steps in Constructing a Frequency Distribution
1. From the data set, identify the highest and the lowest values. Compute the range R as
R = highest value – lowest value
2. Estimate the number of classes, k, as
$k = \sqrt{n}$
where n is the number of observations.
Note: The result is rounded up to the next higher integer, NOT to the nearest integer as usual. Rounding to the nearest integer will often yield a number of intervals that cannot accommodate all the observations.
3. Estimate the width c of the interval by dividing the range R by the number of classes k; that is,
$c = \frac{R}{k}$
Round this estimate to the same precision as the original data set:
No. of decimal places of the raw data | Precision
0 | 1
1 | 0.1
2 | 0.01
3 | 0.001
4. List the lower and upper class limits of the first interval. This interval should contain the smallest observation in the data set; the starting lower limit can be the lowest value or any number close to it.
5. List all the class limits by adding the class width to the limits of the previous interval. The highest class should contain the largest observation in the data set.
6. Tally the frequencies for each class.
7. Compute the class marks and the class boundaries.
The class midpoint, or class mark, is the midpoint of an interval; that is,
$CM = \frac{LL + UL}{2}$
where CM is the class mark, LL the lower limit, and UL the upper limit.
To find class boundaries, it is important to know the unit of accuracy of the raw data. The final exam scores are accurate to the ones unit, a value reported as 5.8 kg is accurate to the tenths unit, and a GPA of 2.64 is accurate to the hundredths unit.
Lower class boundary: $L_i = LL - 0.5\,(\text{Precision})$
Upper class boundary: $U_i = UL + 0.5\,(\text{Precision})$
Additional columns may be added to obtain more information about the distributional characteristics of the data. Among these are:
a) Relative Frequency (RF) – the frequency of a class expressed as a proportion or percentage of the total number of observations; that is,
$RF = \frac{f_i}{n}$
where $f_i$ is the frequency of each interval.
b) Cumulative Frequency (CF) – the accumulated frequency of a class.
There are two types:
• The “less than” CF (<CF) of a class is the number of observations whose values are less than or equal to the upper limit of the class.
• The “greater than” CF (>CF) of a class is the number of observations whose values are greater than or equal to the lower limit of the class.
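The seven steps can be sketched in Python for whole-number data such as Table 2.3. This is a minimal sketch under stated assumptions: the data are integers (precision 1), the first lower limit is the lowest value, the width R/k is rounded half-up to a whole number, and classes keep being added until the largest value is covered (so the sketch may produce more than k classes if rounding shrinks the width). The function name `frequency_table` is hypothetical.

```python
import math

# Table 2.3: scores in a Statistics final exam (n = 50)
SCORES = [
    31, 28, 15, 10, 47, 18, 32, 29, 58, 48,
    37, 49, 26, 54, 56, 21, 24, 28, 32, 28,
    43, 12, 23, 29, 61, 16, 42, 40, 32, 26,
    48, 36, 39, 22, 40, 20, 63, 54, 30, 17,
    18, 30, 23, 26, 36, 47, 19, 25, 38, 35,
]


def frequency_table(data):
    """Frequency distribution for whole-number data (precision = 1)."""
    n = len(data)
    lo, hi = min(data), max(data)
    r = hi - lo                              # Step 1: range R
    k = math.ceil(math.sqrt(n))              # Step 2: k = sqrt(n), rounded UP
    c = max(1, math.floor(r / k + 0.5))      # Step 3: c = R/k, rounded half-up to a whole unit
    rows = []
    ll = lo                                  # Step 4: first lower limit = lowest value
    while ll <= hi:                          # Step 5: add classes until the largest value is covered
        ul = ll + c - 1                      # upper limit (precision 1)
        f = sum(ll <= x <= ul for x in data) # Step 6: tally
        rows.append({
            "limits": (ll, ul),
            "f": f,
            "boundaries": (ll - 0.5, ul + 0.5),  # Step 7: limits -/+ 0.5 x precision
            "mark": (ll + ul) / 2,               # Step 7: class mark
            "rf": f / n,                         # relative frequency
        })
        ll += c
    running = 0
    for row in rows:
        running += row["f"]
        row["<cf"] = running                     # "less than" CF
        row[">cf"] = n - running + row["f"]      # "greater than" CF
    return rows


for row in frequency_table(SCORES):
    print(row["limits"], row["f"], row["boundaries"], row["mark"],
          f"{row['rf']:.2f}", row["<cf"], row[">cf"])
```

For Table 2.3 this gives R = 53, k = 8 and c = 7, so the classes run 10–16, 17–23, …, 59–65.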
2.3 Graphical Presentation
This form is the most effective means of organizing and presenting data, because important relationships are brought out more clearly in visually solid and colorful figures.
2.3.1 Different Kinds of Graphs/Charts
1. Line Graph – shows the relationship between two sets of quantities. Points are plotted with the X set of quantities along the horizontal axis against the Y set of quantities along the vertical axis of a Cartesian coordinate plane; the plotted points are then connected by line segments to form the line graph.
2. Bar Graph – consists of bars or rectangles of equal width, drawn either vertically or horizontally.
3. Circle Graph or Pie Chart – represents the relationships among the different components of a single total as sectors of a circle.
4. Picture Graph or Pictogram – a visual presentation of statistical quantities by means of pictures or symbols related to the subject under study.
2.3.2 Graphical Representation of the Frequency Distribution
1. Bar Chart and Histogram – among the more popular ways of representing a frequency distribution graphically. In a bar chart, the classes are represented by their class limits on the horizontal axis (or by categories, for nominal data), and the height of each rectangle, representing the class frequency, is measured on the vertical axis.
A histogram closely resembles the bar chart. The basic difference is that a bar chart uses the class limits on the horizontal axis, while the histogram uses the class boundaries. Using the class boundaries eliminates the spaces between the rectangles, giving the histogram a solid appearance.
2. Frequency Polygon – constructed by plotting the class marks against the frequencies. The (x, y) points formed by the class marks and their corresponding frequencies are connected by straight lines.
To complete the polygon, which by definition is a closed figure, an additional class mark with zero frequency is added at the beginning and at the end of the distribution.
3. Frequency Ogive – a cumulative frequency distribution can be represented graphically by a frequency ogive. An ogive is obtained by plotting the upper class boundaries on the horizontal scale and the “less than” cumulative frequencies on the vertical scale.
3.0 NUMERICAL DESCRIPTION OF DATA
A numerical description summarizes a set of observations in a single value, and that value may be used to represent the entire set.
3.1 The Summation Symbol
The Greek letter Σ (upper-case sigma) denotes summation. It is a compact way of writing the sum of a set of data values. A convenient way of writing a data value in mathematical notation is the subscripted variable $x_i$, read as “x sub i”. When a set of data values is written in subscripted-variable notation as $x_1, x_2, x_3, \ldots, x_n$, the notation $\sum_{i=1}^{n} x_i$ is defined as
$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$.
The symbol $\sum_{i=1}^{n} x_i$ is read as “the summation of x sub i from 1 to n”.
Example: Consider the data values 5, 4, 8, and 6, which are measurements of weight. Find the following:
1. $\sum_{i=1}^{4} x_i$
2. $\sum_{i=1}^{4} x_i^2$
3. $\left(\sum_{i=1}^{4} x_i\right)^2$
3.2 Measures of Central Tendency
A measure of central tendency is a single value about which a set of observations tends to cluster.
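The three sums asked for in the Section 3.1 example can be checked with a few lines of Python. The sketch mainly shows that $\sum x_i^2$ (square first, then add) and $\left(\sum x_i\right)^2$ (add first, then square) are different quantities; the variable names are illustrative.

```python
# Data from the Section 3.1 example: x1 = 5, x2 = 4, x3 = 8, x4 = 6
x = [5, 4, 8, 6]

sum_x = sum(x)                       # sum of x_i = 5 + 4 + 8 + 6
sum_x_sq = sum(v ** 2 for v in x)    # sum of x_i^2 (square first, then add)
sq_of_sum = sum(x) ** 2              # (sum of x_i)^2 (add first, then square)

print(sum_x, sum_x_sq, sq_of_sum)    # 23 141 529
```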
3.2.1 ARITHMETIC MEAN
The arithmetic mean, or simply the mean, is the sum of a set of measurements divided by the number of measurements in the set. This measure is appropriate for data on the interval or ratio scale.
a. Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
b. Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
c. Weighted mean: $\bar{x}_w = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$
d. Grand mean: $\bar{\bar{x}} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$
Examples 3.2.1:
1. The numbers of hours spent studying per day by ten students were recorded as follows: 5, 8, 2, 2, 2, 6, 5, 3, 1, and 4. Find the mean.
2. The following table shows the number of households in five (5) barangays of Iligan City in 2010 and the corresponding percentage changes in the number of households from 2010 to 2012.
Barangay | Number of Households | Percentage Change
Tibanga | 11,802 | 9.1
Suarez | 8,624 | 8.3
Hinaplanon | 5,326 | 4.5
Digkilaan | 894 | 1.4
Palao | 12,012 | 10.6
Find the weighted mean of the percentage changes.
3.2.2 MEDIAN
The median is the middle value of a set of observations arranged in increasing or decreasing order of magnitude. When the number of observations is odd, it is the middle value; when the number is even, it is the average of the two middle values. In either case, it is the value such that half of the observations fall above it and half fall below it. Unlike the mean, the median is not affected by the presence of abnormally large or abnormally small observations.
a. Population median:
$\tilde{\mu} = \begin{cases} x_{\left(\frac{N+1}{2}\right)}, & \text{if } N \text{ is odd} \\ \frac{1}{2}\left[x_{\left(\frac{N}{2}\right)} + x_{\left(\frac{N}{2}+1\right)}\right], & \text{if } N \text{ is even} \end{cases}$
b. Sample median:
$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right], & \text{if } n \text{ is even} \end{cases}$
where $x_{(j)}$ denotes the jth observation in the ordered data.
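The mean, weighted-mean, grand-mean, and median formulas can be sketched in Python and applied to Examples 3.2.1. This sketch assumes the 2010 household counts serve as the weights $f_i$ for the percentage changes; the function names are hypothetical, and the median follows the odd/even position rule above (positions are 1-based in the formula, 0-based in Python lists).

```python
# Examples 3.2.1, item 1: hours spent studying per day by ten students
hours = [5, 8, 2, 2, 2, 6, 5, 3, 1, 4]

# Examples 3.2.1, item 2: 2010 households (weights) and % change 2010-2012
households = [11802, 8624, 5326, 894, 12012]   # Tibanga, Suarez, Hinaplanon, Digkilaan, Palao
pct_change = [9.1, 8.3, 4.5, 1.4, 10.6]


def sample_mean(data):
    """x-bar = (sum of x_i) / n"""
    return sum(data) / len(data)


def weighted_mean(values, weights):
    """x-bar_w = sum(f_i * x_i) / sum(f_i)"""
    return sum(f * x for f, x in zip(weights, values)) / sum(weights)


def grand_mean(group_means, group_sizes):
    """Grand mean: the group means weighted by the group sizes n_i."""
    return weighted_mean(group_means, group_sizes)


def median_by_position(data):
    """Sample median from the odd/even position rule."""
    s = sorted(data)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]              # x_((n+1)/2)
    return (s[n // 2 - 1] + s[n // 2]) / 2      # [x_(n/2) + x_(n/2 + 1)] / 2


print(sample_mean(hours))                                # 3.8
print(median_by_position(hours))                         # 3.5
print(round(weighted_mean(pct_change, households), 2))   # 8.58
```

The grand mean is written as a weighted mean because it is exactly that: each group mean $\bar{x}_i$ is weighted by its group size $n_i$.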
3.2.3 MODE – the value that occurs most often, i.e., the value with the greatest frequency.
Remarks 3.2.1
1. When the mean, median, and mode are equal in a given data set, the data are consistent with a symmetric distribution such as the normal distribution (equality alone does not prove normality).
2. The graph of normally distributed data is a symmetrical bell-shaped curve.
3.3 Measures of Variability or Dispersion
These are numerical values computed from the given observations that measure how the data spread out from the central location.
3.3.1 RANGE – the difference between the largest and the smallest values in the set. It is denoted by R, i.e.,
R = Highest Value – Lowest Value
3.3.2 VARIANCE – the average of the squared differences of the scores from the mean score of a distribution.
a. Population Variance. Given the finite population $x_1, x_2, \ldots, x_N$, the population variance is
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
For ease of computation, an alternative form is
$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2 - N\mu^2}{N}$
b. Sample Variance. Given the random sample $x_1, x_2, \ldots, x_n$, the sample variance is
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
A computationally faster form is
$s^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n - 1)}$
Note that the denominator of the sample variance involves n – 1. Dividing by n instead would, on average, underestimate the population variance, giving a biased estimate.
3.3.3 STANDARD DEVIATION – the positive square root of the variance.
a. Population standard deviation: $\sigma = \sqrt{\sigma^2}$
b. Sample standard deviation: $s = \sqrt{s^2}$
3.3.4 COEFFICIENT OF VARIATION (denoted by CV) – a measure of relative variation expressed as a percentage. It is the ratio of the standard deviation to the mean, multiplied by 100%.
a. $CV = \frac{\sigma}{\mu} \times 100\%$
b. $CV = \frac{s}{\bar{x}} \times 100\%$
Examples 3.3.4
1. The final examination given to two sections of Math 2 gave the following means and standard deviations:
Statistics | Section A | Section B
Mean | 30 | 46
Standard Deviation | 10 | 12
Find the coefficient of variation of each section and determine which of the two sections has the greater variability of scores.
2. The mean height of college women is 157.48 cm, with a standard deviation of 6.35 cm, while their mean weight is 47.70 kg, with a standard deviation of 3.64 kg. Which is more variable, the height or the weight of the college women?
3.3.5 Characteristics of the Standard Deviation
The standard deviation and variance are the most commonly used measures of dispersion in the social sciences because:
1. Both take into account the precise difference between each score and the mean.
2. If any single score is changed, the standard deviation changes: if the score is moved away from the mean, the standard deviation increases; if it is moved toward the mean, the standard deviation decreases.
3. If a score that is far from the mean is added, the standard deviation increases; if the added score is close to the mean, the standard deviation decreases.
3.3.6 Interpreting the Standard Deviation
The standard deviation is informative regardless of the value of the mean: it makes a great deal of difference whether a distribution is spread out over a broad range or bunched up closely around the mean. Figure 3.1 shows a set of scores that are normally distributed.
Figure 3.1 A Normal Curve Showing the Percent of Cases Lying Within 1, 2, and 3 Standard Deviations From the Mean
3.3.6.1 Chebyshev’s Theorem
The position of the scores in a frequency distribution relative to the mean can be bounded using Chebyshev’s Theorem.
Chebyshev’s Theorem: The proportion of any data set that lies within k standard deviations of the mean, where k is any number greater than 1, is at least
$1 - \frac{1}{k^2}$.
For example, with k = 3, at least $1 - \frac{1}{9} \approx 88.9\%$ of the data in any data set lie within three standard deviations on either side of the mean.
Example 3.3.6.1
If the mean score of the students enrolled in a Statistics class is 66 points, with a standard deviation of 5 points, at least what percentage of the scores must lie between 46 and 86?
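Solution: Since $\bar{x} - ks = 46$,
$66 - 5k = 46 \Rightarrow 5k = 20 \Rightarrow k = 4$
(equivalently, $\bar{x} + ks = 66 + 5(4) = 86$).
Hence, from Chebyshev’s Theorem, at least
$1 - \frac{1}{k^2} = 1 - \frac{1}{4^2} = \frac{15}{16} = 93.75\%$
of the scores lie between 46 and 86.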
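A minimal sketch of the dispersion formulas, the coefficient of variation, and Chebyshev’s bound follows. The study-hours data from Examples 3.2.1 are reused only as assumed test input, and the function names are hypothetical. Python’s standard `statistics` module offers `pvariance`, `variance`, and `stdev` for the same quantities; spelling the formulas out shows the definitional and computational forms side by side.

```python
import math

hours = [5, 8, 2, 2, 2, 6, 5, 3, 1, 4]   # data from Examples 3.2.1


def population_variance(data):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    N = len(data)
    mu = sum(data) / N
    return sum((x - mu) ** 2 for x in data) / N


def sample_variance(data):
    """s^2 = sum((x_i - x-bar)^2) / (n - 1)"""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """s^2 = [n * sum(x_i^2) - (sum x_i)^2] / [n (n - 1)]"""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))


def cv(sd, mean):
    """Coefficient of variation, in percent."""
    return sd / mean * 100


def chebyshev_bound(k):
    """Chebyshev: at least 1 - 1/k^2 of any data set lies within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2


s2 = sample_variance(hours)
print(s2, sample_variance_shortcut(hours), math.sqrt(s2))  # the two variance forms agree up to rounding

# Examples 3.3.4
print(round(cv(10, 30), 2), round(cv(12, 46), 2))             # 33.33 26.09 -> Section A varies more
print(round(cv(6.35, 157.48), 2), round(cv(3.64, 47.70), 2))  # 4.03 7.63 -> weight varies more

# Example 3.3.6.1: mean 66, s = 5, scores between 46 and 86
k = (86 - 66) / 5
print(k, chebyshev_bound(k))                                  # 4.0 0.9375
```

Because CV is unit-free, it is what allows the height (cm) and weight (kg) comparison in Examples 3.3.4; comparing the raw standard deviations 6.35 and 3.64 would be meaningless.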
3.4 Other Measures of Location (Quantiles or Fractiles)
The measures of central tendency refer only to the center of the entire set of data, but other measures of location describe or locate non-central positions in the data. These measures are referred to as quantiles or fractiles. In this section, we consider three kinds of fractiles: percentiles, deciles, and quartiles.
3.4.1 Percentiles – values that divide an ordered set of observations into 100 equal parts. These values, denoted by P1, P2, …, P99, are such that 1% of the data falls below P1, 2% falls below P2, …, and 99% falls below P99.
3.4.2 Deciles – values that divide an ordered set of observations into 10 equal parts. These values, denoted by D1, D2, …, D9, are such that 10% of the data falls below D1, 20% falls below D2, …, and 90% falls below D9.
3.4.3 Quartiles – values that divide an ordered set of observations into 4 equal parts. These values, denoted by Q1, Q2, and Q3, are such that 25% of the data falls below Q1, 50% falls below Q2, and 75% falls below Q3.
Procedure for the computation of the fractiles:
1. Arrange the data in increasing order of magnitude.
2. Solve for the value of L, where
$L = \begin{cases} \frac{mn}{100}, & \text{for percentiles} \\ \frac{mn}{10}, & \text{for deciles} \\ \frac{mn}{4}, & \text{for quartiles} \end{cases}$
m is the location of the percentile, decile, or quartile (e.g., m = 3 for Q3), and n is the number of observations.
3. If L is an integer, the desired fractile is the average of the Lth and (L + 1)th observations. If L is fractional, round it up to the next higher integer to find the required location; the fractile is the value in that location.
Remark 3.4:
1. The interquartile range, Q3 – Q1, represents the distance on the scale between Q1 and Q3.
2. The quartile deviation, or semi-interquartile range, is half the interquartile range: QD = (Q3 – Q1)/2.
3.5 Skewness and Kurtosis
Skewness is the degree of departure of a distribution from symmetry.
Kurtosis is the degree of peakedness of a distribution.
3.5.1 Symmetric Distributions (those in which one side is the mirror image of the other) have a mean and a median with the same value; the normal curve is the most familiar example. If the distribution is symmetric and unimodal, the mode also has the same value as the mean and median (see Graph 1 in Figure 4.1).
3.5.2 Skewed Distributions have different values for the mean, median, and mode. For unimodal skewed distributions, the mean is pulled toward the tail, and the median usually lies between the mean and the mode.
Figure 4.1 Graphs of Different Types of Distributions
Remarks 3.4
1. A positively skewed distribution has a tail that is pulled in the positive direction (see Graph 3 in Figure 4.1).
2. A negatively skewed distribution has a tail that is pulled in the negative direction (see Graph 2 in Figure 4.1).
3. A symmetric distribution has zero skewness.
4. A normal distribution is a mesokurtic distribution.
5. A pure leptokurtic distribution has a higher peak than the normal distribution and heavier tails.
6. A pure platykurtic distribution has a lower peak than a normal distribution and lighter tails.
3.5.3 Application of Measuring Skewness and Kurtosis
One application is testing for normality: many statistical inferences require that a distribution be normal or nearly normal. A normal distribution has skewness and excess kurtosis of 0, so if your distribution is close to those values, it is probably close to normal.
3.5.4 Calculating Skewness
The moment coefficient of skewness of a data set is
$g_1 = \frac{m_3}{m_2^{3/2}}$
where
$m_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n}, \qquad m_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$,
x̄ is the mean and n is the sample size, as usual; m3 is called the third moment of the data set, and m2 is the variance (computed with divisor n).
Note: Just as you must choose between two measures of standard deviation depending on whether you have data for the whole population or just a sample, the same is true of skewness. If you have the whole population, then g1 above is the measure of skewness. If you have just a sample, you need the sample skewness:
$G_1 = \frac{\sqrt{n(n-1)}}{n-2}\, g_1$
3.5.5 Interpreting Skewness
1. If skewness is positive, the data are positively skewed, or skewed right, meaning that the right tail of the distribution is longer than the left.
2. If skewness is negative, the data are negatively skewed, or skewed left, meaning that the left tail is longer.
3. If skewness = 0, the data are perfectly symmetrical.
4. A skewness of exactly zero is quite unlikely for real-world data, so how can you interpret the skewness number? Bulmer (M. G. Bulmer, Principles of Statistics, Dover, 1979) suggests this rule of thumb:
a. If skewness is less than −1 or greater than +1, the distribution is highly skewed.
b. If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
c. If skewness is between −½ and +½, the distribution is approximately symmetric.
Inferring
Your data set is just one sample drawn from a population. Through ordinary sample variability alone, your sample may be skewed even though the population is symmetric. But if the sample is skewed too much for random chance to be the explanation, you can conclude that there is skewness in the population. To decide, divide the sample skewness G1 by the standard error of skewness (SES) to get the test statistic, which measures how many standard errors separate the sample skewness from zero:
$Z_{g_1} = \frac{G_1}{SES}, \qquad SES = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$
The critical value of $Z_{g_1}$ is approximately 2. (This is a two-tailed test of skewness ≠ 0 at roughly the 0.05 significance level.)
• If $Z_{g_1} < -2$, the population is very likely skewed negatively (though you don’t know by how much).
• If $Z_{g_1}$ is between −2 and +2, you can’t reach any conclusion about the skewness of the population: it might be symmetric, or it might be skewed in either direction.
• If $Z_{g_1} > 2$, the population is very likely skewed positively (though you don’t know by how much).
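The fractile procedure of Section 3.4 and the skewness statistics of Section 3.5 can be sketched together in Python. This sketch follows the document’s own positioning rule (L = mn/100, mn/10, or mn/4, averaging or rounding up), not the interpolation rules used by many software packages, so its percentiles may differ from spreadsheet or library output. The function names are hypothetical, and the study-hours data from Examples 3.2.1 serve only as assumed test input.

```python
import math


def fractile(data, m, parts):
    """m-th fractile by the Section 3.4 procedure.

    parts = 100 for percentiles, 10 for deciles, 4 for quartiles.
    """
    s = sorted(data)                        # Step 1: increasing order
    n = len(s)
    # Step 2: L = m * n / parts
    if (m * n) % parts == 0:                # Step 3: L is an integer
        L = m * n // parts
        return (s[L - 1] + s[L]) / 2        # average of the L-th and (L+1)-th values
    L = math.ceil(m * n / parts)            # L is fractional: round up
    return s[L - 1]                         # value in that (1-based) position


def skewness_stats(data):
    """Return g1, G1, SES and Z = G1 / SES (needs n >= 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    xbar = sum(data) / n
    m2 = sum((x - xbar) ** 2 for x in data) / n   # second moment (variance, divisor n)
    m3 = sum((x - xbar) ** 3 for x in data) / n   # third moment
    g1 = m3 / m2 ** 1.5                            # moment coefficient of skewness
    G1 = math.sqrt(n * (n - 1)) / (n - 2) * g1     # sample skewness
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1, G1, ses, G1 / ses


hours = [5, 8, 2, 2, 2, 6, 5, 3, 1, 4]
print(fractile(hours, 1, 4), fractile(hours, 3, 4))      # Q1, Q3
print(fractile(hours, 7, 10), fractile(hours, 18, 100))  # D7, P18
g1, G1, ses, z = skewness_stats(hours)
print(round(g1, 3), round(G1, 3), round(ses, 3), round(z, 3))
```

For the study-hours data this gives Q1 = 2, Q3 = 5, D7 = 5.0, and P18 = 2; the skewness is positive (the mean 3.8 exceeds the median 3.5), but with n = 10 the test statistic Z is only about 0.9, so by the rule above no conclusion about population skewness can be drawn.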
CASE STUDIES:
Case Study 1
1. A study was conducted to see how well reading success in first grade could be predicted from various kinds of information obtained in kindergarten: age, sex, tribe, academic rank, and IQ. Which of the variables represents a
a. nominal scale
b. ordinal scale
c. interval scale
d. ratio scale
2. Are the following variables discrete or continuous?
a. The number of correct answers on a true-false test.
b. The duration of the effectiveness of a pain medication.
c. The number of commercials aired daily by a television station.
d. The weights of Sunday newspapers.
e. The heights of basketball players.
3. Among 250 employees of the local office of an international insurance company, 182 are whites, 51 are blacks, and 17 are Orientals. If we use stratified random sampling to select a committee of 15 employees, how many employees must we take from each group?
4. Suppose you were asked to make a study of the brand preferences and satisfaction of the customers of famous laundry soaps in four (4) different supermarkets.
a. Arrange the letters of the following steps of a statistical inquiry in a logical order.
A. Collecting relevant information
B. Defining a problem
C. Interpreting the data
D. Analyzing the data
E. Organizing and presenting data
b. Who will be the most appropriate respondents of the study?
c. How will you apply multi-stage sampling to the population of the study?
d. Calculate the sample size if the population size is 2000 and the margin of error is 5%.
Case Study 2
1. Create a textual presentation based on the table shown below. Suppose there are 800 million users per day.
2. Create tabular and (any) graphical presentations of the textual presentation given below.
“The top three regions in terms of population count are Region IV-Southern Tagalog (11.32 million or 15.04% of the total), NCR (10.49 million or 13.93%), Region III – Central Luzon (7.80 million or 10.35%). The population residing in these regions combined comprises 39.32% of the total Filipino population. This means that four out of ten persons in the country reside in NCR and the adjoining regions of Central Luzon and Southern Tagalog.”
3. Using the table below:
Table 2.5 Number of Passengers for P&P Airlines
68 72 50 70 65 83 77 78 80 93
71 74 60 84 72 84 73 81 84 92
77 57 70 59 85 74 78 79 91 102
83 67 66 75 79 82 93 90 101 80
79 69 76 94 71 97 92 83 86 69
a. Construct a frequency distribution table (with the class intervals, frequencies, class boundaries, class marks, and cumulative frequencies) for the given data.
b. Construct its bar graph, histogram, frequency polygon, and frequency ogive.
c. Determine whether the given data set is normally distributed.
4. Given the frequency polygon below:
Figure: frequency polygon (vertical axis: FREQUENCY; horizontal axis: CLASS MARKS)
a. Reconstruct the frequency distribution table.
b. Construct the frequency histogram.
c. Answer the following:
i. What is the lower class limit of the lowest class?
ii. What is the lower class boundary of the highest class?
iii. What is the class width?
Case Study 3
1. A random sample of 10 students was given a special test. The times in minutes it took the students to finish the exam were recorded and are given as follows:
Find the following:
a) Mean
b) Median
c) Variance
d) Standard Deviation
e) Range
f) Mode
g) Coefficient of Variation
h) 18th Percentile
i) 7th Decile
j) 3rd Quartile
2. Suppose that you are investigating the influence of an interactive approach on students’ mathematics performance. Consider the following samples of students’ final examination scores taken from three (3) sections of Math 1 enrolled during the first semester of SY 2011–2012.
Section | Sample Scores
Rizal | 19 8 7 2 19 29 36 20 3 14
Bonifacio | 14 25 12 32 13 17 10 22 13 32
Luna | 24 13 20 1 8 28 16 21 23 26
a. Describe the performance of each section by its mean and standard deviation.
b. Which of these 3 sections showed the greatest improvement in the students’ performance in mathematics? Explain why.
3. The table below shows the distribution of your respondents’ responses to the emotional intelligence inventory.
Emotional Intelligence Inventory
Indicators | Almost Never (1) | Seldom (2) | Sometimes (3) | Usually (4) | Almost Always (5)
1. I appropriately communicate decisions to stakeholders. | 11 | 9 | 15 | 5 | 9
2. I fail to recognize how my feelings drive my behavior at work. | 18 | 2 | 10 | 12 | 8
3. When upset at work, I still think clearly. | 5 | 6 | 15 | 14 | 8
4. I fail to handle stressful situations at work effectively. | 10 | 12 | 8 | 14 | 6
5. I understand the things that make people feel optimistic at work. | 18 | 2 | 13 | 7 | 10
6. I fail to keep calm in difficult situations at work. | 21 | 12 | 8 | 9 | 0
7. I am effective in helping others feel positive at work. | 1 | 4 | 16 | 19 | 10
8. I find it difficult to identify the things that motivate people at work. | 15 | 12 | 5 | 8 | 5
a. Find the weighted mean of each statement.
b. Set up a Likert scale with 5 intervals to interpret the results by assigning a descriptive equivalent such as “very low”, “low”, “average”, “high”, “very high”.
c. Find the standard deviation of each item.
d. Find the grand mean.
e. Interpret the results.