3. Dealing with Uncertainty
The price of L&T stock will be higher
in six months than it is now.
versus
The price of L&T stock is likely
to be higher in six months than it
is now.
4. Dealing with Uncertainty
If the union budget deficit is as high as
predicted, interest rates will remain high
for the rest of the year.
versus
If the union budget deficit is as high
as predicted, it is probable that
interest rates will remain high for
the rest of the year.
5. Statistical Thinking
Statistical thinking is a philosophy of learning
and action based on the following fundamental
principles:
All work occurs in a system of interconnected
processes;
Variation exists in all processes, and
Understanding and reducing variation are the
keys to success.
6. Statistical Thinking
Systems and Processes
A system is a number of components that
are logically and sometimes physically
linked together for some purpose.
7. Statistical Thinking
Systems and Processes
A process is a set of activities operating on a system
that transforms inputs to outputs. A business process is
a group of logically related tasks and activities that,
when performed, uses the resources of the business
to provide definitive results required to achieve the
business objectives.
8. Making Decisions
Data, Information, Knowledge
• Data: specific observations of measured numbers.
• Information: processed and summarized data
yielding facts and ideas.
• Knowledge: selected and organized information
that provides understanding, recommendations, and
the basis for decisions.
9. Making Decisions
Descriptive and Inferential Statistics
Descriptive Statistics include graphical and
numerical procedures that summarize and
process data and are used to transform data
into information.
10. Making Decisions
Descriptive and Inferential Statistics
Inferential Statistics provide the bases for
predictions, forecasts, and estimates that are
used to transform information to knowledge.
11. The Journey to Making Decisions
[Figure: flow diagram, read in the order listed]
Begin here: Identify the Problem
Data
(Descriptive Statistics, Probability, Computers)
Information
(Experience, Theory, Literature, Inferential Statistics, Computers)
Knowledge
Decision
15. Classification of Variables
Discrete Numerical Variable
A variable that produces a response that
comes from a counting process.
16. Classification of Variables
Continuous Numerical Variable
A variable that produces a response that is
the outcome of a measurement process.
17. Classification of Variables
Categorical Variables
Variables that produce responses that
belong to groups (sometimes called
“classes”) or categories.
18. Measurement Levels
Nominal and Ordinal Levels of Measurement
refer to data obtained from categorical
questions.
• A nominal scale indicates assignments to
groups or classes.
• Ordinal data indicate rank ordering of items.
19. Frequency Distributions
A frequency distribution is a table used to organize data.
The left column (called classes or groups) includes
numerical intervals on a variable being studied. The
right column is a list of the frequencies, or number of
observations, for each class. Intervals are normally of
equal size, must cover the range of the sample
observations, and be non-overlapping.
20. Construction of a Frequency
Distribution
Rule 1: Intervals (classes) must be inclusive and non-
overlapping;
Rule 2: Determine k, the number of classes;
Rule 3: Intervals should be the same width, w; the width
is determined by the following:
$$w = \text{Interval Width} = \frac{\text{Largest Number} - \text{Smallest Number}}{\text{Number of Intervals}}$$
Both k and w should be rounded upward, possibly to the next largest integer.
21. Construction of a Frequency
Distribution
Quick Guide to Number of Classes for a Frequency Distribution
Sample Size      Number of Classes
Fewer than 50    5 – 6 classes
50 to 100        6 – 8 classes
Over 100         8 – 10 classes
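The construction rules and the quick guide above can be sketched in Python. This is a minimal illustration, not a definitive procedure: the function name `frequency_distribution` and the 12-value sample are hypothetical, and placing the largest observation in the last class is an assumption made so that every value is counted.

```python
import math

def frequency_distribution(data, k):
    """Tally data into k equal-width, non-overlapping classes."""
    smallest, largest = min(data), max(data)
    # Rule 3: width = (largest - smallest) / k, rounded upward to an integer.
    w = max(1, math.ceil((largest - smallest) / k))
    counts = [0] * k
    for x in data:
        # Each class is [lower, lower + w); the largest value is kept in the
        # last class (an assumption of this sketch).
        i = min(int((x - smallest) // w), k - 1)
        counts[i] += 1
    classes = [(smallest + i * w, smallest + (i + 1) * w) for i in range(k)]
    return w, list(zip(classes, counts))

# Hypothetical sample of 12 values; fewer than 50, so the guide suggests 5 to 6 classes.
sample = [221, 226, 228, 231, 233, 234, 236, 237, 239, 241, 243, 248]
width, table = frequency_distribution(sample, k=5)
for (lower, upper), f in table:
    print(f"{lower} to less than {upper}: {f}")
```

Rounding the width upward guarantees that k classes cover the whole range of the sample.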
22. Example of a Frequency Distribution
A Frequency Distribution for the Suntan Lotion Example
Weights (in mL)           Number of Bottles
220 to less than 225      1
225 to less than 230      4
230 to less than 235      29
235 to less than 240      34
240 to less than 245      26
245 to less than 250      6
23. Cumulative Frequency
Distributions
A cumulative frequency distribution contains the
number of observations whose values are less than the
upper limit of each interval. It is constructed by
adding the frequencies of all frequency distribution
intervals up to and including the present interval.
24. Relative Cumulative Frequency
Distributions
A relative cumulative frequency distribution
converts all cumulative frequencies to
cumulative percentages.
25. Example of a Frequency Distribution
A Cumulative Frequency Distribution for the Suntan Lotion Example
Weights (in mL)      Number of Bottles
less than 225        1
less than 230        5
less than 235        34
less than 240        68
less than 245        94
less than 250        100
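As a small sketch, the cumulative and relative cumulative columns can be derived from the class frequencies of the suntan lotion example by running sums. The variable names are illustrative only.

```python
from itertools import accumulate

# Upper class limits (mL) and class frequencies from the suntan lotion example.
upper_limits = [225, 230, 235, 240, 245, 250]
frequencies = [1, 4, 29, 34, 26, 6]

# Running total of frequencies up to and including each class.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
# Convert each cumulative frequency to a cumulative percentage.
relative_cumulative = [100 * c / total for c in cumulative]

for limit, c, pct in zip(upper_limits, cumulative, relative_cumulative):
    print(f"less than {limit}: {c} bottles ({pct:.0f}%)")
```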
26. Histograms and Ogives
A histogram is a bar graph that consists of vertical bars
constructed on a horizontal line that is marked off with
intervals for the variable being displayed. The
intervals correspond to those in a frequency
distribution table. The height of each bar is
proportional to the number of observations in that
interval.
27. Histograms and Ogives
An ogive, sometimes called a cumulative line graph, is
a line that connects points that are the cumulative
percentage of observations below the upper limit of
each class in a cumulative frequency distribution.
28. Histogram and Ogive for Example 1
[Figure: Histogram of Weights with an ogive. Horizontal axis: Interval Weights (mL), marked at 224.5, 229.5, 234.5, 239.5, 244.5, 249.5. Left vertical axis: Frequency, 0 to 40. Right vertical axis: 0 to 100.]
29. Stem-and-Leaf Display
A stem-and-leaf display is an exploratory data analysis
graph that is an alternative to the histogram. Data are
grouped according to their leading digits (called the stem)
while listing the final digits (called leaves) separately for
each member of a class. The leaves are displayed
individually in ascending order after each of the stems.
31. Tables
- Bar and Pie Charts -
Frequency and Relative Frequency Distribution for
Top Company Employers Example
Industry          Number of Employees    Proportion
Tourism           85,287                 0.35
Retail            49,424                 0.20
Health Care       39,588                 0.16
Restaurants       16,050                 0.06
Communications    11,750                 0.05
Technology        11,144                 0.05
Space             11,418                 0.05
Other             21,336                 0.08
32. Tables
- Bar and Pie Charts -
Bar Chart for Top Company Employers Example
[Figure: bar chart titled "1999 Top Company Employers in Central Florida". Horizontal axis: Industry Category. Bar heights: Tourism 0.35, Retail 0.2, Health Care 0.16, Restaurants 0.06, Communications 0.05, Technology 0.05, Space 0.05, Other 0.08.]
33. Tables
- Bar and Pie Charts -
Pie Chart for Top Company Employers Example
[Figure: pie chart titled "1999 Top Company Employers in Central Florida". Slices: Tourism 35%, Retail 20%, Health Care 16%, Others 29%.]
34. Pareto Diagrams
A Pareto diagram is a bar chart that displays the
frequency of defect causes. The bar at the left indicates
the most frequent cause and bars to the right indicate
causes in decreasing frequency. A Pareto diagram is used
to separate the “vital few” from the “trivial many.”
35. Line Charts
A line chart, also called a time plot, is a series of data plotted
at various time intervals. Measuring time along the horizontal
axis and the numerical quantity of interest along the vertical
axis yields a point on the graph for each observation. Joining
points adjacent in time by straight lines produces a time plot.
36. Line Charts
[Figure: line chart titled "Growth Trends in Internet Use by Age, 1997 to 1999". Horizontal axis: April 1997 to July 1999. Vertical axis: Millions of Adults, 0 to 35. Series: Age 18 to 29, Age 30 to 49, Age 50+.]
37. Parameters and Statistics
A statistic is a descriptive measure computed from a
sample of data. A parameter is a descriptive
measure computed from an entire population of
data.
38. Measures of Central Tendency
- Arithmetic Mean -
The arithmetic mean of a set of data is the
sum of the data values divided by the
number of observations.
39. Sample Mean
If the data set is from a sample, then the sample
mean, $\bar{X}$, is:
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
40. Population Mean
If the data set is from a population, then the
population mean, $\mu$, is:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
41. Measures of Central Tendency
- Median -
An ordered array is an arrangement of data in either
ascending or descending order. Once the data are
arranged in ascending order, the median is the value such
that 50% of the observations are smaller and 50% of the
observations are larger.
If the sample size n is an odd number, the median,
Xm, is the middle observation. If the sample size n
is an even number, the median, Xm, is the average
of the two middle observations. The median will
be located in the 0.50(n+1)th ordered position.
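The 0.50(n+1)th position rule above can be sketched as follows. The function name `median_by_position` and the two small data sets are hypothetical. A fractional position, which occurs when n is even, is read as "average the two neighbouring observations".

```python
def median_by_position(values):
    """Median located at the 0.50(n+1)th ordered position (1-based)."""
    ordered = sorted(values)          # ordered array, ascending
    n = len(ordered)
    position = 0.5 * (n + 1)
    lower = int(position)
    if position == lower:
        # n is odd: the position is a whole number, so take that observation.
        return ordered[lower - 1]
    # n is even: the position ends in .5, so average the two middle observations.
    return (ordered[lower - 1] + ordered[lower]) / 2

print(median_by_position([7, 3, 9]))      # odd n
print(median_by_position([4, 1, 3, 2]))   # even n
```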
42. Measures of Central Tendency
- Mode -
The mode, if one exists, is the most
frequently occurring observation in the
sample or population.
43. Shape of the Distribution
The shape of the distribution is said to be
symmetric if the observations are balanced,
or evenly distributed, about the mean. In a
symmetric distribution the mean and median
are equal.
44. Shape of the Distribution
A distribution is skewed if the observations are not
symmetrically distributed above and below the mean.
A positively skewed (or skewed to the right)
distribution has a tail that extends to the right in the
direction of positive values. A negatively skewed (or
skewed to the left) distribution has a tail that extends
to the left in the direction of negative values.
45. Shapes of the Distribution
[Figure: three histograms titled Symmetric Distribution, Positively Skewed Distribution, and Negatively Skewed Distribution. Horizontal axis: values 1 to 9. Vertical axis: Frequency.]
46. Measures of Central Tendency
- Geometric Mean -
The Geometric Mean is the nth root of the product of n
numbers:
$$\bar{X}_g = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$
The Geometric Mean is used to obtain mean growth over
several periods given compounded growth from each
period.
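A minimal sketch of the mean-growth use case, assuming each period's growth is expressed as a growth factor (1 + rate). The three rates used here are hypothetical, and `geometric_mean` is an illustrative helper rather than anything defined in the slides.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

# Hypothetical growth of +10%, +20%, and -5% over three periods,
# written as growth factors.
factors = [1.10, 1.20, 0.95]
mean_factor = geometric_mean(factors)
mean_growth_rate = mean_factor - 1

print(f"mean growth per period: {mean_growth_rate:.4%}")
```

Compounding the mean factor over three periods gives the same total growth as the three individual factors, which is why the geometric mean, and not the arithmetic mean, is used here.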
47. Measures of Variability
- The Range -
The range of a set of data is the
difference between the largest and
smallest observations.
48. Measures of Variability
- Sample Variance -
The sample variance, $s^2$, is the sum of the squared
differences between each observation and the sample
mean divided by the sample size minus 1.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$$
49. Measures of Variability
- Short-cut Formulas for Sample
Variance -
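Short-cut formulas for the sample variance are:
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{X}^2}{n - 1}$$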
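As a quick check, the defining formula and both short-cut formulas can be evaluated on the same data and compared. The data set and function names below are hypothetical; this is a sketch, not a recommended implementation.

```python
def variance_definition(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def variance_shortcut_sums(xs):
    """Short-cut form using the sum of squares and the squared sum."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def variance_shortcut_mean(xs):
    """Short-cut form using the sum of squares and n times the squared mean."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(variance_definition(data))
print(variance_shortcut_sums(data))
print(variance_shortcut_mean(data))
```

The three results agree up to floating-point rounding. With floating-point numbers, the short-cut forms can lose precision when the mean is large relative to the spread, so the defining formula is usually the safer choice in software.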
50. Measures of Variability
- Population Variance -
The population variance, $\sigma^2$, is the sum of the squared
differences between each observation and the population
mean divided by the population size, N.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
51. Measures of Variability
- Sample Standard Deviation -
The sample standard deviation, s, is the positive square
root of the variance, and is defined as:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$$
52. Measures of Variability
- Population Standard Deviation-
The population standard deviation, σ, is
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
53. The Empirical Rule
(the 68%, 95%, or almost all rule)
For a set of data with a mound-shaped histogram, the Empirical
Rule is:
• approximately 68% of the observations are contained within a
distance of one standard deviation around the mean; µ ± 1σ
• approximately 95% of the observations are contained within a
distance of two standard deviations around the mean; µ ± 2σ
• almost all of the observations are contained within a distance
of three standard deviations around the mean; µ ± 3σ
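The rule can be checked on simulated mound-shaped data. The sketch below assumes normally distributed values with a hypothetical mean of 50 and standard deviation of 10. The seed only makes the run repeatable, and the printed shares are approximate by nature.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable
# Hypothetical mound-shaped data: 10,000 draws from a normal distribution.
data = [random.gauss(50, 10) for _ in range(10_000)]

mu = statistics.fmean(data)
sigma = statistics.pstdev(data)  # treat the simulated values as a population

def share_within(k):
    """Fraction of observations within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within mu ± {k} sigma: {share_within(k):.1%}")
```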
54. Coefficient of Variation
The Coefficient of Variation, CV, is a measure of relative
dispersion that expresses the standard deviation as a
percentage of the mean (provided the mean is positive).
The sample coefficient of variation is
$$CV = \frac{s}{\bar{X}} \times 100 \quad \text{if } \bar{X} > 0$$
The population coefficient of variation is
$$CV = \frac{\sigma}{\mu} \times 100 \quad \text{if } \mu > 0$$
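A short sketch of the sample coefficient of variation, using two hypothetical samples that share the same standard deviation but have very different means. The helper name `coefficient_of_variation` is illustrative.

```python
import statistics

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    mean = statistics.mean(sample)
    if mean <= 0:
        # The measure is only defined here for a positive mean.
        raise ValueError("coefficient of variation requires a positive mean")
    return statistics.stdev(sample) / mean * 100

# Hypothetical samples: same spread (s = 2), different means.
small_values = [10, 12, 14]
large_values = [1010, 1012, 1014]
cv_small = coefficient_of_variation(small_values)
cv_large = coefficient_of_variation(large_values)
print(f"CV (small values): {cv_small:.2f}%")
print(f"CV (large values): {cv_large:.2f}%")
```

The same absolute spread is a much larger relative dispersion when the mean is small, which is what the CV is meant to reveal.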
55. Percentiles and Quartiles
Data must first be in ascending order. Percentiles
separate large ordered data sets into 100ths. The Pth
percentile is a number such that P percent of all the
observations are at or below that number.
Quartiles are descriptive measures that separate large
ordered data sets into four quarters.
56. Percentiles and Quartiles
The first quartile, Q1, is another name for the 25th
percentile. The first quartile divides the ordered data
such that 25% of the observations are at or below this
value. Q1 is located in the 0.25(n+1)th position when
the data are in ascending order. That is,
$$Q_1 = \text{value in the } \frac{n + 1}{4}\text{th ordered position}$$
57. Percentiles and Quartiles
The third quartile, Q3, is another name for the 75th
percentile. The third quartile divides the ordered
data such that 75% of the observations are at or
below this value. Q3 is located in the 0.75(n+1)th
position when the data are in ascending order. That
is,
$$Q_3 = \text{value in the } \frac{3(n + 1)}{4}\text{th ordered position}$$
58. Interquartile Range
The Interquartile Range (IQR) measures the spread
in the middle 50% of the data; that is, the difference
between the observations at the 75th and the 25th
percentiles:
IQR = Q3 − Q1
59. Five-Number Summary
The Five-Number Summary refers to the five
descriptive measures: minimum, first quartile,
median, third quartile, and the maximum.
$$X_{\text{minimum}} < Q_1 < \text{Median} < Q_3 < X_{\text{maximum}}$$
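The ordered-position rules for the quartiles and median lead directly to the five-number summary and the IQR. The sketch below makes one assumption the slides do not state: when a position is fractional, it interpolates linearly between the two neighbouring observations. The function names and the 11-value data set are hypothetical, and the sketch expects at least three observations.

```python
def value_at_position(ordered, position):
    """Value at a 1-based ordered position, interpolating if fractional."""
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Assumption: linear interpolation between the neighbouring observations.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs n >= 3."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    median = value_at_position(ordered, 0.50 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data set of 11 observations.
data = [12, 15, 17, 20, 22, 25, 28, 30, 33, 36, 40]
summary = five_number_summary(data)
iqr = summary[3] - summary[1]
print("five-number summary:", summary)
print("IQR:", iqr)
```

Python's standard `statistics.quantiles(data, n=4)` with its default "exclusive" method uses the same (n+1)-based positions, so it can serve as a cross-check.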
60. Box-and-Whisker Plots
A Box-and-Whisker Plot is a graphical procedure that
uses the Five-Number summary.
A Box-and-Whisker Plot consists of
• an inner box that shows the numbers which span the
range from Q1 to Q3.
• a line drawn through the box at the median.
The “whiskers” are lines drawn from Q1 to the minimum
value, and from Q3 to the maximum value.
62. Grouped Data Mean
For a population of N observations the mean is
$$\mu = \frac{\sum_{i=1}^{K} f_i m_i}{N}$$
For a sample of n observations, the mean is
$$\bar{X} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$$
where the data set contains observation values m1, m2, . . ., mK occurring with
frequencies f1, f2, . . ., fK respectively.
63. Grouped Data Variance
For a population of N observations the variance is
$$\sigma^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N} = \frac{\sum_{i=1}^{K} f_i m_i^2}{N} - \mu^2$$
For a sample of n observations, the variance is
$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{K} f_i m_i^2 - n\bar{X}^2}{n - 1}$$
where the data set contains observation values m1, m2, . . ., mK occurring with
frequencies f1, f2, . . ., fK respectively.
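As an illustration, the grouped-data formulas for a sample can be applied to the class frequencies of the suntan lotion example. The slides give only class intervals, so using each class midpoint as the value m_i is an assumption of this sketch. The results therefore approximate the mean and variance of the underlying weights.

```python
# Class frequencies from the suntan lotion example (n = 100 bottles).
frequencies = [1, 4, 29, 34, 26, 6]
# Assumption: each class is represented by its midpoint (mL).
midpoints = [222.5, 227.5, 232.5, 237.5, 242.5, 247.5]

n = sum(frequencies)

# Grouped sample mean: sum of f_i * m_i divided by n.
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Grouped sample variance, defining form.
var_definition = sum(
    f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)
) / (n - 1)

# Grouped sample variance, short-cut form.
var_shortcut = (
    sum(f * m ** 2 for f, m in zip(frequencies, midpoints)) - n * mean ** 2
) / (n - 1)

print(f"n = {n}, mean = {mean:.1f} mL")
print(f"variance (definition) = {var_definition:.4f}")
print(f"variance (short-cut)  = {var_shortcut:.4f}")
```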