SlideShare a Scribd company logo
1 of 77
1
Descriptive Statistics
2-1 Overview
2-2 Summarizing Data with Frequency Tables
2-3 Pictures of Data
2-4 Measures of Center
2-5 Measures of Variation
2-6 Measures of Position
2-7 Exploratory Data Analysis (EDA)
2
 Descriptive Statistics
summarize or describe the important
characteristics of a known set of
population data
 Inferential Statistics
use sample data to make inferences (or
generalizations) about a population
2 -1 Overview
3
1. Center: A representative or average value that
indicates where the middle of the data set is located
2. Variation: A measure of the amount that the values
vary among themselves
3. Distribution: The nature or shape of the
distribution of data (such as bell-shaped, uniform, or
skewed)
4. Outliers: Sample values that lie very far away from
the vast majority of other sample values
5. Time: Changing characteristics of the data over
time
Important Characteristics of Data
4
 Frequency Table
lists classes (or categories) of values,
along with frequencies (or counts) of the
number of values that fall into each class
2-2 Summarizing Data With
Frequency Tables
5
Qwerty Keyboard Word Ratings
Table 2-1
2 2 5 1 2 6 3 3 4 2
4 0 5 7 7 5 6 6 8 10
7 2 2 10 5 8 2 5 4 2
6 2 6 1 7 2 7 2 3 8
1 5 2 5 2 14 2 2 6 3
1 7
6
Frequency Table of Qwerty Word Ratings
Table 2-3
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
7
Frequency Table
Definitions
8
Lower Class Limits
Lower Class
Limits
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
are the smallest numbers that can actually belong to
different classes
9
Upper Class Limits
Upper Class
Limits
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
are the largest numbers that can actually belong to
different classes
10
are the numbers used to separate classes, but
without the gaps created by class limits
Class Boundaries
11
number separating classes
Class Boundaries
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
- 0.5
2.5
5.5
8.5
11.5
14.5
12
Class Boundaries
Class
Boundaries
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
- 0.5
2.5
5.5
8.5
11.5
14.5
number separating classes
13
midpoints of the classes
Class Midpoints
14
midpoints of the classes
Class Midpoints
Class
Midpoints
0 - 1 2 20
3 - 4 5 14
6 - 7 8 15
9 - 10 11 2
12 - 13 14 1
Rating Frequency
15
is the difference between two consecutive lower
class limits or two consecutive class boundaries
Class Width
16
Class Width
Class Width
3 0 - 2 20
3 3 - 5 14
3 6 - 8 15
3 9 - 11 2
3 12 - 14 1
Rating Frequency
is the difference between two consecutive lower
class limits or two consecutive class boundaries
17
1. Be sure that the classes are mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the
number of original data values.
Guidelines For Frequency Tables
18
3. Select for the first lower limit either the lowest score or a
convenient value slightly less than the lowest score.
4. Add the class width to the starting point to get the second lower
class limit, add the width to the second lower limit to get the
third, and so on.
5. List the lower class limits in a vertical column and enter the
upper class limits.
6. Represent each score by a tally mark in the appropriate class.
Total tally marks to find the total frequency for each class.
Constructing A Frequency Table
1. Decide on the number of classes .
2. Determine the class width by dividing the range by the number
of classes (range = highest score - lowest score) and round up.
class width ≈ round up of
range
number of classes
19
20
Relative Frequency Table
relative frequency =
class frequency
sum of all frequencies
21
Relative Frequency Table
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
0 - 2 38.5%
3 - 5 26.9%
6 - 8 28.8%
9 - 11 3.8%
12 - 14 1.9%
Rating
Relative
Frequency
20/52 = 38.5%
14/52 = 26.9%
etc.
Total frequency = 52
22
Cumulative Frequency Table
Cumulative
Frequencies
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
Less than 3 20
Less than 6 34
Less than 9 49
Less than 12 51
Less than 15 52
Rating
Cumulative
Frequency
23
Frequency Tables
0 - 2 20
3 - 5 14
6 - 8 15
9 - 11 2
12 - 14 1
Rating Frequency
0 - 2 38.5%
3 - 5 26.9%
6 - 8 28.8%
9 - 11 3.8%
12 - 14 1.9%
Rating
Relative
Frequency
Less than 3 20
Less than 6 34
Less than 9 49
Less than 12 51
Less than 15 52
Rating
Cumulative
Frequency
24
a value at the
center or middle
of a data set
Measures of Center
25
Mean
(Arithmetic Mean)
AVERAGE
the number obtained by adding the
values and dividing the total by the
number of values
Definitions
26
Notation
Σ denotes the addition of a set of values
x is the variable usually used to represent the individual
data values
n represents the number of data values in a sample
N represents the number of data values in a population
27
Notation
is pronounced ‘x-bar’ and denotes the mean of a set
of sample values
x =
n
Σ x
x
28
Notation
µ is pronounced ‘mu’ and denotes the mean of all values
in a population
is pronounced ‘x-bar’ and denotes the mean of a set
of sample values
Calculators can calculate the mean of data
x =
n
Σ x
x
N
µ =
Σ x
29
Definitions
 Median
the middle value when the original
data values are arranged in order of
increasing (or decreasing) magnitude
30
Definitions
 Median
the middle value when the original
data values are arranged in order of
increasing (or decreasing) magnitude
 often denoted by x (pronounced ‘x-tilde’)
~
31
Definitions
 Median
the middle value when the original
data values are arranged in order of
increasing (or decreasing) magnitude
 often denoted by x (pronounced ‘x-tilde’)
 is not affected by an extreme value
~
32
6.72 3.46 3.60 6.44
3.46 3.60 6.44 6.72
no exact middle -- shared by two numbers
3.60 + 6.44
2
(even number of values)
MEDIAN is 5.02
33
6.72 3.46 3.60 6.44 26.70
3.46 3.60 6.44 6.72 26.70
(in order - odd number of values)
exact middle MEDIAN is 6.44
6.72 3.46 3.60 6.44
3.46 3.60 6.44 6.72
no exact middle -- shared by two numbers
3.60 + 6.44
2
(even number of values)
MEDIAN is 5.02
34
Definitions
 Mode
the score that occurs most frequently
Bimodal
Multimodal
No Mode
denoted by M
the only measure of central tendency that can be
used with nominal data
35
a. 5 5 5 3 1 5 1 4 3 5
b. 1 2 2 2 3 4 5 6 6 6 7 9
c. 1 2 3 6 7 8 9 10
Examples
Mode is 5
Bimodal - 2 and 6
No Mode
36
 Midrange
the value midway between the highest and
lowest values in the original data set
Definitions
37
 Midrange
the value midway between the highest and
lowest values in the original data set
Definitions
Midrange =
highest score + lowest score
2
38
 Symmetric
Data is symmetric if the left half of its
histogram is roughly a mirror of its
right half.
 Skewed
Data is skewed if it is not symmetric
and if it extends more to one side than
the other.
Definitions
39
Skewness
Mode = Mean = Median
SYMMETRIC
40
Skewness
Mode = Mean = Median
SKEWED LEFT
(negatively)
SYMMETRIC
Mean Mode
Median
41
Skewness
Mode = Mean = Median
SKEWED LEFT
(negatively)
SYMMETRIC
Mean Mode
Median
SKEWED RIGHT
(positively)
MeanMode
Median
42
Waiting Times of Bank Customers
at Different Banks
in minutes
Jefferson Valley Bank
Bank of Providence
6.5
4.2
6.6
5.4
6.7
5.8
6.8
6.2
7.1
6.7
7.3
7.7
7.4
7.7
7.7
8.5
7.7
9.3
7.7
10.0
43
Jefferson Valley Bank
Bank of Providence
6.5
4.2
6.6
5.4
6.7
5.8
6.8
6.2
7.1
6.7
7.3
7.7
7.4
7.7
7.7
8.5
7.7
9.3
7.7
10.0
Jefferson Valley Bank
7.15
7.20
7.7
7.10
Bank of Providence
7.15
7.20
7.7
7.10
Mean
Median
Mode
Midrange
Waiting Times of Bank Customers
at Different Banks
in minutes
44
Dotplots of Waiting Times
45
Measures of Variation
46
Measures of Variation
Range
value
highest lowest
value
47
a measure of variation of the scores
about the mean
(average deviation from the mean)
Measures of Variation
Standard Deviation
48
Sample Standard Deviation
Formula
calculators can compute the
sample standard deviation of data
Σ (x - x)2
n - 1
S =
49
Sample Standard Deviation
Shortcut Formula
n (n - 1)
s =
n (Σx2
) - (Σx)2
calculators can compute the
sample standard deviation of data
50
Σ x - x
Mean Absolute Deviation
Formula
n
51
Population Standard Deviation
calculators can compute the
population standard deviation
of data
2
Σ (x - µ)
N
σ =
52
Measures of Variation
Variance
standard deviation squared
53
Measures of Variation
Variance
standard deviation squared
s
σ
2
2
} use square key
on calculatorNotation
54
Sample
Variance
Population
Variance
Variance
Σ (x - x )2
n - 1
s
2
=
Σ (x - µ)2
N
σ 2
=
55
Estimation of Standard Deviation
Range Rule of Thumb
x - 2s x x + 2s
Range ≈ 4s
or
(minimum
usual value)
(maximum
usual value)
56
Estimation of Standard Deviation
Range Rule of Thumb
x - 2s x x + 2s
Range ≈ 4s
or
(minimum
usual value)
(maximum
usual value)
Range
4
s ≈ =
highest value - lowest value
4
57
x
The Empirical Rule
(applies to bell-shaped distributions)FIGURE 2-15
58
x - s x x + s
68% within
1 standard deviation
34% 34%
The Empirical Rule
(applies to bell-shaped distributions)FIGURE 2-15
59
x - 2s x - s x x + 2sx + s
68% within
1 standard deviation
34% 34%
95% within
2 standard deviations
The Empirical Rule
(applies to bell-shaped distributions)
13.5% 13.5%
FIGURE 2-15
60
x - 3s x - 2s x - s x x + 2s x + 3sx + s
68% within
1 standard deviation
34% 34%
95% within
2 standard deviations
99.7% of data are within 3 standard deviations of the mean
The Empirical Rule
(applies to bell-shaped distributions)
0.1% 0.1%
2.4% 2.4%
13.5% 13.5%
FIGURE 2-15
61
Chebyshev’s Theorem
 applies to distributions of any shape.
 the proportion (or fraction) of any set of data
lying within K standard deviations of the mean
is always at least 1 - 1/K
2
, where K is any
positive number greater than 1.
 at least 3/4 (75%) of all values lie within 2
standard deviations of the mean.
 at least 8/9 (89%) of all values lie within 3
standard deviations of the mean.
62
Measures of Variation Summary
For typical data sets, it is unusual for a
score to differ from the mean by more than
2 or 3 standard deviations.
63
 z Score (or standard score)
the number of standard deviations
that a given value x is above or below
the mean
Measures of Position
64
Sample
z = x - x
s
Population
z = x - µ
σ
Measures of Position
z score
65
- 3 - 2 - 1 0 1 2 3
Z
Unusual
Values
Unusual
Values
Ordinary
Values
Interpreting Z Scores
FIGURE 2-16
66
Measures of Position
Quartiles, Deciles,
Percentiles
67
Q1, Q2, Q3
divides ranked scores into four equal parts
Quartiles
25% 25% 25% 25%
Q3Q2Q1
(minimum) (maximum)
(median)
68
D1, D2, D3, D4, D5, D6, D7, D8, D9
divides ranked data into ten equal parts
Deciles
10% 10% 10% 10% 10% 10% 10% 10% 10% 10%
D1 D2 D3 D4 D5 D6 D7 D8 D9
69
99 Percentiles
Percentiles
70
Quartiles, Deciles, Percentiles
Fractiles
(Quantiles)
partitions data into approximately equal parts
71
Exploratory Data Analysis
the process of using statistical tools
(such as graphs, measures of center,
and measures of variation) to investigate
the data sets in order to understand their
important characteristics
72
Outliers
 a value located very far away from
almost all of the other values
 an extreme value
 can have a dramatic effect on the mean,
standard deviation, and on the scale of
the histogram so that the true nature of
the distribution is totally obscured
73
Boxplots
(Box-and-Whisker Diagram)
Reveals the:
 center of the data
 spread of the data
 distribution of the data
 presence of outliers
Excellent for comparing two
or more data sets
74
Boxplots
5 - number summary
 Minimum
 first quartile Q1
 Median (Q2)
 third quartile Q3
 Maximum
75
Boxplots
Boxplot of Qwerty Word Ratings
2 4 6 8 10 12 14
0
2 4 6
14
0
76
Bell-Shaped Skewed
Boxplots
Uniform
77
Exploring
 Measures of center: mean, median, and mode
 Measures of variation: Standard deviation and
range
 Measures of spread and relative location:
minimum values, maximum value, and quartiles
 Unusual values: outliers
 Distribution: histograms, stem-leaf plots, and
boxplots

More Related Content

What's hot

Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"muhammad raza
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersionSachin Shekde
 
Statistical inference: Estimation
Statistical inference: EstimationStatistical inference: Estimation
Statistical inference: EstimationParag Shah
 
2. measures of dis[persion
2. measures of dis[persion2. measures of dis[persion
2. measures of dis[persionKaran Kukreja
 
Skewness and Kurtosis
Skewness and KurtosisSkewness and Kurtosis
Skewness and KurtosisRohan Nagpal
 
Estimation in statistics
Estimation in statisticsEstimation in statistics
Estimation in statisticsRabea Jamal
 
Statistics-Measures of dispersions
Statistics-Measures of dispersionsStatistics-Measures of dispersions
Statistics-Measures of dispersionsCapricorn
 
Univariate & bivariate analysis
Univariate & bivariate analysisUnivariate & bivariate analysis
Univariate & bivariate analysissristi1992
 
Measure of central tendency
Measure of central tendencyMeasure of central tendency
Measure of central tendencymauitaylor007
 
Descriptive Analysis in Statistics
Descriptive Analysis in StatisticsDescriptive Analysis in Statistics
Descriptive Analysis in StatisticsAzmi Mohd Tamil
 
Hypothesis testing; z test, t-test. f-test
Hypothesis testing; z test, t-test. f-testHypothesis testing; z test, t-test. f-test
Hypothesis testing; z test, t-test. f-testShakehand with Life
 
Basic Descriptive statistics
Basic Descriptive statisticsBasic Descriptive statistics
Basic Descriptive statisticsAjendra Sharma
 

What's hot (20)

Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Statistical inference: Estimation
Statistical inference: EstimationStatistical inference: Estimation
Statistical inference: Estimation
 
2. measures of dis[persion
2. measures of dis[persion2. measures of dis[persion
2. measures of dis[persion
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Skewness and Kurtosis
Skewness and KurtosisSkewness and Kurtosis
Skewness and Kurtosis
 
Central tendency
Central tendencyCentral tendency
Central tendency
 
Estimation in statistics
Estimation in statisticsEstimation in statistics
Estimation in statistics
 
Measures of dispersion
Measures of dispersionMeasures of dispersion
Measures of dispersion
 
Introduction to Statistics .pdf
Introduction to Statistics .pdfIntroduction to Statistics .pdf
Introduction to Statistics .pdf
 
statistic
statisticstatistic
statistic
 
Skewness
SkewnessSkewness
Skewness
 
Statistics-Measures of dispersions
Statistics-Measures of dispersionsStatistics-Measures of dispersions
Statistics-Measures of dispersions
 
Univariate & bivariate analysis
Univariate & bivariate analysisUnivariate & bivariate analysis
Univariate & bivariate analysis
 
Measure of central tendency
Measure of central tendencyMeasure of central tendency
Measure of central tendency
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Descriptive Analysis in Statistics
Descriptive Analysis in StatisticsDescriptive Analysis in Statistics
Descriptive Analysis in Statistics
 
Hypothesis testing; z test, t-test. f-test
Hypothesis testing; z test, t-test. f-testHypothesis testing; z test, t-test. f-test
Hypothesis testing; z test, t-test. f-test
 
Basic Descriptive statistics
Basic Descriptive statisticsBasic Descriptive statistics
Basic Descriptive statistics
 
Correlation and Regression
Correlation and Regression Correlation and Regression
Correlation and Regression
 

Similar to Descriptive Statistics Overview

Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsBabasab Patil
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyPrithwis Mukerjee
 
Group 3 measures of central tendency and variation - (mean, median, mode, ra...
Group 3  measures of central tendency and variation - (mean, median, mode, ra...Group 3  measures of central tendency and variation - (mean, median, mode, ra...
Group 3 measures of central tendency and variation - (mean, median, mode, ra...reymartyvette_0611
 
descriptive statistics- 1.pptx
descriptive statistics- 1.pptxdescriptive statistics- 1.pptx
descriptive statistics- 1.pptxSylvia517203
 
3. Descriptive statistics.pdf
3. Descriptive statistics.pdf3. Descriptive statistics.pdf
3. Descriptive statistics.pdfYomifDeksisaHerpa
 
Day2 session i&ii - spss
Day2 session i&ii - spssDay2 session i&ii - spss
Day2 session i&ii - spssabir hossain
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptxVanmala Buchke
 
Measure of central tendency
Measure of central tendencyMeasure of central tendency
Measure of central tendencyHajrahMughal
 
analytical representation of data
 analytical representation of data analytical representation of data
analytical representation of dataUnsa Shakir
 
2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptx2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptxssuser03ba7c
 
Empirics of standard deviation
Empirics of standard deviationEmpirics of standard deviation
Empirics of standard deviationAdebanji Ayeni
 
Measures of Central Tendency, Variability and Shapes
Measures of Central Tendency, Variability and ShapesMeasures of Central Tendency, Variability and Shapes
Measures of Central Tendency, Variability and ShapesScholarsPoint1
 
Central tendency and Variation or Dispersion
Central tendency and Variation or DispersionCentral tendency and Variation or Dispersion
Central tendency and Variation or DispersionJohny Kutty Joseph
 

Similar to Descriptive Statistics Overview (20)

Numerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec domsNumerical measures stat ppt @ bec doms
Numerical measures stat ppt @ bec doms
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
QT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central TendencyQT1 - 03 - Measures of Central Tendency
QT1 - 03 - Measures of Central Tendency
 
Group 3 measures of central tendency and variation - (mean, median, mode, ra...
Group 3  measures of central tendency and variation - (mean, median, mode, ra...Group 3  measures of central tendency and variation - (mean, median, mode, ra...
Group 3 measures of central tendency and variation - (mean, median, mode, ra...
 
descriptive statistics- 1.pptx
descriptive statistics- 1.pptxdescriptive statistics- 1.pptx
descriptive statistics- 1.pptx
 
3. Descriptive statistics.pdf
3. Descriptive statistics.pdf3. Descriptive statistics.pdf
3. Descriptive statistics.pdf
 
Median
MedianMedian
Median
 
G7-quantitative
G7-quantitativeG7-quantitative
G7-quantitative
 
8490370.ppt
8490370.ppt8490370.ppt
8490370.ppt
 
Day2 session i&ii - spss
Day2 session i&ii - spssDay2 session i&ii - spss
Day2 session i&ii - spss
 
central tendency.pptx
central tendency.pptxcentral tendency.pptx
central tendency.pptx
 
Measures of Dispersion.pptx
Measures of Dispersion.pptxMeasures of Dispersion.pptx
Measures of Dispersion.pptx
 
Mod mean quartile
Mod mean quartileMod mean quartile
Mod mean quartile
 
Descriptive statistics and graphs
Descriptive statistics and graphsDescriptive statistics and graphs
Descriptive statistics and graphs
 
Measure of central tendency
Measure of central tendencyMeasure of central tendency
Measure of central tendency
 
analytical representation of data
 analytical representation of data analytical representation of data
analytical representation of data
 
2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptx2-L2 Presentation of data.pptx
2-L2 Presentation of data.pptx
 
Empirics of standard deviation
Empirics of standard deviationEmpirics of standard deviation
Empirics of standard deviation
 
Measures of Central Tendency, Variability and Shapes
Measures of Central Tendency, Variability and ShapesMeasures of Central Tendency, Variability and Shapes
Measures of Central Tendency, Variability and Shapes
 
Central tendency and Variation or Dispersion
Central tendency and Variation or DispersionCentral tendency and Variation or Dispersion
Central tendency and Variation or Dispersion
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

Descriptive Statistics Overview

  • 1. 1 Descriptive Statistics 2-1 Overview 2-2 Summarizing Data with Frequency Tables 2-3 Pictures of Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Position 2-7 Exploratory Data Analysis (EDA)
  • 2. 2  Descriptive Statistics summarize or describe the important characteristics of a known set of population data  Inferential Statistics use sample data to make inferences (or generalizations) about a population 2 -1 Overview
  • 3. 3 1. Center: A representative or average value that indicates where the middle of the data set is located 2. Variation: A measure of the amount that the values vary among themselves 3. Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed) 4. Outliers: Sample values that lie very far away from the vast majority of other sample values 5. Time: Changing characteristics of the data over time Important Characteristics of Data
  • 4. 4  Frequency Table lists classes (or categories) of values, along with frequencies (or counts) of the number of values that fall into each class 2-2 Summarizing Data With Frequency Tables
  • 5. 5 Qwerty Keyboard Word Ratings Table 2-1 2 2 5 1 2 6 3 3 4 2 4 0 5 7 7 5 6 6 8 10 7 2 2 10 5 8 2 5 4 2 6 2 6 1 7 2 7 2 3 8 1 5 2 5 2 14 2 2 6 3 1 7
  • 6. 6 Frequency Table of Qwerty Word Ratings Table 2-3 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency
  • 8. 8 Lower Class Limits Lower Class Limits 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency are the smallest numbers that can actually belong to different classes
  • 9. 9 Upper Class Limits Upper Class Limits 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency are the largest numbers that can actually belong to different classes
  • 10. 10 are the numbers used to separate classes, but without the gaps created by class limits Class Boundaries
  • 11. 11 number separating classes Class Boundaries 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency - 0.5 2.5 5.5 8.5 11.5 14.5
  • 12. 12 Class Boundaries Class Boundaries 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency - 0.5 2.5 5.5 8.5 11.5 14.5 number separating classes
  • 13. 13 midpoints of the classes Class Midpoints
  • 14. 14 midpoints of the classes Class Midpoints Class Midpoints 0 - 1 2 20 3 - 4 5 14 6 - 7 8 15 9 - 10 11 2 12 - 13 14 1 Rating Frequency
  • 15. 15 is the difference between two consecutive lower class limits or two consecutive class boundaries Class Width
  • 16. 16 Class Width Class Width 3 0 - 2 20 3 3 - 5 14 3 6 - 8 15 3 9 - 11 2 3 12 - 14 1 Rating Frequency is the difference between two consecutive lower class limits or two consecutive class boundaries
  • 17. 17 1. Be sure that the classes are mutually exclusive. 2. Include all classes, even if the frequency is zero. 3. Try to use the same width for all classes. 4. Select convenient numbers for class limits. 5. Use between 5 and 20 classes. 6. The sum of the class frequencies must equal the number of original data values. Guidelines For Frequency Tables
  • 18. 18 3. Select for the first lower limit either the lowest score or a convenient value slightly less than the lowest score. 4. Add the class width to the starting point to get the second lower class limit, add the width to the second lower limit to get the third, and so on. 5. List the lower class limits in a vertical column and enter the upper class limits. 6. Represent each score by a tally mark in the appropriate class. Total tally marks to find the total frequency for each class. Constructing A Frequency Table 1. Decide on the number of classes . 2. Determine the class width by dividing the range by the number of classes (range = highest score - lowest score) and round up. class width ≈ round up of range number of classes
  • 19. 19
  • 20. 20 Relative Frequency Table relative frequency = class frequency sum of all frequencies
  • 21. 21 Relative Frequency Table 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency 0 - 2 38.5% 3 - 5 26.9% 6 - 8 28.8% 9 - 11 3.8% 12 - 14 1.9% Rating Relative Frequency 20/52 = 38.5% 14/52 = 26.9% etc. Total frequency = 52
  • 22. 22 Cumulative Frequency Table Cumulative Frequencies 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency Less than 3 20 Less than 6 34 Less than 9 49 Less than 12 51 Less than 15 52 Rating Cumulative Frequency
  • 23. 23 Frequency Tables 0 - 2 20 3 - 5 14 6 - 8 15 9 - 11 2 12 - 14 1 Rating Frequency 0 - 2 38.5% 3 - 5 26.9% 6 - 8 28.8% 9 - 11 3.8% 12 - 14 1.9% Rating Relative Frequency Less than 3 20 Less than 6 34 Less than 9 49 Less than 12 51 Less than 15 52 Rating Cumulative Frequency
  • 24. 24 a value at the center or middle of a data set Measures of Center
  • 25. 25 Mean (Arithmetic Mean) AVERAGE the number obtained by adding the values and dividing the total by the number of values Definitions
  • 26. 26 Notation Σ denotes the addition of a set of values x is the variable usually used to represent the individual data values n represents the number of data values in a sample N represents the number of data values in a population
  • 27. 27 Notation is pronounced ‘x-bar’ and denotes the mean of a set of sample values x = n Σ x x
  • 28. 28 Notation µ is pronounced ‘mu’ and denotes the mean of all values in a population is pronounced ‘x-bar’ and denotes the mean of a set of sample values Calculators can calculate the mean of data x = n Σ x x N µ = Σ x
  • 29. 29 Definitions  Median the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude
  • 30. 30 Definitions  Median the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude  often denoted by x (pronounced ‘x-tilde’) ~
  • 31. 31 Definitions  Median the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude  often denoted by x (pronounced ‘x-tilde’)  is not affected by an extreme value ~
  • 32. 32 6.72 3.46 3.60 6.44 3.46 3.60 6.44 6.72 no exact middle -- shared by two numbers 3.60 + 6.44 2 (even number of values) MEDIAN is 5.02
  • 33. 33 6.72 3.46 3.60 6.44 26.70 3.46 3.60 6.44 6.72 26.70 (in order - odd number of values) exact middle MEDIAN is 6.44 6.72 3.46 3.60 6.44 3.46 3.60 6.44 6.72 no exact middle -- shared by two numbers 3.60 + 6.44 2 (even number of values) MEDIAN is 5.02
  • 34. 34 Definitions  Mode the score that occurs most frequently Bimodal Multimodal No Mode denoted by M the only measure of central tendency that can be used with nominal data
  • 35. 35 a. 5 5 5 3 1 5 1 4 3 5 b. 1 2 2 2 3 4 5 6 6 6 7 9 c. 1 2 3 6 7 8 9 10 Examples Mode is 5 Bimodal - 2 and 6 No Mode
  • 36. 36  Midrange the value midway between the highest and lowest values in the original data set Definitions
  • 37. 37  Midrange the value midway between the highest and lowest values in the original data set Definitions Midrange = highest score + lowest score 2
  • 38. 38  Symmetric Data is symmetric if the left half of its histogram is roughly a mirror of its right half.  Skewed Data is skewed if it is not symmetric and if it extends more to one side than the other. Definitions
  • 39. 39 Skewness Mode = Mean = Median SYMMETRIC
  • 40. 40 Skewness Mode = Mean = Median SKEWED LEFT (negatively) SYMMETRIC Mean Mode Median
  • 41. 41 Skewness Mode = Mean = Median SKEWED LEFT (negatively) SYMMETRIC Mean Mode Median SKEWED RIGHT (positively) MeanMode Median
  • 42. 42 Waiting Times of Bank Customers at Different Banks in minutes Jefferson Valley Bank Bank of Providence 6.5 4.2 6.6 5.4 6.7 5.8 6.8 6.2 7.1 6.7 7.3 7.7 7.4 7.7 7.7 8.5 7.7 9.3 7.7 10.0
  • 43. 43 Jefferson Valley Bank Bank of Providence 6.5 4.2 6.6 5.4 6.7 5.8 6.8 6.2 7.1 6.7 7.3 7.7 7.4 7.7 7.7 8.5 7.7 9.3 7.7 10.0 Jefferson Valley Bank 7.15 7.20 7.7 7.10 Bank of Providence 7.15 7.20 7.7 7.10 Mean Median Mode Midrange Waiting Times of Bank Customers at Different Banks in minutes
  • 47. 47 a measure of variation of the scores about the mean (average deviation from the mean) Measures of Variation Standard Deviation
  • 48. 48 Sample Standard Deviation Formula calculators can compute the sample standard deviation of data Σ (x - x)2 n - 1 S =
  • 49. 49 Sample Standard Deviation Shortcut Formula n (n - 1) s = n (Σx2 ) - (Σx)2 calculators can compute the sample standard deviation of data
  • 50. 50 Σ x - x Mean Absolute Deviation Formula n
  • 51. 51 Population Standard Deviation calculators can compute the population standard deviation of data 2 Σ (x - µ) N σ =
  • 53. 53 Measures of Variation Variance standard deviation squared s σ 2 2 } use square key on calculatorNotation
  • 54. 54 Sample Variance Population Variance Variance Σ (x - x )2 n - 1 s 2 = Σ (x - µ)2 N σ 2 =
  • 55. 55 Estimation of Standard Deviation Range Rule of Thumb x - 2s x x + 2s Range ≈ 4s or (minimum usual value) (maximum usual value)
  • 56. 56 Estimation of Standard Deviation Range Rule of Thumb x - 2s x x + 2s Range ≈ 4s or (minimum usual value) (maximum usual value) Range 4 s ≈ = highest value - lowest value 4
  • 57. 57 x The Empirical Rule (applies to bell-shaped distributions)FIGURE 2-15
  • 58. 58 x - s x x + s 68% within 1 standard deviation 34% 34% The Empirical Rule (applies to bell-shaped distributions)FIGURE 2-15
  • 59. 59 x - 2s x - s x x + 2sx + s 68% within 1 standard deviation 34% 34% 95% within 2 standard deviations The Empirical Rule (applies to bell-shaped distributions) 13.5% 13.5% FIGURE 2-15
  • 60. 60 x - 3s x - 2s x - s x x + 2s x + 3sx + s 68% within 1 standard deviation 34% 34% 95% within 2 standard deviations 99.7% of data are within 3 standard deviations of the mean The Empirical Rule (applies to bell-shaped distributions) 0.1% 0.1% 2.4% 2.4% 13.5% 13.5% FIGURE 2-15
  • 61. 61 Chebyshev’s Theorem  applies to distributions of any shape.  the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1 - 1/K 2 , where K is any positive number greater than 1.  at least 3/4 (75%) of all values lie within 2 standard deviations of the mean.  at least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
  • 62. 62 Measures of Variation Summary For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.
  • 63. 63  z Score (or standard score) the number of standard deviations that a given value x is above or below the mean Measures of Position
  • 64. 64 Sample z = x - x s Population z = x - µ σ Measures of Position z score
  • 65. 65 - 3 - 2 - 1 0 1 2 3 Z Unusual Values Unusual Values Ordinary Values Interpreting Z Scores FIGURE 2-16
  • 66. 66 Measures of Position Quartiles, Deciles, Percentiles
  • 67. 67 Q1, Q2, Q3 divides ranked scores into four equal parts Quartiles 25% 25% 25% 25% Q3Q2Q1 (minimum) (maximum) (median)
  • 68. 68 D1, D2, D3, D4, D5, D6, D7, D8, D9 divides ranked data into ten equal parts Deciles 10% 10% 10% 10% 10% 10% 10% 10% 10% 10% D1 D2 D3 D4 D5 D6 D7 D8 D9
  • 71. 71 Exploratory Data Analysis the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate the data sets in order to understand their important characteristics
  • 72. 72 Outliers  a value located very far away from almost all of the other values  an extreme value  can have a dramatic effect on the mean, standard deviation, and on the scale of the histogram so that the true nature of the distribution is totally obscured
  • 73. 73 Boxplots (Box-and-Whisker Diagram) Reveals the:  center of the data  spread of the data  distribution of the data  presence of outliers Excellent for comparing two or more data sets
  • 74. 74 Boxplots 5 - number summary  Minimum  first quartile Q1  Median (Q2)  third quartile Q3  Maximum
  • 75. 75 Boxplots Boxplot of Qwerty Word Ratings 2 4 6 8 10 12 14 0 2 4 6 14 0
  • 77. 77 Exploring  Measures of center: mean, median, and mode  Measures of variation: Standard deviation and range  Measures of spread and relative location: minimum values, maximum value, and quartiles  Unusual values: outliers  Distribution: histograms, stem-leaf plots, and boxplots

Editor's Notes

  1. This chapter will focus on descriptive statistics. Using descriptive statistics and the information from Chapter 3, 4, and 5, we will begin the study of inferential statistics.
  2. Most important characteristics necessary to describe, explore, and compare data sets. page 34 of text
  3. page 35 of text
  4. Discussion of what these rating values represent is found on page 33 (the Chapter Problem for Chapter 2).
  5. Final result of a frequency table on the Qwerty data given in previous slide. Next few slides will develop definitions and steps for developing the frequency table.
  6. The concept is usually difficult for some students. Emphasize that boundaries are often used in graphical representations of data. See Section 2-3.
  7. Students will question where the -0.5 and the 14.5 came from. After establishing the boundaries between the existing classes, explain that one should find the boundary below the first class and above the last class, again referencing the use of boundaries in the development of some histograms (Section 2-3).
  8. Being able to identify the midpoints of each class will be important when determining the mean and standard deviation of a frequency table.
  9. Class widths ideally should be the same between each class. However, open-ended first or last classes (e.g., 65 years or older) are sometimes necessary to keep from have a large number of classes with a frequency of 0.
  10. page 36 of text #1: Classes should not overlap in order to determine which class a value belongs to. #2: A common mistake of students developing frequency tables. #3: Open-ended classes are sometimes necessary #4: Round up to use fewer decimal places or use relevant numbers #5: Less than 5, the table will be too concise; more than 20, the table will be too large or unwieldy. #6: One test to make sure all data has been included.
  11. This is the same as the flow chart on the next slide.
  12. page 37 of text
  13. page 38 of text Emphasis on the relative frequency table will assist in development the concept of probability distributions - the use of percentages with classes will relate to the use of probabilities with random variables.
  14. page 55 of text
  15. In this text, the arithmetic mean will be referred to as the MEAN.
  16.  is pronounced ‘sigma’ (Greek upper case ‘S’) indicates ‘summation’ of values
  17. The mean is sensitive to every value , so one exceptional value can affect the mean dramatically. The median (a couple of slides later) overcomes this disadvantage.
  18. The mean is sensitive to every value , so one exceptional value can affect the mean dramatically. The median (next slide) overcomes that disadvantage.
  19. Examples on page 57 of text
  20. Examples on page 57 of text
  21. page 58 of text
  22. page 58 of text Name the two values if the set is bimodal
  23. Midrange is seldom used because it is too sensitive to extremes. (See Figure 2-12, page 59)
  24. Midrange is seldom used because it is too sensitive to extremes. (See Figure 2-12, page 59)
  25. Data skewed to the left is said to be ‘negatively skewed’ with the mean and median to the left of the mode. Data skewed to the right is said to be ‘positively skewed’ with the mean and media to the right of the mode.
  26. Data not ‘lopsided’.
  27. Data lopsided to left (or slants down to the left - definition of skew is ‘slanting’)
  28. Data lopsided to the right (or slants down to the right)
  29. Even though the measures of center are all the same, it is obvious from the dotplots of each group of data that there are some differences in the ‘spread’ (or variation) of the data. page 69 of text
  30. The range is not a very useful measure of variation as it only uses two values of the data. Other measures of variation (to follow) will more more useful as they are computed by using every data value.
  31. Ask students to explain what this definition indicates.
  32. The definition indicates that one should find the average distance each score is from the mean. The use of n-1 in the denominator is necessary because there are only n-1 independent values - that is, only n-1 values can be assigned any number before the nth value is determined. page 70 of text
  33. This formula is easier to use if computing the standard deviation ‘by hand’ as it does not rely on the use of the mean. Only three values are needed - n,  x, and  x 2 .
  34. page 71 of text
  35.  is the lowercase Greek ‘sigma’. Note the division by N(population size), rather than n-1 Most data is a sample, rather than a population; therefore, this formula is not use very often.
  36. page 74 of text
  37. After computing the standard deviation, square the resulting number using a calculator.
  38. The formulas for variance are the same as for standard deviation without the square root - remind student, squaring a square root will result in the radicand.
  39. Reminder: range is the highest score minus the lowest score
  40. Reminder: range is the highest score minus the lowest score
  41. page 79 of text
  42. Some student have difficulty understand the idea of ‘within one standard deviation of the mean’. Emphasize that this means the interval from one standard deviation below the mean to one standard deviation above the mean.
  43. These percentages will be verified by the concepts learned in Chapter 5. Emphasize the Empirical Rule is appropriate for data that is in a BELL-SHAPED distribution.
  44. Emphasize Chebyshev’s Theorem applies to data that is in a distribution of any shape - that is, it is less prescriptive than the Empirical Rule. page 80 of text
  45. This idea will be revisited throughout the study of Elementary Statistics.
  46. page 85 of text
  47. This concept of ‘unusual’ values will be revisited several times during the course, especially in Chapter 7 - Hypothesis Testing. page 86 of text
  48. These 5 numbers will be used in this text’s method for depicting boxplots.
  49. page 94 of text
  50. Outliers can have a dramatic effect on the certain descriptive statistics (mean and standard deviation). It is important to be aware of such data values.
  51. page 96 of text
  52. Medians and quartiles are not very sensitive to extreme values. Refer to Section 2-6 for finding the Q1 and Q3 values, and Section 2-4 for finding the median.
  53. It appears that the Qwerty data set has a skewed right distribution page 97 of text
  54. page 97 of text