Week 11: Basic Descriptive
Quantitative Data Analysis
Tables, Graphs, & Summary Statistics
1
Objectives
• Learn about basic descriptive quantitative analysis
• Learn how to perform these tasks in Excel
• Starting point for 502B
• Excel knowledge and quantitative skills are highly desired by employers
• EC stream
2
Introduction
3
 Without data, it is anyone’s opinion
 Why use tables, graphs, summary stats?
“At their best, tables, graphs, and statistics are instruments
for reasoning about complex quantitative information.”
 Why learn how to design them appropriately?
“At their worst, tables, graphs and summary statistics are
instruments of evil used for deceiving a naive viewer.”
 Does your mindset match my dataset!
 http://www.ted.com/talks/hans_rosling_at_state.html
Quantitative Research Process
Page 4
Introduction
Page 5
Page 6
Presenting the Data
Frequency Distribution
Page 7
 A convenient way of summarizing a lot of tabular data
 What is a Frequency Distribution?
 A frequency distribution is a list or a table …
 containing class groupings (categories or ranges within
which the data fall) ...
 and the corresponding frequencies with which data fall
within each class or category
 For nominal/ordinal data
Table 1
Univariate Frequencies of Percentage of Sales Reported to Tax Authorities

% of Sales Reported    Frequency    Percent (%)
100%                        3307          40.56
90-99%                      1096          13.44
80-89%                       916          11.24
70-79%                       703           8.62
60-69%                       501           6.14
50-59%                       694           8.51
<50%                         936          11.48
Total                       8153         100

Source: 1999 World Bank World Business Environment Survey (WBES), excludes missing observations
http://www.enterprisesurveys.org/
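The Percent column in Table 1 is each class frequency divided by the total. A minimal sketch of that calculation, assuming the frequencies are typed in by hand from Table 1 (the function name `relative_frequencies` is illustrative, not part of the course material):

```python
# Frequencies copied from Table 1 (1999 WBES, missing observations excluded).
frequencies = {
    "100%": 3307,
    "90-99%": 1096,
    "80-89%": 916,
    "70-79%": 703,
    "60-69%": 501,
    "50-59%": 694,
    "<50%": 936,
}


def relative_frequencies(counts):
    """Return each class's share of the total, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


percents = relative_frequencies(frequencies)

# Print the frequency distribution as a small table.
for label, n in frequencies.items():
    print(f"{label:>7} {n:>6} {percents[label]:6.2f}")
print(f"{'Total':>7} {sum(frequencies.values()):>6} {sum(percents.values()):6.2f}")
```

In Excel the same column can be produced by dividing each frequency cell by the cell holding the total.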
Contingency/Pivot/Cross Table
10
 May also want to produce a table with more
categories
 Cross table or Contingency table or Pivot table
 Suitable if you have two nominal/ordinal variables
 Simple extension to a univariate table
 Considers relationship between two variables
 Row variable (Dependent)
 Column variable (Independent)
Table 2
Percentage of Sales Reported to Tax Authorities by Region

% of Sales            Transition            Latin             Former Soviet
Reported     Africa   Europe        Asia    America    OECD   Countries       Total
100%            490      554         416       794      446       607         3,307
90-99%          266      196         142       119      145       228         1,096
80-89%          158      152         117       192       73       224           916
70-79%          162      117         103       153       43       125           703
60-69%          140       69          70       115       22        85           501
50-59%          140      105         141       118       16       174           694
<50%            100      106         283       296       25       126           936
Total         1,456    1,299       1,272     1,787      770     1,569         8,153

Source: 1999 World Bank World Business Environment Survey (WBES)
* Excludes missing observations
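Raw counts in a cross table are hard to compare when the column totals differ, so one common step is to convert each column to percentages of its own total. The sketch below does that for the counts in Table 2; the helper names are illustrative, and taking percentages down the columns (the independent variable) is an assumption about how the table would be read, not something the slides prescribe.

```python
regions = ["Africa", "Transition Europe", "Asia",
           "Latin America", "OECD", "Former Soviet Countries"]

# Counts copied from Table 2: one row per reporting class, one entry per region.
table = {
    "100%":   [490, 554, 416, 794, 446, 607],
    "90-99%": [266, 196, 142, 119, 145, 228],
    "80-89%": [158, 152, 117, 192, 73, 224],
    "70-79%": [162, 117, 103, 153, 43, 125],
    "60-69%": [140, 69, 70, 115, 22, 85],
    "50-59%": [140, 105, 141, 118, 16, 174],
    "<50%":   [100, 106, 283, 296, 25, 126],
}


def column_totals(rows):
    """Sum each column (region) across all row classes."""
    return [sum(column) for column in zip(*rows.values())]


def column_percentages(rows):
    """Express every cell as a percentage of its column total."""
    totals = column_totals(rows)
    return {
        label: [100 * count / total for count, total in zip(counts, totals)]
        for label, counts in rows.items()
    }


col_pct = column_percentages(table)

# Share of firms in each region that report 100% of sales.
for region, pct in zip(regions, col_pct["100%"]):
    print(f"{region:<24} {pct:5.1f}%")
```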
Features of a Table
12
 Title that accurately summarizes the data
 Simple, indicates major variables, and time frame (if applicable)
 Source: data set or origin of table
 Explanatory footnotes
 Easy to read & separated from text
 Properly formatted for style (see APA Rules)
 Necessary to advance analysis
 See Module 7 for APA Table Checklist
 Reproduced from APA manual
Page 13
Presenting the Data
Bar Graph
Page 14
 Often used to describe categorical data
 Ordinal/Nominal
 Draws attention to the frequency of each category
Bar Graph
Page 16
Figure 1
Percentage of sales reported to tax authority
Source: 1999 World Bank World Business Environment Survey (WBES)
Note. Excludes missing observations. n = 8314
Relative Frequency Polygon
17
Pie Graph
Page 18
 Emphasizes the proportion of each category
 Something that may be good for our tax evasion data
 Circle represents the total
 Segments the shares of the total
 Segment size is proportional to frequency
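Since segment size is proportional to frequency, each segment's angle is its share of the total times 360 degrees. A small sketch, assuming the frequencies from Table 1 (the function name `pie_angles` is illustrative):

```python
# Frequencies from Table 1 (percentage of sales reported to tax authorities).
frequencies = {
    "100%": 3307, "90-99%": 1096, "80-89%": 916, "70-79%": 703,
    "60-69%": 501, "50-59%": 694, "<50%": 936,
}


def pie_angles(counts):
    """Return the angle in degrees of each pie segment."""
    total = sum(counts.values())
    return {label: 360 * n / total for label, n in counts.items()}


angles = pie_angles(frequencies)
for label, angle in angles.items():
    print(f"{label:>7}: {angle:6.1f} degrees")
```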
Pie Graph
19
Figure 1
Percentage of sales reported to tax authority
Source: 1999 World Bank World Business Environment Survey (WBES)
Note. Excludes missing observations. n = 8314
Charts in Excel I
22
Bar Graph
Page 24
Figure 1
Percentage of sales reported to tax authority
Source: 1999 World Bank World Business Environment Survey (WBES)
Note. Excludes missing observations. n = 8314
Segmented Bar Chart
Figure 1
Percentage of sales reported to tax authority
Source: 1999 World Bank World Business Environment Survey (WBES)
Note. Excludes missing observations. n = 8314
Pie Graph
Page 26
Figure 2
Percentage of sales reported to tax authority by region
Source: 1999 World Bank World Business Environment Survey (WBES)
Note. Excludes missing observations. n = 8314
Vertical Bar Chart
27
Charts in Excel II
28
Time Series Graph
• Time series are often used in the social sciences
• Data are collected at various time periods: daily, weekly, monthly, quarterly, annually, etc.
• Examples include GDP, unemployment, university tuition
• Plot the series of interest over time
• Let’s look at a graph of the unemployment rate by gender and age
Line Graph
Page 30
Histogram
• Used for continuous data
• Frequency distribution for continuous data
• Summary graph showing a count of the data points falling in various ranges
• Rough approximation of the distribution of the data
• A histogram is a way to summarize data
• The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data
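Building a histogram comes down to choosing ranges (bins) and counting how many data points fall in each. The following is a rough sketch with equal-width bins; the data values, the bin width of 5 and the function name are all hypothetical choices made for illustration.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values in bins [start + k*w, start + (k+1)*w), keyed by lower edge."""
    counts = {}
    for value in values:
        k = int((value - start) // bin_width)   # index of the bin holding value
        lower = start + k * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical continuous data, e.g. weekly volunteer hours.
hours = [2.5, 7.0, 11.2, 3.1, 8.8, 9.9, 14.0, 6.4, 5.5, 12.3]
counts = histogram_counts(hours, bin_width=5)

# Text rendering: one '#' per observation in the bin.
for lower, n in counts.items():
    print(f"[{lower:>2}, {lower + 5:>2}) {'#' * n}")
```

Changing the bin width changes the picture, which is one reason a histogram is only a rough approximation of the underlying distribution.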
Histogram
32
Scatter Graphs
• Graph the relationship between two continuous variables
Scatter Graph
34
Principles of Graphical Excellence
• Well-designed presentation of interesting data
  • Substance & design
  • Simplicity of design, complexity of data
• Proportion and balance
• Clear, precise, efficient
  • Know what you are trying to show (have a story)
    • Make sure your graph shows it
  • Well formatted, professional
    • Choose a format that reflects your data and the story
  • Informative and legible axes
  • Fully labelled & legible
• Gets across the main point(s) in the shortest time with the least ink in the smallest space
• Adds information not otherwise available to the reader
  • But supplemented with text describing the figure
• Tells the truth about the data
• Limits complexity and confusion
• Avoid chartjunk
Examples of Chartjunk
[Figure: two quarterly bar charts (1st Qtr to 4th Qtr) cluttered with regional series (West, North, Northeast, Southwest, Mexico, Europe, Japan, East, South, International), annotated with the problems listed below]
• Gridlines!
• Vibration
• Pointless fake 3-D effects
• Filled “floor”
• Clip art
• In or out?
• Filled “walls”
• Borders and fills galore
• Unintentional heavy or double lines
• Filled labels
• Serif font with thin & thick lines
Displaying Data: “Mistakes”
Page 38
 Graphs are also instruments of evil used for deceiving
a naive viewer.
 Non-zero origin
 Omitting data that refutes your “evidence”
 Limiting scope of data
What is Wrong with this Graph?
39
Provincial Personal Income Taxes
Single Individual with $45,000 in
income claiming basic personal tax
credits
The Real Story
40
Exaggerates a change in data
Page 41
Source: Statistics Canada, CANSIM II, V31215364
Dr. Kendall
42
Worst Recession Since the Depression (?)
43
Page 44
Presenting the Data
Describing Data Numerically
• Central Tendency: Simple Arithmetic Mean, Median, Mode
• Variation: Variance, Standard Deviation, Range
• Association: Covariance, Correlation
• Shape of the Distribution
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode or several modes
• What are the modes for the displayed data?
[Figure: three dot plots. First, on a 0-14 axis: Mode = 9. Second, on a 0-6 axis: no mode. Third, on a 0-14 axis: Mode = 5 & 9]
Mode
49
 Caution: Mode may not be representative of the data
 {0.1, 0.1, 5000, 4900, 4500, 5200,…}
Median
• In an ordered list, the median is the “middle” number (50% above, 50% below)
[Figure: two dot plots on 0-10 axes illustrating the median]
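The "middle number" rule needs one extra convention when the list has an even number of values; the usual choice is to average the two middle values. A short sketch under that convention (the function name and sample data are illustrative):

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 4, 2, 3]))    # odd n: the single middle value
print(median([1, 2, 3, 4]))       # even n: average of 2 and 3
print(median([1, 2, 3, 4, 10]))   # an extreme value does not move the median
```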
Mean
• The “balancing point” (centre of gravity) of the data
• E.g. the data “balance” at 5
[Figure: number line from 1 to 9 balancing at 5, with deviations of -2, -1 and +3 that sum to zero]
Arithmetic Mean
• The arithmetic mean (mean) is the most common measure of central tendency
• Calculated by summing the observed values and dividing by the number of observations
• For a sample of size n:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
where x_1, ..., x_n are the observed values and n is the number of observations
Arithmetic Mean
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
• What is the mean for these examples?
[Figure: two dot plots on 0-10 axes]
Data 1, 2, 3, 4, 5:
\[ \bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \]
Data 1, 2, 3, 4, 10:
\[ \bar{x} = \frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4 \]
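The two worked examples can be checked in a few lines. This sketch computes the mean as the sum divided by the count and compares it with the median from Python's standard `statistics` module, to show that a single extreme value moves the mean but not the median. The function name `arithmetic_mean` is illustrative.

```python
from statistics import median


def arithmetic_mean(values):
    """Sum of the observed values divided by the number of observations."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

for data in (without_outlier, with_outlier):
    print(data, "mean =", arithmetic_mean(data), "median =", median(data))
```

Replacing 5 with 10 raises the mean from 3 to 4 while the median stays at 3.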
Measures of Central Tendency
Overview
• Mean: arithmetic average, \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
• Median: midpoint of ranked values (50% below, 50% above)
• Mode: most frequently observed value
The “Shape of a Distribution”
56
 Use information on mean, median, and mode to
“visualize” the data
 A data distribution is said to be symmetric if its shape
is the same on both sides of the median
 Symmetry implies that median=arithmetic mean
 If a distribution is uni-modal and symmetric then
 Median=mean=mode
The “Shape of a Distribution”
[Figures: three histograms of number of observations by value (1-7), each with the median splitting the data 50%/50%]
• Symmetric & unimodal: Median = Mean = Mode
• Symmetric & bimodal: Median = Mean ≠ Mode
• Symmetric & no mode (uniform): Median = Mean
The “Shape of a Distribution”
• An asymmetric distribution is said to be skewed
1. Negatively if Mean < Median < Mode
2. Positively if Mean > Median > Mode
• Hence, by comparing our measures of central tendency, we can start to visualize the shape and characteristics of the data
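The comparison rule can be written as a tiny function. This is only a sketch of the rule of thumb stated on the slide, not a formal skewness statistic; the function name and the "undetermined" label for all other orderings are assumptions made here. The two test cases use the values shown on the skewed-distribution slides (mode 2, median 3, mean 3.2 and mode 6, median 5, mean 4.7).

```python
def skew_direction(mean, median, mode):
    """Apply the slide's rule of thumb for the direction of skew."""
    if mean < median < mode:
        return "negative"
    if mean > median > mode:
        return "positive"
    return "undetermined"  # any other ordering: the rule says nothing


print(skew_direction(mean=3.2, median=3, mode=2))   # long tail of high values
print(skew_direction(mean=4.7, median=5, mode=6))   # long tail of low values
```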
The “Shape of a Distribution”
[Figure: histogram of values 1-8 with Mode = 2, Median = 3 (50% of observations on each side) and Mean = 3.2]
Mode < Median < Mean = positively skewed distribution
Example: Positively skewed variable
62
 The Distribution of
After-Tax Income
 shows the distribution
of income across all
Canadian households
Example: Positively skewed variable
63
 The mode income is the
most common income and
was in the range from
$15,000 to $19,999.
 The median income is the
level of income that
separates the population into
two groups of equal size and
was $39,700.
 The mean income is the
average income and was
$48,400.
Example: Positively skewed variable
64
 A distribution in which the
mean exceeds the median
and the median exceeds
the mode is positively
skewed, which means it
has a long tail of high
values.
 The distribution of income
in Canada is positively
skewed.
 Most likely to report
median rather than mean
since long tail distorts
average
Example: Positively skewed variable
65
 Volunteer hours
 Charitable contributions
 # of Cigarette packs smoked (excluding 0)
 Collective bargaining agreement duration (in years)
 # of beers consumed on a Saturday night
 Duration of low income (in years)
 Number of children
The “Shape of a Distribution”
[Figure: histogram of values 0-7 with Mode = 6, Median = 5 (50% of observations on each side) and Mean = 4.7]
Mean < Median < Mode = negatively skewed distribution
Examples
67
 University Grades
 Age
 Years in school
 Etc.
Measures of Dispersion/Variability
• Measures of variation give information on the spread or variability of the data values.
• Measures: Range, Variance, Standard Deviation
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the smallest observations:
\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]
Example: [dot plot on a 0-14 axis] Range = 14 - 1 = 13
The Range
72
• Problem
• Ignores all but two data points
• These values may be “outliers” (i.e. not
representative)
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two differently distributed dot plots on a 7-12 axis] Range = 12 - 7 = 5 in both cases
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
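The outlier example is easy to reproduce. A minimal sketch using the two data lists from the slide (the function name `value_range` is illustrative):

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# The two lists from the slide: identical except for the last value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(base))          # 5 - 1
print(value_range(with_outlier))  # 120 - 1
```

Only one of the 25 observations changed, yet the range went from 4 to 119.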
The Variance
• A single summary measure of dispersion would be more helpful
• Takes account of all data values
1. Variance
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
2. Standard deviation
\[ s = \sqrt{\text{variance}} = \sqrt{s^2} \]
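As a worked illustration of the sample variance (sum of squared deviations from the mean, divided by n - 1) and the standard deviation (its square root), here is a short sketch. The data sets 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10 are reused from the arithmetic mean slides; the function names are illustrative.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Standard deviation: square root of the sample variance."""
    return sqrt(sample_variance(values))


for data in ([1, 2, 3, 4, 5], [1, 2, 3, 4, 10]):
    print(data, "s^2 =", sample_variance(data), "s =", round(sample_std(data), 3))
```

Unlike the range, every observation contributes to the result.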
Measuring variation
76
Small standard deviation
Large standard deviation
Comparing Standard Deviations
[Figure: three dot plots on an 11-21 axis with the same mean and different spread]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
The Sample Covariance
• The covariance measures the strength of the linear relationship between two variables
• The sample covariance:
\[ \text{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \]
• Only concerned with the strength of the relationship
• No causal effect is implied
Interpreting Covariance
80
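Covariance between two variables:
Cov(x,y) > 0: x and y tend to move in the same direction
Cov(x,y) < 0: x and y tend to move in opposite directions
Cov(x,y) = 0: x and y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not establish independence)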
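A small sketch of the sample covariance, using hypothetical data chosen so that the three sign cases are easy to see. The third pair is worth noting: its covariance is zero even though y is an exact function of x (y = x squared), which is why a zero covariance only rules out a linear relationship.

```python
def sample_covariance(xs, ys):
    """Sum of products of paired deviations from the means, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]
print(sample_covariance(xs, [2, 4, 6, 8, 10]))   # positive: move together
print(sample_covariance(xs, [10, 8, 6, 4, 2]))   # negative: move in opposite directions

# Hypothetical nonlinear case: y = x**2 on a symmetric range of x.
centered = [-2, -1, 0, 1, 2]
print(sample_covariance(centered, [x ** 2 for x in centered]))  # zero
```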
Coefficient of Correlation
• Measures the relative strength of the linear relationship between two variables
• Sample correlation coefficient:
\[ r = \frac{\text{Cov}(x, y)}{s_X s_Y} \]
Features of Correlation Coefficient, r
• Unit free
• Ranges between -1 and 1
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
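These features can be checked numerically. The sketch below computes r as the sample covariance divided by the product of the two sample standard deviations; the data are hypothetical, and rescaling x by 100 (for example, metres to centimetres) is included to show what "unit free" means in practice.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (s_x * s_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                      # hypothetical, loosely increasing
rescaled = [100 * x for x in xs]          # same data in different units

print(round(correlation(xs, [2, 4, 6, 8, 10]), 6))   # perfect positive line
print(round(correlation(xs, [10, 8, 6, 4, 2]), 6))   # perfect negative line
print(round(correlation(xs, ys), 3), round(correlation(rescaled, ys), 3))
```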
Interpreting the Correlation Coefficient, r
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X: r = -1 (Cov < 0), r = -.6 (Cov < 0), r = 0 (Cov = 0), r = +.3, r = +1, and a second example with r = 0]
502B
85
Fun with Graphs
86
 Does your mindset match my dataset!
 http://www.ted.com/talks/hans_rosling_at_state.html
Looking ahead
 SRs to client (cc) and Turnitin on Wednesday by
noon
 No class next week
 Work on 598 critiques
 598 Critiques due in class & Turnitin Nov. 30
 Comments on your SRs will be ready Nov. 30
 Final SRs (if required) due Dec. 8 @11:55PM PST
 Note carefully the requirements
 Moodle site will be inaccessible sometime in December
 Final Grades reported via usource once approved by
the Director
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 

Último (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Intro to quant_analysis_students

  • 1. Week 11: Basic Descriptive Quantitative Data Analysis: Tables, Graphs, & Summary Statistics
  • 2. Objectives
      - Learn about basic descriptive quantitative analysis
      - How to perform these tasks in Excel
      - Starting point for 502B
      - Excel knowledge and quantitative skills are highly desired by employers
      - EC stream
  • 3. Introduction
      - Without data, it is anyone’s opinion
      - Why use tables, graphs, summary stats? “At their best, tables, graphs, and statistics are instruments for reasoning about complex quantitative information.”
      - Why learn how to design them appropriately? “At their worst, tables, graphs and summary statistics are instruments of evil used for deceiving a naive viewer.”
      - Does your mindset match my dataset!
      - http://www.ted.com/talks/hans_rosling_at_state.html
  • 7. Frequency Distribution
      - A convenient way of summarizing a lot of tabular data
      - What is a frequency distribution?
      - A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which data fall within each class or category
      - For nominal/ordinal data
  • 9. Table 1. Univariate Frequencies of Percentage of Sales Reported to Tax Authorities
      % of Sales Reported  Frequency  Percent (%)
      100%                      3307        40.56
      90-99%                    1096        13.44
      80-89%                     916        11.24
      70-79%                     703         8.62
      60-69%                     501         6.14
      50-59%                     694         8.51
      <50%                       936        11.48
      Total                     8153          100
      Source: 1999 World Bank World Business Environment Survey (WBES), excludes missing observations. http://www.enterprisesurveys.org/
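The Percent column of Table 1 can be reproduced from the Frequency column. The following is a minimal Python sketch (the slides use Excel); the function name `percent_distribution` is a hypothetical helper chosen for illustration, and the frequencies are copied from Table 1.

```python
# Frequencies copied from Table 1 (1999 WBES, missing observations excluded).
frequencies = {
    "100%": 3307,
    "90-99%": 1096,
    "80-89%": 916,
    "70-79%": 703,
    "60-69%": 501,
    "50-59%": 694,
    "<50%": 936,
}


def percent_distribution(freqs):
    """Return each class frequency as a percentage of the total (2 decimals)."""
    total = sum(freqs.values())
    return {label: round(100 * count / total, 2) for label, count in freqs.items()}


total = sum(frequencies.values())
percents = percent_distribution(frequencies)

# Print the univariate frequency table: class, frequency, percent.
for label, count in frequencies.items():
    print(f"{label:>8} {count:>6} {percents[label]:>7.2f}")
print(f"{'Total':>8} {total:>6}")
```

Each percentage is simply the class frequency divided by the total number of non-missing observations, times 100.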
  • 10. Contingency/Pivot/Cross Table
      - May also want to produce a table with more categories
      - Cross table or Contingency table or Pivot table
      - Suitable if you have two nominal/ordinal variables
      - Simple extension to a univariate table
      - Considers relationship between two variables
      - Row variable (Dependent)
      - Column variable (Independent)
  • 11. Table 2. Percentage of Sales Reported to Tax Authorities by Region
      % Reported  Africa  Transition Europe   Asia  Latin America  OECD  Former Soviet Countries  Total
      100%           490                554    416            794   446                      607  3,307
      90-99%         266                196    142            119   145                      228  1,096
      80-89%         158                152    117            192    73                      224    916
      70-79%         162                117    103            153    43                      125    703
      60-69%         140                 69     70            115    22                       85    501
      50-59%         140                105    141            118    16                      174    694
      <50%           100                106    283            296    25                      126    936
      Total        1,456              1,299  1,272          1,787   770                    1,569  8,153
      Source: 1999 World Bank World Business Environment Survey (WBES)
      * Excludes missing observations
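A cross table is often easier to compare across the column (independent) variable when counts are converted to column percentages. The sketch below, in Python rather than Excel, recomputes the margins of Table 2 and the share of each reporting category within each region; the variable names are illustrative assumptions, and the counts are copied from Table 2.

```python
# Counts copied from Table 2; column order follows the table.
regions = ["Africa", "Transition Europe", "Asia", "Latin America",
           "OECD", "Former Soviet Countries"]
counts = {
    "100%":   [490, 554, 416, 794, 446, 607],
    "90-99%": [266, 196, 142, 119, 145, 228],
    "80-89%": [158, 152, 117, 192, 73, 224],
    "70-79%": [162, 117, 103, 153, 43, 125],
    "60-69%": [140, 69, 70, 115, 22, 85],
    "50-59%": [140, 105, 141, 118, 16, 174],
    "<50%":   [100, 106, 283, 296, 25, 126],
}

# Margins: totals per region (columns) and per reporting category (rows).
column_totals = [sum(col) for col in zip(*counts.values())]
row_totals = {label: sum(row) for label, row in counts.items()}
grand_total = sum(column_totals)

# Column percentages: share of each category within a region.
column_percent = {
    label: [round(100 * c / t, 1) for c, t in zip(row, column_totals)]
    for label, row in counts.items()
}

for label, row in column_percent.items():
    print(f"{label:>7}", " ".join(f"{p:6.1f}" for p in row))
```

Reading down a column then shows how reporting behaviour is distributed within one region, which is what makes regions of different sample sizes comparable.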
  • 12. Features of a Table
      - Title that accurately summarizes the data
      - Simple, indicates major variables, and time frame (if applicable)
      - Source: data set or origin of table
      - Explanatory footnotes
      - Easy to read & separated from text
      - Properly formatted for style (see APA Rules)
      - Necessary to advance analysis
      - See Module 7 for APA Table Checklist
      - Reproduced from APA manual
  • 14. Bar Graph
      - Often used to describe categorical data
      - Ordinal/Nominal
      - Draws attention to the frequency of each category
  • 16. Bar Graph
      Figure 1. Percentage of sales reported to tax authority
      Source: 1999 World Bank World Business Environment Survey (WBES)
      Note. Excludes missing observations. n = 8314
  • 18. Pie Graph
      - Emphasizes the proportion of each category
      - Something that may be good for our tax evasion data
      - Circle represents the total
      - Segments the shares of the total
      - Segment size is proportional to frequency
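"Segment size is proportional to frequency" means each segment's angle is its share of the total times 360 degrees. A minimal Python sketch using the Table 1 frequencies (variable names are illustrative assumptions):

```python
# Frequencies copied from Table 1.
frequencies = {
    "100%": 3307,
    "90-99%": 1096,
    "80-89%": 916,
    "70-79%": 703,
    "60-69%": 501,
    "50-59%": 694,
    "<50%": 936,
}

total = sum(frequencies.values())

# Each segment's angle (in degrees) is its share of the full circle.
angles = {label: 360 * count / total for label, count in frequencies.items()}

for label, angle in angles.items():
    print(f"{label:>7}: {angle:6.1f} degrees")
```

The angles always add up to the full circle, which is why a pie graph only makes sense when the categories are shares of one total.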
  • 19. Pie Graph
      Figure 1. Percentage of sales reported to tax authority
      Source: 1999 World Bank World Business Environment Survey (WBES)
      Note. Excludes missing observations. n = 8314
  • 24. Bar Graph
      Figure 1. Percentage of sales reported to tax authority
      Source: 1999 World Bank World Business Environment Survey (WBES)
      Note. Excludes missing observations. n = 8314
  • 25. Segmented Bar Chart
      Figure 1. Percentage of sales reported to tax authority
      Source: 1999 World Bank World Business Environment Survey (WBES)
      Note. Excludes missing observations. n = 8314
  • 26. Pie Graph
      Figure 2. Percentage of sales reported to tax authority by region
      Source: 1999 World Bank World Business Environment Survey (WBES)
      Note. Excludes missing observations. n = 8314
  • 29. Time Series Graph
      - Time series are often used in social sciences
      - Data collected at various time periods: daily, weekly, monthly, quarterly, annually, etc.
      - Examples include GDP, Unemployment, University Tuition
      - Plot series of interest over time
      - Let’s look at a graph of the unemployment rate by gender and age
  • 31. Histogram
      - Used for continuous data
      - Frequency distribution for continuous data
      - Summary graph showing a count of the data points falling in various ranges
      - Rough approximation of the distribution of the data
      - A histogram is a way to summarize data
      - The distribution condenses the raw data into a more useful form
      - and allows for a quick visual interpretation of the data
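The counting behind a histogram can be sketched in a few lines of Python. This is an illustration only: the `spending` values are hypothetical (they are not from the slides), and `histogram_counts` is a helper name invented here, assuming equal-width bins.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count how many values fall in each of n_bins equal-width ranges.

    Bins are [lower, lower + width), [lower + width, lower + 2*width), ...
    The last bin also includes its right edge. Values outside the overall
    range are ignored.
    """
    counts = [0] * n_bins
    upper = lower + width * n_bins
    for v in values:
        if v == upper:
            counts[-1] += 1          # right edge belongs to the last bin
            continue
        index = int((v - lower) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical continuous data, for illustration only.
spending = [11.2, 12.5, 13.1, 14.8, 15.0, 15.3, 15.9, 16.4, 17.7, 19.6, 20.0]
counts = histogram_counts(spending, lower=10, width=2, n_bins=5)

# A text histogram: one row per range, one '#' per observation.
for i, count in enumerate(counts):
    start = 10 + 2 * i
    print(f"{start:>2}-{start + 2:<2} {'#' * count}")
```

Changing `width` changes the picture: wider bins smooth the shape, narrower bins show more detail but more noise.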
  • 33. Scatter Graphs
      - Graph the relationship between two continuous variables
  • 35. Principles of Graphical Excellence
      - Well-designed presentation of interesting data
      - Substance & design
      - Simplicity of design, complexity of data
      - Proportion and balance
      - Clear, precise, efficient
      - Know what you are trying to show (have a story)
      - Make sure your graph shows it
      - Well formatted, professional
      - Choose a format that reflects your data and the story
      - Informative and legible axes
      - Fully labelled & legible
      - Gets across main point(s) in the shortest time with the least ink in the smallest space
      - Adds information not otherwise available to the reader
      - But supplemented with text describing the figure
      - Tells the truth about the data
      - Limits complexity and confusion
      - Avoid chartjunk
  • 36. Examples of Chartjunk
      [Figure: two sample charts, one by quarter (1st Qtr to 4th Qtr) and one by region (West, North, Northeast, Southwest, Mexico, Europe, Japan, East, South, International)]
  • 37. Examples of Chartjunk
      [Figure: sample chart by quarter (1st Qtr to 4th Qtr), annotated with the following problems]
      - Gridlines!
      - Vibration
      - Pointless fake 3-D effects
      - Filled “floor”
      - Clip art
      - In or out?
      - Filled “walls”
      - Borders and fills galore
      - Unintentional heavy or double lines
      - Filled labels
      - Serif font with thin & thick lines
  • 38. Displaying Data: “Mistakes”
      - Graphs are also instruments of evil used for deceiving a naive viewer.
      - Non-zero origin
      - Omitting data that refutes your “evidence”
      - Limiting scope of data
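Why a non-zero origin misleads can be shown with simple arithmetic: the eye compares drawn bar heights, and those heights are measured from wherever the axis starts. The sketch below uses hypothetical values (100 and 93, a 7% difference) that are not taken from the slides; `visual_ratio` is a helper name invented for this illustration.

```python
def visual_ratio(smaller, larger, axis_start=0.0):
    """Ratio of the drawn bar heights when the value axis starts at axis_start."""
    return (smaller - axis_start) / (larger - axis_start)


# Hypothetical values: the second is 7% below the first.
before, after = 100.0, 93.0

honest = visual_ratio(after, before)                    # axis starts at 0
truncated = visual_ratio(after, before, axis_start=90)  # axis starts at 90

print(f"Axis from 0:  second bar is {honest:.0%} as tall as the first")
print(f"Axis from 90: second bar is {truncated:.0%} as tall as the first")
```

With the axis starting at zero the bars differ by the true 7%; with the axis starting at 90 the second bar is drawn at less than a third of the first, even though the data are unchanged.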
  • 39. What is Wrong with this Graph?
      [Figure: Provincial Personal Income Taxes, single individual with $45,000 in income claiming basic personal tax credits]
  • 41. Exaggerates a change in data
      Source: Statistics Canada, CANSIM II, V31215364
  • 43. Worst Recession Since the Depression (?)
  • 45. Describing Data Numerically
      - Central Tendency: Simple Arithmetic Mean, Median, Mode
      - Variation: Variance, Standard Deviation, Range
      - Association: Covariance, Correlation
      - Shape of the Distribution
  • 46. Mode
      - A measure of central tendency
      - Value that occurs most often
      - Not affected by extreme values
      - Used for either numerical or categorical data
      - There may be no mode or several modes
      - What are the modes for the displayed data?
      [Figure: two plots, one on a 0 to 14 axis and one on a 0 to 6 axis]
  • 47. Mode
      - A measure of central tendency
      - Value that occurs most often
      - Not affected by extreme values
      - Used for either numerical or categorical data
      - There may be no mode
      - There may be several modes
      [Figure: the two plots labelled "Mode = 9" (0 to 14 axis) and "No Mode" (0 to 6 axis)]
  • 48. Mode
      - There may be several modes
      [Figure: plot on a 0 to 14 axis labelled "Mode = 5 & 9"]
  • 49. Mode
      - Caution: Mode may not be representative of the data
      - {0.1, 0.1, 5000, 4900, 4500, 5200,…}
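The three cases on the mode slides (one mode, several modes, no mode) can be sketched in Python. The helper `modes` is a name invented here, and treating "every value occurs exactly once" as "no mode" is a convention assumed for this sketch. The first two data sets are hypothetical; the third uses only the values actually listed on slide 49.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # convention assumed here: all values unique means no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 4, 5, 6]))                  # no value repeats
print(modes([3, 5, 5, 9, 9, 12]))                 # two values tie
print(modes([0.1, 0.1, 5000, 4900, 4500, 5200]))  # listed values from slide 49
```

The last case illustrates the slide's caution: the mode is the repeated small value, even though most of the listed observations are in the thousands.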
  • 50. Median
      - In an ordered list, the median is the “middle” number (50% above, 50% below)
      [Figure: two plots on 0 to 10 axes]
  • 51. Mean
      - The “balancing point” (centre of gravity) of the data
      - E.g. the data “balances” at 5
      [Figure: number line from 1 to 9 with deviations -2, -1 and +3 from the balancing point]
  • 52. Arithmetic Mean
      - The arithmetic mean (mean) is the most common measure of central tendency
      - Calculated by summing the observed values and dividing by the number of observations
      - For a sample of size n: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$, where the $x_i$ are the observed values and $n$ is the number of observations
  • 53. Arithmetic Mean
      - The most common measure of central tendency
      - Mean = sum of values divided by the number of values
      - Affected by extreme values (outliers)
      - What is the mean for these examples?
      [Figure: two plots on 0 to 10 axes]
  • 54. Arithmetic Mean
      - The most common measure of central tendency
      - Mean = sum of values divided by the number of values
      - Affected by extreme values (outliers)
      - First example: $\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3$, so Mean = 3
      - Second example: $\frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$, so Mean = 4
  • 55. Measures of Central Tendency: Overview
      - Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
      - Median: midpoint of ranked values (50% below, 50% above)
      - Mode: most frequently observed value
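The two worked examples from slide 54 can be checked with Python's standard `statistics` module. The sketch also computes the median for both data sets, which is not stated on the slide but follows directly from the data.

```python
import statistics

# The two data sets from the worked examples on slide 54.
data_a = [1, 2, 3, 4, 5]
data_b = [1, 2, 3, 4, 10]   # same data with one extreme value

mean_a, mean_b = statistics.mean(data_a), statistics.mean(data_b)
median_a, median_b = statistics.median(data_a), statistics.median(data_b)

print(f"A: mean = {mean_a}, median = {median_a}")
print(f"B: mean = {mean_b}, median = {median_b}")
```

Replacing 5 with 10 pulls the mean up from 3 to 4, while the middle value of the ordered list does not move. That is the sense in which the mean is "affected by extreme values" and the median is not.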
  • 56. The “Shape of a Distribution”
      - Use information on mean, median, and mode to “visualize” the data
      - A data distribution is said to be symmetric if its shape is the same on both sides of the median
      - Symmetry implies that median = arithmetic mean
      - If a distribution is unimodal and symmetric, then median = mean = mode
  • 57. The “Shape of a Distribution”
      [Figure: number of observations by value (1 to 7), with the median splitting the data 50%/50%; one peak labelled UNIMODAL]
      - Symmetric: Median = Mean
      - Symmetric & unimodal: Median = Mean = Mode
  • 58. The “Shape of a Distribution”
      [Figure: number of observations by value (1 to 7), with the median splitting the data 50%/50%; two peaks labelled BIMODAL]
      - Symmetric: Median = Mean
      - Symmetric & bimodal: Median = Mean ≠ Mode
  • 59. The “Shape of a Distribution”
      [Figure: number of observations by value (1 to 7), with the median splitting the data 50%/50%; labelled MODE?]
      - Symmetric: Median = Mean
      - Symmetric & no mode: Median = Mean (Uniform)
  • 60. The “Shape of a Distribution”
      - An asymmetric distribution is said to be skewed
      1. Negatively if Mean < Median < Mode
      2. Positively if Mean > Median > Mode
      - Hence, by comparing our measures of central tendency, we can start to visualize the shape and characteristics of the data
  • 61. The “Shape of a Distribution”
      [Figure: number of observations by value (1 to 8), with the median splitting the data 50%/50%]
      - MODE = 2, MEDIAN = 3, MEAN = 3.2
      - MODE < MEDIAN < MEAN = POSITIVELY SKEWED DISTRIBUTION
  • 62. Example: Positively skewed variable
      - The Distribution of After-Tax Income
      - Shows the distribution of income across all Canadian households
  • 63. Example: Positively skewed variable
      - The mode income is the most common income and was in the range from $15,000 to $19,999.
      - The median income is the level of income that separates the population into two groups of equal size and was $39,700.
      - The mean income is the average income and was $48,400.
  • 64. Example: Positively skewed variable
      - A distribution in which the mean exceeds the median and the median exceeds the mode is positively skewed, which means it has a long tail of high values.
      - The distribution of income in Canada is positively skewed.
      - Most likely to report the median rather than the mean, since the long tail distorts the average
  • 65. Example: Positively skewed variable
      - Volunteer hours
      - Charitable contributions
      - # of cigarette packs smoked (excluding 0)
      - Collective bargaining agreement duration (in years)
      - # of beers consumed on a Saturday night
      - Duration of low income (in years)
      - Number of children
  • 66. The “Shape of a Distribution”
      [Figure: number of observations by value (0 to 7), with the median splitting the data 50%/50%]
      - MODE = 6, MEDIAN = 5, MEAN = 4.7
      - MEAN < MEDIAN < MODE = NEGATIVELY SKEWED DISTRIBUTION
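The comparison rule from slide 60 can be written as a small function and applied to the slides' own numbers. This is a sketch of that rule of thumb only, not a formal skewness statistic; `skew_direction` is a name invented here, and representing the modal income range ($15,000 to $19,999) by its approximate midpoint of 17,500 is an assumption made for illustration.

```python
def skew_direction(mean, median, mode):
    """Classify skew by comparing mean, median and mode (rule of thumb)."""
    if mean > median > mode:
        return "positive"
    if mean < median < mode:
        return "negative"
    if mean == median == mode:
        return "symmetric"
    return "unclear"


# Values from slide 61 and slide 66.
print(skew_direction(mean=3.2, median=3, mode=2))
print(skew_direction(mean=4.7, median=5, mode=6))

# After-tax income from slide 63; the mode is a range on the slide,
# so 17,500 (roughly its midpoint) is an assumed stand-in.
print(skew_direction(mean=48_400, median=39_700, mode=17_500))
```

When the three measures are not strictly ordered, the function returns "unclear" rather than guessing, since the rule does not say anything about that case.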
  • 67. Examples
      - University grades
      - Age
      - Years in school
      - Etc.
  • 68. Describing Data Numerically
      - Central Tendency: Simple Arithmetic Mean, Median, Mode
      - Variation: Variance, Standard Deviation, Range
      - Association: Covariance, Correlation
      - Shape of the Distribution
  • 69. Measures of Dispersion/Variability
      - Variation: Variance, Standard Deviation, Range
      - Measures of variation give information on the spread or variability of the data values.
      [Figure: distributions with the same center, different variation]
  • 70. Range
      - Simplest measure of variation
      - Difference between the largest and the smallest observations: Range = X_largest - X_smallest
      - Example: [Figure: data plotted on a 0 to 14 axis]
  • 71. Range
      - Simplest measure of variation
      - Difference between the largest and the smallest observations: Range = X_largest - X_smallest
      - Example: [Figure: data plotted on a 0 to 14 axis] Range = 14 - 1 = 13
  • 72. The Range
      - Problem
      - Ignores all but two data points
      - These values may be “outliers” (i.e. not representative)
  • 73. Disadvantages of the Range
      - Ignores the way in which data are distributed
        [Figure: two differently distributed data sets on 7 to 12 axes; both have Range = 12 - 7 = 5]
      - Sensitive to outliers
        1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
        1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119
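The outlier example from slide 73 can be reproduced directly. The sketch below builds the two data sets listed on the slide and computes the range of each; the median is added here for contrast and is not on the slide.

```python
import statistics

# The two data sets from slide 73: identical except for the last value.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = base + [5]
with_outlier = base + [120]


def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


range_a = value_range(without_outlier)
range_b = value_range(with_outlier)

print(f"Range without outlier: {range_a}")
print(f"Range with outlier:    {range_b}")
print(f"Medians: {statistics.median(without_outlier)} and {statistics.median(with_outlier)}")
```

Changing a single observation out of 25 moves the range from 4 to 119, while the median of both data sets is the same, which is the slide's point about sensitivity to outliers.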
  • 74. The Variance
      - A single summary measure of dispersion would be more helpful
      - Takes account of all data values
  • 75. The Variance
      1. Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
      2. Standard deviation: $s = \sqrt{s^2} = \sqrt{\text{variance}}$
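The sample variance and standard deviation formulas translate almost line by line into Python. The sketch below implements them by hand and compares the result with the standard `statistics` module; the two small data sets are reused from the arithmetic mean examples, and the function names are chosen here for illustration.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Standard deviation = square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data_a = [1, 2, 3, 4, 5]
data_b = [1, 2, 3, 4, 10]

for name, data in (("A", data_a), ("B", data_b)):
    print(f"{name}: variance = {sample_variance(data):.2f}, "
          f"std dev = {sample_std_dev(data):.3f}, "
          f"library check = {statistics.variance(data):.2f}")
```

Unlike the range, every observation contributes to the result, and squaring the deviations means values far from the mean count more heavily.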
  • 76. Measuring variation
      [Figure: a distribution with a small standard deviation and one with a large standard deviation]
  • 77. Comparing Standard Deviations
      [Figure: three data sets plotted on 11 to 21 axes]
      - Data A: Mean = 15.5, s = 3.338
      - Data B: Mean = 15.5, s = 0.926
      - Data C: Mean = 15.5, s = 4.570
  • 78. Describing Data Numerically
      - Central Tendency: Simple Arithmetic Mean, Median, Mode
      - Variation: Variance, Standard Deviation, Range
      - Association: Covariance, Correlation
      - Shape of the Distribution
  • 79. The Sample Covariance
      - The covariance measures the strength of the linear relationship between two variables
      - The sample covariance: $\mathrm{Cov}(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
      - Only concerned with the strength of the relationship
      - No causal effect is implied
  • 80. Interpreting Covariance
      Covariance between two variables:
      - Cov(x,y) > 0: x and y tend to move in the same direction
      - Cov(x,y) < 0: x and y tend to move in opposite directions
      - Cov(x,y) = 0: x and y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not establish independence)
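The sign of the sample covariance can be explored with a short Python sketch. All data below are hypothetical and chosen for illustration; `sample_covariance` is a helper written here. The last case is a caution: a covariance of zero rules out a linear relationship, but y can still depend entirely on x.

```python
def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, for illustration only.
x = [-2, -1, 0, 1, 2]
same_direction = [1, 2, 3, 4, 5]       # rises as x rises
opposite_direction = [5, 4, 3, 2, 1]   # falls as x rises
squared = [v ** 2 for v in x]          # y = x^2: fully determined by x

cov_same = sample_covariance(x, same_direction)
cov_opposite = sample_covariance(x, opposite_direction)
cov_squared = sample_covariance(x, squared)

print(f"Same direction:     {cov_same}")
print(f"Opposite direction: {cov_opposite}")
print(f"y = x squared:      {cov_squared}")
```

In the third case the positive and negative products cancel exactly, so the covariance is zero even though y is a function of x.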
  • 81. Coefficient of Correlation
      - Measures the relative strength of the linear relationship between two variables
      - Sample correlation coefficient: $r = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$
  • 82. Features of Correlation Coefficient, r
      - Unit free
      - Ranges between -1 and 1
      - The closer to -1, the stronger the negative linear relationship
      - The closer to 1, the stronger the positive linear relationship
      - The closer to 0, the weaker the linear relationship
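Dividing the covariance by the two standard deviations is what makes r unit free. The Python sketch below illustrates this with hypothetical income and food spending figures (not taken from the course's Excel sheet); the helper names are chosen here for illustration.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r = Cov(x, y) / (s_x * s_y), using sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # Cov(x, x) is the variance of x
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data: income in thousands of dollars and weekly food spending.
income_thousands = [20, 30, 40, 50, 60]
food_spending = [50, 60, 80, 90, 120]
income_dollars = [1000 * v for v in income_thousands]  # same data, new units

r_thousands = sample_correlation(income_thousands, food_spending)
r_dollars = sample_correlation(income_dollars, food_spending)

print(f"Covariance (thousands): {sample_covariance(income_thousands, food_spending):.1f}")
print(f"Covariance (dollars):   {sample_covariance(income_dollars, food_spending):.1f}")
print(f"r (thousands): {r_thousands:.4f}")
print(f"r (dollars):   {r_dollars:.4f}")
```

Changing the units of income multiplies the covariance by 1,000 but leaves r unchanged, which is why r can be compared across variables while the raw covariance cannot.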
  • 83. Interpreting the Correlation Coefficient, r
  • 84. Scatter Plots of Data with Various Correlation Coefficients
      [Figure: six scatter plots of Y against X, labelled r = -1 (Cov < 0), r = -.6 (Cov < 0), r = 0 (Cov = 0), r = +.3, r = +1, and r = 0]
  • 86. Fun with Graphs
      - Does your mindset match my dataset!
      - http://www.ted.com/talks/hans_rosling_at_state.html
  • 87. Looking ahead
      - SRs to client (cc) and Turnitin on Wednesday by noon
      - No class next week
      - Work on 598 critiques
      - 598 Critiques due in class & Turnitin Nov. 30
      - Comments on your SRs will be ready Nov. 30
      - Final SRs (if required) due Dec. 8 @11:55PM PST
      - Note carefully the requirements
      - Moodle site will be inaccessible sometime in December
      - Final Grades reported via usource once approved by the Director

Editor's notes

  1. Graph makes the frequencies pop more
  2. Or what could have been a bar chart can be made into a line by connecting the midpoints
  3. Remember our cross table? Can we present this graphically?
  4. Note legend is on right as no room on left hand side
  5. Or we can display this as a stacked bar, where the proportion of each region in each category is displayed. This is called a segmented bar chart.
  6. Mancession video (4 minutes); Unemployment Rates sheet; ExcelTutorial5_timeseriesgraph
  7. The main defence of the lying graph is that at least it was approximately correct: we were just trying to show the general direction of change or magnitude.
  8. So yes, taxes are low in BC, but not as low as shown in the original graph. Non-zero origins are a great way to lie. Very popular in government.
  9. Remember this time series graph? Look at what happens if we change the scale on the Y axis. Boy, that really changes your impression of the data and the underlying trend. The drop from 1992 to 1997 was 7%. Does this graph understate or overstate a 7% change over this period?
  10. Dr. Kendall used his diagram to demonstrate that we are drinking too much, when really there are more people drinking due to population growth.
  11. Mode = 9; no mode
  12. If the mean=median and there is no mode, your distribution looks something like this
  13. Not as frequently occurring in economic data, so I actually do not have many examples.
  14. What does the standard deviation tell us? It tells us how far from the mean the data points tend to be. A bigger number tells us that the observations are further away from the mean than if there is a small standard deviation. It tells us HOW representative of the data the mean is.
  15. Since the standard deviation can be thought of as measuring how far the data values lie from the mean, we take the mean and move one standard deviation in either direction. The mean for this example was about 15.5. For the first distribution we have 15.5 + 3.338 = 18.838 and 15.5 - 3.338 = 12.162. Assuming this is how much restaurant patrons spend, this means that most of the patrons probably spend between $12.16 and $18.84. In the second example, we have 15.5 + 0.926 = 16.43 and 15.5 - 0.926 = 14.57, which shows less spread in the data. In the third example we have 15.5 + 4.57 = 20.07 and 15.5 - 4.57 = 10.93, which is the most spread. Excel (4 minutes): Food Expenditures 2; ExcelTutorial9_Dispersion.mp4
  16. Measures of relationships between variables. More often than not, we are interested in describing the relationship between variables. On Oct. 28 we learned about scatter plots as a graphical way to describe a relationship between two variables. We also learned about cross tabs, aka contingency tables, for nominal/ordinal variables. Let’s look a little more closely at measures of relationships for ratio-level data.
  17. Excel: Food&Income 2 sheet
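The arithmetic in note 15 (the mean plus and minus one standard deviation for Data A, B and C) can be sketched in Python. The means and standard deviations are those quoted in the notes; `one_sd_interval` is a helper name invented here. Reading the interval as where "most" observations fall is the notes' informal interpretation, not a guarantee.

```python
def one_sd_interval(mean, std_dev):
    """Return (mean - s, mean + s): one standard deviation either side of the mean."""
    return (mean - std_dev, mean + std_dev)


# Mean and standard deviation for each data set, as quoted in the notes.
data_sets = {"A": (15.5, 3.338), "B": (15.5, 0.926), "C": (15.5, 4.570)}

intervals = {name: one_sd_interval(m, s) for name, (m, s) in data_sets.items()}

for name, (low, high) in intervals.items():
    print(f"Data {name}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

All three data sets share the same mean, so the width of the interval is the only thing that distinguishes them: narrowest for Data B and widest for Data C.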