Chapter 4
More about relationships between
2 variables
4.1 TRANSFORMING TO ACHIEVE
LINEARITY
What if the scatterplot is not
linear?
• Of course not all data is linear!
• Our method in statistics will involve
mathematically operating on one or both
of the explanatory and response variables
• An inverse transformation will be used to
create a non-linear regression model
• This will be a little “mathy”
Transformations
• Before we begin transformations,
remember that some well-known
phenomena act in predictable ways
– E.g., when working with time and gravity, you
should know that there is a square
relationship between distance and time!
The Basics
• The data from measurements (raw data)
must be operated on.
• Apply the same mathematical
transformation on the raw data
– Ex. “Square every response”
• Use methods from the previous chapter to
find the LSRL for the transformed data
• Analyze your regression to ensure the LSRL is
appropriate
• Apply an inverse transformation on the LSRL
to find the regression for the raw data.
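The steps above can be sketched in Python. This is a minimal illustration, not part of the original slides: the free-fall data are synthetic (generated from distance = 0.5 · g · t^2 with g = 9.8), and the square-root transformation is chosen because it makes that particular relationship linear. The names `t`, `d`, and `predict_distance` are hypothetical.

```python
import numpy as np

# Synthetic free-fall data (an assumption for illustration, not from the slides).
t = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])  # explanatory: time (s)
d = 0.5 * 9.8 * t**2                           # response: distance (m)

# Step 1: apply the same transformation to every raw response.
d_sqrt = np.sqrt(d)

# Step 2: find the LSRL for the transformed data, sqrt(d) = a + b*t.
b, a = np.polyfit(t, d_sqrt, 1)

# Step 3: analyze the regression by inspecting the residuals.
residuals = d_sqrt - (a + b * t)

# Step 4: apply the inverse transformation (squaring) to model the raw data.
def predict_distance(time):
    return (a + b * time) ** 2

print("largest |residual| on transformed scale:", np.abs(residuals).max())
print("predicted distance at t = 4 s:", predict_distance(4.0))
```

With real measurements the residuals would not be essentially zero, so step 3 is where you decide whether the chosen transformation was appropriate.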
Example
Please refer to p 265 exercise 4.2
Length (cm) Period (s)
16.5 0.777
17.5 0.839
19.5 0.912
22.5 0.878
28.5 1.004
31.5 1.087
34.5 1.129
37.5 1.111
43.5 1.290
46.5 1.371
106.5 2.115
Example
• Data inputted into L1
and L2
• Scatterplot
• Looks pretty good,
right?
Exercise
• LSRL
• ŷ = 0.6 + 0.015x
r = 0.991
• Residual Plot
• Perhaps we can do
better!
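For readers without the calculator, the same LSRL, correlation, and residuals can be computed in Python from the length and period table above. This is a sketch using NumPy rather than the calculator's LinReg command; the variable names are hypothetical.

```python
import numpy as np

# Pendulum data from the table above (L1 = length in cm, L2 = period in s).
length = np.array([16.5, 17.5, 19.5, 22.5, 28.5, 31.5,
                   34.5, 37.5, 43.5, 46.5, 106.5])
period = np.array([0.777, 0.839, 0.912, 0.878, 1.004, 1.087,
                   1.129, 1.111, 1.290, 1.371, 2.115])

# Least-squares regression line: period = intercept + slope * length.
slope, intercept = np.polyfit(length, period, 1)

# Correlation coefficient r between the two variables.
r = np.corrcoef(length, period)[0, 1]

# Residual = observed response minus predicted response.
residuals = period - (intercept + slope * length)

print(f"period = {intercept:.3f} + {slope:.4f} * length, r = {r:.3f}")
for x, e in zip(length, residuals):
    print(f"length {x:6.1f}  residual {e:+.4f}")
```

Plotting `residuals` against `length` gives the residual plot referred to on the slide.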
Example
• L3 = L2^.5 (square
root)
• LinReg L1, L3
• Note that the value
of r^2 has increased
• Note that the value
of the residual of the
last point has
decreased
Exponential Models
• Many natural phenomena are explained
by an exponential model.
• Exponential models are marked by sharp
increases in growth and decay.
• Basic model: y = A·B^x
• For this transformation, you need to take
the logarithm of the response data.
• You may use “log10” or “ln” your choice.
– I prefer “ln” (of course)
Exponential Models
After the transformation, we have the
following linear model: ln(y) = a + b·x
1. ln(y) = a + b·x
2. e^(ln(y)) = e^(a + b·x)     exponentiate
3. y = e^a · e^(b·x)           property of exponents
4. Let ‘A’ = e^a               redefine variables
       ‘B’ = e^b
5. y = A·B^x                   this is our model
Exponential Models
• Since this is an ‘applied math’ course,
you need not remember how to apply the
inverse transformation
• Whew
• BUT you do need to memorize:
when ln(y) = a + b·x,
y = A·B^x
where ‘A’ = e^a and ‘B’ = e^b
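The rule above can be checked numerically. The sketch below uses synthetic data generated from y = 2 · 1.5^x (an assumption for illustration, not the data set used on the following slides), fits the LSRL to ln(y), and converts the coefficients back with A = e^a and B = e^b.

```python
import numpy as np

# Synthetic data that follow y = 2 * 1.5^x exactly (illustrative only).
x = np.arange(0, 8, dtype=float)
y = 2.0 * 1.5 ** x

# Transform the response and fit ln(y) = a + b*x.
ln_y = np.log(y)
b, a = np.polyfit(x, ln_y, 1)

# Inverse transformation: A = e^a, B = e^b, so y = A * B^x.
A = np.exp(a)
B = np.exp(b)

def predict(x_new):
    return A * B ** x_new

print(f"ln(y) = {a:.4f} + {b:.4f} x")
print(f"y = {A:.4f} * {B:.4f}^x")
```

Because the synthetic data are exact, the recovered A and B match the generating values; with measured data they would only be estimates.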
Exponential Models
Let’s try this data
Exponential Models
Take the ln of L2 (the
response list) and store it in L3
Exponential Models
These are our “transformed
responses”
Exponential Models
From our homescreen, we
perform an LSRL using the
transformed data
Exponential Models
We don’t have to store this
regression for transformed data
Exponential Models
Take note of the values of
‘a’ and ‘b’
Exponential Models
A quick look at the
residuals
Exponential Models
The values of the residuals are
small, with no defined pattern
Exponential Models
• Our regression model is exponential
y = A·B^x
where A = e^a and B = e^b
• y = e^0.701 · (e^0.184)^x
• Or
y = 2.06 · (1.20)^x
Exponential Models
Put our regression in Y1
Exponential Models
Change Plot1 from a
resid. to a scatter plot
Exponential Models
Looks pretty good, eh?
Power Models
• These models are used when the rate of
increase is less severe than an
exponential model, or if you suspect a
‘root’ model
• For this model, you will find the
logarithms of both the explanatory variable
and the response variable
Power models
LSRL on transformed data yields:
ln(y) = a + b·ln(x)
1. ln(y) = a + b·ln(x)
2. e^(ln(y)) = e^(a + b·ln(x))
3. y = e^a · e^(ln(x^b))
4. y = e^a · x^b
5. Let ‘A’ = e^a
6. y = A · x^b
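The derivation above translates directly into code. In this sketch the data are synthetic, generated from y = 3 · x^1.5 (an assumption for illustration, not the data on the following slides); both variables are log-transformed, the LSRL is fitted, and the intercept is converted back with A = e^a while the slope b becomes the exponent.

```python
import numpy as np

# Synthetic data that follow y = 3 * x^1.5 exactly (illustrative only).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
y = 3.0 * x ** 1.5

# Transform both lists and fit ln(y) = a + b*ln(x).
ln_x = np.log(x)
ln_y = np.log(y)
b, a = np.polyfit(ln_x, ln_y, 1)

# Inverse transformation: y = A * x^b with A = e^a.
A = np.exp(a)

def predict(x_new):
    return A * x_new ** b

print(f"ln(y) = {a:.4f} + {b:.4f} ln(x)")
print(f"y = {A:.4f} * x^{b:.4f}")
```

Note that only the intercept is exponentiated here; unlike the exponential model, the slope is used as-is.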
Power models
Let’s use this data to find a power model
Power models
This time we need to transform both lists
Power models
Transformed explanatory = L3
Transformed response = L4
Power models
LSRL on transformed data
no need to store in Y1
Power models
Take note of the values of ‘a’ and ‘b’
Power models
A quick look at the residuals
Power models
Note that we use the transformed explanatory variable
Power models
No defined pattern
Power models
Residuals are all small in size
Power models
• When ln(y) = a + b·ln(x),
y = A · x^b
where ‘A’ = e^a
Our model is y = (e^1.31) · x^1.27
Or y = 3.71 · x^1.27
Power models
Regression in Y1
Power models
Change from resid to scatter plot
Power models
(notice L1 and L2)
Power models
Looks pretty good!
Power models
• Much like the exponential model, you
only need to know how the transformed
model becomes the model for the raw
data.
• When ln(y) = a + b·ln(x),
y = A · x^b
where ‘A’ = e^a
Transformation thoughts
• Although this is not a major topic for the
course, you still need to be able to apply
these two transformations (exp and power)
• Be sure to check the residuals for the LSRL
on transformed data! You may have picked
the wrong model :/
• If one model doesn’t work, try the other. I
would start with the exponential model.
• Don’t transform into a cockroach. Ask Kafka!
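One rough way to act on the "try the other model" advice is to compare r^2 for the two transformed fits. The sketch below is an assumption-laden illustration: the data are synthetic (generated from a power law), the helper names are hypothetical, and r^2 alone does not replace looking at the residual plot.

```python
import numpy as np

def r_squared(u, v):
    """r^2 of the least-squares line predicting v from u."""
    return np.corrcoef(u, v)[0, 1] ** 2

def compare_models(x, y):
    """r^2 of each transformed fit; both use ln(y) as the response."""
    ln_y = np.log(y)
    return {
        "exponential": r_squared(x, ln_y),     # ln(y) versus x
        "power": r_squared(np.log(x), ln_y),   # ln(y) versus ln(x)
    }

# Synthetic data generated from y = 3 * x^1.5 (illustrative only).
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 13.0])
y = 3.0 * x ** 1.5

scores = compare_models(x, y)
best = max(scores, key=scores.get)
print(scores, "-> better transformed fit:", best)
```

Both candidate fits share the same transformed response, ln(y), which is what makes their r^2 values comparable here.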
Assn 4.1
• pg 276 #5, 8, 9, 11, 12
4.2 RELATIONSHIPS BETWEEN
CATEGORICAL VARIABLES
Marginal Distributions
• Tables that relate two categorical variables
are called “Two-Way Tables”
– Ex 4.11 pg 292
• Marginal Distribution
– Very fancy term for “row totals and column
totals”
– Named because the totals appear in the margins
of the table. Wow.
• Often, each row or column total expressed as a
percentage of the grand total is very informative
Marginal Distributions
Age Group      Female    Male    Total
15-17              89      61      150
18-24            5668    4697    10365
25-34            1904    1589     3494
35 or older      1660     970     2630
Totals           9321    7317    16639
Column Totals: the bottom “Totals” row (9321 and 7317)
Row Totals: the right-hand “Total” column (150, 10365, 3494, 2630)
Grand Total: 16639
Marginal Distributions “Age Group”
Age Group      Female    Male    Total    Marg. Dist.
15-17              89      61      150       0.9%
18-24            5668    4697    10365      62.3%
25-34            1904    1589     3494      21.0%
35 or older      1660     970     2630      15.8%
Totals           9321    7317    16639     100%
Row total / grand total, e.g. 150/16639 = 0.009 = 0.9%
The marginal distribution adds to 100%
Marginal Distributions “Gender”
Age Group      Female    Male    Total
15-17              89      61      150
18-24            5668    4697    10365
25-34            1904    1589     3494
35 or older      1660     970     2630
Totals           9321    7317    16639
Marg. dist.       56%     44%     100%
Similarly for columns
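Both marginal distributions can be reproduced from the eight cell counts. The sketch below is an illustration with hypothetical variable names. One caveat: totals computed from the cells come out one lower than two of the printed totals (3493 rather than 3494 for ages 25-34, and 16638 rather than 16639 overall), presumably because of rounding in the source table; the percentages agree to the precision shown on the slides.

```python
import numpy as np

age_groups = ["15-17", "18-24", "25-34", "35 or older"]
genders = ["Female", "Male"]

# Cell counts from the two-way table (rows: age group, columns: gender).
counts = np.array([
    [89, 61],
    [5668, 4697],
    [1904, 1589],
    [1660, 970],
])

row_totals = counts.sum(axis=1)   # one total per age group
col_totals = counts.sum(axis=0)   # one total per gender
grand_total = counts.sum()

# Marginal distribution = each total as a percentage of the grand total.
age_marginal = np.round(100 * row_totals / grand_total, 1)
gender_marginal = np.round(100 * col_totals / grand_total, 1)

for name, pct in zip(age_groups, age_marginal):
    print(f"{name:12s} {pct:5.1f}%")
for name, pct in zip(genders, gender_marginal):
    print(f"{name:12s} {pct:5.1f}%")
```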
Describing Relationships
• Some relationships are easier to see when
we look at the proportions within each
group
• These distributions are called “Conditional
Distributions”
• To find a conditional distribution, express each
cell as a percentage of its row or column total.
• Let’s look at the same table, and find the
conditional distribution of gender, given
each age group
Conditional Distributions
We will look at the conditional distribution for the 15-17 row
(cell count / row total):
Female: 89/150 = 59.3%
Male: 61/150 = 40.7%
The table with complete conditional distributions for each row:
Age Group      Female           Male             Total
15-17            89 (59.3%)       61 (40.7%)       150 (100%)
18-24          5668 (54.7%)     4697 (45.3%)     10365 (100%)
25-34          1904 (54.5%)     1589 (45.5%)      3494 (100%)
35 or older    1660 (63.1%)      970 (36.9%)      2630 (100%)
Totals         9321 (56%)       7317 (44%)       16639 (100%)
Conditional Distributions
For an analysis of the effect of age groups, compare each row’s
conditional distribution with the marginal distribution for the
columns (56% female, 44% male). They should be close, unless there
is an effect caused by the age group (?). Some rows, such as
“35 or older” (63.1% female, 36.9% male), are not close to the
marginal distribution!
Conditional Distributions
• Based on the previous table, the
distributions of “gender given age group”
are not that different.
• We can see that the “35 or older” group
seems to differ slightly from the overall
trend.
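The row-wise comparison described above can be sketched as follows: each cell is divided by its row total, and the female share in each age group is compared with the overall female share. Variable names are hypothetical, and row totals are computed from the cell counts.

```python
import numpy as np

age_groups = ["15-17", "18-24", "25-34", "35 or older"]
counts = np.array([
    [89, 61],        # columns: Female, Male
    [5668, 4697],
    [1904, 1589],
    [1660, 970],
])

# Conditional distribution of gender given age group: cell / row total.
row_totals = counts.sum(axis=1, keepdims=True)
gender_given_age = np.round(100 * counts / row_totals, 1)

# Marginal distribution of gender, for comparison.
gender_marginal = 100 * counts.sum(axis=0) / counts.sum()

# Gap between each group's female share and the overall female share.
female_gap = gender_given_age[:, 0] - gender_marginal[0]

for name, row, gap in zip(age_groups, gender_given_age, female_gap):
    print(f"{name:12s} female {row[0]:5.1f}%  male {row[1]:5.1f}%  "
          f"gap from overall female share {gap:+.1f} points")
```

The largest gap belongs to the “35 or older” group, which matches the observation on the slide.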
Conditional Distributions
“age group given gender”
Here is the same table with the conditional distributions by gender
(cell count / column total):
Age Group      Female           Male             Total
15-17            89 (1%)          61 (0.8%)        150 (0.9%)
18-24          5668 (60.8%)     4697 (64.2%)     10365 (62.3%)
25-34          1904 (20.4%)     1589 (21.7%)      3494 (21.0%)
35 or older    1660 (17.8%)      970 (13.3%)      2630 (15.8%)
Totals         9321 (100%)      7317 (100%)      16639 (100%)
Is there a gender effect noticeable from this table?
Conditional Distribution
Conclusions from the previous table
• Females are more likely to be in the “35 or
older” group and less likely to be in the “18
to 24” group
• Males are more likely to be in the “18 to 24”
group and less likely to be in the “35 or
older” group
• These differences appear slight. Are they
actually “significant” with respect to the
overall distribution?
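The column-wise percentages behind these conclusions can be reproduced by dividing each cell by its column total. This is a sketch with hypothetical variable names; it only computes the percentages and does not answer the significance question raised above.

```python
import numpy as np

age_groups = ["15-17", "18-24", "25-34", "35 or older"]
counts = np.array([
    [89, 61],        # columns: Female, Male
    [5668, 4697],
    [1904, 1589],
    [1660, 970],
])

# Conditional distribution of age group given gender: cell / column total.
col_totals = counts.sum(axis=0, keepdims=True)
age_given_gender = np.round(100 * counts / col_totals, 1)

for name, row in zip(age_groups, age_given_gender):
    print(f"{name:12s} female {row[0]:5.1f}%  male {row[1]:5.1f}%")
```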
Conditional Distribution
• No single graph portrays the form of the
relationship between categorical
variables.
• No single numerical measure (such as
correlation) summarizes the strength of
the association.
Simpson’s Paradox
• Associations that hold true for all of
several groups can reverse direction
when the data are combined to form a
single group.
• EX 4.15 pg 299
• This phenomenon is often the result of an
“unaccounted” variable.
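A small numerical illustration of the reversal follows. The counts are illustrative (a commonly quoted comparison of two kidney-stone treatments) and are not the textbook's Example 4.15; the group and treatment labels are assumptions of this sketch.

```python
# (successes, patients) for treatments A and B within each group.
groups = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Success rate of each treatment within each group.
within = {
    group: {trt: rate(*cell) for trt, cell in cells.items()}
    for group, cells in groups.items()
}

# Success rate of each treatment after combining the groups.
combined = {}
for trt in ("A", "B"):
    successes = sum(groups[g][trt][0] for g in groups)
    total = sum(groups[g][trt][1] for g in groups)
    combined[trt] = rate(successes, total)

for group, rates in within.items():
    print(f"{group}: A {rates['A']:.1%}, B {rates['B']:.1%}")
print(f"combined: A {combined['A']:.1%}, B {combined['B']:.1%}")
```

Treatment A has the higher rate inside each group but the lower rate once the groups are pooled, because A was used mostly on the harder cases; group membership is the “unaccounted” variable.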
Assignment 4.2
• Pg 298 #23-25, 29, 31-35
4.3 ESTABLISHING CAUSATION
Different Relationships
• Suppose two variables (X and Y) have
some correlation
– i.e. when X increases in value, Y increases as
well
– One of the following relationships may hold.
Different Relationships
Causation
• In this relationship, the explanatory
variable is somehow affecting the
response variable.
• In most instances, we are looking to find
evidence of a causation relationship
Different Relationships
Causation
Different Relationships
Common Response
• In this relationship, both X and Y are
correlated to a third (unknown) variable
(Z).
• Ex.: when Z increases, X increases and Y
increases.
• Unless we know about Z, it appears as
though X and Y have a causation
relationship.
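Common response can be simulated. In the sketch below (synthetic data, hypothetical names, fixed random seed) X and Y are each driven by Z plus independent noise, with no direct link between them. They still correlate, and the correlation largely disappears once the linear effect of Z is removed from both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Z drives both X and Y; X and Y have no direct effect on each other.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)

def residual_after(z, v):
    """Residuals of the least-squares line predicting v from z."""
    slope, intercept = np.polyfit(z, v, 1)
    return v - (intercept + slope * z)

# Raw correlation between X and Y.
r_xy = np.corrcoef(x, y)[0, 1]

# Correlation between X and Y after removing what Z explains in each.
r_xy_given_z = np.corrcoef(residual_after(z, x), residual_after(z, y))[0, 1]

print(f"corr(X, Y)             = {r_xy:.3f}")
print(f"corr(X, Y) adjusting Z = {r_xy_given_z:.3f}")
```

In practice Z is often unmeasured, which is why the raw correlation can be mistaken for causation.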
Different Relationships
Common Response
Different Relationships
Confounding
• X and Y have correlation.
• An (often unknown) third variable ‘Z’
also has correlation with Y.
• Is X the explanatory variable, or is Z the
explanatory variable, or are they both
explanatory variables?
Different Relationships
Confounding
Causation
• The best way to establish causation is
with a carefully designed experiment
– Possible ‘lurking variables’ are controlled
• Experiments cannot always be conducted
– Many times, they are costly or even unethical
• Some guidelines need to be established in
cases where an observational study is the
only method to measure variables.
Causation- some criteria
• Association is strong
• Association is consistent (among different
studies)
• Larger values of the explanatory variable are
associated with stronger responses
• The alleged cause precedes the effect in
time
• The alleged cause is plausible
Assignment 4.3
• Pg 312 #41, 45, 50, 51
Chapter 4 Review
• #37, 53, 54, 57

Más contenido relacionado

La actualidad más candente

Section 10: Lagrange's Theorem
Section 10: Lagrange's TheoremSection 10: Lagrange's Theorem
Section 10: Lagrange's TheoremKevin Johnson
 
Roots and Radical Expressions Notes
Roots and Radical Expressions NotesRoots and Radical Expressions Notes
Roots and Radical Expressions Notescmorgancavo
 
7.6 solving radical equations
7.6 solving radical equations7.6 solving radical equations
7.6 solving radical equationshisema01
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relationnuwan udugampala
 
Functional Notations
Functional NotationsFunctional Notations
Functional NotationsGalina Panela
 
Exponents and Radicals (Class 8th)
Exponents and Radicals (Class 8th)Exponents and Radicals (Class 8th)
Exponents and Radicals (Class 8th)Lugiano
 
Implementation of Gauss Elimination Method in c++
Implementation of Gauss Elimination Method in c++Implementation of Gauss Elimination Method in c++
Implementation of Gauss Elimination Method in c++vishal kvs
 
Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2SamsonAjibola
 
Rational exponents
Rational exponentsRational exponents
Rational exponentsDeepak Kumar
 
Correlation Analysis
Correlation AnalysisCorrelation Analysis
Correlation AnalysisSuresh Babu
 
Logarithmic transformations
Logarithmic transformationsLogarithmic transformations
Logarithmic transformationsamylute
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionAjendra7846
 

La actualidad más candente (20)

Section 10: Lagrange's Theorem
Section 10: Lagrange's TheoremSection 10: Lagrange's Theorem
Section 10: Lagrange's Theorem
 
Roots and Radical Expressions Notes
Roots and Radical Expressions NotesRoots and Radical Expressions Notes
Roots and Radical Expressions Notes
 
7.6 solving radical equations
7.6 solving radical equations7.6 solving radical equations
7.6 solving radical equations
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
 
Functional Notations
Functional NotationsFunctional Notations
Functional Notations
 
Exponents and Radicals (Class 8th)
Exponents and Radicals (Class 8th)Exponents and Radicals (Class 8th)
Exponents and Radicals (Class 8th)
 
Me314 week09-root locusanalysis
Me314 week09-root locusanalysisMe314 week09-root locusanalysis
Me314 week09-root locusanalysis
 
Implementation of Gauss Elimination Method in c++
Implementation of Gauss Elimination Method in c++Implementation of Gauss Elimination Method in c++
Implementation of Gauss Elimination Method in c++
 
Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2Numerical solution of eigenvalues and applications 2
Numerical solution of eigenvalues and applications 2
 
Rational exponents
Rational exponentsRational exponents
Rational exponents
 
Correlation and Regression
Correlation and Regression Correlation and Regression
Correlation and Regression
 
Correlation Analysis
Correlation AnalysisCorrelation Analysis
Correlation Analysis
 
Ch 7 correlation_and_linear_regression
Ch 7 correlation_and_linear_regressionCh 7 correlation_and_linear_regression
Ch 7 correlation_and_linear_regression
 
Statistics (recap)
Statistics (recap)Statistics (recap)
Statistics (recap)
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation in Statistics
Correlation in StatisticsCorrelation in Statistics
Correlation in Statistics
 
A correlation analysis.ppt 2018
A correlation analysis.ppt 2018A correlation analysis.ppt 2018
A correlation analysis.ppt 2018
 
2 rules for radicals
2 rules for radicals2 rules for radicals
2 rules for radicals
 
Logarithmic transformations
Logarithmic transformationsLogarithmic transformations
Logarithmic transformations
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 

Destacado

Who, what, why, and where
Who, what, why, and whereWho, what, why, and where
Who, what, why, and whereLo-Ann Placido
 
I/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason ManaoisI/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason ManaoisRoi Xcel
 
01 psychological statistics 1
01 psychological statistics 101 psychological statistics 1
01 psychological statistics 1Noushad Feroke
 
Module 1 statistics
Module 1   statisticsModule 1   statistics
Module 1 statisticsdionesioable
 
Psychological Testing Techniques
Psychological Testing TechniquesPsychological Testing Techniques
Psychological Testing Techniquespsychegames2
 
Nature and use of Psychological Tests
Nature and use of Psychological TestsNature and use of Psychological Tests
Nature and use of Psychological TestsLenie Rose Julia
 
Psychological assessment and test
Psychological assessment and testPsychological assessment and test
Psychological assessment and testAashish Parihar
 
Psychological testing, meaning, advantages and limitations
Psychological testing, meaning, advantages and limitationsPsychological testing, meaning, advantages and limitations
Psychological testing, meaning, advantages and limitationsUsman Public School System
 
Correlation
CorrelationCorrelation
CorrelationTech_MX
 
Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)Namrata Saxena
 
Psychological test meaning, concept, need & importance
Psychological test   meaning, concept, need & importancePsychological test   meaning, concept, need & importance
Psychological test meaning, concept, need & importancejd singh
 
correlation_and_covariance
correlation_and_covariancecorrelation_and_covariance
correlation_and_covarianceEkta Doger
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionKhalid Aziz
 

Destacado (18)

Who, what, why, and where
Who, what, why, and whereWho, what, why, and where
Who, what, why, and where
 
I/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason ManaoisI/O chapter 2 by Jason Manaois
I/O chapter 2 by Jason Manaois
 
01 psychological statistics 1
01 psychological statistics 101 psychological statistics 1
01 psychological statistics 1
 
Introduction totestingandassessment
Introduction totestingandassessmentIntroduction totestingandassessment
Introduction totestingandassessment
 
Module 1 statistics
Module 1   statisticsModule 1   statistics
Module 1 statistics
 
Psychological Testing Techniques
Psychological Testing TechniquesPsychological Testing Techniques
Psychological Testing Techniques
 
Nature and use of Psychological Tests
Nature and use of Psychological TestsNature and use of Psychological Tests
Nature and use of Psychological Tests
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysis
 
Psychological assessment and test
Psychological assessment and testPsychological assessment and test
Psychological assessment and test
 
Psychological testing, meaning, advantages and limitations
Psychological testing, meaning, advantages and limitationsPsychological testing, meaning, advantages and limitations
Psychological testing, meaning, advantages and limitations
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)Correlation of subjects in school (b.ed notes)
Correlation of subjects in school (b.ed notes)
 
Psychological test meaning, concept, need & importance
Psychological test   meaning, concept, need & importancePsychological test   meaning, concept, need & importance
Psychological test meaning, concept, need & importance
 
Correlation ppt...
Correlation ppt...Correlation ppt...
Correlation ppt...
 
correlation_and_covariance
correlation_and_covariancecorrelation_and_covariance
correlation_and_covariance
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Build Features, Not Apps
Build Features, Not AppsBuild Features, Not Apps
Build Features, Not Apps
 

Similar a Stats chapter 4

Mathematics in the Modern World (Patterns and Sequences).pptx
Mathematics in the Modern World (Patterns and Sequences).pptxMathematics in the Modern World (Patterns and Sequences).pptx
Mathematics in the Modern World (Patterns and Sequences).pptxReignAnntonetteYayai
 
KAREN AND HANNAEH'S INSET SLIDES.pptx
KAREN AND HANNAEH'S INSET SLIDES.pptxKAREN AND HANNAEH'S INSET SLIDES.pptx
KAREN AND HANNAEH'S INSET SLIDES.pptxMeimeiMC
 
Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)마이캠퍼스
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Maninda Edirisooriya
 
Workshop 4
Workshop 4Workshop 4
Workshop 4eeetq
 
Lecture 01 - Linear Equations.ppt
Lecture 01 - Linear Equations.pptLecture 01 - Linear Equations.ppt
Lecture 01 - Linear Equations.pptAdeelIftikhar8
 
Sorting-algorithmbhddcbjkmbgjkuygbjkkius.pdf
Sorting-algorithmbhddcbjkmbgjkuygbjkkius.pdfSorting-algorithmbhddcbjkmbgjkuygbjkkius.pdf
Sorting-algorithmbhddcbjkmbgjkuygbjkkius.pdfArjunSingh81957
 
Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Julius Hietala
 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help HelpWithAssignment.com
 
Changing the subject of a formula (Simple Formulae)
Changing the subject of a formula (Simple Formulae)Changing the subject of a formula (Simple Formulae)
Changing the subject of a formula (Simple Formulae)Alona Hall
 
AP Advantage: AP Calculus
AP Advantage: AP CalculusAP Advantage: AP Calculus
AP Advantage: AP CalculusShashank Patil
 

Similar a Stats chapter 4 (20)

R meetup lm
R meetup lmR meetup lm
R meetup lm
 
26 assumptions
26 assumptions26 assumptions
26 assumptions
 
5954987.ppt
5954987.ppt5954987.ppt
5954987.ppt
 
Mathematics in the Modern World (Patterns and Sequences).pptx
Mathematics in the Modern World (Patterns and Sequences).pptxMathematics in the Modern World (Patterns and Sequences).pptx
Mathematics in the Modern World (Patterns and Sequences).pptx
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
KAREN AND HANNAEH'S INSET SLIDES.pptx
KAREN AND HANNAEH'S INSET SLIDES.pptxKAREN AND HANNAEH'S INSET SLIDES.pptx
KAREN AND HANNAEH'S INSET SLIDES.pptx
 
Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes Brock Butlett Time Series-Great Lakes
Brock Butlett Time Series-Great Lakes
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
Lecture 4 - Linear Regression, a lecture in subject module Statistical & Mach...
 
Transformation of variables
Transformation of variablesTransformation of variables
Transformation of variables
 
Workshop 4
Workshop 4Workshop 4
Workshop 4
 
Av 738- Adaptive Filtering - Background Material
Av 738- Adaptive Filtering - Background MaterialAv 738- Adaptive Filtering - Background Material
Av 738- Adaptive Filtering - Background Material
 
Lecture 01 - Linear Equations.ppt
Lecture 01 - Linear Equations.pptLecture 01 - Linear Equations.ppt
Lecture 01 - Linear Equations.ppt
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
Sorting-algorithmbhddcbjkmbgjkuygbjkkius.pdf
Sorting-algorithmbhddcbjkmbgjkuygbjkkius.pdfSorting-algorithmbhddcbjkmbgjkuygbjkkius.pdf
Sorting-algorithmbhddcbjkmbgjkuygbjkkius.pdf
 
Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"Slides for "Do Deep Generative Models Know What They Don't know?"
Slides for "Do Deep Generative Models Know What They Don't know?"
 
Lecture 4
Lecture 4Lecture 4
Lecture 4
 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help
 
Changing the subject of a formula (Simple Formulae)
Changing the subject of a formula (Simple Formulae)Changing the subject of a formula (Simple Formulae)
Changing the subject of a formula (Simple Formulae)
 
AP Advantage: AP Calculus
AP Advantage: AP CalculusAP Advantage: AP Calculus
AP Advantage: AP Calculus
 

Más de Richard Ferreria (20)

Chapter6
Chapter6Chapter6
Chapter6
 
Chapter2
Chapter2Chapter2
Chapter2
 
Chapter3
Chapter3Chapter3
Chapter3
 
Chapter8
Chapter8Chapter8
Chapter8
 
Chapter1
Chapter1Chapter1
Chapter1
 
Chapter4
Chapter4Chapter4
Chapter4
 
Chapter7
Chapter7Chapter7
Chapter7
 
Chapter5
Chapter5Chapter5
Chapter5
 
Chapter9
Chapter9Chapter9
Chapter9
 
Chapter14
Chapter14Chapter14
Chapter14
 
Chapter15
Chapter15Chapter15
Chapter15
 
Chapter11
Chapter11Chapter11
Chapter11
 
Chapter12
Chapter12Chapter12
Chapter12
 
Chapter10
Chapter10Chapter10
Chapter10
 
Chapter13
Chapter13Chapter13
Chapter13
 
Adding grades to your google site v2 (dropbox)
Adding grades to your google site v2 (dropbox)Adding grades to your google site v2 (dropbox)
Adding grades to your google site v2 (dropbox)
 
Stats chapter 14
Stats chapter 14Stats chapter 14
Stats chapter 14
 
Stats chapter 15
Stats chapter 15Stats chapter 15
Stats chapter 15
 
Stats chapter 13
Stats chapter 13Stats chapter 13
Stats chapter 13
 
Stats chapter 12
Stats chapter 12Stats chapter 12
Stats chapter 12
 


Stats chapter 4

  • 1. Chapter 4 More about relationships between 2 variables
  • 2. 4.1 TRANSFORMING TO ACHIEVE LINEARITY
  • 3. What if the scatterplot is not linear? • Of course, not all data are linear! • Our method will involve mathematically operating on one or both of the explanatory and response variables • An inverse transformation will then be used to create a non-linear regression model • This will be a little “mathy”
  • 4. Transformations • Before we begin transformations, remember that some well-known phenomena act in predictable ways – E.g., when working with time and gravity, you should know that there is a square relationship between distance and time!
  • 5. The Basics • The data from measurements (raw data) must be operated on. • Apply the same mathematical transformation on the raw data – Ex. “Square every response” • Use methods from the previous chapter to find the LSRL for the transformed data • Analyze your regression to ensure the LSRL is appropriate • Apply an inverse transformation on the LSRL to find the regression for the raw data.
  • 6. Example: please refer to p. 265, exercise 4.2. Length (cm) | Period (s); 16.5 | 0.777; 17.5 | 0.839; 19.5 | 0.912; 22.5 | 0.878; 28.5 | 1.004; 31.5 | 1.087; 34.5 | 1.129; 37.5 | 1.111; 43.5 | 1.290; 46.5 | 1.371; 106.5 | 2.115
  • 7. Example • Data inputted into L1 and L2 • Scatterplot • Looks pretty good, right?
  • 8. Exercise • LSRL • ŷ = 0.6 + 0.015x, r = 0.991 • Residual plot • Perhaps we can do better!
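The LSRL reported on this slide can be checked without a calculator. The following is a minimal sketch in plain Python, using the pendulum data from the Example slide; the helper name `lsrl` is an illustrative choice, not something from the course materials.

```python
import math

# Pendulum data from the Example slide: length (cm) and period (s).
length = [16.5, 17.5, 19.5, 22.5, 28.5, 31.5, 34.5, 37.5, 43.5, 46.5, 106.5]
period = [0.777, 0.839, 0.912, 0.878, 1.004, 1.087, 1.129, 1.111, 1.290, 1.371, 2.115]


def lsrl(x, y):
    """Return (intercept, slope, r) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


a, b, r = lsrl(length, period)

# Residual = observed response minus predicted response.
residuals = [yi - (a + b * xi) for xi, yi in zip(length, period)]

print(f"y-hat = {a:.3f} + {b:.4f}x, r = {r:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
```

Printing the residuals in order of length gives the same information as the calculator's residual plot: look for a pattern rather than random scatter.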
  • 9. Example • L3 = L2^.5 (square root) • LinReg L1, L3 • Note that the value of r² has increased • Note that the value of the residual of the last point has decreased
  • 10. Exponential Models • Many natural phenomena are explained by an exponential model. • Exponential models are marked by sharp growth or decay. • Basic model: y = A·B^x • For this transformation, take the logarithm of the response data. • You may use “log10” or “ln”; your choice. – I prefer “ln” (of course)
  • 11. Exponential Models After the transformation, we have the following linear model: ln(y) = a + b·x 1. ln(y) = a + b·x 2. e^ln(y) = e^(a + b·x) (exponentiate) 3. y = e^a · e^(b·x) (properties of logarithms and exponents) 4. Let A = e^a and B = e^b (redefine variables) 5. y = A·B^x (this is our model)
  • 12. Exponential Models • Since this is an ‘applied math’ course, you need not remember how to apply the inverse transformation • Whew • BUT you do need to memorize: when ln(y) = a + b·x, y = A·B^x, where A = e^a and B = e^b
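The rule to memorize (A = e^a, B = e^b) can be seen in action with a short sketch. The data below are hypothetical and were generated from y = 2·1.5^x, so they are not from the slides or the textbook; the function names are also illustrative.

```python
import math


def lsrl(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_exponential(x, y):
    """Fit y = A * B**x by regressing ln(y) on x, then back-transforming."""
    a, b = lsrl(x, [math.log(yi) for yi in y])
    return math.exp(a), math.exp(b)  # A = e^a, B = e^b


# Hypothetical data that follow y = 2 * 1.5**x exactly (not from the slides).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * 1.5 ** xi for xi in xs]

A, B = fit_exponential(xs, ys)
print(f"y = {A:.2f} * {B:.2f}^x")
```

Only the response is transformed here, which mirrors taking the ln of the response list and leaving the explanatory list alone.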
  • 14. Exponential Models Take the ln of L2 (the response list) and store it in L3
  • 15. Exponential Models These are our “transformed responses”
  • 16. Exponential Models From the home screen, we fit an LSRL to the transformed data
  • 17. Exponential Models We don’t have to store this regression for transformed data
  • 18. Exponential Models Take note of the values of ‘a’ and ‘b’
  • 19. Exponential Models A quick look at the residuals
  • 20. Exponential Models The residuals are small and show no defined pattern
  • 21. Exponential Models • Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b • y = e^0.701 · (e^0.184)^x
  • 22. Exponential Models • Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b • y = e^0.701 · (e^0.184)^x
  • 23. Exponential Models • Our regression model is exponential: y = A·B^x, where A = e^a and B = e^b • y = e^0.701 · (e^0.184)^x • Or y ≈ 2.02 · (1.20)^x
  • 24. Exponential Models Put our regression in Y1
  • 25. Exponential Models Change Plot1 from a residual plot to a scatterplot
  • 27. Power Models • These models are used when the rate of increase is less severe than in an exponential model, or if you suspect a ‘root’ model • For this model, you will take the logarithms of both the explanatory variable and the response variable
  • 28. Power models LSRL on transformed data yields: ln(y) = a + b·ln(x) 1. ln(y) = a + b·ln(x) 2. e^ln(y) = e^(a + b·ln(x)) 3. y = e^a · e^ln(x^b) 4. y = e^a · x^b 5. Let A = e^a 6. y = A · x^b
  • 29. Power models Let’s use this data to find a power model
  • 30. Power models This time we need to transform both lists
  • 31. Power models This time we need to transform both lists
  • 32. Power models Transformed explanatory variable = L3; transformed response variable = L4
  • 33. Power models LSRL on transformed data; no need to store in Y1
  • 34. Power models Take note of the values of ‘a’ and ‘b’
  • 35. Power models A quick look at the residuals
  • 36. Power models Note that we use the transformed explanatory variable
  • 38. Power models Residuals are all small in size
  • 39. Power models • When ln(y) = a + b·ln(x), y = A · x^b, where A = e^a. Our model is y = (e^1.31) · x^1.27
  • 40. Power models • When ln(y) = a + b·ln(x), y = A · x^b, where A = e^a. Our model is y = (e^1.31) · x^1.27
  • 41. Power models • When ln(y) = a + b·ln(x), y = A · x^b, where A = e^a. Our model is y = (e^1.31) · x^1.27, or y = 3.71 · x^1.27
  • 43. Power models Change from resid to scatter plot
  • 46. Power models • Much like the exponential model, you only need to know how the transformed model becomes the model for the raw data. • When ln(y) = a + b·ln(x), y = A · x^b, where A = e^a
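As a rough illustration of the power-model rule, the sketch below regresses ln(y) on ln(x) and exponentiates only the intercept. The data are hypothetical, generated from y = 3·x^1.5 and not taken from the slides; `fit_power` is an illustrative name. The last lines back-transform the intercept a = 1.31 quoted on the slides.

```python
import math


def fit_power(x, y):
    """Fit y = A * x**b by regressing ln(y) on ln(x); returns (A, b)."""
    lx = [math.log(xi) for xi in x]
    ly = [math.log(yi) for yi in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return math.exp(a), b  # only the intercept is exponentiated


# Hypothetical data that follow y = 3 * x**1.5 exactly (not from the slides).
xs = [1, 2, 4, 8, 16]
ys = [3 * xi ** 1.5 for xi in xs]

A, b = fit_power(xs, ys)
print(f"y = {A:.2f} * x^{b:.2f}")

# Back-transforming the slides' intercept a = 1.31 gives A = e^1.31.
A_slide = math.exp(1.31)
print(round(A_slide, 2))
```

Unlike the exponential model, the slope b is used directly as the exponent, so it is not exponentiated.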
  • 47. Transformation thoughts • Although this is not a major topic for the course, you still need to be able to apply these two transformations (exponential and power) • Be sure to check the residuals for the LSRL on transformed data! You may have picked the wrong model :/ • If one model doesn’t work, try the other. I would start with the exponential model. • Don’t transform into a cockroach. Ask Kafka!
  • 48. Assn 4.1 • pg 276 #5, 8, 9, 11, 12
  • 50. Marginal Distributions • Tables that relate two categorical variables are called “Two-Way Tables” – Ex 4.11 pg 292 • Marginal Distribution – Very fancy term for “row totals and column totals” – Named because the totals appear in the margins of the table. Wow. • Often, each row or column total expressed as a percentage of the grand total is very informative
  • 51. Marginal Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639
  • 52. Marginal Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639. Column Totals
  • 53. Marginal Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639. Row Totals
  • 54. Marginal Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639. Grand Total
  • 56. Marginal Distributions, “Age Group”: Age Group | Female | Male | Total | Marg. Dist.; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639
  • 57. Marginal Distributions, “Age Group”: Age Group | Female | Male | Total | Marg. Dist.; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639. Row total / grand total: 150/16639 = 0.009
  • 58. Marginal Distributions, “Age Group”: Age Group | Female | Male | Total | Marg. Dist.; 15-17 | 89 | 61 | 150 | 0.9%; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 or older | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639. Row total / grand total: 150/16639 = 0.009
  • 59. Marginal Distributions, “Age Group”: Age Group | Female | Male | Total | Marg. Dist.; 15-17 | 89 | 61 | 150 | 0.9%; 18-24 | 5668 | 4697 | 10365 | 62.3%; 25-34 | 1904 | 1589 | 3494 | 21.0%; 35 or older | 1660 | 970 | 2630 | 15.8%; Totals | 9321 | 7317 | 16639 | 100%. Adds to 100%
  • 60. Marginal Distributions, “Gender”: Age Group | Female | Male | Total; 15-17 | 89 | 61 | 150; 18-24 | 5668 | 4697 | 10365; 25-34 | 1904 | 1589 | 3494; 35 & up | 1660 | 970 | 2630; Totals | 9321 | 7317 | 16639; Marg. dist. | 56% | 44% | 100%. Similarly for columns
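The marginal percentages in these tables can be reproduced with a few lines of Python. This is a minimal sketch using the counts from the slides; the variable names are illustrative, and the printed row totals and grand total are used as given rather than recomputed.

```python
# Counts from the two-way table on the slides: (female, male, total) per age group.
# The printed "Total" column and grand total are used as given.
table = {
    "15-17": (89, 61, 150),
    "18-24": (5668, 4697, 10365),
    "25-34": (1904, 1589, 3494),
    "35 or older": (1660, 970, 2630),
}
grand_total = 16639

# Marginal distribution of age group: row total / grand total.
age_marginal = {age: row[2] / grand_total for age, row in table.items()}

# Marginal distribution of gender: column total / grand total.
female_total = sum(row[0] for row in table.values())
male_total = sum(row[1] for row in table.values())
gender_marginal = {
    "Female": female_total / grand_total,
    "Male": male_total / grand_total,
}

for age, share in age_marginal.items():
    print(f"{age}: {share:.1%}")
for gender, share in gender_marginal.items():
    print(f"{gender}: {share:.0%}")
```

Each marginal distribution uses only the totals in one margin of the table; the individual cell counts matter only through those totals.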
  • 61. Describing Relationships • Some relationships are easier to see when we look at the proportions within each group • These distributions are called “Conditional Distributions” • To find a conditional distribution, find each percentage of the row or column total. • Let’s look at the same table, and find the conditional distribution of gender, given each age group
  • 62. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%)
  • 63. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). We will look at the conditional distribution for this row
  • 64. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). This cell is 89/150 (cell total / row total) = 59.3%
  • 65. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). This cell is 89/150 (cell total / row total) = 59.3%
  • 66. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). This cell is 61/150 (cell total / row total) = 40.7%
  • 67. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). This cell is 61/150 (cell total / row total) = 40.7%
  • 68. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%)
  • 69. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). The table with complete conditional distributions for each row
  • 70. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). For an analysis of the effect of age groups, compare a row’s conditional distribution…
  • 71. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). With the marginal distribution for the columns…
  • 72. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). They should be close …
  • 73. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). … unless there is an effect caused by the age group (?)
  • 74. Conditional Distributions: Age Group | Female | Male | Total; 15-17 | 89 (59.3%) | 61 (40.7%) | 150 (100%); 18-24 | 5668 (54.7%) | 4697 (45.3%) | 10365 (100%); 25-34 | 1904 (54.5%) | 1589 (45.5%) | 3494 (100%); 35 or older | 1660 (63.1%) | 970 (36.9%) | 2630 (100%); Totals | 9321 (56%) | 7317 (44%) | 16639 (100%). … and these are not close to the marginal distribution!
  • 75. Conditional Distributions • Based on the previous table, the distributions of “gender given age group” are not that different. • We can see that the “35 and older” group seems to differ slightly from the overall trend.
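The “gender given age group” percentages come from dividing each cell by its row total. A minimal sketch of that calculation, using the counts and printed row totals from the slides (the variable names are illustrative):

```python
# Female and male counts per age group, from the two-way table on the slides.
counts = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}

# The slides divide each cell by the printed row total.
row_totals = {"15-17": 150, "18-24": 10365, "25-34": 3494, "35 or older": 2630}

# Conditional distribution of gender within each age group.
gender_given_age = {
    age: {g: n / row_totals[age] for g, n in row.items()}
    for age, row in counts.items()
}

for age, dist in gender_given_age.items():
    print(age, {g: f"{p:.1%}" for g, p in dist.items()})
```

Comparing each printed row with the overall 56% / 44% split shows which age groups depart from the marginal distribution of gender.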
  • 76. Conditional Distributions, “age group given gender”: Age Group | Female | Male | Total; 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%); 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%); 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%); 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%); Totals | 9321 (100%) | 7317 (100%) | 16639 (100%)
  • 77. Conditional Distributions, “age group given gender”: Age Group | Female | Male | Total; 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%); 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%); 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%); 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%); Totals | 9321 (100%) | 7317 (100%) | 16639 (100%). Here is the same chart with the conditional distributions by gender…
  • 78. Conditional Distributions, “age group given gender”: Age Group | Female | Male | Total; 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%); 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%); 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%); 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%); Totals | 9321 (100%) | 7317 (100%) | 16639 (100%). Is there a gender effect noticeable from this table?
  • 79. Conditional Distributions, “age group given gender”: Age Group | Female | Male | Total; 15-17 | 89 (1.0%) | 61 (0.8%) | 150 (0.9%); 18-24 | 5668 (60.8%) | 4697 (64.2%) | 10365 (62.3%); 25-34 | 1904 (20.4%) | 1589 (21.7%) | 3494 (21.0%); 35 or older | 1660 (17.8%) | 970 (13.3%) | 2630 (15.8%); Totals | 9321 (100%) | 7317 (100%) | 16639 (100%)
  • 80. Conditional Distribution Conclusions from the previous chart • Females are more likely to be in the “35 and older” group and less likely to be in the “18 to 24” group • Males are more likely to be in the “18 to 24” group and less likely to be in the “35 and older” group • These differences appear slight. Are they actually “significant” with respect to the overall distribution?
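These conclusions can be checked by putting the “age group given gender” percentages next to the marginal distribution of age group. The sketch below uses the counts and printed totals from the slides; the names are illustrative, and the gap is reported in percentage points only, which says nothing about statistical significance.

```python
# Counts and printed totals from the two-way table on the slides.
counts = {
    "15-17": {"Female": 89, "Male": 61},
    "18-24": {"Female": 5668, "Male": 4697},
    "25-34": {"Female": 1904, "Male": 1589},
    "35 or older": {"Female": 1660, "Male": 970},
}
row_totals = {"15-17": 150, "18-24": 10365, "25-34": 3494, "35 or older": 2630}
column_totals = {"Female": 9321, "Male": 7317}
grand_total = 16639

# Marginal distribution of age group, in percent.
marginal = {age: 100 * t / grand_total for age, t in row_totals.items()}

# Conditional distribution of age group given gender, in percent.
age_given_gender = {
    g: {age: 100 * counts[age][g] / column_totals[g] for age in counts}
    for g in column_totals
}

# Gap between each conditional percentage and the marginal, in percentage points.
for g, dist in age_given_gender.items():
    for age, pct in dist.items():
        print(f"{g:6} {age:12} {pct:5.1f}%  ({pct - marginal[age]:+.1f} vs. overall)")
```

A positive gap means that gender is over-represented in that age group relative to the overall distribution, and a negative gap means it is under-represented.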
  • 81. Conditional Distribution • No single graph portrays the form of the relationship between categorical variables. • No single numerical measure (such as correlation) summarizes the strength of the association.
  • 82. Simpson’s Paradox • Associations that hold true for all of several groups can reverse direction when the data are combined to form a single group. • Ex. 4.15, pg 299 • This phenomenon is often the result of an “unaccounted” variable.
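A small numerical illustration may help here. The counts below are hypothetical, invented only to show the reversal, and are not the data from Ex. 4.15: treatment A has the higher success rate within each group, yet treatment B has the higher rate once the groups are combined, because the two treatments were applied to the groups in very different proportions.

```python
# Hypothetical (successes, trials) for two treatments in two groups.
# These numbers are invented for illustration; they are not from the textbook.
data = {
    "Group 1": {"A": (90, 100), "B": (800, 1000)},
    "Group 2": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials


# Within each group, compare the two treatments.
within = {
    group: {t: rate(*counts) for t, counts in arms.items()}
    for group, arms in data.items()
}
for group, rates in within.items():
    print(group, rates)

# Combine the groups by adding successes and trials for each treatment.
combined = {}
for t in ("A", "B"):
    successes = sum(data[g][t][0] for g in data)
    trials = sum(data[g][t][1] for g in data)
    combined[t] = rate(successes, trials)
print("Combined", combined)
```

Here the group plays the role of the unaccounted variable: most of treatment A's trials fall in the group with low success rates, which drags its combined rate down.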
  • 83. Assignment 4.2 • Pg 298 #23-25, 29, 31-35
  • 85. Different Relationships • Suppose two variables (X and Y) have some correlation – i.e. when X increases in value, Y increases as well – One of the following relationships may hold.
  • 86. Different Relationships Causation • In this relationship, the explanatory variable is somehow affecting the response variable. • In most instances, we are looking to find evidence of a causation relationship
  • 88. Different Relationships Common Response • In this relationship, both X and Y are correlated with a third (unknown) variable (Z). • Ex.: when Z increases, X increases and Y increases. • Unless we know about Z, it appears as though X and Y have a causation relationship.
  • 90. Different Relationships Confounding • X and Y are correlated • An (often unknown) third variable ‘Z’ is also correlated with Y • Is X the explanatory variable, or is Z the explanatory variable, or are they both explanatory variables?
  • 92. Causation • The best way to establish causation is with a carefully designed experiment – Possible ‘lurking variables’ are controlled • Experiments cannot always be conducted – Many times, they are costly or even unethical • Some guidelines need to be established in cases where an observational study is the only method to measure variables.
  • 93. Causation- some criteria • Association is strong • Association is consistent (among different studies) • Large values of the response variable are associated with stronger responses (typo?) • The alleged cause precedes the effect in time • The alleged cause is probable
  • 95. Chapter 4 Review • #37, 53, 54, 57

Editor's notes

  1. Chapter begins Pg 256