Jennifer Siegel
Statistical background
Z-Test
T-Test
Anovas
 Science tries to predict the future
 Genuine effect?
 Attempt to strengthen predictions with stats
 Use the p-value to indicate how confident we can be that the result reflects a
genuine effect in the whole population (more on this later…)
 Develop an experimental hypothesis
 H0 = null hypothesis
 H1 = alternative hypothesis
 Statistically significant result
 Significance threshold (α) = .05
 p-value = probability of obtaining the observed result (or one more extreme) if the null hypothesis is true
 If p falls below the .05 (5%) level…
 …we are 95% confident our experimental effect is genuine
 Type 1 = false positive
 Type 2 = false negative
 Power = 1 – probability of a Type 2 error
 Let’s pretend you came up with the following theory…
Having a baby increases brain volume (associated with
possible structural changes)
Z - test
T - test
 Population mean (μ) and standard deviation (σ) are known
 z = (x̄ − μ) / (σ / √n)
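The one-sample z-test above can be sketched in a few lines of standard-library Python. The sample size, means, and population SD below are invented for illustration (loosely echoing the brain-volume example); they are not from any real population.

```python
# One-sample z-test: usable when the population mean and SD are known.
from math import sqrt, erf

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) / (population SD / sqrt(n))."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Two-tailed p-value from the standard normal CDF
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return z, p

# Illustrative numbers only (hypothetical population SD of 115)
z, p = one_sample_z(sample_mean=1271.0, pop_mean=1236.2, pop_sd=115.0, n=8)
```

With these made-up numbers the z-score is well under 2, so the two-tailed p-value stays above .05: a reminder that small samples need large effects to reach significance when σ is wide.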
 Cost
 Not able to include everyone
 Too time consuming
 Ethical right to privacy
Realistically researchers can only do sample based
studies
 t = difference between sample means / standard error
of the difference between sample means
 Degrees of freedom = sample size - 1
t = (x̄₁ − x̄₂) / s(x̄₁−x̄₂)

t = difference between sample means / estimated standard error of the difference between means

s(x̄₁−x̄₂) = √( s₁²/n₁ + s₂²/n₂ )
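The two-sample t on this slide can be computed directly from summary statistics. The means, SDs, and group sizes below are made-up numbers for illustration.

```python
# Independent-samples t with the unequal-variance standard error
# shown on the slide: se = sqrt(s1^2/n1 + s2^2/n2).
from math import sqrt

def independent_t(mean1, mean2, s1, s2, n1, n2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (mean1 - mean2) / se

# Hypothetical summary statistics for two groups of 30
t = independent_t(mean1=105.0, mean2=100.0, s1=10.0, s2=12.0, n1=30, n2=30)
```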
 H0 = There is no difference in brain size before or after
giving birth
 H1 = The brain is significantly smaller or significantly
larger after giving birth (difference detected)
Before Delivery 6 Weeks After Delivery Difference
1437.4 1494.5 57.1
1089.2 1109.7 20.5
1201.7 1245.4 43.7
1371.8 1383.6 11.8
1207.9 1237.7 29.8
1150.7 1180.1 29.4
1221.9 1268.8 46.9
1208.7 1248.3 39.6
Sum 9889.3 10168.1 278.8
Mean 1236.1625 1271.0125 34.85
SD 113.8544928 119.0413426 (SE of mean difference: 5.18685)
t = mean difference / SE of mean difference = 34.85 / 5.18685
t 6.718914454
DF 7
http://www.danielsoper.com/statcalc/calc08.aspx
Women have a significantly larger brain after giving birth
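The paired t can be recomputed from the table's raw values with Python's standard library. Doing so gives t ≈ 6.66 with 7 degrees of freedom; the slide's 6.719 differs slightly, presumably because of rounding in its standard-error figure. Either way the result is far past the df = 7 critical value, so the conclusion stands.

```python
# Paired (repeated-measures) t-test on the brain-volume data from the slide.
from statistics import mean, stdev
from math import sqrt

before = [1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7]
after  = [1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)                       # degrees of freedom = n - 1 = 7
se = stdev(diffs) / sqrt(n)          # standard error of the mean difference
t = mean(diffs) / se                 # paired t-statistic
```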
 One-sample (sample vs. hypothesized mean)
 Independent groups (2 separate groups)
 Repeated measures (same group, different
measure)
 ANalysis Of VAriance
 Factor = what is being compared (type of pregnancy)
 Levels = different elements of a factor (age of mother)
 F-Statistic
 Post hoc testing
 1 Way Anova
 1 factor with more than 2 levels
 Factorial Anova
 More than 1 factor
 Mixed Design Anovas
 Some factors are independent, others are related
 There is a significant difference somewhere between
groups
 NOT where the difference lies
 Finding exactly where the difference lies requires
further statistical analysis = post hoc analysis
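The F-statistic behind these ANOVAs is the ratio of between-group to within-group variance. A minimal pure-Python sketch of a one-way ANOVA F computation (the three groups are invented illustrative data, not from the slides):

```python
# One-way ANOVA F-statistic: between-group mean square / within-group mean square.
from statistics import mean

def one_way_f(*groups):
    all_vals = [x for g in groups for x in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares: group sizes times squared mean deviations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three clearly separated toy groups -> large F
f = one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

A significant F only says a difference exists somewhere; as the slide notes, locating it still requires post hoc tests.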
 Z-Tests for populations
 T-Tests for samples
 ANOVAs compare more than 2 groups in more
complicated scenarios
Varun V.Sethi
Objective
Correlation
Linear Regression
Take Home Points.
Correlation
- How linear is the relationship between two
variables? (descriptive)
Regression
- How well does a linear model explain my data?
(inferential)
Correlation
Correlation reflects the noisiness and direction of a linear relationship (top row),
but not the slope of that relationship (middle), nor many aspects of nonlinear
relationships (bottom).
 Strength and direction of the relationship between
variables
 Scattergrams
(Scattergrams: positive correlation, negative correlation, no correlation)
Measures of Correlation
1) Covariance
2) Pearson Correlation Coefficient (r)
1) Covariance
- The covariance is a statistic representing the degree to which 2
variables vary together
cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n

{Note that Sx² = cov(x, x)}
 A statistic representing the degree to which 2 variables
vary together
 Covariance formula
 cf. variance formula
cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n

Sx² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / n
2) Pearson correlation coefficient (r)
- r is a kind of ‘normalised’ (dimensionless) covariance
- r takes values from -1 (perfect negative correlation) to 1 (perfect
positive correlation). r = 0 means no correlation
r = cov(x, y) / (sx · sy)    (s = standard deviation of the sample)
Limitations:
- Sensitive to extreme values
- Describes a relationship, not a prediction
- Does not imply causality
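Both measures can be sketched in a few lines of standard-library Python, following the population (divide-by-n) formulas above. The data are invented; y = 2x gives a perfect r = 1.

```python
# Covariance and Pearson's r, using population (divide-by-n) definitions.
from statistics import mean, pstdev

def cov(x, y):
    """Population covariance: mean of (xi - x_bar)(yi - y_bar)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def pearson_r(x, y):
    """r = cov(x, y) / (sx * sy): a dimensionless covariance in [-1, 1]."""
    return cov(x, y) / (pstdev(x) * pstdev(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # y = 2x: a perfect positive linear relationship
```

Note that cov(x, x) equals the variance of x, exactly as the slide's aside says.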
Regression: Prediction of one variable from knowledge of one or
more other variables
How well does a linear model (y = ax + b) explain the relationship between two
variables?
- If there is such a relationship, we can ‘predict’ the value y for a given x.
Linear dependence between 2 variables
Two variables are linearly dependent when the increase of one variable
is proportional to the increase of the other one
Examples: - Energy needed to boil water
- Money needed to buy coffeepots
Fitting data to a straight line (or vice versa):
Here, ŷ = ax + b
– ŷ : predicted value of y
– a: slope of regression line
– b: intercept
Residual error (εi): Difference between obtained and predicted values of y (i.e. yi- ŷi)
Best fit line (values of b and a) is the one that minimises the sum of squared errors
(SSerror) (yi- ŷi)2
(Plot: observed points yᵢ, fitted line ŷ = ax + b, residuals εᵢ = yᵢ − ŷᵢ)
Adjusting the straight line to the data:
• Minimise Σ(yᵢ − ŷᵢ)², which is Σ(yᵢ − axᵢ − b)²
• The minimum SSerror is at the bottom of the curve, where the gradient is zero
– and this can be found with calculus
• Take partial derivatives of Σ(yᵢ − axᵢ − b)² with respect to the parameters a and b
and solve for 0 as simultaneous equations, giving:
a = r · sy / sx    b = ȳ − a · x̄
• This can always be done
 We can calculate the regression line for any data, but how well does it fit the
data?
 Total variance = predicted variance + error variance
sy² = sŷ² + ser²
 Also, it can be shown that r² is the proportion of the variance in y that is
explained by our regression model:
r² = sŷ² / sy²
 Insert r²·sy² into sy² = sŷ² + ser² and rearrange to get:
ser² = sy² (1 − r²)
From this we can see that the greater the correlation
the smaller the error variance, so the better our
prediction
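This variance decomposition can be checked numerically. A small sketch with invented data (roughly y = 2x plus noise); population variances (divide-by-n) are used throughout so the identity sy² = sŷ² + ser² holds exactly for a least-squares fit with intercept:

```python
# Verify sy^2 = s_yhat^2 + s_err^2 and r^2 = s_yhat^2 / sy^2 for an OLS fit.
from statistics import mean, pvariance

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # roughly y = 2x, with small noise

mx, my = mean(x), mean(y)
# Least-squares slope and intercept
a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
    / sum((xi - mx) ** 2 for xi in x)
b = my - a * mx

y_hat = [a * xi + b for xi in x]                      # predicted values
errors = [yi - yh for yi, yh in zip(y, y_hat)]        # residuals

total = pvariance(y)          # sy^2
explained = pvariance(y_hat)  # s_yhat^2
residual = pvariance(errors)  # s_err^2
r2 = explained / total
```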
 Do we get a significantly better prediction of y
from our regression equation than by just
predicting the mean?
F-statistic
 Prediction / Forecasting
 Quantify strength between y and Xj ( X1, X2, X3 )
 A General Linear Model is just any model that
describes the data in terms of a straight line
 Linear regression is actually a form of the General
Linear Model where the parameters are b, the slope of
the line, and a, the intercept.
y = bx + a +ε
 Multiple regression is used to determine the effect of a
number of independent variables, x1, x2, x3 etc., on a single
dependent variable, y
 The different x variables are combined in a linear way and
each has its own regression coefficient:
y = b0 + b1x1+ b2x2 +…..+ bnxn + ε
 The b parameters (the regression coefficients) reflect the independent
contribution of each independent variable, x, to the value of the dependent
variable, y.
 i.e. the amount of variance in y that is accounted for by each x
variable after all the other x variables have been accounted for
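A minimal sketch of multiple regression fitted via the normal equations (XᵀX)b = Xᵀy, solved with a tiny Gauss-Jordan routine rather than any particular library. The toy data are constructed so the true coefficients are b0 = 1, b1 = 2, b2 = 3.

```python
# Multiple regression y = b0 + b1*x1 + b2*x2 via the normal equations.
def multiple_regression(X, y):
    """Return [b0, b1, ..., bn] minimising squared error, with intercept b0."""
    rows = [[1.0] + list(r) for r in X]          # prepend an intercept column
    k = len(rows[0])
    # Augmented normal-equation system [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

X = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]        # exact plane, no noise
b = multiple_regression(X, y)
```

With noiseless data the recovered coefficients match (1, 2, 3) exactly; with real data each bᵢ reflects xᵢ's contribution after the other predictors are accounted for, as the slide says.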
Take Home Points
- Correlation doesn’t mean the variables are genuinely related.
e.g. any two variables increasing or decreasing over time would show a
nice correlation: CO2 air concentration in Antarctica and lodging rental cost
in London. Beware in longitudinal studies!
- A relationship between two variables doesn’t mean causality
(e.g. leaves on the forest floor and hours of sun)
 Linear regression is a GLM that models the effect of one
independent variable, x, on one dependent variable, y
 Multiple Regression models the effect of several
independent variables, x1, x2 etc, on one dependent
variable, y
 Both are types of General Linear Model
Thank You
Editor's notes

  1. We can use information about distributions to decide how probable it is that the results of an experiment looking at variable x support a particular hypothesis about the distribution of variable y in the population. = central aim of experimental science This is how statistical tests work: test a sample distribution (our experimental results) against a hypothesised distribution, resulting in a ‘p’ value for how likely it is that we would obtain our results under the null hypothesis (null hypothesis = there is no effect or difference between conditions) – i.e. how likely it is that our results were a fluke!
  2. Normal distribution. The x-axis represents the values of a particular variable; the y-axis represents the proportion of members of the population that have each value of the variable; the area under the curve represents probability. Mean and standard deviation tell you the basic features of a distribution: mean = average value of all members of the group; standard deviation = a measure of how much the values of individual members vary in relation to the mean. The normal distribution is symmetrical about the mean, and 68% of it lies within 1 s.d. of the mean. Now, it's important to remember that not all data has this distribution, and normality is an assumption of t-tests: if your data doesn't fit the normal distribution, you will have to use another type of test (like chi-squared).
  3. A hypothesis is a prediction that you have about a specific group. H1 = the experimental hypothesis: there is a statistically significant difference between sample and population (or between samples). H0 = the null hypothesis: your experimental group is no different from the rest of the population. To get a statistically significant result, you need to show that your experimental group falls on the extreme end of the probability distribution. One-sided test: the values for which we can reject H0 lie entirely in one tail of the probability distribution, so you use it when you only want to know whether your experimental group is significantly greater (or only significantly less) than another group; it is also called a one-tailed test of significance. Two-sided test: the values for which we can reject H0 lie in both tails of the distribution; you use it when you want to know whether your experimental group is greater than or less than another group; it is also called a two-tailed test of significance. The choice between a one-sided and a two-sided test is determined by the purpose of the investigation or by prior reasons for using a one-sided test.
  4. The p-value is the probability that the observed result was obtained by chance, assuming the null hypothesis is true. The power of a statistical hypothesis test measures the test's ability to reject the null hypothesis when it is actually false - that is, to make a correct decision. The α level is set a priori (usually 0.05); with α = 0.05, a significant result lets us be 95% certain that our experimental effect is genuine. If p is less than the set α level, we reject the null hypothesis and accept the experimental hypothesis. If, however, p > α, we reject the experimental hypothesis and retain the null hypothesis.
  5. Type I error = false positive: we incorrectly accept the alternative/experimental hypothesis. An α level of 0.05 means there is a 5% risk that a Type I error will be encountered. Type II error = false negative: we incorrectly reject the alternative/experimental hypothesis when it should be accepted. Scientists care more about accepting a false result than rejecting a true one. Power = 1 – probability of a Type II error. The maximum power a test can have is 1, the minimum is 0; ideally we want a test to have high power, close to 1.
  6. Here is an example of how these errors work. Imagine you just started dating someone. On your first date, they mentioned that their birthday was coming up this week, but you can't remember the exact day. It might be today, or maybe not. Embarrassed to admit it, you decide to make a guess. You have two choices: when you see them today you can say "Happy Birthday!", or you can say nothing, hoping that today isn't their birthday. The reality behind the situation is pretty simple: either today is their birthday or it isn't. Saying "Happy Birthday!" when it is not their birthday is like a Type I error: a false positive, declaring a birthday when, in fact, there isn't one. Conversely, staying quiet when today really is their birthday is like a Type II error: a false negative, the birthday was real and you missed it.
  7. Example of a repeated-measures T-test: take the same group and test brain volume before birth and after birth.
  8. It’s important to understand the difference between these two. A population is the entire group with everyone included; an example of this would be the US census. If you already know the variance of the general population, you can use the Z-test. Realistically, researchers can’t sample everyone, so they use the T-test when the variance of the general population is unknown and they only have a sub-group. However, I should point out here that the strength of this test is dependent on the number of participants.
  9. This is the formula for a one-sample z-test. Basically, it is set up to compare groups in standard units through a linear transformation. Again, you use this formula when you know the variance of the entire population; variance describes how far values lie from the mean. So you'll plug in the mean for the group you are looking at, the mean for the population, and the standard deviation of the population to get a Z score. One way to make distributions directly comparable is to standardise them by computing a linear transformation, and the standardised normal distribution does exactly that. This can be thought of as expressing your data in the same ‘units’. If you remember from the previous slide, the range of 2 standard deviations around the mean covers approximately 95% of the distribution; because the standard deviation of a standardised normal distribution is 1, a z-score of +2 or −2 (i.e. 2 standard deviations) gives the boundary for our confidence interval. Only for 2-tailed tests! (See the distribution around the mean versus the area from −infinity to z = 2.0.)
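As a sketch of the one-sample z-test, with entirely made-up numbers for `sample_mean`, `pop_mean`, `pop_sd`, and `n` (the slide's formula only requires that the population standard deviation be known):

```python
import math
from scipy.stats import norm

# Hypothetical numbers: sample mean, known population mean/SD, sample size.
sample_mean = 103.0
pop_mean = 100.0
pop_sd = 15.0       # known population SD: this is what makes a z-test valid
n = 100

# One-sample z statistic: how many standard errors the sample mean
# lies from the hypothesised population mean.
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# Two-tailed p-value from the standard normal distribution.
p = 2 * norm.sf(abs(z))
print(z, p)
```

Here z works out to 2.0, which is right at the ~95% boundary the slide mentions, so the two-tailed p-value lands just under 0.05.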
  10. Often you don’t know the s.d. of the hypothesised or comparison population, and so you use a t-test, which uses the sample s.d. or variance instead. This introduces a source of error that decreases as your sample size increases. Therefore, the t statistic is distributed differently depending on the size of the sample, like a family of normal-looking curves. The degrees of freedom (df = sample size − 1) tell you which of these curves to relate your t-value to, and there are different tables of p-values for different degrees of freedom. A larger sample gives a more ‘squashed’ t-statistic distribution, which makes it easier to reach significance.
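The "family of curves" idea can be sketched by looking up two-tailed critical values for different degrees of freedom; as df grows, the critical value shrinks toward the normal distribution's familiar 1.96, so a smaller t statistic suffices for significance. The df values chosen here are arbitrary:

```python
from scipy.stats import t, norm

# Two-tailed critical values at alpha = 0.05 for increasing degrees of freedom.
# As df grows, the t distribution tightens toward the standard normal,
# so the bar for significance drops.
for df in (5, 10, 30, 100):
    print(df, t.ppf(0.975, df))

print("normal:", norm.ppf(0.975))  # limiting value, about 1.96
```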
  11. So, you would use a two-sample t-test if you wanted to determine whether two samples are different. We’ll look at our previous hypothesis: does having a baby increase your brain size?
  12. Because we want to know whether brain size after giving birth is increased or decreased relative to the pre-birth measurement, this will be a two-tailed T-test. If you only wanted to find out whether brain size is greater after giving birth, it would be a one-tailed T-test.
  13. This is our sample, taken directly from a paper.
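The repeated-measures comparison can be sketched with SciPy's paired t-test, using the eight before/after volumes from the slide; the statistic should land close to the slide's hand-computed value of about 6.7, with df = 7:

```python
from scipy.stats import ttest_rel

# Brain volumes from the slide's sample (before delivery vs 6 weeks after).
before = [1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7]
after = [1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3]

# Repeated-measures (paired) t-test: each woman is measured twice,
# so the test operates on the within-subject differences.
result = ttest_rel(after, before)
print(result.statistic, result.pvalue)  # t near 6.7, p well below 0.05
```

With p far below 0.05, this reproduces the slide's conclusion that brain volume is significantly larger after giving birth.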
  14. This is particularly important because when using SPSS you will have to specify which type of statistical test you are using. I’m putting this up to show that one can take a group and try to find out whether it is different from a hypothesized population. In this formula, you have (sample mean − hypothesized population mean) / standard error. This is used when you compare the mean of one sample to a given value. You might use this if you were testing sleep behaviour and you thought the sample that you had was not the norm: you could hypothesize that the population sleeps 10 hours a night, to determine whether your sample is significantly different from this. A one-sample t-test is a hypothesis test for answering questions about the mean where the data are a random sample of independent observations from an underlying normal distribution N(µ, σ²), where σ² is unknown.
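A minimal sketch of the sleep example, with hypothetical hours-per-night data, using SciPy's one-sample t-test against the hypothesized population mean of 10 hours:

```python
from scipy.stats import ttest_1samp

# Hypothetical sleep data (hours per night) for a small sample.
hours = [7.1, 6.8, 8.0, 7.5, 6.9, 7.2, 7.8, 7.0]

# One-sample t-test: compare the sample mean to a single given value (10 h).
result = ttest_1samp(hours, popmean=10.0)
print(result.statistic, result.pvalue)
```

With these made-up values the sample clearly sleeps far less than 10 hours, so the t statistic is large and negative and the null hypothesis would be rejected.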
  15. So, let’s say you want to compare more than 2 groups: for example, normal pregnancies vs. preeclamptic groups, or several different times at which a brain scan was taken.
  16. In an experiment with more than 2 samples or more than 2 tasks (or 2 samples and 2 tasks), one could do lots of t-tests and compare all the different groups with each other this way, but doing so inflates the probability of accepting the experimental hypothesis when it’s wrong. You’ll remember this as the false positive (Type I error) situation; across many tests this is referred to as the familywise/experimentwise error rate. It is much better to use ANOVA. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are all equal, and therefore generalizes the t-test to more than two groups. ANOVA is concerned with differences between the means of groups, not differences between variances; the name "analysis of variance" comes from the way the analysis uses variances to decide whether the means are different. The way it works is simple: the statistical procedure looks at the variation (variance) within the groups, then works out how that variation would translate into variation (i.e. differences) between the groups, taking into account how many subjects there are in the groups. If the observed differences between groups are a lot bigger than what you’d expect by chance, you have statistical significance. (If the within-group spread is large relative to the between-group differences, the samples are probably from the same population; conversely, if the between-group differences are large relative to the within-group spread, the samples are likely drawn from different populations.) Terminology: Factors: the overall “things” being compared (e.g. type of pregnancy). Levels: the different elements of a factor (e.g. young vs. old, time to pregnancy). ANOVA tests for one overall effect only, so it can tell us whether the experimental manipulation was generally successful, but it doesn’t provide specific information about which specific groups were affected: hence the need for post-hoc testing!
ANOVA produces an F-statistic or F-ratio, which is similar to the t-score in that it compares the amount of systematic variance in the data to the amount of unsystematic variance. As such, it is the ratio of the experimental effect to the individual differences in performance. If the F-ratio’s value is less than 1, it must represent a non-significant effect (so you always want an F-ratio greater than 1, indicating that the experimental manipulation had some effect above and beyond the effect of individual differences in performance). To test for significance, compare the obtained F-ratio against the maximum value one would expect to get by chance alone in an F-distribution with the same degrees of freedom. The p-value associated with F is the probability that the differences between groups could occur by chance if the null hypothesis is correct.
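A one-way ANOVA along these lines can be sketched with SciPy's `f_oneway`; the three groups and all their values here are hypothetical (loosely echoing the brain-volume example):

```python
from scipy.stats import f_oneway

# Hypothetical brain-volume measurements for three groups
# (e.g. non-pregnant, normal pregnancy, preeclamptic).
group1 = [1230.0, 1241.5, 1219.8, 1252.3, 1236.7]
group2 = [1268.1, 1275.4, 1259.9, 1281.2, 1270.6]
group3 = [1225.4, 1233.0, 1220.1, 1240.8, 1228.9]

# One-way ANOVA: one factor (group) with three levels. F compares
# between-group variance to within-group variance.
f_stat, p_value = f_oneway(group1, group2, group3)
print(f_stat, p_value)  # F well above 1 here, since group2 clearly differs
```

Note that a significant F only says the means differ somewhere; it does not say which pairs of groups differ.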
  17. The type that I have described is referred to as a one-way ANOVA because it has one factor (e.g. type of pregnancy, with more than 2 levels). You can also have two-way and three-way ANOVAs; these are factorial ANOVAs. They allow for possible interactions between factors as well as main effects. For example, you could have 2 factors with 2 levels each; this would be a 2 x 2 factorial design. You can also have related or independent designs, or a mixture.
  18. A significant ANOVA tells you that there is a significant difference between the groups, but NOT where this difference lies. Finding exactly where the differences lie requires further statistical analyses. So when you are running a particular statistical test, you’ll specify that you want post-hoc values listed, but you’ll need to make sure the overall value is significant first.
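One simple post-hoc approach is pairwise t-tests with a Bonferroni correction, sketched here with hypothetical data (SciPy also offers `tukey_hsd` in recent versions as a dedicated post-hoc test):

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Hypothetical data for three groups; a significant overall ANOVA says the
# groups differ somewhere, and post-hoc pairwise tests locate the difference.
groups = {
    "A": [1230.0, 1241.5, 1219.8, 1252.3, 1236.7],
    "B": [1268.1, 1275.4, 1259.9, 1281.2, 1270.6],
    "C": [1225.4, 1233.0, 1220.1, 1240.8, 1228.9],
}

# Bonferroni correction: divide alpha by the number of comparisons so that
# the familywise Type I error rate stays near the original alpha.
alpha = 0.05
pairs = list(combinations(groups, 2))
corrected_alpha = alpha / len(pairs)

significant = {}
for name1, name2 in pairs:
    p = ttest_ind(groups[name1], groups[name2]).pvalue
    significant[(name1, name2)] = p < corrected_alpha
    print(name1, "vs", name2, "p =", p)

print(significant)
```

In this made-up example, group B differs from both A and C, while A and C do not differ from each other, which is exactly the kind of detail the overall ANOVA cannot give you.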
  19. T-tests assess whether two group means differ significantly; you can compare two samples, or one sample against a given value. ANOVAs compare more than two groups, or more complicated scenarios.