Statistics Lab
Rodolfo Metulini
IMT Institute for Advanced Studies, Lucca, Italy
Lesson 3 - Point Estimate, Confidence Interval and Hypothesis
Tests - 20.01.2015
Introduction
Let's start with empirical data (one variable of length N) read
from an external file, and suppose we treat it as the
population. From it we draw a sample of size n.
Suppose we have no information about the population (or, better, we
want to check whether and how well the sample represents the
population).
In other words, we want to make inference using the information
contained in the sample, in order to obtain an estimate for the
population.
That sample is only one of the many samples we could randomly draw from
the population (the sample space).
What are the instruments for obtaining information about the population?
(1) Sample mean (point estimation) (2) Confidence interval (3)
Hypothesis tests
Sample space
In probability theory, the sample space of an experiment or random
trial is the set of all possible outcomes or results of that
experiment.
It is common to denote a sample space by Ω (or U, or
S), where Ω is the first element of what we call the statistical model:
$(\Omega, A, \{P_\theta : \theta \in \Theta\})$. Each element of Ω has its corresponding θ
value.
For example, for tossing two coins, the corresponding sample space
is {(head,head), (head,tail), (tail,head), (tail,tail)}, so that
its size is 4: dim(Ω) = 4. This means that we can obtain 4
different samples, each with its own sample mean (the means need not
be distinct: (head,tail) and (tail,head) share the same one).
In general $\dim(\Omega) = x^n$, where x is the number of outcomes in each single
experiment and n is the number of experiments.
In practice, we deal with only one sample, taken at random from
the sample space.
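The two-coin example can be enumerated directly. The following is a minimal Python sketch (the lab's own chunks are in R and are not reproduced here); the outcome labels and the coding of head as 1 and tail as 0 are assumptions made only for illustration.

```python
from itertools import product

# Outcomes of a single experiment (one coin toss); labels are illustrative.
outcomes = ("head", "tail")
n_experiments = 2  # number of tosses

# The sample space is every ordered sequence of n_experiments outcomes.
sample_space = list(product(outcomes, repeat=n_experiments))

# Its size is x^n, with x outcomes per experiment and n experiments.
size = len(outcomes) ** n_experiments

# Sample mean of each possible sample, coding head as 1 and tail as 0.
sample_means = [
    sum(toss == "head" for toss in sample) / n_experiments
    for sample in sample_space
]

print(size, sample_space)
print(sample_means)
```

Two of the four samples share the same mean, which is why the number of distinct sample means can be smaller than the size of the sample space.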
Point estimate
A point estimate (or statistic) lets us summarize the
information contained in the population (dimension N) through
a single value constructed from n values: $T = t(X_1, \ldots, X_n)$.
The most widely used unbiased point estimator (statistic) is the sample
mean:
$$\hat{X}_n = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Other point estimators are: (1) Sample Median (2) Sample Mode
(3) Geometric mean.
$$\text{Geometric Mean} = M_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \exp\left[ \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right]$$
An example of what is not an estimator: the sample mean computed
after subsetting the sample by truncating it at a
certain value.
P.S. A naive definition of estimator: a quantity
computed using all n observations in the sample.
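As a quick numerical check of the point estimators on this slide, here is a small Python sketch. The three data values are invented for illustration; the geometric mean is computed both as an n-th root of the product and as the exponential of the mean logarithm, which should agree for positive data.

```python
import math
import statistics

x = [2.0, 4.0, 8.0]  # illustrative sample; all values must be positive
n = len(x)

# Sample mean: sum of the observations divided by n.
sample_mean = sum(x) / n

# Geometric mean as the n-th root of the product of the observations...
gm_root = math.prod(x) ** (1 / n)
# ...and, equivalently, as exp of the average of the logs.
gm_log = math.exp(sum(math.log(v) for v in x) / n)

# Sample median, another point estimator mentioned on the slide.
sample_median = statistics.median(x)

print(sample_mean, gm_root, gm_log, sample_median)
```

The log form is usually preferred for long samples, because the raw product can overflow or underflow.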
Efficient estimators
The BLUE (Best Linear Unbiased Estimator) is defined as
follows:
1. it is a linear function of all the sample values
2. it is unbiased ($E(T) = \theta$)
3. it has the smallest variance among all linear unbiased
estimators.
The sample mean is the BLUE for the parameter µ.
Some estimators are biased but consistent: an estimator is
consistent when it converges in probability to the true parameter value as $n \to \infty$; a sufficient condition is that both its bias and its variance go to zero.
Point estimators - cases
Normal samples: $\hat{X}_n$ is the BLUE for the parameter µ
(mean).
Normal samples: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{X}_n)^2$ is the unbiased
estimator of the variance $\sigma^2$.
Bernoulli samples, $f(x) = \rho^x (1 - \rho)^{1-x}$: $\hat{X}_n$ is an unbiased
estimator for the parameter ρ (frequency).
Poisson samples, $f(x) = \frac{e^{-k} k^x}{x!}$: $\hat{X}_n$ is an unbiased estimator
for the parameter k (which represents both the mean and the variance of
the distribution).
Exponential samples, $f(x) = \lambda e^{-\lambda x}$: $1 / \hat{X}_n$ is the usual
estimator for the parameter λ (the density at value 0). It is consistent, but in finite samples it is biased upward, since $E(1/\hat{X}_n) = n\lambda/(n-1)$.
(Chunks 1 to 4)
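To make the role of the n − 1 divisor concrete, the sketch below computes the variance estimate both ways on a small invented sample. It is a Python illustration, not one of the lab's R chunks; `statistics.variance` divides by n − 1 and `statistics.pvariance` divides by n.

```python
import statistics

x = [4.0, 7.0, 13.0, 16.0]  # illustrative sample
n = len(x)
mean = sum(x) / n

# Sum of squared deviations from the sample mean.
ss = sum((v - mean) ** 2 for v in x)

s2_unbiased = ss / (n - 1)  # the estimator s^2 from the slide
s2_biased = ss / n          # same sum divided by n; too small on average

# The standard library implements both divisors.
lib_unbiased = statistics.variance(x)
lib_biased = statistics.pvariance(x)

print(s2_unbiased, s2_biased)
```

The two estimates differ by the factor (n − 1)/n, which tends to 1 as n grows, so the n-divisor version is an example of a biased but consistent estimator.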
Confidence interval theory
With point estimators we use a single value to infer
about the population.
With a confidence interval we define a minimum and a maximum
value between which we expect the population parameter to lie.
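Formally, we need to calculate:
$$\mu_1 = \hat{X}_n - z \cdot \frac{\sigma}{\sqrt{n}}$$
$$\mu_2 = \hat{X}_n + z \cdot \frac{\sigma}{\sqrt{n}}$$
and we end up with the interval $\hat{\mu} = \{\mu_1; \mu_2\}$, or $I = [T_\alpha^{(1)}, T_\alpha^{(2)}]$. It is
customary to write $P\{\theta \in I\} \ge 1 - \alpha$.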
Here $\hat{X}_n$ is the sample mean, z is the upper (or lower) critical
value of the theoretical distribution, σ is the standard deviation of
the theoretical distribution, and n is the sample size.
(See the graph)
Confidence interval theory - Gaussian
Recall the theorem: if $X_1, \ldots, X_n$ are i.i.d. with
distribution $N(\mu, \sigma^2)$, then the distribution of $\hat{X}_n$ is
$N(\mu, \sigma^2/n)$.
Let us assume that the sample mean is 5, the population standard
deviation is known and equal to 2, and the sample size is
n = 20. In the example below we use a 95 per cent confidence
level and wish to find the confidence interval.
N.B. Here, since the confidence level is 95 per cent, the critical
value z to consider is the one at which the CDF (pnorm in R, not the density dnorm)
equals 0.975, i.e. z = qnorm(0.975), about 1.96.
We can equivalently speak of $\alpha = 0.05$, or $1 - \alpha = 0.95$, or
$1 - \alpha/2 = 0.975$.
(Chunk 5)
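The Gaussian example on this slide (sample mean 5, known σ = 2, n = 20, 95 per cent level) can be reproduced with a few lines. This is a Python sketch rather than the R chunk used in the lab; `statistics.NormalDist().inv_cdf` plays the role of R's `qnorm`, and the function name is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, conf_level=0.95):
    """Two-sided confidence interval for the mean with known sigma."""
    alpha = 1 - conf_level
    # Critical value: the quantile at which the standard normal CDF is 1 - alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(sample_mean=5, sigma=2, n=20)
print(round(lower, 3), round(upper, 3))
```

The interval is symmetric around the sample mean, and its half-width shrinks with the square root of n.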
Confidence interval theory - T-student
We use the Student's t distribution when n is small and the population
sd is unknown. We then need the sample estimate of the standard
deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \hat{X}_n)^2}{n-1}}$
The Student's t distribution is more spread out than the normal.
In simple words, since we do not know the population sd, we need
wider intervals (a cautious approach).
The only difference from the normal case is that we use the
command associated with the t distribution rather than the normal
distribution. Here we repeat the procedure above, but we
assume that we are working with a sample standard deviation
rather than an exact standard deviation.
N.B. The t distribution is characterized by its degrees of freedom. In
this case the degrees of freedom equal n − 1, because we use one
estimate (one constraint).
Confidence interval theory - comparison of two means
In some cases we have an experiment of the kind called (for example)
case-control.
Imagine the population divided in two: one part is the
treated group, the other is the untreated group.
Suppose we extract a sample from each, with the aim of testing whether the
two samples come from populations with the same mean
parameter (is the treatment effective?).
The output of this test is a confidence interval for
the difference between the two means.
N.B. Here, the degrees of freedom of the t distribution are equal to
$\min(n_1, n_2) - 1$.
(Chunk 7)
Formulas
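Gaussian confidence interval:
$$\hat{\mu} = \{\mu_1, \mu_2\} = \hat{X}_n \pm z \cdot \frac{\sigma}{\sqrt{n}}$$
Student's t confidence interval:
$$\hat{\mu} = \{\mu_1, \mu_2\} = \hat{X}_n \pm t_{n-1} \cdot \frac{s}{\sqrt{n}}$$
Student's t confidence interval for the difference of two sample means:
$$\hat{\mu}_{diff} = \{\mu_{diff1}, \mu_{diff2}\} = (\hat{X}_1 - \hat{X}_2) \pm t_{n-1} \cdot s, \quad \text{where } s = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and n stands for $\min(n_1, n_2)$.
Gaussian confidence interval for a proportion (Bernoulli
distribution):
$$\hat{\rho} = \{\rho_1, \rho_2\} = \hat{f}_1 \pm z \cdot s, \quad \text{where } s = \sqrt{\frac{\rho (1 - \rho)}{n}}$$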
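The two-sample formula can be sketched as follows. This is a Python illustration under stated assumptions: all sample summaries are invented, the helper name is hypothetical, and the t critical value is passed in by the caller (for example from a t table or from R's `qt`) because the Python standard library has no t quantile function. The value 2.262 is the 97.5th percentile of the t distribution with 9 degrees of freedom.

```python
from math import sqrt


def mean_difference_interval(mean1, mean2, s1, s2, n1, n2, t_crit):
    """Confidence interval for the difference of two means.

    t_crit is the critical value of the t distribution with
    min(n1, n2) - 1 degrees of freedom, supplied by the caller.
    """
    # Standard error of the difference between the two sample means.
    s = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return diff - t_crit * s, diff + t_crit * s


# Hypothetical summaries: n1 = 10 and n2 = 12 give min(n1, n2) - 1 = 9 df.
lower, upper = mean_difference_interval(
    mean1=10.0, mean2=8.0, s1=2.0, s2=3.0, n1=10, n2=12, t_crit=2.262
)
print(round(lower, 3), round(upper, 3))
```

In this made-up case the interval contains zero, so at the 95 per cent level the data would not rule out equal means.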
Hypothesis testing
Researchers retain or reject a hypothesis based on measurements of
observed samples.
The decision is often based on a statistical mechanism called
hypothesis testing.
A type I error is the mistake of rejecting the null hypothesis
when it is in fact true (see the image).
The probability of committing a type I error is called the
significance level of the hypothesis test, and is denoted by the
Greek letter α (the same used in the confidence intervals).
We demonstrate the procedure of hypothesis testing in R first with
the intuitive critical value approach.
Then we discuss the popular (and very quick) p-value approach
as an alternative.
Hypothesis testing - lower tail
The null hypothesis of the lower tail test of the population
mean can be expressed as follows:
$\mu \ge \mu_0$, where $\mu_0$ is a hypothesized lower bound of the true
population mean µ.
Let us define the test statistic z in terms of the sample mean, the
sample size and the population standard deviation σ:
$$z = \frac{\hat{X}_n - \mu_0}{\sigma / \sqrt{n}}$$
Then the null hypothesis of the lower tail test is to be rejected if
$z \le z_\alpha$, where $z_\alpha$ is the 100(α) percentile of the standard normal
distribution.
(Chunk 9)
Hypothesis testing - upper tail
The null hypothesis of the upper tail test of the population
mean can be expressed as follows:
$\mu \le \mu_0$, where $\mu_0$ is a hypothesized upper bound of the true
population mean µ.
Let us define the test statistic z in terms of the sample mean, the
sample size and the population standard deviation σ:
$$z = \frac{\hat{X}_n - \mu_0}{\sigma / \sqrt{n}}$$
Then the null hypothesis of the upper tail test is to be rejected if
$z \ge z_{1-\alpha}$, where $z_{1-\alpha}$ is the 100(1 − α) percentile of the
standard normal distribution.
(Chunk 10)
Hypothesis testing - two tailed
The null hypothesis of the two-tailed test of the population
mean can be expressed as follows:
$\mu = \mu_0$, where $\mu_0$ is a hypothesized value of the true population
mean µ. Let us define the test statistic z in terms of the sample
mean, the sample size and the population standard deviation
σ:
$$z = \frac{\hat{X}_n - \mu_0}{\sigma / \sqrt{n}}$$
Then the null hypothesis of the two-tailed test is to be rejected if
$z \le z_{\alpha/2}$ or $z \ge z_{1-\alpha/2}$, where $z_{\alpha/2}$ is the 100(α/2) percentile of
the standard normal distribution.
(Chunk 11)
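The three z-tests with known σ share one statistic and differ only in the rejection region. The following Python sketch puts them side by side, together with the p-value mentioned earlier; the function name and the numbers in the example are hypothetical, and `NormalDist().inv_cdf` / `.cdf` stand in for R's `qnorm` / `pnorm`.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample z-test with known sigma; returns (z, reject, p_value)."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    if tail == "lower":    # H0: mu >= mu0, reject for small z
        reject = z <= std_normal.inv_cdf(alpha)
        p_value = std_normal.cdf(z)
    elif tail == "upper":  # H0: mu <= mu0, reject for large z
        reject = z >= std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
    elif tail == "two":    # H0: mu == mu0, reject in either tail
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return z, reject, p_value


# Hypothetical data: sample mean 9.5 from n = 36, with mu0 = 10 and sigma = 1.5.
z_low, reject_low, p_low = z_test(9.5, 10, 1.5, 36, tail="lower")
z_up, reject_up, p_up = z_test(9.5, 10, 1.5, 36, tail="upper")
z_two, reject_two, p_two = z_test(9.5, 10, 1.5, 36, tail="two")
print(z_low, reject_low, reject_up, reject_two)
```

With the p-value approach the decision is the same in every case: reject the null hypothesis when the p-value is at most α.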
Hypothesis testing - lower tail with unknown variance
The null hypothesis of the lower tail test of the population
mean can be expressed as follows:
$\mu \ge \mu_0$, where $\mu_0$ is a hypothesized lower bound of the true
population mean µ.
Let us define the test statistic t in terms of the sample mean, the
sample size and the sample standard deviation s:
$$t = \frac{\hat{X}_n - \mu_0}{s / \sqrt{n}}$$
Then the null hypothesis of the lower tail test is to be rejected if
$t \le t_\alpha$, where $t_\alpha$ is the 100(α) percentile of the Student t
distribution with n − 1 degrees of freedom.
(Chunk 12)
Hypothesis testing - upper tail with unknown variance
The null hypothesis of the upper tail test of the population
mean can be expressed as follows:
$\mu \le \mu_0$, where $\mu_0$ is a hypothesized upper bound of the true
population mean µ.
Let us define the test statistic t in terms of the sample mean, the
sample size and the sample standard deviation s:
$$t = \frac{\hat{X}_n - \mu_0}{s / \sqrt{n}}$$
Then the null hypothesis of the upper tail test is to be rejected if
$t \ge t_{1-\alpha}$, where $t_{1-\alpha}$ is the 100(1 − α) percentile of the Student
t distribution with n − 1 degrees of freedom.
(Chunk 13)
Hypothesis testing - two tailed with unknown variance
The null hypothesis of the two-tailed test of the population
mean can be expressed as follows:
$\mu = \mu_0$, where $\mu_0$ is a hypothesized value of the true
population mean µ. Let us define the test statistic t in terms of
the sample mean, the sample size and the sample standard
deviation s:
$$t = \frac{\hat{X}_n - \mu_0}{s / \sqrt{n}}$$
Then the null hypothesis of the two-tailed test is to be rejected if
$t \le t_{\alpha/2}$ or $t \ge t_{1-\alpha/2}$, where $t_{\alpha/2}$ is the 100(α/2) percentile of
the Student t distribution with n − 1 degrees of freedom.
(Chunk 14)
Lower Tail Test of Population Proportion
The null hypothesis of the lower tail test about the population
proportion can be expressed as follows:
$\rho \ge \rho_0$, where $\rho_0$ is a hypothesized lower bound of the true
population proportion ρ.
Let us define the test statistic z in terms of the sample proportion
and the sample size:
$$z = \frac{\hat{\rho} - \rho_0}{\sqrt{\rho_0 (1 - \rho_0) / n}}$$
Then the null hypothesis of the lower tail test is to be rejected if
$z \le z_\alpha$, where $z_\alpha$ is the 100(α) percentile of the standard normal
distribution.
(Chunk 15)
Upper Tail Test of Population Proportion
The null hypothesis of the upper tail test about the population
proportion can be expressed as follows:
$\rho \le \rho_0$, where $\rho_0$ is a hypothesized upper bound of the true
population proportion ρ.
Let us define the test statistic z in terms of the sample proportion
and the sample size:
$$z = \frac{\hat{\rho} - \rho_0}{\sqrt{\rho_0 (1 - \rho_0) / n}}$$
Then the null hypothesis of the upper tail test is to be rejected if
$z \ge z_{1-\alpha}$, where $z_{1-\alpha}$ is the 100(1 − α) percentile of the standard
normal distribution.
(Chunk 16)
Two Tailed Test of Population Proportion
The null hypothesis of the two-tailed test about the population
proportion can be expressed as follows:
$\rho = \rho_0$, where $\rho_0$ is a hypothesized value of the true population
proportion.
Let us define the test statistic z in terms of the sample proportion
and the sample size:
$$z = \frac{\hat{\rho} - \rho_0}{\sqrt{\rho_0 (1 - \rho_0) / n}}$$
Then the null hypothesis of the two-tailed test is to be rejected if
$z \le z_{\alpha/2}$ or $z \ge z_{1-\alpha/2}$.
(Chunk 17)
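The proportion tests reuse the normal critical values with a different standard error. Below is a small Python sketch with invented counts (45 successes out of 100, tested against a hypothesized proportion of 0.5); the helper name is hypothetical and the code is not one of the lab's R chunks.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_statistic(successes, n, rho0):
    """z statistic for a one-sample test about a population proportion."""
    rho_hat = successes / n
    # The standard error uses the hypothesized proportion rho0, not rho_hat.
    return (rho_hat - rho0) / sqrt(rho0 * (1 - rho0) / n)


alpha = 0.05
std_normal = NormalDist()
z = proportion_z_statistic(successes=45, n=100, rho0=0.5)

# Rejection rules for the three forms of the test.
reject_lower = z <= std_normal.inv_cdf(alpha)
reject_upper = z >= std_normal.inv_cdf(1 - alpha)
reject_two = (z <= std_normal.inv_cdf(alpha / 2)
              or z >= std_normal.inv_cdf(1 - alpha / 2))

print(round(z, 3), reject_lower, reject_upper, reject_two)
```

Here z is about −1, which lies inside all three acceptance regions at the 5 per cent level, so none of the null hypotheses is rejected.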
Sample size definition
The quality of a sample survey can be improved (worsened) by
increasing (decreasing) the sample size.
The formulas below give the sample size needed for a
(1 − α) confidence level, a margin of error E and a
planned parameter estimate.
Here, $z_{1-\alpha/2}$ is the 100(1 − α/2) percentile of the standard normal
distribution.
For the mean: $$n = \frac{z_{1-\alpha/2}^2 \cdot \sigma^2}{E^2}$$
For a proportion: $$n = \frac{z_{1-\alpha/2}^2 \cdot \rho (1 - \rho)}{E^2}$$
That is, n is chosen such that $P(|\hat{X}_n - \mu| \le E) = 1 - \alpha$.
Sample size definition - Exercises
Mean: Assume the population standard deviation σ of the student
heights in the survey is 9.48. Find the sample size needed to achieve a
1.2 centimeter margin of error at the 95 per cent confidence level.
Since the normal distribution has two tails, the 95 per cent
confidence level corresponds to the 97.5th percentile of the normal
distribution at the upper tail. Therefore, $z_{1-\alpha/2}$ is given by
qnorm(.975).
Proportion: Using a 50 per cent planned proportion estimate, find
the sample size needed to achieve a 5 per cent margin of error for the
female student survey at the 95 per cent confidence level.
As above, the two-tailed 95 per cent confidence level corresponds to
the 97.5th percentile of the normal distribution, so $z_{1-\alpha/2}$ is
again given by
qnorm(.975).
(Chunk 18-19)
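The two exercises can be worked through with the sample size formulas from the previous slide. This is a Python sketch, not the lab's R chunks: `NormalDist().inv_cdf(0.975)` corresponds to `qnorm(.975)`, the function names are hypothetical, and rounding the result up to the next whole unit is an assumption made here so that the margin of error is not exceeded.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, margin, conf_level=0.95):
    """Sample size for estimating a mean within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z**2 * sigma**2 / margin**2


def sample_size_proportion(rho, margin, conf_level=0.95):
    """Sample size for estimating a proportion within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z**2 * rho * (1 - rho) / margin**2


# Exercise values from the slide: sigma = 9.48 and E = 1.2 for the mean,
# planned proportion 0.5 and E = 0.05 for the proportion.
n_mean = ceil(sample_size_mean(sigma=9.48, margin=1.2))
n_prop = ceil(sample_size_proportion(rho=0.5, margin=0.05))
print(n_mean, n_prop)
```

A planned proportion of 0.5 maximizes ρ(1 − ρ), so it gives the most conservative (largest) sample size for a given margin of error.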
Homework
1: Confidence interval for a proportion. Suppose we have a
sample of n = 25 births, 15 of which are female. Find the
99 per cent confidence interval for the proportion of females in the
population. HINT: Apply the formula on slide 11 with the proper
functions in R.
2: Hypothesis test to compare two proportions. Suppose we have
two schools. In the sample from the first, n = 20 and 8 students are
Hispanic. In the sample from the second, n = 18 and 4 students are
Hispanic. Can we state (at the 95 per cent level) that the proportion of
Hispanic students is the same in the two schools? N.B.: the test here is
two-tailed.
The test statistic here is:
$$z = \frac{\hat{\rho}_1 - \hat{\rho}_2}{sd}, \quad \text{where } sd = \sqrt{\rho (1 - \rho) \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]} \text{ and } \rho = \frac{\rho_1 n_1 + \rho_2 n_2}{n_1 + n_2}$$
Charts - 1
Figure: Representation of the critical point for the upper tail hypothesis
test
Charts - 2
Figure: Representation of the critical point for the lower tail hypothesis
test
Charts - 3
Figure: Representation of the critical point for the two-tailed hypothesis
test
Charts - 4
Figure: Type I and Type II errors in hypothesis testing

Más contenido relacionado

La actualidad más candente

A Geometric Note on a Type of Multiple Testing-07-24-2015
A Geometric Note on a Type of Multiple Testing-07-24-2015A Geometric Note on a Type of Multiple Testing-07-24-2015
A Geometric Note on a Type of Multiple Testing-07-24-2015
Junfeng Liu
 
09 test of hypothesis small sample.ppt
09 test of hypothesis small sample.ppt09 test of hypothesis small sample.ppt
09 test of hypothesis small sample.ppt
Pooja Sakhla
 
Chapter 2 discrete_random_variable_2009
Chapter 2 discrete_random_variable_2009Chapter 2 discrete_random_variable_2009
Chapter 2 discrete_random_variable_2009
ayimsevenfold
 
law of large number and central limit theorem
 law of large number and central limit theorem law of large number and central limit theorem
law of large number and central limit theorem
lovemucheca
 

La actualidad más candente (19)

Chap09 hypothesis testing
Chap09 hypothesis testingChap09 hypothesis testing
Chap09 hypothesis testing
 
A Geometric Note on a Type of Multiple Testing-07-24-2015
A Geometric Note on a Type of Multiple Testing-07-24-2015A Geometric Note on a Type of Multiple Testing-07-24-2015
A Geometric Note on a Type of Multiple Testing-07-24-2015
 
Discrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec domsDiscrete and continuous probability distributions ppt @ bec doms
Discrete and continuous probability distributions ppt @ bec doms
 
Sampling distribution
Sampling distributionSampling distribution
Sampling distribution
 
Testing of hypothesis
Testing of hypothesisTesting of hypothesis
Testing of hypothesis
 
09 test of hypothesis small sample.ppt
09 test of hypothesis small sample.ppt09 test of hypothesis small sample.ppt
09 test of hypothesis small sample.ppt
 
Testing a claim about a mean
Testing a claim about a mean  Testing a claim about a mean
Testing a claim about a mean
 
Chapter 3 sampling and sampling distribution
Chapter 3   sampling and sampling distributionChapter 3   sampling and sampling distribution
Chapter 3 sampling and sampling distribution
 
Chapter 2 discrete_random_variable_2009
Chapter 2 discrete_random_variable_2009Chapter 2 discrete_random_variable_2009
Chapter 2 discrete_random_variable_2009
 
law of large number and central limit theorem
 law of large number and central limit theorem law of large number and central limit theorem
law of large number and central limit theorem
 
Bba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distributionBba 3274 qm week 3 probability distribution
Bba 3274 qm week 3 probability distribution
 
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
Chapter 7 : Inference for Distributions(The t Distributions, One-Sample t Con...
 
Sampling distributions stat ppt @ bec doms
Sampling distributions stat ppt @ bec domsSampling distributions stat ppt @ bec doms
Sampling distributions stat ppt @ bec doms
 
Chapter1
Chapter1Chapter1
Chapter1
 
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...
 
An Introduction to Mis-Specification (M-S) Testing
An Introduction to Mis-Specification (M-S) TestingAn Introduction to Mis-Specification (M-S) Testing
An Introduction to Mis-Specification (M-S) Testing
 
Paper06
Paper06Paper06
Paper06
 
Chap05 continuous random variables and probability distributions
Chap05 continuous random variables and probability distributionsChap05 continuous random variables and probability distributions
Chap05 continuous random variables and probability distributions
 
Estimation
EstimationEstimation
Estimation
 

Similar a Talk 3

Point Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsPoint Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis tests
University of Salerno
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
simonithomas47935
 
C2 st lecture 11 the t-test handout
C2 st lecture 11   the t-test handoutC2 st lecture 11   the t-test handout
C2 st lecture 11 the t-test handout
fatima d
 
Applications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large NumbersApplications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large Numbers
University of Salerno
 
Test of hypothesis (t)
Test of hypothesis (t)Test of hypothesis (t)
Test of hypothesis (t)
Marlon Gomez
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
fatima d
 
L3 sampling fundamentals and estimation
L3 sampling fundamentals and estimationL3 sampling fundamentals and estimation
L3 sampling fundamentals and estimation
Jags Jagdish
 

Similar a Talk 3 (20)

Point Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis testsPoint Estimate, Confidence Interval, Hypotesis tests
Point Estimate, Confidence Interval, Hypotesis tests
 
Descriptive Statistics Formula Sheet Sample Populatio.docx
Descriptive Statistics Formula Sheet    Sample Populatio.docxDescriptive Statistics Formula Sheet    Sample Populatio.docx
Descriptive Statistics Formula Sheet Sample Populatio.docx
 
Inferential statistics-estimation
Inferential statistics-estimationInferential statistics-estimation
Inferential statistics-estimation
 
Lect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spreadLect w2 measures_of_location_and_spread
Lect w2 measures_of_location_and_spread
 
hypothesisTestPPT.pptx
hypothesisTestPPT.pptxhypothesisTestPPT.pptx
hypothesisTestPPT.pptx
 
U unit8 ksb
U unit8 ksbU unit8 ksb
U unit8 ksb
 
Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2Application of Statistical and mathematical equations in Chemistry Part 2
Application of Statistical and mathematical equations in Chemistry Part 2
 
Confidence Intervals––Exact Intervals, Jackknife, and Bootstrap
Confidence Intervals––Exact Intervals, Jackknife, and BootstrapConfidence Intervals––Exact Intervals, Jackknife, and Bootstrap
Confidence Intervals––Exact Intervals, Jackknife, and Bootstrap
 
C2 st lecture 11 the t-test handout
C2 st lecture 11   the t-test handoutC2 st lecture 11   the t-test handout
C2 st lecture 11 the t-test handout
 
Talk 2
Talk 2Talk 2
Talk 2
 
tps5e_Ch10_2.ppt
tps5e_Ch10_2.ppttps5e_Ch10_2.ppt
tps5e_Ch10_2.ppt
 
Applications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large NumbersApplications to Central Limit Theorem and Law of Large Numbers
Applications to Central Limit Theorem and Law of Large Numbers
 
Test of hypothesis (t)
Test of hypothesis (t)Test of hypothesis (t)
Test of hypothesis (t)
 
Statistical analysis by iswar
Statistical analysis by iswarStatistical analysis by iswar
Statistical analysis by iswar
 
Pertemuan 10 new - Komputasi Statistik.pptx
Pertemuan 10 new - Komputasi Statistik.pptxPertemuan 10 new - Komputasi Statistik.pptx
Pertemuan 10 new - Komputasi Statistik.pptx
 
C2 st lecture 10 basic statistics and the z test handout
C2 st lecture 10   basic statistics and the z test handoutC2 st lecture 10   basic statistics and the z test handout
C2 st lecture 10 basic statistics and the z test handout
 
L3 sampling fundamentals and estimation
L3 sampling fundamentals and estimationL3 sampling fundamentals and estimation
L3 sampling fundamentals and estimation
 
Categorical data analysis full lecture note PPT.pptx
Categorical data analysis full lecture note  PPT.pptxCategorical data analysis full lecture note  PPT.pptx
Categorical data analysis full lecture note PPT.pptx
 
Montecarlophd
MontecarlophdMontecarlophd
Montecarlophd
 
Sampling distribution.pptx
Sampling distribution.pptxSampling distribution.pptx
Sampling distribution.pptx
 

Más de University of Salerno

Poster venezia
Poster veneziaPoster venezia
Poster venezia
University of Salerno
 
Metulini280818 iasi
Metulini280818 iasiMetulini280818 iasi
Metulini280818 iasi
University of Salerno
 
Metulini1503
Metulini1503Metulini1503
Metulini1503
University of Salerno
 
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
University of Salerno
 

Más de University of Salerno (20)

Modelling traffic flows with gravity models and mobile phone large data
Modelling traffic flows with gravity models and mobile phone large dataModelling traffic flows with gravity models and mobile phone large data
Modelling traffic flows with gravity models and mobile phone large data
 
Regression models for panel data
Regression models for panel dataRegression models for panel data
Regression models for panel data
 
Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2Carpita metulini 111220_dssr_bari_version2
Carpita metulini 111220_dssr_bari_version2
 
A strategy for the matching of mobile phone signals with census data
A strategy for the matching of mobile phone signals with census dataA strategy for the matching of mobile phone signals with census data
A strategy for the matching of mobile phone signals with census data
 
Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...Detecting and classifying moments in basketball matches using sensor tracked ...
Detecting and classifying moments in basketball matches using sensor tracked ...
 
BASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORSBASKETBALL SPATIAL PERFORMANCE INDICATORS
BASKETBALL SPATIAL PERFORMANCE INDICATORS
 
Human activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone dataHuman activity spatio-temporal indicators using mobile phone data
Human activity spatio-temporal indicators using mobile phone data
 
Poster venezia
Poster veneziaPoster venezia
Poster venezia
 
Metulini280818 iasi
Metulini280818 iasiMetulini280818 iasi
Metulini280818 iasi
 
Players Movements and Team Performance
Players Movements and Team PerformancePlayers Movements and Team Performance
Players Movements and Team Performance
 
Big Data Analytics for Smart Cities
Big Data Analytics for Smart CitiesBig Data Analytics for Smart Cities
Big Data Analytics for Smart Cities
 
Meeting progetto ode_sm_rm
Meeting progetto ode_sm_rmMeeting progetto ode_sm_rm
Meeting progetto ode_sm_rm
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Sensor Analytics in Basket...
 
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
Metulini, R., Manisera, M., Zuccolotto, P. (2017), Space-Time Analysis of Mov...
 
Metulini1503
Metulini1503Metulini1503
Metulini1503
 
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
A Spatial Filtering Zero-Inflated approach to the estimation of the Gravity M...
 
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
The Water Suitcase of Migrants: Assessing Virtual Water Fluxes Associated to ...
 
The Global Virtual Water Network
The Global Virtual Water NetworkThe Global Virtual Water Network
The Global Virtual Water Network
 
The Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with KriskogramThe Worldwide Network of Virtual Water with Kriskogram
The Worldwide Network of Virtual Water with Kriskogram
 
Ad b 1702_metu_v2
Ad b 1702_metu_v2Ad b 1702_metu_v2
Ad b 1702_metu_v2
 

Último

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Último (20)

SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 

Talk 3

  • 1. Statistics Lab Rodolfo Metulini IMT Institute for Advanced Studies, Lucca, Italy Lesson 3 - Point Estimate, Confidence Interval and Hypotesis Tests - 20.01.2015
  • 2. Introduction We start from empirical data (one variable of length N) read from an external file, and we treat it as the population. From it we draw a sample of size n. Suppose we have no information on the population (or, better, we want to check whether and how well the sample represents the population). In other words, we want to make inference using the information contained in the sample, in order to obtain an estimate for the population. That sample is one of several samples we could randomly draw from the population (the sample space). What are the instruments for obtaining information about the population? (1) Sample mean (point estimation) (2) Confidence interval (3) Hypothesis tests
  • 3. Sample space In probability theory, the sample space of an experiment or random trial is the set of all possible outcomes or results of that experiment. It is common to refer to a sample space by the label Ω (or U, or S), where Ω is the first element of what we call the statistical model: $(\Omega, \mathcal{A}, \{P_\theta : \theta \in \Theta\})$. Each element of Ω has its corresponding θ value. For example, for tossing two coins, the corresponding sample space would be {(head,head), (head,tail), (tail,head), (tail,tail)}, so that the dimension is 4: dim(Ω) = 4. It means that we can obtain 4 different samples, each with its corresponding sample mean. In general, $\dim(\Omega) = x^n$, where x is the number of outcomes in each single experiment and n is the number of experiments. In practice, we deal with only one sample, taken at random from the sample space.
  • 4. Point estimate A point estimate (or statistic) lets us summarize the information contained in the population (dimension N) through a single value constructed from n values, $T = t(X_1, \dots, X_n)$. The most used unbiased point estimator (statistic) is the sample mean, $\hat{X}_n = \frac{\sum_{i=1}^{n} x_i}{n}$. Other point estimators are: (1) Sample median (2) Sample mode (3) Geometric mean, $M_g = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left[\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right]$. An example of what is not an estimator is the sample mean computed after subsetting the sample by truncating it at a certain value. P.S. A naive definition of estimator: a quantity computed using all n pieces of information in the sample.
  • 5. Efficient estimators The BLUE (Best Linear Unbiased Estimator) is defined as follows: it (1) is a linear function of all the sample values, (2) is unbiased ($E(T) = \theta$), and (3) has the smallest variance among all linear unbiased estimators. The sample mean is the BLUE for the parameter µ. Some estimators are biased but consistent: loosely, an estimator is consistent when it converges to θ, so that any bias vanishes, as $n \to \infty$.
  • 6. Point estimators - cases Normal samples: $\hat{X}_n$ is the BLUE for the parameter µ (mean). Normal samples: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{X}_n)^2$ is the unbiased estimator of the variance $\sigma^2$. Bernoulli samples, $f(x) = \rho^x (1-\rho)^{1-x}$: $\hat{X}_n$ is an unbiased estimator of the parameter ρ (frequency). Poisson samples, $f(x) = \frac{e^{-k} k^x}{x!}$: $\hat{X}_n$ is an unbiased estimator of the parameter k (which is both the mean and the variance of the distribution). Exponential samples, $f(x) = \lambda e^{-\lambda x}$: $1/\hat{X}_n$ is the usual estimator of the parameter λ (the density at value 0); it is consistent but not exactly unbiased in finite samples. (Chunks 1 to 4)
  • 7. Confidence interval theory With point estimators we use only one value to infer about the population. With a confidence interval we define a minimum and a maximum value between which we expect the population parameter to lie. Formally, we need to calculate $\mu_1 = \hat{X}_n - z \cdot \frac{\sigma}{\sqrt{n}}$ and $\mu_2 = \hat{X}_n + z \cdot \frac{\sigma}{\sqrt{n}}$, and we end up with the interval $\hat{\mu} = \{\mu_1; \mu_2\}$, or $I = [T^{(1)}_\alpha, T^{(2)}_\alpha]$. It is customary to write $P\{\theta \in I\} \geq 1 - \alpha$. Here: $\hat{X}_n$ is the sample mean; z is the upper (or lower) critical value of the theoretical distribution; σ is the standard deviation of the theoretical distribution; n is the sample size. (See the graph)
  • 8. Confidence interval theory - Gaussian Recall the following theorem: if $X_1, \dots, X_n$ are i.i.d. with distribution $N(\mu, \sigma^2)$, then the distribution of $\hat{X}_n$ is $N(\mu, \sigma^2/n)$. Let us assume that the sample mean is 5, the population standard deviation is known and equal to 2, and the sample size is n = 20. In the example below we use a 95 per cent confidence level and wish to find the confidence interval. N.B. Here, since the confidence level is 95 per cent, the critical value z to consider is the quantile at which the CDF equals 0.975, i.e. qnorm(0.975) (in R, pnorm is the CDF and dnorm is the density). Equivalently, $\alpha = 0.05$, $1 - \alpha = 0.95$ and $1 - \alpha/2 = 0.975$. (Chunk 5)
  • 9. Confidence interval theory - Student t We use the Student t distribution when n is small and the population sd is unknown. We need to use a sample estimate of the standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \hat{X}_n)^2}{n-1}}$. The Student t distribution is more spread out than the normal. In simple words, since we do not know the population sd, we need wider intervals (a cautious approach). The only difference from the normal case is that we use the commands associated with the t distribution rather than the normal distribution. Here we repeat the procedure above, but we assume that we are working with a sample standard deviation rather than an exact standard deviation. N.B. The t distribution is characterized by its degrees of freedom. In this case the degrees of freedom are equal to n − 1, because we use one estimate (one constraint).
  • 10. Confidence interval theory - comparison of two means In some cases we have an experiment of the (for example) case-control type. Imagine the population divided in two: a treated group and a non-treated group. Suppose we extract one sample from each, with the aim of testing whether the two samples come from populations with the same mean (is the treatment effective?). The output of this test is a confidence interval for the difference between the two means. N.B. Here, the degrees of freedom of the t distribution are equal to $\min(n_1, n_2) - 1$. (Chunk 7)
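  • 11. Formulas Gaussian confidence interval: $\hat{\mu} = \{\mu_1, \mu_2\} = \hat{X}_n \pm z \cdot \frac{\sigma}{\sqrt{n}}$. Student t confidence interval: $\hat{\mu} = \{\mu_1, \mu_2\} = \hat{X}_n \pm t_{n-1} \cdot \frac{s}{\sqrt{n}}$. Student t confidence interval for the difference of two sample means: $\hat{\mu}_{diff} = \{\mu_{diff,1}, \mu_{diff,2}\} = (\hat{X}_1 - \hat{X}_2) \pm t_{n-1} \cdot s$, where $s = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and $n = \min(n_1, n_2)$ (slide 10). Gaussian confidence interval for a proportion (Bernoulli distribution): $\hat{\rho} = \{\rho_1, \rho_2\} = \hat{f}_1 \pm z \cdot s$, where $s = \sqrt{\frac{\rho(1-\rho)}{n}}$.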
  • 12. Hypothesis testing Researchers retain or reject a hypothesis based on measurements of observed samples. The decision is often based on a statistical mechanism called hypothesis testing. A type I error is the mishap of rejecting a null hypothesis when the null hypothesis is true (see the image). The probability of committing a type I error is called the significance level of the hypothesis test, and is denoted by the Greek letter α (the same used in the confidence intervals). We demonstrate the procedure of hypothesis testing in R first with the intuitive critical value approach. Then we discuss the popular (and very quick) p-value approach as an alternative.
  • 13. Hypothesis testing - lower tail The null hypothesis of the lower tail test of the population mean can be expressed as follows: $\mu \geq \mu_0$, where $\mu_0$ is a hypothesized lower bound of the true population mean µ. Let us define the test statistic z in terms of the sample mean, the sample size and the population standard deviation σ: $z = \frac{\hat{X}_n - \mu_0}{\sigma/\sqrt{n}}$. Then the null hypothesis of the lower tail test is to be rejected if $z \leq z_\alpha$, where $z_\alpha$ is the 100(α) percentile of the standard normal distribution. (Chunk 9)
  • 14. Hypothesis testing - upper tail The null hypothesis of the upper tail test of the population mean can be expressed as follows: $\mu \leq \mu_0$, where $\mu_0$ is a hypothesized upper bound of the true population mean µ. Let us define the test statistic z in terms of the sample mean, the sample size and the population standard deviation σ: $z = \frac{\hat{X}_n - \mu_0}{\sigma/\sqrt{n}}$. Then the null hypothesis of the upper tail test is to be rejected if $z \geq z_{1-\alpha}$, where $z_{1-\alpha}$ is the 100(1 − α) percentile of the standard normal distribution. (Chunk 10)
  • 15. Hypothesis testing - two tailed The null hypothesis of the two-tailed test of the population mean can be expressed as follows: $\mu = \mu_0$, where $\mu_0$ is a hypothesized value of the true population mean µ. Let us define the test statistic z in terms of the sample mean, the sample size and the population standard deviation σ: $z = \frac{\hat{X}_n - \mu_0}{\sigma/\sqrt{n}}$. Then the null hypothesis of the two-tailed test is to be rejected if $z \leq z_{\alpha/2}$ or $z \geq z_{1-\alpha/2}$, where $z_{\alpha/2}$ is the 100(α/2) percentile of the standard normal distribution. (Chunk 11)
  • 16. Hypothesis testing - lower tail with unknown variance The null hypothesis of the lower tail test of the population mean can be expressed as follows: $\mu \geq \mu_0$, where $\mu_0$ is a hypothesized lower bound of the true population mean µ. Let us define the test statistic t in terms of the sample mean, the sample size and the sample standard deviation s: $t = \frac{\hat{X}_n - \mu_0}{s/\sqrt{n}}$. Then the null hypothesis of the lower tail test is to be rejected if $t \leq t_\alpha$, where $t_\alpha$ is the 100(α) percentile of the Student t distribution with n − 1 degrees of freedom. (Chunk 12)
  • 17. Hypothesis testing - upper tail with unknown variance The null hypothesis of the upper tail test of the population mean can be expressed as follows: $\mu \leq \mu_0$, where $\mu_0$ is a hypothesized upper bound of the true population mean µ. Let us define the test statistic t in terms of the sample mean, the sample size and the sample standard deviation s: $t = \frac{\hat{X}_n - \mu_0}{s/\sqrt{n}}$. Then the null hypothesis of the upper tail test is to be rejected if $t \geq t_{1-\alpha}$, where $t_{1-\alpha}$ is the 100(1 − α) percentile of the Student t distribution with n − 1 degrees of freedom. (Chunk 13)
  • 18. Hypothesis testing - two tailed with unknown variance The null hypothesis of the two-tailed test of the population mean can be expressed as follows: $\mu = \mu_0$, where $\mu_0$ is a hypothesized value of the true population mean µ. Let us define the test statistic t in terms of the sample mean, the sample size and the sample standard deviation s: $t = \frac{\hat{X}_n - \mu_0}{s/\sqrt{n}}$. Then the null hypothesis of the two-tailed test is to be rejected if $t \leq t_{\alpha/2}$ or $t \geq t_{1-\alpha/2}$, where $t_{\alpha/2}$ is the 100(α/2) percentile of the Student t distribution with n − 1 degrees of freedom. (Chunk 14)
  • 19. Lower Tail Test of Population Proportion The null hypothesis of the lower tail test about the population proportion can be expressed as follows: $\rho \geq \rho_0$, where $\rho_0$ is a hypothesized lower bound of the true population proportion ρ. Let us define the test statistic z in terms of the sample proportion and the sample size: $z = \frac{\hat{\rho} - \rho_0}{\sqrt{\rho_0(1-\rho_0)/n}}$. Then the null hypothesis of the lower tail test is to be rejected if $z \leq z_\alpha$, where $z_\alpha$ is the 100(α) percentile of the standard normal distribution. (Chunk 15)
  • 20. Upper Tail Test of Population Proportion The null hypothesis of the upper tail test about the population proportion can be expressed as follows: $\rho \leq \rho_0$, where $\rho_0$ is a hypothesized upper bound of the true population proportion ρ. Let us define the test statistic z in terms of the sample proportion and the sample size: $z = \frac{\hat{\rho} - \rho_0}{\sqrt{\rho_0(1-\rho_0)/n}}$. Then the null hypothesis of the upper tail test is to be rejected if $z \geq z_{1-\alpha}$, where $z_{1-\alpha}$ is the 100(1 − α) percentile of the standard normal distribution. (Chunk 16)
  • 21. Two Tailed Test of Population Proportion The null hypothesis of the two-tailed test about the population proportion can be expressed as follows: $\rho = \rho_0$, where $\rho_0$ is a hypothesized value of the true population proportion. Let us define the test statistic z in terms of the sample proportion and the sample size: $z = \frac{\hat{\rho} - \rho_0}{\sqrt{\rho_0(1-\rho_0)/n}}$. Then the null hypothesis of the two-tailed test is to be rejected if $z \leq z_{\alpha/2}$ or $z \geq z_{1-\alpha/2}$. (Chunk 17)
  • 22. Sample size definition The quality of a sample survey can be improved (worsened) by increasing (decreasing) the sample size. The formulas below provide the sample size needed for a given (1 − α) confidence level, margin of error E and planned parameter estimate. Here, $z_{1-\alpha/2}$ is the 100(1 − α/2) percentile of the standard normal distribution. For the mean: $n = \frac{z_{1-\alpha/2}^2 \, \sigma^2}{E^2}$. For a proportion: $n = \frac{z_{1-\alpha/2}^2 \, \rho(1-\rho)}{E^2}$. In both cases n is chosen so that $P(|\hat{X}_n - \mu| \leq E) = 1 - \alpha$.
  • 23. Sample size definition - Exercises Mean: Assume the population standard deviation σ of student height in the survey is 9.48. Find the sample size needed to achieve a 1.2 centimeter margin of error at the 95 per cent confidence level. Since the normal distribution has two tails, the 95 per cent confidence level implies the 97.5th percentile of the normal distribution at the upper tail. Therefore, $z_{1-\alpha/2}$ is given by qnorm(.975). Proportion: Using a 50 per cent planned proportion estimate, find the sample size needed to achieve a 5 per cent margin of error for the female student survey at the 95 per cent confidence level. As above, $z_{1-\alpha/2}$ is given by qnorm(.975). (Chunk 18-19)
  • 24. Homework 1: Confidence interval for the proportion. Suppose we have a sample of n = 25 births, 15 of which are female. Find the 99 per cent confidence interval for the proportion of females in the population. HINT: apply the formula in slide 11 using the proper functions in R. 2: Hypothesis test to compare two proportions. Suppose we have two schools. In a sample of n = 20 from the first, 8 students are Hispanic. In a sample of n = 18 from the second, 4 students are Hispanic. Can we state (at the 95 per cent level) that the proportion of Hispanic students is the same in the two schools? N.B.: the test here is two tailed. The test statistic is $z = \frac{\hat{\rho}_1 - \hat{\rho}_2}{sd}$, where $sd = \sqrt{\rho(1-\rho)\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}$ and $\rho = \frac{\hat{\rho}_1 n_1 + \hat{\rho}_2 n_2}{n_1 + n_2}$.
  • 25. Charts - 1 Figure: Representation of the critical point for the upper tail hypothesis test
  • 26. Charts - 2 Figure: Representation of the critical point for the lower tail hypothesis test
  • 27. Charts - 3 Figure: Representation of the critical point for the two-tailed hypothesis test
  • 28. Charts - 4 Figure: Type I and Type II errors in hypothesis testing
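
The Gaussian confidence interval of slides 7, 8 and 11 can be sketched in a few lines. The slides rely on R (qnorm); the sketch below is an assumption-laden Python equivalent that uses only the standard library, where `NormalDist().inv_cdf` plays the role of qnorm. The function name `gaussian_ci` is hypothetical, and the numbers are the ones given on slide 8 (sample mean 5, known σ = 2, n = 20, 95 per cent level).

```python
from math import sqrt
from statistics import NormalDist


def gaussian_ci(sample_mean, sigma, n, conf_level=0.95):
    """Confidence interval for the mean when the population sd is known.

    Hypothetical helper (not from the slides): implements
    sample_mean +/- z * sigma / sqrt(n), with z the 1 - alpha/2 quantile.
    """
    alpha = 1.0 - conf_level
    # Critical value: the standard normal quantile at 1 - alpha/2
    # (0.975 for a 95 per cent level), as qnorm(0.975) would give in R.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from slide 8: mean 5, sigma 2, n 20, 95 per cent confidence.
lower, upper = gaussian_ci(5.0, 2.0, 20)
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```

The Student t interval of slide 9 has the same shape, with the normal quantile replaced by a t quantile with n − 1 degrees of freedom and σ replaced by s; the Python standard library has no t quantile function, so that case is not shown here.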
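
The z tests of slides 13 to 15 can be illustrated with both approaches mentioned on slide 12: the critical value and the p-value. This is a minimal sketch, not the lab's R chunks; the function name `z_test_mean` and the example numbers (sample mean 4.5, µ0 = 5, σ = 2, n = 20) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """z test for a population mean with known sigma.

    Hypothetical helper: returns (z, reject, p_value), where `reject`
    follows the critical value rules of slides 13 to 15.
    """
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    if tail == "lower":
        # Reject H0: mu >= mu0 when z <= z_alpha.
        reject = z <= std_normal.inv_cdf(alpha)
        p_value = std_normal.cdf(z)
    elif tail == "upper":
        # Reject H0: mu <= mu0 when z >= z_{1 - alpha}.
        reject = z >= std_normal.inv_cdf(1.0 - alpha)
        p_value = 1.0 - std_normal.cdf(z)
    elif tail == "two":
        # Reject H0: mu = mu0 when |z| >= z_{1 - alpha/2}.
        reject = abs(z) >= std_normal.inv_cdf(1.0 - alpha / 2.0)
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'lower', 'upper' or 'two'")
    return z, reject, p_value


# Hypothetical data: is the population mean below 5?
z, reject, p_value = z_test_mean(4.5, 5.0, 2.0, 20, alpha=0.05, tail="lower")
print(f"z = {z:.4f}, reject H0: {reject}, p-value = {p_value:.4f}")
```

The two approaches agree by construction: the null hypothesis is rejected by the critical value rule exactly when the p-value is at most α.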
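
The two sample size exercises of slide 23 follow directly from the formulas of slide 22. The sketch below, again in standard-library Python rather than R, uses hypothetical function names; rounding the result up to the next integer is an assumption made here (the slides only give the formulas), on the grounds that a sample size must be a whole number that still meets the margin of error.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, margin, conf_level=0.95):
    """Sample size for estimating a mean: n = z^2 * sigma^2 / E^2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    # Rounding up is an assumption, not stated on the slides.
    return ceil(z**2 * sigma**2 / margin**2)


def sample_size_proportion(p, margin, conf_level=0.95):
    """Sample size for estimating a proportion: n = z^2 * p(1-p) / E^2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf_level) / 2.0)
    return ceil(z**2 * p * (1.0 - p) / margin**2)


# Slide 23, mean: sigma = 9.48, margin of error 1.2 cm, 95 per cent level.
n_mean = sample_size_mean(9.48, 1.2)
# Slide 23, proportion: planned proportion 0.5, margin of error 0.05.
n_prop = sample_size_proportion(0.5, 0.05)
print(n_mean, n_prop)  # 240 385
```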