The document defines key concepts in probability and hypothesis testing. It discusses probability as a numerical quantity between 0 and 1 that expresses the likelihood of an event. Different probability distributions are covered, including binomial, normal, and Poisson distributions. Hypothesis testing is defined as a methodology to either accept or reject a null hypothesis based on sample data. Types of hypotheses, terms used in testing like test statistics and p-values, and types of errors are also summarized.
2. DEFINITION
A probability is a numerical quantity that expresses the likelihood
of an event.
The probability of an event E is written as Pr{E}.
It is always a number between 0 and 1, inclusive.
3. The chance operation must be defined in such a way that
each time the chance operation is performed, the event E either occurs
or does not occur.
The following examples illustrate these ideas.
4. Coin Tossing Consider the familiar chance operation of
tossing a coin, and define the
Event E: Heads
Each time the coin is tossed, either it falls heads or it does
not. If the coin is equally likely to fall heads or tails, then
Pr{E} = 1/2 = 0.5
Such an ideal coin is called a “fair” coin.
5. FREQUENCY INTERPRETATION OF
PROBABILITY
The frequency interpretation of probability provides a link
between probability and the real world by relating the probability of
an event to a measurable quantity, namely, the long-run relative
frequency of occurrence of the event.
The probability Pr{E} is interpreted as the relative frequency
of occurrence of E in an indefinitely long series of repetitions
of the chance operation.
6. Specifically, suppose that the chance operation is repeated a large number
of times, and that for each repetition the occurrence or non occurrence of E
is noted.
Then we may write:
Pr{E} → (number of times E occurs) / (number of repetitions of the chance operation)
The arrow in the preceding expression indicates “approximate equality in
the long run”; that is, if the chance operation is repeated many times, the two
sides of the expression will be approximately equal.
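The long-run behaviour can be illustrated with a small simulation (not part of the original slides); the only assumed value is the fair-coin probability of 0.5.

```python
import random

random.seed(1)            # reproducible run
true_p = 0.5              # Pr{E} for heads on a fair coin
n_repetitions = 10_000

heads = sum(random.random() < true_p for _ in range(n_repetitions))
relative_frequency = heads / n_repetitions

# After many repetitions the relative frequency is approximately Pr{E}.
print(f"relative frequency of heads after {n_repetitions} tosses: {relative_frequency:.4f}")
```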
7. PROBABILITY TREES
Often it is helpful to use a probability tree to analyze a probability problem.
A probability tree provides a convenient way to break a problem into parts
and to organize the information available.
The following examples show some applications of this idea.
Coin Tossing If a fair coin is tossed twice, then the probability of heads is 0.5
on each toss.
The first part of a probability tree for this scenario shows that there are two
possible outcomes for the first toss and that they have probability 0.5 each.
8. Then the tree shows that, for either outcome of the first toss, the
second toss can be either heads or tails, again with probabilities 0.5
each.
To find the probability of getting heads on both tosses, we
consider the path through the tree that produces this event. We
multiply together the probabilities that we encounter along the path.
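A minimal sketch of the path-multiplication rule for the two-toss tree described above; the 0.5 branch probabilities come from the fair-coin example.

```python
# Probability tree for two tosses of a fair coin: each branch has
# probability 0.5, and the probability of a complete path through the
# tree is the product of the branch probabilities along that path.
p_first_heads = 0.5
p_second_heads = 0.5

p_heads_on_both_tosses = p_first_heads * p_second_heads
print(p_heads_on_both_tosses)   # 0.25
```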
9. LAWS OF PROBABILITY
Multiplication Law: If A1, ..., Ak are independent events, then
Pr(A1 ∩ A2 ∩ ... ∩ Ak) = Pr(A1) Pr(A2) ... Pr(Ak).
Addition Law: If A and B are any events, then
Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B).
Note: This law can be extended to more than 2 events.
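A brief check of both laws on an illustrative setup that is not in the slides: rolling two fair dice, with A = “first die shows 6” and B = “second die shows 6”.

```python
from itertools import product

# Sample space: the 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

A = {o for o in outcomes if o[0] == 6}   # first die shows 6
B = {o for o in outcomes if o[1] == 6}   # second die shows 6

def pr(event):
    return len(event) / len(outcomes)

# Addition law: Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
print(pr(A | B), pr(A) + pr(B) - pr(A & B))   # both 11/36 ≈ 0.3056

# A and B are independent, so the multiplication law also applies:
print(pr(A & B), pr(A) * pr(B))               # both 1/36 ≈ 0.0278
```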
11. PROBABILITY DISTRIBUTION
The probability distribution of a random variable provides the probabilities of its possible values.
The probability distributions discussed here are characterized by one or more
parameters. The parameters of probability distributions we assume for random
variables are usually unknown.
Typically, we use Greek letters such as µ and σ to denote these parameters and
distinguish them from known values.
We usually use µ to denote the mean of a random variable and σ² to denote
its variance.
12. TYPES OF DISTRIBUTION:
Binomial distribution
o A sequence of binary random variables X1, X2, ..., Xn is
called Bernoulli trials if they all have the same Bernoulli
distribution and are independent.
o The random variable Y representing the number of times
the outcome of interest occurs in n Bernoulli trials (i.e., the
sum of the Bernoulli trials) has a Binomial(n, θ) distribution.
o The probability mass function of a Binomial(n, θ) distribution specifies
the probability of each possible value (integers from 0
through n) of the random variable.
o The theoretical (population) mean of a random variable Y
with Binomial(n, θ) distribution is µ = nθ.
o The theoretical (population) variance of Y is σ² = nθ(1 − θ).
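A small sketch of the Binomial(n, θ) probability mass function together with the mean and variance given above; the values n = 10 and θ = 0.3 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(y, n, theta):
    """Pr{Y = y} for Y ~ Binomial(n, theta)."""
    return comb(n, y) * theta**y * (1 - theta)**(n - y)

n, theta = 10, 0.3                      # assumed example parameters
pmf = [binomial_pmf(y, n, theta) for y in range(n + 1)]

print(sum(pmf))                         # ≈ 1.0: probabilities over 0..n sum to one
print(n * theta)                        # population mean µ = nθ -> 3.0
print(n * theta * (1 - theta))          # population variance σ² = nθ(1−θ) -> 2.1
```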
13. NORMAL DISTRIBUTION
A distribution represented by a normal curve is called a
normal distribution, which is one of the most widely
used distributions for continuous random variables.
Random variables with this distribution (or very close to it)
occur often in nature.
A normal distribution and its corresponding probability
distribution function are fully specified by the mean µ and
variance σ².
A random variable X with normal distribution is
denoted X ~ N(µ, σ²).
N(0,1) is called the standard normal distribution.
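A short illustration using the standard-library statistics.NormalDist class; the parameters µ = 0 and σ = 1 correspond to the standard normal N(0, 1) mentioned above.

```python
from statistics import NormalDist

# X ~ N(µ, σ²); NormalDist is parameterised by the standard deviation σ.
std_normal = NormalDist(mu=0, sigma=1)

# About 95% of the probability of N(0, 1) lies within ±1.96 of the mean.
print(std_normal.cdf(1.96) - std_normal.cdf(-1.96))   # ≈ 0.95

# Density of the standard normal at its mean.
print(std_normal.pdf(0))                              # ≈ 0.3989
```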
14. POISSON DISTRIBUTION:
The probability distribution of a Poisson random variable is called a Poisson
distribution.
The Poisson probability distribution is defined by the following formula:
Poisson Formula. Suppose we conduct a Poisson experiment, in which the average
number of successes within a given region is m. Then, the Poisson probability is:
P(x; m) = (e^−m)(m^x) / x!
where x is the actual number of successes that result from the experiment, and e is
approximately equal to 2.71828.
A Poisson random variable is the number of successes that result from a Poisson
experiment.
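A direct translation of the Poisson formula above into Python; the value m = 2 is an assumed example average.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """P(x; m) = e^(-m) * m^x / x! for a Poisson random variable."""
    return exp(-m) * m**x / factorial(x)

m = 2.0                                            # assumed average number of successes
print(poisson_pmf(0, m))                           # ≈ 0.1353
print(poisson_pmf(3, m))                           # ≈ 0.1804
print(sum(poisson_pmf(x, m) for x in range(50)))   # ≈ 1.0: pmf sums to one
```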
15. A Poisson experiment is a statistical experiment that has the following properties:
The experiment results in outcomes that can be classified as successes or failures.
The average number of successes (m) that occurs in a specified region is known.
The probability that a success will occur is proportional to the size of the region.
The probability that a success will occur in an extremely small region is virtually
zero.
Note that the specified region could take many forms. For instance, it could be a
length, an area, a volume, a period of time, etc.
16. HYPOTHESIS TESTING
Hypothesis testing is one of the most important concepts in statistics.
A statistical hypothesis is an assumption about a population parameter. This
assumption may or may not be true.
The methodology employed by the analyst depends on the nature of the data
used and the goals of the analysis.
The goal is to either accept or reject the null hypothesis.
17. HYPOTHESIS
A supposition or proposed explanation made on the basis of
limited evidence as a starting point for further investigation.
18. Hypothesis Testing Formula:
The z test statistic is used for testing the mean of a large sample. The test statistic is given by
z = (x̄ − μ) / (σ / √n), where x̄ is the sample mean, μ is the population mean, σ is the population
standard deviation, and n is the sample size.
2. Level of Significance
The confidence with which a null hypothesis is accepted or rejected is called the level of significance.
The level of significance is denoted by α.
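A minimal sketch of the z statistic defined above; the sample mean, hypothesized mean, σ, and n are assumed numbers chosen only for illustration.

```python
from math import sqrt

def z_statistic(x_bar, mu, sigma, n):
    """z = (x̄ − μ) / (σ / √n) for a large-sample test of the mean."""
    return (x_bar - mu) / (sigma / sqrt(n))

# Illustrative values: sample mean 52, hypothesized mean 50,
# population standard deviation 8, sample size 64.
z = z_statistic(x_bar=52, mu=50, sigma=8, n=64)
print(z)   # 2.0
```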
19. Hypothesis testing begins with a hypothesis made about the
population parameter. Data are then collected from an appropriate sample, and
the information obtained from the sample is used to decide how likely it
is that the hypothesized population parameter is correct. The purpose
of hypothesis testing is not to question the computed value of the
sample statistic but to make a judgement about the difference
between the sample statistic and the hypothesized population parameter.
20. Different Types of Hypothesis:
There are 5 different types of hypothesis as follows:
1) Simple Hypothesis
If a hypothesis specifies the population completely, such as the functional form and the values of the parameters,
it is called a simple hypothesis.
Example:
The hypothesis “Population is normal with mean 15 and standard deviation 5” is a simple
hypothesis.
21. 2) Composite Hypothesis or Multiple Hypothesis
If the hypothesis concerning the population is not explicitly defined in terms of all of its
parameters, it is a composite or multiple hypothesis.
Example:
The hypothesis “Population is normal with mean 15” (with the standard deviation left unspecified) is a composite or multiple hypothesis.
22. 3) Parametric Hypothesis
A hypothesis that specifies only the parameters of the probability density
function is called a parametric hypothesis.
Example:
The hypothesis “Mean of the population is 15” is a parametric hypothesis.
23. 4) Non Parametric Hypothesis
If a hypothesis specifies only the form of the density function in the
population, it is called a non-parametric hypothesis.
Example:
The hypothesis “The population is normally distributed” (with no parameter values specified) is a non-parametric hypothesis.
24. 5) Null and Alternative Hypothesis
A null hypothesis can be defined as a statistical hypothesis that is stated for possible acceptance. It is the original
hypothesis. Any hypothesis other than the null hypothesis is called an alternative hypothesis. When the null hypothesis
is rejected, we accept the alternative hypothesis. The null hypothesis is denoted by H0 and the alternative hypothesis is
denoted by H1.
Example:
When we want to test whether the population mean is 30, the null hypothesis is “Population mean is 30” and the alternative
hypothesis is “Population mean is not 30”.
25. The logic underlying the hypothesis testing procedure is as follows:
The probability of rejecting the null hypothesis when it is true is the probability of a Type I error, whereas the probability of
accepting the null hypothesis when it is false is the probability of a Type II error. The probability of a Type II error is denoted by β.
Example:
Suppose a toy manufacturer and its main supplier agree that the quality of each shipment will meet a particular
benchmark. Our null hypothesis is that the quality is 90%. If we reject the shipment when the quality actually meets the
90% benchmark, we have committed a Type I error. If we accept the shipment when the quality is below 90%,
we have committed a Type II error.
26. TERMS USED IN HYPOTHESIS
TESTING
1. Test Statistic
The decision whether to accept or reject the null hypothesis is made
based on this value. The test statistic is computed from a defined formula based on
a distribution such as t, z, or F. If the calculated test statistic is less
than the critical value, we accept the null hypothesis; otherwise, we reject it.
27. 3. Critical Value
The critical value is the value that divides the possible values of the test statistic into two regions: the acceptance
region and the rejection region. If the computed test statistic falls in the
rejection region, we reject the null hypothesis; otherwise, we accept it.
The critical value depends upon the level of significance and the
alternative hypothesis.
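A sketch of how the critical value separates the two regions for a two-sided z test; α = 0.05 and the z value reuse the assumed numbers from the earlier sketch.

```python
from statistics import NormalDist

alpha = 0.05                                    # level of significance
critical = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value ≈ 1.96

z = 2.0                                         # test statistic from the earlier sketch
if abs(z) < critical:
    print("z falls in the acceptance region: accept H0")
else:
    print("z falls in the rejection region: reject H0")
```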
28. 4. One Sided or Two Sided Hypothesis
The alternative hypothesis is one sided if it states that the parameter is larger (or
smaller) than the null hypothesis value. It is two sided when it states that the parameter
is different from the null hypothesis value. The null hypothesis is usually
tested against an alternative hypothesis (H1). The alternative hypothesis can
take one of three forms: the parameter is greater than the null value, less than the null value, or not equal to the null value.
29. 5. P-Value
The P-value is the probability that the test statistic takes a value as extreme as, or more
extreme than, the one observed, assuming that the null hypothesis is true. Equivalently, it is the
probability of seeing the observed difference, or a greater one, just by chance if the null
hypothesis is true. The larger the P-value, the weaker the evidence against the null hypothesis.
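A sketch of a two-sided P-value for the z statistic from the earlier sketch, assuming the null distribution of z is standard normal.

```python
from statistics import NormalDist

z = 2.0                                       # observed test statistic (assumed value)
# Two-sided P-value: probability of a value at least as extreme as z under H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(p_value)   # ≈ 0.0455, less than α = 0.05, so H0 would be rejected
```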
30. Benefits and Process of Hypothesis Testing
Errors in Research Testing:
It is common to make two types of errors while drawing conclusions in research:
Type 1: accepting the research hypothesis (rejecting the null hypothesis) when the null hypothesis is
actually correct.
Type 2: rejecting the research hypothesis (retaining the null hypothesis) even though the null hypothesis is
incorrect.
31. DECISION ERRORS
Two types of errors can result from a hypothesis test.
Type I error. A Type I error occurs when the researcher rejects a null
hypothesis when it is true. The probability of committing a Type I error is called the
significance level. This probability is also called alpha, and is often denoted by α.
Type II error. A Type II error occurs when the researcher fails to reject a null
hypothesis that is false. The probability of committing a Type II error is called
Beta, and is often denoted by β. The probability of not committing a Type II error
is called the Power of the test.
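A small simulation (with the same assumed parameter values as the earlier sketches) that estimates the Type I error rate α and the power 1 − β of the two-sided z test.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)
mu0, sigma, n, alpha = 50, 8, 64, 0.05          # assumed example values
critical = NormalDist().inv_cdf(1 - alpha / 2)  # ≈ 1.96

def rejection_rate(true_mu, trials=20_000):
    """Fraction of simulated samples in which H0: µ = mu0 is rejected."""
    rejections = 0
    for _ in range(trials):
        x_bar = random.gauss(true_mu, sigma / sqrt(n))  # sampling distribution of x̄
        z = (x_bar - mu0) / (sigma / sqrt(n))
        if abs(z) >= critical:
            rejections += 1
    return rejections / trials

print(rejection_rate(true_mu=50))   # ≈ 0.05: Type I error rate (α), since H0 is true
power = rejection_rate(true_mu=53)  # rejection rate when H0 is actually false
print(power, 1 - power)             # power of the test and β = 1 − power
```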