INSTRUCTOR: DR. TUNG NGUYE
GROUP 7
MEMBERS:
Ly Ngoc Tra An
Ngo Huong Giang
Tran Nhu Hanh
Tran Thi My Hanh
Nguyen Thi Hong Tham
Nguyen Thi Thao Tien
OUTLINE
Data analysis
• Central tendency: mean, median, mode
• Spread of distribution: range, variance, standard deviation
• Experimental:
    Paired t-test
    ANOVA
CENTRAL TENDENCY
The term central tendency refers to the "middle" value or perhaps a
   typical value of the data, and is measured using the mean, median,
   or mode. Each of these measures is calculated differently, and the
   one that is best to use depends upon the situation.

In statistics, the term central tendency relates to the way in which
    quantitative data tend to cluster around some value.

In the simplest cases, the measure of central tendency is an average of
   a set of measurements, the word average being variously construed
   as mean, median, or other measure of location, depending on the
   context.

Both "central tendency" and "measure of central tendency" apply to
   either statistical populations or to samples from a population.
MEASURES OF CENTRAL TENDENCY
Arithmetic mean: (or simply, mean) – the sum of all
measurements divided by the number of observations in
the data set

The mean is the most commonly used measure of central tendency.
  When we talk about an "average", we usually are referring to
  the mean. The mean is simply the sum of the values divided by
  the total number of items in the set. The result is referred to as
  the arithmetic mean. Sometimes it is useful to give more
  weighting to certain data points, in which case the result is
  called the weighted arithmetic mean.
The mean is valid only for interval data or ratio data. Since it uses
  the values of all of the data points in the population or sample,
  the mean is influenced by outliers that may be at the extremes of
  the data set.
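A minimal Python sketch of the arithmetic mean and the weighted arithmetic mean described above, using the standard-library `statistics` module. The helper `weighted_mean` and the values and weights in its example are hypothetical illustrations; the data sets {1,2,3,4,5} and {1,2,3,4,10} are the ones used in the median discussion and show how one extreme value pulls the mean.

```python
from statistics import fmean


def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


print(fmean([1, 2, 3, 4, 5]))    # 3.0
print(fmean([1, 2, 3, 4, 10]))   # 4.0  (the outlier 10 pulls the mean up)

# Hypothetical example: three scores, the middle one counted twice
print(weighted_mean([70, 80, 90], [1, 2, 1]))  # 80.0
```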
MEDIAN: THE MIDDLE VALUE THAT SEPARATES THE HIGHER HALF FROM THE LOWER HALF OF THE DATA SET
The median is determined by sorting the data set from lowest to highest values and taking the data
    point in the middle of the sequence. There is an equal number of points above and below the
    median. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points
    greater than this value and two data points less than this value. In this case, the median is
    equal to the mean. But consider the data set {1,2,3,4,10}. In this data set, the median is still
    3, but the mean is equal to 4. If there is an even number of data points in the set, then there
    is no single point at the middle and the median is calculated by taking the mean of the two
    middle points.
The median can be determined for ordinal data as well as interval and ratio data. Unlike the mean,
    the median is not influenced by outliers at the extremes of the data set. For this reason, the
    median is often used when there are a few extreme values that could greatly influence the
    mean and distort what might be considered typical. This is common with home prices
    and with income data for a group of people, which are typically highly skewed. For such data, the
    median is usually reported instead of the mean. For example, in a group of people, if the salary
    of one person is 10 times the mean, the mean salary of the group will be pulled up by that
    unusually large salary. In this case, the median may better represent the typical salary level of
    the group.
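The two data sets above can be checked with Python's standard-library `statistics.median`, which also handles the even-count case by averaging the two middle points. This is only an illustrative sketch; the four-point list is a hypothetical example.

```python
from statistics import fmean, median

for data in ([1, 2, 3, 4, 5], [1, 2, 3, 4, 10]):
    # The median ignores how extreme the largest value is; the mean does not.
    print(data, "median =", median(data), "mean =", fmean(data))

# Even number of points: the median is the mean of the two middle points.
print(median([1, 2, 3, 4]))  # 2.5
```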
MODE (STATISTICS): THE MOST FREQUENT VALUE IN THE DATA SET
The mode is the most frequently occurring value in the data set.
  For example, in the data set {1,2,3,4,4}, the mode is equal to
  4. A data set can have more than a single mode, in which case
  it is multimodal. In the data set {1,1,2,3,3} there are two
  modes: 1 and 3.
The mode can be very useful for dealing with categorical data.
  For example, if a sandwich shop sells 10 different types of
  sandwiches, the mode would represent the most popular
  sandwich. The mode also can be used with ordinal, interval,
  and ratio data. However, in interval and ratio scales, the data
  may be spread thinly with no data points having the same
  value. In such cases, the mode may not exist or may not be
  very meaningful.
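A short sketch of the mode cases above using `statistics.mode` and `statistics.multimode` (Python 3.8+). The sandwich sales list and the three decimal values are hypothetical data chosen to mirror the categorical and "spread thinly" examples.

```python
from statistics import mode, multimode

print(mode([1, 2, 3, 4, 4]))         # 4
print(multimode([1, 1, 2, 3, 3]))    # [1, 3]  (multimodal: two modes)

# Categorical data (hypothetical sandwich sales): the mode is the most popular item.
sales = ["ham", "tuna", "ham", "veggie", "ham", "tuna"]
print(mode(sales))                   # ham

# Interval data with no repeated values: every value ties, so the "mode" says little.
print(multimode([2.31, 4.07, 5.55]))  # [2.31, 4.07, 5.55]
```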
WHEN TO USE MEAN, MEDIAN, AND MODE

Measurement Scale        Best Measure of the "Middle"
Nominal (Categorical)    Mode
Ordinal                  Median
Interval                 Symmetrical data: Mean
                         Skewed data: Median
Ratio                    Symmetrical data: Mean
                         Skewed data: Median
RANGE, VARIANCE, AND STANDARD DEVIATION


RANGE
The range indicates the distance between the two most
 extreme scores in a distribution:
 Range = highest score − lowest score
VARIANCE AND STANDARD DEVIATION
• The variance and standard deviation are two
measures of variability that indicate how
much the scores are spread out around the
mean
• We use the mean as our reference point since
it is at the center of the distribution
Variance = the average squared distance of the
  numbers from the mean (how spread out they are)
Standard Deviation = loosely defined as
  the average amount a number differs
  from the mean
We will use the following sample data
 set to explain the range, variance, and
 standard deviation:
        4, 6, 3, 7, 9, 4, 2, 1, 4, 2
SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2
Range:
R = maximum score - minimum score
In order to figure out the range, A) arrange your data
   set in order from lowest to highest and B) subtract
   the lowest number from the highest number.

A) When arranged in order, 4, 6, 3, 7, 9, 4, 2, 1, 4,
  2 becomes: 1, 2, 2, 3, 4, 4, 4, 6, 7, 9

B) The lowest number is 1 and the highest number
  is 9. Therefore, R = 9-1 = 8
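Steps A) and B) translate directly into a few lines of Python; this is just an illustrative sketch using the sample data above.

```python
data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

ordered = sorted(data)           # A) [1, 2, 2, 3, 4, 4, 4, 6, 7, 9]
r = ordered[-1] - ordered[0]     # B) highest minus lowest
print(r)                         # 8

# Equivalent shortcut without sorting the whole list
print(max(data) - min(data))     # 8
```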
The Computational Formula:
$$S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
From the above formula:
S² = variance
Σ = sigma = the sum of (add up all the numbers)
X = the numbers from your data set
X² = the numbers from your data set squared
N = the total number of numbers you have in your data
   set
The easiest way to compute variance with the computational formula is as follows:
A) List each of the numbers in your data set vertically and get the sum of that column.
B) Figure out N (count how many numbers you have in your data set).
C) Square each number in your data set and get the sum of that column.

A): X      C): X²
4          4² = 16
6          6² = 36
3          3² = 9
7          7² = 49
9          9² = 81
4          4² = 16
2          2² = 4
1          1² = 1
4          4² = 16
2          2² = 4
Σ = 42     Σ = 232

B): N = 10

Now use the sums from parts A) and C), as
 well as the value for N which you
 found in part B), to fill in the formula:
$$S^2 = \frac{232 - \frac{42^2}{10}}{10} = \frac{232 - 176.4}{10} = \frac{55.6}{10} = 5.56$$
Do the math and S² = 5.56
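The computational-formula steps A) to C) can be sketched in Python as below. Like the slides, this divides by N; the standard library's `statistics.pvariance` uses the same denominator, while `statistics.variance` divides by N − 1 (the unbiased sample variance), which for these data gives 55.6/9 ≈ 6.18.

```python
from statistics import pvariance, variance

data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

sum_x = sum(data)                      # A) ΣX  = 42
n = len(data)                          # B) N   = 10
sum_x2 = sum(x * x for x in data)      # C) ΣX² = 232

# Computational formula with denominator N, as on the slides
s2 = (sum_x2 - sum_x ** 2 / n) / n
print(round(s2, 2))                    # 5.56

# Standard-library cross-checks
print(round(pvariance(data), 2))       # 5.56  (divides by N)
print(round(variance(data), 2))        # 6.18  (divides by N - 1)
```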
The Conceptual Formula:
$$S^2 = \frac{\sum (X - M)^2}{N}$$
From the above formula:
S² = variance
Σ = sigma = the sum of (add up all the numbers)
X = the numbers from your data set
M = the mean
N = the total number of numbers you have in your data
   set
The easiest way to compute variance with the conceptual formula is as follows:
A) List each of the numbers in your data set vertically and get the sum of that column.
B) Figure out N (count how many numbers you have in your data set).
C) Figure out M.
D) Subtract M from each number in your data set (notice how the sum is zero).
E) Square the numbers you got for part D) and get the sum of that column.

A): X     D): (X − M)           E): (X − M)²
4         (4 − 4.2) = −0.2      (−0.2)² = 0.04
6         (6 − 4.2) = 1.8       (1.8)² = 3.24
3         (3 − 4.2) = −1.2      (−1.2)² = 1.44
7         (7 − 4.2) = 2.8       (2.8)² = 7.84
9         (9 − 4.2) = 4.8       (4.8)² = 23.04
4         (4 − 4.2) = −0.2      (−0.2)² = 0.04
2         (2 − 4.2) = −2.2      (−2.2)² = 4.84
1         (1 − 4.2) = −3.2      (−3.2)² = 10.24
4         (4 − 4.2) = −0.2      (−0.2)² = 0.04
2         (2 − 4.2) = −2.2      (−2.2)² = 4.84
Σ = 42    Σ = 0                 Σ = 55.6

B): N = 10
C): M = 42/10 = 4.2
Now use the sum from part E), as well as
 the value for N which you found in
 part B), to fill in the formula:
$$S^2 = \frac{55.6}{10} = 5.56$$
Do the math and S² = 5.56
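The conceptual-formula steps B) to E) as a Python sketch, again dividing by N to match the slides. Because 4.2 is not exactly representable in binary floating point, the deviations sum to zero only up to a tiny rounding error, so the check below uses a tolerance.

```python
data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

n = len(data)                              # B) N = 10
m = sum(data) / n                          # C) M = 4.2
deviations = [x - m for x in data]         # D) X - M for each score
ss = sum(d * d for d in deviations)        # E) Σ(X - M)², about 55.6

# The deviations sum to zero, up to floating-point rounding error.
print(abs(sum(deviations)) < 1e-9)         # True

s2 = ss / n                                # conceptual formula, denominator N
print(round(s2, 2))                        # 5.56
```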
STANDARD DEVIATION:

Standard deviation is simply the square root
  of the variance. Both formulas give the same
  variance, so it does not matter whether you
  use the computational formula or the
  conceptual formula to compute it.
For our sample data set, the variance came
  out to be 5.56, regardless of the formula
  used. The standard deviation for our data
  set then becomes:
$$S = \sqrt{5.56} \approx 2.36$$
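A quick Python check of the square-root step. `statistics.pstdev` divides by N, matching the slides' 2.36, whereas `statistics.stdev` uses the N − 1 sample form and gives a slightly larger value; which one a textbook calls "the" sample standard deviation varies, so it is worth knowing both.

```python
import math
from statistics import pstdev, stdev

data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

s = math.sqrt(5.56)                  # square root of the variance from either formula
print(round(s, 2))                   # 2.36
print(round(pstdev(data), 2))        # 2.36  (population form, denominator N)
print(round(stdev(data), 2))         # 2.49  (sample form, denominator N - 1)
```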
1. INDEPENDENT SAMPLES T-TEST
• The independent samples t-test is used when two separate
  sets of independent and identically distributed samples are
  obtained, one from each of the two populations being
  compared.
• For example, suppose we are evaluating the effect of a medical
  treatment, and we enroll 100 subjects into our study, then
  randomize 50 subjects to the treatment group and 50 subjects
  to the control group. In this case, we have two independent
  samples and would use the unpaired form of the t-test. The
  randomization is not essential here—if we contacted 100
  people by phone and obtained each person's age and gender,
  and then used a two-sample t-test to see whether the mean
  ages differ by gender, this would also be an independent
  samples t-test, even though the data are observational.
INDEPENDENT DATA ANALYSIS

Calculations:
a. Equal sample sizes, equal variance
b. Unequal sample sizes, equal
   variance
c. Unequal sample sizes, unequal
   variance
A. EQUAL SAMPLE SIZES, EQUAL VARIANCE


This test is only used when both:
the two sample sizes (that is, the
   number, n, of participants of each
   group) are equal;
it can be assumed that the two
   distributions have the same variance.
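A minimal Python sketch of this equal-n, equal-variance case, assuming the standard textbook form: pooled SD s_p = √((s_X² + s_Y²)/2) built from the two sample variances, t = (X̄ − Ȳ)/(s_p·√(2/n)), with 2n − 2 degrees of freedom. The function name `t_equal_n` is our own. Applied to the caffeine RER data from the worked example below, it reproduces t ≈ 1.99 on 16 df.

```python
import math
from statistics import fmean, variance


def t_equal_n(x, y):
    """Two-sample t statistic for equal sample sizes and equal variances.

    Pooled SD: s_p = sqrt((s_x^2 + s_y^2) / 2), using sample variances
    (denominator n - 1). Degrees of freedom: 2n - 2.
    """
    if len(x) != len(y):
        raise ValueError("this form requires equal sample sizes")
    n = len(x)
    s_p = math.sqrt((variance(x) + variance(y)) / 2)
    t = (fmean(x) - fmean(y)) / (s_p * math.sqrt(2 / n))
    return t, 2 * n - 2


placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]
t, df = t_equal_n(placebo, caffeine)
print(round(t, 2), df)   # 1.99 16
```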
B. UNEQUAL SAMPLE SIZES, EQUAL VARIANCE

This test is used only when it can be
 assumed that the two distributions
 have the same variance.
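For unequal sample sizes with an assumed common variance, the standard pooled form weights each sample variance by its degrees of freedom: s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2) and t = (X̄₁ − X̄₂)/(s_p·√(1/n₁ + 1/n₂)), with n₁ + n₂ − 2 df. The sketch below assumes that form; `t_pooled` and the two small groups are hypothetical. With equal n it reduces to the equal-n formula above.

```python
import math
from statistics import fmean, variance


def t_pooled(x, y):
    """Two-sample t statistic assuming equal population variances.

    s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)
    t     = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2)),  df = n1 + n2 - 2
    """
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    t = (fmean(x) - fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical groups of different sizes
t, df = t_pooled([1, 2, 3], [4, 5, 6, 7])
print(round(t, 3), df)   # -3.873 5
```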
C. UNEQUAL SAMPLE SIZES, UNEQUAL
VARIANCE

This test, also known as Welch's t-test,
 is used only when the two population
 variances are assumed to be different
 (the two sample sizes may or may not
 be equal) and hence must be
 estimated separately.
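Welch's t-test estimates each variance separately: t = (X̄₁ − X̄₂)/√(s₁²/n₁ + s₂²/n₂), with degrees of freedom from the Welch–Satterthwaite approximation (usually not a whole number). A minimal sketch, with `welch_t` as our own name, applied to the caffeine data from the worked example; with equal group sizes the t value matches the pooled version, only the df change.

```python
import math
from statistics import fmean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2     # squared standard errors
    t = (fmean(x) - fmean(y)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]
t, df = welch_t(placebo, caffeine)
print(round(t, 2), round(df, 1))   # 1.99 14.6
```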
WORKED EXAMPLE
• A study of the effect of caffeine on muscle metabolism used
  eighteen male volunteers who each underwent arm exercise
  tests. Nine of the men were randomly selected to take a
  capsule containing pure caffeine one hour before the test.
  The other men received a placebo capsule. During each
  exercise the subject's respiratory exchange ratio (RER) was
  measured. (RER is the ratio of CO2 produced to O2
  consumed and is an indicator of whether energy is being
  obtained from carbohydrates or fats).
• Question: whether, on average, caffeine changes RER.
• Populations: "men who have not taken caffeine" and "men
  who have taken caffeine". (If caffeine has no effect on RER
  the two sets of data can be regarded as having come from
  the same population.)
Placebo     Caffeine
105         96
119         99
100         94
97          89
96          96
101         93
94          88
95          105
98          88
Mean = 100.56   Mean = 94.22
SD = 7.70       SD = 5.61

• The means show that, on average, caffeine appears to have altered RER from
  about 100.6% to 94.2%, a change of 6.4%. However, there is a great deal of
  variation between the data values in both samples and considerable overlap
  between them.
• Is the difference between the two means simply due to sampling variation, or
  does the data provide evidence that caffeine does, on average, reduce RER?
  The p-value answers this question.
• The t-test tests the null hypothesis that the mean of the caffeine treatment
  equals the mean of the placebo treatment versus the alternative hypothesis
  that the mean of the caffeine treatment is not equal to the mean of the
  placebo treatment.
• Computer output obtained for the RER data gives the sample means and the 95%
  confidence interval for the difference between the means.
COMPUTER OUTPUT




The p-value is 0.063; therefore, the difference between the two means is not
statistically significant at the 5% level of significance. There is
an estimated change of 6.4% (SE = 3.17%), but there is insufficient evidence
(p = 0.063) to suggest that caffeine changes the mean RER.
Alternative suggestion
   It could be argued, however, that the researcher might only be interested in whether
   'caffeine reduces RER'. That is, the researcher is looking for a specific direction for
   the difference between the two population means. This is an example of a one-tailed
   t-test as opposed to a two-tailed t-test outlined above.


SPSS only performs a 2-tailed test (the non-directional alternative hypothesis). To
   obtain the p-value for the directional alternative hypothesis (one-tailed test), the
   two-tailed p-value should be halved, provided the observed difference lies in the
   hypothesized direction. Hence, in this example, p = 0.063/2 ≈ 0.032.


Report: The mean RER in the caffeine group (94.2 ± 1.9) was significantly lower (t =
   1.99, 16 df, one-tailed t-test, p = 0.032) than the mean of the placebo group
   (100.6 ± 2.6).
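The ± figures in the report appear to be standard errors of the mean (SD/√n). Assuming that is what they are, a quick Python check reproduces them from the table's data.

```python
import math
from statistics import fmean, stdev

groups = {
    "placebo": [105, 119, 100, 97, 96, 101, 94, 95, 98],
    "caffeine": [96, 99, 94, 89, 96, 93, 88, 105, 88],
}
for name, values in groups.items():
    se = stdev(values) / math.sqrt(len(values))   # standard error of the mean
    print(f"{name}: {fmean(values):.1f} ± {se:.1f}")
# placebo: 100.6 ± 2.6
# caffeine: 94.2 ± 1.9
```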


Note: It is important to decide whether a one- or two-tailed test is being carried out
   before analysis takes place.
   Otherwise it might be tempting to see what the p-value is before making your
   decision!
A suitable null hypothesis in both cases is
H0: On average, caffeine has no effect on
 RER, with an alternative (or
 experimental) hypothesis,
H1: On average, caffeine changes RER (2-
 tail test), or H1: On average, caffeine
 reduces RER (1-tail case).
2. ONE SAMPLE T-TEST

Compare the mean score of a sample to
 a known value. Usually, the known
 value is a population mean.
Assumption:
    The dependent variable is
 normally distributed.
In testing the null hypothesis that the
  population mean is equal to a specified
  value μ, use the statistic:
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
$\bar{X}$: sample mean   S: sample standard deviation   n: sample size
The statistic has n − 1 degrees of freedom.
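A minimal sketch of the one-sample statistic above in Python. The function name `one_sample_t`, the five scores and the comparison value of 6 are all hypothetical; turning t into a p-value would additionally need the t distribution (for example from SciPy), which is not shown here.

```python
import math
from statistics import fmean, stdev


def one_sample_t(sample, mu):
    """t = (sample mean - mu) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    t = (fmean(sample) - mu) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


# Hypothetical scores compared with a hypothetical known mean of 6
t, df = one_sample_t([5, 7, 8, 6, 9], mu=6)
print(round(t, 3), df)   # 1.414 4
```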
3. PAIRED SAMPLES T-TEST
What it does:
compare the means of two variables
compute the difference between the two variables
 for each case, and test to see if the average
 difference is significantly different from zero
Assumption:
  Both variables should be normally distributed (strictly, the paired
  differences should be approximately normally distributed).
Hypothesis:
Null: There is no significant difference
 between the means of the two variables.
Alternate: There is a significant difference
 between the means of the two variables.
 What is the difference between a paired samples t-test
  and an independent samples t-test?
Both tests are used to find significant differences between
 groups, but the independent samples t-test assumes the
 groups are not related to each other, while the dependent
 samples t-test or paired samples t-test assumes the groups
 are related to each other.
A dependent samples t-test or paired samples t-test would
 be used to find differences within groups, while the
 independent samples t-test would be used to find
 differences between groups.
 Independent variable and dependent
   variable:
       The independent variable and the dependent
  variable are the same in both the dependent
  samples t-test and the independent samples t-test.
 The variable of measure, or variable of
  interest, is the dependent variable, and the
  grouping variable is the independent variable.
The most common use of the dependent samples t-test
 is in a pretreatment vs. posttreatment scenario where the
 researcher wants to test the effectiveness of a treatment.

1.   The participants are tested pretreatment, to establish
     some kind of a baseline measure
2.   The participants are then exposed to some kind of
     treatment
3.   The participants are then tested posttreatment, for the
     purposes of comparison with the pretreatment scores
For this equation, the differences between all pairs
must be calculated. The pairs are either one person's
pre-test and post-test scores or pairs of
persons matched into meaningful groups. The average ($\bar{d}$)
and standard deviation ($s_d$) of those differences are
used in the equation:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where n is the number of pairs. The degrees of freedom used
are n − 1.
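The paired statistic is just a one-sample t-test on the differences. A minimal sketch, where `paired_t` and the pre/post scores are hypothetical; here the differences are taken as after − before, so an improvement gives a positive t. Subtracting the other way (before − after) only flips the sign, which is the kind of negative t seen in the SPSS example below.

```python
import math
from statistics import fmean, stdev


def paired_t(before, after):
    """Paired t statistic computed on the differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("each case needs both measurements")
    d = [b - a for a, b in zip(before, after)]
    n = len(d)
    t = fmean(d) / (stdev(d) / math.sqrt(n))
    return t, n - 1


pre = [10, 12, 9, 11]     # hypothetical pre-test scores
post = [12, 13, 11, 14]   # hypothetical post-test scores
t, df = paired_t(pre, post)
print(round(t, 3), df)    # 4.899 3
```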
EXAMPLE: SPSS OUTPUT

We compared the mean test scores before
 (pre-test) and after (post-test) the subjects
 completed a test preparation course.
We want to see if our test preparation
 course improved people's scores on the
 test.
The post-test mean scores are higher.
There is a strong positive correlation.
People who did well on the pre-test also
did well on the post-test.
Remember, this test is based on the difference
between the two variables. Under "Paired
Differences" we see the descriptive statistics for the
difference between the two variables.
The T value = -2.171
We have 11 degrees of freedom
Our significance is .053
If the significance value is less than .05,
   there is a significant difference.
   If the significance value is greater than
   .05, there is no significant difference.
Conclusion: Because .053 > .05, there is no statistically
   significant difference between pre- and post-test scores.
   We have no evidence that our test preparation course helped.
PRESENTER: TRAN NHU HANH
WHAT IS ANOVA?

• ANOVA is an analysis of the variation present
  in an experiment. It is a test of the hypothesis
  that the variation in an experiment is no greater
  than that due to normal variation of individuals'
  characteristics and error in their measurement.
• ANOVA is a technique from statistical
  inference that allows us to compare the means
  of several populations
TYPES OF ANOVA


1. One-way ANOVA
2. Two-way ANOVA
ONE-WAY ANOVA DEFINITION


•   A One-way ANOVA is used when
    comparing two or more group means on
    a continuous dependent variable. In other
    words, one-way ANOVA techniques can
    be used to study the effect of k(>2) levels
    of a single factor.
•   The independent T-Test is a special case
    of the One-way ANOVA for situations
    where there are only two group means
MAJOR CONCEPTS:

1. CALCULATING SUMS OF SQUARES
• The One-way ANOVA separates the total variance
   in the continuous dependent variable into two
   components: variability between the groups and
   variability within the groups
• Variability between the groups is calculated by
   first obtaining the sums of squares between groups
   (SSb), or the sum of the squared differences between
   each group mean and the grand mean, with each
   squared difference weighted by the size of its group
• Variability within the groups is calculated by
   first obtaining the sums of squares within groups
   (SSw), or the sum of the squared differences
   between each individual score and that individual's
   group mean.
TYPES OF VARIABLES
          FOR ONE-WAY ANOVA

• The IV (Independent Variable) is
  categorical. The categorical IV can
  be two groups or it can have more
  than two groups.
• The DV (Dependent Variable) is
  continuous
• Data are collected on both variables
  for each person in the study.
EXAMPLES OF RESEARCH QUESTIONS FOR
              ONE-WAY ANOVA
1. Is there a significant difference in student attitudes
toward the course between students who pass or fail a
course?
• Student attitude is continuous
• Passing a course is categorical (pass/fail)
 Because the IV has only 2 groups, we can also use an
independent T-Test.
2. Does student satisfaction significantly differ by location
of institution (rural, urban, suburban)?
• Student satisfaction is continuous
• Institution location is categorical
The linear model, conceptually, is:

SSt = SSb + SSw
SSt: total sums of squares
SSb: sums of squares between groups
SSw: sums of squares within groups
ONE-WAY ANOVA AS A RATIO OF VARIANCES:



Formula for variance:
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
Numerator: a sum of squared values (or a
 sum of squares)
Denominator: degrees of freedom
• The ANOVA analyzes the ratio of the
  variance between groups to the variance
  within the groups
• In ANOVA, these variances, formerly
  known to us as s², are referred to as mean
  squares (MS). Mean squares are
  calculated by dividing each sum of
  squares by the degrees of freedom
  associated with it.
•   Thus, a mean square between is simply the
    variance between groups obtained by a sum of
    squares divided by degrees of freedom
•   Likewise, a mean square within is simply the
    variance within groups obtained by a sum of
    squares divided by degrees of freedom
FACTORS THAT AFFECT SIGNIFICANCE
• F-ratio: the variation due to an experimental treatment or
  effect divided by the variation due to experimental error. The
  null hypothesis is that this ratio equals 1.0, or the treatment effect
  is the same as the experimental error. This hypothesis is
  rejected if the F-ratio is large enough that the probability of
  obtaining a value at least that large, when the null hypothesis is
  true, is smaller than some pre-assigned criterion such as 0.05
  (one in twenty)
• MSb is then divided by MSw to obtain the F ratio
  for hypothesis testing
DISTRIBUTION OF F - RATIO


•   The F distribution is positively skewed
•   If the F statistic falls near 1.0, the data
    are consistent with the null hypothesis
•   If the F statistic is large, we expect the null
    to be false. Thus, significant F ratios will
    be in the tail of the F distribution
P VALUE

In statistical hypothesis testing, the p-
  value is the probability of obtaining
  a test statistic at least as extreme as
  the one that was actually observed,
  assuming that the null hypothesis is
  true. One often "rejects the null
  hypothesis" when the p-value is less
  than the significance level α, which is
  often 0.05 or 0.01.
t² = F
•   The larger the absolute value of t, the more likely
    we are to find significant results
•   The t-test is a special case of ANOVA when only two
    groups comprise the independent variable
•   We're familiar with the t distribution as symmetric
    and approximately normal (for large df), with
    positive and negative values. The F statistic,
    on the other hand, is positively skewed, and is
    comprised of squared values. Thus, for any
    two-group situation, t² = F
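The identity t² = F can be checked numerically: for the same two groups, the square of the pooled two-sample t equals the one-way ANOVA F. The two small groups below are hypothetical.

```python
import math
from statistics import fmean, variance

x = [1, 2, 3]          # hypothetical group 1
y = [4, 5, 6, 7]       # hypothetical group 2
n1, n2 = len(x), len(y)

# Pooled (equal-variance) two-sample t
sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
t = (fmean(x) - fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# One-way ANOVA F for the same two groups: dfb = 2 - 1, dfw = N - 2
grand = fmean(x + y)
ssb = n1 * (fmean(x) - grand) ** 2 + n2 * (fmean(y) - grand) ** 2
ssw = sum((v - fmean(x)) ** 2 for v in x) + sum((v - fmean(y)) ** 2 for v in y)
f = (ssb / (2 - 1)) / (ssw / (n1 + n2 - 2))

print(round(t ** 2, 6), round(f, 6))   # 15.0 15.0
```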
CALCULATIONS

• dfb = k − 1 (k: number of samples/groups/levels)
• dfw = N − k (N: total number of individuals in all groups)
• dfT = N − 1
• MSb = SSb / dfb
• MSw = SSw / dfw
• F = MSb / MSw
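The calculations above, together with SSt = SSb + SSw, fit in one small function. This is a sketch for illustration: the name `one_way_anova` and the three groups are hypothetical, and the p-value step (comparing F with the F distribution) is left to statistical software.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return SSb, SSw, SSt, dfb, dfw and F for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand = fmean(all_values)
    k, n_total = len(groups), len(all_values)
    ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ssw = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    sst = sum((v - grand) ** 2 for v in all_values)
    dfb, dfw = k - 1, n_total - k
    f = (ssb / dfb) / (ssw / dfw)      # F = MSb / MSw
    return ssb, ssw, sst, dfb, dfw, f


# Three hypothetical groups (k = 3, N = 9)
ssb, ssw, sst, dfb, dfw, f = one_way_anova([[2, 3, 4], [5, 6, 7], [8, 9, 10]])
print(ssb, ssw, sst, dfb, dfw, f)   # 54.0 6.0 60.0 2 6 27.0
```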
STEPS IN ONE-WAY ANOVA


STEP 1: STATE HYPOTHESES
To determine whether different levels of a factor affect the measured
observations differently, the following hypotheses are tested:
• H0: There is no significant difference among groups in variable X.
• H1: There is a significant difference between at least two of the
  groups in variable X. In other words, at least one mean will
  significantly differ.
STEP 2: SET THE CRITERION FOR REJECTING H0
STEP 3: COMPUTE TEST STATISTIC
STEP 4: COMPARE TEST STATISTIC TO CRITERION
STEP 5: MAKE DECISION


• Fail to reject the null hypothesis and
  conclude that there is no significant
  difference among the groups: F(dfb, dfw) =
  [insert F statistic], p > [insert α]
• Reject the null hypothesis and
  conclude that there is a significant
  difference among the groups: F(dfb, dfw)
  = [insert F statistic], p < [insert α]
TWO-WAY ANOVA



Difference between one-way and
 two-way ANOVA
ANOVA Test
ONE-WAY ANOVA
• One-Way ANOVA has one independent
  variable (1 factor) with > 2 conditions
  – conditions = levels = treatments
  – e.g., for a brand of cola factor, the
  levels are:
   Coke, Pepsi, RC Cola
• Independent variables = factors
TWO-WAY ANOVA
• Two-Way ANOVA has 2 independent variables
  (factors)
  – each can have multiple conditions
Example
  • Two Independent Variables (IV’s)
  – IV1: Brand; and IV2: Calories
  – Three levels of Brand:
       • Coke, Pepsi, RC Cola
  - Two levels of Calories:
       • Regular, Diet
WHEN TO USE
• One-way ANOVA: you have more than two levels
  (conditions) of a single IV
– EXAMPLE: studying effectiveness of three types of
  pain reliever
       aspirin vs. Tylenol vs. ibuprofen
• Two-way ANOVA: you have more than one IV
  (factor)
  – EXAMPLE: studying pain relief based on pain
  reliever and type of pain
  • Factor A: Pain reliever (aspirin vs. Tylenol)
  • Factor B: Type of pain (headache vs. back pain)
NOTATION

Factor A and Factor B: the two factors.
a: the number of categories of Factor A.
b: the number of categories of Factor B.
The total number of groups is ab.
The total number of observations is N.
The response/dependent variable value for each observation is $Y_{ijk}$,
   where i: the subject's category for Factor A, and j: the subject's
   category for Factor B. Then i and j together identify a group, and k
   denotes which individual we're talking about within this
   particular group.
The number of observations in each group is n, and N = abn.
How the number of hours of TV people
 watch per week depends on two variables:
 gender and age. Each person is classified
 according to gender (male, female) and age
 (18–24, 25–54,55+).
There are six groups—one for each
 combination of gender and age. We
 randomly sample five people from each
 group, and each person reports the time, in
 hours, that he or she watches TV per week.
 The data are shown in the table below.
         Age 18–24   Age 25–54   Age 55+
Male     20          23          33
         27          21          33
         20          23          39
         22          28          33
         28          28          37
Female   25          32          44
         19          26          43
         27          33          52
         32          33          43
         31          24          54
TWO-WAY ANOVA TABLE

1. Sums of squares.
2. Degrees of freedom.
3. Mean squares.
There are three main questions that we might ask in two-way ANOVA:
• Does the response variable depend on Factor A?
• Does the response variable depend on Factor B?
• Does the response variable depend on Factor A differently for different
  values of Factor B, and vice versa?
For the TV data, the first two questions ask whether TV viewing time depends
on age and on gender. The third question asks whether TV viewing time depends
on gender differently for people of different ages, or whether TV viewing time
depends on age differently for men than for women. (For example, perhaps it's
true that women 55+ watch more TV than men 55+, but women 18–24 watch less TV
than men 18–24.)
1. Sums of Squares
Two-way ANOVA involves five different
  sums of squares:
  • The total sum of squares, $SS_{Tot}$, measures the total
  variability in the response variable values. Its formula is
$$SS_{Tot} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left(Y_{ijk} - \bar{Y}\right)^2$$
  • The Factor A sum of squares, $SS_A$, measures the
  variability that can be explained by differences in Factor
  A. Its formula is
$$SS_A = bn \sum_{i=1}^{a} \left(\bar{Y}_{i} - \bar{Y}\right)^2$$
$\bar{Y}_{ij}$ represents the sample mean of the group in category i of
   Factor A and category j of Factor B (always an average of n
   observations).
$\bar{Y}_{i}$ represents the sample mean of all the data in category i of
   Factor A combined (always an average of bn observations).
$\bar{Y}_{j}$ represents the sample mean of all the data in category j of
   Factor B combined (always an average of an observations).
$\bar{Y}$ represents the overall sample mean of all the data from all
   groups combined (always an average of all abn = N
   observations).
• The Factor B sum of squares, $SS_B$, measures the
variability that can be explained by differences in Factor
B. Its formula is
$$SS_B = an \sum_{j=1}^{b} \left(\bar{Y}_{j} - \bar{Y}\right)^2$$
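• The interaction sum of squares, $SS_{AB}$, measures the
variability that can be explained by interaction between the
effects of Factors A and B. (We'll talk more about what this
means later.) Its formula is
$$SS_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \left(\bar{Y}_{ij} - \bar{Y}_{i} - \bar{Y}_{j} + \bar{Y}\right)^2$$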
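 • The error sum of squares, $SS_E$, measures the variability
 of the observations around their group sample means. Its
 formula is
$$SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left(Y_{ijk} - \bar{Y}_{ij}\right)^2$$
• If we call the sample standard deviation within
each group $s_{ij}$, then another formula for $SS_E$ is
$$SS_E = (n - 1) \sum_{i=1}^{a} \sum_{j=1}^{b} s_{ij}^2$$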
Degrees of freedom
Mean squares.
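A sketch of the full two-way table for the TV data, assuming the standard balanced-design degrees of freedom df_A = a − 1, df_B = b − 1, df_AB = (a − 1)(b − 1), df_E = ab(n − 1) and df_Tot = N − 1, with each mean square MS = SS/df and each F = MS/MS_E. Treating gender as Factor A (a = 2) and age as Factor B (b = 3) is our choice, and `two_way_anova` is a hypothetical helper, not an SPSS routine. For these data it gives SS_A ≈ 353.6, SS_B = 1520, SS_AB ≈ 109.9, SS_E = 429.2 and SS_Tot = 2412.7, which equals the sum of the other four.

```python
from statistics import fmean

# TV hours per week from the table above: tv_hours[gender][age group] -> n = 5 responses
tv_hours = {
    "Male": {"18-24": [20, 27, 20, 22, 28], "25-54": [23, 21, 23, 28, 28], "55+": [33, 33, 39, 33, 37]},
    "Female": {"18-24": [25, 19, 27, 32, 31], "25-54": [32, 26, 33, 33, 24], "55+": [44, 43, 52, 43, 54]},
}


def two_way_anova(data):
    """Balanced two-way ANOVA: outer keys = Factor A levels, inner keys = Factor B levels."""
    a_levels = list(data)
    b_levels = list(next(iter(data.values())))
    a, b = len(a_levels), len(b_levels)
    n = len(data[a_levels[0]][b_levels[0]])  # assumes every group has the same n

    all_y = [y for i in a_levels for j in b_levels for y in data[i][j]]
    grand = fmean(all_y)
    cell = {(i, j): fmean(data[i][j]) for i in a_levels for j in b_levels}
    mean_a = {i: fmean([y for j in b_levels for y in data[i][j]]) for i in a_levels}
    mean_b = {j: fmean([y for i in a_levels for y in data[i][j]]) for j in b_levels}

    ss = {
        "A": b * n * sum((mean_a[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((mean_b[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum((cell[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
                      for i in a_levels for j in b_levels),
        "E": sum((y - cell[i, j]) ** 2
                 for i in a_levels for j in b_levels for y in data[i][j]),
        "Tot": sum((y - grand) ** 2 for y in all_y),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "E": a * b * (n - 1), "Tot": a * b * n - 1}
    ms = {k: ss[k] / df[k] for k in ("A", "B", "AB", "E")}
    f = {k: ms[k] / ms["E"] for k in ("A", "B", "AB")}
    return ss, df, ms, f


ss, df, ms, f = two_way_anova(tv_hours)
for source in ("A", "B", "AB", "E", "Tot"):
    print(f"{source:>3}: SS = {ss[source]:8.2f}, df = {df[source]}")
print(f"F for interaction = {f['AB']:.2f}")
```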
ANOVA TABLE
Using statistical software:
TWO-WAY ANOVA HYPOTHESIS TESTS
Main effects:
• Does the response variable depend on Factor A?
• Does the response variable depend on Factor B?
Interaction:
• Does the response variable depend on Factor A differently for
  different values of Factor B, and vice versa?
Interaction :
We say that there is interaction if Y
depends on Factor A differently for
different values of Factor B, and vice
versa.
Similarly, we say that there is NO
interaction if Y depends on Factor A
in the same way for all values of Factor
B, and vice versa.
HYPOTHESES


In the test for interaction, the null hypothesis
  (Ho) is that there is no interaction, while the
  alternative hypothesis (Ha) is that there is
  interaction.
There is no interaction on the left. For each age group, women
 average watching five more hours of TV per week than men. For
 each gender, the middle age group averages watching six
 more hours of TV per week than the youngest age group, and
 the oldest age group averages watching nine more hours of TV
 per week than the middle age group.




• There is interaction on the right. For each age group, women
average watching more TV than men, but how much more varies
for the different age groups. Also, for each gender, older people
average watching more TV, but how much more varies by gender.
ASSUMPTIONS

The assumptions for the two-way
 ANOVA F test for interaction are
 exactly the same as those of the
 one-way ANOVA F test, with one
 additional requirement: the number
 of observations should be the
 same for all groups.
TEST STATISTIC
P-VALUE
DECISION
• If we believe there is interaction, then we don't bother
to ask whether the response depends on Factor A or
Factor B separately. The fact that there is interaction
means that the response depends on Factor A
differently for different values of Factor B, and vice
versa. So we stop here and do not perform the tests
for main effects (which we'll talk about in the next
subsection).
• If we believe it’s reasonable that there is no interaction,
then that means we can look at the effects of Factor A
and Factor B separately, so we proceed to the tests
for main effects.


G7-quantitative

  • 1. INSTRUCTOR: DR. TUNG NGUYE GROUP 7 MEMBERS: Ly Ngoc Tra An, Ngo Huong Giang, Tran Nhu Hanh, Tran Thi My Hanh, Nguyen Thi Hong Tham, Nguyen Thi Thao Tien
  • 2. OUTLINE: Data analysis
    - Central tendency: mean, median, mode
    - Spread of distribution: range, variance, standard deviation
    - Experimental: paired t-test, ANOVA
  • 3. CENTRAL TENDENCY The term central tendency refers to the "middle" value or perhaps a typical value of the data, and is measured using the mean, median, or mode. Each of these measures is calculated differently, and the one that is best to use depends upon the situation. In statistics, the term central tendency relates to the way in which quantitative data tend to cluster around some value. In the simplest cases, the measure of central tendency is an average of a set of measurements, the word average being variously construed as mean, median, or other measure of location, depending on the context. Both "central tendency" and "measure of central tendency" apply to either statistical populations or to samples from a population.
  • 4. MEASURES OF CENTRAL TENDENCY Arithmetic mean (or simply, mean): the sum of all measurements divided by the number of observations in the data set. The mean is the most commonly used measure of central tendency. When we talk about an "average", we usually are referring to the mean. The mean is simply the sum of the values divided by the total number of items in the set. The result is referred to as the arithmetic mean. Sometimes it is useful to give more weighting to certain data points, in which case the result is called the weighted arithmetic mean. The mean is valid only for interval data or ratio data. Since it uses the values of all of the data points in the population or sample, the mean is influenced by outliers that may be at the extremes of the data set.
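The following Python sketch shows the two means described above. The function names and the weights in the usage lines are hypothetical and are used only for illustration.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Each value counts in proportion to its weight."""
    if len(values) != len(weights) or sum(weights) == 0:
        raise ValueError("need one weight per value and a non-zero total weight")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(arithmetic_mean([1, 2, 3, 4, 5]))   # 3.0
print(arithmetic_mean([1, 2, 3, 4, 10]))  # 4.0, pulled up by the outlier 10
print(weighted_mean([80, 90], [1, 3]))    # 87.5
```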
  • 5. MEDIAN: THE MIDDLE VALUE THAT SEPARATES THE HIGHER HALF FROM THE LOWER HALF OF THE DATA SET The median is determined by sorting the data set from lowest to highest values and taking the data point in the middle of the sequence. There is an equal number of points above and below the median. For example, in the data set {1,2,3,4,5} the median is 3; there are two data points greater than this value and two data points less than this value. In this case, the median is equal to the mean. But consider the data set {1,2,3,4,10}. In this data set, the median is still 3, but the mean is 4. If there is an even number of data points in the set, there is no single point in the middle, and the median is calculated by taking the mean of the two middle points. The median can be determined for ordinal data as well as interval and ratio data. Unlike the mean, the median is not influenced by outliers at the extremes of the data set. For this reason, the median is often used when a few extreme values could greatly influence the mean and distort what might be considered typical. This is often the case with home prices and with income data for a group of people, which are often very skewed. For such data, the median is often reported instead of the mean. For example, in a group of people, if the salary of one person is 10 times the mean, the mean salary of the group will be pulled upward by that unusually large salary. In this case, the median may better represent the typical salary level of the group.
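The median rule above, including the even-count case, can be sketched as follows. The function name is illustrative. Python's standard library also provides `statistics.median`, which follows the same convention.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))   # 3
print(median([1, 2, 3, 4, 10]))  # 3, unaffected by the outlier
print(median([4, 1, 3, 2]))      # 2.5
```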
  • 6. MODE (STATISTICS): THE MOST FREQUENT VALUE IN THE DATA SET The mode is the most frequently occurring value in the data set. For example, in the data set {1,2,3,4,4}, the mode is equal to 4. A data set can have more than a single mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are two modes: 1 and 3. The mode can be very useful for dealing with categorical data. For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the most popular sandwich. The mode also can be used with ordinal, interval, and ratio data. However, in interval and ratio scales, the data may be spread thinly with no data points having the same value. In such cases, the mode may not exist or may not be very meaningful.
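Because a data set can be multimodal, the sketch below returns every most-frequent value. When no value repeats, it returns an empty list. That choice is an assumption of this sketch, meant to reflect the point above that the mode "may not exist". Python's `statistics.multimode` is similar, but it returns all values in that case. The sandwich names are hypothetical.

```python
from collections import Counter


def modes(values):
    """All values that occur most often; empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value is unique: treat the mode as not existing
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 2, 3, 4, 4]))          # [4]
print(modes([1, 1, 2, 3, 3]))          # [1, 3]  (multimodal)
print(modes(["ham", "tuna", "ham"]))   # ['ham']  (categorical data)
print(modes([1.2, 3.4, 5.6]))          # []
```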
  • 7. WHEN TO USE MEAN, MEDIAN, AND MODE
    Measurement scale | Best measure of the "middle"
    Nominal (categorical) | Mode
    Ordinal | Median
    Interval | Symmetrical data: Mean; Skewed data: Median
    Ratio | Symmetrical data: Mean; Skewed data: Median
  • 8. RANGE, VARIANCE, AND STANDARD DEVIATION RANGE: The range indicates the distance between the two most extreme scores in a distribution: Range = highest score - lowest score
  • 9. VARIANCE AND STANDARD DEVIATION • The variance and standard deviation are two measures of variability that indicate how much the scores are spread out around the mean. • We use the mean as our reference point since it is at the center of the distribution.
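  • 10. Variance = the average squared distance of the numbers from the mean, i.e. how spread out (far away) the numbers are from the mean. Standard deviation = the square root of the variance; loosely defined as the average amount a number differs from the mean.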
  • 11. We will use the following sample data set to explain the range, variance, and standard deviation: 4, 6, 3, 7, 9, 4, 2, 1, 4, 2
  • 12. SAMPLE DATA : 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 Range: R = maximum score - minimum score In order to figure out the range, A) arrange your data set in order from lowest to highest and B) subtract the lowest number from the highest number. A) When arranged in order, 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 becomes: 1, 2, 2, 3, 4, 4, 4, 6, 7, 9 B) The lowest number is 1 and the highest number is 9. Therefore, R = 9-1 = 8
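Steps A) and B) above translate directly into code. The following is a minimal sketch, and the function name is hypothetical.

```python
def value_range(values):
    """Range = highest score - lowest score."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    ordered = sorted(values)          # step A: arrange from lowest to highest
    return ordered[-1] - ordered[0]   # step B: subtract lowest from highest


data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]
print(value_range(data))  # 8
```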
  • 13. SAMPLE DATA: 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 The Computational Formula: $S^2 = \frac{\sum X^2 - (\sum X)^2 / N}{N}$ In this formula: S² = variance; Σ = sigma = the sum of (add up all the numbers); X = the numbers from your data set; X² = the numbers from your data set squared; N = the total number of numbers you have in your data set.
  • 14. SAMPLE DATA: 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 The easiest way to compute variance with the computational formula is as follows:
    A) List each of the numbers in your data set vertically and get the sum of that column.
    B) Figure out N (count how many numbers you have in your data set).
    C) Square each number in your data set and get the sum of that column.
    X | X²
    4 | 16
    6 | 36
    3 | 9
    7 | 49
    9 | 81
    4 | 16
    2 | 4
    1 | 1
    4 | 16
    2 | 4
    Σ = 42 | Σ = 232
    A): ΣX = 42   B): N = 10   C): ΣX² = 232
  • 15. SAMPLE DATA: 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 Now use the sums from parts A) and C), together with the value of N from part B), to fill in the formula: $S^2 = \frac{232 - 42^2/10}{10} = \frac{232 - 176.4}{10} = \frac{55.6}{10} = 5.56$
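The computational formula can be sketched as a short function. It divides by N, as the slides do, and the comments map each line to parts A) to C). The function name is illustrative.

```python
def variance_computational(values):
    """S^2 = (sum(X^2) - (sum(X))^2 / N) / N, dividing by N as in the slides."""
    n = len(values)                       # part B)
    sum_x = sum(values)                   # part A)
    sum_x2 = sum(x * x for x in values)   # part C)
    return (sum_x2 - sum_x ** 2 / n) / n


data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]
print(round(variance_computational(data), 2))  # 5.56
```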
  • 16. SAMPLE DATA: 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 The Conceptual Formula: $S^2 = \frac{\sum (X - M)^2}{N}$ In this formula: S² = variance; Σ = sigma = the sum of (add up all the numbers); X = the numbers from your data set; M = the mean; N = the total number of numbers you have in your data set.
  • 17. SAMPLE DATA: 4, 6, 3, 7, 9, 4, 2, 1, 4, 2 The easiest way to compute variance with the conceptual formula is as follows:
    A) List each of the numbers in your data set vertically and get the sum of that column.
    B) Figure out N (count how many numbers you have in your data set).
    C) Figure out M.
    D) Subtract M from each number in your data set (notice how the sum is zero).
    E) Square the numbers you got for part D) and get the sum of that column.
    X | X - M | (X - M)²
    4 | 4 - 4.2 = -0.2 | 0.04
    6 | 6 - 4.2 = 1.8 | 3.24
    3 | 3 - 4.2 = -1.2 | 1.44
    7 | 7 - 4.2 = 2.8 | 7.84
    9 | 9 - 4.2 = 4.8 | 23.04
    4 | 4 - 4.2 = -0.2 | 0.04
    2 | 2 - 4.2 = -2.2 | 4.84
    1 | 1 - 4.2 = -3.2 | 10.24
    4 | 4 - 4.2 = -0.2 | 0.04
    2 | 2 - 4.2 = -2.2 | 4.84
    Σ = 42 | Σ = 0 | Σ = 55.6
    B): N = 10   C): M = 42/10 = 4.2
  • 18. Now use the sum from part E), together with the value of N from part B), to fill in the formula: $S^2 = \frac{55.6}{10} = 5.56$
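The conceptual formula can be sketched in the same way. The comments map each line to parts B) to E). As with the computational formula, this sketch divides by N to match the slides. The function name is illustrative.

```python
def variance_conceptual(values):
    """S^2 = sum((X - M)^2) / N, dividing by N as in the slides."""
    n = len(values)                            # part B)
    m = sum(values) / n                        # part C)
    deviations = [x - m for x in values]       # part D): these sum to (about) zero
    return sum(d * d for d in deviations) / n  # part E), then divide by N


data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]
print(round(variance_conceptual(data), 2))  # 5.56
```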
  • 19. STANDARD DEVIATION: Standard deviation is simply the square root of the variance. Therefore, it does not matter if you use the computational formula or the conceptual formula to compute variance. For our sample data set, our variance came out to be 5.56, regardless of the formula used. The standard deviation for our data set then becomes: $S = \sqrt{5.56} \approx 2.36$
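Both formulas above divide by N, which gives the population variance. Many texts and software packages instead report the sample variance, which divides by N − 1. For this data set that is 55.6 / 9 ≈ 6.18, which gives a standard deviation of about 2.49. Python's standard `statistics` module provides both versions, as sketched below.

```python
import math
import statistics

data = [4, 6, 3, 7, 9, 4, 2, 1, 4, 2]

# Divide by N, as in the slides: population variance and standard deviation.
print(round(statistics.pvariance(data), 2))  # 5.56
print(round(statistics.pstdev(data), 2))     # 2.36
print(round(math.sqrt(5.56), 2))             # 2.36, the square root of the variance

# Divide by N - 1: sample variance and standard deviation.
print(round(statistics.variance(data), 2))   # 6.18
print(round(statistics.stdev(data), 2))      # 2.49
```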
  • 20. INDEPENDENT SAMPLES • The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. • E.g: suppose we are evaluating the effect of a medical treatment, and we enroll 100 subjects into our study, then randomize 50 subjects to the treatment group and 50 subjects to the control group. In this case, we have two independent samples and would use the unpaired form of the t-test. The randomization is not essential here—if we contacted 100 people by phone and obtained each person's age and gender, and then used a two-sample t-test to see whether the mean ages differ by gender, this would also be an independent samples t-test, even though the data are observational.
  • 21. INDEPENDENT DATA ANALYSIS Calculations: a. Equal sample sizes, equal variance b. Unequal sample sizes, equal variance c. Unequal sample sizes, unequal variance
  • 22. A. EQUAL SAMPLE SIZES, EQUAL VARIANCE This test is used only when both of the following hold: the two sample sizes (that is, the number n of participants in each group) are equal, and it can be assumed that the two distributions have the same variance.
  • 23. B. UNEQUAL SAMPLE SIZES, EQUAL VARIANCE This test is used only when it can be assumed that the two distributions have the same variance.
  • 24. C. UNEQUAL SAMPLE SIZES, UNEQUAL VARIANCE This test, also known as Welch's t-test, is used only when the two population variances are assumed to be different (the two sample sizes may or may not be equal) and hence must be estimated separately.
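The sketch below shows the standard formulas for the three cases above. Cases a and b share the pooled-variance statistic, which simplifies to an equal-weight average of the two variances when n1 = n2. Case c is Welch's statistic with the Welch–Satterthwaite degrees of freedom. The function names and the two small data sets are hypothetical.

```python
import math


def _mean_var(xs):
    """Sample mean and sample variance (divisor n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def pooled_t(x, y):
    """Cases a and b: assumes equal population variances. Returns (t, df)."""
    n1, n2 = len(x), len(y)
    m1, v1 = _mean_var(x)
    m2, v2 = _mean_var(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df  # (v1 + v2) / 2 when n1 == n2
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


def welch_t(x, y):
    """Case c (Welch): variances estimated separately. Returns (t, approximate df)."""
    n1, n2 = len(x), len(y)
    m1, v1 = _mean_var(x)
    m2, v2 = _mean_var(y)
    a, b = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))  # Welch-Satterthwaite
    return t, df


x = [1, 2, 3]       # hypothetical group 1
y = [2, 4, 6, 8]    # hypothetical group 2
print(pooled_t(x, y))  # t about -1.87, df = 5
print(welch_t(x, y))   # t about -2.12, df about 4.08
```

With unequal variances and unequal sizes, the two statistics disagree. This is why case c estimates the variances separately.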
  • 25. WORKED EXAMPLE • A study of the effect of caffeine on muscle metabolism used eighteen male volunteers who each underwent arm exercise tests. Nine of the men were randomly selected to take a capsule containing pure caffeine one hour before the test. The other men received a placebo capsule. During each exercise the subject's respiratory exchange ratio (RER) was measured. (RER is the ratio of CO2 produced to O2 consumed and is an indicator of whether energy is being obtained from carbohydrates or fats.) • Question: whether, on average, caffeine changes RER. • Populations: "men who have not taken caffeine" and "men who have taken caffeine". (If caffeine has no effect on RER, the two sets of data can be regarded as having come from the same population.)
  • 26. • The means show that, on average, caffeine appears to have altered RER from about 100.6% to 94.2%, a change of 6.4%. • However, there is a great deal of variation between the data values in both samples and considerable overlap between them.
    Placebo | Caffeine
    105 | 96
    119 | 99
    100 | 94
    97 | 89
    96 | 96
    101 | 93
    94 | 88
    95 | 105
    98 | 88
    Mean = 100.56 | Mean = 94.22
    SD = 7.70 | SD = 5.61
    • Is the difference between the two means simply due to sampling variation, or do the data provide evidence that caffeine does, on average, reduce RER? The p-value answers this question. • The t-test tests the null hypothesis that the mean of the caffeine treatment equals the mean of the placebo treatment versus the alternative hypothesis that the mean of the caffeine treatment is not equal to the mean of the placebo treatment. • Computer output obtained for the RER data gives the sample means and the 95% confidence interval for the difference between the means.
  • 27. COMPUTER OUTPUT The p-value is 0.063; therefore, the difference between the two means is not statistically significant at the 5% level of significance. There is an estimated change of 6.4% (SE = 3.17%). However, there is insufficient evidence (p = 0.063) to suggest that caffeine changes the mean RER.
  • 28. Alternative suggestion: It could be argued, however, that the researcher might only be interested in whether caffeine reduces RER. That is, the researcher is looking for a specific direction for the difference between the two population means. This is an example of a one-tailed t-test, as opposed to the two-tailed t-test outlined above. SPSS only performs a two-tailed test (the non-directional alternative hypothesis); to obtain the p-value for the directional alternative hypothesis (one-tailed test), the p-value should be halved, provided the observed difference lies in the hypothesized direction. Hence, in this example, p = 0.032. Report: The mean RER in the caffeine group (94.2 ± 1.9, mean ± SE) was significantly lower (t = 1.99, 16 df, one-tailed t-test, p = 0.032) than the mean of the placebo group (100.6 ± 2.6). Note: It is important to decide whether a one- or two-tailed test is being carried out before the analysis takes place. Otherwise, it might be tempting to look at the p-value before making your decision!
  • 29. A suitable null hypothesis in both cases is H0: On average, caffeine has no effect on RER, with an alternative (or experimental) hypothesis H1: On average, caffeine changes RER (two-tailed test), or H1: On average, caffeine reduces RER (one-tailed test).
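As a check on the computer output, the sketch below recomputes the test from the RER table. It uses the pooled (equal-variance) two-sample t-test. That choice is an assumption, made because the pooled test reproduces the reported t = 1.99 with 16 df and SE = 3.17. The helpers `t_pdf` and `two_tailed_p` are hypothetical. The p-value comes from integrating the Student t density with Simpson's rule, not from a statistics package. In practice, SciPy's `scipy.stats.ttest_ind` with `equal_var=True` is the usual tool.

```python
import math

placebo = [105, 119, 100, 97, 96, 101, 94, 95, 98]
caffeine = [96, 99, 94, 89, 96, 93, 88, 105, 88]


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule, even steps)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


n1, n2 = len(placebo), len(caffeine)
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * sample_var(placebo) + (n2 - 1) * sample_var(caffeine)) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
diff = mean(placebo) - mean(caffeine)
t = diff / se
p_two = two_tailed_p(t, df)
p_one = p_two / 2  # halving is valid here: caffeine's mean is lower, as H1 predicts

print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t:.2f}, df = {df}")
# difference = 6.33, SE = 3.17, t = 1.99, df = 16
print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```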
  • 30. 2. ONE SAMPLE T-TEST Compare the mean score of a sample to a known value. Usually, the known value is a population mean. Assumption: The dependent variable is normally distributed.
  • 31. In testing the null hypothesis that the population mean is equal to a specified value μ, use the statistic $t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$, with n − 1 degrees of freedom, where x̄: sample mean; S: sample standard deviation; n: sample size.
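The one-sample statistic can be sketched as follows. The function name, the sample, and the reference value μ = 5 are hypothetical.

```python
import math


def one_sample_t(sample, mu):
    """t = (x_bar - mu) / (S / sqrt(n)); returns (t, df)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))  # sample SD
    return (x_bar - mu) / (s / math.sqrt(n)), n - 1


t, df = one_sample_t([5, 7, 6, 8], mu=5)
print(round(t, 3), df)  # 2.324 3
```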
  • 32. 3. PAIRED SAMPLES T-TEST What it does: compares the means of two variables. It computes the difference between the two variables for each case and tests whether the average difference is significantly different from zero. Assumption: Both variables should be normally distributed.
  • 33. Hypothesis: Null: There is no significant difference between the means of the two variables. Alternate: There is a significant difference between the means of the two variables.
  • 34. Difference between a paired samples t-test and an independent samples t-test? Both tests are used to find significant differences between groups, but the independent samples t-test assumes the groups are not related to each other, while the dependent samples t-test or paired samples t-test assumes the groups are related to each other. A dependent samples t-test or paired samples t-test would be used to find differences within groups, while the independent samples t-test would be used to find differences between groups.
  • 35. Independent variable and dependent variable: The independent and dependent variables play the same roles in both the dependent samples t-test and the independent samples t-test. The measured variable of interest is the dependent variable, and the grouping variable is the independent variable.
  • 36. The most common use of the dependent samples t-test is in a pretreatment vs. posttreatment scenario where the researcher wants to test the effectiveness of a treatment. 1. The participants are tested pretreatment, to establish some kind of a baseline measure 2. The participants are then exposed to some kind of treatment 3. The participants are then tested posttreatment, for the purposes of comparison with the pretreatment scores
  • 37. For this test, the difference within every pair must be calculated. The pairs are either one person's pre-test and post-test scores or the scores of two persons matched into meaningful pairs. The mean $\bar{d}$ and standard deviation $s_d$ of those differences are used in the equation $t = \frac{\bar{d}}{s_d/\sqrt{n}}$, where n is the number of pairs. The degrees of freedom are n − 1.
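A minimal sketch of the paired-samples statistic, using hypothetical pre/post scores for five people and a hypothetical helper `paired_t`. It takes each difference as first minus second; with that convention a higher post-test mean gives a negative t, which is consistent with the negative t in the SPSS output that follows.

```python
import math
import statistics


def paired_t(first, second):
    """Paired-samples t on d = first - second; returns (t, df, mean_d, sd_d)."""
    if len(first) != len(second):
        raise ValueError("each case needs both measurements")
    d = [x - y for x, y in zip(first, second)]   # one difference per pair
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)                   # SD of the differences
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1, mean_d, sd_d


# Hypothetical pre-test / post-test scores
pre = [60, 65, 70, 72, 68]
post = [64, 66, 75, 71, 73]
t, df, mean_d, sd_d = paired_t(pre, post)
print(f"mean difference = {mean_d:.1f}, t = {t:.3f}, df = {df}")
# mean difference = -2.8, t = -2.333, df = 4
```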
  • 38. EXAMPLE: SPSS OUTPUT We compared the mean test scores before (pre-test) and after (post-test) the subjects completed a test preparation course. We want to see whether our test preparation course improved people's scores on the test.
  • 39. The post-test mean scores are higher.
  • 40. There is a strong positive correlation. People who did well on the pre-test also did well on the post-test.
  • 41. Remember, this test is based on the difference between the two variables. Under "Paired Differences" we see the descriptive statistics for the difference between the two variables.
  • 42. The t value is -2.171. We have 11 degrees of freedom (n − 1, so there are 12 pairs). Our significance (two-tailed p-value) is .053.
  • 44. If the significance value is less than .05, there is a significant difference. If the significance value is greater than .05, there is no significant difference. Conclusion: Because .053 > .05, there is no significant difference between pre- and post-test scores, so we cannot conclude that our test preparation course helped.
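SPSS reports the significance directly, but it can be reproduced from t and the degrees of freedom. The sketch below is an illustration, not SPSS's own method: it integrates the Student t density numerically with Simpson's rule (the helper names `t_pdf` and `two_tailed_p` are hypothetical) and then applies the .05 decision rule to the values in the output above.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t|), using Simpson's rule on [0, |t|].

    steps must be even for Simpson's rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3      # P(0 <= T <= |t|)
    return 1 - 2 * area       # t is symmetric about 0, so both tails are equal


# t and df as reported in the SPSS output
p = two_tailed_p(-2.171, df=11)
alpha = 0.05
print(f"p = {p:.3f}")                                      # p = 0.053
print("significant" if p < alpha else "not significant")  # not significant
```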
  • 46. WHAT IS ANOVA? • ANOVA (analysis of variance) is an analysis of the variation present in an experiment. It is a test of the hypothesis that the variation in an experiment is no greater than that due to normal variation of individuals' characteristics and error in their measurement. • ANOVA is a technique from statistical inference that allows us to deal with several populations.
  • 47. TYPES OF ANOVA 1. One-way ANOVA 2. Two-way ANOVA
  • 48. ONE-WAY ANOVA DEFINITION • A one-way ANOVA is used when comparing two or more group means on a continuous dependent variable. In other words, one-way ANOVA techniques can be used to study the effect of k (≥ 2) levels of a single factor. • The independent t-test is a special case of the one-way ANOVA for situations where there are only two group means.
  • 49. MAJOR CONCEPTS: 1. CALCULATING SUMS OF SQUARES • The one-way ANOVA separates the total variance in the continuous dependent variable into two components: variability between the groups and variability within the groups. • Variability between the groups is calculated by first obtaining the sum of squares between groups (SSb): the sum of the squared differences between each group mean and the grand mean, with each squared difference weighted by the number of individuals in that group. • Variability within the groups is calculated by first obtaining the sum of squares within groups (SSw): the sum of the squared differences between each individual score and that individual's group mean.
  • 50. TYPES OF VARIABLES FOR ONE-WAY ANOVA • The IV (independent variable) is categorical. The categorical IV can have two groups or more than two groups. • The DV (dependent variable) is continuous. • Data are collected on both variables for each person in the study.
  • 51. EXAMPLES OF RESEARCH QUESTIONS FOR ONE-WAY ANOVA 1. Is there a significant difference in student attitudes toward the course between students who pass and students who fail the course? • Student attitude is continuous • Passing the course is categorical (pass/fail) • Because the IV has only two groups, we can use an independent t-test. 2. Does student satisfaction significantly differ by location of institution (rural, urban, suburban)? • Student satisfaction is continuous • Institution location is categorical
  • 52. The linear model, conceptually, is: SSt = SSb + SSw SSt: total sums of squares SSb: sums of squares between groups SSw: sums of squares within groups
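The decomposition can be checked numerically. A minimal sketch with three small hypothetical groups (the function name `sums_of_squares` is illustrative only):

```python
import statistics


def sums_of_squares(groups):
    """Return (SSt, SSb, SSw) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    # Between groups: each group mean vs. the grand mean, weighted by group size
    ss_b = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: each score vs. its own group mean
    ss_w = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    # Total: each score vs. the grand mean
    ss_t = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_t, ss_b, ss_w


groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]   # hypothetical data, k = 3
ss_t, ss_b, ss_w = sums_of_squares(groups)
print(f"SSt = {ss_t:.2f}, SSb = {ss_b:.2f}, SSw = {ss_w:.2f}")
# SSt = 44.00, SSb = 38.00, SSw = 6.00
```

For these numbers SSb + SSw = 38 + 6 = 44 = SSt, as the linear model requires.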
  • 53. ONE-WAY ANOVA AS A RATIO OF VARIANCES: Formula for variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$. Numerator: a sum of squared deviations (a sum of squares). Denominator: degrees of freedom.
  • 54. • ANOVA analyzes the ratio of the variance between groups to the variance within groups. • In ANOVA, these variances, formerly known to us as s², are referred to as mean squares (MS). Mean squares are calculated by dividing each sum of squares by the degrees of freedom associated with it.
  • 55. Thus, a mean square between (MSb) is simply the variance between groups, obtained by dividing the between-groups sum of squares by its degrees of freedom. • Likewise, a mean square within (MSw) is simply the variance within groups, obtained by dividing the within-groups sum of squares by its degrees of freedom.
  • 56. FACTORS THAT AFFECT SIGNIFICANCE • F-ratio: the variation due to an experimental treatment or effect divided by the variation due to experimental error. The null hypothesis is that this ratio equals 1.0, that is, the treatment effect is the same as the experimental error. This hypothesis is rejected if the F-ratio is large enough that the probability of obtaining a value at least that large, when the null hypothesis is true, is smaller than some pre-assigned criterion such as 0.05 (one in twenty). • MSb is divided by MSw to obtain the F ratio for hypothesis testing.
  • 57. DISTRIBUTION OF F-RATIO • The F distribution is positively skewed. • If the F statistic falls near 1.0, the data are consistent with the null hypothesis. • If the F statistic is large, the null hypothesis is likely false. Thus, significant F ratios will be in the right tail of the F distribution.
  • 58. P VALUE In statistical hypothesis testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α, which is often 0.05 or 0.01.
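For an F test, the p-value is the area in the right tail of the F distribution beyond the observed F. A standard-library sketch, assuming the textbook identity P(F ≥ f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), where I is the regularized incomplete beta function, evaluated here with a continued-fraction expansion (modified Lentz method). All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1, a - 1
    c = 1.0
    d = 1 - qab * x / qap
    d = 1 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1 + aa * d
        d = 1 / (d if abs(d) > tiny else tiny)
        c = 1 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1 + aa * d
        d = 1 / (d if abs(d) > tiny else tiny)
        c = 1 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1 - x))
    if x < (a + 1) / (a + b + 2):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1 - math.exp(log_front) * _betacf(b, a, 1 - x) / b


def f_p_value(f, df1, df2):
    """Right-tail area P(F >= f) for an F(df1, df2) distribution."""
    if f <= 0:
        return 1.0
    return reg_inc_beta(df2 / 2, df1 / 2, df2 / (df2 + df1 * f))


# Hypothetical one-way result: F(2, 6) = 19.0
p = f_p_value(19.0, 2, 6)
print(f"p = {p:.4f}")  # p = 0.0025
```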
  • 60. t² = F • The larger the value of t, the more likely we are to find significant results. • The t-test is a special case of ANOVA in which only two groups comprise the independent variable. • We're familiar with the t distribution as approximately normal (for large df), with positive and negative values. The F statistic, on the other hand, is positively skewed and is built from squared values. Thus, for any two-group situation, t² = F.
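A quick numerical check of t² = F, assuming two small hypothetical groups and the pooled-variance (equal-variance) independent t-test; both function names are illustrative only.

```python
import math
import statistics


def pooled_t(g1, g2):
    """Independent-samples t with pooled variance (equal variances assumed)."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * statistics.variance(g1)
           + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    diff = statistics.mean(g1) - statistics.mean(g2)
    return diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def one_way_f(groups):
    """One-way ANOVA F = MSb / MSw."""
    scores = [x for g in groups for x in g]
    k, n_total = len(groups), len(scores)
    grand = statistics.mean(scores)
    ss_b = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_w = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_b / (k - 1)) / (ss_w / (n_total - k))


a, b = [4, 5, 6], [6, 7, 8]   # hypothetical two-group data
t = pooled_t(a, b)
f = one_way_f([a, b])
print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
# t = -2.449, t^2 = 6.000, F = 6.000
```

The sign of t is lost on squaring, which is why F only ever takes non-negative values.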
  • 61. CALCULATIONS • dfb = k − 1 (k: number of samples/groups/levels) • dfw = N − k (N: total number of individuals in all groups) • dfT = N − 1 • MSb = SSb/dfb • MSw = SSw/dfw • F = MSb/MSw
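These formulas translate line for line into a small helper. This sketch uses a hypothetical name, `anova_table`, and hypothetical sums of squares for three groups of three people, and fills in the degrees of freedom, mean squares and F:

```python
def anova_table(ss_b, ss_w, k, n_total):
    """Fill in the one-way ANOVA table from the two sums of squares."""
    df_b = k - 1           # k = number of groups
    df_w = n_total - k     # n_total = individuals across all groups
    df_t = n_total - 1
    ms_b = ss_b / df_b     # mean square between
    ms_w = ss_w / df_w     # mean square within
    return {"df_b": df_b, "df_w": df_w, "df_t": df_t,
            "ms_b": ms_b, "ms_w": ms_w, "F": ms_b / ms_w}


# Hypothetical sums of squares: three groups of three people
table = anova_table(ss_b=38.0, ss_w=6.0, k=3, n_total=9)
print(table)
# {'df_b': 2, 'df_w': 6, 'df_t': 8, 'ms_b': 19.0, 'ms_w': 1.0, 'F': 19.0}
```

Note that dfb + dfw = dfT, mirroring SSb + SSw = SSt.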
  • 62. STEPS IN ONE-WAY ANOVA STEP 1: STATE HYPOTHESES To determine whether different levels of a factor affect the measured observations differently, the following hypotheses are tested. • H0: There is no significant difference among the groups in variable X (μ1 = μ2 = ... = μk). • Ha: There is a significant difference between at least two of the groups in variable X. In other words, at least one mean will significantly differ.
  • 63. STEP 2: SET THE CRITERION FOR REJECTING H0
  • 64. STEP 3: COMPUTE TEST STATISTIC
  • 65. STEP 4: COMPARE TEST STATISTIC TO CRITERION
  • 66. STEP 5: MAKE DECISION • Fail to reject the null hypothesis and conclude that there is no significant difference among the groups: F(dfb, dfw) = insert F statistic, p > insert α • Reject the null hypothesis and conclude that there is a significant difference among the groups: F(dfb, dfw) = insert F statistic, p < insert α
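Step 5 amounts to comparing p with α and filling in the reporting template. A minimal sketch, with a hypothetical function name and hypothetical values F(2, 6) = 19.0, p = 0.0025:

```python
def anova_decision(f, df_b, df_w, p, alpha=0.05):
    """Step 5 sketch: compare p with alpha and fill in the F(dfb, dfw) template."""
    stat = f"F({df_b}, {df_w}) = {f:.2f}"
    if p < alpha:
        return ("Reject H0: there is a significant difference among the groups, "
                f"{stat}, p < {alpha}")
    return ("Fail to reject H0: there is no significant difference among the groups, "
            f"{stat}, p >= {alpha}")


print(anova_decision(19.0, 2, 6, p=0.0025))
# Reject H0: there is a significant difference among the groups, F(2, 6) = 19.00, p < 0.05
```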
  • 67. TWO-WAY ANOVA Difference between one-way and two-way ANOVA ANOVA Test
  • 68. ONE-WAY ANOVA • One-Way ANOVA has one independent variable (1 factor) with > 2 conditions – conditions = levels = treatments – e.g., for a brand of cola factor, the levels are: Coke, Pepsi, RC Cola • Independent variables = factors
  • 69. TWO-WAY ANOVA • Two-Way ANOVA has 2 independent variables (factors) – each can have multiple conditions. Example • Two independent variables (IVs) – IV1: Brand; IV2: Calories – Three levels of Brand: Coke, Pepsi, RC Cola – Two levels of Calories: Regular, Diet
  • 70. WHEN TO USE • One-way ANOVA: you have more than two levels (conditions) of a single IV – EXAMPLE: studying the effectiveness of three types of pain reliever: aspirin vs. Tylenol vs. ibuprofen • Two-way ANOVA: you have more than one IV (factor) – EXAMPLE: studying pain relief based on pain reliever and type of pain • Factor A: pain reliever (aspirin vs. Tylenol) • Factor B: type of pain (headache vs. back pain)
  • 71. NOTATION Call the two factors Factor A and Factor B. a: the number of categories of Factor A; b: the number of categories of Factor B. The total number of groups is ab, and the total number of observations is N. The response (dependent variable) value for each observation is $Y_{ijk}$, where i is the subject's category for Factor A and j is the subject's category for Factor B. Together, i and j identify a group, and k denotes which individual we're talking about within that group. The number of observations in each group is n, so N = abn.
  • 72. Example: how the number of hours of TV people watch per week depends on two variables, gender and age. Each person is classified according to gender (male, female) and age (18–24, 25–54, 55+). There are six groups, one for each combination of gender and age. We randomly sample five people from each group, and each person reports the time, in hours, that he or she watches TV per week. The data are shown below.
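  • 73. Hours of TV watched per week (five people per group):
    Male, age 18–24: 20, 27, 20, 22, 28
    Male, age 25–54: 23, 21, 23, 28, 28
    Male, age 55+: 33, 33, 39, 33, 37
    Female, age 18–24: 25, 19, 27, 32, 31
    Female, age 25–54: 32, 26, 33, 33, 24
    Female, age 55+: 44, 43, 52, 43, 54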
  • 74. TWO-WAY ANOVA TABLE 1. Sums of squares. 2. Degrees of freedom. 3. Mean squares.
  • 75. There are three main questions that we might ask in two-way ANOVA: • Does the response variable depend on Factor A? • Does the response variable depend on Factor B? • Does the response variable depend on Factor A differently for different values of Factor B, and vice versa? For the TV data: whether TV viewing time depends on age and gender. The third question asks whether TV viewing time depends on gender differently for people of different ages, or whether TV viewing time depends on age differently for men than for women. (For example, perhaps it's true that women 55+ watch more TV than men 55+, but women 18–24 watch less TV than men 18–24.)
  • 76. 1. Sums of Squares Two-way ANOVA involves five different sums of squares: • The total sum of squares, $SS_{Tot}$, measures the total variability in the response variable values. Its formula is $SS_{Tot} = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (Y_{ijk} - \bar{Y})^2$. • The Factor A sum of squares, $SS_A$, measures the variability that can be explained by differences in Factor A. Its formula is $SS_A = bn \sum_{i=1}^{a} (\bar{Y}_{i} - \bar{Y})^2$.
  • 77. $\bar{Y}_{ij}$ represents the sample mean of the group in category i of Factor A and category j of Factor B (always an average of n observations). $\bar{Y}_{i}$ represents the sample mean of all the data in category i of Factor A combined (always an average of bn observations). $\bar{Y}_{j}$ represents the sample mean of all the data in category j of Factor B combined (always an average of an observations). $\bar{Y}$ represents the overall sample mean of all the data from all groups combined (always an average of all abn = N observations).
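  • 78. • The Factor B sum of squares, $SS_B$, measures the variability that can be explained by differences in Factor B. Its formula is $SS_B = an \sum_{j=1}^{b} (\bar{Y}_{j} - \bar{Y})^2$. • The interaction sum of squares, $SS_{AB}$, measures the variability that can be explained by interaction between the effects of Factors A and B. (We'll talk more about what this means later.) Its formula is $SS_{AB} = n \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{Y}_{ij} - \bar{Y}_{i} - \bar{Y}_{j} + \bar{Y})^2$. • The error sum of squares, $SS_E$, measures the variability of the observations around their group sample means. Its formula is $SS_E = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} (Y_{ijk} - \bar{Y}_{ij})^2$.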
  • 79. • If we call the sample standard deviation within each group $s_{ij}$, then another formula for $SS_E$ is $SS_E = (n - 1) \sum_{i=1}^{a}\sum_{j=1}^{b} s_{ij}^2$.
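Applying the five sums of squares to the TV data from the example gives a concrete check. The sketch below (standard library only; `two_way_ss` is a hypothetical helper) also fills in degrees of freedom, mean squares and F ratios. The degrees-of-freedom formulas used, a − 1, b − 1, (a − 1)(b − 1) and ab(n − 1), are the standard ones for a balanced design, supplied here as an assumption because they are not stated on these slides.

```python
import statistics

# TV hours per week from the example: data[gender][age group] -> five people each
data = {
    "Male": {"18-24": [20, 27, 20, 22, 28],
             "25-54": [23, 21, 23, 28, 28],
             "55+": [33, 33, 39, 33, 37]},
    "Female": {"18-24": [25, 19, 27, 32, 31],
               "25-54": [32, 26, 33, 33, 24],
               "55+": [44, 43, 52, 43, 54]},
}


def two_way_ss(data):
    """Sums of squares and df for a balanced two-way layout.

    Factor A = outer keys (gender), Factor B = inner keys (age group).
    """
    a_levels = list(data)
    b_levels = list(data[a_levels[0]])
    a, b = len(a_levels), len(b_levels)
    n = len(data[a_levels[0]][b_levels[0]])   # observations per group (balanced)

    cell = {(i, j): statistics.mean(data[i][j]) for i in a_levels for j in b_levels}
    mean_a = {i: statistics.mean([y for j in b_levels for y in data[i][j]])
              for i in a_levels}
    mean_b = {j: statistics.mean([y for i in a_levels for y in data[i][j]])
              for j in b_levels}
    everything = [y for i in a_levels for j in b_levels for y in data[i][j]]
    grand = statistics.mean(everything)

    ss = {
        "A": b * n * sum((mean_a[i] - grand) ** 2 for i in a_levels),
        "B": a * n * sum((mean_b[j] - grand) ** 2 for j in b_levels),
        "AB": n * sum((cell[i, j] - mean_a[i] - mean_b[j] + grand) ** 2
                      for i in a_levels for j in b_levels),
        "E": sum((y - cell[i, j]) ** 2
                 for i in a_levels for j in b_levels for y in data[i][j]),
        "Tot": sum((y - grand) ** 2 for y in everything),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    return ss, df


ss, df = two_way_ss(data)
ms = {src: ss[src] / df[src] for src in df}               # mean squares = SS / df
f = {src: ms[src] / ms["E"] for src in ("A", "B", "AB")}  # each F uses MS_E below
for src in ("A", "B", "AB", "E", "Tot"):
    print(f"{src:>3}: SS = {ss[src]:9.3f}")
print({src: round(val, 2) for src, val in f.items()})
```

For these data the pieces add up as expected: 353.633 + 1520 + 109.867 + 429.2 = 2412.7 = SS_Tot.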
  • 84. TWO-WAY ANOVA HYPOTHESIS TESTS • Main effects: Does the response variable depend on Factor A? Does the response variable depend on Factor B? • Interaction: Does the response variable depend on Factor A differently for different values of Factor B, and vice versa?
  • 85. Interaction : We say that there is interaction if Y depends on Factor A differently for different values of Factor B, and vice versa. Similarly, we say that there is NO interaction if Y depends on Factor A in the same way for all values of Factor B, and vice versa.
  • 86. HYPOTHESES In the test for interaction, the null hypothesis (Ho) is that there is no interaction, while the alternative hypothesis (Ha) is that there is interaction.
  • 87. There is no interaction on the left. For each age group, women watch on average five more hours of TV per week than men. For each gender, the middle age group watches on average six more hours of TV per week than the youngest age group, and the oldest age group watches on average nine more hours per week than the middle age group. • There is interaction on the right. For each age group, women watch more TV than men on average, but how much more varies across the age groups. Also, for each gender, older people watch more TV on average, but how much more varies by gender.
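Interaction can be read off the cell means. With no interaction, every cell mean equals row mean + column mean − grand mean, so the interaction effect $\bar{Y}_{ij} - \bar{Y}_{i} - \bar{Y}_{j} + \bar{Y}$ is zero in every cell. The sketch below builds the left-hand pattern from the differences described above on top of a hypothetical 10-hour baseline for men aged 18–24, and compares it with the cell means computed from the TV data table; `interaction_effects` is a hypothetical helper.

```python
def interaction_effects(cell_means):
    """Y_ij - Y_i - Y_j + Y for a table of cell means (rows: Factor A, cols: Factor B).

    All zeros means the sample means are additive, i.e. show no interaction.
    Marginal means are taken over cells, which matches the data when n is equal.
    """
    a, b = len(cell_means), len(cell_means[0])
    row = [sum(r) / b for r in cell_means]
    col = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a
    return [[cell_means[i][j] - row[i] - col[j] + grand for j in range(b)]
            for i in range(a)]


# Left pattern: women +5 h at every age; +6 h from youngest to middle, +9 h to oldest.
# The 10 h baseline for men aged 18-24 is hypothetical.
men = [10, 16, 25]
women = [m + 5 for m in men]
no_interaction = interaction_effects([men, women])

# Cell means of the TV data (rows: Male, Female; columns: 18-24, 25-54, 55+)
tv = interaction_effects([[23.4, 24.6, 35.0], [26.8, 29.6, 47.2]])
print([[round(e, 2) for e in r] for r in no_interaction])
print([[round(e, 2) for e in r] for r in tv])
```

The additive pattern gives all zeros, while the TV data give nonzero effects of about ±1.73, ±0.93 and ±2.67 hours. Whether effects of that size are larger than chance is what the F test for interaction decides.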
  • 88. ASSUMPTIONS The assumptions for the two-way ANOVA F test for interaction are exactly the same as those of the one-way ANOVA F test, with one additional requirement: the number of observations should be the same for all groups.
  • 92. • If we believe there is interaction, then we don’t bother to ask whether the response depends on Factor A or Factor B separately—the fact that there is interaction means that the response depends on Factor A differently for different values of Factor B, and vice versa. So we stop here and do not perform the tests for main effects (which we’ll talk about in the next subsection). • If we believe it’s reasonable that there is no interaction, then that means we can look at the effects of Factor A and Factor B separately, so we proceed to the tests for main effects.